sen-material: comparison handouts/ho09.tex

equal deleted inserted replaced

-:8e5e84b14041
+:619073c37649
 \usepackage{../style}
 \usepackage{../langs}
 \usepackage{../graphics}
 \usepackage{../grammar}
 \usepackage{multicol}
-\usepackage{array}
 \begin{document}
 \section*{Handout 9 (Static Analysis)}
 with real programming languages. The language we will study
 contains, amongst other things, variables holding integers.
 Using static analysis, we want to find out what the sign of
 these integers (positive or negative) will be when the program
 runs. This sign-analysis seems like a very simple problem. But
-teven such simple problems, if approached naively, are in
+even such simple problems, if approached naively, are in
 general undecidable, just like Turing's halting problem. I let
 you think why?
 Is sign-analysis of variables an interesting problem? Well,
 yes---if a compiler can find out that for example a variable
 will never be negative and this variable is used as an index
-for an array, then the comopiler does not need to generate code
+for an array, then the compiler does not need to generate code
 for an underflow-test. Remember some languages are immune to
 buffer-overflow attacks, but they need to add underflow and
 overflow checks everywhere. If the compiler can omit the
 underflow test, for example, then this can potentially
-drastically speed up the generated code.
+drastically speed up the generated code. According to the John
+Regehr, an expert in the field of compilers, overflow checks
+can cause 5-10\% slowdown, and in some languages even 100\%
+for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}}
 What do programs in our simple programming language look like?
 The following grammar gives a first specification:
 \begin{multicols}{2}
 and \emph{expressions} as well as \emph{programs}, which are
 sequences of statements. Statements are either labels,
 variable assignments, conditional jumps (\pcode{jmp?}) and
 unconditional jumps (\pcode{goto}). Labels are just strings,
 which can be used as the target of a jump. We assume that in
-every program the labels are unique---if there is a clash we
+every program the labels are unique---if there is a clash,
-do not know where to jump to. The conditional jumps and
+then we do not know where to jump to. The conditional jumps
-variable assignments involve (arithmetic) expressions.
+and variable assignments involve (arithmetic) expressions.
 Expressions are either numbers, variables or compound
 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
 (for simplicity reasons we do not consider any other
 operations). We assume we have negative and positive numbers,
 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
 real code basic number formats have a range and might
 over-flow or under-flow from this range. Also the number of
 variables in our programs is potentially unlimited, while
 memory in an actual computer, of course, is always limited
 somehow on any actual. To sum up, our language might look
-ridiculously simple, but it is not much removed from
+ridiculously simple, but it is not far removed from
 practically relevant issues.
 \subsubsection*{An Interpreter}
 \noindent How should they work? We could introduce Booleans
 (\pcode{true} and \pcode{false}) and then jump only when the
 condition is \pcode{true}. However, since we have numbers in
 our language anyway, why not just encoding \pcode{true} as
 one, and \pcode{false} as zero? In this way we can dispense
-with the additional concept of Booleans, {\bf but also we could
+with the additional concept of Booleans.
-replace the jump above by
-\begin{center}
-\code{jmp? 1 + (n + -n) done}
-\end{center}
-\noindent which behaves exactly the same. But what does it
-actually mean that two jumps behave the same? Or two programs
-for that matter?}
 I hope the above discussion makes it already clear we need to
 be a bit more careful with our programs. Below we shall
 describe an interpreter for our programming language, which
 specifies exactly how programs are supposed to be
 run\ldots{}at least we will specify this for all \emph{good}
 programs. By good programs I mean where all variables are
 initialised, for example. Our interpreter will just crash if
 it cannot find out the value for a variable, in case it is not
-initialised. Also we will assume that labels in good programs
+initialised. Also, we will assume that labels in good programs
 are unique, otherwise our programs will calculate ``garbage''.
 First we will pre-process our programs. This will simplify the
 definition of our interpreter later on. We will transform
-programs into \emph{snippets}. A snippet is a label and all
+programs into \emph{snippets}. Their purpose is to simplify
-code that comes after the label. This essentially means a
+the definition of what the interpreter should do in case of a
-snippet is a \emph{map} from labels to code. Given that
+jump. A snippet is a label and all code that comes after the
-programs are sequences (or lists) of statements, we can easily
+label. This essentially means a snippet is a \emph{map} from
-calculate the snippets by just recursively generating the map.
+labels to code. Given that programs are sequences (or lists)
+of statements, we can easily calculate the snippets by just
+traversing this sequence and recursively generating the map.
 Suppose a program is of the general form
 \begin{center}
 $stmt_1\;stmt_2\; \ldots\; stmt_n$
 \end{center}
-\noindent In order to calculate the snippets of such a
+\noindent The idea is to go through this sequence of
-program, we have to go through the statements one by one and
+statements one by one and check whether they are a label. If
-check whether they are a label. If yes, we add the label and
+yes, we add the label and the remaining statements to our map.
-the remaining statements to our map. If no we just continue
+If no, we just continue with the next statement. To come up
-with the next statement. To come up with a recursive
+with a recursive definition for generating snippets, let us
-definition for generating snippets, let us write $[]$ for the
+write $[]$ for the program that does not contain any
-program that does not contain any statement. Consider the
+statement. Consider the following definition:
-following definition:
 \begin{center}
 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
 \end{cases}$
 \end{tabular}
 \end{center}
 \noindent In the first clause we just return the empty map for
-the program that does not contain any statements. In the
+the program that does not contain any statement. In the second
-second clause, we have to distinguish the case where the first
+clause, we have to distinguish the case where the first
-statement is a label or not. If not, then we just ``throw
+statement is a label or not. As said before, if not, then we
-away'' the label and recursively calculate the snippets for
+just ``throw away'' the label and recursively calculate the
-the rest of the program. If yes, then we do the same, but also
+snippets for the rest of the program. If yes, then we do the
-update the map so that $label$ now points to the rest of the
+same, but also update the map so that $label$ now points to
-statements. There is one small problem we need to overcome:
+the rest of the statements. There is one small problem we need
-our two programs have no label as entry point---that is where
+to overcome: our two programs have no label as \emph{entry
-the execution starts. We always assume that the first
+point}---that is where the execution starts. We usually assume
-statement will be run first. For us it seems convenient if
+that the first statement will be run first. To make this the
-we add to all our programs a default label, say
+default, it is convenient if we add to all our programs a
-\pcode{""} (the empty string). With this we can define
+default label, say \pcode{""} (the empty string). With this we
-our preprocessing of programs as follows
+can define our pre-processing of programs as follows
 \begin{center}
 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$
 \end{center}
 \noindent Let us see how this pans out in practice. If we
-preprocess the factorial program shown earlier we obtain the
+pre-process the factorial program shown earlier, we obtain the
 following map:
 \begin{center}
 \small
 \lstset{numbers=none,
 language={},
 xleftmargin=0mm,
 aboveskip=0.5mm,
+belowskip=0.5mm,
 frame=single,
 framerule=0mm,
 framesep=0mm}
 \begin{tikzpicture}[node distance=0mm]
 \node (A1) [draw]{\pcode{""}};
 \node (C3) [right=of B3] {$[]$};
 \end{tikzpicture}
 \end{center}
 \noindent I highlighted the \emph{keys} in this map. Since
 there are three labels in the factorial program, there are
-three keys.
+three keys. When running the factorial program and
+encountering a jump, then we only have to consult this map, in
+order to find out what the next instruction should be.
+We should now be in the position to define how a program
+should be run. This ``running'' of programs is often called
+\emph{evaluation}. Let us start with the definition for
+how expressions are evaluated. A first attempt is the
+following recursive function:
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+\textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+\textit{snippets}(rest)                & \text{otherwise}
+\end{cases}$
+\end{tabular}
+\end{center}
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 355	619073c37649
parent 354	8e5e84b14041
child 356	e1e0f78baa70