sen-material: comparison handouts/ho09.tex

equal deleted inserted replaced

-:605f815986dd
+:8e5e84b14041
 \usepackage{../style}
 \usepackage{../langs}
 \usepackage{../graphics}
 \usepackage{../grammar}
 \usepackage{multicol}
+\usepackage{array}
 \begin{document}
 \section*{Handout 9 (Static Analysis)}
 with real programming languages. The language we will study
 contains, amongst other things, variables holding integers.
 Using static analysis, we want to find out what the sign of
 these integers (positive or negative) will be when the program
 runs. This sign-analysis seems like a very simple problem. But
-it will turn out even such simple problems, if approached
+teven such simple problems, if approached naively, are in
-naively, are in general undecidable, just like Turing's
+general undecidable, just like Turing's halting problem. I let
-halting problem. I let you think why?
+you think why?
 Is sign-analysis of variables an interesting problem? Well,
 yes---if a compiler can find out that for example a variable
 will never be negative and this variable is used as an index
-for an array, then the compiler does not need to generate code
+for an array, then the comopiler does not need to generate code
 for an underflow-test. Remember some languages are immune to
 buffer-overflow attacks, but they need to add underflow and
 overflow checks everywhere. If the compiler can omit the
 underflow test, for example, then this can potentially
 drastically speed up the generated code.
-What do programs in our programming language look like? The
+What do programs in our simple programming language look like?
-following grammar gives a first specification:
+The following grammar gives a first specification:
 \begin{multicols}{2}
 \begin{plstx}[rhs style=,one per line,left margin=9mm]
 : \meta{Stmt} ::= \meta{label} \texttt{:}
 | \meta{var} \texttt{:=} \meta{Exp}
 and \emph{expressions} as well as \emph{programs}, which are
 sequences of statements. Statements are either labels,
 variable assignments, conditional jumps (\pcode{jmp?}) and
 unconditional jumps (\pcode{goto}). Labels are just strings,
 which can be used as the target of a jump. We assume that in
-every program the labels are unique---otherwise if there is a
+every program the labels are unique---if there is a clash we
-clash we do not know where to jump to. The conditional jumps
+do not know where to jump to. The conditional jumps and
-and variable assignments involve (arithmetic) expressions.
+variable assignments involve (arithmetic) expressions.
 Expressions are either numbers, variables or compound
 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
 (for simplicity reasons we do not consider any other
 operations). We assume we have negative and positive numbers,
 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
 \pcode{2}\ldots{} An example program that calculates the
-factorial of 5 is as follows:
+factorial of 5 is in oure programming language as follows:
 \begin{lstlisting}[language={},xleftmargin=10mm]
 a := 1
 n := 5
 top:
 n := n + -1
 goto top
 done:
 \end{lstlisting}
-\noindent Each line of the program contains a statement. In
+\noindent As can be seen each line of the program contains a
-the first two lines we assign values to the variables
+statement. In the first two lines we assign values to the
-\pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
+variables \pcode{a} and \pcode{n}. In line 4 we test whether
-is zero, in which case we jump to the end of the program
+\pcode{n} is zero, in which case we jump to the end of the
-marked with the label \pcode{done}. If \pcode{n} is not zero,
+program marked with the label \pcode{done}. If \pcode{n} is
-we multiply the content of \pcode{a} by \pcode{n}, decrease
+not zero, we multiply the content of \pcode{a} by \pcode{n},
-\pcode{n} by one and jump back to the beginning of the loop,
+decrease \pcode{n} by one and jump back to the beginning of
-marked with the label \pcode{top}. Another program in our
+the loop, marked with the label \pcode{top}. Another program
-language is shown in Figure~\ref{fib}. I let you think what it
+in our language is shown in Figure~\ref{fib}. I let you think
-calculates.
+what it calculates.
 \begin{figure}[t]
 \begin{lstlisting}[numbers=none,
 language={},xleftmargin=10mm]
 n := 6
 Even if our language is rather small, it is still Turing
 complete---meaning quite powerful. However, discussing this
 fact in more detail would lead us too far astray. Clearly, our
 programming is rather low-level and not very comfortable for
-writing programs. It is inspired by machine code, which is the
+writing programs. It is inspired by real machine code, which
-code that is actually executed by a CPU. So a more interesting
+is the code that is executed by a CPU. So a more interesting
 question is what is missing in comparison with real machine
 code? Well, not much\ldots{}in principle. Real machine code,
 of course, contains many more arithmetic instructions (not
 just addition and multiplication) and many more conditional
 jumps. We could add these to our language if we wanted, but
 or large, which is not the case with real machine code. In
 real code basic number formats have a range and might
 over-flow or under-flow from this range. Also the number of
 variables in our programs is potentially unlimited, while
 memory in an actual computer, of course, is always limited
-somehow on any actual. To sum up, our language might look very
+somehow on any actual. To sum up, our language might look
-simple, but it is not completely removed from practically
+ridiculously simple, but it is not much removed from
-relevant issues.
+practically relevant issues.
 \subsubsection*{An Interpreter}
 Designing a language is like playing god: you can say what
 names for variables you allow; what programs should look like;
 most importantly you can decide what each part of the program
 should mean and do. While our language is rather simple and
-the meaning is rather straightforward, there are still places
+the meaning of statements, for example, is rather
-where we need to make a real choice. For example with
+straightforward, there are still places where we need to make
-conditional jumps, say the one in the factorial program:
+real choices. For example consider the conditional jumps, say
+the one in the factorial program:
 \begin{center}
 \code{jmp? n = 0 done}
 \end{center}
 \noindent How should they work? We could introduce Booleans
 (\pcode{true} and \pcode{false}) and then jump only when the
 condition is \pcode{true}. However, since we have numbers in
-our language anyway, why not just encoding \emph{true} as
+our language anyway, why not just encoding \pcode{true} as
-zero, and \pcode{false} as anything else? In this way we can
+one, and \pcode{false} as zero? In this way we can dispense
-dispense with the additional concept of Booleans, but also we
+with the additional concept of Booleans, {\bf but also we could
-could replace the jump above by
+replace the jump above by
 \begin{center}
-\code{jmp? n done}
+\code{jmp? 1 + (n + -n) done}
 \end{center}
 \noindent which behaves exactly the same. But what does it
-mean that two jumps behave the same?
+actually mean that two jumps behave the same? Or two programs
+for that matter?}
 I hope the above discussion makes it already clear we need to
 be a bit more careful with our programs. Below we shall
-describe an interpreter for our programs, which specifies
+describe an interpreter for our programming language, which
-exactly how programs are supposed to be run\ldots{}at least we
+specifies exactly how programs are supposed to be
-will specify this for all \emph{good} programs. By good
+run\ldots{}at least we will specify this for all \emph{good}
-programs we mean where for example all variables are
+programs. By good programs I mean where all variables are
-initialised. Our interpreter will just crash if it cannot find
+initialised, for example. Our interpreter will just crash if
-out the value for a variable, because it is not initialised.
+it cannot find out the value for a variable, in case it is not
+initialised. Also we will assume that labels in good programs
-First we will pre-process our programs. This will simplify
+are unique, otherwise our programs will calculate ``garbage''.
-our definition of our interpreter later on. We will transform
-programs into \emph{snippets}.
+First we will pre-process our programs. This will simplify the
+definition of our interpreter later on. We will transform
+programs into \emph{snippets}. A snippet is a label and all
+code that comes after the label. This essentially means a
+snippet is a \emph{map} from labels to code. Given that
+programs are sequences (or lists) of statements, we can easily
+calculate the snippets by just recursively generating the map.
+Suppose a program is of the general form
+\begin{center}
+$stmt_1\;stmt_2\; \ldots\; stmt_n$
+\end{center}
+\noindent In order to calculate the snippets of such a
+program, we have to go through the statements one by one and
+check whether they are a label. If yes, we add the label and
+the remaining statements to our map. If no we just continue
+with the next statement. To come up with a recursive
+definition for generating snippets, let us write $[]$ for the
+program that does not contain any statement. Consider the
+following definition:
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+\textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+\textit{snippets}(rest)                & \text{otherwise}
+\end{cases}$
+\end{tabular}
+\end{center}
+\noindent In the first clause we just return the empty map for
+the program that does not contain any statements. In the
+second clause, we have to distinguish the case where the first
+statement is a label or not. If not, then we just ``throw
+away'' the label and recursively calculate the snippets for
+the rest of the program. If yes, then we do the same, but also
+update the map so that $label$ now points to the rest of the
+statements. There is one small problem we need to overcome:
+our two programs have no label as entry point---that is where
+the execution starts. We always assume that the first
+statement will be run first. For us it seems convenient if
+we add to all our programs a default label, say
+\pcode{""} (the empty string). With this we can define
+our preprocessing of programs as follows
+\begin{center}
+$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$
+\end{center}
+\noindent Let us see how this pans out in practice. If we
+preprocess the factorial program shown earlier we obtain the
+following map:
+\begin{center}
+\small
+\lstset{numbers=none,
+language={},
+xleftmargin=0mm,
+aboveskip=0.5mm,
+frame=single,
+framerule=0mm,
+framesep=0mm}
+\begin{tikzpicture}[node distance=0mm]
+\node (A1) [draw]{\pcode{""}};
+\node (B1) [right=of A1] {$\mapsto$};
+\node (C1) [right=of B1,anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+a := 1
+n := 5
+top:
+jmp? n = 0 done
+a := a * n
+n := n + -1
+goto top
+done:
+\end{lstlisting}
+\end{minipage}};
+\node (A2) [right=of C1.north east,draw] {\pcode{top}};
+\node (B2) [right=of A2] {$\mapsto$};
+\node (C2) [right=of B2, anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+jmp? n = 0 done
+a := a * n
+n := n + -1
+goto top
+done:
+\end{lstlisting}
+\end{minipage}};
+\node (A3) [right=of C2.north east,draw] {\pcode{done}};
+\node (B3) [right=of A3] {$\mapsto$};
+\node (C3) [right=of B3] {$[]$};
+\end{tikzpicture}
+\end{center}
+\noindent I highlighted the \emph{keys} in this map. Since
+there are three labels in the factorial program, there are
+three keys.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 354	8e5e84b14041
parent 352	da5713bcdbb0
child 355	619073c37649