updated
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Wed, 17 Dec 2014 17:50:34 +0000
changeset 354 8e5e84b14041
parent 353 605f815986dd
child 355 619073c37649
updated
handouts/ho08.pdf
handouts/ho09.pdf
handouts/ho09.tex
Binary file handouts/ho08.pdf has changed
Binary file handouts/ho09.pdf has changed
--- a/handouts/ho09.tex	Fri Dec 12 15:43:55 2014 +0000
+++ b/handouts/ho09.tex	Wed Dec 17 17:50:34 2014 +0000
@@ -4,6 +4,7 @@
 \usepackage{../graphics}
 \usepackage{../grammar}
 \usepackage{multicol}
+\usepackage{array}
 
 \begin{document}
 
@@ -66,23 +67,23 @@
 Using static analysis, we want to find out what the sign of
 these integers (positive or negative) will be when the program
 runs. This sign-analysis seems like a very simple problem. But
-it will turn out even such simple problems, if approached
-naively, are in general undecidable, just like Turing's
-halting problem. I let you think why?
+teven such simple problems, if approached naively, are in
+general undecidable, just like Turing's halting problem. I let
+you think why?
 
 
 Is sign-analysis of variables an interesting problem? Well,
 yes---if a compiler can find out that for example a variable
 will never be negative and this variable is used as an index
-for an array, then the compiler does not need to generate code
+for an array, then the comopiler does not need to generate code
 for an underflow-test. Remember some languages are immune to
 buffer-overflow attacks, but they need to add underflow and
 overflow checks everywhere. If the compiler can omit the
 underflow test, for example, then this can potentially
 drastically speed up the generated code.
 
-What do programs in our programming language look like? The 
-following grammar gives a first specification:
+What do programs in our simple programming language look like?
+The following grammar gives a first specification:
 
 \begin{multicols}{2}
 \begin{plstx}[rhs style=,one per line,left margin=9mm]
@@ -109,16 +110,16 @@
 variable assignments, conditional jumps (\pcode{jmp?}) and
 unconditional jumps (\pcode{goto}). Labels are just strings,
 which can be used as the target of a jump. We assume that in
-every program the labels are unique---otherwise if there is a
-clash we do not know where to jump to. The conditional jumps
-and variable assignments involve (arithmetic) expressions.
+every program the labels are unique---if there is a clash we
+do not know where to jump to. The conditional jumps and
+variable assignments involve (arithmetic) expressions.
 Expressions are either numbers, variables or compound
 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
 (for simplicity reasons we do not consider any other
 operations). We assume we have negative and positive numbers,
 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
 \pcode{2}\ldots{} An example program that calculates the
-factorial of 5 is as follows:
+factorial of 5 is in oure programming language as follows:
 
 \begin{lstlisting}[language={},xleftmargin=10mm]
       a := 1
@@ -131,16 +132,16 @@
 done:
 \end{lstlisting}
 
-\noindent Each line of the program contains a statement. In
-the first two lines we assign values to the variables
-\pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
-is zero, in which case we jump to the end of the program
-marked with the label \pcode{done}. If \pcode{n} is not zero,
-we multiply the content of \pcode{a} by \pcode{n}, decrease
-\pcode{n} by one and jump back to the beginning of the loop,
-marked with the label \pcode{top}. Another program in our
-language is shown in Figure~\ref{fib}. I let you think what it
-calculates.
+\noindent As can be seen each line of the program contains a
+statement. In the first two lines we assign values to the
+variables \pcode{a} and \pcode{n}. In line 4 we test whether
+\pcode{n} is zero, in which case we jump to the end of the
+program marked with the label \pcode{done}. If \pcode{n} is
+not zero, we multiply the content of \pcode{a} by \pcode{n},
+decrease \pcode{n} by one and jump back to the beginning of
+the loop, marked with the label \pcode{top}. Another program
+in our language is shown in Figure~\ref{fib}. I let you think
+what it calculates.
  
 \begin{figure}[t]
 \begin{lstlisting}[numbers=none,
@@ -165,8 +166,8 @@
 complete---meaning quite powerful. However, discussing this
 fact in more detail would lead us too far astray. Clearly, our
 programming is rather low-level and not very comfortable for
-writing programs. It is inspired by machine code, which is the
-code that is actually executed by a CPU. So a more interesting
+writing programs. It is inspired by real machine code, which
+is the code that is executed by a CPU. So a more interesting
 question is what is missing in comparison with real machine
 code? Well, not much\ldots{}in principle. Real machine code,
 of course, contains many more arithmetic instructions (not
@@ -181,9 +182,9 @@
 over-flow or under-flow from this range. Also the number of
 variables in our programs is potentially unlimited, while
 memory in an actual computer, of course, is always limited
-somehow on any actual. To sum up, our language might look very
-simple, but it is not completely removed from practically
-relevant issues.
+somehow on any actual. To sum up, our language might look
+ridiculously simple, but it is not much removed from
+practically relevant issues.
 
 
 \subsubsection*{An Interpreter}
@@ -192,9 +193,10 @@
 names for variables you allow; what programs should look like;
 most importantly you can decide what each part of the program
 should mean and do. While our language is rather simple and
-the meaning is rather straightforward, there are still places
-where we need to make a real choice. For example with
-conditional jumps, say the one in the factorial program: 
+the meaning of statements, for example, is rather
+straightforward, there are still places where we need to make
+real choices. For example consider the conditional jumps, say
+the one in the factorial program: 
 
 \begin{center}
 \code{jmp? n = 0 done}
@@ -203,30 +205,132 @@
 \noindent How should they work? We could introduce Booleans
 (\pcode{true} and \pcode{false}) and then jump only when the
 condition is \pcode{true}. However, since we have numbers in
-our language anyway, why not just encoding \emph{true} as
-zero, and \pcode{false} as anything else? In this way we can
-dispense with the additional concept of Booleans, but also we
-could replace the jump above by
+our language anyway, why not just encoding \pcode{true} as
+one, and \pcode{false} as zero? In this way we can dispense
+with the additional concept of Booleans, {\bf but also we could
+replace the jump above by
 
 \begin{center}
-\code{jmp? n done}
+\code{jmp? 1 + (n + -n) done}
 \end{center}
 
 \noindent which behaves exactly the same. But what does it
-mean that two jumps behave the same?
+actually mean that two jumps behave the same? Or two programs
+for that matter?}
 
 I hope the above discussion makes it already clear we need to
 be a bit more careful with our programs. Below we shall
-describe an interpreter for our programs, which specifies
-exactly how programs are supposed to be run\ldots{}at least we
-will specify this for all \emph{good} programs. By good
-programs we mean where for example all variables are
-initialised. Our interpreter will just crash if it cannot find
-out the value for a variable, because it is not initialised.
+describe an interpreter for our programming language, which
+specifies exactly how programs are supposed to be
+run\ldots{}at least we will specify this for all \emph{good}
+programs. By good programs I mean where all variables are
+initialised, for example. Our interpreter will just crash if
+it cannot find out the value for a variable, in case it is not
+initialised. Also we will assume that labels in good programs
+are unique, otherwise our programs will calculate ``garbage''.
+
+First we will pre-process our programs. This will simplify the
+definition of our interpreter later on. We will transform
+programs into \emph{snippets}. A snippet is a label and all
+code that comes after the label. This essentially means a
+snippet is a \emph{map} from labels to code. Given that
+programs are sequences (or lists) of statements, we can easily
+calculate the snippets by just recursively generating the map.
+Suppose a program is of the general form
+
+\begin{center}
+$stmt_1\;stmt_2\; \ldots\; stmt_n$
+\end{center}
+
+\noindent In order to calculate the snippets of such a
+program, we have to go through the statements one by one and
+check whether they are a label. If yes, we add the label and
+the remaining statements to our map. If no we just continue
+with the next statement. To come up with a recursive
+definition for generating snippets, let us write $[]$ for the
+program that does not contain any statement. Consider the
+following definition:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+   \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+   \textit{snippets}(rest)                & \text{otherwise}
+\end{cases}$   
+\end{tabular}
+\end{center}
 
-First we will pre-process our programs. This will simplify 
-our definition of our interpreter later on. We will transform
-programs into \emph{snippets}. 
+\noindent In the first clause we just return the empty map for
+the program that does not contain any statements. In the
+second clause, we have to distinguish the case where the first
+statement is a label or not. If not, then we just ``throw
+away'' the label and recursively calculate the snippets for
+the rest of the program. If yes, then we do the same, but also
+update the map so that $label$ now points to the rest of the
+statements. There is one small problem we need to overcome:
+our two programs have no label as entry point---that is where
+the execution starts. We always assume that the first
+statement will be run first. For us it seems convenient if
+we add to all our programs a default label, say 
+\pcode{""} (the empty string). With this we can define
+our preprocessing of programs as follows
+
+\begin{center}
+$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
+\end{center} 
+
+\noindent Let us see how this pans out in practice. If we
+preprocess the factorial program shown earlier we obtain the 
+following map:
+
+\begin{center}
+\small
+\lstset{numbers=none,
+        language={},
+        xleftmargin=0mm,
+        aboveskip=0.5mm,
+        frame=single,
+        framerule=0mm,
+        framesep=0mm}
+\begin{tikzpicture}[node distance=0mm]
+\node (A1) [draw]{\pcode{""}};
+\node (B1) [right=of A1] {$\mapsto$};
+\node (C1) [right=of B1,anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+  a := 1
+  n := 5 
+top:  
+  jmp? n = 0 done 
+  a := a * n 
+  n := n + -1 
+  goto top 
+done:
+\end{lstlisting}
+\end{minipage}};
+\node (A2) [right=of C1.north east,draw] {\pcode{top}};
+\node (B2) [right=of A2] {$\mapsto$};
+\node (C2) [right=of B2, anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+  jmp? n = 0 done 
+  a := a * n 
+  n := n + -1 
+  goto top 
+done:
+\end{lstlisting} 
+\end{minipage}}; 
+\node (A3) [right=of C2.north east,draw] {\pcode{done}};
+\node (B3) [right=of A3] {$\mapsto$};
+\node (C3) [right=of B3] {$[]$};
+\end{tikzpicture}
+\end{center}
+
+\noindent I highlighted the \emph{keys} in this map. Since
+there are three labels in the factorial program, there are 
+three keys.
 
 \end{document}