# HG changeset patch # User Christian Urban # Date 1418838634 0 # Node ID 8e5e84b1404100274b6064125d53dd42b42ae999 # Parent 605f815986dd6ad9ef9b6bba3ebe325c7e6f707a updated diff -r 605f815986dd -r 8e5e84b14041 handouts/ho08.pdf Binary file handouts/ho08.pdf has changed diff -r 605f815986dd -r 8e5e84b14041 handouts/ho09.pdf Binary file handouts/ho09.pdf has changed diff -r 605f815986dd -r 8e5e84b14041 handouts/ho09.tex --- a/handouts/ho09.tex Fri Dec 12 15:43:55 2014 +0000 +++ b/handouts/ho09.tex Wed Dec 17 17:50:34 2014 +0000 @@ -4,6 +4,7 @@ \usepackage{../graphics} \usepackage{../grammar} \usepackage{multicol} +\usepackage{array} \begin{document} @@ -66,23 +67,23 @@ Using static analysis, we want to find out what the sign of these integers (positive or negative) will be when the program runs. This sign-analysis seems like a very simple problem. But -it will turn out even such simple problems, if approached -naively, are in general undecidable, just like Turing's -halting problem. I let you think why? +teven such simple problems, if approached naively, are in +general undecidable, just like Turing's halting problem. I let +you think why? Is sign-analysis of variables an interesting problem? Well, yes---if a compiler can find out that for example a variable will never be negative and this variable is used as an index -for an array, then the compiler does not need to generate code +for an array, then the comopiler does not need to generate code for an underflow-test. Remember some languages are immune to buffer-overflow attacks, but they need to add underflow and overflow checks everywhere. If the compiler can omit the underflow test, for example, then this can potentially drastically speed up the generated code. -What do programs in our programming language look like? The -following grammar gives a first specification: +What do programs in our simple programming language look like? +The following grammar gives a first specification: \begin{multicols}{2} \begin{plstx}[rhs style=,one per line,left margin=9mm] @@ -109,16 +110,16 @@ variable assignments, conditional jumps (\pcode{jmp?}) and unconditional jumps (\pcode{goto}). Labels are just strings, which can be used as the target of a jump. We assume that in -every program the labels are unique---otherwise if there is a -clash we do not know where to jump to. The conditional jumps -and variable assignments involve (arithmetic) expressions. +every program the labels are unique---if there is a clash we +do not know where to jump to. The conditional jumps and +variable assignments involve (arithmetic) expressions. Expressions are either numbers, variables or compound expressions built up from \pcode{+}, \pcode{*} and \emph{=} (for simplicity reasons we do not consider any other operations). We assume we have negative and positive numbers, \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, \pcode{2}\ldots{} An example program that calculates the -factorial of 5 is as follows: +factorial of 5 is in oure programming language as follows: \begin{lstlisting}[language={},xleftmargin=10mm] a := 1 @@ -131,16 +132,16 @@ done: \end{lstlisting} -\noindent Each line of the program contains a statement. In -the first two lines we assign values to the variables -\pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n} -is zero, in which case we jump to the end of the program -marked with the label \pcode{done}. If \pcode{n} is not zero, -we multiply the content of \pcode{a} by \pcode{n}, decrease -\pcode{n} by one and jump back to the beginning of the loop, -marked with the label \pcode{top}. Another program in our -language is shown in Figure~\ref{fib}. I let you think what it -calculates. +\noindent As can be seen each line of the program contains a +statement. In the first two lines we assign values to the +variables \pcode{a} and \pcode{n}. In line 4 we test whether +\pcode{n} is zero, in which case we jump to the end of the +program marked with the label \pcode{done}. If \pcode{n} is +not zero, we multiply the content of \pcode{a} by \pcode{n}, +decrease \pcode{n} by one and jump back to the beginning of +the loop, marked with the label \pcode{top}. Another program +in our language is shown in Figure~\ref{fib}. I let you think +what it calculates. \begin{figure}[t] \begin{lstlisting}[numbers=none, @@ -165,8 +166,8 @@ complete---meaning quite powerful. However, discussing this fact in more detail would lead us too far astray. Clearly, our programming is rather low-level and not very comfortable for -writing programs. It is inspired by machine code, which is the -code that is actually executed by a CPU. So a more interesting +writing programs. It is inspired by real machine code, which +is the code that is executed by a CPU. So a more interesting question is what is missing in comparison with real machine code? Well, not much\ldots{}in principle. Real machine code, of course, contains many more arithmetic instructions (not @@ -181,9 +182,9 @@ over-flow or under-flow from this range. Also the number of variables in our programs is potentially unlimited, while memory in an actual computer, of course, is always limited -somehow on any actual. To sum up, our language might look very -simple, but it is not completely removed from practically -relevant issues. +somehow on any actual. To sum up, our language might look +ridiculously simple, but it is not much removed from +practically relevant issues. \subsubsection*{An Interpreter} @@ -192,9 +193,10 @@ names for variables you allow; what programs should look like; most importantly you can decide what each part of the program should mean and do. While our language is rather simple and -the meaning is rather straightforward, there are still places -where we need to make a real choice. For example with -conditional jumps, say the one in the factorial program: +the meaning of statements, for example, is rather +straightforward, there are still places where we need to make +real choices. For example consider the conditional jumps, say +the one in the factorial program: \begin{center} \code{jmp? n = 0 done} @@ -203,30 +205,132 @@ \noindent How should they work? We could introduce Booleans (\pcode{true} and \pcode{false}) and then jump only when the condition is \pcode{true}. However, since we have numbers in -our language anyway, why not just encoding \emph{true} as -zero, and \pcode{false} as anything else? In this way we can -dispense with the additional concept of Booleans, but also we -could replace the jump above by +our language anyway, why not just encoding \pcode{true} as +one, and \pcode{false} as zero? In this way we can dispense +with the additional concept of Booleans, {\bf but also we could +replace the jump above by \begin{center} -\code{jmp? n done} +\code{jmp? 1 + (n + -n) done} \end{center} \noindent which behaves exactly the same. But what does it -mean that two jumps behave the same? +actually mean that two jumps behave the same? Or two programs +for that matter?} I hope the above discussion makes it already clear we need to be a bit more careful with our programs. Below we shall -describe an interpreter for our programs, which specifies -exactly how programs are supposed to be run\ldots{}at least we -will specify this for all \emph{good} programs. By good -programs we mean where for example all variables are -initialised. Our interpreter will just crash if it cannot find -out the value for a variable, because it is not initialised. +describe an interpreter for our programming language, which +specifies exactly how programs are supposed to be +run\ldots{}at least we will specify this for all \emph{good} +programs. By good programs I mean where all variables are +initialised, for example. Our interpreter will just crash if +it cannot find out the value for a variable, in case it is not +initialised. Also we will assume that labels in good programs +are unique, otherwise our programs will calculate ``garbage''. + +First we will pre-process our programs. This will simplify the +definition of our interpreter later on. We will transform +programs into \emph{snippets}. A snippet is a label and all +code that comes after the label. This essentially means a +snippet is a \emph{map} from labels to code. Given that +programs are sequences (or lists) of statements, we can easily +calculate the snippets by just recursively generating the map. +Suppose a program is of the general form + +\begin{center} +$stmt_1\;stmt_2\; \ldots\; stmt_n$ +\end{center} + +\noindent In order to calculate the snippets of such a +program, we have to go through the statements one by one and +check whether they are a label. If yes, we add the label and +the remaining statements to our map. If no we just continue +with the next statement. To come up with a recursive +definition for generating snippets, let us write $[]$ for the +program that does not contain any statement. Consider the +following definition: + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} +$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ +$\textit{snippets}(stmt\;\; rest)$ & $\dn$ & +$\begin{cases} + \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ + \textit{snippets}(rest) & \text{otherwise} +\end{cases}$ +\end{tabular} +\end{center} -First we will pre-process our programs. This will simplify -our definition of our interpreter later on. We will transform -programs into \emph{snippets}. +\noindent In the first clause we just return the empty map for +the program that does not contain any statements. In the +second clause, we have to distinguish the case where the first +statement is a label or not. If not, then we just ``throw +away'' the label and recursively calculate the snippets for +the rest of the program. If yes, then we do the same, but also +update the map so that $label$ now points to the rest of the +statements. There is one small problem we need to overcome: +our two programs have no label as entry point---that is where +the execution starts. We always assume that the first +statement will be run first. For us it seems convenient if +we add to all our programs a default label, say +\pcode{""} (the empty string). With this we can define +our preprocessing of programs as follows + +\begin{center} +$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ +\end{center} + +\noindent Let us see how this pans out in practice. If we +preprocess the factorial program shown earlier we obtain the +following map: + +\begin{center} +\small +\lstset{numbers=none, + language={}, + xleftmargin=0mm, + aboveskip=0.5mm, + frame=single, + framerule=0mm, + framesep=0mm} +\begin{tikzpicture}[node distance=0mm] +\node (A1) [draw]{\pcode{""}}; +\node (B1) [right=of A1] {$\mapsto$}; +\node (C1) [right=of B1,anchor=north west] { +\begin{minipage}{3.5cm} +\begin{lstlisting}[language={},xleftmargin=0mm] + a := 1 + n := 5 +top: + jmp? n = 0 done + a := a * n + n := n + -1 + goto top +done: +\end{lstlisting} +\end{minipage}}; +\node (A2) [right=of C1.north east,draw] {\pcode{top}}; +\node (B2) [right=of A2] {$\mapsto$}; +\node (C2) [right=of B2, anchor=north west] { +\begin{minipage}{3.5cm} +\begin{lstlisting}[language={},xleftmargin=0mm] + jmp? n = 0 done + a := a * n + n := n + -1 + goto top +done: +\end{lstlisting} +\end{minipage}}; +\node (A3) [right=of C2.north east,draw] {\pcode{done}}; +\node (B3) [right=of A3] {$\mapsto$}; +\node (C3) [right=of B3] {$[]$}; +\end{tikzpicture} +\end{center} + +\noindent I highlighted the \emph{keys} in this map. Since +there are three labels in the factorial program, there are +three keys. \end{document}