diff -r 8e5e84b14041 -r 619073c37649 handouts/ho09.tex --- a/handouts/ho09.tex Wed Dec 17 17:50:34 2014 +0000 +++ b/handouts/ho09.tex Thu Dec 18 12:00:44 2014 +0000 @@ -4,7 +4,6 @@ \usepackage{../graphics} \usepackage{../grammar} \usepackage{multicol} -\usepackage{array} \begin{document} @@ -67,7 +66,7 @@ Using static analysis, we want to find out what the sign of these integers (positive or negative) will be when the program runs. This sign-analysis seems like a very simple problem. But -teven such simple problems, if approached naively, are in +even such simple problems, if approached naively, are in general undecidable, just like Turing's halting problem. I let you think why? @@ -75,12 +74,15 @@ Is sign-analysis of variables an interesting problem? Well, yes---if a compiler can find out that for example a variable will never be negative and this variable is used as an index -for an array, then the comopiler does not need to generate code +for an array, then the compiler does not need to generate code for an underflow-test. Remember some languages are immune to buffer-overflow attacks, but they need to add underflow and overflow checks everywhere. If the compiler can omit the underflow test, for example, then this can potentially -drastically speed up the generated code. +drastically speed up the generated code. According to the John +Regehr, an expert in the field of compilers, overflow checks +can cause 5-10\% slowdown, and in some languages even 100\% +for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}} What do programs in our simple programming language look like? The following grammar gives a first specification: @@ -110,9 +112,9 @@ variable assignments, conditional jumps (\pcode{jmp?}) and unconditional jumps (\pcode{goto}). Labels are just strings, which can be used as the target of a jump. We assume that in -every program the labels are unique---if there is a clash we -do not know where to jump to. The conditional jumps and -variable assignments involve (arithmetic) expressions. +every program the labels are unique---if there is a clash, +then we do not know where to jump to. The conditional jumps +and variable assignments involve (arithmetic) expressions. Expressions are either numbers, variables or compound expressions built up from \pcode{+}, \pcode{*} and \emph{=} (for simplicity reasons we do not consider any other @@ -183,7 +185,7 @@ variables in our programs is potentially unlimited, while memory in an actual computer, of course, is always limited somehow on any actual. To sum up, our language might look -ridiculously simple, but it is not much removed from +ridiculously simple, but it is not far removed from practically relevant issues. @@ -207,16 +209,7 @@ condition is \pcode{true}. However, since we have numbers in our language anyway, why not just encoding \pcode{true} as one, and \pcode{false} as zero? In this way we can dispense -with the additional concept of Booleans, {\bf but also we could -replace the jump above by - -\begin{center} -\code{jmp? 1 + (n + -n) done} -\end{center} - -\noindent which behaves exactly the same. But what does it -actually mean that two jumps behave the same? Or two programs -for that matter?} +with the additional concept of Booleans. I hope the above discussion makes it already clear we need to be a bit more careful with our programs. Below we shall @@ -226,30 +219,31 @@ programs. By good programs I mean where all variables are initialised, for example. Our interpreter will just crash if it cannot find out the value for a variable, in case it is not -initialised. Also we will assume that labels in good programs +initialised. Also, we will assume that labels in good programs are unique, otherwise our programs will calculate ``garbage''. First we will pre-process our programs. This will simplify the definition of our interpreter later on. We will transform -programs into \emph{snippets}. A snippet is a label and all -code that comes after the label. This essentially means a -snippet is a \emph{map} from labels to code. Given that -programs are sequences (or lists) of statements, we can easily -calculate the snippets by just recursively generating the map. +programs into \emph{snippets}. Their purpose is to simplify +the definition of what the interpreter should do in case of a +jump. A snippet is a label and all code that comes after the +label. This essentially means a snippet is a \emph{map} from +labels to code. Given that programs are sequences (or lists) +of statements, we can easily calculate the snippets by just +traversing this sequence and recursively generating the map. Suppose a program is of the general form \begin{center} $stmt_1\;stmt_2\; \ldots\; stmt_n$ \end{center} -\noindent In order to calculate the snippets of such a -program, we have to go through the statements one by one and -check whether they are a label. If yes, we add the label and -the remaining statements to our map. If no we just continue -with the next statement. To come up with a recursive -definition for generating snippets, let us write $[]$ for the -program that does not contain any statement. Consider the -following definition: +\noindent The idea is to go through this sequence of +statements one by one and check whether they are a label. If +yes, we add the label and the remaining statements to our map. +If no, we just continue with the next statement. To come up +with a recursive definition for generating snippets, let us +write $[]$ for the program that does not contain any +statement. Consider the following definition: \begin{center} \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} @@ -263,26 +257,26 @@ \end{center} \noindent In the first clause we just return the empty map for -the program that does not contain any statements. In the -second clause, we have to distinguish the case where the first -statement is a label or not. If not, then we just ``throw -away'' the label and recursively calculate the snippets for -the rest of the program. If yes, then we do the same, but also -update the map so that $label$ now points to the rest of the -statements. There is one small problem we need to overcome: -our two programs have no label as entry point---that is where -the execution starts. We always assume that the first -statement will be run first. For us it seems convenient if -we add to all our programs a default label, say -\pcode{""} (the empty string). With this we can define -our preprocessing of programs as follows +the program that does not contain any statement. In the second +clause, we have to distinguish the case where the first +statement is a label or not. As said before, if not, then we +just ``throw away'' the label and recursively calculate the +snippets for the rest of the program. If yes, then we do the +same, but also update the map so that $label$ now points to +the rest of the statements. There is one small problem we need +to overcome: our two programs have no label as \emph{entry +point}---that is where the execution starts. We usually assume +that the first statement will be run first. To make this the +default, it is convenient if we add to all our programs a +default label, say \pcode{""} (the empty string). With this we +can define our pre-processing of programs as follows \begin{center} $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ \end{center} \noindent Let us see how this pans out in practice. If we -preprocess the factorial program shown earlier we obtain the +pre-process the factorial program shown earlier, we obtain the following map: \begin{center} @@ -291,6 +285,7 @@ language={}, xleftmargin=0mm, aboveskip=0.5mm, + belowskip=0.5mm, frame=single, framerule=0mm, framesep=0mm} @@ -329,8 +324,28 @@ \end{center} \noindent I highlighted the \emph{keys} in this map. Since -there are three labels in the factorial program, there are -three keys. +there are three labels in the factorial program, there are +three keys. When running the factorial program and +encountering a jump, then we only have to consult this map, in +order to find out what the next instruction should be. + +We should now be in the position to define how a program +should be run. This ``running'' of programs is often called +\emph{evaluation}. Let us start with the definition for +how expressions are evaluated. A first attempt is the +following recursive function: + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} +$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ +$\textit{snippets}(stmt\;\; rest)$ & $\dn$ & +$\begin{cases} + \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ + \textit{snippets}(rest) & \text{otherwise} +\end{cases}$ +\end{tabular} +\end{center} + \end{document}