handouts/ho09.tex
changeset 355 619073c37649
parent 354 8e5e84b14041
child 356 e1e0f78baa70
--- a/handouts/ho09.tex	Wed Dec 17 17:50:34 2014 +0000
+++ b/handouts/ho09.tex	Thu Dec 18 12:00:44 2014 +0000
@@ -4,7 +4,6 @@
 \usepackage{../graphics}
 \usepackage{../grammar}
 \usepackage{multicol}
-\usepackage{array}
 
 \begin{document}
 
@@ -67,7 +66,7 @@
 Using static analysis, we want to find out what the sign of
 these integers (positive or negative) will be when the program
 runs. This sign-analysis seems like a very simple problem. But
-teven such simple problems, if approached naively, are in
+even such simple problems, if approached naively, are in
 general undecidable, just like Turing's halting problem. I let
 you think why?
 
@@ -75,12 +74,15 @@
 Is sign-analysis of variables an interesting problem? Well,
 yes---if a compiler can find out that for example a variable
 will never be negative and this variable is used as an index
-for an array, then the comopiler does not need to generate code
+for an array, then the compiler does not need to generate code
 for an underflow-test. Remember some languages are immune to
 buffer-overflow attacks, but they need to add underflow and
 overflow checks everywhere. If the compiler can omit the
 underflow test, for example, then this can potentially
-drastically speed up the generated code.
+drastically speed up the generated code. According to the John
+Regehr, an expert in the field of compilers, overflow checks
+can cause 5-10\% slowdown, and in some languages even 100\%
+for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}}  
 
 What do programs in our simple programming language look like?
 The following grammar gives a first specification:
@@ -110,9 +112,9 @@
 variable assignments, conditional jumps (\pcode{jmp?}) and
 unconditional jumps (\pcode{goto}). Labels are just strings,
 which can be used as the target of a jump. We assume that in
-every program the labels are unique---if there is a clash we
-do not know where to jump to. The conditional jumps and
-variable assignments involve (arithmetic) expressions.
+every program the labels are unique---if there is a clash,
+then we do not know where to jump to. The conditional jumps
+and variable assignments involve (arithmetic) expressions.
 Expressions are either numbers, variables or compound
 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
 (for simplicity reasons we do not consider any other
@@ -183,7 +185,7 @@
 variables in our programs is potentially unlimited, while
 memory in an actual computer, of course, is always limited
 somehow on any actual. To sum up, our language might look
-ridiculously simple, but it is not much removed from
+ridiculously simple, but it is not far removed from
 practically relevant issues.
 
 
@@ -207,16 +209,7 @@
 condition is \pcode{true}. However, since we have numbers in
 our language anyway, why not just encoding \pcode{true} as
 one, and \pcode{false} as zero? In this way we can dispense
-with the additional concept of Booleans, {\bf but also we could
-replace the jump above by
-
-\begin{center}
-\code{jmp? 1 + (n + -n) done}
-\end{center}
-
-\noindent which behaves exactly the same. But what does it
-actually mean that two jumps behave the same? Or two programs
-for that matter?}
+with the additional concept of Booleans.
 
 I hope the above discussion makes it already clear we need to
 be a bit more careful with our programs. Below we shall
@@ -226,30 +219,31 @@
 programs. By good programs I mean where all variables are
 initialised, for example. Our interpreter will just crash if
 it cannot find out the value for a variable, in case it is not
-initialised. Also we will assume that labels in good programs
+initialised. Also, we will assume that labels in good programs
 are unique, otherwise our programs will calculate ``garbage''.
 
 First we will pre-process our programs. This will simplify the
 definition of our interpreter later on. We will transform
-programs into \emph{snippets}. A snippet is a label and all
-code that comes after the label. This essentially means a
-snippet is a \emph{map} from labels to code. Given that
-programs are sequences (or lists) of statements, we can easily
-calculate the snippets by just recursively generating the map.
+programs into \emph{snippets}. Their purpose is to simplify
+the definition of what the interpreter should do in case of a
+jump. A snippet is a label and all code that comes after the
+label. This essentially means a snippet is a \emph{map} from
+labels to code. Given that programs are sequences (or lists)
+of statements, we can easily calculate the snippets by just
+traversing this sequence and recursively generating the map.
 Suppose a program is of the general form
 
 \begin{center}
 $stmt_1\;stmt_2\; \ldots\; stmt_n$
 \end{center}
 
-\noindent In order to calculate the snippets of such a
-program, we have to go through the statements one by one and
-check whether they are a label. If yes, we add the label and
-the remaining statements to our map. If no we just continue
-with the next statement. To come up with a recursive
-definition for generating snippets, let us write $[]$ for the
-program that does not contain any statement. Consider the
-following definition:
+\noindent The idea is to go through this sequence of
+statements one by one and check whether they are a label. If
+yes, we add the label and the remaining statements to our map.
+If no, we just continue with the next statement. To come up
+with a recursive definition for generating snippets, let us
+write $[]$ for the program that does not contain any
+statement. Consider the following definition:
 
 \begin{center}
 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
@@ -263,26 +257,26 @@
 \end{center}
 
 \noindent In the first clause we just return the empty map for
-the program that does not contain any statements. In the
-second clause, we have to distinguish the case where the first
-statement is a label or not. If not, then we just ``throw
-away'' the label and recursively calculate the snippets for
-the rest of the program. If yes, then we do the same, but also
-update the map so that $label$ now points to the rest of the
-statements. There is one small problem we need to overcome:
-our two programs have no label as entry point---that is where
-the execution starts. We always assume that the first
-statement will be run first. For us it seems convenient if
-we add to all our programs a default label, say 
-\pcode{""} (the empty string). With this we can define
-our preprocessing of programs as follows
+the program that does not contain any statement. In the second
+clause, we have to distinguish the case where the first
+statement is a label or not. As said before, if not, then we
+just ``throw away'' the label and recursively calculate the
+snippets for the rest of the program. If yes, then we do the
+same, but also update the map so that $label$ now points to
+the rest of the statements. There is one small problem we need
+to overcome: our two programs have no label as \emph{entry
+point}---that is where the execution starts. We usually assume
+that the first statement will be run first. To make this the
+default, it is convenient if we add to all our programs a
+default label, say \pcode{""} (the empty string). With this we
+can define our pre-processing of programs as follows
 
 \begin{center}
 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
 \end{center} 
 
 \noindent Let us see how this pans out in practice. If we
-preprocess the factorial program shown earlier we obtain the 
+pre-process the factorial program shown earlier, we obtain the 
 following map:
 
 \begin{center}
@@ -291,6 +285,7 @@
         language={},
         xleftmargin=0mm,
         aboveskip=0.5mm,
+        belowskip=0.5mm,
         frame=single,
         framerule=0mm,
         framesep=0mm}
@@ -329,8 +324,28 @@
 \end{center}
 
 \noindent I highlighted the \emph{keys} in this map. Since
-there are three labels in the factorial program, there are 
-three keys.
+there are three labels in the factorial program, there are
+three keys. When running the factorial program and
+encountering a jump, then we only have to consult this map, in
+order to find out what the next instruction should be.
+
+We should now be in the position to define how a program
+should be run. This ``running'' of programs is often called 
+\emph{evaluation}. Let us start with the definition for
+how expressions are evaluated. A first attempt is the 
+following recursive function:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+   \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+   \textit{snippets}(rest)                & \text{otherwise}
+\end{cases}$   
+\end{tabular}
+\end{center}
+
 
 \end{document}