--- a/handouts/ho09.tex Wed Dec 17 17:50:34 2014 +0000
+++ b/handouts/ho09.tex Thu Dec 18 12:00:44 2014 +0000
@@ -4,7 +4,6 @@
\usepackage{../graphics}
\usepackage{../grammar}
\usepackage{multicol}
-\usepackage{array}
\begin{document}
@@ -67,7 +66,7 @@
Using static analysis, we want to find out what the sign of
these integers (positive or negative) will be when the program
runs. This sign-analysis seems like a very simple problem. But
-teven such simple problems, if approached naively, are in
+even such simple problems, if approached naively, are in
general undecidable, just like Turing's halting problem. I let
you think why?
@@ -75,12 +74,15 @@
Is sign-analysis of variables an interesting problem? Well,
yes---if a compiler can find out that for example a variable
will never be negative and this variable is used as an index
-for an array, then the comopiler does not need to generate code
+for an array, then the compiler does not need to generate code
for an underflow-test. Remember some languages are immune to
buffer-overflow attacks, but they need to add underflow and
overflow checks everywhere. If the compiler can omit the
underflow test, for example, then this can potentially
-drastically speed up the generated code.
+drastically speed up the generated code. According to the John
+Regehr, an expert in the field of compilers, overflow checks
+can cause 5-10\% slowdown, and in some languages even 100\%
+for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}}
What do programs in our simple programming language look like?
The following grammar gives a first specification:
@@ -110,9 +112,9 @@
variable assignments, conditional jumps (\pcode{jmp?}) and
unconditional jumps (\pcode{goto}). Labels are just strings,
which can be used as the target of a jump. We assume that in
-every program the labels are unique---if there is a clash we
-do not know where to jump to. The conditional jumps and
-variable assignments involve (arithmetic) expressions.
+every program the labels are unique---if there is a clash,
+then we do not know where to jump to. The conditional jumps
+and variable assignments involve (arithmetic) expressions.
Expressions are either numbers, variables or compound
expressions built up from \pcode{+}, \pcode{*} and \emph{=}
(for simplicity reasons we do not consider any other
@@ -183,7 +185,7 @@
variables in our programs is potentially unlimited, while
memory in an actual computer, of course, is always limited
somehow on any actual. To sum up, our language might look
-ridiculously simple, but it is not much removed from
+ridiculously simple, but it is not far removed from
practically relevant issues.
@@ -207,16 +209,7 @@
condition is \pcode{true}. However, since we have numbers in
our language anyway, why not just encoding \pcode{true} as
one, and \pcode{false} as zero? In this way we can dispense
-with the additional concept of Booleans, {\bf but also we could
-replace the jump above by
-
-\begin{center}
-\code{jmp? 1 + (n + -n) done}
-\end{center}
-
-\noindent which behaves exactly the same. But what does it
-actually mean that two jumps behave the same? Or two programs
-for that matter?}
+with the additional concept of Booleans.
I hope the above discussion makes it already clear we need to
be a bit more careful with our programs. Below we shall
@@ -226,30 +219,31 @@
programs. By good programs I mean where all variables are
initialised, for example. Our interpreter will just crash if
it cannot find out the value for a variable, in case it is not
-initialised. Also we will assume that labels in good programs
+initialised. Also, we will assume that labels in good programs
are unique, otherwise our programs will calculate ``garbage''.
First we will pre-process our programs. This will simplify the
definition of our interpreter later on. We will transform
-programs into \emph{snippets}. A snippet is a label and all
-code that comes after the label. This essentially means a
-snippet is a \emph{map} from labels to code. Given that
-programs are sequences (or lists) of statements, we can easily
-calculate the snippets by just recursively generating the map.
+programs into \emph{snippets}. Their purpose is to simplify
+the definition of what the interpreter should do in case of a
+jump. A snippet is a label and all code that comes after the
+label. This essentially means a snippet is a \emph{map} from
+labels to code. Given that programs are sequences (or lists)
+of statements, we can easily calculate the snippets by just
+traversing this sequence and recursively generating the map.
Suppose a program is of the general form
\begin{center}
$stmt_1\;stmt_2\; \ldots\; stmt_n$
\end{center}
-\noindent In order to calculate the snippets of such a
-program, we have to go through the statements one by one and
-check whether they are a label. If yes, we add the label and
-the remaining statements to our map. If no we just continue
-with the next statement. To come up with a recursive
-definition for generating snippets, let us write $[]$ for the
-program that does not contain any statement. Consider the
-following definition:
+\noindent The idea is to go through this sequence of
+statements one by one and check whether they are a label. If
+yes, we add the label and the remaining statements to our map.
+If no, we just continue with the next statement. To come up
+with a recursive definition for generating snippets, let us
+write $[]$ for the program that does not contain any
+statement. Consider the following definition:
\begin{center}
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
@@ -263,26 +257,26 @@
\end{center}
\noindent In the first clause we just return the empty map for
-the program that does not contain any statements. In the
-second clause, we have to distinguish the case where the first
-statement is a label or not. If not, then we just ``throw
-away'' the label and recursively calculate the snippets for
-the rest of the program. If yes, then we do the same, but also
-update the map so that $label$ now points to the rest of the
-statements. There is one small problem we need to overcome:
-our two programs have no label as entry point---that is where
-the execution starts. We always assume that the first
-statement will be run first. For us it seems convenient if
-we add to all our programs a default label, say
-\pcode{""} (the empty string). With this we can define
-our preprocessing of programs as follows
+the program that does not contain any statement. In the second
+clause, we have to distinguish the case where the first
+statement is a label or not. As said before, if not, then we
+just ``throw away'' the label and recursively calculate the
+snippets for the rest of the program. If yes, then we do the
+same, but also update the map so that $label$ now points to
+the rest of the statements. There is one small problem we need
+to overcome: our two programs have no label as \emph{entry
+point}---that is where the execution starts. We usually assume
+that the first statement will be run first. To make this the
+default, it is convenient if we add to all our programs a
+default label, say \pcode{""} (the empty string). With this we
+can define our pre-processing of programs as follows
\begin{center}
$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$
\end{center}
\noindent Let us see how this pans out in practice. If we
-preprocess the factorial program shown earlier we obtain the
+pre-process the factorial program shown earlier, we obtain the
following map:
\begin{center}
@@ -291,6 +285,7 @@
language={},
xleftmargin=0mm,
aboveskip=0.5mm,
+ belowskip=0.5mm,
frame=single,
framerule=0mm,
framesep=0mm}
@@ -329,8 +324,28 @@
\end{center}
\noindent I highlighted the \emph{keys} in this map. Since
-there are three labels in the factorial program, there are
-three keys.
+there are three labels in the factorial program, there are
+three keys. When running the factorial program and
+encountering a jump, then we only have to consult this map, in
+order to find out what the next instruction should be.
+
+We should now be in the position to define how a program
+should be run. This ``running'' of programs is often called
+\emph{evaluation}. Let us start with the definition for
+how expressions are evaluated. A first attempt is the
+following recursive function:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+ \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+ \textit{snippets}(rest) & \text{otherwise}
+\end{cases}$
+\end{tabular}
+\end{center}
+
\end{document}