diff -r 9f4f626cefea -r 54d6fc856950 handouts/ho09.tex --- a/handouts/ho09.tex Tue Dec 09 12:09:33 2014 +0000 +++ b/handouts/ho09.tex Wed Dec 10 23:50:35 2014 +0000 @@ -29,7 +29,7 @@ \url{http://sel4.systems} \end{center} -\noindent This work was in 2011 included in the MIT Technology +\noindent In 2011 this work was included in the MIT Technology Review in the annual list of the world’s ten most important emerging technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} @@ -38,7 +38,7 @@ something much simpler, namely finding out properties about programs using \emph{static analysis}. -Static analysis is one technique that checks properties of a +Static analysis is a technique that checks properties of a program without actually running the program. This should raise alarm bells with you---because almost all interesting properties about programs are equivalent to the halting @@ -60,12 +60,13 @@ \subsubsection*{A Simple, Idealised Programming Language} Our starting point is a small, idealised programming language. -This language contains variables holding integers. We want to -find out what the sign of these integers will be when the -program runs. This seems like a very simple problem, but it -will turn out even such a simple analysis if approached -naively is in general undecidable, just like Turing's halting -problem. I let you think why? +This language, amongst other things, contains variables +holding integers. We want to find out what the sign of these +integers will be when the program runs. This sign-analysis +seems like a very simple problem, but it will turn out even +such simple problems, if approached naively, are in general +undecidable, just like Turing's halting problem. I let you +think why? Is sign-analysis of variables an interesting problem? Well, @@ -73,46 +74,76 @@ will never be negative and this variable is used as an index for an array, then the compiler does not need to generate code for an underflow-test. Remember some languages are immune to -buffer-overflow attacks because they add bound checks -everywhere. This could potentially drastically speed up the -generated code. +buffer-overflow attacks because they add underflow and +overflow checks everywhere. If the compiler can omit the +underflow test, for example, then this can potentially +drastically speed up the generated code. -Since we want to +What do programs in our programming language look like? The +following grammar gives a first specification: \begin{multicols}{2} -\begin{plstx}[rhs style=,one per line,left margin=9mm] : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} - | \meta{Exp} \texttt{*} \meta{Exp} - | \meta{Exp} \texttt{=} \meta{Exp} - | \meta{num} - | \meta{var}\\ -\end{plstx} -\columnbreak -\begin{plstx}[rhs style=,one per line] +\begin{plstx}[rhs style=,one per line,left margin=9mm] : \meta{Stmt} ::= \meta{label} \texttt{:} | \meta{var} \texttt{:=} \meta{Exp} | \texttt{jmp?} \meta{Exp} \meta{label} | \texttt{goto} \meta{label}\\ : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ \end{plstx} +\columnbreak +\begin{plstx}[rhs style=,one per line] : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} + | \meta{Exp} \texttt{*} \meta{Exp} + | \meta{Exp} \texttt{=} \meta{Exp} + | \meta{num} + | \meta{var}\\ +\end{plstx} \end{multicols} -\begin{lstlisting}[numbers=none, - language={},xleftmargin=10mm] +\noindent I assume you are familiar with such +grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}} +There are three main syntactic categories: \emph{statments} +and \emph{expressions} as well as \emph{programs}, which are +sequences of statements. Statements are either labels, +variable assignments, conditional jumps (\pcode{jmp?}) and +unconditional jumps (\pcode{goto}). Labels are just strings, +which can be used as the target of a jump. The conditional +jumps and variable assignments involve (arithmetic) +expressions. Expressions are either numbers, variables or +compound expressions built up from \pcode{+}, \pcode{*} and +\emph{=} (for simplicity reasons we do not consider any other +operations). We assume we have negative and positive numbers, +\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, +\pcode{2}\ldots{} An example program that calculates the +factorial of 5 is as follows: + +\begin{lstlisting}[language={},xleftmargin=10mm] a := 1 n := 5 -top: jmp? n = 0 done +top: + jmp? n = 0 done a := a * n n := n + -1 goto top done: \end{lstlisting} +\noindent Each line of the program contains a statement. In +the first two lines we assign values to the variables +\pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n} +is zero, in which case we jump to the end of the program +marked with the label \pcode{done}. If \pcode{n} is not zero, +we multiply the content of \pcode{a} by \pcode{n}, decrease +\pcode{n} by one and jump back to the beginning of the loop, +marked with the label \pcode{top}. + +\begin{figure}[t] \begin{lstlisting}[numbers=none, language={},xleftmargin=10mm] n := 6 m1 := 0 m2 := 1 -loop: jmp? n = 0 done +loop: + jmp? n = 0 done tmp := m2 m2 := m1 + m2 m1 := tmp @@ -120,10 +151,14 @@ goto top done: \end{lstlisting} +\label{A mystery program in our idealised programming language.} +\end{figure} -\bigskip +Even if our language is rather small, it is still Turing +complete---so rather powerful. However, discussing this more +would lead us to far astray. -\noindent What would be missing in comparison with real +What would be missing in comparison with real (low-level machine) code? Well, the numbers we assume to be arbitrary precision, which is not the case in real code. There basic number formats have a rang and might over-run or