\documentclass{article}\usepackage{../style}\usepackage{../langs}\usepackage{../graphics}\usepackage{../grammar}\usepackage{multicol}\begin{document}\section*{Handout 9 (Static Analysis)}If we want to improve the safety and security of our programs,we need a more principled approach to programming. Testing isgood, but as Dijkstra famously said: \begin{quote}\it ``Program testing can be a very effective way to show the\underline{\smash{presence}} of bugs, but it is hopelesslyinadequate for showing their \underline{\smash{absence}}.''\end{quote}\noindent While such a more principled approach has been thesubject of intense study for a long, long time, only in thepast few years some impressive results have been achieved. Oneis the complete formalisation and (mathematical) verificationof a microkernel operating system called seL4.\begin{center}\url{http://sel4.systems}\end{center}\noindent In 2011 this work was included in the MIT TechnologyReview in the annual list of the world’s ten most importantemergingtechnologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}While this work is impressive, its technical details are tooenormous for an explanation here. Therefore let us look atsomething much simpler, namely finding out properties aboutprograms using \emph{static analysis}.Static analysis is a technique that checks properties of aprogram without actually running the program. This shouldraise alarm bells with you---because almost all interestingproperties about programs are equivalent to the haltingproblem, which we know is undecidable. For example estimatingthe memory consumption of programs is in general undecidable,just like the halting problem. Static analysis circumventsthis undecidability-problem by essentially allowing answers\emph{yes} and \emph{no}, but also \emph{don't know}. Withthis ``trick'' even the halting problem becomesdecidable\ldots{}for example we could always say \emph{don'tknow}. Of course this would be silly. The point is that weshould be striving for a method that answers as often aspossible either \emph{yes} or \emph{no}---just in cases whenit is too difficult we fall back on the\emph{don't-know}-answer. This might sound all like abstractnonsense. Therefore let us look at a concrete example.\subsubsection*{A Simple, Idealised Programming Language}Our starting point is a small, idealised programming language.This language, amongst other things, contains variablesholding integers. We want to find out what the sign of theseintegers will be when the program runs. This sign-analysisseems like a very simple problem, but it will turn out evensuch simple problems, if approached naively, are in generalundecidable, just like Turing's halting problem. I let youthink why?Is sign-analysis of variables an interesting problem? Well,yes---if a compiler can find out that for example a variablewill never be negative and this variable is used as an indexfor an array, then the compiler does not need to generate codefor an underflow-test. Remember some languages are immune tobuffer-overflow attacks because they add underflow andoverflow checks everywhere. If the compiler can omit theunderflow test, for example, then this can potentiallydrastically speed up the generated code.What do programs in our programming language look like? The following grammar gives a first specification:\begin{multicols}{2}\begin{plstx}[rhs style=,one per line,left margin=9mm]: \meta{Stmt} ::= \meta{label} \texttt{:} | \meta{var} \texttt{:=} \meta{Exp} | \texttt{jmp?} \meta{Exp} \meta{label} | \texttt{goto} \meta{label}\\: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\\end{plstx}\columnbreak\begin{plstx}[rhs style=,one per line]: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} | \meta{Exp} \texttt{*} \meta{Exp} | \meta{Exp} \texttt{=} \meta{Exp} | \meta{num} | \meta{var}\\\end{plstx}\end{multicols}\noindent I assume you are familiar with suchgrammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}There are three main syntactic categories: \emph{statments}and \emph{expressions} as well as \emph{programs}, which aresequences of statements. Statements are either labels,variable assignments, conditional jumps (\pcode{jmp?}) andunconditional jumps (\pcode{goto}). Labels are just strings,which can be used as the target of a jump. The conditionaljumps and variable assignments involve (arithmetic)expressions. Expressions are either numbers, variables orcompound expressions built up from \pcode{+}, \pcode{*} and\emph{=} (for simplicity reasons we do not consider any otheroperations). We assume we have negative and positive numbers,\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},\pcode{2}\ldots{} An example program that calculates thefactorial of 5 is as follows:\begin{lstlisting}[language={},xleftmargin=10mm] a := 1 n := 5 top: jmp? n = 0 done a := a * n n := n + -1 goto top done:\end{lstlisting}\noindent Each line of the program contains a statement. Inthe first two lines we assign values to the variables\pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}is zero, in which case we jump to the end of the programmarked with the label \pcode{done}. If \pcode{n} is not zero,we multiply the content of \pcode{a} by \pcode{n}, decrease\pcode{n} by one and jump back to the beginning of the loop,marked with the label \pcode{top}.\begin{figure}[t]\begin{lstlisting}[numbers=none, language={},xleftmargin=10mm] n := 6 m1 := 0 m2 := 1loop: jmp? n = 0 done tmp := m2 m2 := m1 + m2 m1 := tmp n := n + -1 goto topdone:\end{lstlisting}\label{A mystery program in our idealised programming language.}\end{figure}Even if our language is rather small, it is still Turingcomplete---so rather powerful. However, discussing this morewould lead us to far astray.What would be missing in comparison with real(low-level machine) code? Well, the numbers we assume to bearbitrary precision, which is not the case in real code. Therebasic number formats have a rang and might over-run orunder-run from this range. Our assumption about variables,does not correspond to actual registers, which are only limited on real hardware. Obviously, real code has richeroperations than just addition, multiplication and equality.But this are not really essential limitations of our simpleexamples.\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: