handouts/ho09.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Mon, 08 Dec 2014 11:14:33 +0000
changeset 347 efad8155513f
parent 346 5a6e8b7d20f7
child 350 54d6fc856950
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../graphics}
\usepackage{../grammar}
\usepackage{multicol}

\begin{document}

\section*{Handout 9 (Static Analysis)}

If we want to improve the safety and security of our programs,
we need a more principled approach to programming. Testing is
good, but as Dijkstra famously said: 

\begin{quote}\it 
``Program testing can be a very effective way to show the
\underline{\smash{presence}} of bugs, but it is hopelessly
inadequate for showing their \underline{\smash{absence}}.''
\end{quote}

\noindent While such a more principled approach has been the
subject of intense study for a long, long time, only in the
past few years some impressive results have been achieved. One
is the complete formalisation and (mathematical) verification
of a microkernel operating system called seL4.

\begin{center}
\url{http://sel4.systems}
\end{center}

\noindent This work was in 2011 included in the MIT Technology
Review in the annual list of the world’s ten most important
emerging
technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
While this work is impressive, its technical details are too
enormous for an explanation here. Therefore let us look at
something much simpler, namely finding out properties about
programs using \emph{static analysis}.

Static analysis is one technique that checks properties of a
program without actually running the program. This should
raise alarm bells with you---because almost all interesting
properties about programs are equivalent to the halting
problem, which we know is undecidable. For example estimating
the memory consumption of programs is in general undecidable,
just like the halting problem. Static analysis circumvents
this undecidability-problem by essentially allowing answers
\emph{yes} and \emph{no}, but also \emph{don't know}. With
this ``trick'' even the halting problem becomes
decidable\ldots{}for example we could always say \emph{don't
know}. Of course this would be silly. The point is that we
should be striving for a method that answers as often as
possible either \emph{yes} or \emph{no}---just in cases when
it is too difficult we fall back on the
\emph{don't-know}-answer. This might sound all like abstract
nonsense. Therefore let us look at a concrete example.


\subsubsection*{A Simple, Idealised Programming Language}

Our starting point is a small, idealised programming language.
This language contains variables holding integers. We want to
find out what the sign of these integers will be when the
program runs. This seems like a very simple problem, but it
will turn out even such a simple analysis if approached
naively is in general undecidable, just like Turing's halting
problem. I let you think why?


Is sign-analysis of variables an interesting problem? Well,
yes---if a compiler can find out that for example a variable
will never be negative and this variable is used as an index
for an array, then the compiler does not need to generate code
for an underflow-test. Remember some languages are immune to
buffer-overflow attacks because they add bound checks
everywhere. This could potentially drastically speed up the
generated code.

Since we want to 

\begin{multicols}{2}
\begin{plstx}[rhs style=,one per line,left margin=9mm]
: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
               | \meta{Exp} \texttt{*} \meta{Exp}
               | \meta{Exp} \texttt{=} \meta{Exp} 
               | \meta{num}
               | \meta{var}\\
\end{plstx}
\columnbreak
\begin{plstx}[rhs style=,one per line]
: \meta{Stmt} ::= \meta{label} \texttt{:}
                | \meta{var} \texttt{:=} \meta{Exp}
                | \texttt{jmp?} \meta{Exp} \meta{label}
                | \texttt{goto} \meta{label}\\
: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
\end{plstx}
\end{multicols}

\begin{lstlisting}[numbers=none,
                   language={},xleftmargin=10mm]
      a := 1
      n := 5 
top:  jmp? n = 0 done 
      a := a * n 
      n := n + -1 
      goto top 
done:
\end{lstlisting}

\begin{lstlisting}[numbers=none,
                   language={},xleftmargin=10mm]
      n := 6
      m1 := 0
      m2 := 1
loop: jmp? n = 0 done
      tmp := m2
      m2 := m1 + m2
      m1 := tmp
      n := n + -1
      goto top
done:
\end{lstlisting}

\bigskip

\noindent What would be missing in comparison with real
(low-level machine) code? Well, the numbers we assume to be
arbitrary precision, which is not the case in real code. There
basic number formats have a rang and might over-run or
under-run from this range. Our assumption about variables,
does not correspond to actual registers, which are only 
limited on real hardware. Obviously, real code has richer
operations than just addition, multiplication and equality.
But this are not really essential limitations of our simple
examples.

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: