handouts/ho10.tex
changeset 675 d665e7dd66d7
parent 539 ed8f014217be
child 871 94b84d880c2b
equal deleted inserted replaced
674:e0a41a1f24cf 675:d665e7dd66d7
       
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{../langs}
       
     5 \usepackage{../graphics}
       
     6 \usepackage{../grammar}
       
     7 \usepackage{multicol}
       
     8 
       
     9 %%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
       
    10 
       
    11 \begin{document}
       
    12 \fnote{\copyright{} Christian Urban, King's College London, 2014}
       
    13 
       
    14 %% why are shuttle flights so good with software
       
    15 %%http://www.fastcompany.com/28121/they-write-right-stuff
       
    16 
       
    17 \section*{Handout 10 (Static Analysis)}
       
    18 
       
    19 If we want to improve the safety and security of our programs,
       
    20 we need a more principled approach to programming. Testing is
       
    21 good, but as Edsger Dijkstra famously wrote: 
       
    22 
       
    23 \begin{quote}\it 
       
    24 ``Program testing can be a very effective way to show the
       
    25 \underline{\smash{presence}} of bugs, but it is hopelessly
       
    26 inadequate for showing their \underline{\smash{absence}}.''
       
    27 \end{quote}
       
    28 
       
    29 \noindent While such a more principled approach has been the
       
    30 subject of intense study for a long, long time, only in the
       
    31 past few years some impressive results have been achieved. One
       
    32 is the complete formalisation and (mathematical) verification
       
    33 of a microkernel operating system called seL4.
       
    34 
       
    35 \begin{center}
       
    36 \url{http://sel4.systems}
       
    37 \end{center}
       
    38 
       
    39 \noindent In 2011 this work was included in the MIT Technology
       
    40 Review in the annual list of the world’s ten most important
       
    41 emerging
       
    42 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
       
    43 While this work is impressive, its technical details are too
       
    44 enormous for an explanation here. Therefore let us look at
       
    45 something much simpler, namely finding out properties about
       
    46 programs using \emph{static analysis}.
       
    47 
       
    48 Static analysis is a technique that checks properties of a
       
    49 program without actually running the program. This should
       
    50 raise alarm bells with you---because almost all interesting
       
    51 properties about programs  are equivalent to the halting
       
    52 problem, which we know is undecidable. For example estimating
       
    53 the memory consumption of programs is in general undecidable,
       
    54 just like the halting problem. Static analysis circumvents
       
    55 this undecidability-problem by essentially allowing answers
       
    56 \emph{yes} and \emph{no}, but also \emph{don't know}. With
       
    57 this ``trick'' even the halting problem becomes
       
    58 decidable\ldots{}for example we could always say \emph{don't
       
    59 know}. Of course this would be silly. The point is that we
       
    60 should be striving for a method that answers as often as
       
    61 possible either \emph{yes} or \emph{no}---just in cases when
       
    62 it is too difficult we fall back on the
       
    63 \emph{don't-know}-answer. This might sound all like abstract
       
    64 nonsense. Therefore let us look at a concrete example.
       
    65 
       
    66 
       
    67 \subsubsection*{A Simple, Idealised Programming Language}
       
    68 
       
    69 Our starting point is a small, idealised programming language.
       
    70 It is idealised because we cut several corners in comparison
       
    71 with real programming languages. The language we will study
       
    72 contains, amongst other things, variables holding integers.
       
    73 Using static analysis, we want to find out what the sign of
       
    74 these integers (positive or negative) will be when the program
       
    75 runs. This sign-analysis seems like a very simple problem. But
       
    76 even such simple problems, if approached naively, are in
       
    77 general undecidable, just like Turing's halting problem. I let
       
    78 you think why?
       
    79 
       
    80 
       
    81 Is sign-analysis of variables an interesting problem? Well,
       
    82 yes---if a compiler can find out that for example a variable
       
    83 will never be negative and this variable is used as an index
       
    84 for an array, then the compiler does not need to generate code
       
    85 for an underflow-check. Remember some languages are immune to
       
    86 buffer-overflow attacks, but they need to add underflow and
       
    87 overflow checks everywhere. According to John Regher, an
       
    88 expert in the field of compilers, overflow checks can cause
       
    89 5-10\% slowdown, in some languages even 100\% for tight
       
    90 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
       
    91 the compiler can omit the underflow check, for example, then
       
    92 this can potentially drastically speed up the generated code. 
       
    93 
       
    94 What do programs in our simple programming language look like?
       
    95 The following grammar gives a first specification:
       
    96 
       
    97 \begin{multicols}{2}
       
    98 \begin{plstx}[rhs style=,one per line,left margin=9mm]
       
    99 : \meta{Stmt} ::= \meta{label} \texttt{:}
       
   100                 | \meta{var} \texttt{:=} \meta{Exp}
       
   101                 | \texttt{jmp?} \meta{Exp} \meta{label}
       
   102                 | \texttt{goto} \meta{label}\\
       
   103 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
       
   104 \end{plstx}
       
   105 \columnbreak
       
   106 \begin{plstx}[rhs style=,one per line]
       
   107 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
       
   108                | \meta{Exp} \texttt{*} \meta{Exp}
       
   109                | \meta{Exp} \texttt{=} \meta{Exp} 
       
   110                | \meta{num}
       
   111                | \meta{var}\\
       
   112 \end{plstx}
       
   113 \end{multicols}
       
   114 
       
   115 \noindent I assume you are familiar with such
       
   116 grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
       
   117 There are three main syntactic categories: \emph{statments}
       
   118 and \emph{expressions} as well as \emph{programs}, which are
       
   119 sequences of statements. Statements are either labels,
       
   120 variable assignments, conditional jumps (\pcode{jmp?}) and
       
   121 unconditional jumps (\pcode{goto}). Labels are just strings,
       
   122 which can be used as the target of a jump. We assume that in
       
   123 every program the labels are unique---if there is a clash,
       
   124 then we do not know where to jump to. The conditional jumps
       
   125 and variable assignments involve (arithmetic) expressions.
       
   126 Expressions are either numbers, variables or compound
       
   127 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
       
   128 (for simplicity reasons we do not consider any other
       
   129 operations). We assume we have negative and positive numbers,
       
   130 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   131 \pcode{2}\ldots{} An example program that calculates the
       
   132 factorial of 5 is in our programming language as follows:
       
   133 
       
   134 \begin{lstlisting}[language={},xleftmargin=10mm]
       
   135       a := 1
       
   136       n := 5 
       
   137 top:  
       
   138       jmp? n = 0 done 
       
   139       a := a * n 
       
   140       n := n + -1 
       
   141       goto top 
       
   142 done:
       
   143 \end{lstlisting}
       
   144 
       
   145 \noindent As can be seen each line of the program contains a
       
   146 statement. In the first two lines we assign values to the
       
   147 variables \pcode{a} and \pcode{n}. In line 4 we test whether
       
   148 \pcode{n} is zero, in which case we jump to the end of the
       
   149 program marked with the label \pcode{done}. If \pcode{n} is
       
   150 not zero, we multiply the content of \pcode{a} by \pcode{n},
       
   151 decrease \pcode{n} by one and jump back to the beginning of
       
   152 the loop, marked with the label \pcode{top}. Another program
       
   153 in our language is shown in Figure~\ref{fib}. I let you think
       
   154 what it calculates.
       
   155  
       
   156 \begin{figure}[t]
       
   157 \begin{lstlisting}[numbers=none,
       
   158                    language={},xleftmargin=10mm]
       
   159       n := 6
       
   160       m1 := 0
       
   161       m2 := 1
       
   162 loop: 
       
   163       jmp? n = 0 done
       
   164       tmp := m2
       
   165       m2 := m1 + m2
       
   166       m1 := tmp
       
   167       n := n + -1
       
   168       goto top
       
   169 done:
       
   170 \end{lstlisting}
       
   171 \caption{A mystery program in our idealised programming language.
       
   172 Try to find out what it calculates! \label{fib}}
       
   173 \end{figure}
       
   174 
       
   175 Even if our language is rather small, it is still Turing
       
   176 complete---meaning quite powerful. However, discussing this
       
   177 fact in more detail would lead us too far astray. Clearly, our
       
   178 programming is rather low-level and not very comfortable for
       
   179 writing programs. It is inspired by real machine code, which
       
   180 is the code that is executed by a CPU. So a more interesting
       
   181 question is what is missing in comparison with real machine
       
   182 code? Well, not much\ldots{}in principle. Real machine code,
       
   183 of course, contains many more arithmetic instructions (not
       
   184 just addition and multiplication) and many more conditional
       
   185 jumps. We could add these to our language if we wanted, but
       
   186 complexity is really beside the point here. Furthermore, real
       
   187 machine code has many instructions for manipulating memory. We
       
   188 do not have this at all. This is actually a more serious
       
   189 simplification because we assume numbers to be arbitrary small
       
   190 or large, which is not the case with real machine code. In
       
   191 real machine code, basic number formats have a range and might
       
   192 over-flow or under-flow this range. Also the number of
       
   193 variables in our programs is potentially unlimited, while
       
   194 memory in an actual computer, of course, is always limited. To
       
   195 sum up, our language might look ridiculously simple, but it is
       
   196 not too far removed from practically relevant issues.
       
   197 
       
   198 
       
   199 \subsubsection*{An Interpreter}
       
   200 
       
   201 Designing a language is like playing god: you can say what
       
   202 names for variables you allow; what programs should look like;
       
   203 most importantly you can decide what each part of the program
       
   204 should mean and do. While our language is quite simple and the
       
   205 meaning of statements, for example, is rather straightforward,
       
   206 there are still places where we need to make real choices. For
       
   207 example consider the conditional jumps, say the one in the
       
   208 factorial program: 
       
   209 
       
   210 \begin{center}
       
   211 \code{jmp? n = 0 done}
       
   212 \end{center}
       
   213 
       
   214 \noindent How should they work? We could introduce Booleans
       
   215 (\pcode{true} and \pcode{false}) and then jump only when the
       
   216 condition is \pcode{true}. However, since we have numbers in
       
   217 our language anyway, why not just encoding \pcode{true} as
       
   218 one, and \pcode{false} as zero? In this way we can dispense
       
   219 with the additional concept of Booleans.
       
   220 
       
   221 I hope the above discussion makes it already clear we need to
       
   222 be a bit more careful with our programs. Below we shall
       
   223 describe an interpreter for our programming language, which
       
   224 specifies exactly how programs are supposed to be
       
   225 run\ldots{}at least we will specify this for all \emph{good}
       
   226 programs. By good programs I mean where all variables are
       
   227 initialised, for example. Our interpreter will just crash if
       
   228 it cannot find out the value for a variable when it is not
       
   229 initialised. Also, we will assume that labels in good programs
       
   230 are unique, otherwise our programs will calculate ``garbage''.
       
   231 
       
   232 First we will pre-process our programs. This will simplify the
       
   233 definition of our interpreter later on. By pre-processing our
       
   234 programs we will transform programs into \emph{snippets}. A
       
   235 snippet is a label and all the code that comes after the
       
   236 label. This essentially means a snippet is a \emph{map} from
       
   237 labels to code.\footnote{Be sure you know what maps are. In a
       
   238 programming context they are often represented as association
       
   239 list where some data is associated with a key.} 
       
   240 
       
   241 Given that programs are sequences (or lists) of statements, we
       
   242 can easily calculate the snippets by just traversing this
       
   243 sequence and recursively generating the map. Suppose a program
       
   244 is of the general form
       
   245 
       
   246 \begin{center}
       
   247 $stmt_1\;stmt_2\; \ldots\; stmt_n$
       
   248 \end{center}
       
   249 
       
   250 \noindent The idea is to go through this sequence of
       
   251 statements one by one and check whether they are a label. If
       
   252 yes, we add the label and the remaining statements to our map.
       
   253 If no, we just continue with the next statement. To come up
       
   254 with a recursive definition for generating snippets, let us
       
   255 write $[]$ for the program that does not contain any
       
   256 statement. Consider the following definition:
       
   257 
       
   258 \begin{center}
       
   259 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   260 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
       
   261 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
       
   262 $\begin{cases}
       
   263    \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
       
   264    \textit{snippets}(rest)                & \text{otherwise}
       
   265 \end{cases}$   
       
   266 \end{tabular}
       
   267 \end{center}
       
   268 
       
   269 \noindent In the first clause we just return the empty map for
       
   270 the program that does not contain any statement. In the second
       
   271 clause, we have to distinguish the case where the first
       
   272 statement is a label or not. As said before, if not, then we
       
   273 just ``throw away'' the label and recursively calculate the
       
   274 snippets for the rest of the program (the otherwise clause).
       
   275 If yes, then we do the same, but also update the map so that
       
   276 $label$ now points to the rest of the statements (the if
       
   277 clause). This looks all realtively straightforward, but there
       
   278 is one small problem we need to overcome: our two programs
       
   279 shown so far have no label as \emph{entry point}---that is
       
   280 where the execution is supposed to start. We usually assume
       
   281 that the first statement will be run first. To make this the
       
   282 default, it is convenient if we add to all our programs a
       
   283 default label, say \pcode{""} (the empty string). With this we
       
   284 can define our pre-processing of programs as follows
       
   285 
       
   286 \begin{center}
       
   287 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
       
   288 \end{center} 
       
   289 
       
   290 \noindent Let us see how this pans out in practice. If we
       
   291 pre-process the factorial program shown earlier, we obtain the 
       
   292 following map:
       
   293 
       
   294 \begin{center}
       
   295 \small
       
   296 \lstset{numbers=none,
       
   297         language={},
       
   298         xleftmargin=0mm,
       
   299         aboveskip=0.5mm,
       
   300         belowskip=0.5mm,
       
   301         frame=single,
       
   302         framerule=0mm,
       
   303         framesep=0mm}
       
   304 \begin{tikzpicture}[node distance=0mm]
       
   305 \node (A1) [draw]{\pcode{""}};
       
   306 \node (B1) [right=of A1] {$\mapsto$};
       
   307 \node (C1) [right=of B1,anchor=north west] {
       
   308 \begin{minipage}{3.5cm}
       
   309 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   310   a := 1
       
   311   n := 5 
       
   312 top:  
       
   313   jmp? n = 0 done 
       
   314   a := a * n 
       
   315   n := n + -1 
       
   316   goto top 
       
   317 done:
       
   318 \end{lstlisting}
       
   319 \end{minipage}};
       
   320 \node (A2) [right=of C1.north east,draw] {\pcode{top}};
       
   321 \node (B2) [right=of A2] {$\mapsto$};
       
   322 \node (C2) [right=of B2, anchor=north west] {
       
   323 \begin{minipage}{3.5cm}
       
   324 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   325   jmp? n = 0 done 
       
   326   a := a * n 
       
   327   n := n + -1 
       
   328   goto top 
       
   329 done:
       
   330 \end{lstlisting} 
       
   331 \end{minipage}}; 
       
   332 \node (A3) [right=of C2.north east,draw] {\pcode{done}};
       
   333 \node (B3) [right=of A3] {$\mapsto$};
       
   334 \node (C3) [right=of B3] {$[]$};
       
   335 \end{tikzpicture}
       
   336 \end{center}
       
   337 
       
   338 \noindent I highlighted the \emph{keys} in this map. Since
       
   339 there are three labels in the factorial program (remember we
       
   340 added \pcode{""}), there are three keys. When running the
       
   341 factorial program and encountering a jump, then we only have
       
   342 to consult this snippets-map in order to find out what the
       
   343 next statements should be.
       
   344 
       
   345 We should now be in the position to define how a program
       
   346 should be run. In the context of interpreters, this
       
   347 ``running'' of programs is often called \emph{evaluation}. Let
       
   348 us start with the definition of how expressions are evaluated.
       
   349 A first attempt might be the following recursive function:
       
   350 
       
   351 \begin{center}
       
   352 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   353 $\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$
       
   354   \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
       
   355   \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   356 \pcode{2}\ldots{}\\
       
   357 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
       
   358                     \texttt{e}_\texttt{2})$ & 
       
   359   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + 
       
   360            \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
       
   361 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
       
   362                     \texttt{e}_\texttt{2})$ & 
       
   363   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * 
       
   364            \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
       
   365 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
       
   366                     \texttt{e}_\texttt{2})$ & 
       
   367   $\dn$ & $\begin{cases}
       
   368   1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =
       
   369                  \textit{eval\_exp}(\texttt{e}_\texttt{2})\\
       
   370   0 & \text{otherwise}
       
   371   \end{cases}$          
       
   372 \end{tabular}
       
   373 \end{center}
       
   374 
       
   375 \noindent While this should look all rather intuitive`, still
       
   376 be very careful. There is a subtlety which can be easily
       
   377 overlooked: The function \textit{eval\_exp} takes an
       
   378 expression of our programming language as input and returns a
       
   379 number as output. Therefore whenever we encounter a number in
       
   380 our program, we just return this number---this is defined in
       
   381 the first clause above. Whenever we encounter an addition,
       
   382 well then we first evaluate the left-hand side
       
   383 $\texttt{e}_\texttt{1}$ of the addition (this will give a
       
   384 number), then evaluate the right-hand side
       
   385 $\texttt{e}_\texttt{2}$ (this gives another number), and
       
   386 finally add both numbers together. Here is the subtlety: on
       
   387 the left-hand side of the $\dn$ we have a \texttt{+} (in the
       
   388 teletype font) which is the symbol for addition in our
       
   389 programming language. On the right-hand side we have $+$ which
       
   390 stands for the arithmetic operation from ``mathematics'' of
       
   391 adding two numbers. These are rather different concepts---one
       
   392 is a symbol (which we made up), and the other a mathematical
       
   393 operation. When we will have a look at an actual
       
   394 implementation of our interpreter, the mathematical operation
       
   395 will be the function for addition from the programming
       
   396 language in which we \underline{\smash{implement}} our
       
   397 interpreter. While the \texttt{+} is just a symbol that is
       
   398 used in our programming language. Clearly we have to use a
       
   399 symbol that is a good mnemonic for addition, otherwise we will
       
   400 confuse the programmers working with our language. Therefore
       
   401 we use $\texttt{+}$. A similar choice is made for times in the
       
   402 third clause and equality in the fourth clause. 
       
   403 
       
   404 Remember I wrote at the beginning of this section about being
       
   405 god when designing a programming language. You can see this
       
   406 here: we need to give meaning to symbols. At the moment
       
   407 however, we are a poor fallible god. Look again at the grammar
       
   408 of our programming language and our definition. Clearly, an
       
   409 expression can contain variables. So far we have ignored them.
       
   410 What should our interpreter do with variables? They might
       
   411 change during the evaluation of a program. For example the
       
   412 variable \pcode{n} in the factorial program counts down from 5
       
   413 down to 0. How can we improve our definition above to give also
       
   414 an answer whenever our interpreter encounters a variable in an
       
   415 expression? The solution is to add an \emph{environment},
       
   416 written $env$, as an additional input argument to our
       
   417 \textit{eval\_exp} function.
       
   418  
       
   419 \begin{center}
       
   420 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   421 $\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$
       
   422   \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
       
   423   \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   424 \pcode{2}\ldots{}\\
       
   425 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
       
   426                     \texttt{e}_\texttt{2}, env)$ & 
       
   427   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + 
       
   428            \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
       
   429 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
       
   430                     \texttt{e}_\texttt{2}, env)$ & 
       
   431   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * 
       
   432            \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
       
   433 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
       
   434                     \texttt{e}_\texttt{2}, env)$ & 
       
   435   $\dn$ & $\begin{cases}
       
   436   1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =
       
   437                  \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\
       
   438   0 & \text{otherwise}
       
   439   \end{cases}$\\
       
   440 $\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$      
       
   441 \end{tabular}
       
   442 \end{center}
       
   443  
       
   444 \noindent This environment $env$ also acts like a map: it
       
   445 associates variables with their current values. For example
       
   446 after evaluating the first two lines in our factorial
       
   447 program, such an environment might look as follows
       
   448 
       
   449 \begin{center}
       
   450 \begin{tabular}{ll}
       
   451 $\fbox{\texttt{a}} \mapsto \texttt{1}$ &
       
   452 $\fbox{\texttt{n}} \mapsto \texttt{5}$
       
   453 \end{tabular}
       
   454 \end{center}
       
   455 
       
   456 \noindent Again I highlighted the keys. In the clause for
       
   457 variables, we can therefore consult this environment and
       
   458 return whatever value is currently stored for this variable.
       
   459 This is written $env(x)$---meaning we query this map with $x$
       
   460 we obtain the corresponding number. You might ask what happens
       
   461 if an environment does not contain any value for, say, the
       
   462 variable $x$? Well, then our interpreter just ``crashes'', or
       
   463 more precisely will raise an exception. In this case we have a
       
   464 ``bad'' program that tried to use a variable before it was
       
   465 initialised. The programmer should not have done this. In a
       
   466 real programming language, we would of course try a bit harder
       
   467 and for example give an error at compile time, or design our
       
   468 language in such a way that this can never happen. With the
       
   469 second version of \textit{eval\_exp} we completed our
       
   470 definition for evaluating expressions.
       
   471  
       
   472 Next comes the evaluation function for statements. We define
       
   473 this function in such a way that we recursively evaluate a
       
   474 whole sequence of statements. Assume a program $p$ (you want
       
   475 to evaluate) and its pre-processed snippets $sn$. Then we can
       
   476 define:
       
   477 
       
   478 \begin{center}
       
   479 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
       
   480 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
       
   481 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
       
   482   $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ 
       
   483 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & 
       
   484   $\dn$ & $\textit{eval\_stmts}(rest, 
       
   485            env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\  
       
   486 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ 
       
   487  & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
       
   488 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ 
       
   489  & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
       
   490  \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ 
       
   491  & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
       
   492  \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
       
   493  \end{array}\end{cases}$  
       
   494 \end{tabular}
       
   495 \end{center}
       
   496 
       
   497 \noindent The first clause is for the empty program, or when
       
   498 we arrived at the end of the program. In this case we just
       
   499 return the environment. The second clause is for when the next
       
   500 statement is a label. That means the program is of the form
       
   501 $\texttt{label:}\;rest$ where the label is some string and
       
   502 $rest$ stands for all following statements. This case is easy,
       
   503 because our evaluation function just discards the label and
       
   504 evaluates the rest of the statements (we already extracted all
       
   505 important information about labels when we pre-processed our
       
   506 programs and generated the snippets). The third clause is for
       
   507 variable assignments. Again we just evaluate the rest for the
       
   508 statements, but with a modified environment---since the
       
   509 variable assignment is supposed to introduce a new variable or
       
   510 change the current value of a variable. For this modification
       
   511 of the environment we first evaluate the expression
       
   512 $\texttt{e}$ using our evaluation function for expressions.
       
   513 This gives us a number. Then we assign this number to the
       
   514 variable $x$ in the environment. This modified environment
       
   515 will be used to evaluate the rest of the program. The fourth
       
   516 clause is for the unconditional jump to a label, called
       
   517 \texttt{lbl}. That means we have to look up in our snippets
       
   518 map $sn$ what are the next statements for this label.
       
   519 Therefore we will continue with evaluating, not with the rest
       
   520 of the program, but with the statements stored in the
       
   521 snippets-map under the label $\texttt{lbl}$. The fifth clause
       
   522 for conditional jumps is similar, but to decide whether to
       
   523 make the jump we first need to evaluate the expression
       
   524 $\texttt{e}$ in order to find out whether it is $1$. If yes,
       
   525 we jump, otherwise we just continue with evaluating the rest
       
   526 of the program.
       
   527 
       
   528 Our interpreter works in two stages: First we pre-process our
       
   529 program generating the snippets map $sn$, say. Second we call
       
   530 the evaluation function with the default entry point and the
       
   531 empty environment:
       
   532 
       
   533 \begin{center}
       
   534 $\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
       
   535 \end{center}
       
   536  
       
   537 \noindent It is interesting to note that our interpreter when
       
   538 it comes to the end of the program returns an environment. Our
       
   539 programming language does not contain any constructs for input
       
   540 and output. Therefore this environment is the only effect we
       
   541 can observe when running the program (apart from that our
       
   542 interpreter might need some time before finishing the
       
   543 evaluation of the program and the CPU getting hot). Evaluating 
       
   544 the factorial program with our interpreter we receive as
       
   545 ``answer''-environment
       
   546 
       
   547 \begin{center}
       
   548 \begin{tabular}{ll}
       
   549 $\fbox{\texttt{a}} \mapsto \texttt{120}$ &
       
   550 $\fbox{\texttt{n}} \mapsto \texttt{0}$
       
   551 \end{tabular}
       
   552 \end{center}
       
   553 
       
   554 \noindent While the discussion above should have illustrated
       
   555 the ideas, in order to do some serious calculations, we clearly
       
   556 need to implement the interpreter.
       
   557 
       
   558 
       
   559 \subsubsection*{Scala Code for the Interpreter}
       
   560 
       
   561 Functional programming languages are very convenient for
       
   562 implementations of interpreters. A good choice for a
       
   563 functional programming language is Scala, a programming
       
   564 language that combines functional and object-oriented
       
   565 pro\-gramming-styles. It has received in the last five years or
       
   566 so quite a bit of attention. One reason for this attention is
       
   567 that, like the Java programming language, Scala compiles to
       
   568 the Java Virtual Machine (JVM) and therefore Scala programs
       
   569 can run under MacOSX, Linux and Windows.\footnote{There are
       
   570 also experimental backends for Android and JavaScript.} Unlike
       
   571 Java, however, Scala often allows programmers to write very
       
   572 concise and elegant code. Some therefore say Scala is the much
       
   573 better Java. A number of companies, The Guardian, Twitter,
       
   574 Coursera, FourSquare, LinkedIn to name a few, either use Scala
       
   575 exclusively in production code, or at least to some
       
   576 substantial degree. If you want to try out Scala yourself, the
       
   577 Scala compiler can be downloaded from
       
   578 
       
   579 \begin{quote}
       
   580 \url{http://www.scala-lang.org}
       
   581 \end{quote}
       
   582 
       
   583 Let us have a look at the Scala code shown in
       
   584 Figure~\ref{code}. It shows the entire code for the
       
   585 interpreter, though the implementation is admittedly no
       
   586 frills.
       
   587 
       
   588 \begin{figure}[t]
       
   589 \small
       
   590 \lstinputlisting[language=Scala]{../progs/inter.scala}
       
   591 \caption{The entire code of the interpreter for our
       
   592 idealised programming language.\label{code}}
       
   593 \end{figure}
       
   594 
       
   595 
       
   596 \subsubsection*{Static Analysis}
       
   597 
       
   598 Finally we can come back to our original problem, namely 
       
   599 finding out what the signs of variables are 
       
   600 
       
   601 \begin{center}
       
   602 
       
   603 
       
   604 \end{center}
       
   605  
       
   606 \end{document}
       
   607 
       
   608 %% list of static analysers for C
       
   609 http://spinroot.com/static/index.html
       
   610 
       
   611 %% NASA coding rules for C
       
   612 http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf
       
   613 
       
   614 %%% Local Variables: 
       
   615 %%% mode: latex
       
   616 %%% TeX-master: t
       
   617 %%% End: