handouts/ho09.tex
changeset 539 ed8f014217be
child 677 decfd8cf8180
equal deleted inserted replaced
538:17acdd516ccd 539:ed8f014217be
       
     1 \documentclass{article}
       
     2 \usepackage{../style}
       
     3 \usepackage{../langs}
       
     4 \usepackage{../graphics}
       
     5 \usepackage{../grammar}
       
     6 \usepackage{multicol}
       
     7 
       
     8 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
       
     9 
       
    10 \begin{document}
       
    11 \fnote{\copyright{} Christian Urban, King's College London, 2014}
       
    12 
       
    13 %% why are shuttle flights so good with software
       
    14 %%http://www.fastcompany.com/28121/they-write-right-stuff
       
    15 
       
    16 \section*{Handout 9 (Static Analysis)}
       
    17 
       
    18 If we want to improve the safety and security of our programs,
       
    19 we need a more principled approach to programming. Testing is
       
    20 good, but as Edsger Dijkstra famously wrote: 
       
    21 
       
    22 \begin{quote}\it 
       
    23 ``Program testing can be a very effective way to show the
       
    24 \underline{\smash{presence}} of bugs, but it is hopelessly
       
    25 inadequate for showing their \underline{\smash{absence}}.''
       
    26 \end{quote}
       
    27 
       
    28 \noindent While such a more principled approach has been the
       
    29 subject of intense study for a long, long time, only in the
       
    30 past few years some impressive results have been achieved. One
       
    31 is the complete formalisation and (mathematical) verification
       
    32 of a microkernel operating system called seL4.
       
    33 
       
    34 \begin{center}
       
    35 \url{http://sel4.systems}
       
    36 \end{center}
       
    37 
       
    38 \noindent In 2011 this work was included in the MIT Technology
       
    39 Review in the annual list of the world’s ten most important
       
    40 emerging
       
    41 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
       
    42 While this work is impressive, its technical details are too
       
    43 enormous for an explanation here. Therefore let us look at
       
    44 something much simpler, namely finding out properties about
       
    45 programs using \emph{static analysis}.
       
    46 
       
    47 Static analysis is a technique that checks properties of a
       
    48 program without actually running the program. This should
       
    49 raise alarm bells with you---because almost all interesting
       
    50 properties about programs  are equivalent to the halting
       
    51 problem, which we know is undecidable. For example estimating
       
    52 the memory consumption of programs is in general undecidable,
       
    53 just like the halting problem. Static analysis circumvents
       
    54 this undecidability-problem by essentially allowing answers
       
    55 \emph{yes} and \emph{no}, but also \emph{don't know}. With
       
    56 this ``trick'' even the halting problem becomes
       
    57 decidable\ldots{}for example we could always say \emph{don't
       
    58 know}. Of course this would be silly. The point is that we
       
    59 should be striving for a method that answers as often as
       
    60 possible either \emph{yes} or \emph{no}---just in cases when
       
    61 it is too difficult we fall back on the
       
    62 \emph{don't-know}-answer. This might sound all like abstract
       
    63 nonsense. Therefore let us look at a concrete example.
       
    64 
       
    65 
       
    66 \subsubsection*{A Simple, Idealised Programming Language}
       
    67 
       
    68 Our starting point is a small, idealised programming language.
       
    69 It is idealised because we cut several corners in comparison
       
    70 with real programming languages. The language we will study
       
    71 contains, amongst other things, variables holding integers.
       
    72 Using static analysis, we want to find out what the sign of
       
    73 these integers (positive or negative) will be when the program
       
    74 runs. This sign-analysis seems like a very simple problem. But
       
    75 even such simple problems, if approached naively, are in
       
    76 general undecidable, just like Turing's halting problem. I let
       
    77 you think why?
       
    78 
       
    79 
       
    80 Is sign-analysis of variables an interesting problem? Well,
       
    81 yes---if a compiler can find out that for example a variable
       
    82 will never be negative and this variable is used as an index
       
    83 for an array, then the compiler does not need to generate code
       
    84 for an underflow-check. Remember some languages are immune to
       
    85 buffer-overflow attacks, but they need to add underflow and
       
    86 overflow checks everywhere. According to John Regher, an
       
    87 expert in the field of compilers, overflow checks can cause
       
    88 5-10\% slowdown, in some languages even 100\% for tight
       
    89 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
       
    90 the compiler can omit the underflow check, for example, then
       
    91 this can potentially drastically speed up the generated code. 
       
    92 
       
    93 What do programs in our simple programming language look like?
       
    94 The following grammar gives a first specification:
       
    95 
       
    96 \begin{multicols}{2}
       
    97 \begin{plstx}[rhs style=,one per line,left margin=9mm]
       
    98 : \meta{Stmt} ::= \meta{label} \texttt{:}
       
    99                 | \meta{var} \texttt{:=} \meta{Exp}
       
   100                 | \texttt{jmp?} \meta{Exp} \meta{label}
       
   101                 | \texttt{goto} \meta{label}\\
       
   102 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
       
   103 \end{plstx}
       
   104 \columnbreak
       
   105 \begin{plstx}[rhs style=,one per line]
       
   106 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
       
   107                | \meta{Exp} \texttt{*} \meta{Exp}
       
   108                | \meta{Exp} \texttt{=} \meta{Exp} 
       
   109                | \meta{num}
       
   110                | \meta{var}\\
       
   111 \end{plstx}
       
   112 \end{multicols}
       
   113 
       
   114 \noindent I assume you are familiar with such
       
   115 grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
       
   116 There are three main syntactic categories: \emph{statments}
       
   117 and \emph{expressions} as well as \emph{programs}, which are
       
   118 sequences of statements. Statements are either labels,
       
   119 variable assignments, conditional jumps (\pcode{jmp?}) and
       
   120 unconditional jumps (\pcode{goto}). Labels are just strings,
       
   121 which can be used as the target of a jump. We assume that in
       
   122 every program the labels are unique---if there is a clash,
       
   123 then we do not know where to jump to. The conditional jumps
       
   124 and variable assignments involve (arithmetic) expressions.
       
   125 Expressions are either numbers, variables or compound
       
   126 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
       
   127 (for simplicity reasons we do not consider any other
       
   128 operations). We assume we have negative and positive numbers,
       
   129 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   130 \pcode{2}\ldots{} An example program that calculates the
       
   131 factorial of 5 is in our programming language as follows:
       
   132 
       
   133 \begin{lstlisting}[language={},xleftmargin=10mm]
       
   134       a := 1
       
   135       n := 5 
       
   136 top:  
       
   137       jmp? n = 0 done 
       
   138       a := a * n 
       
   139       n := n + -1 
       
   140       goto top 
       
   141 done:
       
   142 \end{lstlisting}
       
   143 
       
   144 \noindent As can be seen each line of the program contains a
       
   145 statement. In the first two lines we assign values to the
       
   146 variables \pcode{a} and \pcode{n}. In line 4 we test whether
       
   147 \pcode{n} is zero, in which case we jump to the end of the
       
   148 program marked with the label \pcode{done}. If \pcode{n} is
       
   149 not zero, we multiply the content of \pcode{a} by \pcode{n},
       
   150 decrease \pcode{n} by one and jump back to the beginning of
       
   151 the loop, marked with the label \pcode{top}. Another program
       
   152 in our language is shown in Figure~\ref{fib}. I let you think
       
   153 what it calculates.
       
   154  
       
   155 \begin{figure}[t]
       
   156 \begin{lstlisting}[numbers=none,
       
   157                    language={},xleftmargin=10mm]
       
   158       n := 6
       
   159       m1 := 0
       
   160       m2 := 1
       
   161 loop: 
       
   162       jmp? n = 0 done
       
   163       tmp := m2
       
   164       m2 := m1 + m2
       
   165       m1 := tmp
       
   166       n := n + -1
       
   167       goto top
       
   168 done:
       
   169 \end{lstlisting}
       
   170 \caption{A mystery program in our idealised programming language.
       
   171 Try to find out what it calculates! \label{fib}}
       
   172 \end{figure}
       
   173 
       
   174 Even if our language is rather small, it is still Turing
       
   175 complete---meaning quite powerful. However, discussing this
       
   176 fact in more detail would lead us too far astray. Clearly, our
       
   177 programming is rather low-level and not very comfortable for
       
   178 writing programs. It is inspired by real machine code, which
       
   179 is the code that is executed by a CPU. So a more interesting
       
   180 question is what is missing in comparison with real machine
       
   181 code? Well, not much\ldots{}in principle. Real machine code,
       
   182 of course, contains many more arithmetic instructions (not
       
   183 just addition and multiplication) and many more conditional
       
   184 jumps. We could add these to our language if we wanted, but
       
   185 complexity is really beside the point here. Furthermore, real
       
   186 machine code has many instructions for manipulating memory. We
       
   187 do not have this at all. This is actually a more serious
       
   188 simplification because we assume numbers to be arbitrary small
       
   189 or large, which is not the case with real machine code. In
       
   190 real machine code, basic number formats have a range and might
       
   191 over-flow or under-flow this range. Also the number of
       
   192 variables in our programs is potentially unlimited, while
       
   193 memory in an actual computer, of course, is always limited. To
       
   194 sum up, our language might look ridiculously simple, but it is
       
   195 not too far removed from practically relevant issues.
       
   196 
       
   197 
       
   198 \subsubsection*{An Interpreter}
       
   199 
       
   200 Designing a language is like playing god: you can say what
       
   201 names for variables you allow; what programs should look like;
       
   202 most importantly you can decide what each part of the program
       
   203 should mean and do. While our language is quite simple and the
       
   204 meaning of statements, for example, is rather straightforward,
       
   205 there are still places where we need to make real choices. For
       
   206 example consider the conditional jumps, say the one in the
       
   207 factorial program: 
       
   208 
       
   209 \begin{center}
       
   210 \code{jmp? n = 0 done}
       
   211 \end{center}
       
   212 
       
   213 \noindent How should they work? We could introduce Booleans
       
   214 (\pcode{true} and \pcode{false}) and then jump only when the
       
   215 condition is \pcode{true}. However, since we have numbers in
       
   216 our language anyway, why not just encoding \pcode{true} as
       
   217 one, and \pcode{false} as zero? In this way we can dispense
       
   218 with the additional concept of Booleans.
       
   219 
       
   220 I hope the above discussion makes it already clear we need to
       
   221 be a bit more careful with our programs. Below we shall
       
   222 describe an interpreter for our programming language, which
       
   223 specifies exactly how programs are supposed to be
       
   224 run\ldots{}at least we will specify this for all \emph{good}
       
   225 programs. By good programs I mean where all variables are
       
   226 initialised, for example. Our interpreter will just crash if
       
   227 it cannot find out the value for a variable when it is not
       
   228 initialised. Also, we will assume that labels in good programs
       
   229 are unique, otherwise our programs will calculate ``garbage''.
       
   230 
       
   231 First we will pre-process our programs. This will simplify the
       
   232 definition of our interpreter later on. By pre-processing our
       
   233 programs we will transform programs into \emph{snippets}. A
       
   234 snippet is a label and all the code that comes after the
       
   235 label. This essentially means a snippet is a \emph{map} from
       
   236 labels to code.\footnote{Be sure you know what maps are. In a
       
   237 programming context they are often represented as association
       
   238 list where some data is associated with a key.} 
       
   239 
       
   240 Given that programs are sequences (or lists) of statements, we
       
   241 can easily calculate the snippets by just traversing this
       
   242 sequence and recursively generating the map. Suppose a program
       
   243 is of the general form
       
   244 
       
   245 \begin{center}
       
   246 $stmt_1\;stmt_2\; \ldots\; stmt_n$
       
   247 \end{center}
       
   248 
       
   249 \noindent The idea is to go through this sequence of
       
   250 statements one by one and check whether they are a label. If
       
   251 yes, we add the label and the remaining statements to our map.
       
   252 If no, we just continue with the next statement. To come up
       
   253 with a recursive definition for generating snippets, let us
       
   254 write $[]$ for the program that does not contain any
       
   255 statement. Consider the following definition:
       
   256 
       
   257 \begin{center}
       
   258 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   259 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
       
   260 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
       
   261 $\begin{cases}
       
   262    \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
       
   263    \textit{snippets}(rest)                & \text{otherwise}
       
   264 \end{cases}$   
       
   265 \end{tabular}
       
   266 \end{center}
       
   267 
       
   268 \noindent In the first clause we just return the empty map for
       
   269 the program that does not contain any statement. In the second
       
   270 clause, we have to distinguish the case where the first
       
   271 statement is a label or not. As said before, if not, then we
       
   272 just ``throw away'' the label and recursively calculate the
       
   273 snippets for the rest of the program (the otherwise clause).
       
   274 If yes, then we do the same, but also update the map so that
       
   275 $label$ now points to the rest of the statements (the if
       
   276 clause). This looks all realtively straightforward, but there
       
   277 is one small problem we need to overcome: our two programs
       
   278 shown so far have no label as \emph{entry point}---that is
       
   279 where the execution is supposed to start. We usually assume
       
   280 that the first statement will be run first. To make this the
       
   281 default, it is convenient if we add to all our programs a
       
   282 default label, say \pcode{""} (the empty string). With this we
       
   283 can define our pre-processing of programs as follows
       
   284 
       
   285 \begin{center}
       
   286 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
       
   287 \end{center} 
       
   288 
       
   289 \noindent Let us see how this pans out in practice. If we
       
   290 pre-process the factorial program shown earlier, we obtain the 
       
   291 following map:
       
   292 
       
   293 \begin{center}
       
   294 \small
       
   295 \lstset{numbers=none,
       
   296         language={},
       
   297         xleftmargin=0mm,
       
   298         aboveskip=0.5mm,
       
   299         belowskip=0.5mm,
       
   300         frame=single,
       
   301         framerule=0mm,
       
   302         framesep=0mm}
       
   303 \begin{tikzpicture}[node distance=0mm]
       
   304 \node (A1) [draw]{\pcode{""}};
       
   305 \node (B1) [right=of A1] {$\mapsto$};
       
   306 \node (C1) [right=of B1,anchor=north west] {
       
   307 \begin{minipage}{3.5cm}
       
   308 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   309   a := 1
       
   310   n := 5 
       
   311 top:  
       
   312   jmp? n = 0 done 
       
   313   a := a * n 
       
   314   n := n + -1 
       
   315   goto top 
       
   316 done:
       
   317 \end{lstlisting}
       
   318 \end{minipage}};
       
   319 \node (A2) [right=of C1.north east,draw] {\pcode{top}};
       
   320 \node (B2) [right=of A2] {$\mapsto$};
       
   321 \node (C2) [right=of B2, anchor=north west] {
       
   322 \begin{minipage}{3.5cm}
       
   323 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   324   jmp? n = 0 done 
       
   325   a := a * n 
       
   326   n := n + -1 
       
   327   goto top 
       
   328 done:
       
   329 \end{lstlisting} 
       
   330 \end{minipage}}; 
       
   331 \node (A3) [right=of C2.north east,draw] {\pcode{done}};
       
   332 \node (B3) [right=of A3] {$\mapsto$};
       
   333 \node (C3) [right=of B3] {$[]$};
       
   334 \end{tikzpicture}
       
   335 \end{center}
       
   336 
       
   337 \noindent I highlighted the \emph{keys} in this map. Since
       
   338 there are three labels in the factorial program (remember we
       
   339 added \pcode{""}), there are three keys. When running the
       
   340 factorial program and encountering a jump, then we only have
       
   341 to consult this snippets-map in order to find out what the
       
   342 next statements should be.
       
   343 
       
   344 We should now be in the position to define how a program
       
   345 should be run. In the context of interpreters, this
       
   346 ``running'' of programs is often called \emph{evaluation}. Let
       
   347 us start with the definition of how expressions are evaluated.
       
   348 A first attempt might be the following recursive function:
       
   349 
       
   350 \begin{center}
       
   351 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   352 $\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$
       
   353   \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
       
   354   \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   355 \pcode{2}\ldots{}\\
       
   356 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
       
   357                     \texttt{e}_\texttt{2})$ & 
       
   358   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + 
       
   359            \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
       
   360 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
       
   361                     \texttt{e}_\texttt{2})$ & 
       
   362   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * 
       
   363            \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
       
   364 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
       
   365                     \texttt{e}_\texttt{2})$ & 
       
   366   $\dn$ & $\begin{cases}
       
   367   1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =
       
   368                  \textit{eval\_exp}(\texttt{e}_\texttt{2})\\
       
   369   0 & \text{otherwise}
       
   370   \end{cases}$          
       
   371 \end{tabular}
       
   372 \end{center}
       
   373 
       
   374 \noindent While this should look all rather intuitive`, still
       
   375 be very careful. There is a subtlety which can be easily
       
   376 overlooked: The function \textit{eval\_exp} takes an
       
   377 expression of our programming language as input and returns a
       
   378 number as output. Therefore whenever we encounter a number in
       
   379 our program, we just return this number---this is defined in
       
   380 the first clause above. Whenever we encounter an addition,
       
   381 well then we first evaluate the left-hand side
       
   382 $\texttt{e}_\texttt{1}$ of the addition (this will give a
       
   383 number), then evaluate the right-hand side
       
   384 $\texttt{e}_\texttt{2}$ (this gives another number), and
       
   385 finally add both numbers together. Here is the subtlety: on
       
   386 the left-hand side of the $\dn$ we have a \texttt{+} (in the
       
   387 teletype font) which is the symbol for addition in our
       
   388 programming language. On the right-hand side we have $+$ which
       
   389 stands for the arithmetic operation from ``mathematics'' of
       
   390 adding two numbers. These are rather different concepts---one
       
   391 is a symbol (which we made up), and the other a mathematical
       
   392 operation. When we will have a look at an actual
       
   393 implementation of our interpreter, the mathematical operation
       
   394 will be the function for addition from the programming
       
   395 language in which we \underline{\smash{implement}} our
       
   396 interpreter. While the \texttt{+} is just a symbol that is
       
   397 used in our programming language. Clearly we have to use a
       
   398 symbol that is a good mnemonic for addition, otherwise we will
       
   399 confuse the programmers working with our language. Therefore
       
   400 we use $\texttt{+}$. A similar choice is made for times in the
       
   401 third clause and equality in the fourth clause. 
       
   402 
       
   403 Remember I wrote at the beginning of this section about being
       
   404 god when designing a programming language. You can see this
       
   405 here: we need to give meaning to symbols. At the moment
       
   406 however, we are a poor fallible god. Look again at the grammar
       
   407 of our programming language and our definition. Clearly, an
       
   408 expression can contain variables. So far we have ignored them.
       
   409 What should our interpreter do with variables? They might
       
   410 change during the evaluation of a program. For example the
       
   411 variable \pcode{n} in the factorial program counts down from 5
       
   412 down to 0. How can we improve our definition above to give also
       
   413 an answer whenever our interpreter encounters a variable in an
       
   414 expression? The solution is to add an \emph{environment},
       
   415 written $env$, as an additional input argument to our
       
   416 \textit{eval\_exp} function.
       
   417  
       
   418 \begin{center}
       
   419 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   420 $\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$
       
   421   \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
       
   422   \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   423 \pcode{2}\ldots{}\\
       
   424 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
       
   425                     \texttt{e}_\texttt{2}, env)$ & 
       
   426   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + 
       
   427            \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
       
   428 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
       
   429                     \texttt{e}_\texttt{2}, env)$ & 
       
   430   $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * 
       
   431            \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
       
   432 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
       
   433                     \texttt{e}_\texttt{2}, env)$ & 
       
   434   $\dn$ & $\begin{cases}
       
   435   1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =
       
   436                  \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\
       
   437   0 & \text{otherwise}
       
   438   \end{cases}$\\
       
   439 $\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$      
       
   440 \end{tabular}
       
   441 \end{center}
       
   442  
       
   443 \noindent This environment $env$ also acts like a map: it
       
   444 associates variables with their current values. For example
       
   445 after evaluating the first two lines in our factorial
       
   446 program, such an environment might look as follows
       
   447 
       
   448 \begin{center}
       
   449 \begin{tabular}{ll}
       
   450 $\fbox{\texttt{a}} \mapsto \texttt{1}$ &
       
   451 $\fbox{\texttt{n}} \mapsto \texttt{5}$
       
   452 \end{tabular}
       
   453 \end{center}
       
   454 
       
   455 \noindent Again I highlighted the keys. In the clause for
       
   456 variables, we can therefore consult this environment and
       
   457 return whatever value is currently stored for this variable.
       
   458 This is written $env(x)$---meaning we query this map with $x$
       
   459 we obtain the corresponding number. You might ask what happens
       
   460 if an environment does not contain any value for, say, the
       
   461 variable $x$? Well, then our interpreter just ``crashes'', or
       
   462 more precisely will raise an exception. In this case we have a
       
   463 ``bad'' program that tried to use a variable before it was
       
   464 initialised. The programmer should not have done this. In a
       
   465 real programming language, we would of course try a bit harder
       
   466 and for example give an error at compile time, or design our
       
   467 language in such a way that this can never happen. With the
       
   468 second version of \textit{eval\_exp} we completed our
       
   469 definition for evaluating expressions.
       
   470  
       
   471 Next comes the evaluation function for statements. We define
       
   472 this function in such a way that we recursively evaluate a
       
   473 whole sequence of statements. Assume a program $p$ (you want
       
   474 to evaluate) and its pre-processed snippets $sn$. Then we can
       
   475 define:
       
   476 
       
   477 \begin{center}
       
   478 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
       
   479 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
       
   480 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
       
   481   $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ 
       
   482 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & 
       
   483   $\dn$ & $\textit{eval\_stmts}(rest, 
       
   484            env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\  
       
   485 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ 
       
   486  & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
       
   487 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ 
       
   488  & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
       
   489  \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ 
       
   490  & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
       
   491  \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
       
   492  \end{array}\end{cases}$  
       
   493 \end{tabular}
       
   494 \end{center}
       
   495 
       
   496 \noindent The first clause is for the empty program, or when
       
   497 we arrived at the end of the program. In this case we just
       
   498 return the environment. The second clause is for when the next
       
   499 statement is a label. That means the program is of the form
       
   500 $\texttt{label:}\;rest$ where the label is some string and
       
   501 $rest$ stands for all following statements. This case is easy,
       
   502 because our evaluation function just discards the label and
       
   503 evaluates the rest of the statements (we already extracted all
       
   504 important information about labels when we pre-processed our
       
   505 programs and generated the snippets). The third clause is for
       
   506 variable assignments. Again we just evaluate the rest for the
       
   507 statements, but with a modified environment---since the
       
   508 variable assignment is supposed to introduce a new variable or
       
   509 change the current value of a variable. For this modification
       
   510 of the environment we first evaluate the expression
       
   511 $\texttt{e}$ using our evaluation function for expressions.
       
   512 This gives us a number. Then we assign this number to the
       
   513 variable $x$ in the environment. This modified environment
       
   514 will be used to evaluate the rest of the program. The fourth
       
   515 clause is for the unconditional jump to a label, called
       
   516 \texttt{lbl}. That means we have to look up in our snippets
       
   517 map $sn$ what are the next statements for this label.
       
   518 Therefore we will continue with evaluating, not with the rest
       
   519 of the program, but with the statements stored in the
       
   520 snippets-map under the label $\texttt{lbl}$. The fifth clause
       
   521 for conditional jumps is similar, but to decide whether to
       
   522 make the jump we first need to evaluate the expression
       
   523 $\texttt{e}$ in order to find out whether it is $1$. If yes,
       
   524 we jump, otherwise we just continue with evaluating the rest
       
   525 of the program.
       
   526 
       
   527 Our interpreter works in two stages: First we pre-process our
       
   528 program generating the snippets map $sn$, say. Second we call
       
   529 the evaluation function with the default entry point and the
       
   530 empty environment:
       
   531 
       
   532 \begin{center}
       
   533 $\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
       
   534 \end{center}
       
   535  
       
   536 \noindent It is interesting to note that our interpreter when
       
   537 it comes to the end of the program returns an environment. Our
       
   538 programming language does not contain any constructs for input
       
   539 and output. Therefore this environment is the only effect we
       
   540 can observe when running the program (apart from that our
       
   541 interpreter might need some time before finishing the
       
   542 evaluation of the program and the CPU getting hot). Evaluating 
       
   543 the factorial program with our interpreter we receive as
       
   544 ``answer''-environment
       
   545 
       
   546 \begin{center}
       
   547 \begin{tabular}{ll}
       
   548 $\fbox{\texttt{a}} \mapsto \texttt{120}$ &
       
   549 $\fbox{\texttt{n}} \mapsto \texttt{0}$
       
   550 \end{tabular}
       
   551 \end{center}
       
   552 
       
   553 \noindent While the discussion above should have illustrated
       
   554 the ideas, in order to do some serious calculations, we clearly
       
   555 need to implement the interpreter.
       
   556 
       
   557 
       
   558 \subsubsection*{Scala Code for the Interpreter}
       
   559 
       
   560 Functional programming languages are very convenient for
       
   561 implementations of interpreters. A good choice for a
       
   562 functional programming language is Scala, a programming
       
   563 language that combines functional and object-oriented
       
   564 pro\-gramming-styles. It has received in the last five years or
       
   565 so quite a bit of attention. One reason for this attention is
       
   566 that, like the Java programming language, Scala compiles to
       
   567 the Java Virtual Machine (JVM) and therefore Scala programs
       
   568 can run under MacOSX, Linux and Windows.\footnote{There are
       
   569 also experimental backends for Android and JavaScript.} Unlike
       
   570 Java, however, Scala often allows programmers to write very
       
   571 concise and elegant code. Some therefore say Scala is the much
       
   572 better Java. A number of companies, The Guardian, Twitter,
       
   573 Coursera, FourSquare, LinkedIn to name a few, either use Scala
       
   574 exclusively in production code, or at least to some
       
   575 substantial degree. If you want to try out Scala yourself, the
       
   576 Scala compiler can be downloaded from
       
   577 
       
   578 \begin{quote}
       
   579 \url{http://www.scala-lang.org}
       
   580 \end{quote}
       
   581 
       
   582 Let us have a look at the Scala code shown in
       
   583 Figure~\ref{code}. It shows the entire code for the
       
   584 interpreter, though the implementation is admittedly no
       
   585 frills.
       
   586 
       
   587 \begin{figure}[t]
       
   588 \small
       
   589 \lstinputlisting[language=Scala]{../progs/inter.scala}
       
   590 \caption{The entire code of the interpreter for our
       
   591 idealised programming language.\label{code}}
       
   592 \end{figure}
       
   593 
       
   594 
       
   595 \subsubsection*{Static Analysis}
       
   596 
       
   597 Finally we can come back to our original problem, namely 
       
   598 finding out what the signs of variables are 
       
   599 
       
   600 \begin{center}
       
   601 
       
   602 
       
   603 \end{center}
       
   604  
       
   605 \end{document}
       
   606 
       
   607 %% list of static analysers for C
       
   608 http://spinroot.com/static/index.html
       
   609 
       
   610 %% NASA coding rules for C
       
   611 http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf
       
   612 
       
   613 %%% Local Variables: 
       
   614 %%% mode: latex
       
   615 %%% TeX-master: t
       
   616 %%% End: