\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
\usepackage{../graphics}+ −
\usepackage{../grammar}+ −
\usepackage{multicol}+ −
+ −
\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}+ −
+ −
\begin{document}+ −
\fnote{\copyright{} Christian Urban, King's College London, 2014}+ −
+ −
%% why are shuttle flights so good with software+ −
%%http://www.fastcompany.com/28121/they-write-right-stuff+ −
+ −
\section*{Handout 9 (Static Analysis)}+ −
+ −
If we want to improve the safety and security of our programs,+ −
we need a more principled approach to programming. Testing is+ −
good, but as Edsger Dijkstra famously wrote: + −
+ −
\begin{quote}\it + −
``Program testing can be a very effective way to show the+ −
\underline{\smash{presence}} of bugs, but it is hopelessly+ −
inadequate for showing their \underline{\smash{absence}}.''+ −
\end{quote}+ −
+ −
\noindent While such a more principled approach has been the+ −
subject of intense study for a long, long time, only in the+ −
past few years some impressive results have been achieved. One+ −
is the complete formalisation and (mathematical) verification+ −
of a microkernel operating system called seL4.+ −
+ −
\begin{center}+ −
\url{http://sel4.systems}+ −
\end{center}+ −
+ −
\noindent In 2011 this work was included in the MIT Technology+ −
Review in the annual list of the world’s ten most important+ −
emerging+ −
technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}+ −
While this work is impressive, its technical details are too+ −
enormous for an explanation here. Therefore let us look at+ −
something much simpler, namely finding out properties about+ −
programs using \emph{static analysis}.+ −
+ −
Static analysis is a technique that checks properties of a+ −
program without actually running the program. This should+ −
raise alarm bells with you---because almost all interesting+ −
properties about programs are equivalent to the halting+ −
problem, which we know is undecidable. For example estimating+ −
the memory consumption of programs is in general undecidable,+ −
just like the halting problem. Static analysis circumvents+ −
this undecidability-problem by essentially allowing answers+ −
\emph{yes} and \emph{no}, but also \emph{don't know}. With+ −
this ``trick'' even the halting problem becomes+ −
decidable\ldots{}for example we could always say \emph{don't+ −
know}. Of course this would be silly. The point is that we+ −
should be striving for a method that answers as often as+ −
possible either \emph{yes} or \emph{no}---just in cases when+ −
it is too difficult we fall back on the+ −
\emph{don't-know}-answer. This might sound all like abstract+ −
nonsense. Therefore let us look at a concrete example.+ −
+ −
+ −
\subsubsection*{A Simple, Idealised Programming Language}+ −
+ −
Our starting point is a small, idealised programming language.+ −
It is idealised because we cut several corners in comparison+ −
with real programming languages. The language we will study+ −
contains, amongst other things, variables holding integers.+ −
Using static analysis, we want to find out what the sign of+ −
these integers (positive or negative) will be when the program+ −
runs. This sign-analysis seems like a very simple problem. But+ −
even such simple problems, if approached naively, are in+ −
general undecidable, just like Turing's halting problem. I let+ −
you think why?+ −
+ −
+ −
Is sign-analysis of variables an interesting problem? Well,+ −
yes---if a compiler can find out that for example a variable+ −
will never be negative and this variable is used as an index+ −
for an array, then the compiler does not need to generate code+ −
for an underflow-check. Remember some languages are immune to+ −
buffer-overflow attacks, but they need to add underflow and+ −
overflow checks everywhere. According to John Regher, an+ −
expert in the field of compilers, overflow checks can cause+ −
5-10\% slowdown, in some languages even 100\% for tight+ −
loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If+ −
the compiler can omit the underflow check, for example, then+ −
this can potentially drastically speed up the generated code. + −
+ −
What do programs in our simple programming language look like?+ −
The following grammar gives a first specification:+ −
+ −
\begin{multicols}{2}+ −
\begin{plstx}[rhs style=,one per line,left margin=9mm]+ −
: \meta{Stmt} ::= \meta{label} \texttt{:}+ −
| \meta{var} \texttt{:=} \meta{Exp}+ −
| \texttt{jmp?} \meta{Exp} \meta{label}+ −
| \texttt{goto} \meta{label}\\+ −
: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\+ −
\end{plstx}+ −
\columnbreak+ −
\begin{plstx}[rhs style=,one per line]+ −
: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}+ −
| \meta{Exp} \texttt{*} \meta{Exp}+ −
| \meta{Exp} \texttt{=} \meta{Exp} + −
| \meta{num}+ −
| \meta{var}\\+ −
\end{plstx}+ −
\end{multicols}+ −
+ −
\noindent I assume you are familiar with such+ −
grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}+ −
There are three main syntactic categories: \emph{statments}+ −
and \emph{expressions} as well as \emph{programs}, which are+ −
sequences of statements. Statements are either labels,+ −
variable assignments, conditional jumps (\pcode{jmp?}) and+ −
unconditional jumps (\pcode{goto}). Labels are just strings,+ −
which can be used as the target of a jump. We assume that in+ −
every program the labels are unique---if there is a clash,+ −
then we do not know where to jump to. The conditional jumps+ −
and variable assignments involve (arithmetic) expressions.+ −
Expressions are either numbers, variables or compound+ −
expressions built up from \pcode{+}, \pcode{*} and \emph{=}+ −
(for simplicity reasons we do not consider any other+ −
operations). We assume we have negative and positive numbers,+ −
\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},+ −
\pcode{2}\ldots{} An example program that calculates the+ −
factorial of 5 is in our programming language as follows:+ −
+ −
\begin{lstlisting}[language={},xleftmargin=10mm]+ −
a := 1+ −
n := 5 + −
top: + −
jmp? n = 0 done + −
a := a * n + −
n := n + -1 + −
goto top + −
done:+ −
\end{lstlisting}+ −
+ −
\noindent As can be seen each line of the program contains a+ −
statement. In the first two lines we assign values to the+ −
variables \pcode{a} and \pcode{n}. In line 4 we test whether+ −
\pcode{n} is zero, in which case we jump to the end of the+ −
program marked with the label \pcode{done}. If \pcode{n} is+ −
not zero, we multiply the content of \pcode{a} by \pcode{n},+ −
decrease \pcode{n} by one and jump back to the beginning of+ −
the loop, marked with the label \pcode{top}. Another program+ −
in our language is shown in Figure~\ref{fib}. I let you think+ −
what it calculates.+ −
+ −
\begin{figure}[t]+ −
\begin{lstlisting}[numbers=none,+ −
language={},xleftmargin=10mm]+ −
n := 6+ −
m1 := 0+ −
m2 := 1+ −
loop: + −
jmp? n = 0 done+ −
tmp := m2+ −
m2 := m1 + m2+ −
m1 := tmp+ −
n := n + -1+ −
goto top+ −
done:+ −
\end{lstlisting}+ −
\caption{A mystery program in our idealised programming language.+ −
Try to find out what it calculates! \label{fib}}+ −
\end{figure}+ −
+ −
Even if our language is rather small, it is still Turing+ −
complete---meaning quite powerful. However, discussing this+ −
fact in more detail would lead us too far astray. Clearly, our+ −
programming is rather low-level and not very comfortable for+ −
writing programs. It is inspired by real machine code, which+ −
is the code that is executed by a CPU. So a more interesting+ −
question is what is missing in comparison with real machine+ −
code? Well, not much\ldots{}in principle. Real machine code,+ −
of course, contains many more arithmetic instructions (not+ −
just addition and multiplication) and many more conditional+ −
jumps. We could add these to our language if we wanted, but+ −
complexity is really beside the point here. Furthermore, real+ −
machine code has many instructions for manipulating memory. We+ −
do not have this at all. This is actually a more serious+ −
simplification because we assume numbers to be arbitrary small+ −
or large, which is not the case with real machine code. In+ −
real machine code, basic number formats have a range and might+ −
over-flow or under-flow this range. Also the number of+ −
variables in our programs is potentially unlimited, while+ −
memory in an actual computer, of course, is always limited. To+ −
sum up, our language might look ridiculously simple, but it is+ −
not too far removed from practically relevant issues.+ −
+ −
+ −
\subsubsection*{An Interpreter}+ −
+ −
Designing a language is like playing god: you can say what+ −
names for variables you allow; what programs should look like;+ −
most importantly you can decide what each part of the program+ −
should mean and do. While our language is quite simple and the+ −
meaning of statements, for example, is rather straightforward,+ −
there are still places where we need to make real choices. For+ −
example consider the conditional jumps, say the one in the+ −
factorial program: + −
+ −
\begin{center}+ −
\code{jmp? n = 0 done}+ −
\end{center}+ −
+ −
\noindent How should they work? We could introduce Booleans+ −
(\pcode{true} and \pcode{false}) and then jump only when the+ −
condition is \pcode{true}. However, since we have numbers in+ −
our language anyway, why not just encoding \pcode{true} as+ −
one, and \pcode{false} as zero? In this way we can dispense+ −
with the additional concept of Booleans.+ −
+ −
I hope the above discussion makes it already clear we need to+ −
be a bit more careful with our programs. Below we shall+ −
describe an interpreter for our programming language, which+ −
specifies exactly how programs are supposed to be+ −
run\ldots{}at least we will specify this for all \emph{good}+ −
programs. By good programs I mean where all variables are+ −
initialised, for example. Our interpreter will just crash if+ −
it cannot find out the value for a variable when it is not+ −
initialised. Also, we will assume that labels in good programs+ −
are unique, otherwise our programs will calculate ``garbage''.+ −
+ −
First we will pre-process our programs. This will simplify the+ −
definition of our interpreter later on. By pre-processing our+ −
programs we will transform programs into \emph{snippets}. A+ −
snippet is a label and all the code that comes after the+ −
label. This essentially means a snippet is a \emph{map} from+ −
labels to code.\footnote{Be sure you know what maps are. In a+ −
programming context they are often represented as association+ −
list where some data is associated with a key.} + −
+ −
Given that programs are sequences (or lists) of statements, we+ −
can easily calculate the snippets by just traversing this+ −
sequence and recursively generating the map. Suppose a program+ −
is of the general form+ −
+ −
\begin{center}+ −
$stmt_1\;stmt_2\; \ldots\; stmt_n$+ −
\end{center}+ −
+ −
\noindent The idea is to go through this sequence of+ −
statements one by one and check whether they are a label. If+ −
yes, we add the label and the remaining statements to our map.+ −
If no, we just continue with the next statement. To come up+ −
with a recursive definition for generating snippets, let us+ −
write $[]$ for the program that does not contain any+ −
statement. Consider the following definition:+ −
+ −
\begin{center}+ −
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}+ −
$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\+ −
$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &+ −
$\begin{cases}+ −
\textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\+ −
\textit{snippets}(rest) & \text{otherwise}+ −
\end{cases}$ + −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent In the first clause we just return the empty map for+ −
the program that does not contain any statement. In the second+ −
clause, we have to distinguish the case where the first+ −
statement is a label or not. As said before, if not, then we+ −
just ``throw away'' the label and recursively calculate the+ −
snippets for the rest of the program (the otherwise clause).+ −
If yes, then we do the same, but also update the map so that+ −
$label$ now points to the rest of the statements (the if+ −
clause). This looks all realtively straightforward, but there+ −
is one small problem we need to overcome: our two programs+ −
shown so far have no label as \emph{entry point}---that is+ −
where the execution is supposed to start. We usually assume+ −
that the first statement will be run first. To make this the+ −
default, it is convenient if we add to all our programs a+ −
default label, say \pcode{""} (the empty string). With this we+ −
can define our pre-processing of programs as follows+ −
+ −
\begin{center}+ −
$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ + −
\end{center} + −
+ −
\noindent Let us see how this pans out in practice. If we+ −
pre-process the factorial program shown earlier, we obtain the + −
following map:+ −
+ −
\begin{center}+ −
\small+ −
\lstset{numbers=none,+ −
language={},+ −
xleftmargin=0mm,+ −
aboveskip=0.5mm,+ −
belowskip=0.5mm,+ −
frame=single,+ −
framerule=0mm,+ −
framesep=0mm}+ −
\begin{tikzpicture}[node distance=0mm]+ −
\node (A1) [draw]{\pcode{""}};+ −
\node (B1) [right=of A1] {$\mapsto$};+ −
\node (C1) [right=of B1,anchor=north west] {+ −
\begin{minipage}{3.5cm}+ −
\begin{lstlisting}[language={},xleftmargin=0mm]+ −
a := 1+ −
n := 5 + −
top: + −
jmp? n = 0 done + −
a := a * n + −
n := n + -1 + −
goto top + −
done:+ −
\end{lstlisting}+ −
\end{minipage}};+ −
\node (A2) [right=of C1.north east,draw] {\pcode{top}};+ −
\node (B2) [right=of A2] {$\mapsto$};+ −
\node (C2) [right=of B2, anchor=north west] {+ −
\begin{minipage}{3.5cm}+ −
\begin{lstlisting}[language={},xleftmargin=0mm]+ −
jmp? n = 0 done + −
a := a * n + −
n := n + -1 + −
goto top + −
done:+ −
\end{lstlisting} + −
\end{minipage}}; + −
\node (A3) [right=of C2.north east,draw] {\pcode{done}};+ −
\node (B3) [right=of A3] {$\mapsto$};+ −
\node (C3) [right=of B3] {$[]$};+ −
\end{tikzpicture}+ −
\end{center}+ −
+ −
\noindent I highlighted the \emph{keys} in this map. Since+ −
there are three labels in the factorial program (remember we+ −
added \pcode{""}), there are three keys. When running the+ −
factorial program and encountering a jump, then we only have+ −
to consult this snippets-map in order to find out what the+ −
next statements should be.+ −
+ −
We should now be in the position to define how a program+ −
should be run. In the context of interpreters, this+ −
``running'' of programs is often called \emph{evaluation}. Let+ −
us start with the definition of how expressions are evaluated.+ −
A first attempt might be the following recursive function:+ −
+ −
\begin{center}+ −
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}+ −
$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$+ −
\qquad\text{if}\; \texttt{n}\; \text{is a number like} + −
\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},+ −
\pcode{2}\ldots{}\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, + −
\texttt{e}_\texttt{2})$ & + −
$\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + + −
\textit{eval\_exp}(\texttt{e}_\texttt{2})$\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, + −
\texttt{e}_\texttt{2})$ & + −
$\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * + −
\textit{eval\_exp}(\texttt{e}_\texttt{2})$\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, + −
\texttt{e}_\texttt{2})$ & + −
$\dn$ & $\begin{cases}+ −
1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =+ −
\textit{eval\_exp}(\texttt{e}_\texttt{2})\\+ −
0 & \text{otherwise}+ −
\end{cases}$ + −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent While this should look all rather intuitive`, still+ −
be very careful. There is a subtlety which can be easily+ −
overlooked: The function \textit{eval\_exp} takes an+ −
expression of our programming language as input and returns a+ −
number as output. Therefore whenever we encounter a number in+ −
our program, we just return this number---this is defined in+ −
the first clause above. Whenever we encounter an addition,+ −
well then we first evaluate the left-hand side+ −
$\texttt{e}_\texttt{1}$ of the addition (this will give a+ −
number), then evaluate the right-hand side+ −
$\texttt{e}_\texttt{2}$ (this gives another number), and+ −
finally add both numbers together. Here is the subtlety: on+ −
the left-hand side of the $\dn$ we have a \texttt{+} (in the+ −
teletype font) which is the symbol for addition in our+ −
programming language. On the right-hand side we have $+$ which+ −
stands for the arithmetic operation from ``mathematics'' of+ −
adding two numbers. These are rather different concepts---one+ −
is a symbol (which we made up), and the other a mathematical+ −
operation. When we will have a look at an actual+ −
implementation of our interpreter, the mathematical operation+ −
will be the function for addition from the programming+ −
language in which we \underline{\smash{implement}} our+ −
interpreter. While the \texttt{+} is just a symbol that is+ −
used in our programming language. Clearly we have to use a+ −
symbol that is a good mnemonic for addition, otherwise we will+ −
confuse the programmers working with our language. Therefore+ −
we use $\texttt{+}$. A similar choice is made for times in the+ −
third clause and equality in the fourth clause. + −
+ −
Remember I wrote at the beginning of this section about being+ −
god when designing a programming language. You can see this+ −
here: we need to give meaning to symbols. At the moment+ −
however, we are a poor fallible god. Look again at the grammar+ −
of our programming language and our definition. Clearly, an+ −
expression can contain variables. So far we have ignored them.+ −
What should our interpreter do with variables? They might+ −
change during the evaluation of a program. For example the+ −
variable \pcode{n} in the factorial program counts down from 5+ −
down to 0. How can we improve our definition above to give also+ −
an answer whenever our interpreter encounters a variable in an+ −
expression? The solution is to add an \emph{environment},+ −
written $env$, as an additional input argument to our+ −
\textit{eval\_exp} function.+ −
+ −
\begin{center}+ −
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}+ −
$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$+ −
\qquad\text{if}\; \texttt{n}\; \text{is a number like} + −
\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},+ −
\pcode{2}\ldots{}\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, + −
\texttt{e}_\texttt{2}, env)$ & + −
$\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + + −
\textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, + −
\texttt{e}_\texttt{2}, env)$ & + −
$\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * + −
\textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\+ −
$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, + −
\texttt{e}_\texttt{2}, env)$ & + −
$\dn$ & $\begin{cases}+ −
1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =+ −
\textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\+ −
0 & \text{otherwise}+ −
\end{cases}$\\+ −
$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ + −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent This environment $env$ also acts like a map: it+ −
associates variables with their current values. For example+ −
after evaluating the first two lines in our factorial+ −
program, such an environment might look as follows+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$\fbox{\texttt{a}} \mapsto \texttt{1}$ &+ −
$\fbox{\texttt{n}} \mapsto \texttt{5}$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent Again I highlighted the keys. In the clause for+ −
variables, we can therefore consult this environment and+ −
return whatever value is currently stored for this variable.+ −
This is written $env(x)$---meaning we query this map with $x$+ −
we obtain the corresponding number. You might ask what happens+ −
if an environment does not contain any value for, say, the+ −
variable $x$? Well, then our interpreter just ``crashes'', or+ −
more precisely will raise an exception. In this case we have a+ −
``bad'' program that tried to use a variable before it was+ −
initialised. The programmer should not have done this. In a+ −
real programming language, we would of course try a bit harder+ −
and for example give an error at compile time, or design our+ −
language in such a way that this can never happen. With the+ −
second version of \textit{eval\_exp} we completed our+ −
definition for evaluating expressions.+ −
+ −
Next comes the evaluation function for statements. We define+ −
this function in such a way that we recursively evaluate a+ −
whole sequence of statements. Assume a program $p$ (you want+ −
to evaluate) and its pre-processed snippets $sn$. Then we can+ −
define:+ −
+ −
\begin{center}+ −
\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}+ −
$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\+ −
$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &+ −
$\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ + −
$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & + −
$\dn$ & $\textit{eval\_stmts}(rest, + −
env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ + −
$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ + −
& $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\+ −
$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ + −
& $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}+ −
\textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ + −
& \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\+ −
\textit{eval\_stmts}(rest, env) & \text{otherwise}\\+ −
\end{array}\end{cases}$ + −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent The first clause is for the empty program, or when+ −
we arrived at the end of the program. In this case we just+ −
return the environment. The second clause is for when the next+ −
statement is a label. That means the program is of the form+ −
$\texttt{label:}\;rest$ where the label is some string and+ −
$rest$ stands for all following statements. This case is easy,+ −
because our evaluation function just discards the label and+ −
evaluates the rest of the statements (we already extracted all+ −
important information about labels when we pre-processed our+ −
programs and generated the snippets). The third clause is for+ −
variable assignments. Again we just evaluate the rest for the+ −
statements, but with a modified environment---since the+ −
variable assignment is supposed to introduce a new variable or+ −
change the current value of a variable. For this modification+ −
of the environment we first evaluate the expression+ −
$\texttt{e}$ using our evaluation function for expressions.+ −
This gives us a number. Then we assign this number to the+ −
variable $x$ in the environment. This modified environment+ −
will be used to evaluate the rest of the program. The fourth+ −
clause is for the unconditional jump to a label, called+ −
\texttt{lbl}. That means we have to look up in our snippets+ −
map $sn$ what are the next statements for this label.+ −
Therefore we will continue with evaluating, not with the rest+ −
of the program, but with the statements stored in the+ −
snippets-map under the label $\texttt{lbl}$. The fifth clause+ −
for conditional jumps is similar, but to decide whether to+ −
make the jump we first need to evaluate the expression+ −
$\texttt{e}$ in order to find out whether it is $1$. If yes,+ −
we jump, otherwise we just continue with evaluating the rest+ −
of the program.+ −
+ −
Our interpreter works in two stages: First we pre-process our+ −
program generating the snippets map $sn$, say. Second we call+ −
the evaluation function with the default entry point and the+ −
empty environment:+ −
+ −
\begin{center}+ −
$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$+ −
\end{center}+ −
+ −
\noindent It is interesting to note that our interpreter when+ −
it comes to the end of the program returns an environment. Our+ −
programming language does not contain any constructs for input+ −
and output. Therefore this environment is the only effect we+ −
can observe when running the program (apart from that our+ −
interpreter might need some time before finishing the+ −
evaluation of the program and the CPU getting hot). Evaluating + −
the factorial program with our interpreter we receive as+ −
``answer''-environment+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$\fbox{\texttt{a}} \mapsto \texttt{120}$ &+ −
$\fbox{\texttt{n}} \mapsto \texttt{0}$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent While the discussion above should have illustrated+ −
the ideas, in order to do some serious calculations, we clearly+ −
need to implement the interpreter.+ −
+ −
+ −
\subsubsection*{Scala Code for the Interpreter}+ −
+ −
Functional programming languages are very convenient for+ −
implementations of interpreters. A good choice for a+ −
functional programming language is Scala, a programming+ −
language that combines functional and object-oriented+ −
pro\-gramming-styles. It has received in the last five years or+ −
so quite a bit of attention. One reason for this attention is+ −
that, like the Java programming language, Scala compiles to+ −
the Java Virtual Machine (JVM) and therefore Scala programs+ −
can run under MacOSX, Linux and Windows.\footnote{There are+ −
also experimental backends for Android and JavaScript.} Unlike+ −
Java, however, Scala often allows programmers to write very+ −
concise and elegant code. Some therefore say Scala is the much+ −
better Java. A number of companies, The Guardian, Twitter,+ −
Coursera, FourSquare, LinkedIn to name a few, either use Scala+ −
exclusively in production code, or at least to some+ −
substantial degree. If you want to try out Scala yourself, the+ −
Scala compiler can be downloaded from+ −
+ −
\begin{quote}+ −
\url{http://www.scala-lang.org}+ −
\end{quote}+ −
+ −
Let us have a look at the Scala code shown in+ −
Figure~\ref{code}. It shows the entire code for the+ −
interpreter, though the implementation is admittedly no+ −
frills.+ −
+ −
\begin{figure}[t]+ −
\small+ −
\lstinputlisting[language=Scala]{../progs/inter.scala}+ −
\caption{The entire code of the interpreter for our+ −
idealised programming language.\label{code}}+ −
\end{figure}+ −
+ −
+ −
\subsubsection*{Static Analysis}+ −
+ −
Finally we can come back to our original problem, namely + −
finding out what the signs of variables are + −
+ −
\begin{center}+ −
+ −
+ −
\end{center}+ −
+ −
\end{document}+ −
+ −
%% list of static analysers for C+ −
http://spinroot.com/static/index.html+ −
+ −
%% NASA coding rules for C+ −
http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −