|
1 \documentclass{article} |
|
2 \usepackage{../style} |
|
3 \usepackage{../langs} |
|
4 \usepackage{../graphics} |
|
5 \usepackage{../grammar} |
|
6 \usepackage{multicol} |
|
7 |
|
8 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}} |
|
9 |
|
10 \begin{document} |
|
11 \fnote{\copyright{} Christian Urban, King's College London, 2014} |
|
12 |
|
13 %% why are shuttle flights so good with software |
|
14 %%http://www.fastcompany.com/28121/they-write-right-stuff |
|
15 |
|
16 \section*{Handout 9 (Static Analysis)} |
|
17 |
|
18 If we want to improve the safety and security of our programs, |
|
19 we need a more principled approach to programming. Testing is |
|
20 good, but as Edsger Dijkstra famously wrote: |
|
21 |
|
22 \begin{quote}\it |
|
23 ``Program testing can be a very effective way to show the |
|
24 \underline{\smash{presence}} of bugs, but it is hopelessly |
|
25 inadequate for showing their \underline{\smash{absence}}.'' |
|
26 \end{quote} |
|
27 |
|
28 \noindent While such a more principled approach has been the |
|
29 subject of intense study for a long, long time, only in the |
|
30 past few years some impressive results have been achieved. One |
|
31 is the complete formalisation and (mathematical) verification |
|
32 of a microkernel operating system called seL4. |
|
33 |
|
34 \begin{center} |
|
35 \url{http://sel4.systems} |
|
36 \end{center} |
|
37 |
|
38 \noindent In 2011 this work was included in the MIT Technology |
|
39 Review in the annual list of the world’s ten most important |
|
40 emerging |
|
41 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} |
|
42 While this work is impressive, its technical details are too |
|
43 enormous for an explanation here. Therefore let us look at |
|
44 something much simpler, namely finding out properties about |
|
45 programs using \emph{static analysis}. |
|
46 |
|
47 Static analysis is a technique that checks properties of a |
|
48 program without actually running the program. This should |
|
49 raise alarm bells with you---because almost all interesting |
|
50 properties about programs are equivalent to the halting |
|
51 problem, which we know is undecidable. For example estimating |
|
52 the memory consumption of programs is in general undecidable, |
|
53 just like the halting problem. Static analysis circumvents |
|
54 this undecidability-problem by essentially allowing answers |
|
55 \emph{yes} and \emph{no}, but also \emph{don't know}. With |
|
56 this ``trick'' even the halting problem becomes |
|
57 decidable\ldots{}for example we could always say \emph{don't |
|
58 know}. Of course this would be silly. The point is that we |
|
59 should be striving for a method that answers as often as |
|
60 possible either \emph{yes} or \emph{no}---just in cases when |
|
61 it is too difficult we fall back on the |
|
62 \emph{don't-know}-answer. This might sound all like abstract |
|
63 nonsense. Therefore let us look at a concrete example. |
|
64 |
|
65 |
|
66 \subsubsection*{A Simple, Idealised Programming Language} |
|
67 |
|
68 Our starting point is a small, idealised programming language. |
|
69 It is idealised because we cut several corners in comparison |
|
70 with real programming languages. The language we will study |
|
71 contains, amongst other things, variables holding integers. |
|
72 Using static analysis, we want to find out what the sign of |
|
73 these integers (positive or negative) will be when the program |
|
74 runs. This sign-analysis seems like a very simple problem. But |
|
75 even such simple problems, if approached naively, are in |
|
76 general undecidable, just like Turing's halting problem. I let |
|
77 you think why? |
|
78 |
|
79 |
|
80 Is sign-analysis of variables an interesting problem? Well, |
|
81 yes---if a compiler can find out that for example a variable |
|
82 will never be negative and this variable is used as an index |
|
83 for an array, then the compiler does not need to generate code |
|
84 for an underflow-check. Remember some languages are immune to |
|
85 buffer-overflow attacks, but they need to add underflow and |
|
86 overflow checks everywhere. According to John Regher, an |
|
87 expert in the field of compilers, overflow checks can cause |
|
88 5-10\% slowdown, in some languages even 100\% for tight |
|
89 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If |
|
90 the compiler can omit the underflow check, for example, then |
|
91 this can potentially drastically speed up the generated code. |
|
92 |
|
93 What do programs in our simple programming language look like? |
|
94 The following grammar gives a first specification: |
|
95 |
|
96 \begin{multicols}{2} |
|
97 \begin{plstx}[rhs style=,one per line,left margin=9mm] |
|
98 : \meta{Stmt} ::= \meta{label} \texttt{:} |
|
99 | \meta{var} \texttt{:=} \meta{Exp} |
|
100 | \texttt{jmp?} \meta{Exp} \meta{label} |
|
101 | \texttt{goto} \meta{label}\\ |
|
102 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ |
|
103 \end{plstx} |
|
104 \columnbreak |
|
105 \begin{plstx}[rhs style=,one per line] |
|
106 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} |
|
107 | \meta{Exp} \texttt{*} \meta{Exp} |
|
108 | \meta{Exp} \texttt{=} \meta{Exp} |
|
109 | \meta{num} |
|
110 | \meta{var}\\ |
|
111 \end{plstx} |
|
112 \end{multicols} |
|
113 |
|
114 \noindent I assume you are familiar with such |
|
115 grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}} |
|
116 There are three main syntactic categories: \emph{statments} |
|
117 and \emph{expressions} as well as \emph{programs}, which are |
|
118 sequences of statements. Statements are either labels, |
|
119 variable assignments, conditional jumps (\pcode{jmp?}) and |
|
120 unconditional jumps (\pcode{goto}). Labels are just strings, |
|
121 which can be used as the target of a jump. We assume that in |
|
122 every program the labels are unique---if there is a clash, |
|
123 then we do not know where to jump to. The conditional jumps |
|
124 and variable assignments involve (arithmetic) expressions. |
|
125 Expressions are either numbers, variables or compound |
|
126 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
|
127 (for simplicity reasons we do not consider any other |
|
128 operations). We assume we have negative and positive numbers, |
|
129 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
|
130 \pcode{2}\ldots{} An example program that calculates the |
|
131 factorial of 5 is in our programming language as follows: |
|
132 |
|
133 \begin{lstlisting}[language={},xleftmargin=10mm] |
|
134 a := 1 |
|
135 n := 5 |
|
136 top: |
|
137 jmp? n = 0 done |
|
138 a := a * n |
|
139 n := n + -1 |
|
140 goto top |
|
141 done: |
|
142 \end{lstlisting} |
|
143 |
|
144 \noindent As can be seen each line of the program contains a |
|
145 statement. In the first two lines we assign values to the |
|
146 variables \pcode{a} and \pcode{n}. In line 4 we test whether |
|
147 \pcode{n} is zero, in which case we jump to the end of the |
|
148 program marked with the label \pcode{done}. If \pcode{n} is |
|
149 not zero, we multiply the content of \pcode{a} by \pcode{n}, |
|
150 decrease \pcode{n} by one and jump back to the beginning of |
|
151 the loop, marked with the label \pcode{top}. Another program |
|
152 in our language is shown in Figure~\ref{fib}. I let you think |
|
153 what it calculates. |
|
154 |
|
155 \begin{figure}[t] |
|
156 \begin{lstlisting}[numbers=none, |
|
157 language={},xleftmargin=10mm] |
|
158 n := 6 |
|
159 m1 := 0 |
|
160 m2 := 1 |
|
161 loop: |
|
162 jmp? n = 0 done |
|
163 tmp := m2 |
|
164 m2 := m1 + m2 |
|
165 m1 := tmp |
|
166 n := n + -1 |
|
167 goto top |
|
168 done: |
|
169 \end{lstlisting} |
|
170 \caption{A mystery program in our idealised programming language. |
|
171 Try to find out what it calculates! \label{fib}} |
|
172 \end{figure} |
|
173 |
|
174 Even if our language is rather small, it is still Turing |
|
175 complete---meaning quite powerful. However, discussing this |
|
176 fact in more detail would lead us too far astray. Clearly, our |
|
177 programming is rather low-level and not very comfortable for |
|
178 writing programs. It is inspired by real machine code, which |
|
179 is the code that is executed by a CPU. So a more interesting |
|
180 question is what is missing in comparison with real machine |
|
181 code? Well, not much\ldots{}in principle. Real machine code, |
|
182 of course, contains many more arithmetic instructions (not |
|
183 just addition and multiplication) and many more conditional |
|
184 jumps. We could add these to our language if we wanted, but |
|
185 complexity is really beside the point here. Furthermore, real |
|
186 machine code has many instructions for manipulating memory. We |
|
187 do not have this at all. This is actually a more serious |
|
188 simplification because we assume numbers to be arbitrary small |
|
189 or large, which is not the case with real machine code. In |
|
190 real machine code, basic number formats have a range and might |
|
191 over-flow or under-flow this range. Also the number of |
|
192 variables in our programs is potentially unlimited, while |
|
193 memory in an actual computer, of course, is always limited. To |
|
194 sum up, our language might look ridiculously simple, but it is |
|
195 not too far removed from practically relevant issues. |
|
196 |
|
197 |
|
198 \subsubsection*{An Interpreter} |
|
199 |
|
200 Designing a language is like playing god: you can say what |
|
201 names for variables you allow; what programs should look like; |
|
202 most importantly you can decide what each part of the program |
|
203 should mean and do. While our language is quite simple and the |
|
204 meaning of statements, for example, is rather straightforward, |
|
205 there are still places where we need to make real choices. For |
|
206 example consider the conditional jumps, say the one in the |
|
207 factorial program: |
|
208 |
|
209 \begin{center} |
|
210 \code{jmp? n = 0 done} |
|
211 \end{center} |
|
212 |
|
213 \noindent How should they work? We could introduce Booleans |
|
214 (\pcode{true} and \pcode{false}) and then jump only when the |
|
215 condition is \pcode{true}. However, since we have numbers in |
|
216 our language anyway, why not just encoding \pcode{true} as |
|
217 one, and \pcode{false} as zero? In this way we can dispense |
|
218 with the additional concept of Booleans. |
|
219 |
|
220 I hope the above discussion makes it already clear we need to |
|
221 be a bit more careful with our programs. Below we shall |
|
222 describe an interpreter for our programming language, which |
|
223 specifies exactly how programs are supposed to be |
|
224 run\ldots{}at least we will specify this for all \emph{good} |
|
225 programs. By good programs I mean where all variables are |
|
226 initialised, for example. Our interpreter will just crash if |
|
227 it cannot find out the value for a variable when it is not |
|
228 initialised. Also, we will assume that labels in good programs |
|
229 are unique, otherwise our programs will calculate ``garbage''. |
|
230 |
|
231 First we will pre-process our programs. This will simplify the |
|
232 definition of our interpreter later on. By pre-processing our |
|
233 programs we will transform programs into \emph{snippets}. A |
|
234 snippet is a label and all the code that comes after the |
|
235 label. This essentially means a snippet is a \emph{map} from |
|
236 labels to code.\footnote{Be sure you know what maps are. In a |
|
237 programming context they are often represented as association |
|
238 list where some data is associated with a key.} |
|
239 |
|
240 Given that programs are sequences (or lists) of statements, we |
|
241 can easily calculate the snippets by just traversing this |
|
242 sequence and recursively generating the map. Suppose a program |
|
243 is of the general form |
|
244 |
|
245 \begin{center} |
|
246 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
|
247 \end{center} |
|
248 |
|
249 \noindent The idea is to go through this sequence of |
|
250 statements one by one and check whether they are a label. If |
|
251 yes, we add the label and the remaining statements to our map. |
|
252 If no, we just continue with the next statement. To come up |
|
253 with a recursive definition for generating snippets, let us |
|
254 write $[]$ for the program that does not contain any |
|
255 statement. Consider the following definition: |
|
256 |
|
257 \begin{center} |
|
258 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
259 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ |
|
260 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ & |
|
261 $\begin{cases} |
|
262 \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ |
|
263 \textit{snippets}(rest) & \text{otherwise} |
|
264 \end{cases}$ |
|
265 \end{tabular} |
|
266 \end{center} |
|
267 |
|
268 \noindent In the first clause we just return the empty map for |
|
269 the program that does not contain any statement. In the second |
|
270 clause, we have to distinguish the case where the first |
|
271 statement is a label or not. As said before, if not, then we |
|
272 just ``throw away'' the label and recursively calculate the |
|
273 snippets for the rest of the program (the otherwise clause). |
|
274 If yes, then we do the same, but also update the map so that |
|
275 $label$ now points to the rest of the statements (the if |
|
276 clause). This looks all realtively straightforward, but there |
|
277 is one small problem we need to overcome: our two programs |
|
278 shown so far have no label as \emph{entry point}---that is |
|
279 where the execution is supposed to start. We usually assume |
|
280 that the first statement will be run first. To make this the |
|
281 default, it is convenient if we add to all our programs a |
|
282 default label, say \pcode{""} (the empty string). With this we |
|
283 can define our pre-processing of programs as follows |
|
284 |
|
285 \begin{center} |
|
286 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
|
287 \end{center} |
|
288 |
|
289 \noindent Let us see how this pans out in practice. If we |
|
290 pre-process the factorial program shown earlier, we obtain the |
|
291 following map: |
|
292 |
|
293 \begin{center} |
|
294 \small |
|
295 \lstset{numbers=none, |
|
296 language={}, |
|
297 xleftmargin=0mm, |
|
298 aboveskip=0.5mm, |
|
299 belowskip=0.5mm, |
|
300 frame=single, |
|
301 framerule=0mm, |
|
302 framesep=0mm} |
|
303 \begin{tikzpicture}[node distance=0mm] |
|
304 \node (A1) [draw]{\pcode{""}}; |
|
305 \node (B1) [right=of A1] {$\mapsto$}; |
|
306 \node (C1) [right=of B1,anchor=north west] { |
|
307 \begin{minipage}{3.5cm} |
|
308 \begin{lstlisting}[language={},xleftmargin=0mm] |
|
309 a := 1 |
|
310 n := 5 |
|
311 top: |
|
312 jmp? n = 0 done |
|
313 a := a * n |
|
314 n := n + -1 |
|
315 goto top |
|
316 done: |
|
317 \end{lstlisting} |
|
318 \end{minipage}}; |
|
319 \node (A2) [right=of C1.north east,draw] {\pcode{top}}; |
|
320 \node (B2) [right=of A2] {$\mapsto$}; |
|
321 \node (C2) [right=of B2, anchor=north west] { |
|
322 \begin{minipage}{3.5cm} |
|
323 \begin{lstlisting}[language={},xleftmargin=0mm] |
|
324 jmp? n = 0 done |
|
325 a := a * n |
|
326 n := n + -1 |
|
327 goto top |
|
328 done: |
|
329 \end{lstlisting} |
|
330 \end{minipage}}; |
|
331 \node (A3) [right=of C2.north east,draw] {\pcode{done}}; |
|
332 \node (B3) [right=of A3] {$\mapsto$}; |
|
333 \node (C3) [right=of B3] {$[]$}; |
|
334 \end{tikzpicture} |
|
335 \end{center} |
|
336 |
|
337 \noindent I highlighted the \emph{keys} in this map. Since |
|
338 there are three labels in the factorial program (remember we |
|
339 added \pcode{""}), there are three keys. When running the |
|
340 factorial program and encountering a jump, then we only have |
|
341 to consult this snippets-map in order to find out what the |
|
342 next statements should be. |
|
343 |
|
344 We should now be in the position to define how a program |
|
345 should be run. In the context of interpreters, this |
|
346 ``running'' of programs is often called \emph{evaluation}. Let |
|
347 us start with the definition of how expressions are evaluated. |
|
348 A first attempt might be the following recursive function: |
|
349 |
|
350 \begin{center} |
|
351 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
352 $\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$ |
|
353 \qquad\text{if}\; \texttt{n}\; \text{is a number like} |
|
354 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
|
355 \pcode{2}\ldots{}\\ |
|
356 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, |
|
357 \texttt{e}_\texttt{2})$ & |
|
358 $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + |
|
359 \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ |
|
360 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, |
|
361 \texttt{e}_\texttt{2})$ & |
|
362 $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * |
|
363 \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\ |
|
364 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, |
|
365 \texttt{e}_\texttt{2})$ & |
|
366 $\dn$ & $\begin{cases} |
|
367 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) = |
|
368 \textit{eval\_exp}(\texttt{e}_\texttt{2})\\ |
|
369 0 & \text{otherwise} |
|
370 \end{cases}$ |
|
371 \end{tabular} |
|
372 \end{center} |
|
373 |
|
374 \noindent While this should look all rather intuitive`, still |
|
375 be very careful. There is a subtlety which can be easily |
|
376 overlooked: The function \textit{eval\_exp} takes an |
|
377 expression of our programming language as input and returns a |
|
378 number as output. Therefore whenever we encounter a number in |
|
379 our program, we just return this number---this is defined in |
|
380 the first clause above. Whenever we encounter an addition, |
|
381 well then we first evaluate the left-hand side |
|
382 $\texttt{e}_\texttt{1}$ of the addition (this will give a |
|
383 number), then evaluate the right-hand side |
|
384 $\texttt{e}_\texttt{2}$ (this gives another number), and |
|
385 finally add both numbers together. Here is the subtlety: on |
|
386 the left-hand side of the $\dn$ we have a \texttt{+} (in the |
|
387 teletype font) which is the symbol for addition in our |
|
388 programming language. On the right-hand side we have $+$ which |
|
389 stands for the arithmetic operation from ``mathematics'' of |
|
390 adding two numbers. These are rather different concepts---one |
|
391 is a symbol (which we made up), and the other a mathematical |
|
392 operation. When we will have a look at an actual |
|
393 implementation of our interpreter, the mathematical operation |
|
394 will be the function for addition from the programming |
|
395 language in which we \underline{\smash{implement}} our |
|
396 interpreter. While the \texttt{+} is just a symbol that is |
|
397 used in our programming language. Clearly we have to use a |
|
398 symbol that is a good mnemonic for addition, otherwise we will |
|
399 confuse the programmers working with our language. Therefore |
|
400 we use $\texttt{+}$. A similar choice is made for times in the |
|
401 third clause and equality in the fourth clause. |
|
402 |
|
403 Remember I wrote at the beginning of this section about being |
|
404 god when designing a programming language. You can see this |
|
405 here: we need to give meaning to symbols. At the moment |
|
406 however, we are a poor fallible god. Look again at the grammar |
|
407 of our programming language and our definition. Clearly, an |
|
408 expression can contain variables. So far we have ignored them. |
|
409 What should our interpreter do with variables? They might |
|
410 change during the evaluation of a program. For example the |
|
411 variable \pcode{n} in the factorial program counts down from 5 |
|
412 down to 0. How can we improve our definition above to give also |
|
413 an answer whenever our interpreter encounters a variable in an |
|
414 expression? The solution is to add an \emph{environment}, |
|
415 written $env$, as an additional input argument to our |
|
416 \textit{eval\_exp} function. |
|
417 |
|
418 \begin{center} |
|
419 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
420 $\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$ |
|
421 \qquad\text{if}\; \texttt{n}\; \text{is a number like} |
|
422 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
|
423 \pcode{2}\ldots{}\\ |
|
424 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, |
|
425 \texttt{e}_\texttt{2}, env)$ & |
|
426 $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + |
|
427 \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ |
|
428 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, |
|
429 \texttt{e}_\texttt{2}, env)$ & |
|
430 $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * |
|
431 \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\ |
|
432 $\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, |
|
433 \texttt{e}_\texttt{2}, env)$ & |
|
434 $\dn$ & $\begin{cases} |
|
435 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) = |
|
436 \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\ |
|
437 0 & \text{otherwise} |
|
438 \end{cases}$\\ |
|
439 $\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$ |
|
440 \end{tabular} |
|
441 \end{center} |
|
442 |
|
443 \noindent This environment $env$ also acts like a map: it |
|
444 associates variables with their current values. For example |
|
445 after evaluating the first two lines in our factorial |
|
446 program, such an environment might look as follows |
|
447 |
|
448 \begin{center} |
|
449 \begin{tabular}{ll} |
|
450 $\fbox{\texttt{a}} \mapsto \texttt{1}$ & |
|
451 $\fbox{\texttt{n}} \mapsto \texttt{5}$ |
|
452 \end{tabular} |
|
453 \end{center} |
|
454 |
|
455 \noindent Again I highlighted the keys. In the clause for |
|
456 variables, we can therefore consult this environment and |
|
457 return whatever value is currently stored for this variable. |
|
458 This is written $env(x)$---meaning we query this map with $x$ |
|
459 we obtain the corresponding number. You might ask what happens |
|
460 if an environment does not contain any value for, say, the |
|
461 variable $x$? Well, then our interpreter just ``crashes'', or |
|
462 more precisely will raise an exception. In this case we have a |
|
463 ``bad'' program that tried to use a variable before it was |
|
464 initialised. The programmer should not have done this. In a |
|
465 real programming language, we would of course try a bit harder |
|
466 and for example give an error at compile time, or design our |
|
467 language in such a way that this can never happen. With the |
|
468 second version of \textit{eval\_exp} we completed our |
|
469 definition for evaluating expressions. |
|
470 |
|
471 Next comes the evaluation function for statements. We define |
|
472 this function in such a way that we recursively evaluate a |
|
473 whole sequence of statements. Assume a program $p$ (you want |
|
474 to evaluate) and its pre-processed snippets $sn$. Then we can |
|
475 define: |
|
476 |
|
477 \begin{center} |
|
478 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} |
|
479 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\ |
|
480 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ & |
|
481 $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ |
|
482 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & |
|
483 $\dn$ & $\textit{eval\_stmts}(rest, |
|
484 env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\ |
|
485 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ |
|
486 & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\ |
|
487 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ |
|
488 & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}} |
|
489 \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ |
|
490 & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\ |
|
491 \textit{eval\_stmts}(rest, env) & \text{otherwise}\\ |
|
492 \end{array}\end{cases}$ |
|
493 \end{tabular} |
|
494 \end{center} |
|
495 |
|
496 \noindent The first clause is for the empty program, or when |
|
497 we arrived at the end of the program. In this case we just |
|
498 return the environment. The second clause is for when the next |
|
499 statement is a label. That means the program is of the form |
|
500 $\texttt{label:}\;rest$ where the label is some string and |
|
501 $rest$ stands for all following statements. This case is easy, |
|
502 because our evaluation function just discards the label and |
|
503 evaluates the rest of the statements (we already extracted all |
|
504 important information about labels when we pre-processed our |
|
505 programs and generated the snippets). The third clause is for |
|
506 variable assignments. Again we just evaluate the rest for the |
|
507 statements, but with a modified environment---since the |
|
508 variable assignment is supposed to introduce a new variable or |
|
509 change the current value of a variable. For this modification |
|
510 of the environment we first evaluate the expression |
|
511 $\texttt{e}$ using our evaluation function for expressions. |
|
512 This gives us a number. Then we assign this number to the |
|
513 variable $x$ in the environment. This modified environment |
|
514 will be used to evaluate the rest of the program. The fourth |
|
515 clause is for the unconditional jump to a label, called |
|
516 \texttt{lbl}. That means we have to look up in our snippets |
|
517 map $sn$ what are the next statements for this label. |
|
518 Therefore we will continue with evaluating, not with the rest |
|
519 of the program, but with the statements stored in the |
|
520 snippets-map under the label $\texttt{lbl}$. The fifth clause |
|
521 for conditional jumps is similar, but to decide whether to |
|
522 make the jump we first need to evaluate the expression |
|
523 $\texttt{e}$ in order to find out whether it is $1$. If yes, |
|
524 we jump, otherwise we just continue with evaluating the rest |
|
525 of the program. |
|
526 |
|
527 Our interpreter works in two stages: First we pre-process our |
|
528 program generating the snippets map $sn$, say. Second we call |
|
529 the evaluation function with the default entry point and the |
|
530 empty environment: |
|
531 |
|
532 \begin{center} |
|
533 $\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$ |
|
534 \end{center} |
|
535 |
|
536 \noindent It is interesting to note that our interpreter when |
|
537 it comes to the end of the program returns an environment. Our |
|
538 programming language does not contain any constructs for input |
|
539 and output. Therefore this environment is the only effect we |
|
540 can observe when running the program (apart from that our |
|
541 interpreter might need some time before finishing the |
|
542 evaluation of the program and the CPU getting hot). Evaluating |
|
543 the factorial program with our interpreter we receive as |
|
544 ``answer''-environment |
|
545 |
|
546 \begin{center} |
|
547 \begin{tabular}{ll} |
|
548 $\fbox{\texttt{a}} \mapsto \texttt{120}$ & |
|
549 $\fbox{\texttt{n}} \mapsto \texttt{0}$ |
|
550 \end{tabular} |
|
551 \end{center} |
|
552 |
|
553 \noindent While the discussion above should have illustrated |
|
554 the ideas, in order to do some serious calculations, we clearly |
|
555 need to implement the interpreter. |
|
556 |
|
557 |
|
558 \subsubsection*{Scala Code for the Interpreter} |
|
559 |
|
560 Functional programming languages are very convenient for |
|
561 implementations of interpreters. A good choice for a |
|
562 functional programming language is Scala, a programming |
|
563 language that combines functional and object-oriented |
|
564 pro\-gramming-styles. It has received in the last five years or |
|
565 so quite a bit of attention. One reason for this attention is |
|
566 that, like the Java programming language, Scala compiles to |
|
567 the Java Virtual Machine (JVM) and therefore Scala programs |
|
568 can run under MacOSX, Linux and Windows.\footnote{There are |
|
569 also experimental backends for Android and JavaScript.} Unlike |
|
570 Java, however, Scala often allows programmers to write very |
|
571 concise and elegant code. Some therefore say Scala is the much |
|
572 better Java. A number of companies, The Guardian, Twitter, |
|
573 Coursera, FourSquare, LinkedIn to name a few, either use Scala |
|
574 exclusively in production code, or at least to some |
|
575 substantial degree. If you want to try out Scala yourself, the |
|
576 Scala compiler can be downloaded from |
|
577 |
|
578 \begin{quote} |
|
579 \url{http://www.scala-lang.org} |
|
580 \end{quote} |
|
581 |
|
582 Let us have a look at the Scala code shown in |
|
583 Figure~\ref{code}. It shows the entire code for the |
|
584 interpreter, though the implementation is admittedly no |
|
585 frills. |
|
586 |
|
587 \begin{figure}[t] |
|
588 \small |
|
589 \lstinputlisting[language=Scala]{../progs/inter.scala} |
|
590 \caption{The entire code of the interpreter for our |
|
591 idealised programming language.\label{code}} |
|
592 \end{figure} |
|
593 |
|
594 |
|
595 \subsubsection*{Static Analysis} |
|
596 |
|
597 Finally we can come back to our original problem, namely |
|
598 finding out what the signs of variables are |
|
599 |
|
600 \begin{center} |
|
601 |
|
602 |
|
603 \end{center} |
|
604 |
|
605 \end{document} |
|
606 |
|
607 %% list of static analysers for C |
|
608 http://spinroot.com/static/index.html |
|
609 |
|
610 %% NASA coding rules for C |
|
611 http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf |
|
612 |
|
613 %%% Local Variables: |
|
614 %%% mode: latex |
|
615 %%% TeX-master: t |
|
616 %%% End: |