9 |
9 |
10 \section*{Handout 9 (Static Analysis)} |
10 \section*{Handout 9 (Static Analysis)} |
11 |
11 |
12 If we want to improve the safety and security of our programs, |
12 If we want to improve the safety and security of our programs, |
13 we need a more principled approach to programming. Testing is |
13 we need a more principled approach to programming. Testing is |
14 good, but as Dijkstra famously said: |
14 good, but as Dijkstra famously wrote: |
15 |
15 |
16 \begin{quote}\it |
16 \begin{quote}\it |
17 ``Program testing can be a very effective way to show the |
17 ``Program testing can be a very effective way to show the |
18 \underline{\smash{presence}} of bugs, but it is hopelessly |
18 \underline{\smash{presence}} of bugs, but it is hopelessly |
19 inadequate for showing their \underline{\smash{absence}}.'' |
19 inadequate for showing their \underline{\smash{absence}}.'' |
39 programs using \emph{static analysis}. |
39 programs using \emph{static analysis}. |
40 |
40 |
41 Static analysis is a technique that checks properties of a |
41 Static analysis is a technique that checks properties of a |
42 program without actually running the program. This should |
42 program without actually running the program. This should |
43 raise alarm bells with you---because almost all interesting |
43 raise alarm bells with you---because almost all interesting |
44 properties about programs are equivalent to the halting |
44 properties about programs are equivalent to the halting |
45 problem, which we know is undecidable. For example estimating |
45 problem, which we know is undecidable. For example estimating |
46 the memory consumption of programs is in general undecidable, |
46 the memory consumption of programs is in general undecidable, |
47 just like the halting problem. Static analysis circumvents |
47 just like the halting problem. Static analysis circumvents |
48 this undecidability-problem by essentially allowing answers |
48 this undecidability-problem by essentially allowing answers |
49 \emph{yes} and \emph{no}, but also \emph{don't know}. With |
49 \emph{yes} and \emph{no}, but also \emph{don't know}. With |
58 |
58 |
59 |
59 |
60 \subsubsection*{A Simple, Idealised Programming Language} |
60 \subsubsection*{A Simple, Idealised Programming Language} |
61 |
61 |
62 Our starting point is a small, idealised programming language. |
62 Our starting point is a small, idealised programming language. |
63 This language, amongst other things, contains variables |
63 It is idealised because we cut several corners in comparison |
64 holding integers. We want to find out what the sign of these |
64 with real programming languages. The language we will study |
65 integers will be when the program runs. This sign-analysis |
65 contains, amongst other things, variables holding integers. We |
|
66 want to find out what the sign of these integers (positive or |
|
67 negative) will be when the program runs. This sign-analysis |
66 seems like a very simple problem, but it will turn out even |
68 seems like a very simple problem, but it will turn out even |
67 such simple problems, if approached naively, are in general |
69 such simple problems, if approached naively, are in general |
68 undecidable, just like Turing's halting problem. I let you |
70 undecidable, just like Turing's halting problem. I let you |
69 think why? |
71 think why? |
70 |
72 |
72 Is sign-analysis of variables an interesting problem? Well, |
74 Is sign-analysis of variables an interesting problem? Well, |
73 yes---if a compiler can find out that for example a variable |
75 yes---if a compiler can find out that for example a variable |
74 will never be negative and this variable is used as an index |
76 will never be negative and this variable is used as an index |
75 for an array, then the compiler does not need to generate code |
77 for an array, then the compiler does not need to generate code |
76 for an underflow-test. Remember some languages are immune to |
78 for an underflow-test. Remember some languages are immune to |
77 buffer-overflow attacks because they add underflow and |
79 buffer-overflow attacks, but they need to add underflow and |
78 overflow checks everywhere. If the compiler can omit the |
80 overflow checks everywhere. If the compiler can omit the |
79 underflow test, for example, then this can potentially |
81 underflow test, for example, then this can potentially |
80 drastically speed up the generated code. |
82 drastically speed up the generated code. |
81 |
83 |
82 What do programs in our programming language look like? The |
84 What do programs in our programming language look like? The |
133 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n} |
135 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n} |
134 is zero, in which case we jump to the end of the program |
136 is zero, in which case we jump to the end of the program |
135 marked with the label \pcode{done}. If \pcode{n} is not zero, |
137 marked with the label \pcode{done}. If \pcode{n} is not zero, |
136 we multiply the content of \pcode{a} by \pcode{n}, decrease |
138 we multiply the content of \pcode{a} by \pcode{n}, decrease |
137 \pcode{n} by one and jump back to the beginning of the loop, |
139 \pcode{n} by one and jump back to the beginning of the loop, |
138 marked with the label \pcode{top}. |
140 marked with the label \pcode{top}. Another program in our |
|
141 language is shown in Figure~\ref{fib}. I let you think what it |
|
142 calculates. |
139 |
143 |
140 \begin{figure}[t] |
144 \begin{figure}[t] |
141 \begin{lstlisting}[numbers=none, |
145 \begin{lstlisting}[numbers=none, |
142 language={},xleftmargin=10mm] |
146 language={},xleftmargin=10mm] |
143 n := 6 |
147 n := 6 |
150 m1 := tmp |
154 m1 := tmp |
151 n := n + -1 |
155 n := n + -1 |
152 goto top |
156 goto top |
153 done: |
157 done: |
154 \end{lstlisting} |
158 \end{lstlisting} |
155 \label{A mystery program in our idealised programming language.} |
159 \caption{A mystery program in our idealised programming language. |
|
160 Try to find out what it calculates! \label{fib}} |
156 \end{figure} |
161 \end{figure} |
157 |
162 |
158 Even if our language is rather small, it is still Turing |
163 Even if our language is rather small, it is still Turing |
159 complete---so rather powerful. However, discussing this more |
164 complete---meaning quite powerful. However, discussing this |
160 would lead us to far astray. |
165 fact in more detail would lead us too far astray. Clearly, our |
|
166 programming is rather low-level and not very comfortable for |
|
167 writing programs. It is inspired by machine code, which is the |
|
168 code that is actually executed by a CPU. So a more interesting |
|
169 question is what is missing in comparison with real machine |
|
170 code? Well, not much\ldots{}in principle. Real machine code, |
|
171 of course, contains many more arithmetic instructions (not |
|
172 just addition and multiplication) and many more conditional |
|
173 jumps. We could add these to our language if we wanted, but |
|
174 complexity is really beside the point here. Furthermore, real |
|
175 machine code has many instructions for manipulating memory. We |
|
176 do not have this at all. This is actually a more serious |
|
177 simplification because we assume numbers to be arbitrary |
|
178 precision, which is not the case with real machine code. In |
|
179 real code basic number formats have a range and might |
|
180 over-flow or under-flow from this range. Also the numbers of |
|
181 variables in our programs is unlimited, while memory, of |
|
182 course, is always limited somehow on any actual machine. To |
|
183 sum up, our language might look very simple, but it is not |
|
184 completely removed from practically relevant issues. |
161 |
185 |
162 What would be missing in comparison with real |
186 |
163 (low-level machine) code? Well, the numbers we assume to be |
187 \subsubsection*{An Interpreter} |
164 arbitrary precision, which is not the case in real code. There |
188 |
165 basic number formats have a rang and might over-run or |
189 Designing a language is like being god: you can say what |
166 under-run from this range. Our assumption about variables, |
190 each part of the program should mean. |
167 does not correspond to actual registers, which are only |
|
168 limited on real hardware. Obviously, real code has richer |
|
169 operations than just addition, multiplication and equality. |
|
170 But this are not really essential limitations of our simple |
|
171 examples. |
|
172 |
191 |
173 \end{document} |
192 \end{document} |
174 |
193 |
175 %%% Local Variables: |
194 %%% Local Variables: |
176 %%% mode: latex |
195 %%% mode: latex |