27 |
27 |
28 \begin{center} |
28 \begin{center} |
29 \url{http://sel4.systems} |
29 \url{http://sel4.systems} |
30 \end{center} |
30 \end{center} |
31 |
31 |
32 \noindent This work was in 2011 included in the MIT Technology |
32 \noindent In 2011 this work was included in the MIT Technology |
33 Review in the annual list of the world’s ten most important |
33 Review in the annual list of the world’s ten most important |
34 emerging |
34 emerging |
35 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} |
35 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}} |
36 While this work is impressive, its technical details are too |
36 While this work is impressive, its technical details are too |
37 enormous for an explanation here. Therefore let us look at |
37 enormous for an explanation here. Therefore let us look at |
38 something much simpler, namely finding out properties about |
38 something much simpler, namely finding out properties about |
39 programs using \emph{static analysis}. |
39 programs using \emph{static analysis}. |
40 |
40 |
41 Static analysis is one technique that checks properties of a |
41 Static analysis is a technique that checks properties of a |
42 program without actually running the program. This should |
42 program without actually running the program. This should |
43 raise alarm bells with you---because almost all interesting |
43 raise alarm bells with you---because almost all interesting |
44 properties about programs are equivalent to the halting |
44 properties about programs are equivalent to the halting |
45 problem, which we know is undecidable. For example estimating |
45 problem, which we know is undecidable. For example estimating |
46 the memory consumption of programs is in general undecidable, |
46 the memory consumption of programs is in general undecidable, |
58 |
58 |
59 |
59 |
60 \subsubsection*{A Simple, Idealised Programming Language} |
60 \subsubsection*{A Simple, Idealised Programming Language} |
61 |
61 |
62 Our starting point is a small, idealised programming language. |
62 Our starting point is a small, idealised programming language. |
63 This language contains variables holding integers. We want to |
63 This language, amongst other things, contains variables |
64 find out what the sign of these integers will be when the |
64 holding integers. We want to find out what the sign of these |
65 program runs. This seems like a very simple problem, but it |
65 integers will be when the program runs. This sign-analysis |
66 will turn out even such a simple analysis if approached |
66 seems like a very simple problem, but it will turn out even |
67 naively is in general undecidable, just like Turing's halting |
67 such simple problems, if approached naively, are in general |
68 problem. I let you think why? |
68 undecidable, just like Turing's halting problem. I let you |
|
69 think why? |
69 |
70 |
70 |
71 |
71 Is sign-analysis of variables an interesting problem? Well, |
72 Is sign-analysis of variables an interesting problem? Well, |
72 yes---if a compiler can find out that for example a variable |
73 yes---if a compiler can find out that for example a variable |
73 will never be negative and this variable is used as an index |
74 will never be negative and this variable is used as an index |
74 for an array, then the compiler does not need to generate code |
75 for an array, then the compiler does not need to generate code |
75 for an underflow-test. Remember some languages are immune to |
76 for an underflow-test. Remember some languages are immune to |
76 buffer-overflow attacks because they add bound checks |
77 buffer-overflow attacks because they add underflow and |
77 everywhere. This could potentially drastically speed up the |
78 overflow checks everywhere. If the compiler can omit the |
78 generated code. |
79 underflow test, for example, then this can potentially |
|
80 drastically speed up the generated code. |
79 |
81 |
80 Since we want to |
82 What do programs in our programming language look like? The |
|
83 following grammar gives a first specification: |
81 |
84 |
82 \begin{multicols}{2} |
85 \begin{multicols}{2} |
83 \begin{plstx}[rhs style=,one per line,left margin=9mm] |
86 \begin{plstx}[rhs style=,one per line,left margin=9mm] |
|
87 : \meta{Stmt} ::= \meta{label} \texttt{:} |
|
88 | \meta{var} \texttt{:=} \meta{Exp} |
|
89 | \texttt{jmp?} \meta{Exp} \meta{label} |
|
90 | \texttt{goto} \meta{label}\\ |
|
91 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ |
|
92 \end{plstx} |
|
93 \columnbreak |
|
94 \begin{plstx}[rhs style=,one per line] |
84 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} |
95 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp} |
85 | \meta{Exp} \texttt{*} \meta{Exp} |
96 | \meta{Exp} \texttt{*} \meta{Exp} |
86 | \meta{Exp} \texttt{=} \meta{Exp} |
97 | \meta{Exp} \texttt{=} \meta{Exp} |
87 | \meta{num} |
98 | \meta{num} |
88 | \meta{var}\\ |
99 | \meta{var}\\ |
89 \end{plstx} |
100 \end{plstx} |
90 \columnbreak |
|
91 \begin{plstx}[rhs style=,one per line] |
|
92 : \meta{Stmt} ::= \meta{label} \texttt{:} |
|
93 | \meta{var} \texttt{:=} \meta{Exp} |
|
94 | \texttt{jmp?} \meta{Exp} \meta{label} |
|
95 | \texttt{goto} \meta{label}\\ |
|
96 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\ |
|
97 \end{plstx} |
|
98 \end{multicols} |
101 \end{multicols} |
99 |
102 |
100 \begin{lstlisting}[numbers=none, |
103 \noindent I assume you are familiar with such |
101 language={},xleftmargin=10mm] |
104 grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}} |
|
105 There are three main syntactic categories: \emph{statments} |
|
106 and \emph{expressions} as well as \emph{programs}, which are |
|
107 sequences of statements. Statements are either labels, |
|
108 variable assignments, conditional jumps (\pcode{jmp?}) and |
|
109 unconditional jumps (\pcode{goto}). Labels are just strings, |
|
110 which can be used as the target of a jump. The conditional |
|
111 jumps and variable assignments involve (arithmetic) |
|
112 expressions. Expressions are either numbers, variables or |
|
113 compound expressions built up from \pcode{+}, \pcode{*} and |
|
114 \emph{=} (for simplicity reasons we do not consider any other |
|
115 operations). We assume we have negative and positive numbers, |
|
116 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
|
117 \pcode{2}\ldots{} An example program that calculates the |
|
118 factorial of 5 is as follows: |
|
119 |
|
120 \begin{lstlisting}[language={},xleftmargin=10mm] |
102 a := 1 |
121 a := 1 |
103 n := 5 |
122 n := 5 |
104 top: jmp? n = 0 done |
123 top: |
|
124 jmp? n = 0 done |
105 a := a * n |
125 a := a * n |
106 n := n + -1 |
126 n := n + -1 |
107 goto top |
127 goto top |
108 done: |
128 done: |
109 \end{lstlisting} |
129 \end{lstlisting} |
110 |
130 |
|
131 \noindent Each line of the program contains a statement. In |
|
132 the first two lines we assign values to the variables |
|
133 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n} |
|
134 is zero, in which case we jump to the end of the program |
|
135 marked with the label \pcode{done}. If \pcode{n} is not zero, |
|
136 we multiply the content of \pcode{a} by \pcode{n}, decrease |
|
137 \pcode{n} by one and jump back to the beginning of the loop, |
|
138 marked with the label \pcode{top}. |
|
139 |
|
140 \begin{figure}[t] |
111 \begin{lstlisting}[numbers=none, |
141 \begin{lstlisting}[numbers=none, |
112 language={},xleftmargin=10mm] |
142 language={},xleftmargin=10mm] |
113 n := 6 |
143 n := 6 |
114 m1 := 0 |
144 m1 := 0 |
115 m2 := 1 |
145 m2 := 1 |
116 loop: jmp? n = 0 done |
146 loop: |
|
147 jmp? n = 0 done |
117 tmp := m2 |
148 tmp := m2 |
118 m2 := m1 + m2 |
149 m2 := m1 + m2 |
119 m1 := tmp |
150 m1 := tmp |
120 n := n + -1 |
151 n := n + -1 |
121 goto top |
152 goto top |
122 done: |
153 done: |
123 \end{lstlisting} |
154 \end{lstlisting} |
|
155 \label{A mystery program in our idealised programming language.} |
|
156 \end{figure} |
124 |
157 |
125 \bigskip |
158 Even if our language is rather small, it is still Turing |
|
159 complete---so rather powerful. However, discussing this more |
|
160 would lead us to far astray. |
126 |
161 |
127 \noindent What would be missing in comparison with real |
162 What would be missing in comparison with real |
128 (low-level machine) code? Well, the numbers we assume to be |
163 (low-level machine) code? Well, the numbers we assume to be |
129 arbitrary precision, which is not the case in real code. There |
164 arbitrary precision, which is not the case in real code. There |
130 basic number formats have a rang and might over-run or |
165 basic number formats have a rang and might over-run or |
131 under-run from this range. Our assumption about variables, |
166 under-run from this range. Our assumption about variables, |
132 does not correspond to actual registers, which are only |
167 does not correspond to actual registers, which are only |