64 with real programming languages. The language we will study |
65 with real programming languages. The language we will study |
65 contains, amongst other things, variables holding integers. |
66 contains, amongst other things, variables holding integers. |
66 Using static analysis, we want to find out what the sign of |
67 Using static analysis, we want to find out what the sign of |
67 these integers (positive or negative) will be when the program |
68 these integers (positive or negative) will be when the program |
68 runs. This sign-analysis seems like a very simple problem. But |
69 runs. This sign-analysis seems like a very simple problem. But |
69 it will turn out even such simple problems, if approached |
70 teven such simple problems, if approached naively, are in |
70 naively, are in general undecidable, just like Turing's |
71 general undecidable, just like Turing's halting problem. I let |
71 halting problem. I let you think why? |
72 you think why? |
72 |
73 |
73 |
74 |
74 Is sign-analysis of variables an interesting problem? Well, |
75 Is sign-analysis of variables an interesting problem? Well, |
75 yes---if a compiler can find out that for example a variable |
76 yes---if a compiler can find out that for example a variable |
76 will never be negative and this variable is used as an index |
77 will never be negative and this variable is used as an index |
77 for an array, then the compiler does not need to generate code |
78 for an array, then the comopiler does not need to generate code |
78 for an underflow-test. Remember some languages are immune to |
79 for an underflow-test. Remember some languages are immune to |
79 buffer-overflow attacks, but they need to add underflow and |
80 buffer-overflow attacks, but they need to add underflow and |
80 overflow checks everywhere. If the compiler can omit the |
81 overflow checks everywhere. If the compiler can omit the |
81 underflow test, for example, then this can potentially |
82 underflow test, for example, then this can potentially |
82 drastically speed up the generated code. |
83 drastically speed up the generated code. |
83 |
84 |
84 What do programs in our programming language look like? The |
85 What do programs in our simple programming language look like? |
85 following grammar gives a first specification: |
86 The following grammar gives a first specification: |
86 |
87 |
87 \begin{multicols}{2} |
88 \begin{multicols}{2} |
88 \begin{plstx}[rhs style=,one per line,left margin=9mm] |
89 \begin{plstx}[rhs style=,one per line,left margin=9mm] |
89 : \meta{Stmt} ::= \meta{label} \texttt{:} |
90 : \meta{Stmt} ::= \meta{label} \texttt{:} |
90 | \meta{var} \texttt{:=} \meta{Exp} |
91 | \meta{var} \texttt{:=} \meta{Exp} |
108 and \emph{expressions} as well as \emph{programs}, which are |
109 and \emph{expressions} as well as \emph{programs}, which are |
109 sequences of statements. Statements are either labels, |
110 sequences of statements. Statements are either labels, |
110 variable assignments, conditional jumps (\pcode{jmp?}) and |
111 variable assignments, conditional jumps (\pcode{jmp?}) and |
111 unconditional jumps (\pcode{goto}). Labels are just strings, |
112 unconditional jumps (\pcode{goto}). Labels are just strings, |
112 which can be used as the target of a jump. We assume that in |
113 which can be used as the target of a jump. We assume that in |
113 every program the labels are unique---otherwise if there is a |
114 every program the labels are unique---if there is a clash we |
114 clash we do not know where to jump to. The conditional jumps |
115 do not know where to jump to. The conditional jumps and |
115 and variable assignments involve (arithmetic) expressions. |
116 variable assignments involve (arithmetic) expressions. |
116 Expressions are either numbers, variables or compound |
117 Expressions are either numbers, variables or compound |
117 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
118 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
118 (for simplicity reasons we do not consider any other |
119 (for simplicity reasons we do not consider any other |
119 operations). We assume we have negative and positive numbers, |
120 operations). We assume we have negative and positive numbers, |
120 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
121 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
121 \pcode{2}\ldots{} An example program that calculates the |
122 \pcode{2}\ldots{} An example program that calculates the |
122 factorial of 5 is as follows: |
123 factorial of 5 is in oure programming language as follows: |
123 |
124 |
124 \begin{lstlisting}[language={},xleftmargin=10mm] |
125 \begin{lstlisting}[language={},xleftmargin=10mm] |
125 a := 1 |
126 a := 1 |
126 n := 5 |
127 n := 5 |
127 top: |
128 top: |
180 or large, which is not the case with real machine code. In |
181 or large, which is not the case with real machine code. In |
181 real code basic number formats have a range and might |
182 real code basic number formats have a range and might |
182 over-flow or under-flow from this range. Also the number of |
183 over-flow or under-flow from this range. Also the number of |
183 variables in our programs is potentially unlimited, while |
184 variables in our programs is potentially unlimited, while |
184 memory in an actual computer, of course, is always limited |
185 memory in an actual computer, of course, is always limited |
185 somehow on any actual. To sum up, our language might look very |
186 somehow on any actual. To sum up, our language might look |
186 simple, but it is not completely removed from practically |
187 ridiculously simple, but it is not much removed from |
187 relevant issues. |
188 practically relevant issues. |
188 |
189 |
189 |
190 |
190 \subsubsection*{An Interpreter} |
191 \subsubsection*{An Interpreter} |
191 |
192 |
192 Designing a language is like playing god: you can say what |
193 Designing a language is like playing god: you can say what |
193 names for variables you allow; what programs should look like; |
194 names for variables you allow; what programs should look like; |
194 most importantly you can decide what each part of the program |
195 most importantly you can decide what each part of the program |
195 should mean and do. While our language is rather simple and |
196 should mean and do. While our language is rather simple and |
196 the meaning is rather straightforward, there are still places |
197 the meaning of statements, for example, is rather |
197 where we need to make a real choice. For example with |
198 straightforward, there are still places where we need to make |
198 conditional jumps, say the one in the factorial program: |
199 real choices. For example consider the conditional jumps, say |
|
200 the one in the factorial program: |
199 |
201 |
200 \begin{center} |
202 \begin{center} |
201 \code{jmp? n = 0 done} |
203 \code{jmp? n = 0 done} |
202 \end{center} |
204 \end{center} |
203 |
205 |
204 \noindent How should they work? We could introduce Booleans |
206 \noindent How should they work? We could introduce Booleans |
205 (\pcode{true} and \pcode{false}) and then jump only when the |
207 (\pcode{true} and \pcode{false}) and then jump only when the |
206 condition is \pcode{true}. However, since we have numbers in |
208 condition is \pcode{true}. However, since we have numbers in |
207 our language anyway, why not just encoding \emph{true} as |
209 our language anyway, why not just encoding \pcode{true} as |
208 zero, and \pcode{false} as anything else? In this way we can |
210 one, and \pcode{false} as zero? In this way we can dispense |
209 dispense with the additional concept of Booleans, but also we |
211 with the additional concept of Booleans, {\bf but also we could |
210 could replace the jump above by |
212 replace the jump above by |
211 |
213 |
212 \begin{center} |
214 \begin{center} |
213 \code{jmp? n done} |
215 \code{jmp? 1 + (n + -n) done} |
214 \end{center} |
216 \end{center} |
215 |
217 |
216 \noindent which behaves exactly the same. But what does it |
218 \noindent which behaves exactly the same. But what does it |
217 mean that two jumps behave the same? |
219 actually mean that two jumps behave the same? Or two programs |
|
220 for that matter?} |
218 |
221 |
219 I hope the above discussion makes it already clear we need to |
222 I hope the above discussion makes it already clear we need to |
220 be a bit more careful with our programs. Below we shall |
223 be a bit more careful with our programs. Below we shall |
221 describe an interpreter for our programs, which specifies |
224 describe an interpreter for our programming language, which |
222 exactly how programs are supposed to be run\ldots{}at least we |
225 specifies exactly how programs are supposed to be |
223 will specify this for all \emph{good} programs. By good |
226 run\ldots{}at least we will specify this for all \emph{good} |
224 programs we mean where for example all variables are |
227 programs. By good programs I mean where all variables are |
225 initialised. Our interpreter will just crash if it cannot find |
228 initialised, for example. Our interpreter will just crash if |
226 out the value for a variable, because it is not initialised. |
229 it cannot find out the value for a variable, in case it is not |
227 |
230 initialised. Also we will assume that labels in good programs |
228 First we will pre-process our programs. This will simplify |
231 are unique, otherwise our programs will calculate ``garbage''. |
229 our definition of our interpreter later on. We will transform |
232 |
230 programs into \emph{snippets}. |
233 First we will pre-process our programs. This will simplify the |
|
234 definition of our interpreter later on. We will transform |
|
235 programs into \emph{snippets}. A snippet is a label and all |
|
236 code that comes after the label. This essentially means a |
|
237 snippet is a \emph{map} from labels to code. Given that |
|
238 programs are sequences (or lists) of statements, we can easily |
|
239 calculate the snippets by just recursively generating the map. |
|
240 Suppose a program is of the general form |
|
241 |
|
242 \begin{center} |
|
243 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
|
244 \end{center} |
|
245 |
|
246 \noindent In order to calculate the snippets of such a |
|
247 program, we have to go through the statements one by one and |
|
248 check whether they are a label. If yes, we add the label and |
|
249 the remaining statements to our map. If no we just continue |
|
250 with the next statement. To come up with a recursive |
|
251 definition for generating snippets, let us write $[]$ for the |
|
252 program that does not contain any statement. Consider the |
|
253 following definition: |
|
254 |
|
255 \begin{center} |
|
256 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
257 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ |
|
258 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ & |
|
259 $\begin{cases} |
|
260 \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ |
|
261 \textit{snippets}(rest) & \text{otherwise} |
|
262 \end{cases}$ |
|
263 \end{tabular} |
|
264 \end{center} |
|
265 |
|
266 \noindent In the first clause we just return the empty map for |
|
267 the program that does not contain any statements. In the |
|
268 second clause, we have to distinguish the case where the first |
|
269 statement is a label or not. If not, then we just ``throw |
|
270 away'' the label and recursively calculate the snippets for |
|
271 the rest of the program. If yes, then we do the same, but also |
|
272 update the map so that $label$ now points to the rest of the |
|
273 statements. There is one small problem we need to overcome: |
|
274 our two programs have no label as entry point---that is where |
|
275 the execution starts. We always assume that the first |
|
276 statement will be run first. For us it seems convenient if |
|
277 we add to all our programs a default label, say |
|
278 \pcode{""} (the empty string). With this we can define |
|
279 our preprocessing of programs as follows |
|
280 |
|
281 \begin{center} |
|
282 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
|
283 \end{center} |
|
284 |
|
285 \noindent Let us see how this pans out in practice. If we |
|
286 preprocess the factorial program shown earlier we obtain the |
|
287 following map: |
|
288 |
|
289 \begin{center} |
|
290 \small |
|
291 \lstset{numbers=none, |
|
292 language={}, |
|
293 xleftmargin=0mm, |
|
294 aboveskip=0.5mm, |
|
295 frame=single, |
|
296 framerule=0mm, |
|
297 framesep=0mm} |
|
298 \begin{tikzpicture}[node distance=0mm] |
|
299 \node (A1) [draw]{\pcode{""}}; |
|
300 \node (B1) [right=of A1] {$\mapsto$}; |
|
301 \node (C1) [right=of B1,anchor=north west] { |
|
302 \begin{minipage}{3.5cm} |
|
303 \begin{lstlisting}[language={},xleftmargin=0mm] |
|
304 a := 1 |
|
305 n := 5 |
|
306 top: |
|
307 jmp? n = 0 done |
|
308 a := a * n |
|
309 n := n + -1 |
|
310 goto top |
|
311 done: |
|
312 \end{lstlisting} |
|
313 \end{minipage}}; |
|
314 \node (A2) [right=of C1.north east,draw] {\pcode{top}}; |
|
315 \node (B2) [right=of A2] {$\mapsto$}; |
|
316 \node (C2) [right=of B2, anchor=north west] { |
|
317 \begin{minipage}{3.5cm} |
|
318 \begin{lstlisting}[language={},xleftmargin=0mm] |
|
319 jmp? n = 0 done |
|
320 a := a * n |
|
321 n := n + -1 |
|
322 goto top |
|
323 done: |
|
324 \end{lstlisting} |
|
325 \end{minipage}}; |
|
326 \node (A3) [right=of C2.north east,draw] {\pcode{done}}; |
|
327 \node (B3) [right=of A3] {$\mapsto$}; |
|
328 \node (C3) [right=of B3] {$[]$}; |
|
329 \end{tikzpicture} |
|
330 \end{center} |
|
331 |
|
332 \noindent I highlighted the \emph{keys} in this map. Since |
|
333 there are three labels in the factorial program, there are |
|
334 three keys. |
231 |
335 |
232 \end{document} |
336 \end{document} |
233 |
337 |
234 %%% Local Variables: |
338 %%% Local Variables: |
235 %%% mode: latex |
339 %%% mode: latex |