65 with real programming languages. The language we will study |
64 with real programming languages. The language we will study |
66 contains, amongst other things, variables holding integers. |
65 contains, amongst other things, variables holding integers. |
67 Using static analysis, we want to find out what the sign of |
66 Using static analysis, we want to find out what the sign of |
68 these integers (positive or negative) will be when the program |
67 these integers (positive or negative) will be when the program |
69 runs. This sign-analysis seems like a very simple problem. But |
68 runs. This sign-analysis seems like a very simple problem. But |
70 teven such simple problems, if approached naively, are in |
69 even such simple problems, if approached naively, are in |
71 general undecidable, just like Turing's halting problem. I let |
70 general undecidable, just like Turing's halting problem. I let |
72 you think why? |
71 you think why? |
73 |
72 |
74 |
73 |
75 Is sign-analysis of variables an interesting problem? Well, |
74 Is sign-analysis of variables an interesting problem? Well, |
76 yes---if a compiler can find out that for example a variable |
75 yes---if a compiler can find out that for example a variable |
77 will never be negative and this variable is used as an index |
76 will never be negative and this variable is used as an index |
78 for an array, then the comopiler does not need to generate code |
77 for an array, then the compiler does not need to generate code |
79 for an underflow-test. Remember some languages are immune to |
78 for an underflow-test. Remember some languages are immune to |
80 buffer-overflow attacks, but they need to add underflow and |
79 buffer-overflow attacks, but they need to add underflow and |
81 overflow checks everywhere. If the compiler can omit the |
80 overflow checks everywhere. If the compiler can omit the |
82 underflow test, for example, then this can potentially |
81 underflow test, for example, then this can potentially |
83 drastically speed up the generated code. |
82 drastically speed up the generated code. According to the John |
|
83 Regehr, an expert in the field of compilers, overflow checks |
|
84 can cause 5-10\% slowdown, and in some languages even 100\% |
|
85 for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}} |
84 |
86 |
85 What do programs in our simple programming language look like? |
87 What do programs in our simple programming language look like? |
86 The following grammar gives a first specification: |
88 The following grammar gives a first specification: |
87 |
89 |
88 \begin{multicols}{2} |
90 \begin{multicols}{2} |
109 and \emph{expressions} as well as \emph{programs}, which are |
111 and \emph{expressions} as well as \emph{programs}, which are |
110 sequences of statements. Statements are either labels, |
112 sequences of statements. Statements are either labels, |
111 variable assignments, conditional jumps (\pcode{jmp?}) and |
113 variable assignments, conditional jumps (\pcode{jmp?}) and |
112 unconditional jumps (\pcode{goto}). Labels are just strings, |
114 unconditional jumps (\pcode{goto}). Labels are just strings, |
113 which can be used as the target of a jump. We assume that in |
115 which can be used as the target of a jump. We assume that in |
114 every program the labels are unique---if there is a clash we |
116 every program the labels are unique---if there is a clash, |
115 do not know where to jump to. The conditional jumps and |
117 then we do not know where to jump to. The conditional jumps |
116 variable assignments involve (arithmetic) expressions. |
118 and variable assignments involve (arithmetic) expressions. |
117 Expressions are either numbers, variables or compound |
119 Expressions are either numbers, variables or compound |
118 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
120 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
119 (for simplicity reasons we do not consider any other |
121 (for simplicity reasons we do not consider any other |
120 operations). We assume we have negative and positive numbers, |
122 operations). We assume we have negative and positive numbers, |
121 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
123 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
182 real code basic number formats have a range and might |
184 real code basic number formats have a range and might |
183 over-flow or under-flow from this range. Also the number of |
185 over-flow or under-flow from this range. Also the number of |
184 variables in our programs is potentially unlimited, while |
186 variables in our programs is potentially unlimited, while |
185 memory in an actual computer, of course, is always limited |
187 memory in an actual computer, of course, is always limited |
186 somehow on any actual. To sum up, our language might look |
188 somehow on any actual. To sum up, our language might look |
187 ridiculously simple, but it is not much removed from |
189 ridiculously simple, but it is not far removed from |
188 practically relevant issues. |
190 practically relevant issues. |
189 |
191 |
190 |
192 |
191 \subsubsection*{An Interpreter} |
193 \subsubsection*{An Interpreter} |
192 |
194 |
206 \noindent How should they work? We could introduce Booleans |
208 \noindent How should they work? We could introduce Booleans |
207 (\pcode{true} and \pcode{false}) and then jump only when the |
209 (\pcode{true} and \pcode{false}) and then jump only when the |
208 condition is \pcode{true}. However, since we have numbers in |
210 condition is \pcode{true}. However, since we have numbers in |
209 our language anyway, why not just encoding \pcode{true} as |
211 our language anyway, why not just encoding \pcode{true} as |
210 one, and \pcode{false} as zero? In this way we can dispense |
212 one, and \pcode{false} as zero? In this way we can dispense |
211 with the additional concept of Booleans, {\bf but also we could |
213 with the additional concept of Booleans. |
212 replace the jump above by |
|
213 |
|
214 \begin{center} |
|
215 \code{jmp? 1 + (n + -n) done} |
|
216 \end{center} |
|
217 |
|
218 \noindent which behaves exactly the same. But what does it |
|
219 actually mean that two jumps behave the same? Or two programs |
|
220 for that matter?} |
|
221 |
214 |
222 I hope the above discussion makes it already clear we need to |
215 I hope the above discussion makes it already clear we need to |
223 be a bit more careful with our programs. Below we shall |
216 be a bit more careful with our programs. Below we shall |
224 describe an interpreter for our programming language, which |
217 describe an interpreter for our programming language, which |
225 specifies exactly how programs are supposed to be |
218 specifies exactly how programs are supposed to be |
226 run\ldots{}at least we will specify this for all \emph{good} |
219 run\ldots{}at least we will specify this for all \emph{good} |
227 programs. By good programs I mean where all variables are |
220 programs. By good programs I mean where all variables are |
228 initialised, for example. Our interpreter will just crash if |
221 initialised, for example. Our interpreter will just crash if |
229 it cannot find out the value for a variable, in case it is not |
222 it cannot find out the value for a variable, in case it is not |
230 initialised. Also we will assume that labels in good programs |
223 initialised. Also, we will assume that labels in good programs |
231 are unique, otherwise our programs will calculate ``garbage''. |
224 are unique, otherwise our programs will calculate ``garbage''. |
232 |
225 |
233 First we will pre-process our programs. This will simplify the |
226 First we will pre-process our programs. This will simplify the |
234 definition of our interpreter later on. We will transform |
227 definition of our interpreter later on. We will transform |
235 programs into \emph{snippets}. A snippet is a label and all |
228 programs into \emph{snippets}. Their purpose is to simplify |
236 code that comes after the label. This essentially means a |
229 the definition of what the interpreter should do in case of a |
237 snippet is a \emph{map} from labels to code. Given that |
230 jump. A snippet is a label and all code that comes after the |
238 programs are sequences (or lists) of statements, we can easily |
231 label. This essentially means a snippet is a \emph{map} from |
239 calculate the snippets by just recursively generating the map. |
232 labels to code. Given that programs are sequences (or lists) |
|
233 of statements, we can easily calculate the snippets by just |
|
234 traversing this sequence and recursively generating the map. |
240 Suppose a program is of the general form |
235 Suppose a program is of the general form |
241 |
236 |
242 \begin{center} |
237 \begin{center} |
243 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
238 $stmt_1\;stmt_2\; \ldots\; stmt_n$ |
244 \end{center} |
239 \end{center} |
245 |
240 |
246 \noindent In order to calculate the snippets of such a |
241 \noindent The idea is to go through this sequence of |
247 program, we have to go through the statements one by one and |
242 statements one by one and check whether they are a label. If |
248 check whether they are a label. If yes, we add the label and |
243 yes, we add the label and the remaining statements to our map. |
249 the remaining statements to our map. If no we just continue |
244 If no, we just continue with the next statement. To come up |
250 with the next statement. To come up with a recursive |
245 with a recursive definition for generating snippets, let us |
251 definition for generating snippets, let us write $[]$ for the |
246 write $[]$ for the program that does not contain any |
252 program that does not contain any statement. Consider the |
247 statement. Consider the following definition: |
253 following definition: |
|
254 |
248 |
255 \begin{center} |
249 \begin{center} |
256 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
250 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
257 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ |
251 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ |
258 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ & |
252 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ & |
262 \end{cases}$ |
256 \end{cases}$ |
263 \end{tabular} |
257 \end{tabular} |
264 \end{center} |
258 \end{center} |
265 |
259 |
266 \noindent In the first clause we just return the empty map for |
260 \noindent In the first clause we just return the empty map for |
267 the program that does not contain any statements. In the |
261 the program that does not contain any statement. In the second |
268 second clause, we have to distinguish the case where the first |
262 clause, we have to distinguish the case where the first |
269 statement is a label or not. If not, then we just ``throw |
263 statement is a label or not. As said before, if not, then we |
270 away'' the label and recursively calculate the snippets for |
264 just ``throw away'' the label and recursively calculate the |
271 the rest of the program. If yes, then we do the same, but also |
265 snippets for the rest of the program. If yes, then we do the |
272 update the map so that $label$ now points to the rest of the |
266 same, but also update the map so that $label$ now points to |
273 statements. There is one small problem we need to overcome: |
267 the rest of the statements. There is one small problem we need |
274 our two programs have no label as entry point---that is where |
268 to overcome: our two programs have no label as \emph{entry |
275 the execution starts. We always assume that the first |
269 point}---that is where the execution starts. We usually assume |
276 statement will be run first. For us it seems convenient if |
270 that the first statement will be run first. To make this the |
277 we add to all our programs a default label, say |
271 default, it is convenient if we add to all our programs a |
278 \pcode{""} (the empty string). With this we can define |
272 default label, say \pcode{""} (the empty string). With this we |
279 our preprocessing of programs as follows |
273 can define our pre-processing of programs as follows |
280 |
274 |
281 \begin{center} |
275 \begin{center} |
282 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
276 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ |
283 \end{center} |
277 \end{center} |
284 |
278 |
285 \noindent Let us see how this pans out in practice. If we |
279 \noindent Let us see how this pans out in practice. If we |
286 preprocess the factorial program shown earlier we obtain the |
280 pre-process the factorial program shown earlier, we obtain the |
287 following map: |
281 following map: |
288 |
282 |
289 \begin{center} |
283 \begin{center} |
290 \small |
284 \small |
291 \lstset{numbers=none, |
285 \lstset{numbers=none, |
292 language={}, |
286 language={}, |
293 xleftmargin=0mm, |
287 xleftmargin=0mm, |
294 aboveskip=0.5mm, |
288 aboveskip=0.5mm, |
|
289 belowskip=0.5mm, |
295 frame=single, |
290 frame=single, |
296 framerule=0mm, |
291 framerule=0mm, |
297 framesep=0mm} |
292 framesep=0mm} |
298 \begin{tikzpicture}[node distance=0mm] |
293 \begin{tikzpicture}[node distance=0mm] |
299 \node (A1) [draw]{\pcode{""}}; |
294 \node (A1) [draw]{\pcode{""}}; |
328 \node (C3) [right=of B3] {$[]$}; |
323 \node (C3) [right=of B3] {$[]$}; |
329 \end{tikzpicture} |
324 \end{tikzpicture} |
330 \end{center} |
325 \end{center} |
331 |
326 |
332 \noindent I highlighted the \emph{keys} in this map. Since |
327 \noindent I highlighted the \emph{keys} in this map. Since |
333 there are three labels in the factorial program, there are |
328 there are three labels in the factorial program, there are |
334 three keys. |
329 three keys. When running the factorial program and |
|
330 encountering a jump, then we only have to consult this map, in |
|
331 order to find out what the next instruction should be. |
|
332 |
|
333 We should now be in the position to define how a program |
|
334 should be run. This ``running'' of programs is often called |
|
335 \emph{evaluation}. Let us start with the definition for |
|
336 how expressions are evaluated. A first attempt is the |
|
337 following recursive function: |
|
338 |
|
339 \begin{center} |
|
340 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
341 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\ |
|
342 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ & |
|
343 $\begin{cases} |
|
344 \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\ |
|
345 \textit{snippets}(rest) & \text{otherwise} |
|
346 \end{cases}$ |
|
347 \end{tabular} |
|
348 \end{center} |
|
349 |
335 |
350 |
336 \end{document} |
351 \end{document} |
337 |
352 |
338 %%% Local Variables: |
353 %%% Local Variables: |
339 %%% mode: latex |
354 %%% mode: latex |