60 \subsubsection*{A Simple, Idealised Programming Language} |
60 \subsubsection*{A Simple, Idealised Programming Language} |
61 |
61 |
62 Our starting point is a small, idealised programming language. |
62 Our starting point is a small, idealised programming language. |
63 It is idealised because we cut several corners in comparison |
63 It is idealised because we cut several corners in comparison |
64 with real programming languages. The language we will study |
64 with real programming languages. The language we will study |
65 contains, amongst other things, variables holding integers. We |
65 contains, amongst other things, variables holding integers. |
66 want to find out what the sign of these integers (positive or |
66 Using static analysis, we want to find out what the sign of |
67 negative) will be when the program runs. This sign-analysis |
67 these integers (positive or negative) will be when the program |
68 seems like a very simple problem, but it will turn out even |
68 runs. This sign-analysis seems like a very simple problem. But |
69 such simple problems, if approached naively, are in general |
69 it will turn out even such simple problems, if approached |
70 undecidable, just like Turing's halting problem. I let you |
70 naively, are in general undecidable, just like Turing's |
71 think why? |
71 halting problem. I let you think why? |
72 |
72 |
73 |
73 |
74 Is sign-analysis of variables an interesting problem? Well, |
74 Is sign-analysis of variables an interesting problem? Well, |
75 yes---if a compiler can find out that for example a variable |
75 yes---if a compiler can find out that for example a variable |
76 will never be negative and this variable is used as an index |
76 will never be negative and this variable is used as an index |
107 There are three main syntactic categories: \emph{statments} |
107 There are three main syntactic categories: \emph{statments} |
108 and \emph{expressions} as well as \emph{programs}, which are |
108 and \emph{expressions} as well as \emph{programs}, which are |
109 sequences of statements. Statements are either labels, |
109 sequences of statements. Statements are either labels, |
110 variable assignments, conditional jumps (\pcode{jmp?}) and |
110 variable assignments, conditional jumps (\pcode{jmp?}) and |
111 unconditional jumps (\pcode{goto}). Labels are just strings, |
111 unconditional jumps (\pcode{goto}). Labels are just strings, |
112 which can be used as the target of a jump. The conditional |
112 which can be used as the target of a jump. We assume that in |
113 jumps and variable assignments involve (arithmetic) |
113 every program the labels are unique---otherwise if there is a |
114 expressions. Expressions are either numbers, variables or |
114 clash we do not know where to jump to. The conditional jumps |
115 compound expressions built up from \pcode{+}, \pcode{*} and |
115 and variable assignments involve (arithmetic) expressions. |
116 \emph{=} (for simplicity reasons we do not consider any other |
116 Expressions are either numbers, variables or compound |
|
117 expressions built up from \pcode{+}, \pcode{*} and \emph{=} |
|
118 (for simplicity reasons we do not consider any other |
117 operations). We assume we have negative and positive numbers, |
119 operations). We assume we have negative and positive numbers, |
118 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
120 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, |
119 \pcode{2}\ldots{} An example program that calculates the |
121 \pcode{2}\ldots{} An example program that calculates the |
120 factorial of 5 is as follows: |
122 factorial of 5 is as follows: |
121 |
123 |
172 just addition and multiplication) and many more conditional |
174 just addition and multiplication) and many more conditional |
173 jumps. We could add these to our language if we wanted, but |
175 jumps. We could add these to our language if we wanted, but |
174 complexity is really beside the point here. Furthermore, real |
176 complexity is really beside the point here. Furthermore, real |
175 machine code has many instructions for manipulating memory. We |
177 machine code has many instructions for manipulating memory. We |
176 do not have this at all. This is actually a more serious |
178 do not have this at all. This is actually a more serious |
177 simplification because we assume numbers to be arbitrary |
179 simplification because we assume numbers to be arbitrary small |
178 precision, which is not the case with real machine code. In |
180 or large, which is not the case with real machine code. In |
179 real code basic number formats have a range and might |
181 real code basic number formats have a range and might |
180 over-flow or under-flow from this range. Also the numbers of |
182 over-flow or under-flow from this range. Also the number of |
181 variables in our programs is unlimited, while memory, of |
183 variables in our programs is potentially unlimited, while |
182 course, is always limited somehow on any actual machine. To |
184 memory in an actual computer, of course, is always limited |
183 sum up, our language might look very simple, but it is not |
185 somehow on any actual. To sum up, our language might look very |
184 completely removed from practically relevant issues. |
186 simple, but it is not completely removed from practically |
|
187 relevant issues. |
185 |
188 |
186 |
189 |
187 \subsubsection*{An Interpreter} |
190 \subsubsection*{An Interpreter} |
188 |
191 |
189 Designing a language is like being god: you can say what |
192 Designing a language is like playing god: you can say what |
190 each part of the program should mean. |
193 names for variables you allow; what programs should look like; |
|
194 most importantly you can decide what each part of the program |
|
195 should mean and do. While our language is rather simple and |
|
196 the meaning is rather straightforward, there are still places |
|
197 where we need to make a real choice. For example with |
|
198 conditional jumps, say the one in the factorial program: |
|
199 |
|
200 \begin{center} |
|
201 \code{jmp? n = 0 done} |
|
202 \end{center} |
|
203 |
|
204 \noindent How should they work? We could introduce Booleans |
|
205 (\pcode{true} and \pcode{false}) and then jump only when the |
|
206 condition is \pcode{true}. However, since we have numbers in |
|
207 our language anyway, why not just encoding \emph{true} as |
|
208 zero, and \pcode{false} as anything else? In this way we can |
|
209 dispense with the additional concept of Booleans, but also we |
|
210 could replace the jump above by |
|
211 |
|
212 \begin{center} |
|
213 \code{jmp? n done} |
|
214 \end{center} |
|
215 |
|
216 \noindent which behaves exactly the same. But what does it |
|
217 mean that two jumps behave the same? |
|
218 |
|
219 I hope the above discussion makes it already clear we need to |
|
220 be a bit more careful with our programs. Below we shall |
|
221 describe an interpreter for our programs, which specifies |
|
222 exactly how programs are supposed to be run\ldots{}at least we |
|
223 will specify this for all \emph{good} programs. By good |
|
224 programs we mean where for example all variables are |
|
225 initialised. Our interpreter will just crash if it cannot find |
|
226 out the value for a variable, because it is not initialised. |
|
227 |
|
228 First we will pre-process our programs. This will simplify |
|
229 our definition of our interpreter later on. We will transform |
|
230 programs into \emph{snippets}. |
191 |
231 |
192 \end{document} |
232 \end{document} |
193 |
233 |
194 %%% Local Variables: |
234 %%% Local Variables: |
195 %%% mode: latex |
235 %%% mode: latex |