diff -r cd8d18f7b7ac -r da5713bcdbb0 handouts/ho09.tex --- a/handouts/ho09.tex Thu Dec 11 23:51:08 2014 +0000 +++ b/handouts/ho09.tex Fri Dec 12 15:41:37 2014 +0000 @@ -62,13 +62,13 @@ Our starting point is a small, idealised programming language. It is idealised because we cut several corners in comparison with real programming languages. The language we will study -contains, amongst other things, variables holding integers. We -want to find out what the sign of these integers (positive or -negative) will be when the program runs. This sign-analysis -seems like a very simple problem, but it will turn out even -such simple problems, if approached naively, are in general -undecidable, just like Turing's halting problem. I let you -think why? +contains, amongst other things, variables holding integers. +Using static analysis, we want to find out what the sign of +these integers (positive or negative) will be when the program +runs. This sign-analysis seems like a very simple problem. But +it will turn out even such simple problems, if approached +naively, are in general undecidable, just like Turing's +halting problem. I let you think why? Is sign-analysis of variables an interesting problem? Well, @@ -108,11 +108,13 @@ sequences of statements. Statements are either labels, variable assignments, conditional jumps (\pcode{jmp?}) and unconditional jumps (\pcode{goto}). Labels are just strings, -which can be used as the target of a jump. The conditional -jumps and variable assignments involve (arithmetic) -expressions. Expressions are either numbers, variables or -compound expressions built up from \pcode{+}, \pcode{*} and -\emph{=} (for simplicity reasons we do not consider any other +which can be used as the target of a jump. We assume that in +every program the labels are unique---otherwise if there is a +clash we do not know where to jump to. The conditional jumps +and variable assignments involve (arithmetic) expressions. +Expressions are either numbers, variables or compound +expressions built up from \pcode{+}, \pcode{*} and \emph{=} +(for simplicity reasons we do not consider any other operations). We assume we have negative and positive numbers, \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1}, \pcode{2}\ldots{} An example program that calculates the @@ -173,20 +175,58 @@ complexity is really beside the point here. Furthermore, real machine code has many instructions for manipulating memory. We do not have this at all. This is actually a more serious -simplification because we assume numbers to be arbitrary -precision, which is not the case with real machine code. In +simplification because we assume numbers to be arbitrary small +or large, which is not the case with real machine code. In real code basic number formats have a range and might -over-flow or under-flow from this range. Also the numbers of -variables in our programs is unlimited, while memory, of -course, is always limited somehow on any actual machine. To -sum up, our language might look very simple, but it is not -completely removed from practically relevant issues. +over-flow or under-flow from this range. Also the number of +variables in our programs is potentially unlimited, while +memory in an actual computer, of course, is always limited +somehow on any actual. To sum up, our language might look very +simple, but it is not completely removed from practically +relevant issues. \subsubsection*{An Interpreter} -Designing a language is like being god: you can say what -each part of the program should mean. +Designing a language is like playing god: you can say what +names for variables you allow; what programs should look like; +most importantly you can decide what each part of the program +should mean and do. While our language is rather simple and +the meaning is rather straightforward, there are still places +where we need to make a real choice. For example with +conditional jumps, say the one in the factorial program: + +\begin{center} +\code{jmp? n = 0 done} +\end{center} + +\noindent How should they work? We could introduce Booleans +(\pcode{true} and \pcode{false}) and then jump only when the +condition is \pcode{true}. However, since we have numbers in +our language anyway, why not just encoding \emph{true} as +zero, and \pcode{false} as anything else? In this way we can +dispense with the additional concept of Booleans, but also we +could replace the jump above by + +\begin{center} +\code{jmp? n done} +\end{center} + +\noindent which behaves exactly the same. But what does it +mean that two jumps behave the same? + +I hope the above discussion makes it already clear we need to +be a bit more careful with our programs. Below we shall +describe an interpreter for our programs, which specifies +exactly how programs are supposed to be run\ldots{}at least we +will specify this for all \emph{good} programs. By good +programs we mean where for example all variables are +initialised. Our interpreter will just crash if it cannot find +out the value for a variable, because it is not initialised. + +First we will pre-process our programs. This will simplify +our definition of our interpreter later on. We will transform +programs into \emph{snippets}. \end{document}