handouts/ho09.tex
changeset 354 8e5e84b14041
parent 352 da5713bcdbb0
child 355 619073c37649
equal deleted inserted replaced
353:605f815986dd 354:8e5e84b14041
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 \usepackage{../graphics}
     4 \usepackage{../graphics}
     5 \usepackage{../grammar}
     5 \usepackage{../grammar}
     6 \usepackage{multicol}
     6 \usepackage{multicol}
       
     7 \usepackage{array}
     7 
     8 
     8 \begin{document}
     9 \begin{document}
     9 
    10 
    10 \section*{Handout 9 (Static Analysis)}
    11 \section*{Handout 9 (Static Analysis)}
    11 
    12 
    64 with real programming languages. The language we will study
    65 with real programming languages. The language we will study
    65 contains, amongst other things, variables holding integers.
    66 contains, amongst other things, variables holding integers.
    66 Using static analysis, we want to find out what the sign of
    67 Using static analysis, we want to find out what the sign of
    67 these integers (positive or negative) will be when the program
    68 these integers (positive or negative) will be when the program
    68 runs. This sign-analysis seems like a very simple problem. But
    69 runs. This sign-analysis seems like a very simple problem. But
    69 it will turn out even such simple problems, if approached
    70 teven such simple problems, if approached naively, are in
    70 naively, are in general undecidable, just like Turing's
    71 general undecidable, just like Turing's halting problem. I let
    71 halting problem. I let you think why?
    72 you think why?
    72 
    73 
    73 
    74 
    74 Is sign-analysis of variables an interesting problem? Well,
    75 Is sign-analysis of variables an interesting problem? Well,
    75 yes---if a compiler can find out that for example a variable
    76 yes---if a compiler can find out that for example a variable
    76 will never be negative and this variable is used as an index
    77 will never be negative and this variable is used as an index
    77 for an array, then the compiler does not need to generate code
    78 for an array, then the comopiler does not need to generate code
    78 for an underflow-test. Remember some languages are immune to
    79 for an underflow-test. Remember some languages are immune to
    79 buffer-overflow attacks, but they need to add underflow and
    80 buffer-overflow attacks, but they need to add underflow and
    80 overflow checks everywhere. If the compiler can omit the
    81 overflow checks everywhere. If the compiler can omit the
    81 underflow test, for example, then this can potentially
    82 underflow test, for example, then this can potentially
    82 drastically speed up the generated code.
    83 drastically speed up the generated code.
    83 
    84 
    84 What do programs in our programming language look like? The 
    85 What do programs in our simple programming language look like?
    85 following grammar gives a first specification:
    86 The following grammar gives a first specification:
    86 
    87 
    87 \begin{multicols}{2}
    88 \begin{multicols}{2}
    88 \begin{plstx}[rhs style=,one per line,left margin=9mm]
    89 \begin{plstx}[rhs style=,one per line,left margin=9mm]
    89 : \meta{Stmt} ::= \meta{label} \texttt{:}
    90 : \meta{Stmt} ::= \meta{label} \texttt{:}
    90                 | \meta{var} \texttt{:=} \meta{Exp}
    91                 | \meta{var} \texttt{:=} \meta{Exp}
   108 and \emph{expressions} as well as \emph{programs}, which are
   109 and \emph{expressions} as well as \emph{programs}, which are
   109 sequences of statements. Statements are either labels,
   110 sequences of statements. Statements are either labels,
   110 variable assignments, conditional jumps (\pcode{jmp?}) and
   111 variable assignments, conditional jumps (\pcode{jmp?}) and
   111 unconditional jumps (\pcode{goto}). Labels are just strings,
   112 unconditional jumps (\pcode{goto}). Labels are just strings,
   112 which can be used as the target of a jump. We assume that in
   113 which can be used as the target of a jump. We assume that in
   113 every program the labels are unique---otherwise if there is a
   114 every program the labels are unique---if there is a clash we
   114 clash we do not know where to jump to. The conditional jumps
   115 do not know where to jump to. The conditional jumps and
   115 and variable assignments involve (arithmetic) expressions.
   116 variable assignments involve (arithmetic) expressions.
   116 Expressions are either numbers, variables or compound
   117 Expressions are either numbers, variables or compound
   117 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   118 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   118 (for simplicity reasons we do not consider any other
   119 (for simplicity reasons we do not consider any other
   119 operations). We assume we have negative and positive numbers,
   120 operations). We assume we have negative and positive numbers,
   120 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   121 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   121 \pcode{2}\ldots{} An example program that calculates the
   122 \pcode{2}\ldots{} An example program that calculates the
   122 factorial of 5 is as follows:
   123 factorial of 5 is in oure programming language as follows:
   123 
   124 
   124 \begin{lstlisting}[language={},xleftmargin=10mm]
   125 \begin{lstlisting}[language={},xleftmargin=10mm]
   125       a := 1
   126       a := 1
   126       n := 5 
   127       n := 5 
   127 top:  
   128 top:  
   130       n := n + -1 
   131       n := n + -1 
   131       goto top 
   132       goto top 
   132 done:
   133 done:
   133 \end{lstlisting}
   134 \end{lstlisting}
   134 
   135 
   135 \noindent Each line of the program contains a statement. In
   136 \noindent As can be seen each line of the program contains a
   136 the first two lines we assign values to the variables
   137 statement. In the first two lines we assign values to the
   137 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
   138 variables \pcode{a} and \pcode{n}. In line 4 we test whether
   138 is zero, in which case we jump to the end of the program
   139 \pcode{n} is zero, in which case we jump to the end of the
   139 marked with the label \pcode{done}. If \pcode{n} is not zero,
   140 program marked with the label \pcode{done}. If \pcode{n} is
   140 we multiply the content of \pcode{a} by \pcode{n}, decrease
   141 not zero, we multiply the content of \pcode{a} by \pcode{n},
   141 \pcode{n} by one and jump back to the beginning of the loop,
   142 decrease \pcode{n} by one and jump back to the beginning of
   142 marked with the label \pcode{top}. Another program in our
   143 the loop, marked with the label \pcode{top}. Another program
   143 language is shown in Figure~\ref{fib}. I let you think what it
   144 in our language is shown in Figure~\ref{fib}. I let you think
   144 calculates.
   145 what it calculates.
   145  
   146  
   146 \begin{figure}[t]
   147 \begin{figure}[t]
   147 \begin{lstlisting}[numbers=none,
   148 \begin{lstlisting}[numbers=none,
   148                    language={},xleftmargin=10mm]
   149                    language={},xleftmargin=10mm]
   149       n := 6
   150       n := 6
   164 
   165 
   165 Even if our language is rather small, it is still Turing
   166 Even if our language is rather small, it is still Turing
   166 complete---meaning quite powerful. However, discussing this
   167 complete---meaning quite powerful. However, discussing this
   167 fact in more detail would lead us too far astray. Clearly, our
   168 fact in more detail would lead us too far astray. Clearly, our
   168 programming is rather low-level and not very comfortable for
   169 programming is rather low-level and not very comfortable for
   169 writing programs. It is inspired by machine code, which is the
   170 writing programs. It is inspired by real machine code, which
   170 code that is actually executed by a CPU. So a more interesting
   171 is the code that is executed by a CPU. So a more interesting
   171 question is what is missing in comparison with real machine
   172 question is what is missing in comparison with real machine
   172 code? Well, not much\ldots{}in principle. Real machine code,
   173 code? Well, not much\ldots{}in principle. Real machine code,
   173 of course, contains many more arithmetic instructions (not
   174 of course, contains many more arithmetic instructions (not
   174 just addition and multiplication) and many more conditional
   175 just addition and multiplication) and many more conditional
   175 jumps. We could add these to our language if we wanted, but
   176 jumps. We could add these to our language if we wanted, but
   180 or large, which is not the case with real machine code. In
   181 or large, which is not the case with real machine code. In
   181 real code basic number formats have a range and might
   182 real code basic number formats have a range and might
   182 over-flow or under-flow from this range. Also the number of
   183 over-flow or under-flow from this range. Also the number of
   183 variables in our programs is potentially unlimited, while
   184 variables in our programs is potentially unlimited, while
   184 memory in an actual computer, of course, is always limited
   185 memory in an actual computer, of course, is always limited
   185 somehow on any actual. To sum up, our language might look very
   186 somehow on any actual. To sum up, our language might look
   186 simple, but it is not completely removed from practically
   187 ridiculously simple, but it is not much removed from
   187 relevant issues.
   188 practically relevant issues.
   188 
   189 
   189 
   190 
   190 \subsubsection*{An Interpreter}
   191 \subsubsection*{An Interpreter}
   191 
   192 
   192 Designing a language is like playing god: you can say what
   193 Designing a language is like playing god: you can say what
   193 names for variables you allow; what programs should look like;
   194 names for variables you allow; what programs should look like;
   194 most importantly you can decide what each part of the program
   195 most importantly you can decide what each part of the program
   195 should mean and do. While our language is rather simple and
   196 should mean and do. While our language is rather simple and
   196 the meaning is rather straightforward, there are still places
   197 the meaning of statements, for example, is rather
   197 where we need to make a real choice. For example with
   198 straightforward, there are still places where we need to make
   198 conditional jumps, say the one in the factorial program: 
   199 real choices. For example consider the conditional jumps, say
       
   200 the one in the factorial program: 
   199 
   201 
   200 \begin{center}
   202 \begin{center}
   201 \code{jmp? n = 0 done}
   203 \code{jmp? n = 0 done}
   202 \end{center}
   204 \end{center}
   203 
   205 
   204 \noindent How should they work? We could introduce Booleans
   206 \noindent How should they work? We could introduce Booleans
   205 (\pcode{true} and \pcode{false}) and then jump only when the
   207 (\pcode{true} and \pcode{false}) and then jump only when the
   206 condition is \pcode{true}. However, since we have numbers in
   208 condition is \pcode{true}. However, since we have numbers in
   207 our language anyway, why not just encoding \emph{true} as
   209 our language anyway, why not just encoding \pcode{true} as
   208 zero, and \pcode{false} as anything else? In this way we can
   210 one, and \pcode{false} as zero? In this way we can dispense
   209 dispense with the additional concept of Booleans, but also we
   211 with the additional concept of Booleans, {\bf but also we could
   210 could replace the jump above by
   212 replace the jump above by
   211 
   213 
   212 \begin{center}
   214 \begin{center}
   213 \code{jmp? n done}
   215 \code{jmp? 1 + (n + -n) done}
   214 \end{center}
   216 \end{center}
   215 
   217 
   216 \noindent which behaves exactly the same. But what does it
   218 \noindent which behaves exactly the same. But what does it
   217 mean that two jumps behave the same?
   219 actually mean that two jumps behave the same? Or two programs
       
   220 for that matter?}
   218 
   221 
   219 I hope the above discussion makes it already clear we need to
   222 I hope the above discussion makes it already clear we need to
   220 be a bit more careful with our programs. Below we shall
   223 be a bit more careful with our programs. Below we shall
   221 describe an interpreter for our programs, which specifies
   224 describe an interpreter for our programming language, which
   222 exactly how programs are supposed to be run\ldots{}at least we
   225 specifies exactly how programs are supposed to be
   223 will specify this for all \emph{good} programs. By good
   226 run\ldots{}at least we will specify this for all \emph{good}
   224 programs we mean where for example all variables are
   227 programs. By good programs I mean where all variables are
   225 initialised. Our interpreter will just crash if it cannot find
   228 initialised, for example. Our interpreter will just crash if
   226 out the value for a variable, because it is not initialised.
   229 it cannot find out the value for a variable, in case it is not
   227 
   230 initialised. Also we will assume that labels in good programs
   228 First we will pre-process our programs. This will simplify 
   231 are unique, otherwise our programs will calculate ``garbage''.
   229 our definition of our interpreter later on. We will transform
   232 
   230 programs into \emph{snippets}. 
   233 First we will pre-process our programs. This will simplify the
       
   234 definition of our interpreter later on. We will transform
       
   235 programs into \emph{snippets}. A snippet is a label and all
       
   236 code that comes after the label. This essentially means a
       
   237 snippet is a \emph{map} from labels to code. Given that
       
   238 programs are sequences (or lists) of statements, we can easily
       
   239 calculate the snippets by just recursively generating the map.
       
   240 Suppose a program is of the general form
       
   241 
       
   242 \begin{center}
       
   243 $stmt_1\;stmt_2\; \ldots\; stmt_n$
       
   244 \end{center}
       
   245 
       
   246 \noindent In order to calculate the snippets of such a
       
   247 program, we have to go through the statements one by one and
       
   248 check whether they are a label. If yes, we add the label and
       
   249 the remaining statements to our map. If no we just continue
       
   250 with the next statement. To come up with a recursive
       
   251 definition for generating snippets, let us write $[]$ for the
       
   252 program that does not contain any statement. Consider the
       
   253 following definition:
       
   254 
       
   255 \begin{center}
       
   256 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   257 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
       
   258 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
       
   259 $\begin{cases}
       
   260    \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
       
   261    \textit{snippets}(rest)                & \text{otherwise}
       
   262 \end{cases}$   
       
   263 \end{tabular}
       
   264 \end{center}
       
   265 
       
   266 \noindent In the first clause we just return the empty map for
       
   267 the program that does not contain any statements. In the
       
   268 second clause, we have to distinguish the case where the first
       
   269 statement is a label or not. If not, then we just ``throw
       
   270 away'' the label and recursively calculate the snippets for
       
   271 the rest of the program. If yes, then we do the same, but also
       
   272 update the map so that $label$ now points to the rest of the
       
   273 statements. There is one small problem we need to overcome:
       
   274 our two programs have no label as entry point---that is where
       
   275 the execution starts. We always assume that the first
       
   276 statement will be run first. For us it seems convenient if
       
   277 we add to all our programs a default label, say 
       
   278 \pcode{""} (the empty string). With this we can define
       
   279 our preprocessing of programs as follows
       
   280 
       
   281 \begin{center}
       
   282 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
       
   283 \end{center} 
       
   284 
       
   285 \noindent Let us see how this pans out in practice. If we
       
   286 preprocess the factorial program shown earlier we obtain the 
       
   287 following map:
       
   288 
       
   289 \begin{center}
       
   290 \small
       
   291 \lstset{numbers=none,
       
   292         language={},
       
   293         xleftmargin=0mm,
       
   294         aboveskip=0.5mm,
       
   295         frame=single,
       
   296         framerule=0mm,
       
   297         framesep=0mm}
       
   298 \begin{tikzpicture}[node distance=0mm]
       
   299 \node (A1) [draw]{\pcode{""}};
       
   300 \node (B1) [right=of A1] {$\mapsto$};
       
   301 \node (C1) [right=of B1,anchor=north west] {
       
   302 \begin{minipage}{3.5cm}
       
   303 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   304   a := 1
       
   305   n := 5 
       
   306 top:  
       
   307   jmp? n = 0 done 
       
   308   a := a * n 
       
   309   n := n + -1 
       
   310   goto top 
       
   311 done:
       
   312 \end{lstlisting}
       
   313 \end{minipage}};
       
   314 \node (A2) [right=of C1.north east,draw] {\pcode{top}};
       
   315 \node (B2) [right=of A2] {$\mapsto$};
       
   316 \node (C2) [right=of B2, anchor=north west] {
       
   317 \begin{minipage}{3.5cm}
       
   318 \begin{lstlisting}[language={},xleftmargin=0mm]
       
   319   jmp? n = 0 done 
       
   320   a := a * n 
       
   321   n := n + -1 
       
   322   goto top 
       
   323 done:
       
   324 \end{lstlisting} 
       
   325 \end{minipage}}; 
       
   326 \node (A3) [right=of C2.north east,draw] {\pcode{done}};
       
   327 \node (B3) [right=of A3] {$\mapsto$};
       
   328 \node (C3) [right=of B3] {$[]$};
       
   329 \end{tikzpicture}
       
   330 \end{center}
       
   331 
       
   332 \noindent I highlighted the \emph{keys} in this map. Since
       
   333 there are three labels in the factorial program, there are 
       
   334 three keys.
   231 
   335 
   232 \end{document}
   336 \end{document}
   233 
   337 
   234 %%% Local Variables: 
   338 %%% Local Variables: 
   235 %%% mode: latex
   339 %%% mode: latex