handouts/ho09.tex
changeset 359 c90f803dc7ea
parent 357 5b91f5ad2772
child 360 eb2004430215
equal deleted inserted replaced
358:8787c16bc26e 359:c90f803dc7ea
     9 
     9 
    10 \section*{Handout 9 (Static Analysis)}
    10 \section*{Handout 9 (Static Analysis)}
    11 
    11 
    12 If we want to improve the safety and security of our programs,
    12 If we want to improve the safety and security of our programs,
    13 we need a more principled approach to programming. Testing is
    13 we need a more principled approach to programming. Testing is
    14 good, but as Dijkstra famously wrote: 
    14 good, but as Edsger Dijkstra famously wrote: 
    15 
    15 
    16 \begin{quote}\it 
    16 \begin{quote}\it 
    17 ``Program testing can be a very effective way to show the
    17 ``Program testing can be a very effective way to show the
    18 \underline{\smash{presence}} of bugs, but it is hopelessly
    18 \underline{\smash{presence}} of bugs, but it is hopelessly
    19 inadequate for showing their \underline{\smash{absence}}.''
    19 inadequate for showing their \underline{\smash{absence}}.''
    75 yes---if a compiler can find out that for example a variable
    75 yes---if a compiler can find out that for example a variable
    76 will never be negative and this variable is used as an index
    76 will never be negative and this variable is used as an index
    77 for an array, then the compiler does not need to generate code
    77 for an array, then the compiler does not need to generate code
    78 for an underflow-check. Remember some languages are immune to
    78 for an underflow-check. Remember some languages are immune to
    79 buffer-overflow attacks, but they need to add underflow and
    79 buffer-overflow attacks, but they need to add underflow and
    80 overflow checks everywhere. According to Regehr, an expert in
    80 overflow checks everywhere. According to John Regher, an
    81 the field of compilers, overflow checks can cause 5-10\%
    81 expert in the field of compilers, overflow checks can cause
    82 slowdown, and in some languages even 100\% for tight
    82 5-10\% slowdown, in some languages even 100\% for tight
    83 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
    83 loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
    84 the compiler can omit the underflow check, for example, then
    84 the compiler can omit the underflow check, for example, then
    85 this can potentially drastically speed up the generated code. 
    85 this can potentially drastically speed up the generated code. 
    86 
    86 
    87 What do programs in our simple programming language look like?
    87 What do programs in our simple programming language look like?
   120 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   120 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   121 (for simplicity reasons we do not consider any other
   121 (for simplicity reasons we do not consider any other
   122 operations). We assume we have negative and positive numbers,
   122 operations). We assume we have negative and positive numbers,
   123 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   123 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   124 \pcode{2}\ldots{} An example program that calculates the
   124 \pcode{2}\ldots{} An example program that calculates the
   125 factorial of 5 is in oure programming language as follows:
   125 factorial of 5 is in our programming language as follows:
   126 
   126 
   127 \begin{lstlisting}[language={},xleftmargin=10mm]
   127 \begin{lstlisting}[language={},xleftmargin=10mm]
   128       a := 1
   128       a := 1
   129       n := 5 
   129       n := 5 
   130 top:  
   130 top:  
   217 describe an interpreter for our programming language, which
   217 describe an interpreter for our programming language, which
   218 specifies exactly how programs are supposed to be
   218 specifies exactly how programs are supposed to be
   219 run\ldots{}at least we will specify this for all \emph{good}
   219 run\ldots{}at least we will specify this for all \emph{good}
   220 programs. By good programs I mean where all variables are
   220 programs. By good programs I mean where all variables are
   221 initialised, for example. Our interpreter will just crash if
   221 initialised, for example. Our interpreter will just crash if
   222 it cannot find out the value for a variable, in case it is not
   222 it cannot find out the value for a variable when it is not
   223 initialised. Also, we will assume that labels in good programs
   223 initialised. Also, we will assume that labels in good programs
   224 are unique, otherwise our programs will calculate ``garbage''.
   224 are unique, otherwise our programs will calculate ``garbage''.
   225 
   225 
   226 First we will pre-process our programs. This will simplify the
   226 First we will pre-process our programs. This will simplify the
   227 definition of our interpreter later on. We will transform
   227 definition of our interpreter later on. By pre-processing our
   228 programs into \emph{snippets}. Their purpose is to simplify
   228 programs we will transform programs into \emph{snippets}.
   229 the definition of what the interpreter should do in case of a
   229 Their purpose is to simplify the definition of what the
   230 jump. A snippet is a label and all code that comes after the
   230 interpreter should do in case of a jump. A snippet is a label
   231 label. This essentially means a snippet is a \emph{map} from
   231 and all the code that comes after the label. This essentially
   232 labels to code. Given that programs are sequences (or lists)
   232 means a snippet is a \emph{map} from labels to
   233 of statements, we can easily calculate the snippets by just
   233 code.\footnote{Be sure you know what maps are. In a
   234 traversing this sequence and recursively generating the map.
   234 programming context they are often represented as association
   235 Suppose a program is of the general form
   235 list where some data is associated with a key.} 
       
   236 
       
   237 Given that programs are sequences (or lists) of statements, we
       
   238 can easily calculate the snippets by just traversing this
       
   239 sequence and recursively generating the map. Suppose a program
       
   240 is of the general form
   236 
   241 
   237 \begin{center}
   242 \begin{center}
   238 $stmt_1\;stmt_2\; \ldots\; stmt_n$
   243 $stmt_1\;stmt_2\; \ldots\; stmt_n$
   239 \end{center}
   244 \end{center}
   240 
   245 
   328 \noindent I highlighted the \emph{keys} in this map. Since
   333 \noindent I highlighted the \emph{keys} in this map. Since
   329 there are three labels in the factorial program (remember we
   334 there are three labels in the factorial program (remember we
   330 added \pcode{""}), there are three keys. When running the
   335 added \pcode{""}), there are three keys. When running the
   331 factorial program and encountering a jump, then we only have
   336 factorial program and encountering a jump, then we only have
   332 to consult this snippets-map in order to find out what the
   337 to consult this snippets-map in order to find out what the
   333 next instruction should be.
   338 next statements should be.
   334 
   339 
   335 We should now be in the position to define how a program
   340 We should now be in the position to define how a program
   336 should be run. In the context of interpreters, this
   341 should be run. In the context of interpreters, this
   337 ``running'' of programs is often called \emph{evaluation}. Let
   342 ``running'' of programs is often called \emph{evaluation}. Let
   338 us start with the definition of how expressions are evaluated.
   343 us start with the definition of how expressions are evaluated.
   431 \end{tabular}
   436 \end{tabular}
   432 \end{center}
   437 \end{center}
   433  
   438  
   434 \noindent This environment $env$ also acts like a map: it
   439 \noindent This environment $env$ also acts like a map: it
   435 associates variable with their current values. For example
   440 associates variable with their current values. For example
   436 such an environment might look as follows
   441 ߆after evaluating the first two lines in our factorial
       
   442 program, such an environment might look as follows
   437 
   443 
   438 \begin{center}
   444 \begin{center}
   439 \begin{tabular}{ll}
   445 \begin{tabular}{ll}
   440 $\fbox{\texttt{a}} \mapsto \texttt{1}$ &
   446 $\fbox{\texttt{a}} \mapsto \texttt{1}$ &
   441 $\fbox{\texttt{n}} \mapsto \texttt{5}$
   447 $\fbox{\texttt{n}} \mapsto \texttt{5}$
   442 \end{tabular}
   448 \end{tabular}
   443 \end{center}
   449 \end{center}
   444 
   450 
   445 \noindent Again I hilighted the keys. In the clause for
   451 \noindent Again I highlighted the keys. In the clause for
   446 variables, we can therefore consult this environment and
   452 variables, we can therefore consult this environment and
   447 return whatever value is currently stored for this variable.
   453 return whatever value is currently stored for this variable.
   448 This is written $env(x)$. If we query this map with $x$ we
   454 This is written $env(x)$. If we query this map with $x$ we
   449 obtain the corresponding number. You might ask what happens if
   455 obtain the corresponding number. You might ask what happens if
   450 an environment does not contain any value for, say, the
   456 an environment does not contain any value for, say, the
   451 variable $x$? Well, then our interpreter just crashes, or will
   457 variable $x$? Well, then our interpreter just ``crashes'', or
   452 raise an exception. In this case we have a ``bad'' program
   458 more precisely will raise an exception. In this case we have a
   453 that tried to use a variable before it was initialised. The
   459 ``bad'' program that tried to use a variable before it was
   454 programmer should not have don this. With the second version
   460 initialised. The programmer should not have done this. In a
   455 of \textit{eval\_exp} we completed our definition for
   461 real programming language we would of course try a bit harder
   456 evaluating expressions.
   462 and for example give an error at compile time, or design our
       
   463 language in such a way that this can never happen. With the
       
   464 second version of \textit{eval\_exp} we completed our
       
   465 definition for evaluating expressions.
   457  
   466  
   458 Next comes the evaluation function for statements. We define
   467 Next comes the evaluation function for statements. We define
   459 this function in such a way that we evaluate a whole sequence
   468 this function in such a way that we recursively evaluate a
   460 of statements. Assume a program $p$ and its preprocessed 
   469 whole sequence of statements. Assume a program $p$ (you want
   461 snippets $sn$. Then we define:
   470 to evaluate) and its pre-processed snippets $sn$. Then we can
       
   471 define:
   462 
   472 
   463 \begin{center}
   473 \begin{center}
   464 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   474 \begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
   465 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
   475 $\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
   466 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
   476 $\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
   467   $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ 
   477   $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ 
   468 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & 
   478 $\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & 
   469   $\dn$ & $\textit{eval\_stmts}(rest, 
   479   $\dn$ & $\textit{eval\_stmts}(rest, 
   470            env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\  
   480            env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\  
       
   481 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ 
       
   482  & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
   471 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ 
   483 $\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ 
   472  & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
   484  & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
   473  \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ 
   485  \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ 
   474  & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
   486  & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
   475  \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
   487  \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
   476  \end{array}\end{cases}$\\
   488  \end{array}\end{cases}$  
   477 $\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ 
       
   478  & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$            
       
   479 \end{tabular}
   489 \end{tabular}
   480 \end{center}
   490 \end{center}
   481 
   491 
   482 
   492 \noindent The first clause is for the empty program, or when
   483 
   493 we arrived at the end of the program. In this case we just
   484 
   494 return the environment. The second clause is for when the next
       
   495 statement is a label. That means the program is of the form
       
   496 $\texttt{label:}\;rest$ where the label is some string and
       
   497 $rest$ stands for all following statement. This case is easy,
       
   498 because our evaluation function just discards the label and
       
   499 evaluates the rest of the statements (we already extracted all
       
   500 important information about labels when we pre-processed our
       
   501 programs and generated the snippets). The third clause is for
       
   502 variable assignments. Again we just evaluate the rest for the
       
   503 statements, but with a modified environment---since the
       
   504 variable assignment is supposed to introduce a new variable or
       
   505 change the current value of a variable. For this modification
       
   506 of the environment we first evaluate the expression
       
   507 $\texttt{e}$ using our evaluation function for expressions.
       
   508 This gives us a number. Then we assign this number to the
       
   509 variable $x$ in the environment. This modified environment
       
   510 will be used to evaluate the rest of the program. The fourth
       
   511 clause is for the unconditional jump to a label. That means we
       
   512 have to look up in our snippets map $sn$ what are the next
       
   513 statements for this label are. Therefore we will continue with
       
   514 evaluating, not with the rest of the program, but with the
       
   515 statements stored in the snippets-map under the label
       
   516 $\texttt{lbl}$. The fifth clause for conditional jumps is
       
   517 similar, but in order to make the jump we first need to
       
   518 evaluate the expression $\texttt{e}$ in order to find out
       
   519 whether it is $1$. If yes, we jump, otherwise we just continue
       
   520 with evaluating the rest of the program.
       
   521 
       
   522 Our interpreter works in two stages: First we pre-process our
       
   523 program generating the snippets map $sn$, say. Second we call
       
   524 the evaluation function with the default entry point and the
       
   525 empty environment:
       
   526 
       
   527 \begin{center}
       
   528 $\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
       
   529 \end{center}
       
   530  
       
   531 \noindent It is interesting to not that our interpreter when
       
   532 it comes to the end of the program returns an environment. Our
       
   533 programming language does not contain any constructs for input
       
   534 and output. Therefore this environment is the only effect we
       
   535 can observe when running the program (apart from that our
       
   536 interpreter might need some time before finishing the
       
   537 evaluation of the program)
   485 
   538 
   486 
   539 
   487 
   540 
   488  
   541  
   489 \end{document}
   542 \end{document}