handouts/ho09.tex
changeset 355 619073c37649
parent 354 8e5e84b14041
child 356 e1e0f78baa70
equal deleted inserted replaced
354:8e5e84b14041 355:619073c37649
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 \usepackage{../graphics}
     4 \usepackage{../graphics}
     5 \usepackage{../grammar}
     5 \usepackage{../grammar}
     6 \usepackage{multicol}
     6 \usepackage{multicol}
     7 \usepackage{array}
       
     8 
     7 
     9 \begin{document}
     8 \begin{document}
    10 
     9 
    11 \section*{Handout 9 (Static Analysis)}
    10 \section*{Handout 9 (Static Analysis)}
    12 
    11 
    65 with real programming languages. The language we will study
    64 with real programming languages. The language we will study
    66 contains, amongst other things, variables holding integers.
    65 contains, amongst other things, variables holding integers.
    67 Using static analysis, we want to find out what the sign of
    66 Using static analysis, we want to find out what the sign of
    68 these integers (positive or negative) will be when the program
    67 these integers (positive or negative) will be when the program
    69 runs. This sign-analysis seems like a very simple problem. But
    68 runs. This sign-analysis seems like a very simple problem. But
    70 teven such simple problems, if approached naively, are in
    69 even such simple problems, if approached naively, are in
    71 general undecidable, just like Turing's halting problem. I let
    70 general undecidable, just like Turing's halting problem. I let
    72 you think why?
    71 you think why?
    73 
    72 
    74 
    73 
    75 Is sign-analysis of variables an interesting problem? Well,
    74 Is sign-analysis of variables an interesting problem? Well,
    76 yes---if a compiler can find out that for example a variable
    75 yes---if a compiler can find out that for example a variable
    77 will never be negative and this variable is used as an index
    76 will never be negative and this variable is used as an index
    78 for an array, then the comopiler does not need to generate code
    77 for an array, then the compiler does not need to generate code
    79 for an underflow-test. Remember some languages are immune to
    78 for an underflow-test. Remember some languages are immune to
    80 buffer-overflow attacks, but they need to add underflow and
    79 buffer-overflow attacks, but they need to add underflow and
    81 overflow checks everywhere. If the compiler can omit the
    80 overflow checks everywhere. If the compiler can omit the
    82 underflow test, for example, then this can potentially
    81 underflow test, for example, then this can potentially
    83 drastically speed up the generated code.
    82 drastically speed up the generated code. According to the John
       
    83 Regehr, an expert in the field of compilers, overflow checks
       
    84 can cause 5-10\% slowdown, and in some languages even 100\%
       
    85 for tight loops.\footnote{\url{http://blog.regehr.org/archives/1154}}  
    84 
    86 
    85 What do programs in our simple programming language look like?
    87 What do programs in our simple programming language look like?
    86 The following grammar gives a first specification:
    88 The following grammar gives a first specification:
    87 
    89 
    88 \begin{multicols}{2}
    90 \begin{multicols}{2}
   109 and \emph{expressions} as well as \emph{programs}, which are
   111 and \emph{expressions} as well as \emph{programs}, which are
   110 sequences of statements. Statements are either labels,
   112 sequences of statements. Statements are either labels,
   111 variable assignments, conditional jumps (\pcode{jmp?}) and
   113 variable assignments, conditional jumps (\pcode{jmp?}) and
   112 unconditional jumps (\pcode{goto}). Labels are just strings,
   114 unconditional jumps (\pcode{goto}). Labels are just strings,
   113 which can be used as the target of a jump. We assume that in
   115 which can be used as the target of a jump. We assume that in
   114 every program the labels are unique---if there is a clash we
   116 every program the labels are unique---if there is a clash,
   115 do not know where to jump to. The conditional jumps and
   117 then we do not know where to jump to. The conditional jumps
   116 variable assignments involve (arithmetic) expressions.
   118 and variable assignments involve (arithmetic) expressions.
   117 Expressions are either numbers, variables or compound
   119 Expressions are either numbers, variables or compound
   118 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   120 expressions built up from \pcode{+}, \pcode{*} and \emph{=}
   119 (for simplicity reasons we do not consider any other
   121 (for simplicity reasons we do not consider any other
   120 operations). We assume we have negative and positive numbers,
   122 operations). We assume we have negative and positive numbers,
   121 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   123 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
   182 real code basic number formats have a range and might
   184 real code basic number formats have a range and might
   183 over-flow or under-flow from this range. Also the number of
   185 over-flow or under-flow from this range. Also the number of
   184 variables in our programs is potentially unlimited, while
   186 variables in our programs is potentially unlimited, while
   185 memory in an actual computer, of course, is always limited
   187 memory in an actual computer, of course, is always limited
   186 somehow on any actual. To sum up, our language might look
   188 somehow on any actual. To sum up, our language might look
   187 ridiculously simple, but it is not much removed from
   189 ridiculously simple, but it is not far removed from
   188 practically relevant issues.
   190 practically relevant issues.
   189 
   191 
   190 
   192 
   191 \subsubsection*{An Interpreter}
   193 \subsubsection*{An Interpreter}
   192 
   194 
   206 \noindent How should they work? We could introduce Booleans
   208 \noindent How should they work? We could introduce Booleans
   207 (\pcode{true} and \pcode{false}) and then jump only when the
   209 (\pcode{true} and \pcode{false}) and then jump only when the
   208 condition is \pcode{true}. However, since we have numbers in
   210 condition is \pcode{true}. However, since we have numbers in
   209 our language anyway, why not just encoding \pcode{true} as
   211 our language anyway, why not just encoding \pcode{true} as
   210 one, and \pcode{false} as zero? In this way we can dispense
   212 one, and \pcode{false} as zero? In this way we can dispense
   211 with the additional concept of Booleans, {\bf but also we could
   213 with the additional concept of Booleans.
   212 replace the jump above by
       
   213 
       
   214 \begin{center}
       
   215 \code{jmp? 1 + (n + -n) done}
       
   216 \end{center}
       
   217 
       
   218 \noindent which behaves exactly the same. But what does it
       
   219 actually mean that two jumps behave the same? Or two programs
       
   220 for that matter?}
       
   221 
   214 
   222 I hope the above discussion makes it already clear we need to
   215 I hope the above discussion makes it already clear we need to
   223 be a bit more careful with our programs. Below we shall
   216 be a bit more careful with our programs. Below we shall
   224 describe an interpreter for our programming language, which
   217 describe an interpreter for our programming language, which
   225 specifies exactly how programs are supposed to be
   218 specifies exactly how programs are supposed to be
   226 run\ldots{}at least we will specify this for all \emph{good}
   219 run\ldots{}at least we will specify this for all \emph{good}
   227 programs. By good programs I mean where all variables are
   220 programs. By good programs I mean where all variables are
   228 initialised, for example. Our interpreter will just crash if
   221 initialised, for example. Our interpreter will just crash if
   229 it cannot find out the value for a variable, in case it is not
   222 it cannot find out the value for a variable, in case it is not
   230 initialised. Also we will assume that labels in good programs
   223 initialised. Also, we will assume that labels in good programs
   231 are unique, otherwise our programs will calculate ``garbage''.
   224 are unique, otherwise our programs will calculate ``garbage''.
   232 
   225 
   233 First we will pre-process our programs. This will simplify the
   226 First we will pre-process our programs. This will simplify the
   234 definition of our interpreter later on. We will transform
   227 definition of our interpreter later on. We will transform
   235 programs into \emph{snippets}. A snippet is a label and all
   228 programs into \emph{snippets}. Their purpose is to simplify
   236 code that comes after the label. This essentially means a
   229 the definition of what the interpreter should do in case of a
   237 snippet is a \emph{map} from labels to code. Given that
   230 jump. A snippet is a label and all code that comes after the
   238 programs are sequences (or lists) of statements, we can easily
   231 label. This essentially means a snippet is a \emph{map} from
   239 calculate the snippets by just recursively generating the map.
   232 labels to code. Given that programs are sequences (or lists)
       
   233 of statements, we can easily calculate the snippets by just
       
   234 traversing this sequence and recursively generating the map.
   240 Suppose a program is of the general form
   235 Suppose a program is of the general form
   241 
   236 
   242 \begin{center}
   237 \begin{center}
   243 $stmt_1\;stmt_2\; \ldots\; stmt_n$
   238 $stmt_1\;stmt_2\; \ldots\; stmt_n$
   244 \end{center}
   239 \end{center}
   245 
   240 
   246 \noindent In order to calculate the snippets of such a
   241 \noindent The idea is to go through this sequence of
   247 program, we have to go through the statements one by one and
   242 statements one by one and check whether they are a label. If
   248 check whether they are a label. If yes, we add the label and
   243 yes, we add the label and the remaining statements to our map.
   249 the remaining statements to our map. If no we just continue
   244 If no, we just continue with the next statement. To come up
   250 with the next statement. To come up with a recursive
   245 with a recursive definition for generating snippets, let us
   251 definition for generating snippets, let us write $[]$ for the
   246 write $[]$ for the program that does not contain any
   252 program that does not contain any statement. Consider the
   247 statement. Consider the following definition:
   253 following definition:
       
   254 
   248 
   255 \begin{center}
   249 \begin{center}
   256 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
   250 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
   257 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
   251 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
   258 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
   252 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
   262 \end{cases}$   
   256 \end{cases}$   
   263 \end{tabular}
   257 \end{tabular}
   264 \end{center}
   258 \end{center}
   265 
   259 
   266 \noindent In the first clause we just return the empty map for
   260 \noindent In the first clause we just return the empty map for
   267 the program that does not contain any statements. In the
   261 the program that does not contain any statement. In the second
   268 second clause, we have to distinguish the case where the first
   262 clause, we have to distinguish the case where the first
   269 statement is a label or not. If not, then we just ``throw
   263 statement is a label or not. As said before, if not, then we
   270 away'' the label and recursively calculate the snippets for
   264 just ``throw away'' the label and recursively calculate the
   271 the rest of the program. If yes, then we do the same, but also
   265 snippets for the rest of the program. If yes, then we do the
   272 update the map so that $label$ now points to the rest of the
   266 same, but also update the map so that $label$ now points to
   273 statements. There is one small problem we need to overcome:
   267 the rest of the statements. There is one small problem we need
   274 our two programs have no label as entry point---that is where
   268 to overcome: our two programs have no label as \emph{entry
   275 the execution starts. We always assume that the first
   269 point}---that is where the execution starts. We usually assume
   276 statement will be run first. For us it seems convenient if
   270 that the first statement will be run first. To make this the
   277 we add to all our programs a default label, say 
   271 default, it is convenient if we add to all our programs a
   278 \pcode{""} (the empty string). With this we can define
   272 default label, say \pcode{""} (the empty string). With this we
   279 our preprocessing of programs as follows
   273 can define our pre-processing of programs as follows
   280 
   274 
   281 \begin{center}
   275 \begin{center}
   282 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
   276 $\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
   283 \end{center} 
   277 \end{center} 
   284 
   278 
   285 \noindent Let us see how this pans out in practice. If we
   279 \noindent Let us see how this pans out in practice. If we
   286 preprocess the factorial program shown earlier we obtain the 
   280 pre-process the factorial program shown earlier, we obtain the 
   287 following map:
   281 following map:
   288 
   282 
   289 \begin{center}
   283 \begin{center}
   290 \small
   284 \small
   291 \lstset{numbers=none,
   285 \lstset{numbers=none,
   292         language={},
   286         language={},
   293         xleftmargin=0mm,
   287         xleftmargin=0mm,
   294         aboveskip=0.5mm,
   288         aboveskip=0.5mm,
       
   289         belowskip=0.5mm,
   295         frame=single,
   290         frame=single,
   296         framerule=0mm,
   291         framerule=0mm,
   297         framesep=0mm}
   292         framesep=0mm}
   298 \begin{tikzpicture}[node distance=0mm]
   293 \begin{tikzpicture}[node distance=0mm]
   299 \node (A1) [draw]{\pcode{""}};
   294 \node (A1) [draw]{\pcode{""}};
   328 \node (C3) [right=of B3] {$[]$};
   323 \node (C3) [right=of B3] {$[]$};
   329 \end{tikzpicture}
   324 \end{tikzpicture}
   330 \end{center}
   325 \end{center}
   331 
   326 
   332 \noindent I highlighted the \emph{keys} in this map. Since
   327 \noindent I highlighted the \emph{keys} in this map. Since
   333 there are three labels in the factorial program, there are 
   328 there are three labels in the factorial program, there are
   334 three keys.
   329 three keys. When running the factorial program and
       
   330 encountering a jump, then we only have to consult this map, in
       
   331 order to find out what the next instruction should be.
       
   332 
       
   333 We should now be in the position to define how a program
       
   334 should be run. This ``running'' of programs is often called 
       
   335 \emph{evaluation}. Let us start with the definition for
       
   336 how expressions are evaluated. A first attempt is the 
       
   337 following recursive function:
       
   338 
       
   339 \begin{center}
       
   340 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
       
   341 $\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
       
   342 $\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
       
   343 $\begin{cases}
       
   344    \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
       
   345    \textit{snippets}(rest)                & \text{otherwise}
       
   346 \end{cases}$   
       
   347 \end{tabular}
       
   348 \end{center}
       
   349 
   335 
   350 
   336 \end{document}
   351 \end{document}
   337 
   352 
   338 %%% Local Variables: 
   353 %%% Local Variables: 
   339 %%% mode: latex
   354 %%% mode: latex