handouts/ho09.tex
changeset 351 cd8d18f7b7ac
parent 350 54d6fc856950
child 352 da5713bcdbb0
equal deleted inserted replaced
350:54d6fc856950 351:cd8d18f7b7ac
     9 
     9 
    10 \section*{Handout 9 (Static Analysis)}
    10 \section*{Handout 9 (Static Analysis)}
    11 
    11 
    12 If we want to improve the safety and security of our programs,
    12 If we want to improve the safety and security of our programs,
    13 we need a more principled approach to programming. Testing is
    13 we need a more principled approach to programming. Testing is
    14 good, but as Dijkstra famously said: 
    14 good, but as Dijkstra famously wrote: 
    15 
    15 
    16 \begin{quote}\it 
    16 \begin{quote}\it 
    17 ``Program testing can be a very effective way to show the
    17 ``Program testing can be a very effective way to show the
    18 \underline{\smash{presence}} of bugs, but it is hopelessly
    18 \underline{\smash{presence}} of bugs, but it is hopelessly
    19 inadequate for showing their \underline{\smash{absence}}.''
    19 inadequate for showing their \underline{\smash{absence}}.''
    39 programs using \emph{static analysis}.
    39 programs using \emph{static analysis}.
    40 
    40 
    41 Static analysis is a technique that checks properties of a
    41 Static analysis is a technique that checks properties of a
    42 program without actually running the program. This should
    42 program without actually running the program. This should
    43 raise alarm bells with you---because almost all interesting
    43 raise alarm bells with you---because almost all interesting
    44 properties about programs are equivalent to the halting
    44 properties about programs  are equivalent to the halting
    45 problem, which we know is undecidable. For example estimating
    45 problem, which we know is undecidable. For example estimating
    46 the memory consumption of programs is in general undecidable,
    46 the memory consumption of programs is in general undecidable,
    47 just like the halting problem. Static analysis circumvents
    47 just like the halting problem. Static analysis circumvents
    48 this undecidability-problem by essentially allowing answers
    48 this undecidability-problem by essentially allowing answers
    49 \emph{yes} and \emph{no}, but also \emph{don't know}. With
    49 \emph{yes} and \emph{no}, but also \emph{don't know}. With
    58 
    58 
    59 
    59 
    60 \subsubsection*{A Simple, Idealised Programming Language}
    60 \subsubsection*{A Simple, Idealised Programming Language}
    61 
    61 
    62 Our starting point is a small, idealised programming language.
    62 Our starting point is a small, idealised programming language.
    63 This language, amongst other things, contains variables
    63 It is idealised because we cut several corners in comparison
    64 holding integers. We want to find out what the sign of these
    64 with real programming languages. The language we will study
    65 integers will be when the program runs. This sign-analysis
    65 contains, amongst other things, variables holding integers. We
       
    66 want to find out what the sign of these integers (positive or
       
    67 negative) will be when the program runs. This sign-analysis
    66 seems like a very simple problem, but it will turn out even
    68 seems like a very simple problem, but it will turn out even
    67 such simple problems, if approached naively, are in general
    69 such simple problems, if approached naively, are in general
    68 undecidable, just like Turing's halting problem. I let you
    70 undecidable, just like Turing's halting problem. I let you
    69 think why?
    71 think why?
    70 
    72 
    72 Is sign-analysis of variables an interesting problem? Well,
    74 Is sign-analysis of variables an interesting problem? Well,
    73 yes---if a compiler can find out that for example a variable
    75 yes---if a compiler can find out that for example a variable
    74 will never be negative and this variable is used as an index
    76 will never be negative and this variable is used as an index
    75 for an array, then the compiler does not need to generate code
    77 for an array, then the compiler does not need to generate code
    76 for an underflow-test. Remember some languages are immune to
    78 for an underflow-test. Remember some languages are immune to
    77 buffer-overflow attacks because they add underflow and
    79 buffer-overflow attacks, but they need to add underflow and
    78 overflow checks everywhere. If the compiler can omit the
    80 overflow checks everywhere. If the compiler can omit the
    79 underflow test, for example, then this can potentially
    81 underflow test, for example, then this can potentially
    80 drastically speed up the generated code.
    82 drastically speed up the generated code.
    81 
    83 
    82 What do programs in our programming language look like? The 
    84 What do programs in our programming language look like? The 
   133 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
   135 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
   134 is zero, in which case we jump to the end of the program
   136 is zero, in which case we jump to the end of the program
   135 marked with the label \pcode{done}. If \pcode{n} is not zero,
   137 marked with the label \pcode{done}. If \pcode{n} is not zero,
   136 we multiply the content of \pcode{a} by \pcode{n}, decrease
   138 we multiply the content of \pcode{a} by \pcode{n}, decrease
   137 \pcode{n} by one and jump back to the beginning of the loop,
   139 \pcode{n} by one and jump back to the beginning of the loop,
   138 marked with the label \pcode{top}.
   140 marked with the label \pcode{top}. Another program in our
       
   141 language is shown in Figure~\ref{fib}. I let you think what it
       
   142 calculates.
   139  
   143  
   140 \begin{figure}[t]
   144 \begin{figure}[t]
   141 \begin{lstlisting}[numbers=none,
   145 \begin{lstlisting}[numbers=none,
   142                    language={},xleftmargin=10mm]
   146                    language={},xleftmargin=10mm]
   143       n := 6
   147       n := 6
   150       m1 := tmp
   154       m1 := tmp
   151       n := n + -1
   155       n := n + -1
   152       goto top
   156       goto top
   153 done:
   157 done:
   154 \end{lstlisting}
   158 \end{lstlisting}
   155 \label{A mystery program in our idealised programming language.}
   159 \caption{A mystery program in our idealised programming language.
       
   160 Try to find out what it calculates! \label{fib}}
   156 \end{figure}
   161 \end{figure}
   157 
   162 
   158 Even if our language is rather small, it is still Turing
   163 Even if our language is rather small, it is still Turing
   159 complete---so rather powerful. However, discussing this more
   164 complete---meaning quite powerful. However, discussing this
   160 would lead us to far astray.
   165 fact in more detail would lead us too far astray. Clearly, our
       
   166 programming is rather low-level and not very comfortable for
       
   167 writing programs. It is inspired by machine code, which is the
       
   168 code that is actually executed by a CPU. So a more interesting
       
   169 question is what is missing in comparison with real machine
       
   170 code? Well, not much\ldots{}in principle. Real machine code,
       
   171 of course, contains many more arithmetic instructions (not
       
   172 just addition and multiplication) and many more conditional
       
   173 jumps. We could add these to our language if we wanted, but
       
   174 complexity is really beside the point here. Furthermore, real
       
   175 machine code has many instructions for manipulating memory. We
       
   176 do not have this at all. This is actually a more serious
       
   177 simplification because we assume numbers to be arbitrary
       
   178 precision, which is not the case with real machine code. In
       
   179 real code basic number formats have a range and might
       
   180 over-flow or under-flow from this range. Also the numbers of
       
   181 variables in our programs is unlimited, while memory, of
       
   182 course, is always limited somehow on any actual machine. To
       
   183 sum up, our language might look very simple, but it is not
       
   184 completely removed from practically relevant issues.
   161 
   185 
   162 What would be missing in comparison with real
   186 
   163 (low-level machine) code? Well, the numbers we assume to be
   187 \subsubsection*{An Interpreter}
   164 arbitrary precision, which is not the case in real code. There
   188 
   165 basic number formats have a rang and might over-run or
   189 Designing a language is like being god: you can say what 
   166 under-run from this range. Our assumption about variables,
   190 each part of the program should mean. 
   167 does not correspond to actual registers, which are only 
       
   168 limited on real hardware. Obviously, real code has richer
       
   169 operations than just addition, multiplication and equality.
       
   170 But this are not really essential limitations of our simple
       
   171 examples.
       
   172 
   191 
   173 \end{document}
   192 \end{document}
   174 
   193 
   175 %%% Local Variables: 
   194 %%% Local Variables: 
   176 %%% mode: latex
   195 %%% mode: latex