handouts/ho09.tex
changeset 350 54d6fc856950
parent 347 efad8155513f
child 351 cd8d18f7b7ac
equal deleted inserted replaced
349:9f4f626cefea 350:54d6fc856950
    27 
    27 
    28 \begin{center}
    28 \begin{center}
    29 \url{http://sel4.systems}
    29 \url{http://sel4.systems}
    30 \end{center}
    30 \end{center}
    31 
    31 
    32 \noindent This work was in 2011 included in the MIT Technology
    32 \noindent In 2011 this work was included in the MIT Technology
    33 Review in the annual list of the world’s ten most important
    33 Review in the annual list of the world’s ten most important
    34 emerging
    34 emerging
    35 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
    35 technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
    36 While this work is impressive, its technical details are too
    36 While this work is impressive, its technical details are too
    37 enormous for an explanation here. Therefore let us look at
    37 enormous for an explanation here. Therefore let us look at
    38 something much simpler, namely finding out properties about
    38 something much simpler, namely finding out properties about
    39 programs using \emph{static analysis}.
    39 programs using \emph{static analysis}.
    40 
    40 
    41 Static analysis is one technique that checks properties of a
    41 Static analysis is a technique that checks properties of a
    42 program without actually running the program. This should
    42 program without actually running the program. This should
    43 raise alarm bells with you---because almost all interesting
    43 raise alarm bells with you---because almost all interesting
    44 properties about programs are equivalent to the halting
    44 properties about programs are equivalent to the halting
    45 problem, which we know is undecidable. For example estimating
    45 problem, which we know is undecidable. For example estimating
    46 the memory consumption of programs is in general undecidable,
    46 the memory consumption of programs is in general undecidable,
    58 
    58 
    59 
    59 
    60 \subsubsection*{A Simple, Idealised Programming Language}
    60 \subsubsection*{A Simple, Idealised Programming Language}
    61 
    61 
    62 Our starting point is a small, idealised programming language.
    62 Our starting point is a small, idealised programming language.
    63 This language contains variables holding integers. We want to
    63 This language, amongst other things, contains variables
    64 find out what the sign of these integers will be when the
    64 holding integers. We want to find out what the sign of these
    65 program runs. This seems like a very simple problem, but it
    65 integers will be when the program runs. This sign-analysis
    66 will turn out even such a simple analysis if approached
    66 seems like a very simple problem, but it will turn out even
    67 naively is in general undecidable, just like Turing's halting
    67 such simple problems, if approached naively, are in general
    68 problem. I let you think why?
    68 undecidable, just like Turing's halting problem. I let you
       
    69 think why?
    69 
    70 
    70 
    71 
    71 Is sign-analysis of variables an interesting problem? Well,
    72 Is sign-analysis of variables an interesting problem? Well,
    72 yes---if a compiler can find out that for example a variable
    73 yes---if a compiler can find out that for example a variable
    73 will never be negative and this variable is used as an index
    74 will never be negative and this variable is used as an index
    74 for an array, then the compiler does not need to generate code
    75 for an array, then the compiler does not need to generate code
    75 for an underflow-test. Remember some languages are immune to
    76 for an underflow-test. Remember some languages are immune to
    76 buffer-overflow attacks because they add bound checks
    77 buffer-overflow attacks because they add underflow and
    77 everywhere. This could potentially drastically speed up the
    78 overflow checks everywhere. If the compiler can omit the
    78 generated code.
    79 underflow test, for example, then this can potentially
       
    80 drastically speed up the generated code.
    79 
    81 
    80 Since we want to 
    82 What do programs in our programming language look like? The 
       
    83 following grammar gives a first specification:
    81 
    84 
    82 \begin{multicols}{2}
    85 \begin{multicols}{2}
    83 \begin{plstx}[rhs style=,one per line,left margin=9mm]
    86 \begin{plstx}[rhs style=,one per line,left margin=9mm]
       
    87 : \meta{Stmt} ::= \meta{label} \texttt{:}
       
    88                 | \meta{var} \texttt{:=} \meta{Exp}
       
    89                 | \texttt{jmp?} \meta{Exp} \meta{label}
       
    90                 | \texttt{goto} \meta{label}\\
       
    91 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
       
    92 \end{plstx}
       
    93 \columnbreak
       
    94 \begin{plstx}[rhs style=,one per line]
    84 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
    95 : \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
    85                | \meta{Exp} \texttt{*} \meta{Exp}
    96                | \meta{Exp} \texttt{*} \meta{Exp}
    86                | \meta{Exp} \texttt{=} \meta{Exp} 
    97                | \meta{Exp} \texttt{=} \meta{Exp} 
    87                | \meta{num}
    98                | \meta{num}
    88                | \meta{var}\\
    99                | \meta{var}\\
    89 \end{plstx}
   100 \end{plstx}
    90 \columnbreak
       
    91 \begin{plstx}[rhs style=,one per line]
       
    92 : \meta{Stmt} ::= \meta{label} \texttt{:}
       
    93                 | \meta{var} \texttt{:=} \meta{Exp}
       
    94                 | \texttt{jmp?} \meta{Exp} \meta{label}
       
    95                 | \texttt{goto} \meta{label}\\
       
    96 : \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
       
    97 \end{plstx}
       
    98 \end{multicols}
   101 \end{multicols}
    99 
   102 
   100 \begin{lstlisting}[numbers=none,
   103 \noindent I assume you are familiar with such
   101                    language={},xleftmargin=10mm]
   104 grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
       
   105 There are three main syntactic categories: \emph{statments}
       
   106 and \emph{expressions} as well as \emph{programs}, which are
       
   107 sequences of statements. Statements are either labels,
       
   108 variable assignments, conditional jumps (\pcode{jmp?}) and
       
   109 unconditional jumps (\pcode{goto}). Labels are just strings,
       
   110 which can be used as the target of a jump. The conditional
       
   111 jumps and variable assignments involve (arithmetic)
       
   112 expressions. Expressions are either numbers, variables or
       
   113 compound expressions built up from \pcode{+}, \pcode{*} and
       
   114 \emph{=} (for simplicity reasons we do not consider any other
       
   115 operations). We assume we have negative and positive numbers,
       
   116 \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
       
   117 \pcode{2}\ldots{} An example program that calculates the
       
   118 factorial of 5 is as follows:
       
   119 
       
   120 \begin{lstlisting}[language={},xleftmargin=10mm]
   102       a := 1
   121       a := 1
   103       n := 5 
   122       n := 5 
   104 top:  jmp? n = 0 done 
   123 top:  
       
   124       jmp? n = 0 done 
   105       a := a * n 
   125       a := a * n 
   106       n := n + -1 
   126       n := n + -1 
   107       goto top 
   127       goto top 
   108 done:
   128 done:
   109 \end{lstlisting}
   129 \end{lstlisting}
   110 
   130 
       
   131 \noindent Each line of the program contains a statement. In
       
   132 the first two lines we assign values to the variables
       
   133 \pcode{a} and \pcode{n}. In line 4 we test whether \pcode{n}
       
   134 is zero, in which case we jump to the end of the program
       
   135 marked with the label \pcode{done}. If \pcode{n} is not zero,
       
   136 we multiply the content of \pcode{a} by \pcode{n}, decrease
       
   137 \pcode{n} by one and jump back to the beginning of the loop,
       
   138 marked with the label \pcode{top}.
       
   139  
       
   140 \begin{figure}[t]
   111 \begin{lstlisting}[numbers=none,
   141 \begin{lstlisting}[numbers=none,
   112                    language={},xleftmargin=10mm]
   142                    language={},xleftmargin=10mm]
   113       n := 6
   143       n := 6
   114       m1 := 0
   144       m1 := 0
   115       m2 := 1
   145       m2 := 1
   116 loop: jmp? n = 0 done
   146 loop: 
       
   147       jmp? n = 0 done
   117       tmp := m2
   148       tmp := m2
   118       m2 := m1 + m2
   149       m2 := m1 + m2
   119       m1 := tmp
   150       m1 := tmp
   120       n := n + -1
   151       n := n + -1
   121       goto top
   152       goto top
   122 done:
   153 done:
   123 \end{lstlisting}
   154 \end{lstlisting}
       
   155 \label{A mystery program in our idealised programming language.}
       
   156 \end{figure}
   124 
   157 
   125 \bigskip
   158 Even if our language is rather small, it is still Turing
       
   159 complete---so rather powerful. However, discussing this more
       
   160 would lead us to far astray.
   126 
   161 
   127 \noindent What would be missing in comparison with real
   162 What would be missing in comparison with real
   128 (low-level machine) code? Well, the numbers we assume to be
   163 (low-level machine) code? Well, the numbers we assume to be
   129 arbitrary precision, which is not the case in real code. There
   164 arbitrary precision, which is not the case in real code. There
   130 basic number formats have a rang and might over-run or
   165 basic number formats have a rang and might over-run or
   131 under-run from this range. Our assumption about variables,
   166 under-run from this range. Our assumption about variables,
   132 does not correspond to actual registers, which are only 
   167 does not correspond to actual registers, which are only