afl-material: changeset 677:decfd8cf8180

Binary file handouts/ho09.pdf has changed

--- a/handouts/ho09.tex	Sun Oct 27 13:55:43 2019 +0000
+++ b/handouts/ho09.tex	Sun Oct 27 15:16:22 2019 +0000
@@ -1,613 +1,123 @@
+% !TEX program = xelatex
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
 \usepackage{../graphics}
 \usepackage{../grammar}
-\usepackage{multicol}
+%%\usepackage{multicol}
 
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
+%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
 
 \begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014}
-
-%% why are shuttle flights so good with software
-%%http://www.fastcompany.com/28121/they-write-right-stuff
-
-\section*{Handout 9 (Static Analysis)}
-
-If we want to improve the safety and security of our programs,
-we need a more principled approach to programming. Testing is
-good, but as Edsger Dijkstra famously wrote: 
-
-\begin{quote}\it 
-``Program testing can be a very effective way to show the
-\underline{\smash{presence}} of bugs, but it is hopelessly
-inadequate for showing their \underline{\smash{absence}}.''
-\end{quote}
-
-\noindent While such a more principled approach has been the
-subject of intense study for a long, long time, only in the
-past few years some impressive results have been achieved. One
-is the complete formalisation and (mathematical) verification
-of a microkernel operating system called seL4.
-
-\begin{center}
-\url{http://sel4.systems}
-\end{center}
-
-\noindent In 2011 this work was included in the MIT Technology
-Review in the annual list of the world’s ten most important
-emerging
-technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
-While this work is impressive, its technical details are too
-enormous for an explanation here. Therefore let us look at
-something much simpler, namely finding out properties about
-programs using \emph{static analysis}.
-
-Static analysis is a technique that checks properties of a
-program without actually running the program. This should
-raise alarm bells with you---because almost all interesting
-properties about programs  are equivalent to the halting
-problem, which we know is undecidable. For example estimating
-the memory consumption of programs is in general undecidable,
-just like the halting problem. Static analysis circumvents
-this undecidability-problem by essentially allowing answers
-\emph{yes} and \emph{no}, but also \emph{don't know}. With
-this ``trick'' even the halting problem becomes
-decidable\ldots{}for example we could always say \emph{don't
-know}. Of course this would be silly. The point is that we
-should be striving for a method that answers as often as
-possible either \emph{yes} or \emph{no}---just in cases when
-it is too difficult we fall back on the
-\emph{don't-know}-answer. This might sound all like abstract
-nonsense. Therefore let us look at a concrete example.
-
-
-\subsubsection*{A Simple, Idealised Programming Language}
-
-Our starting point is a small, idealised programming language.
-It is idealised because we cut several corners in comparison
-with real programming languages. The language we will study
-contains, amongst other things, variables holding integers.
-Using static analysis, we want to find out what the sign of
-these integers (positive or negative) will be when the program
-runs. This sign-analysis seems like a very simple problem. But
-even such simple problems, if approached naively, are in
-general undecidable, just like Turing's halting problem. I let
-you think why?
-
-
-Is sign-analysis of variables an interesting problem? Well,
-yes---if a compiler can find out that for example a variable
-will never be negative and this variable is used as an index
-for an array, then the compiler does not need to generate code
-for an underflow-check. Remember some languages are immune to
-buffer-overflow attacks, but they need to add underflow and
-overflow checks everywhere. According to John Regher, an
-expert in the field of compilers, overflow checks can cause
-5-10\% slowdown, in some languages even 100\% for tight
-loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
-the compiler can omit the underflow check, for example, then
-this can potentially drastically speed up the generated code. 
-
-What do programs in our simple programming language look like?
-The following grammar gives a first specification:
-
-\begin{multicols}{2}
-\begin{plstx}[rhs style=,one per line,left margin=9mm]
-: \meta{Stmt} ::= \meta{label} \texttt{:}
-                | \meta{var} \texttt{:=} \meta{Exp}
-                | \texttt{jmp?} \meta{Exp} \meta{label}
-                | \texttt{goto} \meta{label}\\
-: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
-\end{plstx}
-\columnbreak
-\begin{plstx}[rhs style=,one per line]
: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
-               | \meta{Exp} \texttt{*} \meta{Exp}
-               | \meta{Exp} \texttt{=} \meta{Exp} 
-               | \meta{num}
-               | \meta{var}\\
-\end{plstx}
-\end{multicols}
-
-\noindent I assume you are familiar with such
-grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
-There are three main syntactic categories: \emph{statments}
-and \emph{expressions} as well as \emph{programs}, which are
-sequences of statements. Statements are either labels,
-variable assignments, conditional jumps (\pcode{jmp?}) and
-unconditional jumps (\pcode{goto}). Labels are just strings,
-which can be used as the target of a jump. We assume that in
-every program the labels are unique---if there is a clash,
-then we do not know where to jump to. The conditional jumps
-and variable assignments involve (arithmetic) expressions.
-Expressions are either numbers, variables or compound
-expressions built up from \pcode{+}, \pcode{*} and \emph{=}
-(for simplicity reasons we do not consider any other
-operations). We assume we have negative and positive numbers,
-\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{} An example program that calculates the
-factorial of 5 is in our programming language as follows:
-
-\begin{lstlisting}[language={},xleftmargin=10mm]
-      a := 1
-      n := 5 
-top:  
-      jmp? n = 0 done 
-      a := a * n 
-      n := n + -1 
-      goto top 
-done:
-\end{lstlisting}
-
-\noindent As can be seen each line of the program contains a
-statement. In the first two lines we assign values to the
-variables \pcode{a} and \pcode{n}. In line 4 we test whether
-\pcode{n} is zero, in which case we jump to the end of the
-program marked with the label \pcode{done}. If \pcode{n} is
-not zero, we multiply the content of \pcode{a} by \pcode{n},
-decrease \pcode{n} by one and jump back to the beginning of
-the loop, marked with the label \pcode{top}. Another program
-in our language is shown in Figure~\ref{fib}. I let you think
-what it calculates.
- 
-\begin{figure}[t]
-\begin{lstlisting}[numbers=none,
-                   language={},xleftmargin=10mm]
-      n := 6
-      m1 := 0
-      m2 := 1
-loop: 
-      jmp? n = 0 done
-      tmp := m2
-      m2 := m1 + m2
-      m1 := tmp
-      n := n + -1
-      goto top
-done:
-\end{lstlisting}
-\caption{A mystery program in our idealised programming language.
-Try to find out what it calculates! \label{fib}}
-\end{figure}
-
-Even if our language is rather small, it is still Turing
-complete---meaning quite powerful. However, discussing this
-fact in more detail would lead us too far astray. Clearly, our
-programming is rather low-level and not very comfortable for
-writing programs. It is inspired by real machine code, which
-is the code that is executed by a CPU. So a more interesting
-question is what is missing in comparison with real machine
-code? Well, not much\ldots{}in principle. Real machine code,
-of course, contains many more arithmetic instructions (not
-just addition and multiplication) and many more conditional
-jumps. We could add these to our language if we wanted, but
-complexity is really beside the point here. Furthermore, real
-machine code has many instructions for manipulating memory. We
-do not have this at all. This is actually a more serious
-simplification because we assume numbers to be arbitrary small
-or large, which is not the case with real machine code. In
-real machine code, basic number formats have a range and might
-over-flow or under-flow this range. Also the number of
-variables in our programs is potentially unlimited, while
-memory in an actual computer, of course, is always limited. To
-sum up, our language might look ridiculously simple, but it is
-not too far removed from practically relevant issues.
+\fnote{\copyright{} Christian Urban, King's College London, 2019}
 
 
-\subsubsection*{An Interpreter}
-
-Designing a language is like playing god: you can say what
-names for variables you allow; what programs should look like;
-most importantly you can decide what each part of the program
-should mean and do. While our language is quite simple and the
-meaning of statements, for example, is rather straightforward,
-there are still places where we need to make real choices. For
-example consider the conditional jumps, say the one in the
-factorial program: 
-
-\begin{center}
-\code{jmp? n = 0 done}
-\end{center}
-
-\noindent How should they work? We could introduce Booleans
-(\pcode{true} and \pcode{false}) and then jump only when the
-condition is \pcode{true}. However, since we have numbers in
-our language anyway, why not just encoding \pcode{true} as
-one, and \pcode{false} as zero? In this way we can dispense
-with the additional concept of Booleans.
-
-I hope the above discussion makes it already clear we need to
-be a bit more careful with our programs. Below we shall
-describe an interpreter for our programming language, which
-specifies exactly how programs are supposed to be
-run\ldots{}at least we will specify this for all \emph{good}
-programs. By good programs I mean where all variables are
-initialised, for example. Our interpreter will just crash if
-it cannot find out the value for a variable when it is not
-initialised. Also, we will assume that labels in good programs
-are unique, otherwise our programs will calculate ``garbage''.
-
-First we will pre-process our programs. This will simplify the
-definition of our interpreter later on. By pre-processing our
-programs we will transform programs into \emph{snippets}. A
-snippet is a label and all the code that comes after the
-label. This essentially means a snippet is a \emph{map} from
-labels to code.\footnote{Be sure you know what maps are. In a
-programming context they are often represented as association
-list where some data is associated with a key.} 
+\section*{Handout 9 (LLVM, SSA and CPS)}
 
-Given that programs are sequences (or lists) of statements, we
-can easily calculate the snippets by just traversing this
-sequence and recursively generating the map. Suppose a program
-is of the general form
-
-\begin{center}
-$stmt_1\;stmt_2\; \ldots\; stmt_n$
-\end{center}
-
-\noindent The idea is to go through this sequence of
-statements one by one and check whether they are a label. If
-yes, we add the label and the remaining statements to our map.
-If no, we just continue with the next statement. To come up
-with a recursive definition for generating snippets, let us
-write $[]$ for the program that does not contain any
-statement. Consider the following definition:
-
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
-$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
-$\begin{cases}
-   \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
-   \textit{snippets}(rest)                & \text{otherwise}
-\end{cases}$   
-\end{tabular}
-\end{center}
-
-\noindent In the first clause we just return the empty map for
-the program that does not contain any statement. In the second
-clause, we have to distinguish the case where the first
-statement is a label or not. As said before, if not, then we
-just ``throw away'' the label and recursively calculate the
-snippets for the rest of the program (the otherwise clause).
-If yes, then we do the same, but also update the map so that
-$label$ now points to the rest of the statements (the if
-clause). This looks all realtively straightforward, but there
-is one small problem we need to overcome: our two programs
-shown so far have no label as \emph{entry point}---that is
-where the execution is supposed to start. We usually assume
-that the first statement will be run first. To make this the
-default, it is convenient if we add to all our programs a
-default label, say \pcode{""} (the empty string). With this we
-can define our pre-processing of programs as follows
+Reflecting on our tiny compiler targetting the JVM, code generation was
+actually not so hard, no? One of the main reason for this ease is that
+the JVM is a stack-based virtual machine and it is, for example, not
+hard to translate arithmetic expressions into instructions manipulating
+the stack. The problem is that ``real'' CPUs, although supporting stack
+operations, are not really \emph{stack machines}. They are just not
+optimised for this way of calculating things. The design of CPUs is more
+like, here is a piece of memory---compiler, or compiler writer, do
+something with it. Otherwise get lost. So in the name of raw speed,
+modern compilers go the extra mile and generate code that is much easier
+and faster to process by CPUs. 
 
-\begin{center}
-$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$ 
-\end{center} 
-
-\noindent Let us see how this pans out in practice. If we
-pre-process the factorial program shown earlier, we obtain the 
-following map:
+Another reason why it makes sense to go the extra mile is that stack
+instructions are very difficult to optimise---you cannot just re-arrange
+instructions without messing about with what is calculated on the stack.
+Also it is hard to find out if all the calculations on the stack are
+actually necessary and not by chance dead code. The JVM has for all this
+sophisticated machinery to make such ``high-level'' code still run fast,
+but let's say that for the sake of argument we want to not rely on it.
+We want to generate fast code ourselves. This means we have to work around
+the intricacies of what instructions CPUs can actually process. To make
+this all tractable, we target the LLVM Intermediate Language. In this way
+we can take advantage of the tools coming with LLVM and for example do not
+have to worry about things like that CPUs have only a limited amount of
+registers. 
 
-\begin{center}
-\small
-\lstset{numbers=none,
-        language={},
-        xleftmargin=0mm,
-        aboveskip=0.5mm,
-        belowskip=0.5mm,
-        frame=single,
-        framerule=0mm,
-        framesep=0mm}
-\begin{tikzpicture}[node distance=0mm]
-\node (A1) [draw]{\pcode{""}};
-\node (B1) [right=of A1] {$\mapsto$};
-\node (C1) [right=of B1,anchor=north west] {
-\begin{minipage}{3.5cm}
-\begin{lstlisting}[language={},xleftmargin=0mm]
-  a := 1
-  n := 5 
-top:  
-  jmp? n = 0 done 
-  a := a * n 
-  n := n + -1 
-  goto top 
-done:
-\end{lstlisting}
-\end{minipage}};
-\node (A2) [right=of C1.north east,draw] {\pcode{top}};
-\node (B2) [right=of A2] {$\mapsto$};
-\node (C2) [right=of B2, anchor=north west] {
-\begin{minipage}{3.5cm}
-\begin{lstlisting}[language={},xleftmargin=0mm]
-  jmp? n = 0 done 
-  a := a * n 
-  n := n + -1 
-  goto top 
-done:
-\end{lstlisting} 
-\end{minipage}}; 
-\node (A3) [right=of C2.north east,draw] {\pcode{done}};
-\node (B3) [right=of A3] {$\mapsto$};
-\node (C3) [right=of B3] {$[]$};
-\end{tikzpicture}
-\end{center}
-
-\noindent I highlighted the \emph{keys} in this map. Since
-there are three labels in the factorial program (remember we
-added \pcode{""}), there are three keys. When running the
-factorial program and encountering a jump, then we only have
-to consult this snippets-map in order to find out what the
-next statements should be.
-
-We should now be in the position to define how a program
-should be run. In the context of interpreters, this
-``running'' of programs is often called \emph{evaluation}. Let
-us start with the definition of how expressions are evaluated.
-A first attempt might be the following recursive function:
-
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$
-  \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
-  \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{}\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
-                    \texttt{e}_\texttt{2})$ & 
-  $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) + 
-           \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
-                    \texttt{e}_\texttt{2})$ & 
-  $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) * 
-           \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
-                    \texttt{e}_\texttt{2})$ & 
-  $\dn$ & $\begin{cases}
-  1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =
-                 \textit{eval\_exp}(\texttt{e}_\texttt{2})\\
-  0 & \text{otherwise}
-  \end{cases}$          
-\end{tabular}
-\end{center}
+LLVM\footnote{\url{http://llvm.org}} is a beautiful example that
+projects from Academia can make a difference in the world. LLVM was
+started in 2000 by two researchers at the  University of Illinois at
+Urbana–Champaign. At the time the behemoth of compilers was gcc with its
+myriad of front-ends for other languages (e.g.~gfortran, Ada, Go,
+Objective-C, Pascal etc). The problem was that gcc morphed over time
+into a monolithic gigantic piece of m\ldots ehm software, which you could
+not mess about in an afternoon. In contrast LLVM was a modular suite of
+tools with which you could play around easily and try out something new.
+LLVM became a big player once Apple hired one of the original developers
+(I cannot remember the reason why Apple did not want to use gcc, but
+maybe they were also just disgusted by big monolithic codebase). Anyway,
+LLVM is now the big player and gcc is more or less legacy. This does not
+mean that programming languages like C and C++ are dying out any time
+soon---they are nicely supported by LLVM.
 
-\noindent While this should look all rather intuitive`, still
-be very careful. There is a subtlety which can be easily
-overlooked: The function \textit{eval\_exp} takes an
-expression of our programming language as input and returns a
-number as output. Therefore whenever we encounter a number in
-our program, we just return this number---this is defined in
-the first clause above. Whenever we encounter an addition,
-well then we first evaluate the left-hand side
-$\texttt{e}_\texttt{1}$ of the addition (this will give a
-number), then evaluate the right-hand side
-$\texttt{e}_\texttt{2}$ (this gives another number), and
-finally add both numbers together. Here is the subtlety: on
-the left-hand side of the $\dn$ we have a \texttt{+} (in the
-teletype font) which is the symbol for addition in our
-programming language. On the right-hand side we have $+$ which
-stands for the arithmetic operation from ``mathematics'' of
-adding two numbers. These are rather different concepts---one
-is a symbol (which we made up), and the other a mathematical
-operation. When we will have a look at an actual
-implementation of our interpreter, the mathematical operation
-will be the function for addition from the programming
-language in which we \underline{\smash{implement}} our
-interpreter. While the \texttt{+} is just a symbol that is
-used in our programming language. Clearly we have to use a
-symbol that is a good mnemonic for addition, otherwise we will
-confuse the programmers working with our language. Therefore
-we use $\texttt{+}$. A similar choice is made for times in the
-third clause and equality in the fourth clause. 
+Targetting the LLVM-IR also means we can profit from the very modular
+structure of the LLVM compiler and let for example the compiler generate
+code for X86, or ARM etc. We can be agnostic about where our code
+actually runs. However, what we have to do is to generate code in
+\emph{Static Single-Assignment} format (short SSA), because that is what
+the LLVM-IR expects from us and which is the intermediate format that
+LLVM can use to do cool things (like targetting strange architectures)
+and allocating memory efficiently. 
 
-Remember I wrote at the beginning of this section about being
-god when designing a programming language. You can see this
-here: we need to give meaning to symbols. At the moment
-however, we are a poor fallible god. Look again at the grammar
-of our programming language and our definition. Clearly, an
-expression can contain variables. So far we have ignored them.
-What should our interpreter do with variables? They might
-change during the evaluation of a program. For example the
-variable \pcode{n} in the factorial program counts down from 5
-down to 0. How can we improve our definition above to give also
-an answer whenever our interpreter encounters a variable in an
-expression? The solution is to add an \emph{environment},
-written $env$, as an additional input argument to our
-\textit{eval\_exp} function.
- 
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$
-  \qquad\text{if}\; \texttt{n}\; \text{is a number like} 
-  \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{}\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\, 
-                    \texttt{e}_\texttt{2}, env)$ & 
-  $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) + 
-           \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\, 
-                    \texttt{e}_\texttt{2}, env)$ & 
-  $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) * 
-           \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\, 
-                    \texttt{e}_\texttt{2}, env)$ & 
-  $\dn$ & $\begin{cases}
-  1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =
-                 \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\
-  0 & \text{otherwise}
-  \end{cases}$\\
-$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$      
-\end{tabular}
-\end{center}
- 
-\noindent This environment $env$ also acts like a map: it
-associates variables with their current values. For example
-after evaluating the first two lines in our factorial
-program, such an environment might look as follows
+The idea behind SSA is to use very simple variable assignments where
+every variable is assigned only once. The assignments also need to be
+extremely primitive in the sense that they can be just simple operations
+like addition, multiplication, jumps and so on. A typical program in SSA
+is 
 
-\begin{center}
-\begin{tabular}{ll}
-$\fbox{\texttt{a}} \mapsto \texttt{1}$ &
-$\fbox{\texttt{n}} \mapsto \texttt{5}$
-\end{tabular}
-\end{center}
+\begin{lstlisting}[language=LLVM,numbers=none]
+    x := 1
+    y := 2     
+    z := x + y
+\end{lstlisting}
 
-\noindent Again I highlighted the keys. In the clause for
-variables, we can therefore consult this environment and
-return whatever value is currently stored for this variable.
-This is written $env(x)$---meaning we query this map with $x$
-we obtain the corresponding number. You might ask what happens
-if an environment does not contain any value for, say, the
-variable $x$? Well, then our interpreter just ``crashes'', or
-more precisely will raise an exception. In this case we have a
-``bad'' program that tried to use a variable before it was
-initialised. The programmer should not have done this. In a
-real programming language, we would of course try a bit harder
-and for example give an error at compile time, or design our
-language in such a way that this can never happen. With the
-second version of \textit{eval\_exp} we completed our
-definition for evaluating expressions.
- 
-Next comes the evaluation function for statements. We define
-this function in such a way that we recursively evaluate a
-whole sequence of statements. Assume a program $p$ (you want
-to evaluate) and its pre-processed snippets $sn$. Then we can
-define:
+\noindent where every variable is used only once. You cannot for example have
+
+\begin{lstlisting}[language=LLVM,numbers=none]
+    x := 1
+    y := 2     
+    x := x + y
+\end{lstlisting}
 
-\begin{center}
-\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
-$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
-$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
-  $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\ 
-$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ & 
-  $\dn$ & $\textit{eval\_stmts}(rest, 
-           env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\  
-$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$ 
- & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
-$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$ 
- & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
- \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\ 
- & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
- \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
- \end{array}\end{cases}$  
-\end{tabular}
-\end{center}
+\noindent because in this snippet \texttt{x} is assigned twice. There
+are sophisticated algorithm for imperative languages, like C, for
+efficiently transforming a program into SSA format, but we do not have
+to be concerned about those. We want to compile a functional language
+and there things get much more interesting than just sophisticated. We
+will need to have a look at CPS translations, which stands for
+Continuation-Passing-Style---basically black art. So sit tight.
 
-\noindent The first clause is for the empty program, or when
-we arrived at the end of the program. In this case we just
-return the environment. The second clause is for when the next
-statement is a label. That means the program is of the form
-$\texttt{label:}\;rest$ where the label is some string and
-$rest$ stands for all following statements. This case is easy,
-because our evaluation function just discards the label and
-evaluates the rest of the statements (we already extracted all
-important information about labels when we pre-processed our
-programs and generated the snippets). The third clause is for
-variable assignments. Again we just evaluate the rest for the
-statements, but with a modified environment---since the
-variable assignment is supposed to introduce a new variable or
-change the current value of a variable. For this modification
-of the environment we first evaluate the expression
-$\texttt{e}$ using our evaluation function for expressions.
-This gives us a number. Then we assign this number to the
-variable $x$ in the environment. This modified environment
-will be used to evaluate the rest of the program. The fourth
-clause is for the unconditional jump to a label, called
-\texttt{lbl}. That means we have to look up in our snippets
-map $sn$ what are the next statements for this label.
-Therefore we will continue with evaluating, not with the rest
-of the program, but with the statements stored in the
-snippets-map under the label $\texttt{lbl}$. The fifth clause
-for conditional jumps is similar, but to decide whether to
-make the jump we first need to evaluate the expression
-$\texttt{e}$ in order to find out whether it is $1$. If yes,
-we jump, otherwise we just continue with evaluating the rest
-of the program.
-
-Our interpreter works in two stages: First we pre-process our
-program generating the snippets map $sn$, say. Second we call
-the evaluation function with the default entry point and the
-empty environment:
+\subsection*{CPS-Translations}
 
-\begin{center}
-$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
-\end{center}
- 
-\noindent It is interesting to note that our interpreter when
-it comes to the end of the program returns an environment. Our
-programming language does not contain any constructs for input
-and output. Therefore this environment is the only effect we
-can observe when running the program (apart from that our
-interpreter might need some time before finishing the
-evaluation of the program and the CPU getting hot). Evaluating 
-the factorial program with our interpreter we receive as
-``answer''-environment
+What is good about our simple fun language is that it basically only 
+contains expressions (be they arithmetic expressions or boolean expressions).
+The only exceptions are function definitions, for which we can easily
+use the mechanism of defining functions in LLVM-IR. For example the simple
+fun program 
 
-\begin{center}
-\begin{tabular}{ll}
-$\fbox{\texttt{a}} \mapsto \texttt{120}$ &
-$\fbox{\texttt{n}} \mapsto \texttt{0}$
-\end{tabular}
-\end{center}
-
-\noindent While the discussion above should have illustrated
-the ideas, in order to do some serious calculations, we clearly
-need to implement the interpreter.
-
-
-\subsubsection*{Scala Code for the Interpreter}
+\begin{lstlisting}[language=Scala,numbers=none]
+def dble(x) = x * x
+\end{lstlisting}
 
-Functional programming languages are very convenient for
-implementations of interpreters. A good choice for a
-functional programming language is Scala, a programming
-language that combines functional and object-oriented
-pro\-gramming-styles. It has received in the last five years or
-so quite a bit of attention. One reason for this attention is
-that, like the Java programming language, Scala compiles to
-the Java Virtual Machine (JVM) and therefore Scala programs
-can run under MacOSX, Linux and Windows.\footnote{There are
-also experimental backends for Android and JavaScript.} Unlike
-Java, however, Scala often allows programmers to write very
-concise and elegant code. Some therefore say Scala is the much
-better Java. A number of companies, The Guardian, Twitter,
-Coursera, FourSquare, LinkedIn to name a few, either use Scala
-exclusively in production code, or at least to some
-substantial degree. If you want to try out Scala yourself, the
-Scala compiler can be downloaded from
-
-\begin{quote}
-\url{http://www.scala-lang.org}
-\end{quote}
+\noindent
+can be compiled into the following LLVM-IR function:
 
-Let us have a look at the Scala code shown in
-Figure~\ref{code}. It shows the entire code for the
-interpreter, though the implementation is admittedly no
-frills.
+\begin{lstlisting}[language=LLVM]
+define i32 dble(i32 %x) {
+   %tmp =  mul i32 %x, % x
+   ret i32 %tmp
+}    
+\end{lstlisting}
 
-\begin{figure}[t]
-\small
-\lstinputlisting[language=Scala]{../progs/inter.scala}
-\caption{The entire code of the interpreter for our
-idealised programming language.\label{code}}
-\end{figure}
-
-
-\subsubsection*{Static Analysis}
-
-Finally we can come back to our original problem, namely 
-finding out what the signs of variables are 
-
-\begin{center}
-
-
-\end{center}
  
 \end{document}
 
-%% list of static analysers for C
-http://spinroot.com/static/index.html
-
-%% NASA coding rules for C
-http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf
 
 %%% Local Variables: 
 %%% mode: latex

author	Christian Urban <urbanc@in.tum.de>
	Sun, 27 Oct 2019 15:16:22 +0000
changeset 677	decfd8cf8180
parent 676	62d168bf7ac8
child 678	ff3b48da282c

handouts/ho09.pdf		file \| annotate \| diff \| comparison \| revisions
handouts/ho09.tex		file \| annotate \| diff \| comparison \| revisions