handouts/ho09.tex
changeset 913 eef6a56c185a
parent 912 e32802acf952
child 917 89e05a230d2d
equal deleted inserted replaced
912:e32802acf952 913:eef6a56c185a
    17 
    17 
    18 \section*{Handout 9 (LLVM, SSA and CPS)}
    18 \section*{Handout 9 (LLVM, SSA and CPS)}
    19 
    19 
    20 Reflecting on our two tiny compilers targeting the JVM, the code
    20 Reflecting on our two tiny compilers targeting the JVM, the code
    21 generation part was actually not so hard, no? Pretty much just some
    21 generation part was actually not so hard, no? Pretty much just some
    22 post-order traversal of the abstract syntax tree, yes? One of the reasons
    22 post-order traversal of the abstract syntax tree. Yes? One of the reasons
    23 for this ease is that the JVM is a stack-based virtual machine and it
    23 for this ease is that the JVM is a stack-based virtual machine and it
    24 is therefore not hard to translate deeply-nested arithmetic
    24 is therefore not hard to translate deeply-nested arithmetic
    25 expressions into a sequence of instructions manipulating the
    25 expressions into a sequence of instructions manipulating the
    26 stack. The problem is that ``real'' CPUs, although supporting stack
    26 stack. That is pretty much the whole point of the JVM. The problem is
    27 operations, are not really designed to be \emph{stack machines}.  The
    27 that ``real'' CPUs, although supporting stack operations, are not
    28 design of CPUs is more like: Here are some instructions and a chunk of
    28 really designed to be \emph{stack machines}.  The design of CPUs is
       
    29 more like: Here are some instructions and a chunk of
    29 memory---compiler, or better, compiler writers, do something with
    30 memory---compiler, or better, compiler writers, do something with
    30 them. Consequently, modern compilers need to go the extra mile in
    31 them. Consequently, modern compilers need to go the extra mile in
    31 order to generate code that is much easier and faster to process by
    32 order to generate code that is much easier and faster to process by
    32 actual CPUs, rather than running code on virtual machines that
    33 actual CPUs, rather than running code on virtual machines that
    33 introduce an additional layer of indirection. To make this all
    34 introduce an additional layer of indirection. To make this all
    34 tractable for this module, we target the LLVM Intermediate
    35 tractable for this module, we target the \emph{LLVM Intermediate
    35 Language. In this way we can take advantage of the tools coming with
    36   Language} (LLVM-IR). In this way we can take advantage of the tools
    36 LLVM.\footnote{\url{http://llvm.org}} For example we do not have to
    37 coming with LLVM.\footnote{\url{http://llvm.org}} For example we do
    37 worry about things like register allocation. By using the LLVM-IR,
    38 not have to worry about things like register allocation or peephole
    38 however, we also have to pay a price in the sense that generating code
    39 optimisations. By using the LLVM-IR, however, we also have to pay
    39 gets harder\ldots{}unfor\-tunately.
    40 a price in the sense that generating code gets
       
    41 harder\ldots{}unfor\-tunately nothing comes for free in life.
    40 
    42 
    41 \subsection*{LLVM and the LLVM-IR}
    43 \subsection*{LLVM and the LLVM-IR}
    42 
    44 
    43 \noindent LLVM is a beautiful example
    45 \noindent LLVM is a beautiful example
    44 that projects from Academia can make a difference in the World. LLVM
    46 that projects from Academia can make a difference in the Real World. LLVM
    45 started in 2000 as a project by two researchers at the  University of
    47 started in 2000 as a project by two researchers at the  University of
    46 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
    48 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
    47 gcc with its myriad of front-ends for different programming languages (C++, Fortran,
    49 gcc with its myriad of front-ends for different programming languages (C++, Fortran,
    48 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
    50 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
    49 time into a monolithic gigantic piece of m\ldots ehm complicated
    51 time into a monolithic gigantic piece of m\ldots ehm complicated
    71 However, what we have to do in order to make LLVM play ball is to
    73 However, what we have to do in order to make LLVM play ball is to
    72 generate code in \emph{Static Single-Assignment} format (short SSA). A
    74 generate code in \emph{Static Single-Assignment} format (short SSA). A
    73 reason why LLVM uses the SSA-format, rather than JVM-like stack
    75 reason why LLVM uses the SSA-format, rather than JVM-like stack
    74 instructions, is that stack instructions are difficult to
    76 instructions, is that stack instructions are difficult to
    75 optimise---you cannot just re-arrange instructions without messing
    77 optimise---you cannot just re-arrange instructions without messing
    76 about with what is calculated on the stack. Also it is hard to find
    78 about with what is calculated on the stack. Have a look at the
    77 out if all the calculations on the stack are actually necessary and
    79 expression $((a + b) * 4) - (3 * (a + b))$ and the corresponding JVM
    78 not by chance dead code. The JVM has for all these obstacles
    80 instructions:
    79 sophisticated machinery to make such ``high-level'' code still run
    81 
    80 fast, but let's say that for the sake of argument we do not want to
    82 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
    81 rely on it. We want to generate fast code ourselves. This means we
    83 iload a
    82 have to work around the intricacies of what instructions CPUs can
    84 iload b
    83 actually process fast. This is what the SSA format is designed for.
    85 iadd
       
    86 ldc 4
       
    87 imul
       
    88 ldc 3
       
    89 iload a
       
    90 iload b
       
    91 iadd
       
    92 imul
       
    93 isub
       
    94 \end{lstlisting}
       
    95 
       
    96 \noindent
       
    97 and try to reorganise the code such that you calculate the expression
       
    98 $(a + b)$ only once. This requires either quite a bit of
       
    99 math-understanding from the compiler or the addition of a ``copy-and-fetch''
       
   100 of the result from a local variable.  Also it is hard to find out if all
       
   101 the calculations on the stack are actually necessary and not by chance
       
   102 dead code. For all these obstacles, the JVM has sophisticated machinery
       
   103 to make such ``high-level'' code still run fast, but let's say that
       
   104 for the sake of argument we do not want to rely on it. We want to
       
   105 generate fast code ourselves. This means we have to work around the
       
   106 intricacies of what instructions CPUs can actually process fast. This
       
   107 is what the SSA format is designed for.
    84 
   108 
    85 
   109 
    86 The main idea behind the SSA-format is to have sequences of very
   110 The main idea behind the SSA-format is to have sequences of very
    87 simple variable assignments where every (tmp)variable is assigned only
   111 simple variable assignments where every (tmp)variable is assigned only
    88 once. The assignments need to be simple in the sense that they can be
   112 once. The assignments need to be simple in the sense that they can be
   103 
   127 
   104 \noindent where every tmpX-variable is used only once (we could, for
   128 \noindent where every tmpX-variable is used only once (we could, for
   105 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if
   129 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if
   106 this would not change the overall result). At the end we have a
   130 this would not change the overall result). At the end we have a
   107 return-instruction which contains the final result of the
   131 return-instruction which contains the final result of the
   108 expression. As can be seen, the task we have to solve is to take apart
   132 expression. As can be seen, the task we have to solve for generating
   109 compound expressions as shown above and transform them into a sequence
   133 SSA-code is to take apart compound expressions into their most basic
   110 of simple assignments. Note that this for example means we have to
   134 ``particles'' and transform them into a sequence of simple assignments
   111 fix the order in which the expression is calculated. 
   135 that calculates the desired result. Note that this means we have to
       
   136 fix the order in which the expression is calculated, like from
       
   137 left to right.
   112 
   138 
   113 There are sophisticated algorithms for imperative languages, like C,
   139 There are sophisticated algorithms for imperative languages, like C,
   114 that efficiently transform high-level programs into SSA-format. But
   140 that efficiently transform high-level programs into SSA-format. But
   115 we can ignore them here. We want to compile a functional language and
   141 we can ignore them here. We want to compile a functional language and
   116 there things get much more interesting than just sophisticated. We
   142 there things get much more interesting than just sophisticated. We
   353   case KReturn(v: KVal)
   379   case KReturn(v: KVal)
   354 }
   380 }
   355 \end{lstlisting}
   381 \end{lstlisting}
   356 
   382 
   357 \noindent
   383 \noindent
   358 By having in \texttt{KLet} taking first a string (standing for a
   384 By having \texttt{KLet} take first a string (standing for a
   359 tmp-variable) and second a value, we can fulfil the SSA constraint in
   385 tmp-variable) and second a value, we can fulfil the SSA constraint in
   360 assignments ``by con\-struction''---there is no way we could write
   386 assignments ``by con\-struction''---there is no way we could write
   361 anything else than a K-value.  Note that the third argument of a
   387 anything other than a K-value.  Note that the third argument of a
   362 \texttt{KLet} is again a K-expression, meaning either another
   388 \texttt{KLet} is again a K-expression, meaning either another
   363 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a
   389 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a
   364 sequence of computations and indicate what is the final result of the
   390 sequence of computations and indicate what the final result of the
   365 computations.  According to the SSA-format, we also have to ensure
   391 computations is.  According to the SSA-format, we also have to ensure
   366 that all intermediary computations are given (fresh) names---we will
   392 that all intermediary computations are given (fresh) names---we will
   367 use an (ugly) counter for this.
   393 use an (ugly) counter for this.
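
To give an idea of how such a counter might look, here is a minimal
sketch in Scala (the names \texttt{counter} and \texttt{Fresh} are only
for illustration and might differ slightly from the accompanying code):

\begin{lstlisting}[language=Scala,numbers=none]
// a counter for generating fresh names for tmp-variables (sketch)
var counter = -1

def Fresh(x: String) : String = {
  counter += 1
  s"${x}_$counter"
}
\end{lstlisting}

\noindent
Every call, say \texttt{Fresh("tmp")}, then yields a new name
\texttt{tmp\_0}, \texttt{tmp\_1} and so on, which we can use for the
variables bound by \texttt{KLet}.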
   368 
   394 
   369 To sum up, K-values are the atomic operations that can be on the
   395 To sum up, K-values are the atomic operations that can be on the
   370 right-hand side of assignments. The K-language is restricted such that
   396 right-hand side of assignments. The K-language is restricted such that
   371 it is easy to generate the SSA-format for the LLVM-IR. What remains to
   397 it is easy to generate the SSA-format for the LLVM-IR. What remains to
   372 be done is a translation of our Fun-language into the K-language. The
   398 be done is a translation of our Fun-language into the K-language. The
   373 main difficulty is that we need to order the computation---in the
   399 main difficulty is that we need to order the computation---in the
   374 K-language we only have linear sequence of instructions. Before we
   400 K-language we only have linear sequences of instructions. Before we
   375 explain this, we have a look at some programs in CPS-style.
   401 explain this, we have a look at some programs in CPS-style.
   376 
   402 
   377 
   403 
   378 
   404 
   379 
   405 
   477 
   503 
   478 \section*{Worked Example}
   504 \section*{Worked Example}
   479 
   505 
   480 Let us now come back to the CPS-translations for the Fun-language.
   506 Let us now come back to the CPS-translations for the Fun-language.
   481 Though we will start with a simpler subset containing only numbers,
   507 Though we will start with a simpler subset containing only numbers,
   482 arithmetic expressions and function calls.  The main difficulty of
   508 arithmetic expressions and function calls---no variables for the
   483 generating instructions in SSA-format is that large compound
   509 moment.  The main difficulty of generating instructions in SSA-format
   484 expressions need to be broken up into smaller pieces and intermediate
   510 is that large compound expressions need to be broken up into smaller
   485 results need to be chained into later instructions. To do this
   511 pieces and intermediate results need to be chained into later
   486 conveniently, we use the CPS-translation mechanism. What continuations
   512 instructions. To do this conveniently, we use the CPS-translation
   487 essentially solve is the following problem: Given an expression
   513 mechanism. What the continuations essentially solve is the following
       
   514 problem: Given an expression
   488 
   515 
   489 \begin{equation}\label{exp}
   516 \begin{equation}\label{exp}
   490 (1 + 2) * (3 + 4)
   517 (1 + 2) * (3 + 4)
   491 \end{equation}  
   518 \end{equation}  
   492 
   519 
   493 \noindent
   520 \noindent
   494 which of the subexpressions should be calculated first? We just
   521 which of the subexpressions should be calculated first? We are arbitrarily
   495 arbitrarily going to decide that the calculation will be from left to
   522 going to decide that the calculation will be from left to
   496 right. Other languages make different choices---C famously is right to
   523 right. Other languages make different choices---C famously is right to
   497 left. In our case this means we have to tear the expression shown in
   524 left. In our case this means we have to tear the expression shown in
   498 \eqref{exp} apart as follows:
   525 \eqref{exp} apart as follows:
   499 
   526 
   500 \[
   527 \[
   567 \noindent 
   594 \noindent 
   568 where \code{k} is the continuation and \code{e} is the expression to
   595 where \code{k} is the continuation and \code{e} is the expression to
   569 be compiled. The result of the function is a K-expression, which later
   596 be compiled. The result of the function is a K-expression, which later
   570 can be compiled into LLVM-IR code.
   597 can be compiled into LLVM-IR code.
   571 
   598 
   572 In case we have numbers, then we can just pass them in CPS-translation
   599 In case we have numbers, then we can just pass them in the CPS-translation
   573 to the continuations because numbers need not be further torn apart
   600 to the continuations because numbers need not be further torn apart
   574 as they are already primitive. Passing the number to the continuation
   601 as they are already primitive. Passing the number to the continuation
   575 means we apply the continuation like
   602 means we apply the continuation like
   576 
   603 
   577 \begin{lstlisting}[language=Scala,numbers=none]
   604 \begin{lstlisting}[language=Scala,numbers=none]
   593 \end{lstlisting}
   620 \end{lstlisting}
   594 
   621 
   595 \noindent
   622 \noindent
   596 What we essentially have to do in this case is the following: compile
   623 What we essentially have to do in this case is the following: compile
   597 the subexpressions \texttt{e1} and \texttt{e2}. They will produce some
   624 the subexpressions \texttt{e1} and \texttt{e2}. They will produce
   598 result that is stored in two temporary variables (assuming they are more
   625 results that are stored in two temporary variables (assuming \texttt{e1} and \texttt{e2} are more
   599 complicated than just numbers). We need to use these two temporary
   626 complicated than just numbers). We need to use these two temporary
   600 variables and feed them into a new assignment of the form
   627 variables and feed them into a new assignment of the form
   601 
   628 
   602 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
   629 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
   603 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
   630 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
   619 as type for the continuation. Once we have created the assignment with the
   646 as type for the continuation. Once we have created the assignment with the
   620 fresh temporary variable \texttt{z}, we need to ``communicate'' that
   647 fresh temporary variable \texttt{z}, we need to ``communicate'' that
   621 the result of the computation of the arithmetic expression is stored
   648 the result of the computation of the arithmetic expression is stored
   622 in \texttt{z} to the computations that follow. In this way we apply
   649 in \texttt{z} to the computations that follow. In this way we apply
   623 the continuation \texttt{k} with this new variable (essentially we are
   650 the continuation \texttt{k} to this new variable (essentially we are
   624 plugging in a hole further down the line).  Hope this makes sense.
   651 plugging it into a hole further down the line).  Hope this makes sense!? If not,
       
   652 play with the given code yourself.
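
For concreteness, here is a sketch of how this case of the
CPS-translation could look in Scala. It assumes a constructor
\texttt{Aop} for arithmetic expressions in the Fun-language, the
K-value constructors \texttt{Kop} and \texttt{KVar}, and a function
\texttt{Fresh} for generating fresh names---the accompanying code
spells out the precise details:

\begin{lstlisting}[language=Scala,numbers=none]
// sketch: CPS-translation for a binary operation e1 o e2
case Aop(o, e1, e2) => {
  val z = Fresh("tmp")              // fresh name for the result
  CPS(e1)(r1 =>                     // compile e1; r1 stands for its result
    CPS(e2)(r2 =>                   // compile e2; r2 stands for its result
      KLet(z, Kop(o, r1, r2),       // let z = r1 o r2 in ...
        k(KVar(z)))))               // ...then plug z into the hole k
}
\end{lstlisting}

\noindent
Note how the recursive calls nest: the continuation given to the first
call contains the compilation of \texttt{e2}, and only at the innermost
point do we emit the \texttt{KLet} and hand the fresh variable
\texttt{z} over to the original continuation \texttt{k}.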
   625 
   653 
   626 The last case we need to consider in our small expression language is
   654 The last case we need to consider in our small expression language is
   627 function calls. For them, remember that each argument of the function
   655 function calls. For them, remember that each argument of the function
   628 call can in SSA-format only be a variable or a number.
   656 call can in SSA-format only be a variable or a number. Here is the
       
   657 complete code for this case:
   629 
   658 
   630 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm]
   659 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm]
   631 case Call(fname, args) => {
   660 case Call(fname, args) => {
   632   def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
   661   def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
   633        case Nil => {
   662        case Nil => {
   639   aux(args, Nil)  
   668   aux(args, Nil)  
   640 }
   669 }
   641 \end{lstlisting}
   670 \end{lstlisting}
   642 
   671 
   643 \noindent
   672 \noindent
   644 For this case we introduce an auxiliary function that compiles first all
   673 As can be seen, we introduce an auxiliary function that first compiles all
   645 function arguments---remember in our source language we can have calls
   674 function arguments---remember in our source language we can have calls
   646 such as $foo(3 + 4, g(3))$ where we first have to create temporary
   675 such as $foo(3 + 4, g(3))$ where we first have to create temporary
   647 variables (and computations) for each argument. Therefore \texttt{aux}
   676 variables (and computations) for each argument. Therefore \texttt{aux}
   648 analyses the argument list from left to right. In case there is an
   677 analyses the argument list from left to right. In case there is an
   649 argument \texttt{a} on the front of the list (the case \texttt{a::as}
   678 argument \texttt{a} on the front of the list (the case \texttt{a::as}
   650 in Line 7), we call CPS recursively for the corresponding
   679 in Line 7), we call CPS recursively for the corresponding
   651 subexpression. The temporary variable containing the result for this
   680 subexpression. The temporary variable containing the result for this
   652 argument we add to the end of the K-values we have already analysed
   681 argument is added to the end of the K-values we have already analysed
   653 before. Again very conveniently we can use the recursive call to
   682 before. Again very conveniently we can use the recursive call to
   654 \texttt{aux} as the continuation for the computations that
   683 \texttt{aux} as the continuation for the computations that
   655 follow. When we reach the end of the argument list (the
   684 follow. When we reach the end of the argument list (the
   656 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values
   685 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values
   657 \texttt{v1} to \texttt{vn} and call the function in the K-language
   686 \texttt{v1} to \texttt{vn} and call the function in the K-language
   701 CPS-translation you can find in the code.
   730 CPS-translation you can find in the code.
   702 
   731 
   703 \section*{Next Steps}
   732 \section*{Next Steps}
   704 
   733 
   705 Having obtained a K-expression, it is relatively straightforward to
   734 Having obtained a K-expression, it is relatively straightforward to
   706 generate a valid program for the LLVM-IR. We leave this to the
   735 generate a valid program for the LLVM-IR---remember the K-language
   707 attentive reader. What else can we do?  Well it should be relatively
   736 already enforces the SSA convention of a linear sequence of primitive
   708 easy to apply some common optimisations to the K-expressions. One
   737 instructions involving only unique temporary variables. 
   709 optimisation is called constant folding---for example if we have an
   738 We leave this step to the attentive reader.
   710 expression $3 + 4$ then we can replace it by just $5$ instead of
   739 
   711 generating code to compute $5$. Now this information needs to be
   740 What else can we do?  Well, it should be relatively easy to apply some
   712 propagated to the next computation step to see whether any further
   741 common optimisations to the K-expressions. One optimisation is called
   713 constant foldings are possible. Another useful technique is common
   742 constant folding---for example if we have an expression $3 + 4$ then
   714 subexpression elimination, meaning if you have twice a calculation of
   743 we can replace it by just $5$ instead of generating code to compute
   715 a function $foo(a)$, then we want to call it only once and store the
   744 $3 + 4$ at runtime. Now this information needs to be propagated to the next
   716 result in a temporary variable that can be used instead of the second,
   745 computation step to see whether any further constant foldings are
   717 or third, call to $foo(a)$. Again I leave this to the attentive reader
   746 possible. Another useful technique is common subexpression
   718 to work out and implement.
   747 elimination, meaning if you have twice a calculation of a function
       
   748 $foo(a)$, then we want to call it only once and store the result in a
       
   749 temporary variable that can be used instead of the second, or third,
       
   750 call to $foo(a)$. Again I leave this to the attentive reader to work
       
   751 out and implement.
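
To give a flavour of the constant-folding idea, below is a minimal
sketch of such a pass over K-expressions. It assumes the K-value
constructors \texttt{KVar}, \texttt{KNum} and \texttt{Kop} (as in the
accompanying code), ignores conditionals and function calls, and only
handles \texttt{+} and \texttt{*}:

\begin{lstlisting}[language=Scala,numbers=none]
// sketch: constant folding for K-expressions; env records
// tmp-variables that are already known to be constant numbers
def fold(e: KExp, env: Map[String, Int]) : KExp = {
  def value(v: KVal) : KVal = v match {
    case KVar(x) if env.contains(x) => KNum(env(x))   // use known constants
    case _ => v
  }
  e match {
    case KLet(x, Kop(o, v1, v2), rest) => (o, value(v1), value(v2)) match {
      case ("+", KNum(n1), KNum(n2)) =>   // both arguments known: fold
        KLet(x, KNum(n1 + n2), fold(rest, env + (x -> (n1 + n2))))
      case ("*", KNum(n1), KNum(n2)) =>
        KLet(x, KNum(n1 * n2), fold(rest, env + (x -> (n1 * n2))))
      case (_, w1, w2) => KLet(x, Kop(o, w1, w2), fold(rest, env))
    }
    case KLet(x, v, rest) => KLet(x, value(v), fold(rest, env))
    case KReturn(v)       => KReturn(value(v))
  }
}
\end{lstlisting}

\noindent
The environment records tmp-variables that are already known to be
constants, so that one folded result can trigger further foldings later
in the sequence---this is the propagation mentioned above.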
   719 
   752 
   720 
   753 
   721 \begin{figure}[p]\small
   754 \begin{figure}[p]\small
   722 \begin{lstlisting}[language=Scala,numbers=none]
   755 \begin{lstlisting}[language=Scala,numbers=none]
   723 // Fun language (expressions)
   756 // Fun language (expressions)