handouts/ho09.tex
changeset 913 eef6a56c185a
parent 912 e32802acf952
child 917 89e05a230d2d
equal deleted inserted replaced
912:e32802acf952 913:eef6a56c185a
    17 
    17 
    18 \section*{Handout 9 (LLVM, SSA and CPS)}
    18 \section*{Handout 9 (LLVM, SSA and CPS)}
    19 
    19 
    20 Reflecting on our two tiny compilers targeting the JVM, the code
    20 Reflecting on our two tiny compilers targeting the JVM, the code
    21 generation part was actually not so hard, no? Pretty much just some
    21 generation part was actually not so hard, no? Pretty much just some
    22 post-order traversal of the abstract syntax tree, yes? One of the reasons
    22 post-order traversal of the abstract syntax tree. Yes? One of the reasons
    23 for this ease is that the JVM is a stack-based virtual machine and it
    23 for this ease is that the JVM is a stack-based virtual machine and it
    24 is therefore not hard to translate deeply-nested arithmetic
    24 is therefore not hard to translate deeply-nested arithmetic
    25 expressions into a sequence of instructions manipulating the
    25 expressions into a sequence of instructions manipulating the
    26 stack. The problem is that ``real'' CPUs, although supporting stack
    26 stack. That is pretty much the whole point of the JVM. The problem is
    27 operations, are not really designed to be \emph{stack machines}.  The
    27 that ``real'' CPUs, although supporting stack operations, are not
    28 design of CPUs is more like: Here are some instructions and a chunk of
    28 really designed to be \emph{stack machines}.  The design of CPUs is
       
    29 more like: Here are some instructions and a chunk of
    29 memory---compiler, or better, compiler writers, do something with
    30 memory---compiler, or better, compiler writers, do something with
    30 them. Consequently, modern compilers need to go the extra mile in
    31 them. Consequently, modern compilers need to go the extra mile in
    31 order to generate code that is much easier and faster to process by
    32 order to generate code that is much easier and faster to process by
    32 actual CPUs, rather than running code on virtual machines that
    33 actual CPUs, rather than running code on virtual machines that
    33 introduce an additional layer of indirection. To make this all
    34 introduce an additional layer of indirection. To make this all
    34 tractable for this module, we target the LLVM Intermediate
    35 tractable for this module, we target the \emph{LLVM Intermediate
    35 Language. In this way we can take advantage of the tools coming with
    36   Language} (LLVM-IR). In this way we can take advantage of the tools
    36 LLVM.\footnote{\url{http://llvm.org}} For example we do not have to
    37 coming with LLVM.\footnote{\url{http://llvm.org}} For example we do
    37 worry about things like register allocation. By using the LLVM-IR,
    38 not have to worry about things like register allocation or peephole
    38 however, we also have to pay a price in the sense that generating code
    39 optimisations. By using the LLVM-IR, however, we also have to pay
    39 gets harder\ldots{}unfor\-tunately.
    40 a price in the sense that generating code gets
       
    41 harder\ldots{}unfor\-tunately nothing comes for free in life.
    40 
    42 
    41 \subsection*{LLVM and the LLVM-IR}
    43 \subsection*{LLVM and the LLVM-IR}
    42 
    44 
    43 \noindent LLVM is a beautiful example
    45 \noindent LLVM is a beautiful example
    44 that projects from Academia can make a difference in the World. LLVM
    46 that projects from Academia can make a difference in the Real World. LLVM
    45 started in 2000 as a project by two researchers at the  University of
    47 started in 2000 as a project by two researchers at the  University of
    46 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
    48 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
    47 gcc with its myriad of front-ends for different programming languages (C++, Fortran,
    49 gcc with its myriad of front-ends for different programming languages (C++, Fortran,
    48 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
    50 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
    49 time into a monolithic gigantic piece of m\ldots ehm complicated
    51 time into a monolithic gigantic piece of m\ldots ehm complicated
    71 However, what we have to do in order to make LLVM play ball is to
    73 However, what we have to do in order to make LLVM play ball is to
    72 generate code in \emph{Static Single-Assignment} format (short SSA). A
    74 generate code in \emph{Static Single-Assignment} format (short SSA). A
    73 reason why LLVM uses the SSA-format, rather than JVM-like stack
    75 reason why LLVM uses the SSA-format, rather than JVM-like stack
    74 instructions, is that stack instructions are difficult to
    76 instructions, is that stack instructions are difficult to
    75 optimise---you cannot just re-arrange instructions without messing
    77 optimise---you cannot just re-arrange instructions without messing
    76 about with what is calculated on the stack. Also it is hard to find
    78 about with what is calculated on the stack. Have a look at the
    77 out if all the calculations on the stack are actually necessary and
    79 expression $((a + b) * 4) - (3 * (a + b))$ and the corresponding JVM
    78 not by chance dead code. The JVM has for all these obstacles
    80 instructions:
    79 sophisticated machinery to make such ``high-level'' code still run
    81 
    80 fast, but let's say that for the sake of argument we do not want to
    82 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
    81 rely on it. We want to generate fast code ourselves. This means we
    83 iload a
    82 have to work around the intricacies of what instructions CPUs can
    84 iload b
    83 actually process fast. This is what the SSA format is designed for.
    85 iadd
       
    86 ldc 4
       
    87 imul
       
    88 ldc 3
       
    89 iload a
       
    90 iload b
       
    91 iadd
       
    92 imul
       
    93 isub
       
    94 \end{lstlisting}
       
    95 
       
    96 \noindent
       
    97 and try to reorganise the code such that you calculate the expression
       
    98 $(a + b)$ only once. This requires either quite a bit of
       
    99 math-understanding from the compiler or the addition of a ``copy-and-fetch''
       
   100 of the result from a local variable.  Also it is hard to find out if all
       
   101 the calculations on the stack are actually necessary and not by chance
       
   102 dead code. For all these obstacles, the JVM has sophisticated machinery
       
   103 to make such ``high-level'' code still run fast, but let's say that
       
   104 for the sake of argument we do not want to rely on it. We want to
       
   105 generate fast code ourselves. This means we have to work around the
       
   106 intricacies of what instructions CPUs can actually process fast. This
       
   107 is what the SSA format is designed for.
    84 
   108 
    85 
   109 
    86 The main idea behind the SSA-format is to have sequences of very
   110 The main idea behind the SSA-format is to have sequences of very
    87 simple variable assignments where every (tmp)variable is assigned only
   111 simple variable assignments where every (tmp)variable is assigned only
    88 once. The assignments need to be simple in the sense that they can be
   112 once. The assignments need to be simple in the sense that they can be
   103 
   127 
   104 \noindent where every tmpX-variable is used only once (we could, for
   128 \noindent where every tmpX-variable is used only once (we could, for
   105 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if
   129 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if
   106 this would not change the overall result). At the end we have a
   130 this would not change the overall result). At the end we have a
   107 return-instruction which contains the final result of the
   131 return-instruction which contains the final result of the
   108 expression. As can be seen, the task we have to solve is to take apart
   132 expression. As can be seen, the task we have to solve for generating
   109 compound expressions as shown above and transform them into a sequence
   133 SSA-code is to take apart compound expressions into their most basic
   110 of simple assignments. Note that this for example means we have to
   134 ``particles'' and transform them into a sequence of simple assignments
   111 fix the order in which the expression is calculated. 
   135 that calculates the desired result. Note that this means we have to
       
   136 fix the order in which the expression is calculated, like from
       
   137 left to right.
   112 
   138 
   113 There are sophisticated algorithms for imperative languages, like C,
   139 There are sophisticated algorithms for imperative languages, like C,
   114 that efficiently transform high-level programs into SSA-format. But
   140 that efficiently transform high-level programs into SSA-format. But
   115 we can ignore them here. We want to compile a functional language and
   141 we can ignore them here. We want to compile a functional language and
   116 there things get much more interesting than just sophisticated. We
   142 there things get much more interesting than just sophisticated. We
   353   case KReturn(v: KVal)
   379   case KReturn(v: KVal)
   354 }
   380 }
   355 \end{lstlisting}
   381 \end{lstlisting}
   356 
   382 
   357 \noindent
   383 \noindent
   358 By having in \texttt{KLet} taking first a string (standing for a
   384 By having \texttt{KLet} take first a string (standing for a
   359 tmp-variable) and second a value, we can fulfil the SSA constraint in
   385 tmp-variable) and second a value, we can fulfil the SSA constraint in
   360 assignments ``by con\-struction''---there is no way we could write
   386 assignments ``by con\-struction''---there is no way we could write
   361 anything else than a K-value.  Note that the third argument of a
   387 anything other than a K-value.  Note that the third argument of a
   362 \texttt{KLet} is again a K-expression, meaning either another
   388 \texttt{KLet} is again a K-expression, meaning either another
   363 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a
   389 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a
   364 sequence of computations and indicate what is the final result of the
   390 sequence of computations and indicate what the final result of the
   365 computations.  According to the SSA-format, we also have to ensure
   391 computations is.  According to the SSA-format, we also have to ensure
   366 that all intermediary computations are given (fresh) names---we will
   392 that all intermediary computations are given (fresh) names---we will
   367 use an (ugly) counter for this.
   393 use an (ugly) counter for this.
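
To give an idea of how such a counter might look, here is a minimal
sketch in Scala (the names \texttt{counter} and \texttt{Fresh} are only
for illustration and might differ slightly from the accompanying code):

\begin{lstlisting}[language=Scala,numbers=none]
// a counter for generating fresh names for tmp-variables (sketch)
var counter = -1

def Fresh(x: String) : String = {
  counter += 1
  s"${x}_$counter"
}
\end{lstlisting}

\noindent
Every call, say \texttt{Fresh("tmp")}, then yields a new name
\texttt{tmp\_0}, \texttt{tmp\_1} and so on, which we can use for the
variables bound by \texttt{KLet}.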
   368 
   394 
   369 To sum up, K-values are the atomic operations that can be on the
   395 To sum up, K-values are the atomic operations that can be on the
   370 right-hand side of assignments. The K-language is restricted such that
   396 right-hand side of assignments. The K-language is restricted such that
   371 it is easy to generate the SSA-format for the LLVM-IR. What remains to
   397 it is easy to generate the SSA-format for the LLVM-IR. What remains to
   372 be done is a translation of our Fun-language into the K-language. The
   398 be done is a translation of our Fun-language into the K-language. The
   373 main difficulty is that we need to order the computation---in the
   399 main difficulty is that we need to order the computation---in the
   374 K-language we only have linear sequence of instructions. Before we
   400 K-language we only have linear sequences of instructions. Before we
   375 explain this, we have a look at some programs in CPS-style.
   401 explain this, we have a look at some programs in CPS-style.
   376 
   402 
   377 
   403 
   378 
   404 
   379 
   405 
   477 
   503 
   478 \section*{Worked Example}
   504 \section*{Worked Example}
   479 
   505 
   480 Let us now come back to the CPS-translations for the Fun-language.
   506 Let us now come back to the CPS-translations for the Fun-language.
   481 Though we will start with a simpler subset containing only numbers,
   507 Though we will start with a simpler subset containing only numbers,
   482 arithmetic expressions and function calls.  The main difficulty of
   508 arithmetic expressions and function calls---no variables for the
   483 generating instructions in SSA-format is that large compound
   509 moment.  The main difficulty of generating instructions in SSA-format
   484 expressions need to be broken up into smaller pieces and intermediate
   510 is that large compound expressions need to be broken up into smaller
   485 results need to be chained into later instructions. To do this
   511 pieces and intermediate results need to be chained into later
   486 conveniently, we use the CPS-translation mechanism. What continuations
   512 instructions. To do this conveniently, we use the CPS-translation
   487 essentially solve is the following problem: Given an expression
   513 mechanism. What the continuations essentially solve is the following
       
   514 problem: Given an expression
   488 
   515 
   489 \begin{equation}\label{exp}
   516 \begin{equation}\label{exp}
   490 (1 + 2) * (3 + 4)
   517 (1 + 2) * (3 + 4)
   491 \end{equation}  
   518 \end{equation}  
   492 
   519 
   493 \noindent
   520 \noindent
   494 which of the subexpressions should be calculated first? We just
   521 which of the subexpressions should be calculated first? We are arbitrarily
   495 arbitrarily going to decide that the calculation will be from left to
   522 going to decide that the calculation will be from left to
   496 right. Other languages make different choices---C famously is right to
   523 right. Other languages make different choices---C famously is right to
   497 left. In our case this means we have to tear the expression shown in
   524 left. In our case this means we have to tear the expression shown in
   498 \eqref{exp} apart as follows:
   525 \eqref{exp} apart as follows:
   499 
   526 
   500 \[
   527 \[
   567 \noindent 
   594 \noindent 
   568 where \code{k} is the continuation and \code{e} is the expression to
   595 where \code{k} is the continuation and \code{e} is the expression to
   569 be compiled. The result of the function is a K-expression, which later
   596 be compiled. The result of the function is a K-expression, which later
   570 can be compiled into LLVM-IR code.
   597 can be compiled into LLVM-IR code.
   571 
   598 
   572 In case we have numbers, then we can just pass them in CPS-translation
   599 In case we have numbers, then we can just pass them in the CPS-translation
   573 to the continuations because numbers need not be further torn apart
   600 to the continuations because numbers need not be further torn apart
   574 as they are already primitive. Passing the number to the continuation
   601 as they are already primitive. Passing the number to the continuation
   575 means we apply the continuation like
   602 means we apply the continuation like
   576 
   603 
   577 \begin{lstlisting}[language=Scala,numbers=none]
   604 \begin{lstlisting}[language=Scala,numbers=none]
   593 \end{lstlisting}
   620 \end{lstlisting}
   594 
   621 
   595 \noindent
   622 \noindent
   596 What we essentially have to do in this case is the following: compile
   623 What we essentially have to do in this case is the following: compile
   597 the subexpressions \texttt{e1} and \texttt{e2}. They will produce some
   624 the subexpressions \texttt{e1} and \texttt{e2}. They will produce
   598 result that is stored in two temporary variables (assuming they are more
   625 results that are stored in two temporary variables (assuming \texttt{e1} and \texttt{e2} are more
   599 complicated than just numbers). We need to use these two temporary
   626 complicated than just numbers). We need to use these two temporary
   600 variables and feed them into a new assignment of the form
   627 variables and feed them into a new assignment of the form
   601 
   628 
   602 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
   629 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
   603 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
   630 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
   619 as type for the continuation. Once we have created the assignment with the
   646 as type for the continuation. Once we have created the assignment with the
   620 fresh temporary variable \texttt{z}, we need to ``communicate'' that
   647 fresh temporary variable \texttt{z}, we need to ``communicate'' that
   621 the result of the computation of the arithmetic expression is stored
   648 the result of the computation of the arithmetic expression is stored
   622 in \texttt{z} to the computations that follow. In this way we apply
   649 in \texttt{z} to the computations that follow. In this way we apply
   623 the continuation \texttt{k} with this new variable (essentially we are
   650 the continuation \texttt{k} to this new variable (essentially we are
   624 plugging in a hole further down the line).  Hope this makes sense.
   651 plugging it into a hole further down the line).  Hope this makes sense!? If not,
       
   652 play with the given code yourself.
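
For concreteness, here is a sketch of how this case of the
CPS-translation could look in Scala. It assumes a constructor
\texttt{Aop} for arithmetic expressions in the Fun-language, the
K-value constructors \texttt{Kop} and \texttt{KVar}, and a function
\texttt{Fresh} for generating fresh names---the accompanying code
spells out the precise details:

\begin{lstlisting}[language=Scala,numbers=none]
// sketch: CPS-translation for a binary operation e1 o e2
case Aop(o, e1, e2) => {
  val z = Fresh("tmp")              // fresh name for the result
  CPS(e1)(r1 =>                     // compile e1; r1 stands for its result
    CPS(e2)(r2 =>                   // compile e2; r2 stands for its result
      KLet(z, Kop(o, r1, r2),       // let z = r1 o r2 in ...
        k(KVar(z)))))               // ...then plug z into the hole k
}
\end{lstlisting}

\noindent
Note how the recursive calls nest: the continuation given to the first
call contains the compilation of \texttt{e2}, and only at the innermost
point do we emit the \texttt{KLet} and hand the fresh variable
\texttt{z} over to the original continuation \texttt{k}.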
   625 
   653 
   626 The last case we need to consider in our small expression language is
   654 The last case we need to consider in our small expression language is
   627 function calls. For them, remember that each argument of the function
   655 function calls. For them, remember that each argument of the function
   628 call can in SSA-format only be a variable or a number.
   656 call can in SSA-format only be a variable or a number. Here is the
       
   657 complete code for this case:
   629 
   658 
   630 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm]
   659 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm]
   631 case Call(fname, args) => {
   660 case Call(fname, args) => {
   632   def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
   661   def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
   633        case Nil => {
   662        case Nil => {
   639   aux(args, Nil)  
   668   aux(args, Nil)  
   640 }
   669 }
   641 \end{lstlisting}
   670 \end{lstlisting}
   642 
   671 
   643 \noindent
   672 \noindent
   644 For this case we introduce an auxiliary function that compiles first all
   673 As can be seen, we introduce an auxiliary function that first compiles all
   645 function arguments---remember in our source language we can have calls
   674 function arguments---remember in our source language we can have calls
   646 such as $foo(3 + 4, g(3))$ where we first have to create temporary
   675 such as $foo(3 + 4, g(3))$ where we first have to create temporary
   647 variables (and computations) for each argument. Therefore \texttt{aux}
   676 variables (and computations) for each argument. Therefore \texttt{aux}
   648 analyses the argument list from left to right. In case there is an
   677 analyses the argument list from left to right. In case there is an
   649 argument \texttt{a} on the front of the list (the case \texttt{a::as}
   678 argument \texttt{a} on the front of the list (the case \texttt{a::as}
   650 in Line 7), we call CPS recursively for the corresponding
   679 in Line 7), we call CPS recursively for the corresponding
   651 subexpression. The temporary variable containing the result for this
   680 subexpression. The temporary variable containing the result for this
   652 argument we add to the end of the K-values we have already analysed
   681 argument is added to the end of the K-values we have already analysed
   653 before. Again very conveniently we can use the recursive call to
   682 before. Again very conveniently we can use the recursive call to
   654 \texttt{aux} as the continuation for the computations that
   683 \texttt{aux} as the continuation for the computations that
   655 follow. When we reach the end of the argument list (the
   684 follow. When we reach the end of the argument list (the
   656 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values
   685 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values
   657 \texttt{v1} to \texttt{vn} and call the function in the K-language
   686 \texttt{v1} to \texttt{vn} and call the function in the K-language
   701 CPS-translation you can find in the code.
   730 CPS-translation you can find in the code.
   702 
   731 
   703 \section*{Next Steps}
   732 \section*{Next Steps}
   704 
   733 
   705 Having obtained a K-expression, it is relatively straightforward to
   734 Having obtained a K-expression, it is relatively straightforward to
   706 generate a valid program for the LLVM-IR. We leave this to the
   735 generate a valid program for the LLVM-IR---remember the K-language
   707 attentive reader. What else can we do?  Well it should be relatively
   736 already enforces the SSA convention of a linear sequence of primitive
   708 easy to apply some common optimisations to the K-expressions. One
   737 instructions involving only unique temporary variables. 
   709 optimisation is called constant folding---for example if we have an
   738 We leave this step to the attentive reader.
   710 expression $3 + 4$ then we can replace it by just $5$ instead of
   739 
   711 generating code to compute $5$. Now this information needs to be
   740 What else can we do?  Well, it should be relatively easy to apply some
   712 propagated to the next computation step to see whether any further
   741 common optimisations to the K-expressions. One optimisation is called
   713 constant foldings are possible. Another useful technique is common
   742 constant folding---for example if we have an expression $3 + 4$ then
   714 subexpression elimination, meaning if you have twice a calculation of
   743 we can replace it by just $5$ instead of generating code to compute
   715 a function $foo(a)$, then we want to call it only once and store the
   744 $3 + 4$ at runtime. Now this information needs to be propagated to the next
   716 result in a temporary variable that can be used instead of the second,
   745 computation step to see whether any further constant foldings are
   717 or third, call to $foo(a)$. Again I leave this to the attentive reader
   746 possible. Another useful technique is common subexpression
   718 to work out and implement.
   747 elimination, meaning if you have twice a calculation of a function
       
   748 $foo(a)$, then we want to call it only once and store the result in a
       
   749 temporary variable that can be used instead of the second, or third,
       
   750 call to $foo(a)$. Again I leave this to the attentive reader to work
       
   751 out and implement.
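
To give a flavour of the constant-folding idea, below is a minimal
sketch of such a pass over K-expressions. It assumes the K-value
constructors \texttt{KVar}, \texttt{KNum} and \texttt{Kop} (as in the
accompanying code), ignores conditionals and function calls, and only
handles \texttt{+} and \texttt{*}:

\begin{lstlisting}[language=Scala,numbers=none]
// sketch: constant folding for K-expressions; env records
// tmp-variables that are already known to be constant numbers
def fold(e: KExp, env: Map[String, Int]) : KExp = {
  def value(v: KVal) : KVal = v match {
    case KVar(x) if env.contains(x) => KNum(env(x))   // use known constants
    case _ => v
  }
  e match {
    case KLet(x, Kop(o, v1, v2), rest) => (o, value(v1), value(v2)) match {
      case ("+", KNum(n1), KNum(n2)) =>   // both arguments known: fold
        KLet(x, KNum(n1 + n2), fold(rest, env + (x -> (n1 + n2))))
      case ("*", KNum(n1), KNum(n2)) =>
        KLet(x, KNum(n1 * n2), fold(rest, env + (x -> (n1 * n2))))
      case (_, w1, w2) => KLet(x, Kop(o, w1, w2), fold(rest, env))
    }
    case KLet(x, v, rest) => KLet(x, value(v), fold(rest, env))
    case KReturn(v)       => KReturn(value(v))
  }
}
\end{lstlisting}

\noindent
The environment records tmp-variables that are already known to be
constants, so that one folded result can trigger further foldings later
in the sequence---this is the propagation mentioned above.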
   719 
   752 
   720 
   753 
   721 \begin{figure}[p]\small
   754 \begin{figure}[p]\small
   722 \begin{lstlisting}[language=Scala,numbers=none]
   755 \begin{lstlisting}[language=Scala,numbers=none]
   723 // Fun language (expressions)
   756 // Fun language (expressions)