|     20  |     17  | 
|     21 \section*{Handout 9 (LLVM, SSA and CPS)} |     18 \section*{Handout 9 (LLVM, SSA and CPS)} | 
|     22  |     19  | 
|     23 Reflecting on our two tiny compilers targetting the JVM, the code |     20 Reflecting on our two tiny compilers targetting the JVM, the code | 
|     24 generation part was actually not so hard, no? Pretty much just some |     21 generation part was actually not so hard, no? Pretty much just some | 
|     25 post-traversal of the abstract syntax tree, yes? One of the reasons for |     22 post-traversal of the abstract syntax tree, yes? One of the reasons | 
|     26 this ease is that the JVM is a stack-based virtual machine and it is |     23 for this ease is that the JVM is a stack-based virtual machine and it | 
|     27 therefore not hard to translate deeply-nested arithmetic expressions |     24 is therefore not hard to translate deeply-nested arithmetic | 
|     28 into a sequence of instructions manipulating the stack. The problem is |     25 expressions into a sequence of instructions manipulating the | 
|     29 that ``real'' CPUs, although supporting stack operations, are not really |     26 stack. The problem is that ``real'' CPUs, although supporting stack | 
|     30 designed to be \emph{stack machines}.  The design of CPUs is more like, |     27 operations, are not really designed to be \emph{stack machines}.  The | 
|     31 here is a chunk of memory---compiler, or better compiler writers, do |     28 design of CPUs is more like: Here are some operations and a chunk of | 
|     32 something with it. Consequently, modern compilers need to go the extra |     29 memory---compiler, or better compiler writers, do something with | 
|     33 mile in order to generate code that is much easier and faster to process |     30 them. Consequently, modern compilers need to go the extra mile in | 
|     34 by CPUs. To make this all tractable for this module, we target the LLVM |     31 order to generate code that is much easier and faster to process by | 
|     35 Intermediate Language. In this way we can take advantage of the tools |     32 actual CPUs, rather than running code on virtual machines that | 
|     36 coming with LLVM. For example we do not have to worry about things like |     33 introduce an additional layer of indirection. To make this all | 
|     37 register allocations.\bigskip  |     34 tractable for this module, we target the LLVM Intermediate | 
|     38  |     35 Language. In this way we can take advantage of the tools coming with | 
|     39 \noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example |     36 LLVM.\footnote{\url{http://llvm.org}} For example we do not have to | 
|         |     37 worry about things like register allocations. By using LLVM, however, | 
|         |     38 we also have to pay price in the sense that generating code gets | 
|         |     39 harder\ldots{}unfortunately. | 
|         |     40  | 
|         |     41 \subsection*{LLVM and LLVM-IR} | 
|         |     42  | 
|         |     43 \noindent LLVM is a beautiful example | 
|     40 that projects from Academia can make a difference in the World. LLVM |     44 that projects from Academia can make a difference in the World. LLVM | 
|     41 started in 2000 as a project by two researchers at the  University of |     45 started in 2000 as a project by two researchers at the  University of | 
|     42 Illinois at Urbana-Champaign. At the time the behemoth of compilers was |     46 Illinois at Urbana-Champaign. At the time the behemoth of compilers was | 
|     43 gcc with its myriad of front-ends for other languages (C++, Fortran, |     47 gcc with its myriad of front-ends for different programming languages (C++, Fortran, | 
|     44 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over |     48 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over | 
|     45 time into a monolithic gigantic piece of m\ldots ehm software, which you |     49 time into a monolithic gigantic piece of m\ldots ehm complicated | 
|         |     50 software, which you | 
|     46 could not mess about in an afternoon. In contrast, LLVM is designed to |     51 could not mess about in an afternoon. In contrast, LLVM is designed to | 
|     47 be a modular suite of tools with which you can play around easily and |     52 be a modular suite of tools with which you can play around easily and | 
|     48 try out something new. LLVM became a big player once Apple hired one of |     53 try out something new. LLVM became a big player once Apple hired one of | 
|     49 the original developers (I cannot remember the reason why Apple did not |     54 the original developers (I cannot remember the reason why Apple did not | 
|     50 want to use gcc, but maybe they were also just disgusted by gcc's big |     55 want to use gcc, but maybe they were also just disgusted by gcc's big | 
|     52 or less legacy. This does not mean that programming languages like C and |     57 or less legacy. This does not mean that programming languages like C and | 
|     53 C++ are dying out any time soon---they are nicely supported by LLVM. |     58 C++ are dying out any time soon---they are nicely supported by LLVM. | 
|     54  |     59  | 
|     55 We will target the LLVM Intermediate Language, or LLVM Intermediate |     60 We will target the LLVM Intermediate Language, or LLVM Intermediate | 
|     56 Representation (short LLVM-IR). The LLVM-IR looks very similar to the |     61 Representation (short LLVM-IR). The LLVM-IR looks very similar to the | 
|     57 assembly language of Jasmin and Krakatau. It will also allow us to |     62 assembly language of Jasmin and Krakatau. Targetting LLVM-IR will also | 
|     58 benefit from the modular structure of the LLVM compiler and let for |     63 allow us to benefit from the modular structure of the LLVM compiler | 
|     59 example the compiler generate code for different CPUs, like X86 or ARM. |     64 and we can let, for example, the compiler generate code for different | 
|     60 That means we can be agnostic about where our code actually runs. We can |     65 CPUs, say X86 or ARM.  That means we can be agnostic about where our | 
|     61 also be ignorant about optimising code and allocating memory |     66 code is actually going to run.\footnote{Anybody want to try to run our | 
|     62 efficiently.  |     67   programs on Android phones?}  We can also be somewhat ignorant about | 
|     63  |     68 optimising our code and about allocating memory efficiently: the LLVM | 
|     64 However, what we have to do for LLVM is to generate code in \emph{Static |     69 tools will take care of all this. | 
|     65 Single-Assignment} format (short SSA), because that is what the LLVM-IR |     70  | 
|     66 expects from us. A reason why LLVM uses the SSA format, rather than |     71 However, what we have to do in order to make LLVM to play ball is to | 
|     67 JVM-like stack instructions, is that stack instructions are difficult to |     72 generate code in \emph{Static Single-Assignment} format (short SSA), | 
|     68 optimise---you cannot just re-arrange instructions without messing about |     73 because that is what the LLVM-IR expects from us. A reason why LLVM | 
|     69 with what is calculated on the stack. Also it is hard to find out if all |     74 uses the SSA format, rather than JVM-like stack instructions, is that | 
|     70 the calculations on the stack are actually necessary and not by chance |     75 stack instructions are difficult to optimise---you cannot just | 
|     71 dead code. The JVM has for all these obstacles sophisticated machinery |     76 re-arrange instructions without messing about with what is calculated | 
|     72 to make such ``high-level'' code still run fast, but let's say that for |     77 on the stack. Also it is hard to find out if all the calculations on | 
|     73 the sake of argument we do not want to rely on it. We want to generate |     78 the stack are actually necessary and not by chance dead code. The JVM | 
|     74 fast code ourselves. This means we have to work around the intricacies |     79 has for all these obstacles sophisticated machinery to make such | 
|     75 of what instructions CPUs can actually process fast. This is what the |     80 ``high-level'' code still run fast, but let's say that for the sake of | 
|     76 SSA format is designed for. |     81 argument we do not want to rely on it. We want to generate fast code | 
|         |     82 ourselves. This means we have to work around the intricacies of what | 
|         |     83 instructions CPUs can actually process fast. This is what the SSA | 
|         |     84 format is designed for. | 
|     77  |     85  | 
|     78  |     86  | 
|     79 The main idea behind the SSA format is to use very simple variable |     87 The main idea behind the SSA format is to use very simple variable | 
|     80 assignments where every tmp-variable is assigned only once. The assignments |     88 assignments where every tmp-variable is assigned only once. The | 
|     81 also need to be primitive in the sense that they can be just simple |     89 assignments also need to be primitive in the sense that they can be | 
|     82 operations like addition, multiplication, jumps, comparisons and so on. |     90 just simple operations like addition, multiplication, jumps, | 
|     83 Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the |     91 comparisons, function calls and so on.  Say, we have an expression | 
|     84 corresponding SSA format is  |     92 $((1 + a) + (foo(3 + 2) + (b * 5)))$, then the corresponding SSA format is | 
|     85   |     93   | 
|     86 \begin{lstlisting}[language=LLVMIR,numbers=left] |     94 \begin{lstlisting}[language=LLVMIR,numbers=left] | 
|     87 let tmp0 = add 1 a in    |     95 let tmp0 = add 1 a in | 
|     88 let tmp1 = mul b 5 in  |     96 let tmp1 = add 3 2 in   | 
|     89 let tmp2 = add 3 tmp1 in  |     97 let tmp2 = call foo(tmp1)   | 
|     90 let tmp3 = add tmp0 tmp2 in tmp3  |     98 let tmp3 = mul b 5 in  | 
|     91 \end{lstlisting} |     99 let tmp4 = add tmp2 tmp3 in  | 
|     92  |    100 let tmp5 = add tmp0 tmp4 in | 
|     93 \noindent where every variable is used only once (we could not write |    101   return tmp5  | 
|     94 \texttt{tmp1 = add 3 tmp1} in Line 3 for example). |    102 \end{lstlisting} | 
|         |    103  | 
|         |    104 \noindent where every tmp-variable is used only once (we could, for | 
|         |    105 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if this | 
|         |    106 would not change the overall result). | 
|     95  |    107  | 
|     96 There are sophisticated algorithms for imperative languages, like C, |    108 There are sophisticated algorithms for imperative languages, like C, | 
|     97 that efficiently transform a high-level program into SSA format. But |    109 that efficiently transform a high-level program into SSA format. But | 
|     98 we can ignore them here. We want to compile a functional language and |    110 we can ignore them here. We want to compile a functional language and | 
|     99 there things get much more interesting than just sophisticated. We |    111 there things get much more interesting than just sophisticated. We | 
|    129    %tmp = mul i32 %x, %x |    141    %tmp = mul i32 %x, %x | 
|    130    ret i32 %tmp |    142    ret i32 %tmp | 
|    131 }     |    143 }     | 
|    132 \end{lstlisting} |    144 \end{lstlisting} | 
|    133  |    145  | 
|    134 \noindent First notice that all variable names, in this case \texttt{x} |    146 \noindent First notice that all ``local'' variable names, in this case | 
|    135 and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR. |    147 \texttt{x} and \texttt{tmp}, are prefixed with \texttt{\%} in the | 
|    136 Temporary variables can be named with an identifier, such as |    148 LLVM-IR.  Temporary variables can be named with an identifier, such as | 
|    137 \texttt{tmp}, or numbers. In contrast, function names, since they are ``global'', |    149 \texttt{tmp}, or numbers. In contrast, function names, since they are | 
|    138 need to be prefixed with an @-symbol. Also, the LLVM-IR is a fully typed |    150 ``global'', need to be prefixed with an @-symbol. Also, the LLVM-IR is | 
|    139 language. The \texttt{i32} type stands for 32-bit integers. There are |    151 a fully typed language. The \texttt{i32} type stands for 32-bit | 
|    140 also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}), |    152 integers. There are also types for 64-bit integers (\texttt{i64}), | 
|    141 floats, arrays and even pointer types. In the code above, \texttt{sqr} |    153 chars (\texttt{i8}), floats, arrays and even pointer types. In the | 
|    142 takes an argument of type \texttt{i32} and produces a result of type |    154 code above, \texttt{sqr} takes an argument of type \texttt{i32} and | 
|    143 \texttt{i32} (the result type is in front of the function name, like in |    155 produces a result of type \texttt{i32} (the result type is in front of | 
|    144 C). Each arithmetic operation, for example addition and multiplication, |    156 the function name, like in C). Each arithmetic operation, for example | 
|    145 are also prefixed with the type they operate on. Obviously these types |    157 addition and multiplication, are also prefixed with the type they | 
|    146 need to match up\ldots{} but since we have in our programs only |    158 operate on. Obviously these types need to match up\ldots{} but since | 
|    147 integers, for the moment \texttt{i32} everywhere will do. We do not have to generate |    159 we have in our programs only integers, for the moment \texttt{i32} | 
|    148 any other types, but obviously this is a limitation in our Fun language. |    160 everywhere will do. We do not have to generate any other types, but | 
|         |    161 obviously this is a limitation in our Fun-language (and which we | 
|         |    162 lift in the final CW). | 
|    149   |    163   | 
|    150 There are a few interesting instructions in the LLVM-IR which are quite |    164 There are a few interesting instructions in the LLVM-IR which are quite | 
|    151 different than what we have seen in the JVM. Can you remember the |    165 different than what we have seen in the JVM. Can you remember the | 
|    152 kerfuffle we had to go through with boolean expressions and negating the |    166 kerfuffle we had to go through with boolean expressions and negating the | 
|    153 condition? In the LLVM-IR, branching  if-conditions is implemented |    167 condition? In the LLVM-IR, branching  if-conditions are implemented | 
|    154 differently: there is a separate \texttt{br}-instruction as follows: |    168 differently: there is a separate \texttt{br}-instruction as follows: | 
|    155  |    169  | 
|    156 \begin{lstlisting}[language=LLVM] |    170 \begin{lstlisting}[language=LLVM] | 
|    157 br i1 %var, label %if_br, label %else_br |    171 br i1 %var, label %if_br, label %else_br | 
|    158 \end{lstlisting} |    172 \end{lstlisting} | 
|    159  |    173  | 
|    160 \noindent |    174 \noindent | 
|    161 The type \texttt{i1} stands for booleans. If the variable is true, then |    175 The type \texttt{i1} stands for booleans. If the variable is true, | 
|    162 this instruction jumps to the if-branch, which needs an explicit label; |    176 then this instruction jumps to the if-branch, which needs an explicit | 
|    163 otherwise it jumps to the else-branch, again with its own label. This allows us |    177 label; otherwise it jumps to the else-branch, again with its own | 
|    164 to keep the meaning of the boolean expression ``as is'' when compiling if's---thanks god no more negating the boolean. |    178 label. This allows us to keep the meaning of the boolean expression | 
|         |    179 ``as is'' when compiling if's---thanks god no more negating of | 
|         |    180 booleans. | 
|         |    181  | 
|    165 A value of type boolean is generated in the LLVM-IR by the |    182 A value of type boolean is generated in the LLVM-IR by the | 
|    166 \texttt{icmp}-instruction. This instruction is for integers (hence the |    183 \texttt{icmp}-instruction. This instruction is for integers (hence the | 
|    167 \texttt{i}) and takes the comparison operation as argument. For example |    184 \texttt{i}) and takes the comparison operation as argument. For example | 
|    168  |    185  | 
|    169 \begin{lstlisting}[language=LLVM] |    186 \begin{lstlisting}[language=LLVM] | 
|    212 \end{lstlisting} |    230 \end{lstlisting} | 
|    213  |    231  | 
|    214 \begin{figure}[t]\small  |    232 \begin{figure}[t]\small  | 
|    215 \lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll} |    233 \lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll} | 
|    216 \caption{An LLVM-IR program for calculating the square function. It |    234 \caption{An LLVM-IR program for calculating the square function. It | 
|    217 calls this function in \texttt{@main} with the argument \texttt{5}. The |    235   calls the sqr-function in \texttt{@main} with the argument \texttt{5} | 
|    218 code for the \texttt{sqr} function is in Lines 13 -- 16. The main |    236   (Line 20). The | 
|    219 function calls \texttt{sqr} and then prints out the result. The other |    237   code for the \texttt{sqr} function is in Lines 13 -- 16. It stores | 
|    220 code is boilerplate for printing out integers.\label{lli}} |    238   the result of sqr in the variable called \texttt{\%i} and then | 
|         |    239   prints out the contents of this variable in Line 21. The other | 
|         |    240   code is boilerplate for printing out integers.\label{lli}} | 
|    221 \end{figure}    |    241 \end{figure}    | 
|    222  |    242  | 
|    223  |    243  | 
|    224      |    244      | 
|    225 \subsection*{Our Own Intermediate Language} |    245 \subsection*{Our Own Intermediate Language} | 
|    226  |    246  | 
|    227 Remember compilers have to solve the problem of bridging the gap between |    247 Let's get back to our compiler: Remember compilers have to solve the | 
|    228 ``high-level'' programs and ``low-level'' hardware. If the gap is too |    248 problem of bridging the gap between ``high-level'' programs and | 
|    229 wide for one step, then a good strategy is to lay a stepping stone |    249 ``low-level'' hardware. If the gap is too wide for one step, then a | 
|    230 somewhere in between. The LLVM-IR itself is such a stepping stone to |    250 good strategy is to lay a stepping stone somewhere in between. The | 
|    231 make the task of generating and optimising code easier. Like a real |    251 LLVM-IR itself is such a stepping stone to make the task of generating | 
|    232 compiler we will use our own stepping stone which I call the |    252 and optimising code easier. Like a real compiler we will use our own | 
|    233 \emph{K-language}. For what follows recall the various kinds of |    253 stepping stone which I call the \emph{K-language}. | 
|    234 expressions in the Fun language. For convenience the Scala code of the |    254  | 
|    235 corresponding abstract syntax trees is shown on top of |    255 \begin{center} | 
|    236 Figure~\ref{absfun}. Below is the code for the abstract syntax trees in |    256   \begin{tikzpicture}[scale=0.9,font=\bf, | 
|    237 the K-language. In K, here are two kinds of syntactic entities, namely |    257                       node/.style={ | 
|    238 \emph{K-values} and \emph{K-expressions}. The central constructor of the |    258                       rectangle,rounded corners=3mm, | 
|    239 K-language is \texttt{KLet}. For this recall in SSA that arithmetic |    259                       ultra thick,draw=black!50,minimum height=16mm,  | 
|    240 expressions such as $((1 + a) + (3 + (b * 5)))$ need to be broken up |    260                       minimum width=20mm, | 
|    241 into smaller ``atomic'' steps, like so |    261                       top color=white,bottom color=black!20}] | 
|         |    262  | 
|         |    263   %\node (0) at (-3,0) {};   | 
|         |    264   \node (A) at (0.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}source}] {Fun-Language}; | 
|         |    265   \node (B) at (3.5,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}stepping stone}] {K-Language}; | 
|         |    266   \node (C) at (7.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}target}] {LLVM-IR}; | 
|         |    267   | 
|         |    268   \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {} (B);  | 
|         |    269   \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {} (C);  | 
|         |    270   \end{tikzpicture} | 
|         |    271   \end{center} | 
|         |    272  | 
|         |    273   \noindent | 
|         |    274   To see why we need a stepping stone for generating code in SSA-format, | 
|         |    275   considder again arithmetic expressions such as | 
|         |    276   $((1 + a) + (3 + (b * 5)))$. They need to be broken up into smaller | 
|         |    277   ``atomic'' steps, like so | 
|    242   |    278   | 
|    243 \begin{lstlisting}[language=LLVMIR,numbers=none] |    279 \begin{lstlisting}[language=LLVMIR,numbers=none] | 
|    244 let tmp0 = add 1 a in    |    280 let tmp0 = add 1 a in    | 
|    245 let tmp1 = mul b 5 in  |    281 let tmp1 = mul b 5 in  | 
|    246 let tmp2 = add 3 tmp1 in  |    282 let tmp2 = add 3 tmp1 in  | 
|    247 let tmp3 = add tmp0 tmp2 in |    283 let tmp3 = add tmp0 tmp2 in | 
|    248   tmp3  |    284   return tmp3  | 
|    249 \end{lstlisting} |    285 \end{lstlisting} | 
|    250  |    286  | 
|    251 \noindent |    287 \noindent | 
|    252 Here \texttt{tmp3} will contain the result of what the whole expression |    288 Here \texttt{tmp3} will contain the result of what the whole | 
|    253 stands for. In each individual step we can only perform an ``atomic'' |    289 expression stands for. In each individual step we can only perform an | 
|    254 operation, like addition or multiplication of a number and a variable. |    290 ``atomic'' or ``trival'' operation, like addition or multiplication of | 
|    255 We are not allowed to have for example an if-condition on the right-hand |    291 a number or a variable.  We are not allowed to have for example a | 
|    256 side of an equals. Such constraints are enforced upon us because of how |    292 nested addition or an if-condition on the right-hand side of an | 
|    257 the SSA format works in the LLVM-IR. By having in \texttt{KLet} taking |    293 assignment. Such constraints are forced upon us because of how the SSA | 
|    258 first a string (standing for an intermediate result) and second a value, |    294 format works in the LLVM-IR. To simplify matters we represent | 
|    259 we can fulfil this constraint ``by construction''---there is no way we |    295 assignments with two kinds of syntactic entities, namely | 
|    260 could write anything else than a value.  |    296 \emph{K-values} and \emph{K-expressions}. K-values are all ``things'' | 
|         |    297 that can appear on the right-hand side of an equal. Say we have | 
|         |    298 the following definition for K-values: | 
|         |    299  | 
|         |    300 \begin{lstlisting}[language=Scala,numbers=none] | 
|         |    301 enum KVal { | 
|         |    302   case KVar(s: String) | 
|         |    303   case KNum(n: Int) | 
|         |    304   case KAop(op: String, v1: KVal, v2: KVal) | 
|         |    305   case KCall(fname: String, args: List[KVal]) | 
|         |    306 } | 
|         |    307 \end{lstlisting} | 
|         |    308  | 
|         |    309 \noindent | 
|         |    310 where a K-value can be a variable, a number or a ``trivial'' binary | 
|         |    311 operation, such as addition or multiplication. The two arguments of a | 
|         |    312 binary operation need to be K-values as well. Finally, we have | 
|         |    313 function calls, but again each argument of the function call needs to | 
|         |    314 be a K-value.  One point we need to be careful, however, is that the | 
|         |    315 arguments of the binary operations and function calls are in fact only | 
|         |    316 variables or numbers. To encode this constraint into the type-system | 
|         |    317 of Scala would make things too complicated---to satisfy this | 
|         |    318 constraint is therefore on us. For our Fun-language, we will later on | 
|         |    319 consider some further K-values. | 
|         |    320  | 
|         |    321  | 
|         |    322 Our K-expressions will encode the \texttt{let} and the \texttt{return} | 
|         |    323 from the SSA-format (again for the moment we only consider these two | 
|         |    324 constructors---later on we will have if-branches as well). | 
|         |    325  | 
|         |    326 \begin{lstlisting}[language=Scala,numbers=none] | 
|         |    327 enum KExp { | 
|         |    328   case KLet(x: String, v: KVal, e: KExp) | 
|         |    329   case KReturn(v: KVal) | 
|         |    330 } | 
|         |    331 \end{lstlisting} | 
|         |    332  | 
|         |    333 \noindent | 
|         |    334 By having in \texttt{KLet} taking first a string (standing for an | 
|         |    335 intermediate variable) and second a value, we can fulfil the SSA | 
|         |    336 constraint in assignments ``by construction''---there is no way we | 
|         |    337 could write anything else than a K-value.  Note that the third | 
|         |    338 argument of a \texttt{KLet} is again a K-expression, meaning either | 
|         |    339 another \texttt{KLet} or a \texttt{KReturn}. In this way we can | 
|         |    340 construct a sequence of computations and return a final result.  Of | 
|         |    341 course we also have to ensure that all intermediary computations are | 
|         |    342 given (fresh) names---we will use an (ugly) counter for this. | 
|    261  |    343  | 
|    262 To sum up, K-values are the atomic operations that can be on the |    344 To sum up, K-values are the atomic operations that can be on the | 
|    263 right-hand side of equal-signs. The K-language is restricted such that |    345 right-hand side of assignemnts. The K-language is restricted such that | 
|    264 it is easy to generate the SSA format for the LLVM-IR.  |    346 it is easy to generate the SSA format for the LLVM-IR. What remains to | 
|    265  |    347 be done is a translation of our Fun-language into the K-language. The | 
|    266  |    348 main difficulty is that we need to order the computationa---in teh | 
|    267  |    349 K-language we only have linear sequence of instructions to need to be | 
|    268 \begin{figure}[p]\small |    350 followed. Before we explain this, we have a look at some | 
|    269 \begin{lstlisting}[language=Scala,numbers=none] |    351 CPS-translation. | 
|    270 // Fun language (expressions) |    352  | 
|    271 abstract class Exp  |         | 
|    272 abstract class BExp  |         | 
|    273  |         | 
|    274 case class Call(name: String, args: List[Exp]) extends Exp |         | 
|    275 case class If(a: BExp, e1: Exp, e2: Exp) extends Exp |         | 
|    276 case class Write(e: Exp) extends Exp |         | 
|    277 case class Var(s: String) extends Exp |         | 
|    278 case class Num(i: Int) extends Exp |         | 
|    279 case class Aop(o: String, a1: Exp, a2: Exp) extends Exp |         | 
|    280 case class Sequence(e1: Exp, e2: Exp) extends Exp |         | 
|    281 case class Bop(o: String, a1: Exp, a2: Exp) extends BExp   |         | 
|    282  |         | 
|    283  |         | 
|    284  |         | 
|    285 // K-language (K-expressions, K-values) |         | 
|    286 abstract class KExp |         | 
|    287 abstract class KVal |         | 
|    288  |         | 
|    289 case class KVar(s: String) extends KVal |         | 
|    290 case class KNum(i: Int) extends KVal |         | 
|    291 case class Kop(o: String, v1: KVal, v2: KVal) extends KVal |         | 
|    292 case class KCall(o: String, vrs: List[KVal]) extends KVal |         | 
|    293 case class KWrite(v: KVal) extends KVal |         | 
|    294  |         | 
|    295 case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp |         | 
|    296 case class KLet(x: String, v: KVal, e: KExp) extends KExp |         | 
|    297 case class KReturn(v: KVal) extends KExp |         | 
|    298 \end{lstlisting} |         | 
|    299 \caption{Abstract syntax trees for the Fun language.\label{absfun}} |         | 
|    300 \end{figure} |         | 
|    301  |    353  | 
|    302  |    354  | 
|    303  |    355  | 
|    304 \subsection*{CPS-Translations} |    356 \subsection*{CPS-Translations} | 
|    305  |    357  | 
|    306 CPS stands for Continuation-Passing-Style. It is a kind of programming |    358 CPS stands for Continuation-Passing-Style. It is a kind of programming | 
|    307 technique often used in advanced functional programming. Before we delve |    359 technique often used in advanced functional programming. Before we delve | 
|    308 into the CPS-translation for our Fun language, let us look at  |    360 into the CPS-translation for our Fun-language, let us look at  | 
|    309 CPS-versions of some well-known functions. Consider |    361 CPS-versions of some well-known functions. Consider | 
|    310  |    362  | 
|    311 \begin{lstlisting}[language=Scala, numbers=none] |    363 \begin{lstlisting}[language=Scala, numbers=none] | 
|    312 def fact(n: Int) : Int =  |    364 def fact(n: Int) : Int =  | 
|    313   if (n == 0) 1 else n * fact(n - 1)  |    365   if (n == 0) 1 else n * fact(n - 1)  | 
|    378 id((1 + 1) + 1) |    432 id((1 + 1) + 1) | 
|    379 (1 + 1) + 1 |    433 (1 + 1) + 1 | 
|    380 3 |    434 3 | 
|    381 \end{lstlisting} |    435 \end{lstlisting} | 
|    382  |    436  | 
|    383 Let us now come back to the CPS-translations for the Fun language. The |    437 \noindent | 
|         |    438 The point of this section is that you are playing around with these | 
|         |    439 simple definitions and make sure they calculate the expected result. | 
|         |    440 In the next step, you should understand \emph{how} these functions | 
|         |    441 calculate the result with the continuations.  | 
|         |    442  | 
|         |    443  | 
|         |    444 \section*{Worked Example} | 
|         |    445  | 
|         |    446 Let us now come back to the CPS-translations for the Fun-language. The | 
|    384 main difficulty of generating instructions in SSA format is that large |    447 main difficulty of generating instructions in SSA format is that large | 
|    385 compound expressions need to be broken up into smaller pieces and |    448 compound expressions need to be broken up into smaller pieces and | 
|    386 intermediate results need to be chained into later instructions. To do |    449 intermediate results need to be chained into later instructions. To do | 
|    387 this conveniently, CPS-translations have been developed. They use |    450 this conveniently, we use the CPS-translation mechanism. What continuations | 
|         |    451 essentially solve is the following problem: Given an expressions | 
|         |    452  | 
|         |    453 \begin{equation}\label{exp} | 
|         |    454 (1 + 2) * (3 + 4) | 
|         |    455 \end{equation}   | 
|         |    456  | 
|         |    457 \noindent | 
|         |    458 which of the subexpressions should be calculated first? We just arbitrarily | 
|         |    459 going to decide that it will be from left to right. This means we have | 
|         |    460 to tear the expression shown in \eqref{exp} apart as follows: | 
|         |    461  | 
|         |    462 \[ | 
|         |    463 (1 + 2) \qquad\text{and}\qquad  \Box * (3 + 4) | 
|         |    464 \]   | 
|         |    465  | 
|         |    466 \noindent | 
|         |    467 The first one will give us a result, but the second one is not | 
|         |    468 really an expression that makes sense. It will only make sense | 
|         |    469 as the next step in the computation when we fill-in the result of | 
|         |    470 $1+2$ into the ``hole'' indicated by $\Box$. Such incomplete | 
|         |    471 expressions can be represented in Scala as functions | 
|         |    472  | 
|         |    473 \begin{lstlisting}[language=Scala, numbers=none] | 
|         |    474 r => r * (3 + 4) | 
|         |    475 \end{lstlisting}   | 
|         |    476  | 
|         |    477 \noindent | 
|         |    478 where \texttt{r} is any result that has been computed earlier. By the | 
|         |    479 way in lambda-calculus notation we would write | 
|         |    480 $\lambda r.\, r * (3 + 4)$ for the same function. To sum up, we use | 
|    388 functions (``continuations'') to represent what is coming next in a |    481 functions (``continuations'') to represent what is coming next in a | 
|    389 sequence of instructions. In our case, continuations are functions of type |    482 sequence of instructions. In our case, continuations are functions of | 
|    390 \code{KVal} to \code{KExp}. They can be seen as a sequence of |    483 type \code{KVal} to \code{KExp}. They can be seen as a sequence of | 
|    391 \code{KLet}s where there is a ``hole'' that needs to be filled. Consider |    484 \code{KLet}s where there is a ``hole'' that needs to be | 
|    392 for example |    485 filled. Consider for example | 
|    393  |    486  | 
|    394 \begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}] |    487 \begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}] | 
|    395 let tmp0 = add 1 a in    |    488 let tmp0 = add 1 a in    | 
|    396 let tmp1 = mul (*@$\Box$@*) 5 in  |    489 let tmp1 = mul (*@$\Box$@*) 5 in  | 
|    397 let tmp2 = add 3 tmp1 in  |    490 let tmp2 = add 3 tmp1 in  | 
|    398 let tmp3 = add tmp0 tmp2 in |    491 let tmp3 = add tmp0 tmp2 in | 
|    399   tmp3  |    492   retrun tmp3  | 
|    400 \end{lstlisting} |    493 \end{lstlisting} | 
|    401  |    494  | 
|    402 \noindent |    495 \noindent | 
|    403 where in the second line is a $\Box$ which still expects a \code{KVal} |    496 where in the second line is a $\Box$ which still expects a \code{KVal} | 
|    404 to be filled in before it becomes a ``proper'' \code{KExp}. When we  |    497 to be filled in before it becomes a ``proper'' \code{KExp}. When we  | 
|    405 apply an argument to the continuation (remember they are functions) |    498 apply an argument to the continuation (remember they are functions) | 
|    406 we essentially fill something into the corresponding hole. The code |    499 we essentially fill something into the corresponding hole. | 
|    407 of the CPS-translation is  |    500  | 
|         |    501 Lets look at concrete code. To simplify matters first,  | 
|         |    502 suppose our source language consists just of three kinds of expressions | 
|         |    503  | 
|         |    504 \begin{lstlisting}[language=Scala,numbers=none] | 
|         |    505 enum Expr { | 
|         |    506     case Num(n: Int) | 
|         |    507     case Bop(op: String, e1: Expr, e2: Expr) | 
|         |    508     case Call(fname: String, args: List[Expr]) | 
|         |    509 } | 
|         |    510 \end{lstlisting} | 
|         |    511  | 
|         |    512 \noindent | 
|         |    513 The code of the CPS-translation is then of the form | 
|    408  |    514  | 
|    409 \begin{lstlisting}[language=Scala,numbers=none] |    515 \begin{lstlisting}[language=Scala,numbers=none] | 
|    410 def CPS(e: Exp)(k: KVal => KExp) : KExp =  |    516 def CPS(e: Exp)(k: KVal => KExp) : KExp =  | 
|    411   e match { ... } |    517   e match { ... } | 
|    412 \end{lstlisting} |    518 \end{lstlisting} | 
|    413  |    519  | 
|    414 \noindent  |    520 \noindent  | 
|    415 where \code{k} is the continuation and \code{e} is the expression  |    521 where \code{k} is the continuation and \code{e} is the expression to | 
|    416 to be compiled. In case we have numbers or variables, we can just |    522 be compiled. In case we have numbers, we can just pass them to the | 
|         |    523 continuations because numbers need not be further teared apart as they | 
|         |    524 are already atomic. Passing the number to the continuation means we  | 
|    417 apply the continuation like  |    525 apply the continuation like  | 
|    418  |    526  | 
|    419 \begin{center} |    527 \begin{lstlisting}[language=Scala,numbers=none] | 
|    420 \code{k(KNum(n))} \qquad \code{k(KVar(x))} |    528   case Num(i) => k(KNum(i)) | 
|    421 \end{center} |    529 \end{lstlisting} | 
|    422  |    530  | 
|    423 \noindent This would just fill in the $\Box$ in a \code{KLet}-expression. |    531 \noindent This would just fill in the $\Box$ in a \code{KLet}-expression. | 
|    424 More interesting is the case for an arithmetic expression. |    532 There is no need to create a temporary variable for simple numbers. | 
|    425  |    533 More interesting is the case for arithmetic operations. | 
|    426 \begin{lstlisting}[language=Scala,numbers=none] |    534  | 
|    427 case Aop(o, e1, e2) => { |    535 \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm] | 
|    428     val z = Fresh("tmp") |    536 case Aop(op, e1, e2) => { | 
|    429     CPS(e1)(y1 =>  |    537   val z = Fresh("tmp") | 
|    430       CPS(e2)(y2 => KLet(z, Kop(o, y1, y2), k(KVar(z))))) |    538   CPS(e1)(r1 =>  | 
|         |    539     CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z))))) | 
|    431 } |    540 } | 
|    432 \end{lstlisting} |    541 \end{lstlisting} | 
|    433  |    542  | 
|    434 \noindent |    543 \noindent | 
|    435 For more such rules, have a look to the code of the fun-llvm |    544 What we essentially have to do in this case is the following: compile | 
|    436 compiler. |    545 the subexpressions \texttt{e1} and \texttt{e2}. They will produce some | 
|         |    546 result that is stored in two temporary variables (assuming they are more | 
|         |    547 complicated than just numbers). We need to use these two temporary | 
|         |    548 variables and feed them into a new assignment of the form | 
|         |    549  | 
|         |    550 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}] | 
|         |    551 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in | 
|         |    552 ... | 
|         |    553 \end{lstlisting} | 
|         |    554  | 
|         |    555 \noindent | 
|         |    556 Note that this assignment has two ``holes'', one for the left | 
|         |    557 subexpression and one for the right subexpression. We can fill both | 
|         |    558 holes by calling the CPS function twice---this is a bit of the black | 
|         |    559 art in CPS. We can use the second call of CPS as the continuation | 
|         |    560 of the first call: The reason is that | 
|         |    561  | 
|         |    562 \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm] | 
|         |    563 r1 => CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z)))) | 
|         |    564 \end{lstlisting} | 
|         |    565  | 
|         |    566 \noindent is a function from \code{KVal} to \code{KExp}. Once | 
|         |    567 we created the assignment with the fresh temporary variable | 
|         |    568 \texttt{z}, we need to ``communicate'' that the result of the | 
|         |    569 computation of the arithmetic expression is stored in \texttt{z} to the | 
|         |    570 computations that follow. In this way we apply the continuation \texttt{k} with this | 
|         |    571 new variable (essentially we are plugging in a hole further down the line). | 
|         |    572  | 
|         |    573 The last case we need to consider in our small expression language are | 
|         |    574 function calls. | 
|         |    575  | 
|         |    576 \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm] | 
|         |    577 case Call(fname, args) => { | 
|         |    578   def aux(args: List[Expr], vs: List[KVal]): KExp = args match { | 
|         |    579        case Nil => { | 
|         |    580          val z = Fresh("tmp") | 
|         |    581          KLet(z, KCall(fname, vs), k(KVar(z))) | 
|         |    582        } | 
|         |    583        case a::as => CPS(a)(r => aux(as, vs ::: List(r))) | 
|         |    584   } | 
|         |    585   aux(args, Nil)   | 
|         |    586 } | 
|         |    587 \end{lstlisting} | 
|         |    588  | 
|         |    589 \noindent | 
|         |    590 For them we introduce an auxiliary function that compiles first all | 
|         |    591 function arguments---remember in our source language we can have calls | 
|         |    592 such as $foo(3 + 4, g(3))$ where we first have to create temporary | 
|         |    593 variables (and computations) for each argument. Therefore \texttt{aux} | 
|         |    594 analyses the arguments from left to right. In case there is an argument | 
|         |    595 \texttt{a} on the front of the list (the case \texttt{a::as}), we call | 
|         |    596 CPS recursively for the corresponding subexpression. The temporary variable | 
|         |    597 containing the result for this argument we add to the end of the K-values we | 
|         |    598 have already analysed before. Again very conveniently we can use the | 
|         |    599 recursive call to \texttt{aux} as the continuation for the computations | 
|         |    600 that follow. If we reach the end of the argument list (the \texttt{Nil}-case), | 
|         |    601 then we collect all the K-values \texttt{v1} to \texttt{vn} and call the | 
|         |    602 function in the K-language like so | 
|         |    603  | 
|         |    604  | 
|         |    605 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}] | 
|         |    606 let z = call foo(v1,...,vn) in | 
|         |    607 ... | 
|         |    608 \end{lstlisting} | 
|         |    609  | 
|         |    610 \noindent | 
|         |    611 Again we need to communicate the result of the function call, namely the | 
|         |    612 fresh temporary variable \texttt{z}, to the rest of the computation. Therefore | 
|         |    613 we apply the continuation \texttt{k} with this variable. | 
|         |    614  | 
|         |    615  | 
|         |    616  | 
|         |    617 \begin{figure}[p]\small | 
|         |    618   \lstinputlisting[numbers=left,firstline=1,lastline=24]{../progs/fun/simple-cps.sc} | 
|         |    619   \hspace{10mm}... | 
|         |    620   \lstinputlisting[numbers=left,firstline=32,firstnumber=32,lastline=51]{../progs/fun/simple-cps.sc} | 
|         |    621 \caption{CPS-translation for a simple expression language.\label{cps}} | 
|         |    622 \end{figure} | 
|         |    623  | 
|         |    624 The last question we need to answer in the code (see Figure~\ref{cps}) is how we | 
|         |    625 start the CPS-translation function, or more precisely with which continuation we | 
|         |    626 should start CPS? It turns out that this initial continuation will be the last one that is | 
|         |    627 called. What should be the last step in the computation? Yes, we need to return the | 
|         |    628 temporary variable where the last result is stored (if it is a simple number, then we can | 
|         |    629 just return this number). Therefore we cal CPS with the code | 
|         |    630  | 
|         |    631 \begin{lstlisting}[language=Scala,numbers=none] | 
|         |    632 def CPSi(e: Expr) : KExp = CPS(e)(KReturn(_)) | 
|         |    633 \end{lstlisting} | 
|         |    634  | 
|         |    635 \noindent | 
|         |    636 where we give the function \code{KReturn(_)}.  | 
|         |    637  | 
|         |    638  | 
|         |    639 \begin{figure}[p]\small | 
|         |    640 \begin{lstlisting}[language=Scala,numbers=none] | 
|         |    641 // Fun language (expressions) | 
|         |    642 abstract class Exp  | 
|         |    643 abstract class BExp  | 
|         |    644  | 
|         |    645 case class Call(name: String, args: List[Exp]) extends Exp | 
|         |    646 case class If(a: BExp, e1: Exp, e2: Exp) extends Exp | 
|         |    647 case class Write(e: Exp) extends Exp | 
|         |    648 case class Var(s: String) extends Exp | 
|         |    649 case class Num(i: Int) extends Exp | 
|         |    650 case class Aop(o: String, a1: Exp, a2: Exp) extends Exp | 
|         |    651 case class Sequence(e1: Exp, e2: Exp) extends Exp | 
|         |    652 case class Bop(o: String, a1: Exp, a2: Exp) extends BExp   | 
|         |    653  | 
|         |    654  | 
|         |    655  | 
|         |    656 // K-language (K-expressions, K-values) | 
|         |    657 abstract class KExp | 
|         |    658 abstract class KVal | 
|         |    659  | 
|         |    660 case class KVar(s: String) extends KVal | 
|         |    661 case class KNum(i: Int) extends KVal | 
|         |    662 case class Kop(o: String, v1: KVal, v2: KVal) extends KVal | 
|         |    663 case class KCall(o: String, vrs: List[KVal]) extends KVal | 
|         |    664 case class KWrite(v: KVal) extends KVal | 
|         |    665  | 
|         |    666 case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp | 
|         |    667 case class KLet(x: String, v: KVal, e: KExp) extends KExp | 
|         |    668 case class KReturn(v: KVal) extends KExp | 
|         |    669 \end{lstlisting} | 
|         |    670 \caption{Abstract syntax trees for the Fun-language.\label{absfun}} | 
|         |    671 \end{figure} | 
|         |    672  | 
|         |    673  | 
|    437  |    674  | 
|    438 \noindent |    675 \noindent | 
|    439 \end{document} |    676 \end{document} | 
|    440  |    677  | 
|    441  |    678  |