| 677 |      1 | % !TEX program = xelatex
 | 
| 539 |      2 | \documentclass{article}
 | 
|  |      3 | \usepackage{../style}
 | 
|  |      4 | \usepackage{../langs}
 | 
|  |      5 | \usepackage{../graphics}
 | 
|  |      6 | \usepackage{../grammar}
 | 
|  |      7 | 
 | 
|  |      8 | \begin{document}
 | 
| 908 |      9 | \fnote{\copyright{} Christian Urban, King's College London, 2019, 2023}
 | 
| 539 |     10 | 
 | 
|  |     11 | 
 | 
| 722 |     12 | % CPS translations
 | 
|  |     13 | % https://felleisen.org/matthias/4400-s20/lecture17.html
 | 
|  |     14 | 
 | 
|  |     15 | %% pattern matching resources
 | 
|  |     16 | %% https://www.reddit.com/r/ProgrammingLanguages/comments/g1vno3/beginner_resources_for_compiling_pattern_matching/
 | 
|  |     17 | 
 | 
| 677 |     18 | \section*{Handout 9 (LLVM, SSA and CPS)}
 | 
| 539 |     19 | 
 | 
| 700 |     20 | Reflecting on our two tiny compilers targetting the JVM, the code
 | 
|  |     21 | generation part was actually not so hard, no? Pretty much just some
 | 
| 908 |     22 | post-traversal of the abstract syntax tree, yes? One of the reasons
 | 
|  |     23 | for this ease is that the JVM is a stack-based virtual machine and it
 | 
|  |     24 | is therefore not hard to translate deeply-nested arithmetic
 | 
|  |     25 | expressions into a sequence of instructions manipulating the
 | 
|  |     26 | stack. The problem is that ``real'' CPUs, although supporting stack
 | 
|  |     27 | operations, are not really designed to be \emph{stack machines}.  The
 | 
|  |     28 | design of CPUs is more like: Here are some operations and a chunk of
 | 
|  |     29 | memory---compiler, or better compiler writers, do something with
 | 
|  |     30 | them. Consequently, modern compilers need to go the extra mile in
 | 
|  |     31 | order to generate code that is much easier and faster to process by
 | 
|  |     32 | actual CPUs, rather than running code on virtual machines that
 | 
|  |     33 | introduce an additional layer of indirection. To make this all
 | 
|  |     34 | tractable for this module, we target the LLVM Intermediate
 | 
|  |     35 | Language. In this way we can take advantage of the tools coming with
 | 
|  |     36 | LLVM.\footnote{\url{http://llvm.org}} For example we do not have to
 | 
|  |     37 | worry about things like register allocations. By using LLVM, however,
 | 
|  |     38 | we also have to pay price in the sense that generating code gets
 | 
|  |     39 | harder\ldots{}unfortunately.
 | 
| 539 |     40 | 
 | 
| 908 |     41 | \subsection*{LLVM and LLVM-IR}
 | 
|  |     42 | 
 | 
|  |     43 | \noindent LLVM is a beautiful example
 | 
| 700 |     44 | that projects from Academia can make a difference in the World. LLVM
 | 
| 678 |     45 | started in 2000 as a project by two researchers at the  University of
 | 
|  |     46 | Illinois at Urbana-Champaign. At the time the behemoth of compilers was
 | 
| 908 |     47 | gcc with its myriad of front-ends for different programming languages (C++, Fortran,
 | 
| 678 |     48 | Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
 | 
| 908 |     49 | time into a monolithic gigantic piece of m\ldots ehm complicated
 | 
|  |     50 | software, which you
 | 
| 678 |     51 | could not mess about in an afternoon. In contrast, LLVM is designed to
 | 
| 700 |     52 | be a modular suite of tools with which you can play around easily and
 | 
| 678 |     53 | try out something new. LLVM became a big player once Apple hired one of
 | 
|  |     54 | the original developers (I cannot remember the reason why Apple did not
 | 
| 898 |     55 | want to use gcc, but maybe they were also just disgusted by gcc's big
 | 
| 678 |     56 | monolithic codebase). Anyway, LLVM is now the big player and gcc is more
 | 
|  |     57 | or less legacy. This does not mean that programming languages like C and
 | 
|  |     58 | C++ are dying out any time soon---they are nicely supported by LLVM.
 | 
| 539 |     59 | 
 | 
| 700 |     60 | We will target the LLVM Intermediate Language, or LLVM Intermediate
 | 
|  |     61 | Representation (short LLVM-IR). The LLVM-IR looks very similar to the
 | 
| 908 |     62 | assembly language of Jasmin and Krakatau. Targetting LLVM-IR will also
 | 
|  |     63 | allow us to benefit from the modular structure of the LLVM compiler
 | 
|  |     64 | and we can let, for example, the compiler generate code for different
 | 
|  |     65 | CPUs, say X86 or ARM.  That means we can be agnostic about where our
 | 
|  |     66 | code is actually going to run.\footnote{Anybody want to try to run our
 | 
|  |     67 |   programs on Android phones?}  We can also be somewhat ignorant about
 | 
|  |     68 | optimising our code and about allocating memory efficiently: the LLVM
 | 
|  |     69 | tools will take care of all this.
 | 
| 539 |     70 | 
 | 
| 908 |     71 | However, what we have to do in order to make LLVM to play ball is to
 | 
|  |     72 | generate code in \emph{Static Single-Assignment} format (short SSA),
 | 
|  |     73 | because that is what the LLVM-IR expects from us. A reason why LLVM
 | 
|  |     74 | uses the SSA format, rather than JVM-like stack instructions, is that
 | 
|  |     75 | stack instructions are difficult to optimise---you cannot just
 | 
|  |     76 | re-arrange instructions without messing about with what is calculated
 | 
|  |     77 | on the stack. Also it is hard to find out if all the calculations on
 | 
|  |     78 | the stack are actually necessary and not by chance dead code. The JVM
 | 
|  |     79 | has for all these obstacles sophisticated machinery to make such
 | 
|  |     80 | ``high-level'' code still run fast, but let's say that for the sake of
 | 
|  |     81 | argument we do not want to rely on it. We want to generate fast code
 | 
|  |     82 | ourselves. This means we have to work around the intricacies of what
 | 
|  |     83 | instructions CPUs can actually process fast. This is what the SSA
 | 
|  |     84 | format is designed for.
 | 
| 700 |     85 | 
 | 
|  |     86 | 
 | 
|  |     87 | The main idea behind the SSA format is to use very simple variable
 | 
| 908 |     88 | assignments where every tmp-variable is assigned only once. The
 | 
|  |     89 | assignments also need to be primitive in the sense that they can be
 | 
|  |     90 | just simple operations like addition, multiplication, jumps,
 | 
|  |     91 | comparisons, function calls and so on.  Say, we have an expression
 | 
|  |     92 | $((1 + a) + (foo(3 + 2) + (b * 5)))$, then the corresponding SSA format is
 | 
| 680 |     93 |  
 | 
|  |     94 | \begin{lstlisting}[language=LLVMIR,numbers=left]
 | 
| 908 |     95 | let tmp0 = add 1 a in
 | 
|  |     96 | let tmp1 = add 3 2 in  
 | 
|  |     97 | let tmp2 = call foo(tmp1)  
 | 
|  |     98 | let tmp3 = mul b 5 in 
 | 
|  |     99 | let tmp4 = add tmp2 tmp3 in 
 | 
|  |    100 | let tmp5 = add tmp0 tmp4 in
 | 
|  |    101 |   return tmp5 
 | 
| 677 |    102 | \end{lstlisting}
 | 
| 539 |    103 | 
 | 
| 908 |    104 | \noindent where every tmp-variable is used only once (we could, for
 | 
|  |    105 | example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if this
 | 
|  |    106 | would not change the overall result).
 | 
| 898 |    107 | 
 | 
|  |    108 | There are sophisticated algorithms for imperative languages, like C,
 | 
|  |    109 | that efficiently transform a high-level program into SSA format. But
 | 
|  |    110 | we can ignore them here. We want to compile a functional language and
 | 
|  |    111 | there things get much more interesting than just sophisticated. We
 | 
|  |    112 | will need to have a look at CPS translations, where the CPS stands for
 | 
| 678 |    113 | Continuation-Passing-Style---basically black programming art or
 | 
|  |    114 | abracadabra programming. So sit tight.
 | 
| 539 |    115 | 
 | 
| 678 |    116 | \subsection*{LLVM-IR}
 | 
| 539 |    117 | 
 | 
| 908 |    118 | Before we start, let's however first have a look at the \emph{LLVM Intermediate
 | 
| 700 |    119 | Representation} in more detail. The LLVM-IR is in between the frontends
 | 
|  |    120 | and backends of the LLVM framework. It allows compilation of multiple
 | 
|  |    121 | source languages to multiple targets. It is also the place where most of
 | 
|  |    122 | the target independent optimisations are performed. 
 | 
|  |    123 | 
 | 
| 908 |    124 | What is good about our toy Fun-language is that it basically only
 | 
| 700 |    125 | contains expressions (be they arithmetic expressions, boolean
 | 
|  |    126 | expressions or if-expressions). The exception are function definitions.
 | 
|  |    127 | Luckily, for them we can use the mechanism of defining functions in the
 | 
|  |    128 | LLVM-IR (this is similar to using JVM methods for functions in our
 | 
| 908 |    129 | earlier compiler). For example the simple Fun-program 
 | 
| 539 |    130 | 
 | 
|  |    131 | 
 | 
| 677 |    132 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
| 678 |    133 | def sqr(x) = x * x
 | 
| 677 |    134 | \end{lstlisting}
 | 
| 539 |    135 | 
 | 
| 677 |    136 | \noindent
 | 
| 700 |    137 | can be compiled to the following LLVM-IR function:
 | 
| 539 |    138 | 
 | 
| 677 |    139 | \begin{lstlisting}[language=LLVM]
 | 
| 678 |    140 | define i32 @sqr(i32 %x) {
 | 
|  |    141 |    %tmp = mul i32 %x, %x
 | 
| 677 |    142 |    ret i32 %tmp
 | 
|  |    143 | }    
 | 
|  |    144 | \end{lstlisting}
 | 
| 539 |    145 | 
 | 
| 908 |    146 | \noindent First notice that all ``local'' variable names, in this case
 | 
|  |    147 | \texttt{x} and \texttt{tmp}, are prefixed with \texttt{\%} in the
 | 
|  |    148 | LLVM-IR.  Temporary variables can be named with an identifier, such as
 | 
|  |    149 | \texttt{tmp}, or numbers. In contrast, function names, since they are
 | 
|  |    150 | ``global'', need to be prefixed with an @-symbol. Also, the LLVM-IR is
 | 
|  |    151 | a fully typed language. The \texttt{i32} type stands for 32-bit
 | 
|  |    152 | integers. There are also types for 64-bit integers (\texttt{i64}),
 | 
|  |    153 | chars (\texttt{i8}), floats, arrays and even pointer types. In the
 | 
|  |    154 | code above, \texttt{sqr} takes an argument of type \texttt{i32} and
 | 
|  |    155 | produces a result of type \texttt{i32} (the result type is in front of
 | 
|  |    156 | the function name, like in C). Each arithmetic operation, for example
 | 
|  |    157 | addition and multiplication, are also prefixed with the type they
 | 
|  |    158 | operate on. Obviously these types need to match up\ldots{} but since
 | 
|  |    159 | we have in our programs only integers, for the moment \texttt{i32}
 | 
|  |    160 | everywhere will do. We do not have to generate any other types, but
 | 
|  |    161 | obviously this is a limitation in our Fun-language (and which we
 | 
|  |    162 | lift in the final CW).
 | 
| 539 |    163 |  
 | 
| 700 |    164 | There are a few interesting instructions in the LLVM-IR which are quite
 | 
| 701 |    165 | different than what we have seen in the JVM. Can you remember the
 | 
|  |    166 | kerfuffle we had to go through with boolean expressions and negating the
 | 
| 908 |    167 | condition? In the LLVM-IR, branching  if-conditions are implemented
 | 
| 701 |    168 | differently: there is a separate \texttt{br}-instruction as follows:
 | 
| 700 |    169 | 
 | 
|  |    170 | \begin{lstlisting}[language=LLVM]
 | 
|  |    171 | br i1 %var, label %if_br, label %else_br
 | 
|  |    172 | \end{lstlisting}
 | 
|  |    173 | 
 | 
|  |    174 | \noindent
 | 
| 908 |    175 | The type \texttt{i1} stands for booleans. If the variable is true,
 | 
|  |    176 | then this instruction jumps to the if-branch, which needs an explicit
 | 
|  |    177 | label; otherwise it jumps to the else-branch, again with its own
 | 
|  |    178 | label. This allows us to keep the meaning of the boolean expression
 | 
|  |    179 | ``as is'' when compiling if's---thanks god no more negating of
 | 
|  |    180 | booleans.
 | 
|  |    181 | 
 | 
| 701 |    182 | A value of type boolean is generated in the LLVM-IR by the
 | 
|  |    183 | \texttt{icmp}-instruction. This instruction is for integers (hence the
 | 
|  |    184 | \texttt{i}) and takes the comparison operation as argument. For example
 | 
| 700 |    185 | 
 | 
|  |    186 | \begin{lstlisting}[language=LLVM]
 | 
| 898 |    187 | icmp eq  i32 %x, %y     ; for equal
 | 
|  |    188 | icmp sle i32 %x, %y     ; signed less or equal
 | 
|  |    189 | icmp slt i32 %x, %y     ; signed less than
 | 
|  |    190 | icmp ult i32 %x, %y     ; unsigned less than 
 | 
| 700 |    191 | \end{lstlisting}
 | 
|  |    192 | 
 | 
|  |    193 | \noindent
 | 
| 898 |    194 | Note that in some operations the LLVM-IR distinguishes between signed and 
 | 
| 908 |    195 | unsigned representations of integers. I assume you know what this means,
 | 
|  |    196 | otherwise just ask.
 | 
| 700 |    197 | 
 | 
| 701 |    198 | It is also easy to call another function in LLVM-IR: as can be 
 | 
|  |    199 | seen from Figure~\ref{lli} we can just call a function with the 
 | 
|  |    200 | instruction \texttt{call} and can also assign the result to 
 | 
|  |    201 | a variable. The syntax is as follows
 | 
|  |    202 | 
 | 
|  |    203 | \begin{lstlisting}[language=LLVM]
 | 
|  |    204 | %var = call i32 @foo(...args...)
 | 
|  |    205 | \end{lstlisting}
 | 
|  |    206 | 
 | 
|  |    207 | \noindent
 | 
| 908 |    208 | where the arguments can only be simple variables and numbers, but not compound
 | 
| 701 |    209 | expressions.
 | 
|  |    210 | 
 | 
| 679 |    211 | Conveniently, you can use the program \texttt{lli}, which comes with
 | 
|  |    212 | LLVM, to interpret programs written in the LLVM-IR. So you can easily
 | 
|  |    213 | check whether the code you produced actually works. To get a running
 | 
|  |    214 | program that does something interesting you need to add some boilerplate
 | 
| 701 |    215 | about printing out numbers and a main-function that is the entry point
 | 
| 908 |    216 | for the program (see Figure~\ref{lli} for a complete listing of the square function). Again
 | 
| 700 |    217 | this is very similar to the boilerplate we needed to add in our JVM
 | 
|  |    218 | compiler. 
 | 
|  |    219 | 
 | 
|  |    220 | You can generate a binary for the program in Figure~\ref{lli} by using
 | 
| 908 |    221 | the \texttt{llc}-compiler and then \texttt{gcc}/\texttt{clang}, whereby \texttt{llc} generates
 | 
|  |    222 | an object file and \texttt{gcc} (that is actually \texttt{clang} on my Mac) generates the
 | 
| 700 |    223 | executable binary:
 | 
| 678 |    224 | 
 | 
| 679 |    225 | \begin{lstlisting}[language=bash,numbers=none]
 | 
|  |    226 | llc -filetype=obj sqr.ll
 | 
|  |    227 | gcc sqr.o -o a.out
 | 
|  |    228 | ./a.out
 | 
| 680 |    229 | > 25
 | 
| 679 |    230 | \end{lstlisting}
 | 
|  |    231 | 
 | 
|  |    232 | \begin{figure}[t]\small 
 | 
|  |    233 | \lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
 | 
| 700 |    234 | \caption{An LLVM-IR program for calculating the square function. It
 | 
| 908 |    235 |   calls the sqr-function in \texttt{@main} with the argument \texttt{5}
 | 
|  |    236 |   (Line 20). The
 | 
|  |    237 |   code for the \texttt{sqr} function is in Lines 13 -- 16. It stores
 | 
|  |    238 |   the result of sqr in the variable called \texttt{\%i} and then
 | 
|  |    239 |   prints out the contents of this variable in Line 21. The other
 | 
|  |    240 |   code is boilerplate for printing out integers.\label{lli}}
 | 
| 678 |    241 | \end{figure}   
 | 
|  |    242 | 
 | 
| 679 |    243 | 
 | 
|  |    244 |     
 | 
|  |    245 | \subsection*{Our Own Intermediate Language}
 | 
|  |    246 | 
 | 
| 908 |    247 | Let's get back to our compiler: Remember compilers have to solve the
 | 
|  |    248 | problem of bridging the gap between ``high-level'' programs and
 | 
|  |    249 | ``low-level'' hardware. If the gap is too wide for one step, then a
 | 
|  |    250 | good strategy is to lay a stepping stone somewhere in between. The
 | 
|  |    251 | LLVM-IR itself is such a stepping stone to make the task of generating
 | 
|  |    252 | and optimising code easier. Like a real compiler we will use our own
 | 
|  |    253 | stepping stone which I call the \emph{K-language}.
 | 
|  |    254 | 
 | 
|  |    255 | \begin{center}
 | 
|  |    256 |   \begin{tikzpicture}[scale=0.9,font=\bf,
 | 
|  |    257 |                       node/.style={
 | 
|  |    258 |                       rectangle,rounded corners=3mm,
 | 
|  |    259 |                       ultra thick,draw=black!50,minimum height=16mm, 
 | 
|  |    260 |                       minimum width=20mm,
 | 
|  |    261 |                       top color=white,bottom color=black!20}]
 | 
|  |    262 | 
 | 
|  |    263 |   %\node (0) at (-3,0) {};  
 | 
|  |    264 |   \node (A) at (0.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}source}] {Fun-Language};
 | 
|  |    265 |   \node (B) at (3.5,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}stepping stone}] {K-Language};
 | 
|  |    266 |   \node (C) at (7.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}target}] {LLVM-IR};
 | 
|  |    267 |  
 | 
|  |    268 |   \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {} (B); 
 | 
|  |    269 |   \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {} (C); 
 | 
|  |    270 |   \end{tikzpicture}
 | 
|  |    271 |   \end{center}
 | 
|  |    272 | 
 | 
|  |    273 |   \noindent
 | 
|  |    274 |   To see why we need a stepping stone for generating code in SSA-format,
 | 
|  |    275 |   considder again arithmetic expressions such as
 | 
|  |    276 |   $((1 + a) + (3 + (b * 5)))$. They need to be broken up into smaller
 | 
|  |    277 |   ``atomic'' steps, like so
 | 
| 680 |    278 |  
 | 
|  |    279 | \begin{lstlisting}[language=LLVMIR,numbers=none]
 | 
|  |    280 | let tmp0 = add 1 a in   
 | 
|  |    281 | let tmp1 = mul b 5 in 
 | 
|  |    282 | let tmp2 = add 3 tmp1 in 
 | 
|  |    283 | let tmp3 = add tmp0 tmp2 in
 | 
| 908 |    284 |   return tmp3 
 | 
|  |    285 | \end{lstlisting}
 | 
|  |    286 | 
 | 
|  |    287 | \noindent
 | 
|  |    288 | Here \texttt{tmp3} will contain the result of what the whole
 | 
|  |    289 | expression stands for. In each individual step we can only perform an
 | 
|  |    290 | ``atomic'' or ``trival'' operation, like addition or multiplication of
 | 
|  |    291 | a number or a variable.  We are not allowed to have for example a
 | 
|  |    292 | nested addition or an if-condition on the right-hand side of an
 | 
|  |    293 | assignment. Such constraints are forced upon us because of how the SSA
 | 
|  |    294 | format works in the LLVM-IR. To simplify matters we represent
 | 
|  |    295 | assignments with two kinds of syntactic entities, namely
 | 
|  |    296 | \emph{K-values} and \emph{K-expressions}. K-values are all ``things''
 | 
|  |    297 | that can appear on the right-hand side of an equal. Say we have
 | 
|  |    298 | the following definition for K-values:
 | 
|  |    299 | 
 | 
|  |    300 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    301 | enum KVal {
 | 
|  |    302 |   case KVar(s: String)
 | 
|  |    303 |   case KNum(n: Int)
 | 
|  |    304 |   case KAop(op: String, v1: KVal, v2: KVal)
 | 
|  |    305 |   case KCall(fname: String, args: List[KVal])
 | 
|  |    306 | }
 | 
| 680 |    307 | \end{lstlisting}
 | 
|  |    308 | 
 | 
|  |    309 | \noindent
 | 
| 908 |    310 | where a K-value can be a variable, a number or a ``trivial'' binary
 | 
|  |    311 | operation, such as addition or multiplication. The two arguments of a
 | 
|  |    312 | binary operation need to be K-values as well. Finally, we have
 | 
|  |    313 | function calls, but again each argument of the function call needs to
 | 
|  |    314 | be a K-value.  One point we need to be careful, however, is that the
 | 
|  |    315 | arguments of the binary operations and function calls are in fact only
 | 
|  |    316 | variables or numbers. To encode this constraint into the type-system
 | 
|  |    317 | of Scala would make things too complicated---to satisfy this
 | 
|  |    318 | constraint is therefore on us. For our Fun-language, we will later on
 | 
|  |    319 | consider some further K-values.
 | 
| 679 |    320 | 
 | 
|  |    321 | 
 | 
| 908 |    322 | Our K-expressions will encode the \texttt{let} and the \texttt{return}
 | 
|  |    323 | from the SSA-format (again for the moment we only consider these two
 | 
|  |    324 | constructors---later on we will have if-branches as well).
 | 
| 678 |    325 | 
 | 
| 908 |    326 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    327 | enum KExp {
 | 
|  |    328 |   case KLet(x: String, v: KVal, e: KExp)
 | 
|  |    329 |   case KReturn(v: KVal)
 | 
|  |    330 | }
 | 
|  |    331 | \end{lstlisting}
 | 
| 679 |    332 | 
 | 
| 908 |    333 | \noindent
 | 
|  |    334 | By having in \texttt{KLet} taking first a string (standing for an
 | 
|  |    335 | intermediate variable) and second a value, we can fulfil the SSA
 | 
|  |    336 | constraint in assignments ``by construction''---there is no way we
 | 
|  |    337 | could write anything else than a K-value.  Note that the third
 | 
|  |    338 | argument of a \texttt{KLet} is again a K-expression, meaning either
 | 
|  |    339 | another \texttt{KLet} or a \texttt{KReturn}. In this way we can
 | 
|  |    340 | construct a sequence of computations and return a final result.  Of
 | 
|  |    341 | course we also have to ensure that all intermediary computations are
 | 
|  |    342 | given (fresh) names---we will use an (ugly) counter for this.
 | 
| 679 |    343 | 
 | 
| 908 |    344 | To sum up, K-values are the atomic operations that can be on the
 | 
|  |    345 | right-hand side of assignemnts. The K-language is restricted such that
 | 
|  |    346 | it is easy to generate the SSA format for the LLVM-IR. What remains to
 | 
|  |    347 | be done is a translation of our Fun-language into the K-language. The
 | 
|  |    348 | main difficulty is that we need to order the computationa---in teh
 | 
|  |    349 | K-language we only have linear sequence of instructions to need to be
 | 
|  |    350 | followed. Before we explain this, we have a look at some
 | 
|  |    351 | CPS-translation.
 | 
| 679 |    352 | 
 | 
|  |    353 | 
 | 
| 678 |    354 | 
 | 
|  |    355 | 
 | 
|  |    356 | \subsection*{CPS-Translations}
 | 
|  |    357 | 
 | 
| 704 |    358 | CPS stands for Continuation-Passing-Style. It is a kind of programming
 | 
|  |    359 | technique often used in advanced functional programming. Before we delve
 | 
| 908 |    360 | into the CPS-translation for our Fun-language, let us look at 
 | 
| 704 |    361 | CPS-versions of some well-known functions. Consider
 | 
|  |    362 | 
 | 
|  |    363 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    364 | def fact(n: Int) : Int = 
 | 
|  |    365 |   if (n == 0) 1 else n * fact(n - 1) 
 | 
|  |    366 | \end{lstlisting}
 | 
|  |    367 | 
 | 
|  |    368 | \noindent 
 | 
|  |    369 | This is clearly the usual factorial function. But now consider the
 | 
|  |    370 | following version of the factorial function:
 | 
|  |    371 | 
 | 
|  |    372 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    373 | def factC(n: Int, ret: Int => Int) : Int = 
 | 
|  |    374 |   if (n == 0) ret(1) 
 | 
|  |    375 |   else factC(n - 1, x => ret(n * x)) 
 | 
|  |    376 | 
 | 
|  |    377 | factC(3, identity)
 | 
|  |    378 | \end{lstlisting}
 | 
|  |    379 | 
 | 
|  |    380 | \noindent
 | 
|  |    381 | This function is called with the number, in this case 3, and the
 | 
|  |    382 | identity-function (which returns just its input). The recursive
 | 
|  |    383 | calls are:
 | 
|  |    384 | 
 | 
|  |    385 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    386 | factC(2, x => identity(3 * x))
 | 
|  |    387 | factC(1, x => identity(3 * (2 * x)))
 | 
|  |    388 | factC(0, x => identity(3 * (2 * (1 * x))))
 | 
|  |    389 | \end{lstlisting}
 | 
|  |    390 | 
 | 
|  |    391 | \noindent
 | 
|  |    392 | Having reached 0, we get out of the recursion and apply 1 to the
 | 
|  |    393 | continuation (see if-branch above). This gives
 | 
|  |    394 | 
 | 
|  |    395 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    396 | identity(3 * (2 * (1 * 1)))
 | 
|  |    397 | = 3 * (2 * (1 * 1))
 | 
|  |    398 | = 6
 | 
|  |    399 | \end{lstlisting}
 | 
|  |    400 | 
 | 
|  |    401 | \noindent
 | 
| 898 |    402 | which is the expected result. If this looks somewhat familiar to you,
 | 
|  |    403 | than this is because functions with continuations can be
 | 
| 908 |    404 | seen as a kind of generalisation of tail-recursive functions.
 | 
|  |    405 | So far we did not look at this generalisation in earnest.
 | 
|  |    406 | Anyway
 | 
| 704 |    407 | notice how the continuations is ``stacked up'' during the recursion and
 | 
|  |    408 | then ``unrolled'' when we apply 1 to the continuation. Interestingly, we
 | 
|  |    409 | can do something similar to the Fibonacci function where in the traditional
 | 
|  |    410 | version we have two recursive calls. Consider the following function
 | 
|  |    411 | 
 | 
|  |    412 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    413 | def fibC(n: Int, ret: Int => Int) : Int = 
 | 
|  |    414 |   if (n == 0 || n == 1) ret(1) 
 | 
|  |    415 |   else fibC(n - 1,
 | 
|  |    416 |              r1 => fibC(n - 2,
 | 
|  |    417 |                r2 => ret(r1 + r2)))
 | 
|  |    418 | \end{lstlisting}
 | 
|  |    419 | 
 | 
|  |    420 | \noindent
 | 
|  |    421 | Here the continuation is a nested function essentially wrapping up 
 | 
|  |    422 | the second recursive call. Let us check how the recursion unfolds
 | 
|  |    423 | when called with 3 and the identity function:
 | 
|  |    424 | 
 | 
|  |    425 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    426 | fibC(3, id)
 | 
|  |    427 | fibC(2, r1 => fibC(1, r2 => id(r1 + r2)))
 | 
|  |    428 | fibC(1, r1 => 
 | 
|  |    429 |    fibC(0, r2 => fibC(1, r2a => id((r1 + r2) + r2a))))
 | 
|  |    430 | fibC(0, r2 => fibC(1, r2a => id((1 + r2) + r2a)))
 | 
|  |    431 | fibC(1, r2a => id((1 + 1) + r2a))
 | 
|  |    432 | id((1 + 1) + 1)
 | 
|  |    433 | (1 + 1) + 1
 | 
|  |    434 | 3
 | 
|  |    435 | \end{lstlisting}
 | 
|  |    436 | 
 | 
| 908 |    437 | \noindent
 | 
|  |    438 | The point of this section is that you are playing around with these
 | 
|  |    439 | simple definitions and make sure they calculate the expected result.
 | 
|  |    440 | In the next step, you should understand \emph{how} these functions
 | 
|  |    441 | calculate the result with the continuations. 
 | 
|  |    442 | 
 | 
|  |    443 | 
 | 
|  |    444 | \section*{Worked Example}
 | 
|  |    445 | 
 | 
|  |    446 | Let us now come back to the CPS-translations for the Fun-language. The
 | 
| 704 |    447 | main difficulty of generating instructions in SSA format is that large
 | 
|  |    448 | compound expressions need to be broken up into smaller pieces and
 | 
| 700 |    449 | intermediate results need to be chained into later instructions. To do
 | 
| 908 |    450 | this conveniently, we use the CPS-translation mechanism. What continuations
 | 
|  |    451 | essentially solve is the following problem: Given an expressions
 | 
|  |    452 | 
 | 
|  |    453 | \begin{equation}\label{exp}
 | 
|  |    454 | (1 + 2) * (3 + 4)
 | 
|  |    455 | \end{equation}  
 | 
|  |    456 | 
 | 
|  |    457 | \noindent
 | 
|  |    458 | which of the subexpressions should be calculated first? We just arbitrarily
 | 
|  |    459 | going to decide that it will be from left to right. This means we have
 | 
|  |    460 | to tear the expression shown in \eqref{exp} apart as follows:
 | 
|  |    461 | 
 | 
|  |    462 | \[
 | 
|  |    463 | (1 + 2) \qquad\text{and}\qquad  \Box * (3 + 4)
 | 
|  |    464 | \]  
 | 
|  |    465 | 
 | 
|  |    466 | \noindent
 | 
|  |    467 | The first one will give us a result, but the second one is not
 | 
|  |    468 | really an expression that makes sense. It will only make sense
 | 
|  |    469 | as the next step in the computation when we fill-in the result of
 | 
|  |    470 | $1+2$ into the ``hole'' indicated by $\Box$. Such incomplete
 | 
|  |    471 | expressions can be represented in Scala as functions
 | 
|  |    472 | 
 | 
|  |    473 | \begin{lstlisting}[language=Scala, numbers=none]
 | 
|  |    474 | r => r * (3 + 4)
 | 
|  |    475 | \end{lstlisting}  
 | 
|  |    476 | 
 | 
|  |    477 | \noindent
 | 
|  |    478 | where \texttt{r} is any result that has been computed earlier. By the
 | 
|  |    479 | way in lambda-calculus notation we would write
 | 
|  |    480 | $\lambda r.\, r * (3 + 4)$ for the same function. To sum up, we use
 | 
| 700 |    481 | functions (``continuations'') to represent what is coming next in a
 | 
| 908 |    482 | sequence of instructions. In our case, continuations are functions of
 | 
|  |    483 | type \code{KVal} to \code{KExp}. They can be seen as a sequence of
 | 
|  |    484 | \code{KLet}s where there is a ``hole'' that needs to be
 | 
|  |    485 | filled. Consider for example
 | 
| 678 |    486 | 
 | 
| 701 |    487 | \begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}]
 | 
|  |    488 | let tmp0 = add 1 a in   
 | 
|  |    489 | let tmp1 = mul (*@$\Box$@*) 5 in 
 | 
|  |    490 | let tmp2 = add 3 tmp1 in 
 | 
|  |    491 | let tmp3 = add tmp0 tmp2 in
 | 
| 908 |    492 |   retrun tmp3 
 | 
| 701 |    493 | \end{lstlisting}
 | 
|  |    494 | 
 | 
|  |    495 | \noindent
 | 
|  |    496 | where in the second line is a $\Box$ which still expects a \code{KVal}
 | 
|  |    497 | to be filled in before it becomes a ``proper'' \code{KExp}. When we 
 | 
| 898 |    498 | apply an argument to the continuation (remember they are functions)
 | 
| 908 |    499 | we essentially fill something into the corresponding hole.
 | 
|  |    500 | 
 | 
|  |    501 | Lets look at concrete code. To simplify matters first, 
 | 
|  |    502 | suppose our source language consists just of three kinds of expressions
 | 
|  |    503 | 
 | 
|  |    504 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    505 | enum Expr {
 | 
|  |    506 |     case Num(n: Int)
 | 
|  |    507 |     case Bop(op: String, e1: Expr, e2: Expr)
 | 
|  |    508 |     case Call(fname: String, args: List[Expr])
 | 
|  |    509 | }
 | 
|  |    510 | \end{lstlisting}
 | 
|  |    511 | 
 | 
|  |    512 | \noindent
 | 
|  |    513 | The code of the CPS-translation is then of the form
 | 
| 679 |    514 | 
 | 
| 701 |    515 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    516 | def CPS(e: Exp)(k: KVal => KExp) : KExp = 
 | 
|  |    517 |   e match { ... }
 | 
|  |    518 | \end{lstlisting}
 | 
| 679 |    519 | 
 | 
| 701 |    520 | \noindent 
 | 
| 908 |    521 | where \code{k} is the continuation and \code{e} is the expression to
 | 
|  |    522 | be compiled. In case we have numbers, we can just pass them to the
 | 
|  |    523 | continuations because numbers need not be further teared apart as they
 | 
|  |    524 | are already atomic. Passing the number to the continuation means we 
 | 
| 701 |    525 | apply the continuation like 
 | 
| 679 |    526 | 
 | 
| 908 |    527 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    528 |   case Num(i) => k(KNum(i))
 | 
|  |    529 | \end{lstlisting}
 | 
| 679 |    530 | 
 | 
| 701 |    531 | \noindent This would just fill in the $\Box$ in a \code{KLet}-expression.
 | 
| 908 |    532 | There is no need to create a temporary variable for simple numbers.
 | 
|  |    533 | More interesting is the case for arithmetic operations.
 | 
|  |    534 | 
 | 
|  |    535 | \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm]
 | 
|  |    536 | case Aop(op, e1, e2) => {
 | 
|  |    537 |   val z = Fresh("tmp")
 | 
|  |    538 |   CPS(e1)(r1 => 
 | 
|  |    539 |     CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z)))))
 | 
|  |    540 | }
 | 
|  |    541 | \end{lstlisting}
 | 
|  |    542 | 
 | 
|  |    543 | \noindent
 | 
|  |    544 | What we essentially have to do in this case is the following: compile
 | 
|  |    545 | the subexpressions \texttt{e1} and \texttt{e2}. They will produce some
 | 
|  |    546 | result that is stored in two temporary variables (assuming they are more
 | 
|  |    547 | complicated than just numbers). We need to use these two temporary
 | 
|  |    548 | variables and feed them into a new assignment of the form
 | 
|  |    549 | 
 | 
|  |    550 | \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
 | 
|  |    551 | let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
 | 
|  |    552 | ...
 | 
|  |    553 | \end{lstlisting}
 | 
| 701 |    554 | 
 | 
| 908 |    555 | \noindent
 | 
|  |    556 | Note that this assignment has two ``holes'', one for the left
 | 
|  |    557 | subexpression and one for the right subexpression. We can fill both
 | 
|  |    558 | holes by calling the CPS function twice---this is a bit of the black
 | 
|  |    559 | art in CPS. We can use the second call of CPS as the continuation
 | 
|  |    560 | of the first call: The reason is that
 | 
|  |    561 | 
 | 
|  |    562 | \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm]
 | 
|  |    563 | r1 => CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z))))
 | 
|  |    564 | \end{lstlisting}
 | 
|  |    565 | 
 | 
|  |    566 | \noindent is a function from \code{KVal} to \code{KExp}. Once
 | 
|  |    567 | we created the assignment with the fresh temporary variable
 | 
|  |    568 | \texttt{z}, we need to ``communicate'' that the result of the
 | 
|  |    569 | computation of the arithmetic expression is stored in \texttt{z} to the
 | 
|  |    570 | computations that follow. In this way we apply the continuation \texttt{k} with this
 | 
|  |    571 | new variable (essentially we are plugging in a hole further down the line).
 | 
|  |    572 | 
 | 
|  |    573 | The last case we need to consider in our small expression language are
 | 
|  |    574 | function calls.
 | 
|  |    575 | 
 | 
|  |    576 | \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm]
 | 
|  |    577 | case Call(fname, args) => {
 | 
|  |    578 |   def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
 | 
|  |    579 |        case Nil => {
 | 
|  |    580 |          val z = Fresh("tmp")
 | 
|  |    581 |          KLet(z, KCall(fname, vs), k(KVar(z)))
 | 
|  |    582 |        }
 | 
|  |    583 |        case a::as => CPS(a)(r => aux(as, vs ::: List(r)))
 | 
|  |    584 |   }
 | 
|  |    585 |   aux(args, Nil)  
 | 
| 701 |    586 | }
 | 
|  |    587 | \end{lstlisting}
 | 
|  |    588 | 
 | 
|  |    589 | \noindent
 | 
| 908 |    590 | For them we introduce an auxiliary function that compiles first all
 | 
|  |    591 | function arguments---remember in our source language we can have calls
 | 
|  |    592 | such as $foo(3 + 4, g(3))$ where we first have to create temporary
 | 
|  |    593 | variables (and computations) for each argument. Therefore \texttt{aux}
 | 
|  |    594 | analyses the arguments from left to right. In case there is an argument
 | 
|  |    595 | \texttt{a} on the front of the list (the case \texttt{a::as}), we call
 | 
|  |    596 | CPS recursively for the corresponding subexpression. The temporary variable
 | 
|  |    597 | containing the result for this argument we add to the end of the K-values we
 | 
|  |    598 | have already analysed before. Again very conveniently we can use the
 | 
|  |    599 | recursive call to \texttt{aux} as the continuation for the computations
 | 
|  |    600 | that follow. If we reach the end of the argument list (the \texttt{Nil}-case),
 | 
|  |    601 | then we collect all the K-values \texttt{v1} to \texttt{vn} and call the
 | 
|  |    602 | function in the K-language like so
 | 
|  |    603 | 
 | 
|  |    604 | 
 | 
|  |    605 | \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
 | 
|  |    606 | let z = call foo(v1,...,vn) in
 | 
|  |    607 | ...
 | 
|  |    608 | \end{lstlisting}
 | 
|  |    609 | 
 | 
|  |    610 | \noindent
 | 
|  |    611 | Again we need to communicate the result of the function call, namely the
 | 
|  |    612 | fresh temporary variable \texttt{z}, to the rest of the computation. Therefore
 | 
|  |    613 | we apply the continuation \texttt{k} with this variable.
 | 
|  |    614 | 
 | 
|  |    615 | 
 | 
|  |    616 | 
 | 
|  |    617 | \begin{figure}[p]\small
 | 
|  |    618 |   \lstinputlisting[numbers=left,firstline=1,lastline=24]{../progs/fun/simple-cps.sc}
 | 
|  |    619 |   \hspace{10mm}...
 | 
|  |    620 |   \lstinputlisting[numbers=left,firstline=32,firstnumber=32,lastline=51]{../progs/fun/simple-cps.sc}
 | 
|  |    621 | \caption{CPS-translation for a simple expression language.\label{cps}}
 | 
|  |    622 | \end{figure}
 | 
|  |    623 | 
 | 
|  |    624 | The last question we need to answer in the code (see Figure~\ref{cps}) is how we
 | 
|  |    625 | start the CPS-translation function, or more precisely with which continuation we
 | 
|  |    626 | should start CPS? It turns out that this initial continuation will be the last one that is
 | 
|  |    627 | called. What should be the last step in the computation? Yes, we need to return the
 | 
|  |    628 | temporary variable where the last result is stored (if it is a simple number, then we can
 | 
|  |    629 | just return this number). Therefore we cal CPS with the code
 | 
|  |    630 | 
 | 
|  |    631 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    632 | def CPSi(e: Expr) : KExp = CPS(e)(KReturn(_))
 | 
|  |    633 | \end{lstlisting}
 | 
|  |    634 | 
 | 
|  |    635 | \noindent
 | 
|  |    636 | where we give the function \code{KReturn(_)}. 
 | 
|  |    637 | 
 | 
|  |    638 | 
 | 
|  |    639 | \begin{figure}[p]\small
 | 
|  |    640 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
|  |    641 | // Fun language (expressions)
 | 
|  |    642 | abstract class Exp 
 | 
|  |    643 | abstract class BExp 
 | 
|  |    644 | 
 | 
|  |    645 | case class Call(name: String, args: List[Exp]) extends Exp
 | 
|  |    646 | case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
 | 
|  |    647 | case class Write(e: Exp) extends Exp
 | 
|  |    648 | case class Var(s: String) extends Exp
 | 
|  |    649 | case class Num(i: Int) extends Exp
 | 
|  |    650 | case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
 | 
|  |    651 | case class Sequence(e1: Exp, e2: Exp) extends Exp
 | 
|  |    652 | case class Bop(o: String, a1: Exp, a2: Exp) extends BExp  
 | 
|  |    653 | 
 | 
|  |    654 | 
 | 
|  |    655 | 
 | 
|  |    656 | // K-language (K-expressions, K-values)
 | 
|  |    657 | abstract class KExp
 | 
|  |    658 | abstract class KVal
 | 
|  |    659 | 
 | 
|  |    660 | case class KVar(s: String) extends KVal
 | 
|  |    661 | case class KNum(i: Int) extends KVal
 | 
|  |    662 | case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
 | 
|  |    663 | case class KCall(o: String, vrs: List[KVal]) extends KVal
 | 
|  |    664 | case class KWrite(v: KVal) extends KVal
 | 
|  |    665 | 
 | 
|  |    666 | case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
 | 
|  |    667 | case class KLet(x: String, v: KVal, e: KExp) extends KExp
 | 
|  |    668 | case class KReturn(v: KVal) extends KExp
 | 
|  |    669 | \end{lstlisting}
 | 
|  |    670 | \caption{Abstract syntax trees for the Fun-language.\label{absfun}}
 | 
|  |    671 | \end{figure}
 | 
|  |    672 | 
 | 
|  |    673 | 
 | 
| 898 |    674 | 
 | 
|  |    675 | \noindent
 | 
| 539 |    676 | \end{document}
 | 
|  |    677 | 
 | 
|  |    678 | 
 | 
|  |    679 | %%% Local Variables: 
 | 
|  |    680 | %%% mode: latex
 | 
|  |    681 | %%% TeX-master: t
 | 
|  |    682 | %%% End: 
 |