| 677 |      1 | % !TEX program = xelatex
 | 
| 539 |      2 | \documentclass{article}
 | 
|  |      3 | \usepackage{../style}
 | 
|  |      4 | \usepackage{../langs}
 | 
|  |      5 | \usepackage{../graphics}
 | 
|  |      6 | \usepackage{../grammar}
 | 
| 677 |      7 | %%\usepackage{multicol}
 | 
| 539 |      8 | 
 | 
| 677 |      9 | %%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
 | 
| 539 |     10 | 
 | 
|  |     11 | \begin{document}
 | 
| 677 |     12 | \fnote{\copyright{} Christian Urban, King's College London, 2019}
 | 
| 539 |     13 | 
 | 
|  |     14 | 
 | 
| 677 |     15 | \section*{Handout 9 (LLVM, SSA and CPS)}
 | 
| 539 |     16 | 
 | 
| 700 |     17 | Reflecting on our two tiny compilers targetting the JVM, the code
 | 
|  |     18 | generation part was actually not so hard, no? Pretty much just some
 | 
|  |     19 | post-traversal of the abstract syntax tree, yes? One of the reasons for
 | 
|  |     20 | this ease is that the JVM is a stack-based virtual machine and it is
 | 
|  |     21 | therefore not hard to translate deeply-nested arithmetic expressions
 | 
|  |     22 | into a sequence of instructions manipulating the stack. The problem is
 | 
|  |     23 | that ``real'' CPUs, although supporting stack operations, are not really
 | 
|  |     24 | designed to be \emph{stack machines}.  The design of CPUs is more like,
 | 
|  |     25 | here is a chunk of memory---compiler, or better compiler writers, do
 | 
|  |     26 | something with it. Consequently, modern compilers need to go the extra
 | 
|  |     27 | mile in order to generate code that is much easier and faster to process
 | 
|  |     28 | by CPUs. To make this all tractable for this module, we target the LLVM
 | 
| 680 |     29 | Intermediate Language. In this way we can take advantage of the tools
 | 
|  |     30 | coming with LLVM. For example we do not have to worry about things like
 | 
|  |     31 | register allocations.\bigskip 
 | 
| 539 |     32 | 
 | 
| 678 |     33 | \noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
 | 
| 700 |     34 | that projects from Academia can make a difference in the World. LLVM
 | 
| 678 |     35 | started in 2000 as a project by two researchers at the  University of
 | 
|  |     36 | Illinois at Urbana-Champaign. At the time the behemoth of compilers was
 | 
| 680 |     37 | gcc with its myriad of front-ends for other languages (C++, Fortran,
 | 
| 678 |     38 | Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
 | 
|  |     39 | time into a monolithic gigantic piece of m\ldots ehm software, which you
 | 
|  |     40 | could not mess about in an afternoon. In contrast, LLVM is designed to
 | 
| 700 |     41 | be a modular suite of tools with which you can play around easily and
 | 
| 678 |     42 | try out something new. LLVM became a big player once Apple hired one of
 | 
|  |     43 | the original developers (I cannot remember the reason why Apple did not
 | 
|  |     44 | want to use gcc, but maybe they were also just disgusted by its big
 | 
|  |     45 | monolithic codebase). Anyway, LLVM is now the big player and gcc is more
 | 
|  |     46 | or less legacy. This does not mean that programming languages like C and
 | 
|  |     47 | C++ are dying out any time soon---they are nicely supported by LLVM.
 | 
| 539 |     48 | 
 | 
| 700 |     49 | We will target the LLVM Intermediate Language, or LLVM Intermediate
 | 
|  |     50 | Representation (short LLVM-IR). The LLVM-IR looks very similar to the
 | 
|  |     51 | assembly language of Jasmin and Krakatau. It will also allow us to
 | 
|  |     52 | benefit from the modular structure of the LLVM compiler and let for
 | 
|  |     53 | example the compiler generate code for different CPUs, like X86 or ARM.
 | 
|  |     54 | That means we can be agnostic about where our code actually runs. We can
 | 
|  |     55 | also be ignorant about optimising code and allocating memory
 | 
|  |     56 | efficiently. 
 | 
| 539 |     57 | 
 | 
| 700 |     58 | However, what we have to do for LLVM is to generate code in \emph{Static
 | 
|  |     59 | Single-Assignment} format (short SSA), because that is what the LLVM-IR
 | 
|  |     60 | expects from us. A reason why LLVM uses the SSA format, rather than
 | 
|  |     61 | JVM-like stack instructions, is that stack instructions are difficult to
 | 
|  |     62 | optimise---you cannot just re-arrange instructions without messing about
 | 
|  |     63 | with what is calculated on the stack. Also it is hard to find out if all
 | 
|  |     64 | the calculations on the stack are actually necessary and not by chance
 | 
|  |     65 | dead code. The JVM has for all these obstacles sophisticated machinery
 | 
|  |     66 | to make such ``high-level'' code still run fast, but let's say that for
 | 
|  |     67 | the sake of argument we do not want to rely on it. We want to generate
 | 
|  |     68 | fast code ourselves. This means we have to work around the intricacies
 | 
|  |     69 | of what instructions CPUs can actually process fast. This is what the
 | 
|  |     70 | SSA format is designed for.
 | 
|  |     71 | 
 | 
|  |     72 | 
 | 
|  |     73 | The main idea behind the SSA format is to use very simple variable
 | 
| 678 |     74 | assignments where every variable is assigned only once. The assignments
 | 
|  |     75 | also need to be primitive in the sense that they can be just simple
 | 
|  |     76 | operations like addition, multiplication, jumps, comparisons and so on.
 | 
| 680 |     77 | Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
 | 
| 700 |     78 | corresponding SSA format is 
 | 
| 680 |     79 |  
 | 
|  |     80 | \begin{lstlisting}[language=LLVMIR,numbers=left]
 | 
|  |     81 | let tmp0 = add 1 a in   
 | 
|  |     82 | let tmp1 = mul b 5 in 
 | 
|  |     83 | let tmp2 = add 3 tmp1 in 
 | 
| 700 |     84 | let tmp3 = add tmp0 tmp2 in tmp3 
 | 
| 677 |     85 | \end{lstlisting}
 | 
| 539 |     86 | 
 | 
| 678 |     87 | \noindent where every variable is used only once (we could not write
 | 
| 680 |     88 | \texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
 | 
| 678 |     89 | sophisticated algorithms for imperative languages, like C, that
 | 
|  |     90 | efficiently transform a high-level program into SSA format. But we can
 | 
|  |     91 | ignore them here. We want to compile a functional language and there
 | 
|  |     92 | things get much more interesting than just sophisticated. We will need
 | 
|  |     93 | to have a look at CPS translations, where the CPS stands for
 | 
|  |     94 | Continuation-Passing-Style---basically black programming art or
 | 
|  |     95 | abracadabra programming. So sit tight.
 | 
| 539 |     96 | 
 | 
| 678 |     97 | \subsection*{LLVM-IR}
 | 
| 539 |     98 | 
 | 
| 700 |     99 | Before we start, let's first have a look at the \emph{LLVM Intermediate
 | 
|  |    100 | Representation} in more detail. The LLVM-IR is in between the frontends
 | 
|  |    101 | and backends of the LLVM framework. It allows compilation of multiple
 | 
|  |    102 | source languages to multiple targets. It is also the place where most of
 | 
|  |    103 | the target independent optimisations are performed. 
 | 
|  |    104 | 
 | 
|  |    105 | What is good about our toy Fun language is that it basically only
 | 
|  |    106 | contains expressions (be they arithmetic expressions, boolean
 | 
|  |    107 | expressions or if-expressions). The exception are function definitions.
 | 
|  |    108 | Luckily, for them we can use the mechanism of defining functions in the
 | 
|  |    109 | LLVM-IR (this is similar to using JVM methods for functions in our
 | 
|  |    110 | earlier compiler). For example the simple Fun program 
 | 
| 539 |    111 | 
 | 
|  |    112 | 
 | 
| 677 |    113 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
| 678 |    114 | def sqr(x) = x * x
 | 
| 677 |    115 | \end{lstlisting}
 | 
| 539 |    116 | 
 | 
| 677 |    117 | \noindent
 | 
| 700 |    118 | can be compiled to the following LLVM-IR function:
 | 
| 539 |    119 | 
 | 
| 677 |    120 | \begin{lstlisting}[language=LLVM]
 | 
| 678 |    121 | define i32 @sqr(i32 %x) {
 | 
|  |    122 |    %tmp = mul i32 %x, %x
 | 
| 677 |    123 |    ret i32 %tmp
 | 
|  |    124 | }    
 | 
|  |    125 | \end{lstlisting}
 | 
| 539 |    126 | 
 | 
| 700 |    127 | \noindent First notice that all variable names, in this case \texttt{x}
 | 
|  |    128 | and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
 | 
|  |    129 | Temporary variables can be named with an identifier, such as
 | 
|  |    130 | \texttt{tmp}, or numbers. Function names, since they are ``global'',
 | 
|  |    131 | need to be prefixed with @-symbol. Also, the LLVM-IR is a fully typed
 | 
|  |    132 | language. The \texttt{i32} type stands for 32-bit integers. There are
 | 
|  |    133 | also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}),
 | 
|  |    134 | floats, arrays and even pointer types. In the code above, \texttt{sqr}
 | 
|  |    135 | takes an argument of type \texttt{i32} and produces a result of type
 | 
|  |    136 | \texttt{i32} (the result type is in front of the function name, like in
 | 
|  |    137 | C). Each arithmetic operation, for example addition and multiplication,
 | 
|  |    138 | are also prefixed with the type they operate on. Obviously these types
 | 
|  |    139 | need to match up\ldots{} but since we have in our programs only
 | 
|  |    140 | integers, \texttt{i32} everywhere will do. We do not have to generate
 | 
|  |    141 | any other types, but obviously this is a limitation in our Fun-language.
 | 
| 539 |    142 |  
 | 
| 700 |    143 | There are a few interesting instructions in the LLVM-IR which are quite
 | 
|  |    144 | different than in the JVM. Can you remember the kerfuffle we had to go
 | 
|  |    145 | through with boolean expressions and negating the condition? In the
 | 
|  |    146 | LLVM-IR, branching  if-conditions is implemented differently: there
 | 
|  |    147 | is a separate \texttt{br}-instruction as follows:
 | 
|  |    148 | 
 | 
|  |    149 | \begin{lstlisting}[language=LLVM]
 | 
|  |    150 | br i1 %var, label %if_br, label %else_br
 | 
|  |    151 | \end{lstlisting}
 | 
|  |    152 | 
 | 
|  |    153 | \noindent
 | 
|  |    154 | The type \texttt{i1} stands for booleans. If the variable is true, then
 | 
|  |    155 | this instruction jumps to the if-branch, which needs an explicit label;
 | 
|  |    156 | otherwise to the else-branch, again with its own label. This allows us
 | 
|  |    157 | to keep the meaning of the boolean expression as is.  A value of type
 | 
|  |    158 | boolean is generated in the LLVM-IR by the \texttt{icmp}-instruction.
 | 
|  |    159 | This instruction is for integers (hence the \texttt{i}) and takes the
 | 
|  |    160 | comparison operation as argument. For example
 | 
|  |    161 | 
 | 
|  |    162 | \begin{lstlisting}[language=LLVM]
 | 
|  |    163 | icmp eq i32  %x, %y     ; for equal
 | 
|  |    164 | icmp sle i32 %x, %y     ;   signed less or equal
 | 
|  |    165 | icmp slt i32 %x, %y     ;   signed less than
 | 
|  |    166 | icmp ult i32 %x, %y     ;   unsigned less than 
 | 
|  |    167 | \end{lstlisting}
 | 
|  |    168 | 
 | 
|  |    169 | \noindent
 | 
|  |    170 | In some operations, the LLVM-IR distinguishes between signed and 
 | 
|  |    171 | unsigned representations of integers.
 | 
|  |    172 | 
 | 
| 679 |    173 | Conveniently, you can use the program \texttt{lli}, which comes with
 | 
|  |    174 | LLVM, to interpret programs written in the LLVM-IR. So you can easily
 | 
|  |    175 | check whether the code you produced actually works. To get a running
 | 
|  |    176 | program that does something interesting you need to add some boilerplate
 | 
|  |    177 | about printing out numbers and a main-function that is the entrypoint
 | 
| 700 |    178 | for the program (see Figure~\ref{lli} for a complete listing). Again
 | 
|  |    179 | this is very similar to the boilerplate we needed to add in our JVM
 | 
|  |    180 | compiler. 
 | 
|  |    181 | 
 | 
|  |    182 | You can generate a binary for the program in Figure~\ref{lli} by using
 | 
|  |    183 | the \texttt{llc}-compiler and then \texttt{gcc}, whereby \texttt{llc} generates
 | 
|  |    184 | an object file and \texttt{gcc} (that is clang) generates the
 | 
|  |    185 | executable binary:
 | 
| 678 |    186 | 
 | 
| 679 |    187 | \begin{lstlisting}[language=bash,numbers=none]
 | 
|  |    188 | llc -filetype=obj sqr.ll
 | 
|  |    189 | gcc sqr.o -o a.out
 | 
|  |    190 | ./a.out
 | 
| 680 |    191 | > 25
 | 
| 679 |    192 | \end{lstlisting}
 | 
|  |    193 | 
 | 
|  |    194 | \begin{figure}[t]\small 
 | 
|  |    195 | \lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
 | 
| 700 |    196 | \caption{An LLVM-IR program for calculating the square function. It
 | 
|  |    197 | calls this function in \texttt{@main} with the argument \texttt{5}. The
 | 
|  |    198 | code for the \texttt{sqr} function is in Lines 13 -- 16. The main
 | 
|  |    199 | function calls \texttt{sqr} and then prints out the result. The other
 | 
| 679 |    200 | code is boilerplate for printing out integers.\label{lli}}
 | 
| 678 |    201 | \end{figure}   
 | 
|  |    202 | 
 | 
| 679 |    203 | 
 | 
|  |    204 |     
 | 
|  |    205 | \subsection*{Our Own Intermediate Language}
 | 
|  |    206 | 
 | 
|  |    207 | Remember compilers have to solve the problem of bridging the gap between
 | 
| 680 |    208 | ``high-level'' programs and ``low-level'' hardware. If the gap is too
 | 
|  |    209 | wide for one step, then a good strategy is to lay a stepping stone
 | 
|  |    210 | somewhere in between. The LLVM-IR itself is such a stepping stone to
 | 
|  |    211 | make the task of generating and optimising code easier. Like a real
 | 
|  |    212 | compiler we will use our own stepping stone which I call the
 | 
| 700 |    213 | \emph{K-language}. For what follows recall the various kinds of
 | 
|  |    214 | expressions in the Fun language. For convenience the Scala code of the
 | 
|  |    215 | corresponding abstract syntax trees is shown on top of
 | 
|  |    216 | Figure~\ref{absfun}. Below is the code for the abstract syntax trees in
 | 
|  |    217 | the K-language. There are two kinds of syntactic entities, namely
 | 
|  |    218 | \emph{K-values} and \emph{K-expressions}. The central constructor of the
 | 
|  |    219 | K-language is \texttt{KLet}. For this recall that arithmetic expressions
 | 
|  |    220 | such as $((1 + a) + (3 + (b * 5)))$ need to be broken up into smaller
 | 
|  |    221 | ``atomic'' steps, like so
 | 
| 680 |    222 |  
 | 
|  |    223 | \begin{lstlisting}[language=LLVMIR,numbers=none]
 | 
|  |    224 | let tmp0 = add 1 a in   
 | 
|  |    225 | let tmp1 = mul b 5 in 
 | 
|  |    226 | let tmp2 = add 3 tmp1 in 
 | 
|  |    227 | let tmp3 = add tmp0 tmp2 in
 | 
|  |    228 |   tmp3 
 | 
|  |    229 | \end{lstlisting}
 | 
|  |    230 | 
 | 
|  |    231 | \noindent
 | 
| 700 |    232 | Here \texttt{tmp3} will contain the result of what the whole expression
 | 
|  |    233 | stands for. In each individual step we can only perform an ``atomic''
 | 
|  |    234 | operation, like addition or multiplication of a number and a variable.
 | 
|  |    235 | We are not allowed to have for example an if-condition on the right-hand
 | 
|  |    236 | side of an equals. Such constraints are enforced upon us because of how
 | 
|  |    237 | the SSA format works in the LLVM-IR. By having in \texttt{KLet} taking
 | 
|  |    238 | first a string (standing for an intermediate result) and second a value,
 | 
|  |    239 | we can fulfil this constraint ``by construction''---there is no way we
 | 
|  |    240 | could write anything else than a value. 
 | 
|  |    241 | 
 | 
|  |    242 | To sum up, K-values are the atomic operations that can be on the
 | 
|  |    243 | right-hand side of equal-signs. The K-language is restricted such that
 | 
|  |    244 | it is easy to generate the SSA format for the LLVM-IR. 
 | 
| 680 |    245 | 
 | 
| 679 |    246 | 
 | 
|  |    247 | 
 | 
|  |    248 | \begin{figure}[p]\small
 | 
| 678 |    249 | \begin{lstlisting}[language=Scala,numbers=none]
 | 
| 679 |    250 | // Fun-language (expressions)
 | 
|  |    251 | abstract class Exp 
 | 
|  |    252 | abstract class BExp 
 | 
| 678 |    253 | 
 | 
|  |    254 | case class Call(name: String, args: List[Exp]) extends Exp
 | 
|  |    255 | case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
 | 
|  |    256 | case class Write(e: Exp) extends Exp
 | 
|  |    257 | case class Var(s: String) extends Exp
 | 
|  |    258 | case class Num(i: Int) extends Exp
 | 
|  |    259 | case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
 | 
|  |    260 | case class Sequence(e1: Exp, e2: Exp) extends Exp
 | 
| 679 |    261 | case class Bop(o: String, a1: Exp, a2: Exp) extends BExp  
 | 
|  |    262 | 
 | 
|  |    263 | 
 | 
|  |    264 | 
 | 
|  |    265 | // K-language (K-expressions, K-values)
 | 
|  |    266 | abstract class KExp
 | 
|  |    267 | abstract class KVal
 | 
|  |    268 | 
 | 
|  |    269 | case class KVar(s: String) extends KVal
 | 
|  |    270 | case class KNum(i: Int) extends KVal
 | 
|  |    271 | case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
 | 
|  |    272 | case class KCall(o: String, vrs: List[KVal]) extends KVal
 | 
|  |    273 | case class KWrite(v: KVal) extends KVal
 | 
|  |    274 | 
 | 
|  |    275 | case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
 | 
| 680 |    276 | case class KLet(x: String, v: KVal, e: KExp) extends KExp
 | 
| 679 |    277 | case class KReturn(v: KVal) extends KExp
 | 
| 678 |    278 | \end{lstlisting}
 | 
|  |    279 | \caption{Abstract syntax trees for the Fun language.\label{absfun}}
 | 
|  |    280 | \end{figure}
 | 
| 679 |    281 | 
 | 
| 678 |    282 | 
 | 
|  |    283 | 
 | 
|  |    284 | \subsection*{CPS-Translations}
 | 
|  |    285 | 
 | 
| 700 |    286 | The main difficulty of generating instructions in SSA format is that
 | 
|  |    287 | large compound expressions need to be broken up into smaller pieces and
 | 
|  |    288 | intermediate results need to be chained into later instructions. To do
 | 
|  |    289 | this conveniently, CPS-translations have been developed. They use
 | 
|  |    290 | functions (``continuations'') to represent what is coming next in a
 | 
|  |    291 | sequence of instructions.
 | 
| 678 |    292 | 
 | 
| 679 |    293 | 
 | 
|  |    294 | 
 | 
|  |    295 | 
 | 
|  |    296 | 
 | 
| 539 |    297 | \end{document}
 | 
|  |    298 | 
 | 
|  |    299 | 
 | 
|  |    300 | %%% Local Variables: 
 | 
|  |    301 | %%% mode: latex
 | 
|  |    302 | %%% TeX-master: t
 | 
|  |    303 | %%% End: 
 |