% handouts/ho09.tex
\fnote{\copyright{} Christian Urban, King's College London, 2019}


\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our two tiny compilers targeting the JVM, the code
generation part was actually not so hard, no? Pretty much just some
post-order traversal of the abstract syntax tree, yes? One of the
reasons for this ease is that the JVM is a stack-based virtual machine
and it is therefore not hard to translate deeply-nested arithmetic
expressions into a sequence of instructions manipulating the stack. The
problem is that ``real'' CPUs, although supporting stack operations, are
not really designed to be \emph{stack machines}. The design of CPUs is
more like, here is a chunk of memory---compilers, or better compiler
writers, do something with it. Consequently, modern compilers need to go
the extra mile in order to generate code that is much easier and faster
for CPUs to process. To make this all tractable for this module, we
target the LLVM Intermediate Language. In this way we can take advantage
of the tools coming with LLVM. For example we do not have to worry about
things like register allocation.\bigskip 

\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
that projects from academia can make a difference in the world. LLVM
started in 2000 as a project by two researchers at the University of
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
gcc with its myriad of front-ends for other languages (C++, Fortran,
Ada, Go, Objective-C, Pascal etc.). The problem was that gcc morphed
over time into a monolithic gigantic piece of m\ldots ehm software,
which you could not mess about with in an afternoon. In contrast, LLVM
is designed to be a modular suite of tools with which you can play
around easily and try out something new. LLVM became a big player once
Apple hired one of the original developers (I cannot remember the reason
why Apple did not want to use gcc, but maybe they were also just
disgusted by its big monolithic codebase). Anyway, LLVM is now the big
player and gcc is more or less legacy. This does not mean that
programming languages like C and C++ are dying out any time soon---they
are nicely supported by LLVM.

We will target the LLVM Intermediate Language, or LLVM Intermediate
Representation (short LLVM-IR). The LLVM-IR looks very similar to the
assembly language of Jasmin and Krakatau. It will also allow us to
benefit from the modular structure of the LLVM compiler and let, for
example, the compiler generate code for different CPUs, like X86 or ARM.
That means we can be agnostic about where our code actually runs. We can
also be ignorant about optimising code and allocating memory
efficiently. 

However, what we have to do for LLVM is to generate code in \emph{Static
Single-Assignment} format (short SSA), because that is what the LLVM-IR
expects from us. A reason why LLVM uses the SSA format, rather than
JVM-like stack instructions, is that stack instructions are difficult to
optimise---you cannot just re-arrange instructions without messing about
with what is calculated on the stack. Also it is hard to find out if all
the calculations on the stack are actually necessary and not by chance
dead code. The JVM has for all these obstacles sophisticated machinery
to make such ``high-level'' code still run fast, but let's say that for
the sake of argument we do not want to rely on it. We want to generate
fast code ourselves. This means we have to work around the intricacies
of what instructions CPUs can actually process fast. This is what the
SSA format is designed for.


The main idea behind the SSA format is to use very simple variable
assignments where every variable is assigned only once. The assignments
also need to be primitive in the sense that they can be just simple
operations like addition, multiplication, jumps, comparisons and so on.
Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
corresponding SSA format is 
 
\begin{lstlisting}[language=LLVMIR,numbers=left]
let tmp0 = add 1 a in   
let tmp1 = mul b 5 in 
let tmp2 = add 3 tmp1 in 
let tmp3 = add tmp0 tmp2 in tmp3 
\end{lstlisting}

\noindent where every variable is assigned only once (we could not write
\texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
sophisticated algorithms for imperative languages, like C, that
can transform programs into SSA format; we will instead take a route via the
Continuation-Passing-Style---basically black programming art or
abracadabra programming. So sit tight.

\subsection*{LLVM-IR}

Before we start, let's first have a look at the \emph{LLVM Intermediate
Representation} in more detail. The LLVM-IR sits in between the
frontends and backends of the LLVM framework. It allows compilation of
multiple source languages to multiple targets. It is also the place
where most of the target-independent optimisations are performed. 

What is good about our toy Fun language is that it basically only
contains expressions (be they arithmetic expressions, boolean
expressions or if-expressions). The exception is function definitions.
Luckily, for them we can use the mechanism of defining functions in the
LLVM-IR (this is similar to using JVM methods for functions in our
earlier compiler). For example the simple Fun program 


\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled to the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}    
\end{lstlisting}

\noindent First notice that all variable names, in this case \texttt{x}
and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
Temporary variables can be named with an identifier, such as
\texttt{tmp}, or with numbers. Function names, since they are
``global'', need to be prefixed with the @-symbol. Also, the LLVM-IR is
a fully typed language. The \texttt{i32} type stands for 32-bit
integers. There are also types for 64-bit integers (\texttt{i64}), chars
(\texttt{i8}), floats, arrays and even pointer types. In the code above,
\texttt{sqr} takes an argument of type \texttt{i32} and produces a
result of type \texttt{i32} (the result type is in front of the function
name, like in C). Each arithmetic operation, for example addition or
multiplication, is annotated with the type it operates on. Obviously
these types need to match up\ldots{} but since we have in our programs
only integers, \texttt{i32} everywhere will do. We do not have to
generate any other types, but obviously this is a limitation of our
Fun-language.
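
For example, the following small sketch shows the same operation
annotated with different types (the variables are hypothetical and
assumed to have the matching types):

\begin{lstlisting}[language=LLVM]
%r1 = add i32 %x, %y    ; 32-bit integer addition
%r2 = add i64 %a, %b    ; the same operation on 64-bit integers
%r3 = mul i32 %r1, 3    ; operand types must agree with the annotation
\end{lstlisting}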
 
There are a few interesting instructions in the LLVM-IR which are quite
different from those in the JVM. Can you remember the kerfuffle we had
to go through with boolean expressions and negating the condition? In
the LLVM-IR, branching on if-conditions is implemented differently:
there is a separate \texttt{br}-instruction as follows:

\begin{lstlisting}[language=LLVM]
br i1 %var, label %if_br, label %else_br
\end{lstlisting}

\noindent
The type \texttt{i1} stands for booleans. If the variable is true, then
this instruction jumps to the if-branch, which needs an explicit label;
otherwise it jumps to the else-branch, again with its own label. This
allows us to keep the meaning of the boolean expression as it is.  A
value of type boolean is generated in the LLVM-IR by the
\texttt{icmp}-instruction. This instruction is for integers (hence the
\texttt{i}) and takes the comparison operation as argument. For example

\begin{lstlisting}[language=LLVM]
icmp eq i32  %x, %y     ; for equal
icmp sle i32 %x, %y     ; for signed less or equal
icmp slt i32 %x, %y     ; for signed less than
icmp ult i32 %x, %y     ; for unsigned less than 
\end{lstlisting}

\noindent
In some operations, the LLVM-IR distinguishes between signed and 
unsigned representations of integers.

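Putting \texttt{icmp} and \texttt{br} together, here is a small sketch
of how a conditional can be compiled (the function \texttt{@max} and the
label names are made up for illustration):

\begin{lstlisting}[language=LLVM]
define i32 @max(i32 %x, i32 %y) {
   %cond = icmp sgt i32 %x, %y              ; signed greater than
   br i1 %cond, label %if_br, label %else_br
if_br:
   ret i32 %x
else_br:
   ret i32 %y
}
\end{lstlisting}

\noindent
Since each branch returns directly, no value needs to be merged after
the two branches in this sketch.
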
Conveniently, you can use the program \texttt{lli}, which comes with
LLVM, to interpret programs written in the LLVM-IR. So you can easily
check whether the code you produced actually works. To get a running
program that does something interesting you need to add some boilerplate
about printing out numbers and a main-function that is the entrypoint
for the program (see Figure~\ref{lli} for a complete listing). Again
this is very similar to the boilerplate we needed to add in our JVM
compiler. 
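
For example, assuming the complete program from Figure~\ref{lli} is
stored in a file \texttt{sqr.ll}, it can be run directly with:

\begin{lstlisting}[language=bash,numbers=none]
lli sqr.ll
\end{lstlisting}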

You can generate a binary for the program in Figure~\ref{lli} by using
the \texttt{llc}-compiler and then \texttt{gcc}, whereby \texttt{llc}
generates an object file and \texttt{gcc} (that is clang) generates the
executable binary:

\begin{lstlisting}[language=bash,numbers=none]
llc -filetype=obj sqr.ll
gcc sqr.o -o a.out
./a.out
> 25
\end{lstlisting}

\begin{figure}[t]\small 
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
\caption{An LLVM-IR program for calculating the square function. The
code for the \texttt{sqr} function is in Lines 13 -- 16. The main
function \texttt{@main} calls \texttt{sqr} with the argument \texttt{5}
and then prints out the result. The other code is boilerplate for
printing out integers.\label{lli}}
\end{figure}   


Compilers have to bridge the gap between
``high-level'' programs and ``low-level'' hardware. If the gap is too
wide for one step, then a good strategy is to lay a stepping stone
somewhere in between. The LLVM-IR itself is such a stepping stone to
make the task of generating and optimising code easier. Like a real
compiler we will use our own stepping stone which I call the
\emph{K-language}. For what follows recall the various kinds of
expressions in the Fun language. For convenience the Scala code of the
corresponding abstract syntax trees is shown on top of
Figure~\ref{absfun}. Below it is the code for the abstract syntax trees
of the K-language. There are two kinds of syntactic entities, namely
\emph{K-values} and \emph{K-expressions}. The central constructor of the
K-language is \texttt{KLet}. For this recall that arithmetic expressions
such as $((1 + a) + (3 + (b * 5)))$ need to be broken up into smaller
``atomic'' steps, like so
 
\begin{lstlisting}[language=LLVMIR,numbers=none]
let tmp0 = add 1 a in   
let tmp1 = mul b 5 in 
let tmp2 = add 3 tmp1 in 
let tmp3 = add tmp0 tmp2 in
  tmp3 
\end{lstlisting}

\noindent
Here \texttt{tmp3} will contain the result of what the whole expression
stands for. In each individual step we can only perform an ``atomic''
operation, like the addition or multiplication of a number and a
variable. We are not allowed to have, for example, an if-condition on
the right-hand side of an equals. Such constraints are enforced upon us
because of how the SSA format works in the LLVM-IR. By having
\texttt{KLet} take first a string (standing for an intermediate result)
and second a value, we can fulfil this constraint ``by
construction''---there is no way we could write anything other than a
value. 

To sum up, K-values are the atomic operations that can be on the
right-hand side of equal-signs. The K-language is restricted such that
it is easy to generate the SSA format for the LLVM-IR. 
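
As a rough Scala sketch (the precise datatype definitions are the ones
in Figure~\ref{absfun}; the constructors here are only meant to convey
the shape):

\begin{lstlisting}[language=Scala,numbers=none]
abstract class KVal                    // atomic K-values
case class KVar(s: String) extends KVal
case class KNum(i: Int) extends KVal
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal

abstract class KExp                    // K-expressions
case class KLet(x: String, v: KVal, e: KExp) extends KExp
case class KReturn(v: KVal) extends KExp
\end{lstlisting}

\noindent
Note how the type of \texttt{KLet} only permits a \texttt{KVal} in its
second argument---an if-expression simply cannot appear there.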



\begin{figure}[p]\small
\begin{lstlisting}[language=Scala,numbers=none]



\subsection*{CPS-Translations}

The main difficulty of generating instructions in SSA format is that
large compound expressions need to be broken up into smaller pieces and
intermediate results need to be chained into later instructions. To do
this conveniently, CPS-translations have been developed. They use
functions (``continuations'') to represent what is coming next in a
sequence of instructions.

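To preview the idea with a concrete sketch: a continuation-based
translation of arithmetic expressions into the K-language could look as
follows in Scala. This is a hedged sketch, assuming the Fun-language
constructors \texttt{Var}, \texttt{Num} and \texttt{Aop} from
Figure~\ref{absfun} and a helper \texttt{Fresh} for generating fresh
variable names.

\begin{lstlisting}[language=Scala,numbers=none]
// sketch: k is the continuation, standing for the instructions
// that come after the expression currently being translated
def CPS(e: Exp)(k: KVal => KExp) : KExp = e match {
  case Var(s) => k(KVar(s))
  case Num(i) => k(KNum(i))
  case Aop(o, e1, e2) => {
    val z = Fresh("tmp")   // fresh name for the intermediate result
    CPS(e1)(v1 => 
      CPS(e2)(v2 => 
        KLet(z, Kop(o, v1, v2), k(KVar(z)))))
  }
}
\end{lstlisting}

\noindent
The \texttt{KLet} generated in the \texttt{Aop}-case makes the
intermediate result explicit, and the continuation \texttt{k} receives
the fresh variable that stands for it.
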
\end{document}


%%% Local Variables: 