\fnote{\copyright{} Christian Urban, King's College London, 2019}

\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our two tiny compilers targeting the JVM, the code
generation part was actually not so hard, no? Pretty much just some
post-traversal of the abstract syntax tree, yes? One of the reasons for
this ease is that the JVM is a stack-based virtual machine and it is
therefore not hard to translate deeply-nested arithmetic expressions
into a sequence of instructions manipulating the stack. The problem is
that ``real'' CPUs, although supporting stack operations, are not really
designed to be \emph{stack machines}. The design of CPUs is more like:
here is a chunk of memory---compiler, or better compiler writers, do
something with it. Consequently, modern compilers need to go the extra
mile in order to generate code that is much easier and faster to process
by CPUs. To make this all tractable for this module, we target the LLVM
Intermediate Language. In this way we can take advantage of the tools
coming with LLVM. For example we do not have to worry about things like
register allocation.\bigskip

\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
that projects from academia can make a difference in the world. LLVM
started in 2000 as a project by two researchers at the University of
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
gcc with its myriad of front-ends for other languages (C++, Fortran,
Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
time into a monolithic, gigantic piece of m\ldots ehm software, which
you could not mess about with in an afternoon. In contrast, LLVM is
designed to be a modular suite of tools with which you can play around
easily and try out something new. LLVM became a big player once Apple
hired one of the original developers (I cannot remember the reason why
Apple did not want to use gcc, but maybe they were also just disgusted
by its big monolithic codebase). Anyway, LLVM is now the big player and
gcc is more or less legacy. This does not mean that programming
languages like C and C++ are dying out any time soon---they are nicely
supported by LLVM.

We will target the LLVM Intermediate Language, or LLVM Intermediate
Representation (short LLVM-IR). The LLVM-IR looks very similar to the
assembly language of Jasmin and Krakatau. It will also allow us to
benefit from the modular structure of the LLVM compiler and let, for
example, the compiler generate code for different CPUs, like X86 or ARM.
That means we can be agnostic about where our code actually runs. We can
also be ignorant about optimising code and allocating memory
efficiently.

However, what we have to do for LLVM is to generate code in \emph{Static
Single-Assignment} format (short SSA), because that is what the LLVM-IR
expects from us. A reason why LLVM uses the SSA format, rather than
JVM-like stack instructions, is that stack instructions are difficult to
optimise---you cannot just re-arrange instructions without messing about
with what is calculated on the stack. Also it is hard to find out if all
the calculations on the stack are actually necessary and not by chance
dead code. The JVM has sophisticated machinery to deal with all these
obstacles and to make such ``high-level'' code still run fast, but let's
say that for the sake of argument we do not want to rely on it. We want
to generate fast code ourselves. This means we have to work around the
intricacies of what instructions CPUs can actually process fast. This is
what the SSA format is designed for.

The main idea behind the SSA format is to use very simple variable
assignments where every variable is assigned only once. The assignments
also need to be primitive in the sense that they can be just simple
operations like addition, multiplication, jumps, comparisons and so on.
Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
corresponding SSA format is

\begin{lstlisting}[language=LLVMIR,numbers=left]
let tmp0 = add 1 a in
let tmp1 = mul b 5 in
let tmp2 = add 3 tmp1 in
let tmp3 = add tmp0 tmp2 in
tmp3
\end{lstlisting}

\noindent where every variable is used only once (we could not write
\texttt{tmp1 = add 3 tmp1} in Line 3 for example). There are
sophisticated algorithms for imperative languages, like C, that
transform programs into SSA format. For our functional Fun language,
however, we will use a translation based on the
Continuation-Passing-Style---basically black programming art or
abracadabra programming. So sit tight.

\subsection*{LLVM-IR}

Before we start, let's first have a look at the \emph{LLVM Intermediate
Representation} in more detail. The LLVM-IR sits in between the
frontends and backends of the LLVM framework. It allows compilation of
multiple source languages to multiple targets. It is also the place
where most of the target-independent optimisations are performed.

What is good about our toy Fun language is that it basically only
contains expressions (be they arithmetic expressions, boolean
expressions or if-expressions). The exception is function definitions.
Luckily, for them we can use the mechanism of defining functions in the
LLVM-IR (this is similar to using JVM methods for functions in our
earlier compiler). For example the simple Fun program

\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled to the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}
\end{lstlisting}

\noindent First notice that all variable names, in this case \texttt{x}
and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
Temporary variables can be named with an identifier, such as
\texttt{tmp}, or with numbers. Function names, since they are
``global'', need to be prefixed with the @-symbol. Also, the LLVM-IR is
a fully typed language. The \texttt{i32} type stands for 32-bit
integers. There are also types for 64-bit integers (\texttt{i64}), chars
(\texttt{i8}), floats, arrays and even pointer types. In the code above,
\texttt{sqr} takes an argument of type \texttt{i32} and produces a
result of type \texttt{i32} (the result type is in front of the function
name, like in C). Each arithmetic operation, for example addition and
multiplication, is also prefixed with the type it operates on. Obviously
these types need to match up\ldots{} but since we have in our programs
only integers, \texttt{i32} everywhere will do. We do not have to
generate any other types, but obviously this is a limitation of our Fun
language.

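Incidentally, function calls in the LLVM-IR carry type annotations as
well. As a small sketch, a call to \texttt{sqr} with the argument
\texttt{5} could look as follows (the variable name \texttt{\%res} is
our own choice):

\begin{lstlisting}[language=LLVM]
%res = call i32 @sqr(i32 5)
\end{lstlisting}

\noindent
The \texttt{i32} after \texttt{call} is the result type of the function;
the argument \texttt{5} is annotated with its type too.
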
There are a few interesting instructions in the LLVM-IR which are quite
different from those in the JVM. Can you remember the kerfuffle we had
to go through with boolean expressions and negating the condition? In
the LLVM-IR, branching on if-conditions is implemented differently:
there is a separate \texttt{br}-instruction as follows:

\begin{lstlisting}[language=LLVM]
br i1 %var, label %if_br, label %else_br
\end{lstlisting}

\noindent
The type \texttt{i1} stands for booleans. If the variable is true, then
this instruction jumps to the if-branch, which needs an explicit label;
otherwise to the else-branch, again with its own label. This allows us
to keep the meaning of the boolean expression as is. A value of type
boolean is generated in the LLVM-IR by the \texttt{icmp}-instruction.
This instruction is for integers (hence the \texttt{i}) and takes the
comparison operation as argument. For example

\begin{lstlisting}[language=LLVM]
icmp eq i32 %x, %y      ; for equal
icmp sle i32 %x, %y     ; signed less or equal
icmp slt i32 %x, %y     ; signed less than
icmp ult i32 %x, %y     ; unsigned less than
\end{lstlisting}

\noindent
In some operations, the LLVM-IR distinguishes between signed and
unsigned representations of integers.

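Putting \texttt{icmp} and \texttt{br} together, the branching part of an
if-expression such as \texttt{if x < y then ... else ...} can be
sketched as follows (the label names are our own choice):

\begin{lstlisting}[language=LLVM]
%cond = icmp slt i32 %x, %y             ; is x < y (signed)?
br i1 %cond, label %if_br, label %else_br

if_br:
  ; code for the then-branch
else_br:
  ; code for the else-branch
\end{lstlisting}
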
Conveniently, you can use the program \texttt{lli}, which comes with
LLVM, to interpret programs written in the LLVM-IR. So you can easily
check whether the code you produced actually works. To get a running
program that does something interesting you need to add some boilerplate
about printing out numbers and a main-function that is the entrypoint
for the program (see Figure~\ref{lli} for a complete listing). Again
this is very similar to the boilerplate we needed to add in our JVM
compiler.

You can generate a binary for the program in Figure~\ref{lli} by using
the \texttt{llc}-compiler and then \texttt{gcc}, whereby \texttt{llc}
generates an object file and \texttt{gcc} (that is clang) generates the
executable binary:

\begin{lstlisting}[language=bash,numbers=none]
llc -filetype=obj sqr.ll
gcc sqr.o -o a.out
./a.out
> 25
\end{lstlisting}

\begin{figure}[t]\small
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
\caption{An LLVM-IR program for calculating the square function. The
code for the \texttt{sqr} function is in Lines 13 -- 16. The main
function calls \texttt{sqr} with the argument \texttt{5} and then prints
out the result. The other code is boilerplate for printing out
integers.\label{lli}}
\end{figure}


\subsection*{Our Own Intermediate Language}

Remember that compilers have to solve the problem of bridging the gap
between
``high-level'' programs and ``low-level'' hardware. If the gap is too
wide for one step, then a good strategy is to lay a stepping stone
somewhere in between. The LLVM-IR itself is such a stepping stone to
make the task of generating and optimising code easier. Like a real
compiler we will use our own stepping stone which I call the
\emph{K-language}. For what follows recall the various kinds of
expressions in the Fun language. For convenience the Scala code of the
corresponding abstract syntax trees is shown on top of
Figure~\ref{absfun}. Below is the code for the abstract syntax trees in
the K-language. There are two kinds of syntactic entities, namely
\emph{K-values} and \emph{K-expressions}. The central constructor of the
K-language is \texttt{KLet}. For this recall that arithmetic expressions
such as $((1 + a) + (3 + (b * 5)))$ need to be broken up into smaller
``atomic'' steps, like so

\begin{lstlisting}[language=LLVMIR,numbers=none]
let tmp0 = add 1 a in
let tmp1 = mul b 5 in
let tmp2 = add 3 tmp1 in
let tmp3 = add tmp0 tmp2 in
tmp3
\end{lstlisting}

\noindent
Here \texttt{tmp3} will contain the result of what the whole expression
stands for. In each individual step we can only perform an ``atomic''
operation, like addition or multiplication of a number and a variable.
We are not allowed to have for example an if-condition on the right-hand
side of an equals. Such constraints are enforced upon us because of how
the SSA format works in the LLVM-IR. Because \texttt{KLet} takes, first,
a string (standing for an intermediate result) and, second, a value, we
can fulfil this constraint ``by construction''---there is no way we
could write anything else than a value.

To sum up, K-values are the atomic operations that can be on the
right-hand side of equal-signs. The K-language is restricted such that
it is easy to generate the SSA format for the LLVM-IR.
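
To illustrate, the expression $((1 + a) + (3 + (b * 5)))$ from above
could be represented in the K-language roughly as the following nested
\texttt{KLet}s---a sketch, assuming constructor names \texttt{KLet},
\texttt{Kop}, \texttt{KVar}, \texttt{KNum} and \texttt{KReturn} as in
Figure~\ref{absfun}:

\begin{lstlisting}[language=Scala,numbers=none]
KLet("tmp0", Kop("+", KNum(1), KVar("a")),
  KLet("tmp1", Kop("*", KVar("b"), KNum(5)),
    KLet("tmp2", Kop("+", KNum(3), KVar("tmp1")),
      KLet("tmp3", Kop("+", KVar("tmp0"), KVar("tmp2")),
        KReturn(KVar("tmp3"))))))
\end{lstlisting}

\noindent
Note how each \texttt{KLet} binds an intermediate result to a K-value
and the result of the whole expression is returned at the end.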


\begin{figure}[p]\small
\begin{lstlisting}[language=Scala,numbers=none]