677
|
1 |
% !TEX program = xelatex
|
539
|
2 |
\documentclass{article}
|
|
3 |
\usepackage{../style}
|
|
4 |
\usepackage{../langs}
|
|
5 |
\usepackage{../graphics}
|
|
6 |
\usepackage{../grammar}
|
|
7 |
|
|
8 |
\begin{document}
|
908
|
9 |
\fnote{\copyright{} Christian Urban, King's College London, 2019, 2023}
|
539
|
10 |
|
|
11 |
|
722
|
12 |
% CPS translations
|
|
13 |
% https://felleisen.org/matthias/4400-s20/lecture17.html
|
|
14 |
|
|
15 |
%% pattern matching resources
|
|
16 |
%% https://www.reddit.com/r/ProgrammingLanguages/comments/g1vno3/beginner_resources_for_compiling_pattern_matching/
|
|
17 |
|
677
|
18 |
\section*{Handout 9 (LLVM, SSA and CPS)}
|
539
|
19 |
|
700
|
20 |
Reflecting on our two tiny compilers targetting the JVM, the code
|
|
21 |
generation part was actually not so hard, no? Pretty much just some
|
913
|
22 |
post-traversal of the abstract syntax tree. Yes? One of the reasons
|
908
|
23 |
for this ease is that the JVM is a stack-based virtual machine and it
|
|
24 |
is therefore not hard to translate deeply-nested arithmetic
|
|
25 |
expressions into a sequence of instructions manipulating the
|
913
|
26 |
stack. That is pretty much the whole point of the JVM. The problem is
|
|
27 |
that ``real'' CPUs, although supporting stack operations, are not
|
|
28 |
really designed to be \emph{stack machines}. The design of CPUs is
|
|
29 |
more like: Here are some instructions and a chunk of
|
908
|
30 |
memory---compiler, or better compiler writers, do something with
|
|
31 |
them. Consequently, modern compilers need to go the extra mile in
|
|
32 |
order to generate code that is much easier and faster to process by
|
|
33 |
actual CPUs, rather than running code on virtual machines that
|
|
34 |
introduce an additional layer of indirection. To make this all
|
913
|
35 |
tractable for this module, we target the \emph{LLVM Intermediate
|
|
36 |
Language} (LLVM-IR). In this way we can take advantage of the tools
|
|
37 |
coming with LLVM.\footnote{\url{http://llvm.org}} For example we do
|
|
38 |
not have to worry about things like register allocations or peephole
|
|
39 |
optimisations. By using the LLVM-IR, however, we also have to pay
|
|
40 |
price in the sense that generating code gets
|
|
41 |
harder\ldots{}unfor\-tunately nothing comes for free in life.
|
539
|
42 |
|
909
|
43 |
\subsection*{LLVM and the LLVM-IR}
|
908
|
44 |
|
|
45 |
\noindent LLVM is a beautiful example
|
913
|
46 |
that projects from Academia can make a difference in the Real World. LLVM
|
678
|
47 |
started in 2000 as a project by two researchers at the University of
|
|
48 |
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
|
908
|
49 |
gcc with its myriad of front-ends for different programming languages (C++, Fortran,
|
678
|
50 |
Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
|
908
|
51 |
time into a monolithic gigantic piece of m\ldots ehm complicated
|
|
52 |
software, which you
|
678
|
53 |
could not mess about in an afternoon. In contrast, LLVM is designed to
|
700
|
54 |
be a modular suite of tools with which you can play around easily and
|
678
|
55 |
try out something new. LLVM became a big player once Apple hired one of
|
|
56 |
the original developers (I cannot remember the reason why Apple did not
|
898
|
57 |
want to use gcc, but maybe they were also just disgusted by gcc's big
|
678
|
58 |
monolithic codebase). Anyway, LLVM is now the big player and gcc is more
|
|
59 |
or less legacy. This does not mean that programming languages like C and
|
|
60 |
C++ are dying out any time soon---they are nicely supported by LLVM.
|
539
|
61 |
|
700
|
62 |
We will target the LLVM Intermediate Language, or LLVM Intermediate
|
|
63 |
Representation (short LLVM-IR). The LLVM-IR looks very similar to the
|
908
|
64 |
assembly language of Jasmin and Krakatau. Targetting LLVM-IR will also
|
|
65 |
allow us to benefit from the modular structure of the LLVM compiler
|
909
|
66 |
and we can let the compiler generate code for different CPUs, for
|
|
67 |
example X86 or ARM. That means we can be agnostic about where our
|
908
|
68 |
code is actually going to run.\footnote{Anybody want to try to run our
|
|
69 |
programs on Android phones?} We can also be somewhat ignorant about
|
912
|
70 |
optimising our code and about allocating memory efficiently---the LLVM
|
908
|
71 |
tools will take care of all this.
|
539
|
72 |
|
908
|
73 |
However, what we have to do in order to make LLVM to play ball is to
|
912
|
74 |
generate code in \emph{Static Single-Assignment} format (short SSA). A
|
|
75 |
reason why LLVM uses the SSA-format, rather than JVM-like stack
|
|
76 |
instructions, is that stack instructions are difficult to
|
|
77 |
optimise---you cannot just re-arrange instructions without messing
|
913
|
78 |
about with what is calculated on the stack. Have a look at the
|
|
79 |
expression $((a + b) * 4) - (3 * (a + b))$ and the corresponding JVM
|
|
80 |
instructions:
|
|
81 |
|
|
82 |
\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
|
|
83 |
iload a
|
|
84 |
iload b
|
|
85 |
iadd
|
|
86 |
ldc 4
|
|
87 |
imul
|
|
88 |
ldc 3
|
|
89 |
iload a
|
|
90 |
iload b
|
|
91 |
iadd
|
|
92 |
imul
|
|
93 |
isub
|
|
94 |
\end{lstlisting}
|
|
95 |
|
|
96 |
\noindent
|
|
97 |
and try to reorganise the code such that you calculate the expression
|
|
98 |
$(a + b)$ only once. This requires either quite a bit of
|
|
99 |
math-understanding from the compiler or you need to add ``copy-and-fetch''
|
|
100 |
of a result from a local variable. Also it is hard to find out if all
|
|
101 |
the calculations on the stack are actually necessary and not by chance
|
|
102 |
dead code. The JVM has for all these obstacles sophisticated machinery
|
|
103 |
to make such ``high-level'' code still run fast, but let's say that
|
|
104 |
for the sake of argument we do not want to rely on it. We want to
|
|
105 |
generate fast code ourselves. This means we have to work around the
|
|
106 |
intricacies of what instructions CPUs can actually process fast. This
|
|
107 |
is what the SSA format is designed for.
|
700
|
108 |
|
|
109 |
|
909
|
110 |
The main idea behind the SSA-format is to have sequences of very
|
|
111 |
simple variable assignments where every (tmp)variable is assigned only
|
|
112 |
once. The assignments need to be simple in the sense that they can be
|
|
113 |
just be primitive operations like addition, multiplication, jumps,
|
908
|
114 |
comparisons, function calls and so on. Say, we have an expression
|
909
|
115 |
$((1 + a) + (foo(3 + 2) + (b * 5)))$, then the corresponding sequence
|
|
116 |
of assignments in SSA-format are:
|
680
|
117 |
|
|
118 |
\begin{lstlisting}[language=LLVMIR,numbers=left]
|
908
|
119 |
let tmp0 = add 1 a in
|
|
120 |
let tmp1 = add 3 2 in
|
|
121 |
let tmp2 = call foo(tmp1)
|
|
122 |
let tmp3 = mul b 5 in
|
|
123 |
let tmp4 = add tmp2 tmp3 in
|
|
124 |
let tmp5 = add tmp0 tmp4 in
|
|
125 |
return tmp5
|
677
|
126 |
\end{lstlisting}
|
539
|
127 |
|
909
|
128 |
\noindent where every tmpX-variable is used only once (we could, for
|
912
|
129 |
example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if
|
|
130 |
this would not change the overall result). At the end we have a
|
|
131 |
return-instruction wich contains the final result of the
|
913
|
132 |
expression. As can be seen the task we have to solve for generating
|
|
133 |
SSA-code is to take apart compound expressions into its most basic
|
|
134 |
''particles'' and transfrom them into a sequence of simple assignments
|
|
135 |
that calculates the desired result. Note that this means we have to
|
|
136 |
fix the order in which the expression is calculated, like from the
|
|
137 |
left to right.
|
898
|
138 |
|
|
139 |
There are sophisticated algorithms for imperative languages, like C,
|
909
|
140 |
that efficiently transform high-level programs into SSA-format. But
|
898
|
141 |
we can ignore them here. We want to compile a functional language and
|
|
142 |
there things get much more interesting than just sophisticated. We
|
|
143 |
will need to have a look at CPS translations, where the CPS stands for
|
909
|
144 |
\emph{Continuation-Passing-Style}---basically black programming art or
|
678
|
145 |
abracadabra programming. So sit tight.
|
539
|
146 |
|
678
|
147 |
\subsection*{LLVM-IR}
|
539
|
148 |
|
908
|
149 |
Before we start, let's however first have a look at the \emph{LLVM Intermediate
|
700
|
150 |
Representation} in more detail. The LLVM-IR is in between the frontends
|
|
151 |
and backends of the LLVM framework. It allows compilation of multiple
|
|
152 |
source languages to multiple targets. It is also the place where most of
|
|
153 |
the target independent optimisations are performed.
|
|
154 |
|
908
|
155 |
What is good about our toy Fun-language is that it basically only
|
700
|
156 |
contains expressions (be they arithmetic expressions, boolean
|
|
157 |
expressions or if-expressions). The exception are function definitions.
|
|
158 |
Luckily, for them we can use the mechanism of defining functions in the
|
|
159 |
LLVM-IR (this is similar to using JVM methods for functions in our
|
908
|
160 |
earlier compiler). For example the simple Fun-program
|
539
|
161 |
|
|
162 |
|
677
|
163 |
\begin{lstlisting}[language=Scala,numbers=none]
|
678
|
164 |
def sqr(x) = x * x
|
677
|
165 |
\end{lstlisting}
|
539
|
166 |
|
677
|
167 |
\noindent
|
700
|
168 |
can be compiled to the following LLVM-IR function:
|
539
|
169 |
|
677
|
170 |
\begin{lstlisting}[language=LLVM]
|
678
|
171 |
define i32 @sqr(i32 %x) {
|
|
172 |
%tmp = mul i32 %x, %x
|
677
|
173 |
ret i32 %tmp
|
|
174 |
}
|
|
175 |
\end{lstlisting}
|
539
|
176 |
|
908
|
177 |
\noindent First notice that all ``local'' variable names, in this case
|
|
178 |
\texttt{x} and \texttt{tmp}, are prefixed with \texttt{\%} in the
|
|
179 |
LLVM-IR. Temporary variables can be named with an identifier, such as
|
|
180 |
\texttt{tmp}, or numbers. In contrast, function names, since they are
|
|
181 |
``global'', need to be prefixed with an @-symbol. Also, the LLVM-IR is
|
|
182 |
a fully typed language. The \texttt{i32} type stands for 32-bit
|
|
183 |
integers. There are also types for 64-bit integers (\texttt{i64}),
|
|
184 |
chars (\texttt{i8}), floats, arrays and even pointer types. In the
|
|
185 |
code above, \texttt{sqr} takes an argument of type \texttt{i32} and
|
|
186 |
produces a result of type \texttt{i32} (the result type is in front of
|
|
187 |
the function name, like in C). Each arithmetic operation, for example
|
|
188 |
addition and multiplication, are also prefixed with the type they
|
|
189 |
operate on. Obviously these types need to match up\ldots{} but since
|
|
190 |
we have in our programs only integers, for the moment \texttt{i32}
|
|
191 |
everywhere will do. We do not have to generate any other types, but
|
|
192 |
obviously this is a limitation in our Fun-language (and which we
|
912
|
193 |
are going to lift in the final CW).
|
539
|
194 |
|
700
|
195 |
There are a few interesting instructions in the LLVM-IR which are quite
|
701
|
196 |
different than what we have seen in the JVM. Can you remember the
|
|
197 |
kerfuffle we had to go through with boolean expressions and negating the
|
908
|
198 |
condition? In the LLVM-IR, branching if-conditions are implemented
|
701
|
199 |
differently: there is a separate \texttt{br}-instruction as follows:
|
700
|
200 |
|
|
201 |
\begin{lstlisting}[language=LLVM]
|
|
202 |
br i1 %var, label %if_br, label %else_br
|
|
203 |
\end{lstlisting}
|
|
204 |
|
|
205 |
\noindent
|
908
|
206 |
The type \texttt{i1} stands for booleans. If the variable is true,
|
|
207 |
then this instruction jumps to the if-branch, which needs an explicit
|
|
208 |
label; otherwise it jumps to the else-branch, again with its own
|
|
209 |
label. This allows us to keep the meaning of the boolean expression
|
|
210 |
``as is'' when compiling if's---thanks god no more negating of
|
|
211 |
booleans.
|
|
212 |
|
701
|
213 |
A value of type boolean is generated in the LLVM-IR by the
|
|
214 |
\texttt{icmp}-instruction. This instruction is for integers (hence the
|
|
215 |
\texttt{i}) and takes the comparison operation as argument. For example
|
700
|
216 |
|
|
217 |
\begin{lstlisting}[language=LLVM]
|
898
|
218 |
icmp eq i32 %x, %y ; for equal
|
|
219 |
icmp sle i32 %x, %y ; signed less or equal
|
|
220 |
icmp slt i32 %x, %y ; signed less than
|
|
221 |
icmp ult i32 %x, %y ; unsigned less than
|
700
|
222 |
\end{lstlisting}
|
|
223 |
|
|
224 |
\noindent
|
898
|
225 |
Note that in some operations the LLVM-IR distinguishes between signed and
|
908
|
226 |
unsigned representations of integers. I assume you know what this means,
|
|
227 |
otherwise just ask.
|
700
|
228 |
|
701
|
229 |
It is also easy to call another function in LLVM-IR: as can be
|
|
230 |
seen from Figure~\ref{lli} we can just call a function with the
|
|
231 |
instruction \texttt{call} and can also assign the result to
|
|
232 |
a variable. The syntax is as follows
|
|
233 |
|
|
234 |
\begin{lstlisting}[language=LLVM]
|
|
235 |
%var = call i32 @foo(...args...)
|
|
236 |
\end{lstlisting}
|
|
237 |
|
|
238 |
\noindent
|
908
|
239 |
where the arguments can only be simple variables and numbers, but not compound
|
701
|
240 |
expressions.
|
|
241 |
|
679
|
242 |
Conveniently, you can use the program \texttt{lli}, which comes with
|
912
|
243 |
LLVM, to interpret programs written in the LLVM-IR. Type on the command line
|
|
244 |
|
|
245 |
\begin{lstlisting}[language=bash,numbers=none]
|
|
246 |
lli sqr.ll
|
|
247 |
\end{lstlisting}
|
|
248 |
|
|
249 |
\noindent
|
|
250 |
and you can see the output of the pragram generated.
|
|
251 |
This means you can easily check whether the code you produced actually
|
|
252 |
works. To get a running program that does something interesting you
|
|
253 |
need to add some boilerplate about printing out numbers and a
|
|
254 |
main-function that is the entry point for the program (see
|
|
255 |
Figure~\ref{lli} for a complete listing of the square function). Again
|
700
|
256 |
this is very similar to the boilerplate we needed to add in our JVM
|
912
|
257 |
compiler.
|
700
|
258 |
|
|
259 |
You can generate a binary for the program in Figure~\ref{lli} by using
|
908
|
260 |
the \texttt{llc}-compiler and then \texttt{gcc}/\texttt{clang}, whereby \texttt{llc} generates
|
|
261 |
an object file and \texttt{gcc} (that is actually \texttt{clang} on my Mac) generates the
|
700
|
262 |
executable binary:
|
678
|
263 |
|
679
|
264 |
\begin{lstlisting}[language=bash,numbers=none]
|
|
265 |
llc -filetype=obj sqr.ll
|
|
266 |
gcc sqr.o -o a.out
|
|
267 |
./a.out
|
680
|
268 |
> 25
|
679
|
269 |
\end{lstlisting}
|
|
270 |
|
|
271 |
\begin{figure}[t]\small
|
|
272 |
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
|
700
|
273 |
\caption{An LLVM-IR program for calculating the square function. It
|
909
|
274 |
calls the \texttt{sqr}-function in \texttt{@main} with the argument \texttt{5}
|
908
|
275 |
(Line 20). The
|
912
|
276 |
code for the \texttt{sqr}-function is in Lines 13 -- 16. The main-function
|
|
277 |
stores
|
909
|
278 |
the result of sqr in the variable called \texttt{\%1} and then
|
908
|
279 |
prints out the contents of this variable in Line 21. The other
|
|
280 |
code is boilerplate for printing out integers.\label{lli}}
|
678
|
281 |
\end{figure}
|
|
282 |
|
679
|
283 |
|
|
284 |
|
|
285 |
\subsection*{Our Own Intermediate Language}
|
|
286 |
|
908
|
287 |
Let's get back to our compiler: Remember compilers have to solve the
|
|
288 |
problem of bridging the gap between ``high-level'' programs and
|
|
289 |
``low-level'' hardware. If the gap is too wide for one step, then a
|
|
290 |
good strategy is to lay a stepping stone somewhere in between. The
|
|
291 |
LLVM-IR itself is such a stepping stone to make the task of generating
|
|
292 |
and optimising code easier. Like a real compiler we will use our own
|
|
293 |
stepping stone which I call the \emph{K-language}.
|
|
294 |
|
|
295 |
\begin{center}
|
|
296 |
\begin{tikzpicture}[scale=0.9,font=\bf,
|
|
297 |
node/.style={
|
|
298 |
rectangle,rounded corners=3mm,
|
|
299 |
ultra thick,draw=black!50,minimum height=16mm,
|
|
300 |
minimum width=20mm,
|
|
301 |
top color=white,bottom color=black!20}]
|
|
302 |
|
|
303 |
%\node (0) at (-3,0) {};
|
|
304 |
\node (A) at (0.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}source}] {Fun-Language};
|
|
305 |
\node (B) at (3.5,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}stepping stone}] {K-Language};
|
|
306 |
\node (C) at (7.0,0) [node,text width=1.6cm,text centered,label=above:{\small\it{}target}] {LLVM-IR};
|
|
307 |
|
|
308 |
\draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {} (B);
|
|
309 |
\draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {} (C);
|
|
310 |
\end{tikzpicture}
|
|
311 |
\end{center}
|
|
312 |
|
|
313 |
\noindent
|
|
314 |
To see why we need a stepping stone for generating code in SSA-format,
|
|
315 |
considder again arithmetic expressions such as
|
|
316 |
$((1 + a) + (3 + (b * 5)))$. They need to be broken up into smaller
|
|
317 |
``atomic'' steps, like so
|
680
|
318 |
|
|
319 |
\begin{lstlisting}[language=LLVMIR,numbers=none]
|
|
320 |
let tmp0 = add 1 a in
|
|
321 |
let tmp1 = mul b 5 in
|
|
322 |
let tmp2 = add 3 tmp1 in
|
|
323 |
let tmp3 = add tmp0 tmp2 in
|
908
|
324 |
return tmp3
|
|
325 |
\end{lstlisting}
|
|
326 |
|
|
327 |
\noindent
|
|
328 |
Here \texttt{tmp3} will contain the result of what the whole
|
|
329 |
expression stands for. In each individual step we can only perform an
|
|
330 |
``atomic'' or ``trival'' operation, like addition or multiplication of
|
|
331 |
a number or a variable. We are not allowed to have for example a
|
|
332 |
nested addition or an if-condition on the right-hand side of an
|
909
|
333 |
assignment. Such constraints are forced upon us because of how the
|
|
334 |
SSA-format works in the LLVM-IR. To simplify matters we represent
|
908
|
335 |
assignments with two kinds of syntactic entities, namely
|
|
336 |
\emph{K-values} and \emph{K-expressions}. K-values are all ``things''
|
909
|
337 |
that can appear on the right-hand side of an equal. Say we have the
|
|
338 |
following definition for K-values:
|
908
|
339 |
|
|
340 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
341 |
enum KVal {
|
|
342 |
case KVar(s: String)
|
|
343 |
case KNum(n: Int)
|
|
344 |
case KAop(op: String, v1: KVal, v2: KVal)
|
|
345 |
case KCall(fname: String, args: List[KVal])
|
|
346 |
}
|
680
|
347 |
\end{lstlisting}
|
|
348 |
|
|
349 |
\noindent
|
908
|
350 |
where a K-value can be a variable, a number or a ``trivial'' binary
|
|
351 |
operation, such as addition or multiplication. The two arguments of a
|
|
352 |
binary operation need to be K-values as well. Finally, we have
|
|
353 |
function calls, but again each argument of the function call needs to
|
912
|
354 |
be a K-value. One constraint we need to be careful about is that the
|
|
355 |
arguments of the binary operations and function calls can only be
|
|
356 |
variables or numbers. For example
|
|
357 |
|
|
358 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
359 |
KAop("+", KAop("*", KNum(1), KNum(2)), KNum(3))
|
|
360 |
KCall("foo", List(KAop("+", KNum(4), KNum(5)))
|
|
361 |
\end{lstlisting}
|
|
362 |
|
|
363 |
\noindent
|
|
364 |
while perfect instances of K-values according to the types, are not
|
|
365 |
allowed in the LLVM-IR. To encode this constraint into the type-system
|
|
366 |
of Scala, however, would make things overly complicated---to satisfy
|
|
367 |
this constraint is therefore on us. For the moment this will be all
|
|
368 |
K-values we are considdering. Later on, we will have some more for our
|
|
369 |
Fun-language.
|
679
|
370 |
|
|
371 |
|
908
|
372 |
Our K-expressions will encode the \texttt{let} and the \texttt{return}
|
|
373 |
from the SSA-format (again for the moment we only consider these two
|
|
374 |
constructors---later on we will have if-branches as well).
|
678
|
375 |
|
908
|
376 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
377 |
enum KExp {
|
|
378 |
case KLet(x: String, v: KVal, e: KExp)
|
|
379 |
case KReturn(v: KVal)
|
|
380 |
}
|
|
381 |
\end{lstlisting}
|
679
|
382 |
|
908
|
383 |
\noindent
|
913
|
384 |
By having \texttt{KLet} taking first a string (standing for a
|
912
|
385 |
tmp-variable) and second a value, we can fulfil the SSA constraint in
|
|
386 |
assignments ``by con\-struction''---there is no way we could write
|
|
387 |
anything else than a K-value. Note that the third argument of a
|
|
388 |
\texttt{KLet} is again a K-expression, meaning either another
|
|
389 |
\texttt{KLet} or a \texttt{KReturn}. In this way we can construct a
|
913
|
390 |
sequence of computations and indicate what the final result of the
|
|
391 |
computations is. According to the SSA-format, we also have to ensure
|
912
|
392 |
that all intermediary computations are given (fresh) names---we will
|
|
393 |
use an (ugly) counter for this.
|
679
|
394 |
|
908
|
395 |
To sum up, K-values are the atomic operations that can be on the
|
|
396 |
right-hand side of assignemnts. The K-language is restricted such that
|
909
|
397 |
it is easy to generate the SSA-format for the LLVM-IR. What remains to
|
908
|
398 |
be done is a translation of our Fun-language into the K-language. The
|
909
|
399 |
main difficulty is that we need to order the computation---in the
|
913
|
400 |
K-language we only have linear sequences of instructions. Before we
|
912
|
401 |
explain this, we have a look at some programs in CPS-style.
|
679
|
402 |
|
|
403 |
|
678
|
404 |
|
|
405 |
|
|
406 |
\subsection*{CPS-Translations}
|
|
407 |
|
912
|
408 |
CPS stands for \emph{Continuation-Passing-Style}. It is a kind of
|
|
409 |
programming technique often used in advanced functional programming
|
|
410 |
and in particular in compilers. Before we delve into the
|
|
411 |
CPS-translation for our Fun-language compiler, let us look at
|
704
|
412 |
CPS-versions of some well-known functions. Consider
|
|
413 |
|
|
414 |
\begin{lstlisting}[language=Scala, numbers=none]
|
|
415 |
def fact(n: Int) : Int =
|
|
416 |
if (n == 0) 1 else n * fact(n - 1)
|
|
417 |
\end{lstlisting}
|
|
418 |
|
|
419 |
\noindent
|
|
420 |
This is clearly the usual factorial function. But now consider the
|
912
|
421 |
following slight variant of the factorial function:
|
704
|
422 |
|
912
|
423 |
\begin{lstlisting}[language=Scala, numbers=left]
|
|
424 |
def factC(n: Int, k: Int => Int) : Int =
|
|
425 |
if (n == 0) k(1)
|
|
426 |
else factC(n - 1, x => k(n * x))
|
704
|
427 |
|
912
|
428 |
factC(3, id)
|
704
|
429 |
\end{lstlisting}
|
|
430 |
|
|
431 |
\noindent
|
912
|
432 |
The function is is Lines 1--3. The function call is in Line 5. We
|
|
433 |
call this function with a number, in this case 3, and the
|
|
434 |
identity-function (which returns just its input). The \texttt{k} is
|
|
435 |
the continuation in this function. The recursive calls are:
|
704
|
436 |
|
|
437 |
\begin{lstlisting}[language=Scala, numbers=none]
|
912
|
438 |
factC(2, x => id(3 * x))
|
|
439 |
factC(1, x => id(3 * (2 * x)))
|
|
440 |
factC(0, x => id(3 * (2 * (1 * x))))
|
704
|
441 |
\end{lstlisting}
|
|
442 |
|
|
443 |
\noindent
|
|
444 |
Having reached 0, we get out of the recursion and apply 1 to the
|
912
|
445 |
continuation (see if-branch above in Line 2). This gives
|
704
|
446 |
|
|
447 |
\begin{lstlisting}[language=Scala, numbers=none]
|
912
|
448 |
id(3 * (2 * (1 * 1)))
|
704
|
449 |
= 3 * (2 * (1 * 1))
|
|
450 |
= 6
|
|
451 |
\end{lstlisting}
|
|
452 |
|
|
453 |
\noindent
|
898
|
454 |
which is the expected result. If this looks somewhat familiar to you,
|
912
|
455 |
than this is because functions with continuations can be seen as a
|
|
456 |
kind of generalisation of tail-recursive functions. So far, however,
|
|
457 |
we did not look at this generalisation in earnest. Anyway notice how
|
|
458 |
the continuations is ``stacked up'' during the recursion and then
|
|
459 |
``unrolled'' when we apply 1 to the continuation. Interestingly, we
|
|
460 |
can do something similar to the Fibonacci function where in the
|
|
461 |
traditional version we have two recursive calls. Consider the
|
|
462 |
following function
|
704
|
463 |
|
912
|
464 |
\begin{lstlisting}[language=Scala, numbers=left]
|
|
465 |
def fibC(n: Int, k: Int => Int) : Int =
|
|
466 |
if (n == 0 || n == 1) k(1)
|
704
|
467 |
else fibC(n - 1,
|
|
468 |
r1 => fibC(n - 2,
|
912
|
469 |
r2 => k(r1 + r2)))
|
704
|
470 |
\end{lstlisting}
|
|
471 |
|
|
472 |
\noindent
|
912
|
473 |
Here the continuation (Lines 4--5) is a nested function essentially wrapping up
|
|
474 |
the second recursive call plus the original continuation. Let us check how the recursion unfolds
|
704
|
475 |
when called with 3 and the identity function:
|
|
476 |
|
|
477 |
\begin{lstlisting}[language=Scala, numbers=none]
|
|
478 |
fibC(3, id)
|
|
479 |
fibC(2, r1 => fibC(1, r2 => id(r1 + r2)))
|
|
480 |
fibC(1, r1 =>
|
|
481 |
fibC(0, r2 => fibC(1, r2a => id((r1 + r2) + r2a))))
|
|
482 |
fibC(0, r2 => fibC(1, r2a => id((1 + r2) + r2a)))
|
|
483 |
fibC(1, r2a => id((1 + 1) + r2a))
|
|
484 |
id((1 + 1) + 1)
|
|
485 |
(1 + 1) + 1
|
|
486 |
3
|
|
487 |
\end{lstlisting}
|
|
488 |
|
908
|
489 |
\noindent
|
909
|
490 |
The point of this section is that you should be playing around with these
|
908
|
491 |
simple definitions and make sure they calculate the expected result.
|
|
492 |
In the next step, you should understand \emph{how} these functions
|
912
|
493 |
calculate the result with the continuations. Now change the initial
|
|
494 |
continuation which you call the function to
|
|
495 |
|
|
496 |
\begin{lstlisting}[language=Scala, numbers=none]
|
|
497 |
r => { println("The result plus 1 is:") ; r + 1 }
|
|
498 |
\end{lstlisting}
|
|
499 |
|
|
500 |
\noindent
|
|
501 |
Does this still calculate the expected result?
|
908
|
502 |
|
|
503 |
|
|
504 |
\section*{Worked Example}
|
|
505 |
|
909
|
506 |
Let us now come back to the CPS-translations for the Fun-language.
|
|
507 |
Though we will start with a simpler subset containing only numbers,
|
913
|
508 |
arithmetic expressions and function calls---no variables for the
|
|
509 |
moment. The main difficulty of generating instructions in SSA-format
|
|
510 |
is that large compound expressions need to be broken up into smaller
|
|
511 |
pieces and intermediate results need to be chained into later
|
|
512 |
instructions. To do this conveniently, we use the CPS-translation
|
|
513 |
mechanism. What the continuations essentially solve is the following
|
|
514 |
problem: Given an expressions
|
908
|
515 |
|
|
516 |
\begin{equation}\label{exp}
|
|
517 |
(1 + 2) * (3 + 4)
|
|
518 |
\end{equation}
|
|
519 |
|
|
520 |
\noindent
|
913
|
521 |
which of the subexpressions should be calculated first? We are going
|
|
522 |
arbitrarily to decide that the calculation will be from left to
|
912
|
523 |
right. Other languages make different choices---C famously is right to
|
|
524 |
left. In our case this means we have to tear the expression shown in
|
|
525 |
\eqref{exp} apart as follows:
|
908
|
526 |
|
|
527 |
\[
|
|
528 |
(1 + 2) \qquad\text{and}\qquad \Box * (3 + 4)
|
|
529 |
\]
|
|
530 |
|
|
531 |
\noindent
|
912
|
532 |
The first subexpression can be easily calculated and will give us a result,
|
909
|
533 |
but the second one is not really an expression that makes sense. It
|
|
534 |
will only make sense as the next step in the computation when we
|
|
535 |
fill-in the result of $1+2$ into the ``hole'' indicated by
|
|
536 |
$\Box$. Such incomplete expressions can be represented in Scala as
|
|
537 |
functions
|
908
|
538 |
|
|
539 |
\begin{lstlisting}[language=Scala, numbers=none]
|
|
540 |
r => r * (3 + 4)
|
|
541 |
\end{lstlisting}
|
|
542 |
|
|
543 |
\noindent
|
912
|
544 |
where \texttt{r} will represent any result that has been computed
|
|
545 |
earlier. By the way, in lambda-calculus notation we would write
|
908
|
546 |
$\lambda r.\, r * (3 + 4)$ for the same function. To sum up, we use
|
700
|
547 |
functions (``continuations'') to represent what is coming next in a
|
908
|
548 |
sequence of instructions. In our case, continuations are functions of
|
|
549 |
type \code{KVal} to \code{KExp}. They can be seen as a sequence of
|
|
550 |
\code{KLet}s where there is a ``hole'' that needs to be
|
|
551 |
filled. Consider for example
|
678
|
552 |
|
701
|
553 |
\begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}]
|
|
554 |
let tmp0 = add 1 a in
|
|
555 |
let tmp1 = mul (*@$\Box$@*) 5 in
|
|
556 |
let tmp2 = add 3 tmp1 in
|
|
557 |
let tmp3 = add tmp0 tmp2 in
|
912
|
558 |
return tmp3
|
701
|
559 |
\end{lstlisting}
|
|
560 |
|
|
561 |
\noindent
|
|
562 |
where in the second line is a $\Box$ which still expects a \code{KVal}
|
|
563 |
to be filled in before it becomes a ``proper'' \code{KExp}. When we
|
898
|
564 |
apply an argument to the continuation (remember they are functions)
|
908
|
565 |
we essentially fill something into the corresponding hole.
|
|
566 |
|
909
|
567 |
Lets look at some concrete code. To simplify matters,
|
908
|
568 |
suppose our source language consists just of three kinds of expressions
|
|
569 |
|
|
570 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
571 |
enum Expr {
|
|
572 |
case Num(n: Int)
|
912
|
573 |
case Aop(op: String, e1: Expr, e2: Expr)
|
908
|
574 |
case Call(fname: String, args: List[Expr])
|
|
575 |
}
|
|
576 |
\end{lstlisting}
|
|
577 |
|
|
578 |
\noindent
|
912
|
579 |
This allows us to generate SSA-instructions for expressions like
|
|
580 |
|
|
581 |
\[
|
|
582 |
1 + foo(bar(4 * 7), 3, id(12))
|
|
583 |
\]
|
|
584 |
|
|
585 |
\noindent
|
|
586 |
The code of the CPS-translation
|
|
587 |
that generates these instructions is then of the form
|
679
|
588 |
|
701
|
589 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
590 |
def CPS(e: Exp)(k: KVal => KExp) : KExp =
|
|
591 |
e match { ... }
|
|
592 |
\end{lstlisting}
|
679
|
593 |
|
701
|
594 |
\noindent
|
908
|
595 |
where \code{k} is the continuation and \code{e} is the expression to
|
909
|
596 |
be compiled. The result of the function is a K-expression, which later
|
|
597 |
can be compiled into LLVM-IR code.
|
|
598 |
|
913
|
599 |
In case we have numbers, then we can just pass them in the CPS-translation
|
909
|
600 |
to the continuations because numbers need not be further teared apart
|
912
|
601 |
as they are already primitive. Passing the number to the continuation
|
909
|
602 |
means we apply the continuation like
|
679
|
603 |
|
908
|
604 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
605 |
case Num(i) => k(KNum(i))
|
|
606 |
\end{lstlisting}
|
679
|
607 |
|
909
|
608 |
\noindent This would fill in the $\Box$ in a \code{KLet}-expression.
|
|
609 |
Since \texttt{k} is a function from \texttt{KVar} to \texttt{KExp} we
|
|
610 |
also optain the correct type for CPS, namely \texttt{KExp}. There is
|
|
611 |
no need to create a temporary variable for simple numbers. More
|
|
612 |
interesting is the case for arithmetic operations.
|
908
|
613 |
|
|
614 |
\begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm]
|
|
615 |
case Aop(op, e1, e2) => {
|
|
616 |
val z = Fresh("tmp")
|
|
617 |
CPS(e1)(r1 =>
|
|
618 |
CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z)))))
|
|
619 |
}
|
|
620 |
\end{lstlisting}
|
|
621 |
|
|
622 |
\noindent
|
|
623 |
What we essentially have to do in this case is the following: compile
|
|
624 |
the subexpressions \texttt{e1} and \texttt{e2}. They will produce some
|
913
|
625 |
result that is stored in two temporary variables (assuming \texttt{e1} and \texttt{e2} are more
|
908
|
626 |
complicated than just numbers). We need to use these two temporary
|
|
627 |
variables and feed them into a new assignment of the form
|
|
628 |
|
|
629 |
\begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
|
|
630 |
let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in
|
|
631 |
...
|
|
632 |
\end{lstlisting}
|
701
|
633 |
|
908
|
634 |
\noindent
|
|
635 |
Note that this assignment has two ``holes'', one for the left
|
|
636 |
subexpression and one for the right subexpression. We can fill both
|
|
637 |
holes by calling the CPS function twice---this is a bit of the black
|
|
638 |
art in CPS. We can use the second call of CPS as the continuation
|
909
|
639 |
of the first call. The reason is that
|
908
|
640 |
|
|
641 |
\begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm]
|
|
642 |
r1 => CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z))))
|
|
643 |
\end{lstlisting}
|
|
644 |
|
912
|
645 |
\noindent is a function from \code{KVal} to \code{KExp}, which we need
|
|
646 |
as type for the continuation. Once we created the assignment with the
|
|
647 |
fresh temporary variable \texttt{z}, we need to ``communicate'' that
|
|
648 |
the result of the computation of the arithmetic expression is stored
|
|
649 |
in \texttt{z} to the computations that follow. In this way we apply
|
|
650 |
the continuation \texttt{k} with this new variable (essentially we are
|
913
|
651 |
plugging in a hole further down the line). Hope this makes sense!? If not,
|
|
652 |
play with the given code yourself.
|
908
|
653 |
|
|
654 |
The last case we need to consider in our small expression language are
|
912
|
655 |
function calls. For them remember each argument of the function
|
913
|
656 |
call can in SSA-format only be a variable or a number. Here is the
|
|
657 |
complete code for this case:
|
908
|
658 |
|
909
|
659 |
\begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm]
|
908
|
660 |
case Call(fname, args) => {
|
|
661 |
def aux(args: List[Expr], vs: List[KVal]): KExp = args match {
|
|
662 |
case Nil => {
|
|
663 |
val z = Fresh("tmp")
|
|
664 |
KLet(z, KCall(fname, vs), k(KVar(z)))
|
|
665 |
}
|
|
666 |
case a::as => CPS(a)(r => aux(as, vs ::: List(r)))
|
|
667 |
}
|
|
668 |
aux(args, Nil)
|
701
|
669 |
}
|
|
670 |
\end{lstlisting}
|
|
671 |
|
|
672 |
\noindent
|
913
|
673 |
As can be seen, we introduce an auxiliary function that compiles first all
|
908
|
674 |
function arguments---remember in our source language we can have calls
|
|
675 |
such as $foo(3 + 4, g(3))$ where we first have to create temporary
|
|
676 |
variables (and computations) for each argument. Therefore \texttt{aux}
|
909
|
677 |
analyses the argument list from left to right. In case there is an
|
|
678 |
argument \texttt{a} on the front of the list (the case \texttt{a::as}
|
|
679 |
in Line 7), we call CPS recursively for the corresponding
|
|
680 |
subexpression. The temporary variable containing the result for this
|
913
|
681 |
argument, we add to the end of the K-values we have already analysed
|
909
|
682 |
before. Again very conveniently we can use the recursive call to
|
|
683 |
\texttt{aux} as the continuation for the computations that
|
|
684 |
follow. When we reach the end of the argument list (the
|
|
685 |
\texttt{Nil}-case in Lines 3--6), then we collect all the K-values
|
|
686 |
\texttt{v1} to \texttt{vn} and call the function in the K-language
|
|
687 |
like so
|
908
|
688 |
|
|
689 |
|
|
690 |
\begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}]
|
|
691 |
let z = call foo(v1,...,vn) in
|
|
692 |
...
|
|
693 |
\end{lstlisting}
|
|
694 |
|
|
695 |
\noindent
|
|
696 |
Again we need to communicate the result of the function call, namely the
|
|
697 |
fresh temporary variable \texttt{z}, to the rest of the computation. Therefore
|
|
698 |
we apply the continuation \texttt{k} with this variable.
|
|
699 |
|
|
700 |
|
|
701 |
|
|
702 |
\begin{figure}[p]\small
|
|
703 |
\lstinputlisting[numbers=left,firstline=1,lastline=24]{../progs/fun/simple-cps.sc}
|
|
704 |
\hspace{10mm}...
|
|
705 |
\lstinputlisting[numbers=left,firstline=32,firstnumber=32,lastline=51]{../progs/fun/simple-cps.sc}
|
|
706 |
\caption{CPS-translation for a simple expression language.\label{cps}}
|
|
707 |
\end{figure}
|
|
708 |
|
912
|
709 |
The last question we need to answer in the code (see Figure~\ref{cps})
|
|
710 |
is how we start the CPS-translation function, or more precisely with
|
|
711 |
which continuation we should start CPS? It turns out that this initial
|
|
712 |
continuation will be the last one that is called. What should be the
|
|
713 |
last step in the computation? Yes, we need to return the temporary
|
|
714 |
variable where the last result is stored (if it is a simple number,
|
|
715 |
then we can just return this number). Therefore we call CPS with the
|
|
716 |
code
|
908
|
717 |
|
|
718 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
719 |
def CPSi(e: Expr) : KExp = CPS(e)(KReturn(_))
|
|
720 |
\end{lstlisting}
|
|
721 |
|
|
722 |
\noindent
|
909
|
723 |
where we give the function \code{KReturn(_)} as the continuation. With
|
912
|
724 |
this we completed the translation of simple expressions into our
|
|
725 |
K-language. Play around with some more expressions and see how the
|
|
726 |
CPS-translation generates the correct code. I know this is not easy to
|
|
727 |
follow code when you see it the first time. Figure~\ref{absfun} gives
|
|
728 |
the complete datatypes for the ASTs of the Fun-language and the
|
|
729 |
K-values and K-expressions for the K-language. The complete
|
|
730 |
CPS-translation you can find in the code.
|
|
731 |
|
|
732 |
\section*{Next Steps}
|
909
|
733 |
|
|
734 |
Having obtained a K-expression, it is relatively straightforward to
|
913
|
735 |
generate a valid program for the LLVM-IR---remember the K-language
|
|
736 |
already enforces the SSA convention of a linear sequence of primitive
|
|
737 |
instructions involving only unique temporary variables.
|
|
738 |
We leave this step to the attentive reader.
|
|
739 |
|
|
740 |
What else can we do? Well it should be relatively easy to apply some
|
|
741 |
common optimisations to the K-expressions. One optimisations is called
|
|
742 |
constant folding---for example if we have an expression $3 + 4$ then
|
|
743 |
we can replace it by just $5$ instead of generating code to compute
|
|
744 |
$5$. Now this information needs to be propagated to the next
|
|
745 |
computation step to see whether any further constant foldings are
|
|
746 |
possible. Another useful technique is common subexpression
|
|
747 |
elimination, meaning if you have twice a calculation of a function
|
|
748 |
$foo(a)$, then we want to call it only once and store the result in a
|
|
749 |
temporary variable that can be used instead of the second, or third,
|
|
750 |
call to $foo(a)$. Again I leave this to the attentive reader to work
|
|
751 |
out and implement.
|
908
|
752 |
|
|
753 |
|
|
754 |
\begin{figure}[p]\small
|
|
755 |
\begin{lstlisting}[language=Scala,numbers=none]
|
|
756 |
// Fun language (expressions)
|
|
757 |
abstract class Exp
|
|
758 |
abstract class BExp
|
|
759 |
|
|
760 |
case class Call(name: String, args: List[Exp]) extends Exp
|
|
761 |
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
|
|
762 |
case class Write(e: Exp) extends Exp
|
|
763 |
case class Var(s: String) extends Exp
|
|
764 |
case class Num(i: Int) extends Exp
|
|
765 |
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
|
|
766 |
case class Sequence(e1: Exp, e2: Exp) extends Exp
|
|
767 |
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
|
|
768 |
|
|
769 |
|
|
770 |
// K-language (K-expressions, K-values)
|
|
771 |
abstract class KExp
|
|
772 |
abstract class KVal
|
|
773 |
|
|
774 |
case class KVar(s: String) extends KVal
|
|
775 |
case class KNum(i: Int) extends KVal
|
|
776 |
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
|
|
777 |
case class KCall(o: String, vrs: List[KVal]) extends KVal
|
|
778 |
case class KWrite(v: KVal) extends KVal
|
|
779 |
|
|
780 |
case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
|
|
781 |
case class KLet(x: String, v: KVal, e: KExp) extends KExp
|
|
782 |
case class KReturn(v: KVal) extends KExp
|
|
783 |
\end{lstlisting}
|
909
|
784 |
\caption{Abstract syntax trees for the Fun-language and the K-language.\label{absfun}}
|
908
|
785 |
\end{figure}
|
|
786 |
|
|
787 |
|
917
|
788 |
\section*{Alternatives to CPS}
|
|
789 |
|
|
790 |
While I appreciate that this handout is already pretty long, this
|
|
791 |
section is for students who think the CPS-translation is too much of
|
|
792 |
voodoo programming---there is a perhaps simpler alternative. This
|
|
793 |
alternative is along the lines: if you cannot bridge the gap in
|
|
794 |
a single step, do it in two simpler steps. Let's look at the
|
|
795 |
simple expression $1 + (2 + 3)$. The CPS-translation correctly
|
|
796 |
generates the expression
|
|
797 |
|
|
798 |
\begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}]
|
|
799 |
let tmp0 = add 2 3 in
|
|
800 |
let tmp1 = add 1 tmp0 in
|
|
801 |
return tmp1
|
|
802 |
\end{lstlisting}
|
|
803 |
|
|
804 |
\noindent
|
|
805 |
where $(2 + 3)$ is pulled out and calculated first. The problem is that it
|
|
806 |
requires a bit of magic. But with the ability to give a separate variable
|
|
807 |
to each indivifual computation, we could do the following: The expression
|
|
808 |
$1 + (2 + 3)$ is a tree like this
|
|
809 |
|
|
810 |
\begin{center}
|
|
811 |
\begin{tikzpicture}[line width=1mm]
|
|
812 |
\node {root} [grow'=up]
|
|
813 |
child {node {1}}
|
|
814 |
child {node[circle,draw] {+}
|
|
815 |
child {node {2}}
|
|
816 |
child {node {3}}
|
|
817 |
};
|
|
818 |
\end{tikzpicture}
|
|
819 |
\end{center}
|
|
820 |
|
|
821 |
\noindent
|
|
822 |
and we could perform a completely standard recursive traversal of the
|
|
823 |
tree: each inner node gets a new variable and assignment.
|
|
824 |
|
|
825 |
|
898
|
826 |
|
|
827 |
\noindent
|
539
|
828 |
\end{document}
|
|
829 |
|
|
830 |
|
|
831 |
%%% Local Variables:
|
|
832 |
%%% mode: latex
|
|
833 |
%%% TeX-master: t
|
|
834 |
%%% End:
|