63 allow us to benefit from the modular structure of the LLVM compiler |
63 allow us to benefit from the modular structure of the LLVM compiler |
64 and we can let the compiler generate code for different CPUs, for |
64 and we can let the compiler generate code for different CPUs, for |
65 example X86 or ARM. That means we can be agnostic about where our |
65 example X86 or ARM. That means we can be agnostic about where our |
66 code is actually going to run.\footnote{Anybody want to try to run our |
66 code is actually going to run.\footnote{Anybody want to try to run our |
67 programs on Android phones?} We can also be somewhat ignorant about |
67 programs on Android phones?} We can also be somewhat ignorant about |
68 optimising our code and about allocating memory efficiently: the LLVM |
68 optimising our code and about allocating memory efficiently---the LLVM |
69 tools will take care of all this. |
69 tools will take care of all this. |
70 |
70 |
71 However, what we have to do in order to make LLVM to play ball is to |
71 However, what we have to do in order to make LLVM to play ball is to |
72 generate code in \emph{Static Single-Assignment} format (short SSA), |
72 generate code in \emph{Static Single-Assignment} format (short SSA). A |
73 because that is what the LLVM-IR expects from us. A reason why LLVM |
73 reason why LLVM uses the SSA-format, rather than JVM-like stack |
74 uses the SSA-format, rather than JVM-like stack instructions, is that |
74 instructions, is that stack instructions are difficult to |
75 stack instructions are difficult to optimise---you cannot just |
75 optimise---you cannot just re-arrange instructions without messing |
76 re-arrange instructions without messing about with what is calculated |
76 about with what is calculated on the stack. Also it is hard to find |
77 on the stack. Also it is hard to find out if all the calculations on |
77 out if all the calculations on the stack are actually necessary and |
78 the stack are actually necessary and not by chance dead code. The JVM |
78 not by chance dead code. The JVM has for all these obstacles |
79 has for all these obstacles sophisticated machinery to make such |
79 sophisticated machinery to make such ``high-level'' code still run |
80 ``high-level'' code still run fast, but let's say that for the sake of |
80 fast, but let's say that for the sake of argument we do not want to |
81 argument we do not want to rely on it. We want to generate fast code |
81 rely on it. We want to generate fast code ourselves. This means we |
82 ourselves. This means we have to work around the intricacies of what |
82 have to work around the intricacies of what instructions CPUs can |
83 instructions CPUs can actually process fast. This is what the SSA |
83 actually process fast. This is what the SSA format is designed for. |
84 format is designed for. |
|
85 |
84 |
86 |
85 |
87 The main idea behind the SSA-format is to have sequences of very |
86 The main idea behind the SSA-format is to have sequences of very |
88 simple variable assignments where every (tmp)variable is assigned only |
87 simple variable assignments where every (tmp)variable is assigned only |
89 once. The assignments need to be simple in the sense that they can be |
88 once. The assignments need to be simple in the sense that they can be |
209 \noindent |
212 \noindent |
210 where the arguments can only be simple variables and numbers, but not compound |
213 where the arguments can only be simple variables and numbers, but not compound |
211 expressions. |
214 expressions. |
212 |
215 |
213 Conveniently, you can use the program \texttt{lli}, which comes with |
216 Conveniently, you can use the program \texttt{lli}, which comes with |
214 LLVM, to interpret programs written in the LLVM-IR. So you can easily |
217 LLVM, to interpret programs written in the LLVM-IR. Type on the command line |
215 check whether the code you produced actually works. To get a running |
218 |
216 program that does something interesting you need to add some boilerplate |
219 \begin{lstlisting}[language=bash,numbers=none] |
217 about printing out numbers and a main-function that is the entry point |
220 lli sqr.ll |
218 for the program (see Figure~\ref{lli} for a complete listing of the square function). Again |
221 \end{lstlisting} |
|
222 |
|
223 \noindent |
|
224 and you can see the output of the pragram generated. |
|
225 This means you can easily check whether the code you produced actually |
|
226 works. To get a running program that does something interesting you |
|
227 need to add some boilerplate about printing out numbers and a |
|
228 main-function that is the entry point for the program (see |
|
229 Figure~\ref{lli} for a complete listing of the square function). Again |
219 this is very similar to the boilerplate we needed to add in our JVM |
230 this is very similar to the boilerplate we needed to add in our JVM |
220 compiler. |
231 compiler. |
221 |
232 |
222 You can generate a binary for the program in Figure~\ref{lli} by using |
233 You can generate a binary for the program in Figure~\ref{lli} by using |
223 the \texttt{llc}-compiler and then \texttt{gcc}/\texttt{clang}, whereby \texttt{llc} generates |
234 the \texttt{llc}-compiler and then \texttt{gcc}/\texttt{clang}, whereby \texttt{llc} generates |
224 an object file and \texttt{gcc} (that is actually \texttt{clang} on my Mac) generates the |
235 an object file and \texttt{gcc} (that is actually \texttt{clang} on my Mac) generates the |
225 executable binary: |
236 executable binary: |
311 \noindent |
323 \noindent |
312 where a K-value can be a variable, a number or a ``trivial'' binary |
324 where a K-value can be a variable, a number or a ``trivial'' binary |
313 operation, such as addition or multiplication. The two arguments of a |
325 operation, such as addition or multiplication. The two arguments of a |
314 binary operation need to be K-values as well. Finally, we have |
326 binary operation need to be K-values as well. Finally, we have |
315 function calls, but again each argument of the function call needs to |
327 function calls, but again each argument of the function call needs to |
316 be a K-value. One point we need to be careful, however, is that the |
328 be a K-value. One constraint we need to be careful about is that the |
317 arguments of the binary operations and function calls can in fact only |
329 arguments of the binary operations and function calls can only be |
318 be variables or numbers. To encode this constraint into the type-system |
330 variables or numbers. For example |
319 of Scala, however, would make things too complicated---to satisfy this |
331 |
320 constraint is therefore on us. For our Fun-language, we will later on |
332 \begin{lstlisting}[language=Scala,numbers=none] |
321 consider some further K-values. |
333 KAop("+", KAop("*", KNum(1), KNum(2)), KNum(3)) |
|
334 KCall("foo", List(KAop("+", KNum(4), KNum(5))) |
|
335 \end{lstlisting} |
|
336 |
|
337 \noindent |
|
338 while perfect instances of K-values according to the types, are not |
|
339 allowed in the LLVM-IR. To encode this constraint into the type-system |
|
340 of Scala, however, would make things overly complicated---to satisfy |
|
341 this constraint is therefore on us. For the moment this will be all |
|
342 K-values we are considdering. Later on, we will have some more for our |
|
343 Fun-language. |
322 |
344 |
323 |
345 |
324 Our K-expressions will encode the \texttt{let} and the \texttt{return} |
346 Our K-expressions will encode the \texttt{let} and the \texttt{return} |
325 from the SSA-format (again for the moment we only consider these two |
347 from the SSA-format (again for the moment we only consider these two |
326 constructors---later on we will have if-branches as well). |
348 constructors---later on we will have if-branches as well). |
331 case KReturn(v: KVal) |
353 case KReturn(v: KVal) |
332 } |
354 } |
333 \end{lstlisting} |
355 \end{lstlisting} |
334 |
356 |
335 \noindent |
357 \noindent |
336 By having in \texttt{KLet} taking first a string (standing for an |
358 By having in \texttt{KLet} taking first a string (standing for a |
337 intermediate variable) and second a value, we can fulfil the SSA |
359 tmp-variable) and second a value, we can fulfil the SSA constraint in |
338 constraint in assignments ``by con\-struction''---there is no way we |
360 assignments ``by con\-struction''---there is no way we could write |
339 could write anything else than a K-value. Note that the third |
361 anything else than a K-value. Note that the third argument of a |
340 argument of a \texttt{KLet} is again a K-expression, meaning either |
362 \texttt{KLet} is again a K-expression, meaning either another |
341 another \texttt{KLet} or a \texttt{KReturn}. In this way we can |
363 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a |
342 construct a sequence of computations and return a final result. Of |
364 sequence of computations and indicate what is the final result of the |
343 course we also have to ensure that all intermediary computations are |
365 computations. According to the SSA-format, we also have to ensure |
344 given (fresh) names---we will use an (ugly) counter for this. |
366 that all intermediary computations are given (fresh) names---we will |
|
367 use an (ugly) counter for this. |
345 |
368 |
346 To sum up, K-values are the atomic operations that can be on the |
369 To sum up, K-values are the atomic operations that can be on the |
347 right-hand side of assignemnts. The K-language is restricted such that |
370 right-hand side of assignemnts. The K-language is restricted such that |
348 it is easy to generate the SSA-format for the LLVM-IR. What remains to |
371 it is easy to generate the SSA-format for the LLVM-IR. What remains to |
349 be done is a translation of our Fun-language into the K-language. The |
372 be done is a translation of our Fun-language into the K-language. The |
350 main difficulty is that we need to order the computation---in the |
373 main difficulty is that we need to order the computation---in the |
351 K-language we only have linear sequence of instructions. Before we |
374 K-language we only have linear sequence of instructions. Before we |
352 explain this, we have a look at some CPS-translation. |
375 explain this, we have a look at some programs in CPS-style. |
353 |
376 |
354 |
377 |
355 |
378 |
356 |
379 |
357 \subsection*{CPS-Translations} |
380 \subsection*{CPS-Translations} |
358 |
381 |
359 CPS stands for \emph{Continuation-Passing-Style}. It is a kind of programming |
382 CPS stands for \emph{Continuation-Passing-Style}. It is a kind of |
360 technique often used in advanced functional programming. Before we delve |
383 programming technique often used in advanced functional programming |
361 into the CPS-translation for our Fun-language compiler, let us look at |
384 and in particular in compilers. Before we delve into the |
|
385 CPS-translation for our Fun-language compiler, let us look at |
362 CPS-versions of some well-known functions. Consider |
386 CPS-versions of some well-known functions. Consider |
363 |
387 |
364 \begin{lstlisting}[language=Scala, numbers=none] |
388 \begin{lstlisting}[language=Scala, numbers=none] |
365 def fact(n: Int) : Int = |
389 def fact(n: Int) : Int = |
366 if (n == 0) 1 else n * fact(n - 1) |
390 if (n == 0) 1 else n * fact(n - 1) |
367 \end{lstlisting} |
391 \end{lstlisting} |
368 |
392 |
369 \noindent |
393 \noindent |
370 This is clearly the usual factorial function. But now consider the |
394 This is clearly the usual factorial function. But now consider the |
371 following version of the factorial function: |
395 following slight variant of the factorial function: |
|
396 |
|
397 \begin{lstlisting}[language=Scala, numbers=left] |
|
398 def factC(n: Int, k: Int => Int) : Int = |
|
399 if (n == 0) k(1) |
|
400 else factC(n - 1, x => k(n * x)) |
|
401 |
|
402 factC(3, id) |
|
403 \end{lstlisting} |
|
404 |
|
405 \noindent |
|
406 The function is is Lines 1--3. The function call is in Line 5. We |
|
407 call this function with a number, in this case 3, and the |
|
408 identity-function (which returns just its input). The \texttt{k} is |
|
409 the continuation in this function. The recursive calls are: |
372 |
410 |
373 \begin{lstlisting}[language=Scala, numbers=none] |
411 \begin{lstlisting}[language=Scala, numbers=none] |
374 def factC(n: Int, ret: Int => Int) : Int = |
412 factC(2, x => id(3 * x)) |
375 if (n == 0) ret(1) |
413 factC(1, x => id(3 * (2 * x))) |
376 else factC(n - 1, x => ret(n * x)) |
414 factC(0, x => id(3 * (2 * (1 * x)))) |
377 |
415 \end{lstlisting} |
378 factC(3, identity) |
416 |
379 \end{lstlisting} |
417 \noindent |
380 |
418 Having reached 0, we get out of the recursion and apply 1 to the |
381 \noindent |
419 continuation (see if-branch above in Line 2). This gives |
382 This function is called with a number, in this case 3, and the |
|
383 identity-function (which returns just its input). The recursive |
|
384 calls are: |
|
385 |
420 |
386 \begin{lstlisting}[language=Scala, numbers=none] |
421 \begin{lstlisting}[language=Scala, numbers=none] |
387 factC(2, x => identity(3 * x)) |
422 id(3 * (2 * (1 * 1))) |
388 factC(1, x => identity(3 * (2 * x))) |
|
389 factC(0, x => identity(3 * (2 * (1 * x)))) |
|
390 \end{lstlisting} |
|
391 |
|
392 \noindent |
|
393 Having reached 0, we get out of the recursion and apply 1 to the |
|
394 continuation (see if-branch above). This gives |
|
395 |
|
396 \begin{lstlisting}[language=Scala, numbers=none] |
|
397 identity(3 * (2 * (1 * 1))) |
|
398 = 3 * (2 * (1 * 1)) |
423 = 3 * (2 * (1 * 1)) |
399 = 6 |
424 = 6 |
400 \end{lstlisting} |
425 \end{lstlisting} |
401 |
426 |
402 \noindent |
427 \noindent |
403 which is the expected result. If this looks somewhat familiar to you, |
428 which is the expected result. If this looks somewhat familiar to you, |
404 than this is because functions with continuations can be |
429 than this is because functions with continuations can be seen as a |
405 seen as a kind of generalisation of tail-recursive functions. |
430 kind of generalisation of tail-recursive functions. So far, however, |
406 So far we did not look at this generalisation in earnest. |
431 we did not look at this generalisation in earnest. Anyway notice how |
407 Anyway |
432 the continuations is ``stacked up'' during the recursion and then |
408 notice how the continuations is ``stacked up'' during the recursion and |
433 ``unrolled'' when we apply 1 to the continuation. Interestingly, we |
409 then ``unrolled'' when we apply 1 to the continuation. Interestingly, we |
434 can do something similar to the Fibonacci function where in the |
410 can do something similar to the Fibonacci function where in the traditional |
435 traditional version we have two recursive calls. Consider the |
411 version we have two recursive calls. Consider the following function |
436 following function |
412 |
437 |
413 \begin{lstlisting}[language=Scala, numbers=none] |
438 \begin{lstlisting}[language=Scala, numbers=left] |
414 def fibC(n: Int, ret: Int => Int) : Int = |
439 def fibC(n: Int, k: Int => Int) : Int = |
415 if (n == 0 || n == 1) ret(1) |
440 if (n == 0 || n == 1) k(1) |
416 else fibC(n - 1, |
441 else fibC(n - 1, |
417 r1 => fibC(n - 2, |
442 r1 => fibC(n - 2, |
418 r2 => ret(r1 + r2))) |
443 r2 => k(r1 + r2))) |
419 \end{lstlisting} |
444 \end{lstlisting} |
420 |
445 |
421 \noindent |
446 \noindent |
422 Here the continuation is a nested function essentially wrapping up |
447 Here the continuation (Lines 4--5) is a nested function essentially wrapping up |
423 the second recursive call. Let us check how the recursion unfolds |
448 the second recursive call plus the original continuation. Let us check how the recursion unfolds |
424 when called with 3 and the identity function: |
449 when called with 3 and the identity function: |
425 |
450 |
426 \begin{lstlisting}[language=Scala, numbers=none] |
451 \begin{lstlisting}[language=Scala, numbers=none] |
427 fibC(3, id) |
452 fibC(3, id) |
428 fibC(2, r1 => fibC(1, r2 => id(r1 + r2))) |
453 fibC(2, r1 => fibC(1, r2 => id(r1 + r2))) |
572 |
613 |
573 \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm] |
614 \begin{lstlisting}[language=Scala,numbers=none,xleftmargin=0mm] |
574 r1 => CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z)))) |
615 r1 => CPS(e2)(r2 => KLet(z, KAop(op, r1, r2), k(KVar(z)))) |
575 \end{lstlisting} |
616 \end{lstlisting} |
576 |
617 |
577 \noindent is a function from \code{KVal} to \code{KExp}. Once |
618 \noindent is a function from \code{KVal} to \code{KExp}, which we need |
578 we created the assignment with the fresh temporary variable |
619 as type for the continuation. Once we created the assignment with the |
579 \texttt{z}, we need to ``communicate'' that the result of the |
620 fresh temporary variable \texttt{z}, we need to ``communicate'' that |
580 computation of the arithmetic expression is stored in \texttt{z} to the |
621 the result of the computation of the arithmetic expression is stored |
581 computations that follow. In this way we apply the continuation \texttt{k} with this |
622 in \texttt{z} to the computations that follow. In this way we apply |
582 new variable (essentially we are plugging in a hole further down the line). |
623 the continuation \texttt{k} with this new variable (essentially we are |
583 Hope this makes sense. |
624 plugging in a hole further down the line). Hope this makes sense. |
584 |
625 |
585 The last case we need to consider in our small expression language are |
626 The last case we need to consider in our small expression language are |
586 function calls. |
627 function calls. For them remember each argument of the function |
|
628 call can in SSA-format only be a variable or a number. |
587 |
629 |
588 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm] |
630 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm] |
589 case Call(fname, args) => { |
631 case Call(fname, args) => { |
590 def aux(args: List[Expr], vs: List[KVal]): KExp = args match { |
632 def aux(args: List[Expr], vs: List[KVal]): KExp = args match { |
591 case Nil => { |
633 case Nil => { |
633 \hspace{10mm}... |
675 \hspace{10mm}... |
634 \lstinputlisting[numbers=left,firstline=32,firstnumber=32,lastline=51]{../progs/fun/simple-cps.sc} |
676 \lstinputlisting[numbers=left,firstline=32,firstnumber=32,lastline=51]{../progs/fun/simple-cps.sc} |
635 \caption{CPS-translation for a simple expression language.\label{cps}} |
677 \caption{CPS-translation for a simple expression language.\label{cps}} |
636 \end{figure} |
678 \end{figure} |
637 |
679 |
638 The last question we need to answer in the code (see Figure~\ref{cps}) is how we |
680 The last question we need to answer in the code (see Figure~\ref{cps}) |
639 start the CPS-translation function, or more precisely with which continuation we |
681 is how we start the CPS-translation function, or more precisely with |
640 should start CPS? It turns out that this initial continuation will be the last one that is |
682 which continuation we should start CPS? It turns out that this initial |
641 called. What should be the last step in the computation? Yes, we need to return the |
683 continuation will be the last one that is called. What should be the |
642 temporary variable where the last result is stored (if it is a simple number, then we can |
684 last step in the computation? Yes, we need to return the temporary |
643 just return this number). Therefore we call CPS with the code |
685 variable where the last result is stored (if it is a simple number, |
|
686 then we can just return this number). Therefore we call CPS with the |
|
687 code |
644 |
688 |
645 \begin{lstlisting}[language=Scala,numbers=none] |
689 \begin{lstlisting}[language=Scala,numbers=none] |
646 def CPSi(e: Expr) : KExp = CPS(e)(KReturn(_)) |
690 def CPSi(e: Expr) : KExp = CPS(e)(KReturn(_)) |
647 \end{lstlisting} |
691 \end{lstlisting} |
648 |
692 |
649 \noindent |
693 \noindent |
650 where we give the function \code{KReturn(_)} as the continuation. With |
694 where we give the function \code{KReturn(_)} as the continuation. With |
651 this we completed the translation of expressions into our |
695 this we completed the translation of simple expressions into our |
652 K-language. Figure~\ref{absfun} gives the complete datatypes for the |
696 K-language. Play around with some more expressions and see how the |
653 ASTs of the Fun-language and the K-values and K-expressions for the |
697 CPS-translation generates the correct code. I know this is not easy to |
654 K-language. |
698 follow code when you see it the first time. Figure~\ref{absfun} gives |
|
699 the complete datatypes for the ASTs of the Fun-language and the |
|
700 K-values and K-expressions for the K-language. The complete |
|
701 CPS-translation you can find in the code. |
|
702 |
|
703 \section*{Next Steps} |
655 |
704 |
656 Having obtained a K-expression, it is relatively straightforward to |
705 Having obtained a K-expression, it is relatively straightforward to |
657 generate a valid program for the LLVM-IR. We leave this to the |
706 generate a valid program for the LLVM-IR. We leave this to the |
658 attentive reader. What else can we do? Well it should be relatively |
707 attentive reader. What else can we do? Well it should be relatively |
659 easy to apply some common optimisations to the K-expressions. One |
708 easy to apply some common optimisations to the K-expressions. One |