17 |
17 |
18 \section*{Handout 9 (LLVM, SSA and CPS)} |
18 \section*{Handout 9 (LLVM, SSA and CPS)} |
19 |
19 |
20 Reflecting on our two tiny compilers targetting the JVM, the code |
20 Reflecting on our two tiny compilers targetting the JVM, the code |
21 generation part was actually not so hard, no? Pretty much just some |
21 generation part was actually not so hard, no? Pretty much just some |
22 post-traversal of the abstract syntax tree, yes? One of the reasons |
22 post-traversal of the abstract syntax tree. Yes? One of the reasons |
23 for this ease is that the JVM is a stack-based virtual machine and it |
23 for this ease is that the JVM is a stack-based virtual machine and it |
24 is therefore not hard to translate deeply-nested arithmetic |
24 is therefore not hard to translate deeply-nested arithmetic |
25 expressions into a sequence of instructions manipulating the |
25 expressions into a sequence of instructions manipulating the |
26 stack. The problem is that ``real'' CPUs, although supporting stack |
26 stack. That is pretty much the whole point of the JVM. The problem is |
27 operations, are not really designed to be \emph{stack machines}. The |
27 that ``real'' CPUs, although supporting stack operations, are not |
28 design of CPUs is more like: Here are some instructions and a chunk of |
28 really designed to be \emph{stack machines}. The design of CPUs is |
|
29 more like: Here are some instructions and a chunk of |
29 memory---compiler, or better compiler writers, do something with |
30 memory---compiler, or better compiler writers, do something with |
30 them. Consequently, modern compilers need to go the extra mile in |
31 them. Consequently, modern compilers need to go the extra mile in |
31 order to generate code that is much easier and faster to process by |
32 order to generate code that is much easier and faster to process by |
32 actual CPUs, rather than running code on virtual machines that |
33 actual CPUs, rather than running code on virtual machines that |
33 introduce an additional layer of indirection. To make this all |
34 introduce an additional layer of indirection. To make this all |
34 tractable for this module, we target the LLVM Intermediate |
35 tractable for this module, we target the \emph{LLVM Intermediate |
35 Language. In this way we can take advantage of the tools coming with |
36 Language} (LLVM-IR). In this way we can take advantage of the tools |
36 LLVM.\footnote{\url{http://llvm.org}} For example we do not have to |
37 coming with LLVM.\footnote{\url{http://llvm.org}} For example we do |
37 worry about things like register allocations. By using the LLVM-IR, |
38 not have to worry about things like register allocations or peephole |
38 however, we also have to pay price in the sense that generating code |
39 optimisations. By using the LLVM-IR, however, we also have to pay |
39 gets harder\ldots{}unfor\-tunately. |
40 price in the sense that generating code gets |
|
41 harder\ldots{}unfor\-tunately nothing comes for free in life. |
40 |
42 |
41 \subsection*{LLVM and the LLVM-IR} |
43 \subsection*{LLVM and the LLVM-IR} |
42 |
44 |
43 \noindent LLVM is a beautiful example |
45 \noindent LLVM is a beautiful example |
44 that projects from Academia can make a difference in the World. LLVM |
46 that projects from Academia can make a difference in the Real World. LLVM |
45 started in 2000 as a project by two researchers at the University of |
47 started in 2000 as a project by two researchers at the University of |
46 Illinois at Urbana-Champaign. At the time the behemoth of compilers was |
48 Illinois at Urbana-Champaign. At the time the behemoth of compilers was |
47 gcc with its myriad of front-ends for different programming languages (C++, Fortran, |
49 gcc with its myriad of front-ends for different programming languages (C++, Fortran, |
48 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over |
50 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over |
49 time into a monolithic gigantic piece of m\ldots ehm complicated |
51 time into a monolithic gigantic piece of m\ldots ehm complicated |
71 However, what we have to do in order to make LLVM to play ball is to |
73 However, what we have to do in order to make LLVM to play ball is to |
72 generate code in \emph{Static Single-Assignment} format (short SSA). A |
74 generate code in \emph{Static Single-Assignment} format (short SSA). A |
73 reason why LLVM uses the SSA-format, rather than JVM-like stack |
75 reason why LLVM uses the SSA-format, rather than JVM-like stack |
74 instructions, is that stack instructions are difficult to |
76 instructions, is that stack instructions are difficult to |
75 optimise---you cannot just re-arrange instructions without messing |
77 optimise---you cannot just re-arrange instructions without messing |
76 about with what is calculated on the stack. Also it is hard to find |
78 about with what is calculated on the stack. Have a look at the |
77 out if all the calculations on the stack are actually necessary and |
79 expression $((a + b) * 4) - (3 * (a + b))$ and the corresponding JVM |
78 not by chance dead code. The JVM has for all these obstacles |
80 instructions: |
79 sophisticated machinery to make such ``high-level'' code still run |
81 |
80 fast, but let's say that for the sake of argument we do not want to |
82 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape] |
81 rely on it. We want to generate fast code ourselves. This means we |
83 iload a |
82 have to work around the intricacies of what instructions CPUs can |
84 iload b |
83 actually process fast. This is what the SSA format is designed for. |
85 iadd |
|
86 ldc 4 |
|
87 imul |
|
88 ldc 3 |
|
89 iload a |
|
90 iload b |
|
91 iadd |
|
92 imul |
|
93 isub |
|
94 \end{lstlisting} |
|
95 |
|
96 \noindent |
|
97 and try to reorganise the code such that you calculate the expression |
|
98 $(a + b)$ only once. This requires either quite a bit of |
|
99 math-understanding from the compiler or you need to add ``copy-and-fetch'' |
|
100 of a result from a local variable. Also it is hard to find out if all |
|
101 the calculations on the stack are actually necessary and not by chance |
|
102 dead code. The JVM has for all these obstacles sophisticated machinery |
|
103 to make such ``high-level'' code still run fast, but let's say that |
|
104 for the sake of argument we do not want to rely on it. We want to |
|
105 generate fast code ourselves. This means we have to work around the |
|
106 intricacies of what instructions CPUs can actually process fast. This |
|
107 is what the SSA format is designed for. |
84 |
108 |
85 |
109 |
86 The main idea behind the SSA-format is to have sequences of very |
110 The main idea behind the SSA-format is to have sequences of very |
87 simple variable assignments where every (tmp)variable is assigned only |
111 simple variable assignments where every (tmp)variable is assigned only |
88 once. The assignments need to be simple in the sense that they can be |
112 once. The assignments need to be simple in the sense that they can be |
103 |
127 |
104 \noindent where every tmpX-variable is used only once (we could, for |
128 \noindent where every tmpX-variable is used only once (we could, for |
105 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if |
129 example, not write \texttt{tmp1 = add tmp2 tmp3} in Line 5 even if |
106 this would not change the overall result). At the end we have a |
130 this would not change the overall result). At the end we have a |
107 return-instruction wich contains the final result of the |
131 return-instruction wich contains the final result of the |
108 expression. As can be seen, the task we have to solve is to take apart |
132 expression. As can be seen the task we have to solve for generating |
109 compound expressions as shown above and transfrom them into a sequence |
133 SSA-code is to take apart compound expressions into its most basic |
110 of simple assignments. Note that this for example means we have to |
134 ''particles'' and transfrom them into a sequence of simple assignments |
111 fix the order in which the expression is calculated. |
135 that calculates the desired result. Note that this means we have to |
|
136 fix the order in which the expression is calculated, like from the |
|
137 left to right. |
112 |
138 |
113 There are sophisticated algorithms for imperative languages, like C, |
139 There are sophisticated algorithms for imperative languages, like C, |
114 that efficiently transform high-level programs into SSA-format. But |
140 that efficiently transform high-level programs into SSA-format. But |
115 we can ignore them here. We want to compile a functional language and |
141 we can ignore them here. We want to compile a functional language and |
116 there things get much more interesting than just sophisticated. We |
142 there things get much more interesting than just sophisticated. We |
353 case KReturn(v: KVal) |
379 case KReturn(v: KVal) |
354 } |
380 } |
355 \end{lstlisting} |
381 \end{lstlisting} |
356 |
382 |
357 \noindent |
383 \noindent |
358 By having in \texttt{KLet} taking first a string (standing for a |
384 By having \texttt{KLet} taking first a string (standing for a |
359 tmp-variable) and second a value, we can fulfil the SSA constraint in |
385 tmp-variable) and second a value, we can fulfil the SSA constraint in |
360 assignments ``by con\-struction''---there is no way we could write |
386 assignments ``by con\-struction''---there is no way we could write |
361 anything else than a K-value. Note that the third argument of a |
387 anything else than a K-value. Note that the third argument of a |
362 \texttt{KLet} is again a K-expression, meaning either another |
388 \texttt{KLet} is again a K-expression, meaning either another |
363 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a |
389 \texttt{KLet} or a \texttt{KReturn}. In this way we can construct a |
364 sequence of computations and indicate what is the final result of the |
390 sequence of computations and indicate what the final result of the |
365 computations. According to the SSA-format, we also have to ensure |
391 computations is. According to the SSA-format, we also have to ensure |
366 that all intermediary computations are given (fresh) names---we will |
392 that all intermediary computations are given (fresh) names---we will |
367 use an (ugly) counter for this. |
393 use an (ugly) counter for this. |
368 |
394 |
369 To sum up, K-values are the atomic operations that can be on the |
395 To sum up, K-values are the atomic operations that can be on the |
370 right-hand side of assignemnts. The K-language is restricted such that |
396 right-hand side of assignemnts. The K-language is restricted such that |
371 it is easy to generate the SSA-format for the LLVM-IR. What remains to |
397 it is easy to generate the SSA-format for the LLVM-IR. What remains to |
372 be done is a translation of our Fun-language into the K-language. The |
398 be done is a translation of our Fun-language into the K-language. The |
373 main difficulty is that we need to order the computation---in the |
399 main difficulty is that we need to order the computation---in the |
374 K-language we only have linear sequence of instructions. Before we |
400 K-language we only have linear sequences of instructions. Before we |
375 explain this, we have a look at some programs in CPS-style. |
401 explain this, we have a look at some programs in CPS-style. |
376 |
402 |
377 |
403 |
378 |
404 |
379 |
405 |
477 |
503 |
478 \section*{Worked Example} |
504 \section*{Worked Example} |
479 |
505 |
480 Let us now come back to the CPS-translations for the Fun-language. |
506 Let us now come back to the CPS-translations for the Fun-language. |
481 Though we will start with a simpler subset containing only numbers, |
507 Though we will start with a simpler subset containing only numbers, |
482 arithmetic expressions and function calls. The main difficulty of |
508 arithmetic expressions and function calls---no variables for the |
483 generating instructions in SSA-format is that large compound |
509 moment. The main difficulty of generating instructions in SSA-format |
484 expressions need to be broken up into smaller pieces and intermediate |
510 is that large compound expressions need to be broken up into smaller |
485 results need to be chained into later instructions. To do this |
511 pieces and intermediate results need to be chained into later |
486 conveniently, we use the CPS-translation mechanism. What continuations |
512 instructions. To do this conveniently, we use the CPS-translation |
487 essentially solve is the following problem: Given an expressions |
513 mechanism. What the continuations essentially solve is the following |
|
514 problem: Given an expressions |
488 |
515 |
489 \begin{equation}\label{exp} |
516 \begin{equation}\label{exp} |
490 (1 + 2) * (3 + 4) |
517 (1 + 2) * (3 + 4) |
491 \end{equation} |
518 \end{equation} |
492 |
519 |
493 \noindent |
520 \noindent |
494 which of the subexpressions should be calculated first? We just |
521 which of the subexpressions should be calculated first? We are going |
495 arbitrarily going to decide that the calculation will be from left to |
522 arbitrarily to decide that the calculation will be from left to |
496 right. Other languages make different choices---C famously is right to |
523 right. Other languages make different choices---C famously is right to |
497 left. In our case this means we have to tear the expression shown in |
524 left. In our case this means we have to tear the expression shown in |
498 \eqref{exp} apart as follows: |
525 \eqref{exp} apart as follows: |
499 |
526 |
500 \[ |
527 \[ |
567 \noindent |
594 \noindent |
568 where \code{k} is the continuation and \code{e} is the expression to |
595 where \code{k} is the continuation and \code{e} is the expression to |
569 be compiled. The result of the function is a K-expression, which later |
596 be compiled. The result of the function is a K-expression, which later |
570 can be compiled into LLVM-IR code. |
597 can be compiled into LLVM-IR code. |
571 |
598 |
572 In case we have numbers, then we can just pass them in CPS-translation |
599 In case we have numbers, then we can just pass them in the CPS-translation |
573 to the continuations because numbers need not be further teared apart |
600 to the continuations because numbers need not be further teared apart |
574 as they are already primitive. Passing the number to the continuation |
601 as they are already primitive. Passing the number to the continuation |
575 means we apply the continuation like |
602 means we apply the continuation like |
576 |
603 |
577 \begin{lstlisting}[language=Scala,numbers=none] |
604 \begin{lstlisting}[language=Scala,numbers=none] |
593 \end{lstlisting} |
620 \end{lstlisting} |
594 |
621 |
595 \noindent |
622 \noindent |
596 What we essentially have to do in this case is the following: compile |
623 What we essentially have to do in this case is the following: compile |
597 the subexpressions \texttt{e1} and \texttt{e2}. They will produce some |
624 the subexpressions \texttt{e1} and \texttt{e2}. They will produce some |
598 result that is stored in two temporary variables (assuming they are more |
625 result that is stored in two temporary variables (assuming \texttt{e1} and \texttt{e2} are more |
599 complicated than just numbers). We need to use these two temporary |
626 complicated than just numbers). We need to use these two temporary |
600 variables and feed them into a new assignment of the form |
627 variables and feed them into a new assignment of the form |
601 |
628 |
602 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}] |
629 \begin{lstlisting}[language=LLVMIR,numbers=none,escapeinside={(*@}{@*)}] |
603 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in |
630 let z = op (*@$\Box_\texttt{r1}$@*) (*@$\Box_\texttt{r2}$@*) in |
619 as type for the continuation. Once we created the assignment with the |
646 as type for the continuation. Once we created the assignment with the |
620 fresh temporary variable \texttt{z}, we need to ``communicate'' that |
647 fresh temporary variable \texttt{z}, we need to ``communicate'' that |
621 the result of the computation of the arithmetic expression is stored |
648 the result of the computation of the arithmetic expression is stored |
622 in \texttt{z} to the computations that follow. In this way we apply |
649 in \texttt{z} to the computations that follow. In this way we apply |
623 the continuation \texttt{k} with this new variable (essentially we are |
650 the continuation \texttt{k} with this new variable (essentially we are |
624 plugging in a hole further down the line). Hope this makes sense. |
651 plugging in a hole further down the line). Hope this makes sense!? If not, |
|
652 play with the given code yourself. |
625 |
653 |
626 The last case we need to consider in our small expression language are |
654 The last case we need to consider in our small expression language are |
627 function calls. For them remember each argument of the function |
655 function calls. For them remember each argument of the function |
628 call can in SSA-format only be a variable or a number. |
656 call can in SSA-format only be a variable or a number. Here is the |
|
657 complete code for this case: |
629 |
658 |
630 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm] |
659 \begin{lstlisting}[language=Scala,numbers=left,xleftmargin=0mm] |
631 case Call(fname, args) => { |
660 case Call(fname, args) => { |
632 def aux(args: List[Expr], vs: List[KVal]): KExp = args match { |
661 def aux(args: List[Expr], vs: List[KVal]): KExp = args match { |
633 case Nil => { |
662 case Nil => { |
639 aux(args, Nil) |
668 aux(args, Nil) |
640 } |
669 } |
641 \end{lstlisting} |
670 \end{lstlisting} |
642 |
671 |
643 \noindent |
672 \noindent |
644 For this case we introduce an auxiliary function that compiles first all |
673 As can be seen, we introduce an auxiliary function that compiles first all |
645 function arguments---remember in our source language we can have calls |
674 function arguments---remember in our source language we can have calls |
646 such as $foo(3 + 4, g(3))$ where we first have to create temporary |
675 such as $foo(3 + 4, g(3))$ where we first have to create temporary |
647 variables (and computations) for each argument. Therefore \texttt{aux} |
676 variables (and computations) for each argument. Therefore \texttt{aux} |
648 analyses the argument list from left to right. In case there is an |
677 analyses the argument list from left to right. In case there is an |
649 argument \texttt{a} on the front of the list (the case \texttt{a::as} |
678 argument \texttt{a} on the front of the list (the case \texttt{a::as} |
650 in Line 7), we call CPS recursively for the corresponding |
679 in Line 7), we call CPS recursively for the corresponding |
651 subexpression. The temporary variable containing the result for this |
680 subexpression. The temporary variable containing the result for this |
652 argument we add to the end of the K-values we have already analysed |
681 argument, we add to the end of the K-values we have already analysed |
653 before. Again very conveniently we can use the recursive call to |
682 before. Again very conveniently we can use the recursive call to |
654 \texttt{aux} as the continuation for the computations that |
683 \texttt{aux} as the continuation for the computations that |
655 follow. When we reach the end of the argument list (the |
684 follow. When we reach the end of the argument list (the |
656 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values |
685 \texttt{Nil}-case in Lines 3--6), then we collect all the K-values |
657 \texttt{v1} to \texttt{vn} and call the function in the K-language |
686 \texttt{v1} to \texttt{vn} and call the function in the K-language |
701 CPS-translation you can find in the code. |
730 CPS-translation you can find in the code. |
702 |
731 |
703 \section*{Next Steps} |
732 \section*{Next Steps} |
704 |
733 |
705 Having obtained a K-expression, it is relatively straightforward to |
734 Having obtained a K-expression, it is relatively straightforward to |
706 generate a valid program for the LLVM-IR. We leave this to the |
735 generate a valid program for the LLVM-IR---remember the K-language |
707 attentive reader. What else can we do? Well it should be relatively |
736 already enforces the SSA convention of a linear sequence of primitive |
708 easy to apply some common optimisations to the K-expressions. One |
737 instructions involving only unique temporary variables. |
709 optimisations is called constant folding---for example if we have an |
738 We leave this step to the attentive reader. |
710 expression $3 + 4$ then we can replace it by just $5$ instead of |
739 |
711 generating code to compute $5$. Now this information needs to be |
740 What else can we do? Well it should be relatively easy to apply some |
712 propagated to the next computation step to see whether any further |
741 common optimisations to the K-expressions. One optimisations is called |
713 constant foldings are possible. Another useful technique is common |
742 constant folding---for example if we have an expression $3 + 4$ then |
714 subexpression elimination, meaning if you have twice a calculation of |
743 we can replace it by just $5$ instead of generating code to compute |
715 a function $foo(a)$, then we want to call it only once and store the |
744 $5$. Now this information needs to be propagated to the next |
716 result in a temporary variable that can be used instead of the second, |
745 computation step to see whether any further constant foldings are |
717 or third, call to $foo(a)$. Again I leave this to the attentive reader |
746 possible. Another useful technique is common subexpression |
718 to work out and implement. |
747 elimination, meaning if you have twice a calculation of a function |
|
748 $foo(a)$, then we want to call it only once and store the result in a |
|
749 temporary variable that can be used instead of the second, or third, |
|
750 call to $foo(a)$. Again I leave this to the attentive reader to work |
|
751 out and implement. |
719 |
752 |
720 |
753 |
721 \begin{figure}[p]\small |
754 \begin{figure}[p]\small |
722 \begin{lstlisting}[language=Scala,numbers=none] |
755 \begin{lstlisting}[language=Scala,numbers=none] |
723 // Fun language (expressions) |
756 // Fun language (expressions) |