% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../graphics}
\usepackage{../grammar}
%%\usepackage{multicol}

%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2019}

\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our tiny compiler targeting the JVM, code generation was
actually not so hard, no? One of the main reasons for this ease is that
the JVM is a stack-based virtual machine and it is, for example, not
hard to translate arithmetic expressions into instructions manipulating
the stack. The problem is that ``real'' CPUs, although supporting stack
operations, are not really \emph{stack machines}. They are just not
optimised for this way of calculating things. The design of CPUs is more
like, here is a piece of memory---compiler, or compiler writer, do
something with it. Otherwise get lost. So in the name of raw speed,
modern compilers go the extra mile and generate code that is much easier
and faster for CPUs to process.

Another reason why it makes sense to go the extra mile is that stack
instructions are very difficult to optimise---you cannot just re-arrange
instructions without messing about with what is calculated on the stack.
Also it is hard to find out whether all the calculations on the stack are
actually necessary and not by chance dead code. The JVM has for all this
sophisticated machinery to make such ``high-level'' code still run fast,
but let's say that for the sake of argument we do not want to rely on it.
We want to generate fast code ourselves. This means we have to work around
the intricacies of what instructions CPUs can actually process. To make
this all tractable, we target the LLVM Intermediate Language. In this way
we can take advantage of the tools coming with LLVM and, for example, do
not have to worry about things like CPUs having only a limited number of
registers.

LLVM\footnote{\url{http://llvm.org}} is a beautiful example that
projects from academia can make a difference in the world. LLVM was
started in 2000 by two researchers at the University of Illinois at
Urbana–Champaign. At the time the behemoth of compilers was gcc with its
myriad of front-ends for other languages (e.g.~gfortran, Ada, Go,
Objective-C, Pascal etc.). The problem was that gcc morphed over time
into a gigantic monolithic piece of m\ldots ehm software, which you could
not easily mess about with in an afternoon. In contrast, LLVM was a modular
suite of tools with which you could play around easily and try out
something new. LLVM became a big player once Apple hired one of the
original developers (I cannot remember the reason why Apple did not want
to use gcc, but maybe they were also just disgusted by its big monolithic
codebase). Anyway, LLVM is now the big player and gcc is more or less
legacy. This does not mean that programming languages like C and C++ are
dying out any time soon---they are nicely supported by LLVM.

Targeting the LLVM-IR also means we can profit from the very modular
structure of the LLVM compiler and let, for example, the compiler generate
code for X86 or ARM, etc. We can be agnostic about where our code
actually runs. However, what we have to do is to generate code in
\emph{Static Single-Assignment} format (short SSA), because that is what
the LLVM-IR expects from us and which is the intermediate format that
LLVM can use to do cool things (like targeting strange architectures)
and allocate memory efficiently.
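Just to get a first impression of what such code looks like, below is a
small hand-written sketch of how the arithmetic expression
$(1 + 2) * (3 + 4)$ might be rendered in LLVM-IR (the function name
\texttt{@test} and the temporaries \texttt{\%t1} to \texttt{\%t3} are made
up here purely for illustration):

\begin{lstlisting}[language=LLVM,numbers=none]
define i32 @test() {
  %t1 = add i32 1, 2        ; t1 = 1 + 2
  %t2 = add i32 3, 4        ; t2 = 3 + 4
  %t3 = mul i32 %t1, %t2    ; t3 = t1 * t2
  ret i32 %t3
}
\end{lstlisting}

\noindent
Every intermediate result is placed in its own virtual register and each
register is assigned exactly once; LLVM later maps these virtual registers
onto the limited registers of the actual CPU. This single-assignment
discipline is explained next.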
The idea behind SSA is to use very simple variable assignments where
every variable is assigned only once. The assignments also need to be
extremely primitive in the sense that they can be just simple operations
like addition, multiplication, jumps and so on. A typical program in SSA
is

\begin{lstlisting}[language=LLVM,numbers=none]
 x := 1
 y := 2
 z := x + y
\end{lstlisting}

\noindent
where every variable is assigned only once. You cannot, for example, have

\begin{lstlisting}[language=LLVM,numbers=none]
 x := 1
 y := 2
 x := x + y
\end{lstlisting}

\noindent
because in this snippet \texttt{x} is assigned twice. There
are sophisticated algorithms for imperative languages, like C, for
efficiently transforming a program into SSA format, but we do not have
to be concerned about those. We want to compile a functional language
and there things get much more interesting than just sophisticated. We
will need to have a look at CPS translations, which stands for
Continuation-Passing-Style---basically black art. So sit tight.

\subsection*{CPS-Translations}

What is good about our simple fun language is that it basically only
contains expressions (be they arithmetic expressions or boolean
expressions). The only exceptions are function definitions, for which we
can easily use the mechanism of defining functions in LLVM-IR. For
example the simple fun program

\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled into the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}
\end{lstlisting}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: