\fnote{\copyright{} Christian Urban, King's College London, 2019}

\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our tiny compiler targeting the JVM, the code generation
part was actually not so hard, no? Pretty much just a post-order
traversal of the abstract syntax tree, yes? One of the main reasons for
this ease is that the JVM is a stack-based virtual machine and it is
therefore not hard to translate arithmetic expressions into a sequence
of instructions manipulating the stack. The problem is that ``real''
CPUs, although supporting stack operations, are not really designed to
be \emph{stack machines}. The design of CPUs is more like: here is a
chunk of memory---compiler, or better compiler writer, do something
with it. Consequently, modern compilers need to go the extra mile in
order to generate code that is much easier and faster for CPUs to
process.

Another reason why it makes sense to go the extra mile is that stack
instructions are very difficult to optimise---you cannot just re-arrange
instructions without messing about with what is calculated on the stack.
Also it is hard to find out if all the calculations on the stack are
actually necessary and not by chance dead code. The JVM has
sophisticated machinery to make such ``high-level'' code run fast
nevertheless, but let's say that for the sake of argument we do not want
to rely on it. We want to generate fast code ourselves. This means we
have to work around the intricacies of what instructions CPUs can
actually process. To make this all tractable for this module, we target
the LLVM Intermediate Language. In this way we can take advantage of the
tools coming with LLVM. For example we do not have to worry about things
like register allocation.\bigskip

\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
that projects from academia can make a difference in the world. LLVM
started in 2000 as a project by two researchers at the University of
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
gcc with its myriad of front-ends for other languages (e.g.~Fortran,
Ada, Go, Objective-C, Pascal etc.). The problem was that gcc morphed
over time into a monolithic gigantic piece of m\ldots ehm software,
which you could not mess about with in an afternoon. In contrast, LLVM
is designed to be a modular suite of tools with which you can play
around easily and try out something new. LLVM became a big player once
Apple hired one of the original developers (I cannot remember the reason
why Apple did not want to use gcc, but maybe they were just disgusted by
its big monolithic codebase). Anyway, LLVM is now the big player and gcc
is more or less legacy. This does not mean that programming languages
like C and C++ are dying out any time soon---they are nicely supported
by LLVM.

Targeting the LLVM Intermediate Language, or Intermediate
Representation (short LLVM-IR), also means we can profit from the very
modular structure of the LLVM compiler and let the compiler, for
example, generate code for X86, ARM and so on. That means we can be
agnostic about where our code actually runs. However, what we have to do
is to generate code in \emph{Static Single-Assignment} format (short
SSA), because that is what the LLVM-IR expects from us. The LLVM-IR is
the intermediate format that LLVM uses for doing cool things, like
targeting strange architectures, optimising code and allocating memory
efficiently.

The idea behind the SSA format is to use very simple variable
assignments where every variable is assigned only once. The assignments
also need to be primitive in the sense that they can be just simple
operations like addition, multiplication, jumps, comparisons and so on.
An idealised snippet of a program in SSA is

\begin{lstlisting}[language=LLVM,numbers=none]
x := 1
y := 2
z := x + y
\end{lstlisting}

\noindent where every variable is assigned only once (we could not, for
example, write \texttt{x := x + y} in the last line, because
\texttt{x} would then be assigned twice). There are sophisticated
algorithms for imperative languages, like C, that efficiently transform
a high-level program into SSA format. But we can ignore them here. We
want to compile a functional language and there things get much more
interesting than just sophisticated. We will need to have a look at CPS
translations, where CPS stands for
Continuation-Passing-Style---basically black programming art or
abracadabra programming. So sit tight.

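To get a first feel for what generating SSA involves, here is a small
hypothetical Scala sketch (the names \texttt{Num}, \texttt{Aop},
\texttt{fresh} and \texttt{ssa} are made up for this illustration and
are not part of our compiler): it flattens arithmetic expressions into
SSA-style assignments by inventing a fresh temporary variable for every
intermediate result.

\begin{lstlisting}[language=Scala,numbers=none]
// hypothetical mini-AST for arithmetic expressions
sealed trait Expr
case class Num(i: Int) extends Expr
case class Aop(o: String, a1: Expr, a2: Expr) extends Expr

// generate fresh temporary variables for intermediate results
var counter = -1
def fresh() : String = { counter += 1; s"tmp$counter" }

// returns the variable holding the result together with the
// SSA-style assignments that compute it
def ssa(e: Expr) : (String, List[String]) = e match {
  case Num(i) =>
    val t = fresh(); (t, List(s"$t := $i"))
  case Aop(o, a1, a2) =>
    val (t1, as1) = ssa(a1)
    val (t2, as2) = ssa(a2)
    val t = fresh()
    (t, as1 ::: as2 ::: List(s"$t := $t1 $o $t2"))
}
\end{lstlisting}

\noindent For instance \texttt{ssa(Aop("+", Num(1), Num(2)))} returns
\texttt{tmp2} together with the assignments \texttt{tmp0 := 1},
\texttt{tmp1 := 2} and \texttt{tmp2 := tmp0 + tmp1}---every temporary
is assigned exactly once.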
\subsection*{LLVM-IR}

Before we start, let's first have a look at the \emph{LLVM Intermediate
Representation}. What is good about our simple Fun language is that it
basically only contains expressions (be they arithmetic expressions or
boolean expressions). The exception is function definitions. Luckily,
for them we can use the mechanism of defining functions in LLVM-IR. For
example the simple Fun program

\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled into the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}
\end{lstlisting}

|
\noindent The first thing to notice is that all variable names in the
LLVM-IR are prefixed by \texttt{\%}; function names need to be prefixed
with \texttt{@}. Also, the LLVM-IR is a fully typed language. The
\texttt{i32} type stands for 32-bit integers. There are also types for
64-bit integers (\texttt{i64}), chars (\texttt{i8}), floats, arrays and
even pointer types. In the code above, \texttt{sqr} takes an argument of
type \texttt{i32} and produces a result of type \texttt{i32}. Each
arithmetic operation, like addition or multiplication, is also prefixed
with the type it operates on. Obviously these types need to match
up\ldots{} but since we have in our programs only integers, \texttt{i32}
everywhere will do.

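To illustrate, here is a hypothetical snippet where the variables
\texttt{\%a} and \texttt{\%b} are assumed to already hold \texttt{i32}
values:

\begin{lstlisting}[language=LLVM,numbers=none]
%r = add i32 %a, %b    ; 32-bit addition
%s = mul i32 %r, %r    ; operands and result must all be i32
\end{lstlisting}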
|
Conveniently, you can use the program \texttt{lli}, which comes with
LLVM, to interpret programs written in the LLVM-IR. So you can easily
check whether the code you produced actually works. To get a running
program that does something interesting, you need to add some
boilerplate about printing out numbers and a main-function that is the
entry point for the program (see Figure~\ref{lli}).

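For example, assuming the complete program is saved in a file
\texttt{sqr.ll}, it can be interpreted directly with

\begin{lstlisting}[numbers=none]
$ lli sqr.ll
\end{lstlisting}

\noindent No compilation to native code is needed for such a quick
check (if you do want native code, the LLVM tool \texttt{llc} compiles
the same file to assembly).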
|
\begin{figure}[t]
\lstinputlisting[language=LLVM]{../progs/sqr.ll}
\caption{An LLVM-IR program for calculating the square function. It
  includes the boilerplate for printing out numbers and a
  main-function.\label{lli}}
\end{figure}

|
\begin{figure}[t]
\begin{lstlisting}[language=Scala,numbers=none]
abstract class Exp extends Serializable
abstract class BExp extends Serializable
abstract class Decl extends Serializable

case class Main(e: Exp) extends Decl
case class Def(name: String, args: List[String], body: Exp)
                                                   extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Write(e: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}
\caption{Abstract syntax trees for the Fun language.\label{absfun}}
\end{figure}
|

\subsection*{CPS-Translations}


\end{document}

%%% Local Variables:
%%% mode: latex