afl-material: comparison handouts/ho09.tex

equal deleted inserted replaced

-:ff3b48da282c
+:8fc109f36b78
 \section*{Handout 9 (LLVM, SSA and CPS)}
 Reflecting on our tiny compiler targetting the JVM, the code generation
 part was actually not so hard, no? Pretty much just some post-traversal
-of the abstract syntax tree, yes? One of the main reason for this ease is
+of the abstract syntax tree, yes? One of the main reason for this ease
-that the JVM is a stack-based virtual machine and it is therefore not
+is that the JVM is a stack-based virtual machine and it is therefore not
 hard to translate arithmetic expressions into a sequence of instructions
 manipulating the stack. The problem is that ``real'' CPUs, although
 supporting stack operations, are not really designed to be \emph{stack
 machines}.  The design of CPUs is more like, here is a chunk of
 memory---compiler, or better compiler writers, do something with it.
-Consequently, modern compilers need to go the extra mile in order to generate
+Consequently, modern compilers need to go the extra mile in order to
-code that is much easier and faster to process by CPUs.
+generate code that is much easier and faster to process by CPUs. To make
+this all tractable for this module, we target the LLVM Intermediate
+Language. In this way we can take advantage of the tools coming with
-Another reason why it makes sense to go the extra mile is that stack
+LLVM. For example we do not have to worry about things like register
-instructions are very difficult to optimise---you cannot just re-arrange
+allocations.\bigskip
-instructions without messing about with what is calculated on the stack.
-Also it is hard to find out if all the calculations on the stack are
-actually necessary and not by chance dead code. The JVM has for all this
-sophisticated machinery to make such ``high-level'' code still run fast,
-but let's say that for the sake of argument we do not want to rely on
-it. We want to generate fast code ourselves. This means we have to work
-around the intricacies of what instructions CPUs can actually process.
-To make this all tractable for this module, we target the LLVM
-Intermediate Language. In this way we can take advantage of the tools
-coming with LLVM. For example we do not have to worry about things like
-register allocations.\bigskip
 \noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
 that projects from Academia can make a difference in the world. LLVM
 started in 2000 as a project by two researchers at the  University of
 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
 result of type \texttt{i32}. Each arithmetic operation, like addition or
 multiplication, are also prefixed with the type they operate on.
 Obviously these types need to match up\ldots{} but since we have in our
 programs only integers, \texttt{i32} everywhere will do.
-Conveniently, you can use the program \texttt{lli}, which comes with LLVM, to interprete
+Conveniently, you can use the program \texttt{lli}, which comes with
-programs written in the LLVM-IR. So you can easily check whether the
+LLVM, to interpret programs written in the LLVM-IR. So you can easily
-code you produced actually works. To get a running program that does
+check whether the code you produced actually works. To get a running
-something interesting you need to add some boilerplate about printing out
+program that does something interesting you need to add some boilerplate
-numbers and a main-function that is the entrypoint for the program (see Figure~\ref{lli}).
+about printing out numbers and a main-function that is the entrypoint
+for the program (see Figure~\ref{lli}). You can generate a binary for
-\begin{figure}[t]
+this program using \texttt{llc}-compiler in order to generate an object
-\lstinputlisting[language=LLVM]{../progs/sqr.ll}
+file and then use gcc (clang) for generating a binary:
-\caption{\label{lli}}
+\begin{lstlisting}[language=bash,numbers=none]
+llc -filetype=obj sqr.ll
+gcc sqr.o -o a.out
+./a.out
+\end{lstlisting}
+\begin{figure}[t]\small
+\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
+\caption{An LLVM-IR program for calculating the square function. The
+interesting function is \texttt{sqr} in Lines 13 -- 16. The main
+function calls \texttt{sqr} and prints out the result. The other
+code is boilerplate for printing out integers.\label{lli}}
 \end{figure}
-\begin{figure}[t]
+\subsection*{Our Own Intermediate Language}
+Remember compilers have to solve the problem of bridging the gap between
+``high-level'' programs and ``low-level'' hardware. If the gap is tool
+wide then a good strategy is to lay a stepping stone somewhere in
+between. The LLVM-IR itself is such a stepping stone to make the task of
+generating code easier. Like a real compiler we will use another
+stepping stone which I call \emph{K-language}. For this remember
+expressions (and boolean expressions) in the Fun language are given by
+the code on top of Figure~\ref{absfun}
+\begin{figure}[p]\small
 \begin{lstlisting}[language=Scala,numbers=none]
-abstract class Exp extends Serializable
+// Fun-language (expressions)
-abstract class BExp extends Serializable
+abstract class Exp
-abstract class Decl extends Serializable
+abstract class BExp
-case class Main(e: Exp) extends Decl
-case class Def(name: String, args: List[String], body: Exp)
-extends Decl
 case class Call(name: String, args: List[Exp]) extends Exp
 case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
 case class Write(e: Exp) extends Exp
 case class Var(s: String) extends Exp
 case class Num(i: Int) extends Exp
 case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
 case class Sequence(e1: Exp, e2: Exp) extends Exp
 case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
+// K-language (K-expressions, K-values)
+abstract class KExp
+abstract class KVal
+case class KVar(s: String) extends KVal
+case class KNum(i: Int) extends KVal
+case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
+case class KCall(o: String, vrs: List[KVal]) extends KVal
+case class KWrite(v: KVal) extends KVal
+case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
+case class KLet(x: String, e1: KVal, e2: KExp) extends KExp
+case class KReturn(v: KVal) extends KExp
 \end{lstlisting}
 \caption{Abstract syntax trees for the Fun language.\label{absfun}}
 \end{figure}
 \subsection*{CPS-Translations}
+Another reason why it makes sense to go the extra mile is that stack
+instructions are very difficult to optimise---you cannot just re-arrange
+instructions without messing about with what is calculated on the stack.
+Also it is hard to find out if all the calculations on the stack are
+actually necessary and not by chance dead code. The JVM has for all this
+sophisticated machinery to make such ``high-level'' code still run fast,
+but let's say that for the sake of argument we do not want to rely on
+it. We want to generate fast code ourselves. This means we have to work
+around the intricacies of what instructions CPUs can actually process.
 \end{document}
 %%% Local Variables:

changeset 679	8fc109f36b78
parent 678	ff3b48da282c
child 680	eecc4d5a2172