% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../graphics}
\usepackage{../grammar}
%%\usepackage{multicol}

%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2019}


\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our tiny compiler targeting the JVM, the code generation
part was actually not so hard, no? Pretty much just a post-order traversal
of the abstract syntax tree, yes? One of the main reasons for this ease is
that the JVM is a stack-based virtual machine and it is therefore not
hard to translate arithmetic expressions into a sequence of instructions
manipulating the stack. The problem is that ``real'' CPUs, although
supporting stack operations, are not really designed to be \emph{stack
machines}. The design of CPUs is more like: here is a chunk of
memory---compiler, or better compiler writers, do something with it.
Consequently, modern compilers need to go the extra mile in order to generate
code that CPUs can process much more easily and quickly.


Another reason why it makes sense to go the extra mile is that stack
instructions are very difficult to optimise---you cannot just re-arrange
instructions without messing about with what is calculated on the stack.
It is also hard to find out whether all the calculations on the stack are
actually necessary and not by chance dead code. For all this the JVM has
sophisticated machinery to make such ``high-level'' code still run fast,
but let's say that for the sake of argument we do not want to rely on
it. We want to generate fast code ourselves. This means we have to work
around the intricacies of what instructions CPUs can actually process.
To make this all tractable for this module, we target the LLVM
Intermediate Language. In this way we can take advantage of the tools
coming with LLVM. For example, we do not have to worry about things like
register allocation.\bigskip

\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
that projects from Academia can make a difference in the world. LLVM
started in 2000 as a project by two researchers at the University of
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
gcc with its myriad of front-ends for other languages (e.g.~Fortran,
Ada, Go, Objective-C, Pascal etc.). The problem was that gcc morphed over
time into a monolithic gigantic piece of m\ldots ehm software, which you
could not mess about with in an afternoon. In contrast, LLVM is designed to
be a modular suite of tools with which you can play around easily and
try out something new. LLVM became a big player once Apple hired one of
the original developers (I cannot remember the reason why Apple did not
want to use gcc, but maybe they were also just disgusted by its big
monolithic codebase). Anyway, LLVM is now the big player and gcc is more
or less legacy. This does not mean that programming languages like C and
C++ are dying out any time soon---they are nicely supported by LLVM.

Targeting the LLVM Intermediate Language, or Intermediate
Representation (LLVM-IR for short), also means we can profit from the very
modular structure of the LLVM compiler and, for example, let the compiler
generate code for X86, ARM etc. That means we can be agnostic about
where our code actually runs. However, what we have to do is to generate
code in \emph{Static Single-Assignment} format (SSA for short), because that
is what the LLVM-IR expects from us. LLVM-IR is the intermediate format
that LLVM uses for doing cool things, like targeting strange
architectures, optimising code and allocating memory efficiently.

The idea behind the SSA format is to use very simple variable
assignments where every variable is assigned only once. The assignments
also need to be primitive in the sense that they can only be simple
operations like addition, multiplication, jumps, comparisons and so on.
An idealised snippet of a program in SSA is

\begin{lstlisting}[language=LLVM,numbers=none]
x := 1
y := 2
z := x + y
\end{lstlisting}

\noindent where every variable is assigned only once (we could not write
\texttt{x := x + y} in the last line for example). There are
sophisticated algorithms for imperative languages, like C, that
efficiently transform a high-level program into SSA format. But we can
ignore them here. We want to compile a functional language and there
things get much more interesting than just sophisticated. We will need
to have a look at CPS translations, where CPS stands for
Continuation-Passing-Style---basically black programming art or
abracadabra programming. So sit tight.
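
\noindent For straight-line code, the renaming behind the single-assignment
idea can be sketched in a few lines of Scala. This is only an
illustration under simplifying assumptions: the class \texttt{Assign}
and the function \texttt{toSSA} are hypothetical names made up for this
sketch, right-hand sides are just lists of tokens, and control flow
(jumps, conditionals) is not handled at all.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical mini-language: an assignment  target := rhs,
// where rhs is a list of tokens such as List("x", "+", "y")
case class Assign(target: String, rhs: List[String])

// Gives every assignment target a fresh versioned name
// (x_1, x_2, ...) and redirects later uses to the newest version.
def toSSA(prog: List[Assign]): List[Assign] = {
  var version = Map[String, Int]()    // versions created per name
  var current = Map[String, String]() // name -> newest SSA name
  for (Assign(t, rhs) <- prog) yield {
    // rename the operands first; constants and operators
    // are not in the map and stay unchanged
    val rhs2 = rhs.map(tok => current.getOrElse(tok, tok))
    val n = version.getOrElse(t, 0) + 1
    version += (t -> n)
    current += (t -> (t + "_" + n))
    Assign(t + "_" + n, rhs2)
  }
}
\end{lstlisting}

\noindent With this sketch, the program \texttt{x := 1; y := 2; x := x + y}
is renamed to \texttt{x\_1 := 1; y\_1 := 2; x\_2 := x\_1 + y\_1}, so the
re-assignment of \texttt{x} disappears.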

\subsection*{LLVM-IR}

Before we start, let's first have a look at the \emph{LLVM Intermediate
Representation}. What is good about our simple Fun language is that it
basically only contains expressions (be they arithmetic expressions or
boolean expressions). The exception is function definitions. Luckily,
for them we can use the mechanism of defining functions in LLVM-IR. For
example the simple Fun program


\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled into the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}
\end{lstlisting}

\noindent The first thing to notice is that all variable names in the LLVM-IR are
prefixed by \texttt{\%}; function names need to be prefixed with \texttt{@}.
Also, the LLVM-IR is a fully typed language. The \texttt{i32} type stands
for a 32-bit integer. There are also types for 64-bit integers, chars
(\texttt{i8}), floats, arrays and even pointer types. In the code above,
\texttt{sqr} takes an argument of type \texttt{i32} and produces a
result of type \texttt{i32}. Each arithmetic operation, like addition or
multiplication, is also annotated with the type it operates on.
Obviously these types need to match up\ldots{} but since we have in our
programs only integers, \texttt{i32} everywhere will do.
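
\noindent To see how such typed instructions might be produced, here is a
minimal sketch in Scala that turns arithmetic expressions into LLVM-IR
text. It is not the CPS-based translation this handout is heading
towards, just a direct traversal that stores every intermediate result
in a fresh temporary. The function \texttt{compile} and the temporary
names \texttt{\%tmp\_1}, \texttt{\%tmp\_2}, \ldots{} are assumptions of
this sketch; the classes are cut-down copies of the ones in
Figure~\ref{absfun}.

\begin{lstlisting}[language=Scala,numbers=none]
// Cut-down copies of the Exp classes from the Fun abstract syntax
abstract class Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp

// Returns the generated instructions together with the operand
// (constant, variable or temporary) that holds the result.
def compile(e: Exp): (List[String], String) = {
  var counter = 0                 // for fresh temporaries
  def go(e: Exp): (List[String], String) = e match {
    case Num(i) => (Nil, i.toString)
    case Var(s) => (Nil, "%" + s)
    case Aop(o, a1, a2) => {
      val (is1, v1) = go(a1)
      val (is2, v2) = go(a2)
      // only +, - and * are covered in this sketch
      val op = o match {
        case "+" => "add"
        case "-" => "sub"
        case "*" => "mul"
      }
      counter += 1
      val tmp = "%tmp_" + counter
      // the type i32 follows the operation name
      val instr = tmp + " = " + op + " i32 " + v1 + ", " + v2
      (is1 ++ is2 ++ List(instr), tmp)
    }
  }
  go(e)
}
\end{lstlisting}

\noindent For \texttt{Aop("*", Var("x"), Var("x"))} this produces the single
instruction \texttt{\%tmp\_1 = mul i32 \%x, \%x}. Since every temporary
is assigned exactly once, the output respects the single-assignment
restriction.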

Conveniently, you can use the program \texttt{lli}, which comes with LLVM, to interpret
programs written in the LLVM-IR. So you can easily check whether the
code you produced actually works. To get a running program that does
something interesting you need to add some boilerplate about printing out
numbers and a main-function that is the entry point for the program (see Figure~\ref{lli}).

\begin{figure}[t]
\lstinputlisting[language=LLVM]{../progs/sqr.ll}
\caption{The \texttt{sqr} function as a complete LLVM-IR program, with the boilerplate for printing numbers and a main-function.\label{lli}}
\end{figure}

\begin{figure}[t]
\begin{lstlisting}[language=Scala,numbers=none]
abstract class Exp extends Serializable
abstract class BExp extends Serializable
abstract class Decl extends Serializable

case class Main(e: Exp) extends Decl
case class Def(name: String, args: List[String], body: Exp)
                                                  extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Write(e: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}
\caption{Abstract syntax trees for the Fun language.\label{absfun}}
\end{figure}



\subsection*{CPS-Translations}


\end{document}


%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:

%%% Revision: changeset 678 (ff3b48da282c), Christian Urban <urbanc@in.tum.de>, Mon, 28 Oct 2019 13:34:03 +0000