% afl-material: handouts/ho09.tex@52263ffd17b9

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../graphics}
\usepackage{../grammar}
%%\usepackage{multicol}

%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2019}


\section*{Handout 9 (LLVM, SSA and CPS)}

Reflecting on our two tiny compilers targeting the JVM, the code
generation part was actually not so hard, no? Pretty much just some
post-order traversal of the abstract syntax tree, yes? One of the
reasons for this ease is that the JVM is a stack-based virtual machine
and it is therefore not hard to translate deeply-nested arithmetic
expressions into a sequence of instructions manipulating the stack. The
problem is that ``real'' CPUs, although supporting stack operations, are
not really designed to be \emph{stack machines}. The design of CPUs is
more like: here is a chunk of memory; compiler, or better compiler
writers, do something with it. Consequently, modern compilers need to go
the extra mile in order to generate code that is much easier and faster
for CPUs to process. To make this all tractable for this module, we
target the LLVM Intermediate Language. In this way we can take advantage
of the tools coming with LLVM. For example, we do not have to worry
about things like register allocation.\bigskip

\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
that projects from academia can make a difference in the world. LLVM
started in 2000 as a project by two researchers at the University of
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
gcc with its myriad of front-ends for other languages (C++, Fortran,
Ada, Go, Objective-C, Pascal etc.). The problem was that gcc morphed
over time into a monolithic gigantic piece of m\ldots ehm software,
which you could not mess about with in an afternoon. In contrast, LLVM
is designed to be a modular suite of tools with which you can play
around easily and try out something new. LLVM became a big player once
Apple hired one of the original developers (I cannot remember the reason
why Apple did not want to use gcc, but maybe they were also just
disgusted by its big monolithic codebase). Anyway, LLVM is now the big
player and gcc is more or less legacy. This does not mean that
programming languages like C and C++ are dying out any time soon; they
are nicely supported by LLVM.

We will target the LLVM Intermediate Language, or LLVM Intermediate
Representation (short LLVM-IR). The LLVM-IR looks very similar to the
assembly language of Jasmin and Krakatau. It will also allow us to
benefit from the modular structure of the LLVM compiler and, for
example, let the compiler generate code for different CPUs, like X86 or
ARM. That means we can be agnostic about where our code actually runs.
We can also be ignorant about optimising code and allocating memory
efficiently.

However, what we have to do for LLVM is to generate code in \emph{Static
Single-Assignment} format (short SSA), because that is what the LLVM-IR
expects from us. A reason why LLVM uses the SSA format, rather than
JVM-like stack instructions, is that stack instructions are difficult to
optimise: you cannot just re-arrange instructions without messing about
with what is calculated on the stack. Also it is hard to find out
whether all the calculations on the stack are actually necessary and not
by chance dead code. The JVM has sophisticated machinery for all these
obstacles to make such ``high-level'' code still run fast, but let's say
that for the sake of argument we do not want to rely on it. We want to
generate fast code ourselves. This means we have to work around the
intricacies of what instructions CPUs can actually process fast. This is
what the SSA format is designed for.


The main idea behind the SSA format is to use very simple variable
assignments where every variable is assigned only once. The assignments
also need to be primitive in the sense that they can be just simple
operations like addition, multiplication, jumps, comparisons and so on.
Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
corresponding code in SSA format is

\begin{lstlisting}[language=LLVMIR,numbers=left]
let tmp0 = add 1 a in
let tmp1 = mul b 5 in
let tmp2 = add 3 tmp1 in
let tmp3 = add tmp0 tmp2 in tmp3
\end{lstlisting}

\noindent where every variable is assigned only once (we could not write
\texttt{tmp1 = add 3 tmp1} in Line 3 for example). There are
sophisticated algorithms for imperative languages, like C, that
efficiently transform a high-level program into SSA format. But we can
ignore them here. We want to compile a functional language and there
things get much more interesting than just sophisticated. We will need
to have a look at CPS translations, where CPS stands for
Continuation-Passing-Style, which is basically black programming art or
abracadabra programming. So sit tight.
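
To get a feeling for what such a transformation has to do, here is a
minimal Scala sketch that flattens nested arithmetic expressions into
the let-form above by a simple post-order traversal. It is not the CPS
translation we are heading for: the datatype \texttt{Arith} and the
function \texttt{flatten} are made up for this illustration and cover
arithmetic expressions only.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical mini-AST for arithmetic only (not the Fun-language AST).
sealed trait Arith
case class Num(n: Int) extends Arith
case class Var(name: String) extends Arith
case class Bin(op: String, l: Arith, r: Arith) extends Arith

// Flattens an expression into single-assignment steps. Returns the
// generated lines, the atom (number or name) holding the result and
// the next unused index for temporary names.
def flatten(e: Arith, next: Int = 0): (List[String], String, Int) =
  e match {
    case Num(n) => (Nil, n.toString, next)
    case Var(x) => (Nil, x, next)
    case Bin(op, l, r) =>
      val (ls, la, n1) = flatten(l, next)
      val (rs, ra, n2) = flatten(r, n1)
      val tmp = s"tmp$n2"   // fresh name, never assigned before
      ((ls ++ rs) :+ s"let $tmp = $op $la $ra in", tmp, n2 + 1)
  }

// ((1 + a) + (3 + (b * 5)))
val example =
  Bin("add", Bin("add", Num(1), Var("a")),
             Bin("add", Num(3), Bin("mul", Var("b"), Num(5))))

val (steps, result, _) = flatten(example)
\end{lstlisting}

\noindent
For the expression from above, \texttt{steps} contains the four
let-lines shown earlier and \texttt{result} is \texttt{tmp3}. Every
temporary gets its name from a counter that only goes up, which is why
no name can be assigned twice.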

\subsection*{LLVM-IR}

Before we start, let's first have a look at the \emph{LLVM Intermediate
Representation} in more detail. The LLVM-IR sits in between the
frontends and backends of the LLVM framework. It allows compilation of
multiple source languages to multiple targets. It is also the place
where most of the target-independent optimisations are performed.

What is good about our toy Fun language is that it basically only
contains expressions (be they arithmetic expressions, boolean
expressions or if-expressions). The exception is function definitions.
Luckily, for them we can use the mechanism of defining functions in the
LLVM-IR (this is similar to using JVM methods for functions in our
earlier compiler). For example the simple Fun program


\begin{lstlisting}[language=Scala,numbers=none]
def sqr(x) = x * x
\end{lstlisting}

\noindent
can be compiled to the following LLVM-IR function:

\begin{lstlisting}[language=LLVM]
define i32 @sqr(i32 %x) {
   %tmp = mul i32 %x, %x
   ret i32 %tmp
}
\end{lstlisting}

\noindent First notice that all variable names, in this case \texttt{x}
and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
Temporary variables can be named with an identifier, such as
\texttt{tmp}, or numbers. Function names, since they are ``global'',
need to be prefixed with an @-symbol. Also, the LLVM-IR is a fully typed
language. The \texttt{i32} type stands for 32-bit integers. There are
also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}),
floats, arrays and even pointer types. In the code above, \texttt{sqr}
takes an argument of type \texttt{i32} and produces a result of type
\texttt{i32} (the result type is in front of the function name, like in
C). Each arithmetic operation, for example addition and multiplication,
is also annotated with the type it operates on. Obviously these types
need to match up\ldots{} but since we have in our programs only
integers, \texttt{i32} everywhere will do. We do not have to generate
any other types, but obviously this is a limitation in our Fun-language.
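
As a rough sketch of how a compiler might produce such text, the
following Scala function assembles the LLVM-IR for a one-argument
function whose body is a single arithmetic instruction. The helper
\texttt{binaryFun} and its parameters are assumptions made for this
illustration; they are not part of our compiler.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical helper: emits an LLVM-IR function with one i32
// parameter whose body applies a single arithmetic instruction
// (for example "mul" or "add") and returns the result.
def binaryFun(name: String, param: String, op: String,
              lhs: String, rhs: String): String =
  s"""define i32 @$name(i32 %$param) {
     |  %tmp = $op i32 $lhs, $rhs
     |  ret i32 %tmp
     |}""".stripMargin

// the sqr function from above
val sqrIR = binaryFun("sqr", "x", "mul", "%x", "%x")
\end{lstlisting}

\noindent
Notice that the generator has to write out \texttt{i32} in three
places: for the result, for the parameter and for the operation.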

There are a few interesting instructions in the LLVM-IR which are quite
different from those of the JVM. Can you remember the kerfuffle we had
to go through with boolean expressions and negating the condition? In
the LLVM-IR, branching on if-conditions is implemented differently:
there is a separate \texttt{br}-instruction as follows:

\begin{lstlisting}[language=LLVM]
br i1 %var, label %if_br, label %else_br
\end{lstlisting}

\noindent
The type \texttt{i1} stands for booleans. If the variable is true, then
this instruction jumps to the if-branch, which needs an explicit label;
otherwise it jumps to the else-branch, again with its own label. This
allows us to keep the meaning of the boolean expression as is. A value
of type boolean is generated in the LLVM-IR by the
\texttt{icmp}-instruction. This instruction is for integers (hence the
\texttt{i}) and takes the comparison operation as argument. For example

\begin{lstlisting}[language=LLVM]
icmp eq  i32 %x, %y     ; for equal
icmp sle i32 %x, %y     ; signed less or equal
icmp slt i32 %x, %y     ; signed less than
icmp ult i32 %x, %y     ; unsigned less than
\end{lstlisting}

\noindent
In some operations, the LLVM-IR distinguishes between signed and
unsigned representations of integers.
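
To see how \texttt{icmp} and \texttt{br} fit together, here is a small
Scala sketch that generates a complete LLVM-IR function with two
branches. The helper \texttt{selectFun} and the function name
\texttt{max} are assumptions for this illustration only; the labels are
the ones used in the \texttt{br}-instruction above.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical helper: emits an LLVM-IR function that compares its
// two i32 arguments with the given icmp operation (for example
// "sle") and returns one operand in each branch.
def selectFun(name: String, cmp: String,
              ifTrue: String, ifFalse: String): String =
  s"""define i32 @$name(i32 %x, i32 %y) {
     |  %cond = icmp $cmp i32 %x, %y
     |  br i1 %cond, label %if_br, label %else_br
     |if_br:
     |  ret i32 $ifTrue
     |else_br:
     |  ret i32 $ifFalse
     |}""".stripMargin

// if x <= y then y else x
val maxIR = selectFun("max", "sle", "%y", "%x")
\end{lstlisting}

\noindent
The comparison is used as written in the source program: there is no
need to negate the condition, because \texttt{br} names both jump
targets explicitly.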

Conveniently, you can use the program \texttt{lli}, which comes with
LLVM, to interpret programs written in the LLVM-IR. So you can easily
check whether the code you produced actually works. To get a running
program that does something interesting you need to add some boilerplate
about printing out numbers and a main-function that is the entry point
for the program (see Figure~\ref{lli} for a complete listing). Again
this is very similar to the boilerplate we needed to add in our JVM
compiler.

You can generate a binary for the program in Figure~\ref{lli} by using
the \texttt{llc}-compiler and then \texttt{gcc}, whereby \texttt{llc}
generates an object file and \texttt{gcc} (which on some systems, such
as macOS, is really clang) generates the executable binary:

\begin{lstlisting}[language=bash,numbers=none]
llc -filetype=obj sqr.ll
gcc sqr.o -o a.out
./a.out
> 25
\end{lstlisting}

\begin{figure}[t]\small
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
\caption{An LLVM-IR program for calculating the square function. It
calls this function in \texttt{@main} with the argument \texttt{5}. The
code for the \texttt{sqr} function is in Lines 13 -- 16. The main
function calls \texttt{sqr} and then prints out the result. The other
code is boilerplate for printing out integers.\label{lli}}
\end{figure}



\subsection*{Our Own Intermediate Language}

Remember that compilers have to solve the problem of bridging the gap
between ``high-level'' programs and ``low-level'' hardware. If the gap
is too wide for one step, then a good strategy is to lay a stepping
stone somewhere in between. The LLVM-IR itself is such a stepping stone
to make the task of generating and optimising code easier. Like a real
compiler, we will use our own stepping stone which I call the
\emph{K-language}. For what follows recall the various kinds of
expressions in the Fun language. For convenience the Scala code of the
corresponding abstract syntax trees is shown at the top of
Figure~\ref{absfun}. Below it is the code for the abstract syntax trees
in the K-language. There are two kinds of syntactic entities, namely
\emph{K-values} and \emph{K-expressions}. The central constructor of the
K-language is \texttt{KLet}. For this recall that arithmetic expressions
such as $((1 + a) + (3 + (b * 5)))$ need to be broken up into smaller
``atomic'' steps, like so

\begin{lstlisting}[language=LLVMIR,numbers=none]
let tmp0 = add 1 a in
let tmp1 = mul b 5 in
let tmp2 = add 3 tmp1 in
let tmp3 = add tmp0 tmp2 in
  tmp3
\end{lstlisting}

\noindent
Here \texttt{tmp3} will contain the result of what the whole expression
stands for. In each individual step we can only perform an ``atomic''
operation, like addition or multiplication of a number and a variable.
We are not allowed to have, for example, an if-condition on the
right-hand side of an equals sign. Such constraints are enforced upon us
because of how the SSA format works in the LLVM-IR. By having
\texttt{KLet} take first a string (standing for an intermediate result)
and second a value, we can fulfil this constraint ``by construction'':
there is no way we could write anything other than a value.
52263ffd17b9 updated Christian Urban <urbanc@in.tum.de> parents: 680 diff changeset	241
52263ffd17b9 updated Christian Urban <urbanc@in.tum.de> parents: 680 diff changeset	242	To sum up, K-values are the atomic operations that can be on the
52263ffd17b9 updated Christian Urban <urbanc@in.tum.de> parents: 680 diff changeset	243	right-hand side of equal-signs. The K-language is restricted such that
52263ffd17b9 updated Christian Urban <urbanc@in.tum.de> parents: 680 diff changeset	244	it is easy to generate the SSA format for the LLVM-IR.
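
As a rough illustration of why this restriction helps, the let-chain
above can be written as a K-expression and turned into LLVM-IR-style
instructions one \texttt{KLet} at a time. The following is only a
sketch under stated assumptions, not the actual code generator: the
helper names \texttt{compileOp}, \texttt{compileVal} and
\texttt{compileExp}, the operator names \texttt{+} and \texttt{*}, and
the fixed integer type \texttt{i32} are all made up for this example.
It repeats a subset of the K-language from Figure~\ref{absfun} so that
it can be run on its own as a Scala script.

\begin{lstlisting}[language=Scala,numbers=none]
// Subset of the K-language (repeated to keep the sketch self-contained)
abstract class KExp
abstract class KVal
case class KVar(s: String) extends KVal
case class KNum(i: Int) extends KVal
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
case class KLet(x: String, v: KVal, e: KExp) extends KExp
case class KReturn(v: KVal) extends KExp

// The let-chain from above written as a K-expression
val chain: KExp =
  KLet("tmp0", Kop("+", KNum(1), KVar("a")),
  KLet("tmp1", Kop("*", KVar("b"), KNum(5)),
  KLet("tmp2", Kop("+", KNum(3), KVar("tmp1")),
  KLet("tmp3", Kop("+", KVar("tmp0"), KVar("tmp2")),
  KReturn(KVar("tmp3"))))))

// Assumption: only + and * occur, and all values have type i32
def compileOp(o: String): String = o match {
  case "+" => "add i32"
  case "*" => "mul i32"
}

// Assumption: the operands of a Kop are numbers or variables
def compileVal(v: KVal): String = v match {
  case KNum(i) => i.toString
  case KVar(s) => "%" + s
  case Kop(o, v1, v2) =>
    compileOp(o) + " " + compileVal(v1) + ", " + compileVal(v2)
}

// Each KLet becomes exactly one assignment in SSA form
def compileExp(e: KExp): List[String] = e match {
  case KLet(x, v, rest) =>
    ("%" + x + " = " + compileVal(v)) :: compileExp(rest)
  case KReturn(v) => List("ret i32 " + compileVal(v))
}

compileExp(chain).foreach(println)
\end{lstlisting}

\noindent
Calling \texttt{compileExp(chain)} produces one instruction per
\texttt{KLet} followed by a final \texttt{ret} instruction. No further
analysis of the expression is needed, because every right-hand side is
already atomic.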

\begin{figure}[p]\small
\begin{lstlisting}[language=Scala,numbers=none]
// Fun-language (expressions)
abstract class Exp
abstract class BExp

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Write(e: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp

// K-language (K-expressions, K-values)
abstract class KExp
abstract class KVal

// K-values: the atomic operations allowed on a right-hand side
case class KVar(s: String) extends KVal
case class KNum(i: Int) extends KVal
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
case class KCall(o: String, vrs: List[KVal]) extends KVal
case class KWrite(v: KVal) extends KVal

// K-expressions: conditionals, let-bindings and returns
case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
case class KLet(x: String, v: KVal, e: KExp) extends KExp
case class KReturn(v: KVal) extends KExp
\end{lstlisting}
\caption{Abstract syntax trees for the Fun-language and the K-language.\label{absfun}}
\end{figure}

\subsection*{CPS-Translations}

The main difficulty of generating instructions in SSA format is that
large compound expressions need to be broken up into smaller pieces, and
intermediate results need to be chained into later instructions. To do
this conveniently, CPS-translations have been developed (CPS stands for
continuation-passing style). They use functions (``continuations'') to
represent what is coming next in a sequence of instructions.
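
To make the idea of a continuation more concrete, here is a minimal
sketch of how such a translation might look for arithmetic expressions
only. It is one possible formulation and not necessarily the one used
in the compiler: the names \texttt{cps}, \texttt{cpsTop},
\texttt{fresh} and \texttt{counter} are hypothetical, and only the
\texttt{Var}, \texttt{Num} and \texttt{Aop} cases of the Fun-language
from Figure~\ref{absfun} are covered. The continuation \texttt{k}
receives the K-value of a subexpression and builds the rest of the
let-chain around it.

\begin{lstlisting}[language=Scala,numbers=none]
// Subsets of the Fun- and K-language (repeated for self-containment)
abstract class Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp

abstract class KExp
abstract class KVal
case class KVar(s: String) extends KVal
case class KNum(i: Int) extends KVal
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
case class KLet(x: String, v: KVal, e: KExp) extends KExp
case class KReturn(v: KVal) extends KExp

// Hypothetical generator for fresh names tmp0, tmp1, ...
var counter = -1
def fresh(prefix: String): String = {
  counter += 1
  prefix + counter.toString
}

// k is the continuation: it says what to do with the value of e
def cps(e: Exp)(k: KVal => KExp): KExp = e match {
  case Var(s) => k(KVar(s))
  case Num(i) => k(KNum(i))
  case Aop(o, e1, e2) =>
    cps(e1)(v1 =>
      cps(e2)(v2 => {
        val z = fresh("tmp")  // name for this intermediate result
        KLet(z, Kop(o, v1, v2), k(KVar(z)))
      }))
}

// The initial continuation just returns the final value
def cpsTop(e: Exp): KExp = cps(e)(v => KReturn(v))

// (1 + a) + (3 + (b * 5))
val example: Exp =
  Aop("+", Aop("+", Num(1), Var("a")),
           Aop("+", Num(3), Aop("*", Var("b"), Num(5))))

println(cpsTop(example))
\end{lstlisting}

\noindent
Starting from a fresh counter, \texttt{cpsTop(example)} yields a chain
of four \texttt{KLet}s that binds \texttt{tmp0} to \texttt{tmp3} and
ends in a \texttt{KReturn} of \texttt{tmp3}, mirroring the let-chain
shown earlier. The nesting of the continuations is what chains each
intermediate result into the later instructions.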

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:

author	Christian Urban <urbanc@in.tum.de>
	Sun, 24 Nov 2019 16:30:34 +0000
changeset 700	52263ffd17b9
parent 680	eecc4d5a2172
child 701	681c36b2af27
permissions	-rw-r--r--