--- a/handouts/ho07.tex Fri Oct 25 14:55:31 2019 +0100
+++ b/handouts/ho07.tex Fri Oct 25 17:23:23 2019 +0100
@@ -7,19 +7,20 @@
\begin{document}
+\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
\section*{Handout 7 (Compilation)}
-The purpose of a compiler is to transform a program, a human
-can read and write, into code the machine can run as fast as
-possible. The fastest code would be machine code the CPU can
-run directly, but it is often enough to improve the speed of a
-program by just targeting a virtual machine. This produces not
-the fastest possible code, but code that is fast enough and
-has the advantage that the virtual machine takes care of
-things a compiler would normally need to take care of (like
-explicit memory management). Why study compilers? John Regher
-gives this answer in his compiler blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
+The purpose of a compiler is to transform a program a human can read and
+write into code the machine can run as fast as possible. The fastest
+code would be machine code the CPU can run directly, but it is often
+good enough for improving the speed of a program to just target a
+virtual machine. This produces not the fastest possible code, but code
+that is pretty enough. This way of producing code has the advantage that
+the virtual machine takes care of things a compiler would normally need
+to take care of (like explicit memory management). An interesting answer
+for why studying compilers is given by John Regher in his compiler
+blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
\begin{quote}\it{}``We can start off with a couple of observations
about the role of compilers. First, hardware is getting weirder
@@ -41,19 +42,19 @@
As a first example in this module we will implement a compiler for the
very simple While-language. It will generate code for the Java Virtual
-Machine (JVM). This is a stack-based virtual machine, a fact which
-will make it easy to generate code for arithmetic expressions. For
-example for generating code for the expression $1 + 2$ we need to
-generate the following three instructions
+Machine (JVM). This is a stack-based virtual machine, a fact which will
+make it easy to generate code for arithmetic expressions. For example
+when compiling $1 + 2$ we need to generate the following three
+instructions
-\begin{lstlisting}[numbers=none]
+\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd
\end{lstlisting}
\noindent The first instruction loads the constant $1$ onto
-the stack, the next one $2$, the third instruction adds both
+the stack, the next one loads $2$, the third instruction adds both
numbers together replacing the top two elements of the stack
with the result $3$. For simplicity, we will throughout
consider only integer numbers and results. Therefore we can
@@ -80,11 +81,10 @@
\noindent where \meta{Id} stands for variables and \meta{Num}
-for numbers. For the moment let us omit variables from
-arithmetic expressions. Our parser will take this grammar and
-given an input produce abstract syntax trees. For example for
-the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
-following tree.
+for numbers. For the moment let us omit variables from arithmetic
+expressions. Our parser will take this grammar and given an input
+produce abstract syntax trees. For example we will obtain for the
+expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
\begin{center}
\begin{tikzpicture}
@@ -92,13 +92,13 @@
\end{tikzpicture}
\end{center}
-\noindent To generate code for this expression, we need to
+\noindent To generate JVM code for this expression, we need to
traverse this tree in post-order fashion and emit code for
each node---this traversal in post-order fashion will produce
code for a stack-machine (what the JVM is). Doing so for the
tree above generates the instructions
-\begin{lstlisting}[numbers=none]
+\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
ldc 3
@@ -110,19 +110,17 @@
iadd
\end{lstlisting}
-\noindent If we ``run'' these instructions, the result $8$
-will be on top of the stack (I leave this to you to verify;
-the meaning of each instruction should be clear). The result
-being on the top of the stack will be a convention we always
-observe in our compiler, that is the results of arithmetic
-expressions will always be on top of the stack. Note, that a
-different bracketing of the expression, for example $(1 + (2 *
-3)) + (4 - 3)$, produces a different abstract syntax tree and
-thus potentially also a different list of instructions.
-Generating code in this fashion is rather easy to implement:
-it can be done with the following recursive
-\textit{compile}-function, which takes the abstract syntax
-tree as argument:
+\noindent If we ``run'' these instructions, the result $8$ will be on
+top of the stack (I leave this to you to verify; the meaning of each
+instruction should be clear). The result being on the top of the stack
+will be a convention we always observe in our compiler, that is the
+results of arithmetic expressions will always be on top of the stack.
+Note, that a different bracketing of the expression, for example $(1 +
+(2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus
+also a different list of instructions. Generating code in this
+post-order-traversal fashion is rather easy to implement: it can be done
+with the following recursive \textit{compile}-function, which takes the
+abstract syntax tree as argument:
\begin{center}
\begin{tabular}{lcl}
@@ -144,7 +142,7 @@
to memory cells, containing in our case only integers. Looking
up a variable can be done with the instruction
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $index$
\end{lstlisting}
@@ -153,7 +151,7 @@
the stack. Storing the top of the stack into a local variable
can be done by the instruction
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
istore $index$
\end{lstlisting}
@@ -201,7 +199,7 @@
\end{tabular}
\end{center}
-\noindent $[]$ is the empty list of instructions. Note that
+\noindent whereby $[]$ is the empty list of instructions. Note that
the \textit{compile}-function for statements returns a pair, a
list of instructions (in this case the empty list) and an
environment for variables. The reason for the environment is
@@ -233,7 +231,7 @@
That means for the assignment $x := x + 1$ we generate the
following code
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
@@ -241,18 +239,31 @@
\end{lstlisting}
\noindent
-where $n_x$ is the index for the variable $x$.
+where $n_x$ is the index for the variable $x$. The code for
+looking-up the index for the variable is as follow:
-More complicated is the code for \pcode{if}-statements, say
+\begin{center}
+\begin{tabular}{lcl}
+$index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$
+\end{tabular}
+\end{center}
+
+\noindent
+In case the environment $E$ contains an index for $x$, we return it.
+Otherwise we ``create'' a new index by returning the size $|E|$ of the
+environment (that will be an index that is guaranteed to be not used
+yet).
+
+More complicated is the generation of code for \pcode{if}-statements, say
\begin{lstlisting}[mathescape,language={},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}
-\noindent where $b$ is a boolean expression and the $cs_{1/2}$
-are the statements for each \pcode{if}-branch. Lets assume
-we already generated code for $b$ and $cs_{1/2}$. Then in the
-true-case the control-flow of the program needs to be
+\noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the
+statements for each of the \pcode{if}-branches. Lets assume we already
+generated code for $b$ and $cs_{1/2}$. Then in the true-case the
+control-flow of the program needs to be
\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,
@@ -419,7 +430,7 @@
\noindent
The generated code is as follows:
-\begin{lstlisting}[mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,language={}]
ldc 1
ldc 1
if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
@@ -528,7 +539,7 @@
\noindent yielding the following code
-\begin{lstlisting}[mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,language={}]
L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
iload 0
ldc 10