handouts/ho07.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 29 Nov 2024 18:59:32 +0000
changeset 976 e9eac62928f5
parent 960 c7009356ddd8
permissions -rw-r--r--
updated

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
\usetikzlibrary{calc,shapes,arrows}
\usepackage{framed}
\usepackage[belowskip=7pt,aboveskip=0pt]{caption}




\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020, 2023}

\section*{Handout 7 (Compilation)}

The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
that is often pretty fast. This way of producing code has also the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
management).

As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:

\begin{center}
  \begin{tikzpicture}[scale=1,font=\bf,
                      node/.style={
                      rectangle,rounded corners=3mm,
                      ultra thick,draw=black!50,minimum height=18mm,
                      minimum width=20mm,
                      top color=white,bottom color=black!20}]

  \node (0) at (-3,0) {};
  \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
  \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
  \node (C) at (7.5,0) [node] {JVM};

  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
  \end{tikzpicture}
  \end{center}

\noindent
The input will be WHILE-programs; the output will be assembly files
(with the file extension *.j). Assembly files essentially contain
human-readable low-level code, meaning they are not just bits and
bytes, but rather something you can read and understand---with a bit
of practice of course. An \emph{assembler} will then translate the
assembly files into unreadable class- or binary-files the JVM or CPU
can run, i.e.~bits and bytes.  Unfortunately, the Java ecosystem does not come with an
assembler which would be handy for our compiler-endeavour (unlike
Microsoft's Common Language Infrastructure for the .Net platform which
has an assembler out-of-the-box). As a substitute we shall use the
3rd-party program Jasmin, or alternatively Krakatau (Jasmin is the preferred
option---a \texttt{jasmin.jar}-file is available on KEATS):

\begin{itemize}
  \item \url{http://jasmin.sourceforge.net}
  \item \url{https://github.com/Storyyeller/Krakatau}
\end{itemize}

\noindent
The first is a Java program and the second a program written in Python.
Each of them allow us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
us.

What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
expressions. For example when compiling the expression $1 + 2$ we need
to generate the following three instructions

\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd
\end{lstlisting}

\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, the third instruction adds both numbers together
replacing the top two elements of the stack with the result $3$. For
simplicity, we will consider only arithmetic operations involving
integer numbers. This means our main JVM instructions for arithmetic
will be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.
The \code{i} stands for integer instructions in the JVM (alternatives
are \code{d} for doubles, \code{l} for longs and \code{f} for floats
etc).

Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):


\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}


\noindent where \meta{Id} stands for variables and \meta{Num}
for numbers. For the moment let us omit variables from arithmetic
expressions. Our parser will take this grammar and given an input
program produce an abstract syntax tree. For example we obtain for
the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.

\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
\end{center}

\noindent To generate JVM code for this expression, we need to traverse
this tree in \emph{post-order} fashion and emit code for each
node---this traversal in \emph{post-order} fashion will produce code for
a stack-machine (which is what the JVM is). Doing so for the tree above
generates the instructions

\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
ldc 3
imul
ldc 4
ldc 3
isub
iadd
iadd
\end{lstlisting}

\noindent If we ``run'' these instructions, the result $8$ will be on
top of the stack (I leave this to you to verify; the meaning of each
instruction should be clear). The result being on the top of the stack
will be an important convention we always observe in our compiler. Note,
that a different bracketing of the expression, for example $(1 + (2 *
3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
a different list of instructions.

Generating code in this post-order-traversal fashion is rather easy to
implement: it can be done with the following recursive
\textit{compile}-function, which takes the abstract syntax tree as an
argument:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}

\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
variables as \emph{local variables} of the JVM. Essentially, local
variables are an array or pointers to memory cells, containing in our
case only integers. Looking up a variable can be done with the
instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $index$
\end{lstlisting}

\noindent
which places the content of the local variable $index$ onto
the stack. Storing the top of the stack into a local variable
can be done by the instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
istore $index$
\end{lstlisting}

\noindent Note that this also pops off the top of the stack. One problem
we have to overcome, however, is that local variables are addressed, not
by identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers
(starting from $0$). Therefore our compiler needs to maintain a kind of
environment where variables are associated to numbers. This association
needs to be unique: if we muddle up the numbers, then we essentially
confuse variables and the consequence will usually be an erroneous
result. Our extended \textit{compile}-function for arithmetic
expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}

\noindent In the last line we generate the code for variables where
$E(x)$ stands for looking up the environment to which index the variable
$x$ maps to. This is similar to the interpreter we saw earlier in the
module, which also needs an environment: the difference is that the
interpreter maintains a mapping from variables to current values (what
is the currently the value of a variable?), while compilers need a
mapping from variables to memory locations (where can I find the current
value for the variable in memory?).

There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
\pcode{if}- and \pcode{while}-statements. To explain the issue
let us first describe the compilation of statements of the
WHILE-language. The clause for \pcode{skip} is trivial, since
we do not have to generate any instruction

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
\end{tabular}
\end{center}

\noindent whereby $[]$ is the empty list of instructions. Note that
the \textit{compile}-function for statements returns a pair, a
list of instructions (in this case the empty list) and an
environment for variables. The reason for the environment is
that assignments in the WHILE-language might change the
environment---clearly if a variable is used for the first
time, we need to allocate a new index and if it has been used
before, then we need to be able to retrieve the associated index.
This is reflected in the clause for compiling assignments, say
$x := a$:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}

\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After that the \instr{istore}-instruction, the result will be
stored in the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$
with the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
generate the following code snippet

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
istore $n_x$
\end{lstlisting}

\noindent
where $n_x$ is the index (or pointer to the memory) for the variable
$x$. The Scala code for looking-up the index for the variable is as follow:

\begin{center}
\begin{tabular}{lcl}
$index \;=\; E\textit{.getOrElse}(x, |E|)$
\end{tabular}
\end{center}

\noindent
This implements the idea that in case the environment $E$ contains an
index for $x$, we return it. Otherwise we ``create'' a new index by
returning the size $|E|$ of the environment (that will be an index that
is guaranteed not to be used yet). In all this we take advantage of the
JVM which provides us with a potentially limitless supply of places
where we can store values of variables.

A bit more complicated is the generation of code for
\pcode{if}-statements, say

\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}

\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we
already generated code for $b$ and and the two if-branches $cs_{1/2}$.
Then in the true-case the control-flow of the program needs to behave as


\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->, black, skip loop] (A4);
\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
\end{tikzpicture}
\end{center}

\noindent where we start with running the code for $b$; since
we are in the true case we continue with running the code for
$cs_1$. After this however, we must not run the code for
$cs_2$, but always jump to after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding instruction of the JVM is
\instr{goto}. In case $b$ turns out to be false we need the
control-flow to be:

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (cs2);
\draw (cs2) edge [->,black, line width=1mm] (A4);
\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
\end{tikzpicture}
\end{center}

\noindent where we now need a conditional jump (if the
if-condition is false) from the end of the code for the
boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$ we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need addresses to
where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump---essentially they mark
a location in our code. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go to. A label, say \pcode{L}, is attached to assembly
code like

\begin{lstlisting}[mathescape,numbers=none]
  $\textit{instr\_1}$
L:
  $\textit{instr\_2}$
  $\textit{instr\_3}$
    $\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels
to actual (numeric) addresses, for example jump 10 instructions forward,
or 20 instructions backwards, or jump to this particular address.

Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
arguments: an abstract syntax tree, an environment for
variable indices and also the label, which I called $lab$, to where an conditional
jump needs to go. The clause for the expression $a_1 = a_2$,
for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This will mean after running
the corresponding code there will be two integers on top of
the stack. If they are equal, we do not have to do anything
(except for popping them off from the stack) and just continue
with the next instructions (see control-flow of ifs above).
However if they are \emph{not} equal, then we need to
(conditionally) jump to the label $lab$. This can be done with
the instruction

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
if_icmpne $lab$
\end{lstlisting}

To sum up, the third argument in the compile function for booleans
specifies where to jump, in case the condition is \emph{not} true. I
leave it to you to extend the \textit{compile}-function for the other
boolean expressions. Note that we need to jump whenever the boolean is
\emph{not} true, which means we have to ``negate'' the jump
condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on.   If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the
layout of the code and first give the code for the else-branch and then
for the if-branch. However in the case of while-loops this
``upside-down-inside-out'' way of generating code still seems the most
convenient.

We are now ready to give the compile function for
if-statements---remember this function returns for statements a
pair consisting of the code and an environment:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
\end{tabular}
\end{center}

\noindent In the first two lines we generate two fresh labels
for the jump addresses (just before the else-branch and just
after). In the next two lines we generate the instructions for
the two branches, $is_1$ and $is_2$. The final code will
be first the code for $b$ (including the label
just-before-the-else-branch), then the \pcode{goto} for after
the else-branch, the label $L_\textit{--ifelse}$, followed by
the instructions for the else-branch, followed by the
after-the-else-branch label. Consider for example the
if-statement:

\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 == 1 then x := 2 else y := 3
\end{lstlisting}

\noindent
The generated code is as follows:

\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
   ldc 1
   ldc 1
   if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   ldc 2
   istore 0
   goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
   ldc 3
   istore 1
L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}

\noindent The first three lines correspond to the the boolean
expression \texttt{1 == 1}. The jump for when this boolean expression
is false is in Line~3. Lines 4-6 corresponds to the if-branch;
the else-branch is in Lines 8 and 9.

Note carefully how the environment $E$ is threaded through the recursive
calls of \textit{compile}. The function receives an environment $E$, but
it might extend it when compiling the if-branch, yielding $E'$. This
happens for example in the if-statement above whenever the variable
\code{x} has not been used before. Similarly with the environment $E''$
for the second call to \textit{compile}. $E''$ is also the environment
that needs to be returned as part of the answer.

The compilation of the while-loops of the form
\pcode{while} $b$ \pcode{do} $cs$ is very similar. In case
the condition is true and we need to do another iteration,
and the control-flow needs to be as follows

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->,skip loop] (A1);
\end{tikzpicture}
\end{center}

\noindent Whereas if the condition is \emph{not} true, we
need to jump out of the loop, which gives the following
control flow.

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (A4);
\end{tikzpicture}
\end{center}

\noindent Again we can use the \textit{compile}-function for
boolean expressions to insert the appropriate jump to the
end of the loop (label $L_\textit{--wend}$ below).

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad L_\textit{--wbegin}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{--wend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
\multicolumn{3}{l}{$\qquad(L_\textit{--wbegin}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_\textit{--wend})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_\textit{--wbegin}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{--wend}, E')$}\\
\end{tabular}
\end{center}

\noindent I let you go through how this clause works. As an example
you can consider the while-loop

\begin{lstlisting}[mathescape,numbers=none,language=While]
while x <= 10 do x := x + 1
\end{lstlisting}

\noindent yielding the following code

\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   iload 0
   ldc 10
   if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   iload 0
   ldc 1
   iadd
   istore 0
   goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
           -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
  \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}

\noindent
As said, I leave it to you to check that the code really implements
the usual controlflow of while-loops.

Next we need to consider the WHILE-statement \pcode{write x}, which can
be used to print out the content of a variable. For this we shall use a
Java library function. In order to avoid having to generate a lot of
code for each \pcode{write}-command, we use a separate helper-method and
just call this method with an appropriate argument (which of course
needs to be placed onto the stack). The code of the helper-method is as
follows.

\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
.method public static write(I)V
    .limit locals 1
    .limit stack 2
    getstatic java/lang/System/out Ljava/io/PrintStream;
    iload 0
    invokevirtual java/io/PrintStream/println(I)V
    return
.end method
\end{lstlisting}

\noindent The first line marks the beginning of the method, called
\pcode{write}. It takes a single integer argument indicated by the
\pcode{(I)} and returns no result, indicated by the \pcode{V} (for
void). Since the method has only one argument, we only need a single
local variable (Line~2) and a stack with two cells will be sufficient
(Line 3). Line 4 instructs the JVM to get the value of the member
\pcode{out} from the class \pcode{java/lang/System}. It expects the
value to be of type \pcode{java/io/PrintStream}. A reference to this
value will be placed on the stack.\footnote{Note the syntax
  \texttt{Ljava\ldots{};} for the \texttt{PrintStream} type is not an
  typo. Somehow the designers of Jasmin decided that this syntax is
  pleasing to the eye. So if you wanted to have strings in your Jasmin
  code, you would need to write \texttt{Ljava/lang/String;}\;. If you
  want arrays of one dimension, then use \texttt{[\ldots}; two
  dimensions, use \texttt{[[\ldots} and so on. Looks all very ugly to
  my eyes.} Line~5 copies the integer we want to print out onto the
stack. In the line after that we call the method \pcode{println} (from
the class \pcode{java/io/PrintStream}). We want to print out an
integer and do not expect anything back (that is why the type
annotation is \pcode{(I)V}).  The \pcode{return}-instruction in the
next line changes the control-flow back to the place from where
\pcode{write} was called. This method needs to be part of a header
that is included in any code we generate. The helper-method
\pcode{write} can be invoked with the two instructions

\begin{lstlisting}[mathescape,language=JVMIS]
iload $E(x)$
invokestatic XXX/XXX/write(I)V
\end{lstlisting}

\noindent where we first place the variable to be printed on
top of the stack and then call \pcode{write}. The \pcode{XXX}
need to be replaced by an appropriate class name (this will be
explained shortly).


By generating code for a WHILE-program, we end up with a list of (JVM
assembly) instructions. Unfortunately, there is a bit more boilerplate
code needed before these instructions can be run. Essentially we have to
enclose them inside a Java \texttt{main}-method. The corresponding code
is shown in Figure~\ref{boiler}. This boilerplate code is very specific
to the JVM. If we target any other virtual machine or a machine
language, then we would need to change this code.  Interesting are the
Lines 5 and 6 where we hardwire that the stack of our programs will
never be larger than 200 and that the maximum number of variables is
also 200. This seem to be conservative default values that allow is to
run some simple WHILE-programs. In a real compiler, we would of course
need to work harder and find out appropriate values for the stack and
local variables.

\begin{figure}[t]
\begin{framed}
\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit locals 200
    .limit stack 200

      $\textit{\ldots{}here comes the compiled code\ldots}$

    return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
  hardwires limits for stack space and for the number of local
  variables.\label{boiler}}
\end{figure}


To sum up, in Figure~\ref{test} is the complete code generated
for the slightly nonsensical program

\begin{lstlisting}[mathescape,language=While]
x := 1 + 2;
write x
\end{lstlisting}

\noindent I let you read the code and make sure the code behaves as
expected. Having this code at our disposal, we need the assembler to
translate the generated code into JVM bytecode (a class file). This
bytecode is then understood by the JVM and can be run by just invoking
the \pcode{java}-program. Again I let you do the work.


\begin{figure}[p]
\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
  \draw[|<->|,very thick] (LA.north) -- (LB.south)
     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
  \draw[|<->|,very thick] (LC.north) -- (LD.south)
     node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
\end{framed}
\caption{The generated code for the test program \texttt{x := 1 + 2; write
x}. This code can be processed by a Java assembler producing a
class-file, which can then be run by the {\tt{}java}-program.\label{test}}
\end{figure}

\subsection*{Arrays}

Maybe a useful addition to the WHILE-language would be arrays. This
would allow us to generate more interesting WHILE-programs by
translating BF*** programs into equivalent WHILE-code. Yeah! Therefore in this
section let us have a look at how we can support the following three
constructions

\begin{itemize}
\item \lstinline|new(arr[15000])|
\item \lstinline|x := 3 + arr[3 + y]|
\item \lstinline|arr[42 * n] := ...|
\end{itemize}

\noindent
The first construct is for creating new arrays. In this instance the
name of the array is \pcode{arr} and it can hold 15000 integers. We do
not support ``dynamic'' arrays, that is the size of our arrays will
always be fixed. The second construct is for referencing an array cell
inside an arithmetic expression---we need to be able to look up the
contents of an array at an index determined by an arithmetic expression.
Similarly in the line below, we need to be able to update the content of
an array at a calculated index.

For creating a new array we can generate the following three JVM
instructions:

\begin{lstlisting}[mathescape,language=JVMIS]
ldc number
newarray int
astore loc_var
\end{lstlisting}

\noindent
First we need to put the size of the array onto the stack. The next
instruction creates the array. In this case the array contains
\texttt{int}s. With the last instruction we can store the array as a
local variable (like the ``simple'' variables from the previous
section). The use of a local variable for each array allows us to have
multiple arrays in a WHILE-program. For looking up an element in an
array we can use the following JVM code

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
iaload
\end{lstlisting}

\noindent
The first instruction loads the ``pointer'', or local variable, to the
array onto the stack. Then we have some instructions calculating the
index where we want to look up the array. The idea is that these
instructions will leave a concrete number on the top of the stack, which
will be the index into the array we need. Finally we need to tell the
JVM to load the corresponding element onto the stack. Updating an array
at an index with a value is as follows.

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
$\textit{value\_aexp}$
iastore
\end{lstlisting}

\noindent
Again the first instruction loads the local variable of
the array onto the stack. Then we have some instructions calculating
the index where we want to update the array. After that come the
instructions for with which value we want to update the array. The last
line contains the instruction for updating the array.

Next we need to modify our grammar rules for our WHILE-language: it
seems best to extend the rule for factors in arithmetic expressions with
a rule for looking up an array.

\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}

\noindent
There is no problem with left-recursion as the \meta{E} is ``protected''
by an identifier and the brackets. There are two new rules for statements,
one for creating an array and one for array assignment:

\begin{plstx}[rhs style=, margin=2cm, one per line]
: \meta{Stmt} ::=  \ldots
              | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,])
              | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
\end{plstx}

With this in place we can turn back to the idea of creating
WHILE-programs by translating BF-programs. This is a relatively easy
task because BF has only eight instructions (we will actually implement
seven because we can omit the read-in instruction from BF). What makes
this translation easy is that BF-loops can be straightforwardly
represented as while-loops. The Scala code for the translation is as
follows:

\begin{lstlisting}[language=Scala,numbers=left]
def instr(c: Char) : String = c match {
  case '>' => "ptr := ptr + 1;"
  case '<' => "ptr := ptr - 1;"
  case '+' => "mem[ptr] := mem [ptr] + 1;"
  case '-' => "mem [ptr] := mem [ptr] - 1;"
  case '.' => "x := mem [ptr]; write x;"
  case '['  => "while (mem [ptr] != 0) do {"
  case ']'  => "skip};"
  case _ => ""
}
\end{lstlisting}

\noindent
The idea behind the translation is that BF-programs operate on an array,
called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
\code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
variable since we have not changed our write functions in order to cope
with writing out any array-content directly. Lines 7 and 8 are for
translating BF-loops. Line 8 is interesting in the sense that we need to
generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag.

Putting this all together and we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, after a few more tweaks we can finally run the
BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the
corresponding WHILE-program; the size of the resulting class file is
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good point is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking (your task!). Having done
this, we now face the question about what to do next in our compiler\ldots?

\subsection*{Optimisations \& Co}

Every compiler that deserves its name has to perform some optimisations
on the code: if we put in the extra effort of writing a compiler for a
language, then obviously we want to have our code to run as fast as
possible. So we should look into this in more detail.

There is actually one aspect in our generated code where we can make
easily efficiency gains. This has to do with some of the quirks of the
JVM. Whenever we push a constant onto the stack, we used the JVM
instruction \instr{ldc some_const}. This is a rather generic instruction
in the sense that it works not just for integers but also for strings,
objects and so on. What this instruction does is putting the constant
into a \emph{constant pool} and then uses an index into this constant
pool. This means \instr{ldc} will be represented by at least two bytes
in the class file. While this is a sensible strategy for ``large''
constants like strings, it is a bit of overkill for small integers
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example

\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
\end{itemize}

\noindent
where the \code{n} is \instr{bipush} is between -128 and 128.   By
having dedicated instructions such as \instr{iconst_0} to
\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
size smaller as these instructions only require 1 byte (as opposed the
generic \instr{ldc} which needs 1 byte plus another for the index into
the constant pool). While in theory the use of such special instructions
should make the code only smaller, it actually makes the code also run
faster. Probably because the JVM has to process less code and uses a
specific instruction for the underlying CPU.  The story with
\instr{bipush} is slightly different, because it also uses two
bytes---so it does not necessarily result in a reduction of code size.
Instead, it probably uses a specific instruction in the underlying CPU
that makes the JVM code run faster.\footnote{This is all ``probable''
because I have not read the 700 pages of JVM documentation by Oracle and
also have no clue how the JVM is implemented.} This means when
generating code for pushing constants onto the stack, we can use the
following Scala helper-function

\begin{lstlisting}[language=Scala]
def compile_num(i: Int) =
  if (0 <= i && i <= 5) i"iconst_$i" else
  if (-128 <= i && i <= 127) i"bipush $i"
  else i"ldc $i"
\end{lstlisting}

\noindent
It generates the more efficient instructions when pushing a small integer
constant onto the stack. The default is \instr{ldc} for any other constants.

The JVM also has such special instructions for
loading and storing the first three local variables. The assumption is
that most operations and arguments in a method will only use very few
local variables. So we can use the following instructions:

\begin{itemize}
\item \instr{iload_0},\ldots, \instr{iload_3}
\item \instr{istore_0},\ldots, \instr{istore_3}
\item \instr{aload_0},\ldots, \instr{aload_3}
\item \instr{astore_0},\ldots, \instr{astore_3}
\end{itemize}


\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program reduces and also the class-file runs faster (the
parsing part is still very slow). According to my very rough
experiments:

\begin{center}
\begin{tabular}{lll}
& class-size & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised:   & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they involve changing one or a small set of instructions into an
equivalent set that has better performance.

If you look careful at our generated code you will quickly find another
source of inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result the for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.

\begin{lstlisting}[mathescape,language=JVMIS]
...
istore 0
iload 0
...
\end{lstlisting}

\noindent
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with for example the much cheaper instruction
\instr{dup}. Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become  much more
complicated. So we leave this kind of optimisations here and look at
something more interesting and possibly surprising.

As you might have seen, the compiler writer has a lot of freedom about
how to generate code from what the programmer wrote as program. The only
condition is that generated code should behave as expected by the
programmer. Then all is fine with the code above\ldots mission
accomplished! But sometimes the compiler writer is expected to go an
extra mile, or even miles and change(!) the meaning of a program.
Suppose we are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: The program generates an array of size 10, and then
tries to access the non-existing element at index 13 and even updating
the element with index 14. Obviously this is baloney. Still, our
compiler generates code for this program without any questions asked. We
can even run this code on the JVM\ldots of course the result is an
exception trace where the JVM yells at us for doing naughty
things.\footnote{Still this is much better than C, for example, where
such errors are not prevented and as a result insidious attacks can be
mounted against such kind C-programs. I assume everyone has heard about
\emph{Buffer Overflow Attacks}.} Now what should we do in such
situations? Over- and underflows of indices are notoriously difficult to
detect statically (at compiletime). So it might seem raising an
exception at run-time like the JVM is the best compromise.

Well, imagine we do not want to rely in our compiler on the JVM for
producing an annoying, but safe exception trace, rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let us assume we want to handle them in the
following way: if the programmer access a field out-of-bounds, we just
return a default 0, and if a programmer wants to update an out-of-bounds
field, we want to ``quietly'' ignore this update. One way to achieve
this would be to rewrite the WHILE-programs and insert the necessary
if-conditions for safely reading and writing arrays. Another way
is to modify the code we generate.

\begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmple L1
  pop2
  iconst_0
  goto L2
L1:
  swap
  iaload
L2:
\end{lstlisting}

\textbf{TBD}

 \begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmple L1
  pop2
  goto L2
L1:
  swap
  $\textit{value\_aexp}$
  iastore
L2:
\end{lstlisting}

\begin{figure}[p]
\begin{center}
\begin{tikzpicture}[every text node part/.style={align=left},
                    stack/.style={rectangle split,rectangle split parts = 5,
                                  fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
\node (A)  {};
\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
\node[stack,right = 60pt] (1) at (0.east)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below = 40pt] (2) at (1.south)
   {array\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,left = 90pt] (3) at (2.west)
   {array\_len\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
   {$\textit{index}$\nodepart{two}
    array\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
   {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
   {\ldots\phantom{l}};
\node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
   {0\nodepart{two} \ldots\phantom{l}};

\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
\draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
\draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
\draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
\end{tikzpicture}
\end{center}
\end{figure}

goto\_w problem solved for too large jumps
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: