handouts/ho07.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Mon, 16 Nov 2015 15:16:17 +0000
changeset 370 a65767fe5d71
parent 369 43c0ed473720
child 372 d6af4b1239de
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{tikz-qtree}

\begin{document}

\section*{Handout 7 (Compilation of the WHILE-Language)}

The purpose of a compiler is to transform a program, a human
can write, into code the machine can run as fast as possible.
The fastest code would be machine code the CPU can run
directly, but it is often enough to improve the speed of a
program by just targeting a virtual machine. This produces not
the fastest possible code, but code that is fast enough and
has the advantage that the virtual machine care of things a
compiler would normally need to take care of (like explicit
memory management).

We will be generating code for the Java Virtual Machine. This
is a stack-based virtual machine which will make it easy to
generate code for arithmetic expressions. For example for 
generating code for the expression $1 + 2$ we need to issue
the following three instructions

\begin{lstlisting}[numbers=none]
ldc 1
ldc 2
iadd 
\end{lstlisting}

\noindent The first instruction loads the constant $1$ on the 
stack, the next one $2$, the third instruction add both 
numbers together replacing the top elements of the stack with 
the result $3$. We will throughout consider only integer 
numbers and results, therefore we can use the instructions
\code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
The \code{i} stands for integer instructions (alternatives are
\code{d} for doubles, \code{l} for longs and \code{f} for 
floats).

Recall our grammar for arithmetic expressions:


\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}


\noindent where \meta{Id} stands for variables and
\meta{Num} for number. For the moment let us omit variables from
arithmetic expressions. Our parser will take this grammar and
produce abstract syntax trees, for
example for the expression $1 + ((2 * 3) + (4 - 3))$ it will
produce the following tree.

\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
\end{center}

\noindent 
To generate code for this expression, we need to traverse this
tree in post-order fashion---this will produce code for a 
stack-machine, like the JVM. Doing so gives the instructions

\begin{lstlisting}[numbers=none]
ldc 1 
ldc 2 
ldc 3 
imul 
ldc 4 
ldc 3 
isub 
iadd 
iadd
\end{lstlisting}

\noindent If we ``run'' these instructions, the result $8$
will be on top of the stack. This will be a convention we
always observe, namely that the results of arithmetic
expressions will always be on top of the stack. Note, that a
different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$,
produces a different abstract syntax tree and thus potentially
also a different list of instructions. Generating code in this
fashion is rather simple: it can be done by the following
\textit{compile}-function:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ & 
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ & 
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
\end{tabular}
\end{center}

\noindent However, our arithmetic expressions can also contain
variables. We will represent them as \emph{local variables} in
the JVM. Essentially, local variables are an array or pointers
to the memory containing in our case only integers. Looking up
a variable can be done by the instruction

\begin{lstlisting}[mathescape,numbers=none]
iload $index$
\end{lstlisting}

\noindent 
which places the content of the local variable $index$ onto 
thestack. Storing the top of the stack into a local variable 
can be done by the instruction

\begin{lstlisting}[mathescape,numbers=none]
istore $index$
\end{lstlisting}

\noindent Note that this also pops off the top of the stack.
One problem we have to overcome is that local variables are
addressed, not by identifiers, but by numbers (starting from
$0$). Therefore our compiler needs to maintain a kind of
environment (similar to the interpreter) where variables are
associated to numbers. This association needs to be unique: if
we muddle up the numbers, then we essentially confuse
variables and the result will usually be an erroneous result.
Therefore our \textit{compile}-function will take two
arguments: the abstract syntax tree and the environment, $E$,
that maps identifiers to index-numbers.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
\end{tabular}
\end{center}

\noindent In the last line we generate the code for variables
where $E(x)$ stands for looking up to which index the variable
$x$ maps to.


\end{document}

%%% Local Variables: 
%%% mode: latex  
%%% TeX-master: t
%%% End: