--- a/handouts/ho07.tex Sun Nov 15 21:31:31 2015 +0000
+++ b/handouts/ho07.tex Mon Nov 16 15:16:17 2015 +0000
@@ -1,11 +1,12 @@
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
-
+\usepackage{../grammar}
+\usepackage{tikz-qtree}
\begin{document}
-\section*{Handout 7 (Compilation)}
+\section*{Handout 7 (Compilation of the WHILE-Language)}
The purpose of a compiler is to transform a program, a human
can write, into code the machine can run as fast as possible.
@@ -19,8 +20,142 @@
We will be generating code for the Java Virtual Machine. This
is a stack-based virtual machine which will make it easy to
-generate code for arithmetic expressions. Recall
-that our
+generate code for arithmetic expressions. For example for
+generating code for the expression $1 + 2$ we need to issue
+the following three instructions
+
+\begin{lstlisting}[numbers=none]
+ldc 1
+ldc 2
+iadd
+\end{lstlisting}
+
+\noindent The first instruction loads the constant $1$ on the
+stack, the next one $2$, the third instruction add both
+numbers together replacing the top elements of the stack with
+the result $3$. We will throughout consider only integer
+numbers and results, therefore we can use the instructions
+\code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
+The \code{i} stands for integer instructions (alternatives are
+\code{d} for doubles, \code{l} for longs and \code{f} for
+floats).
+
+Recall our grammar for arithmetic expressions:
+
+
+\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
+ | \meta{T} $-$ \meta{E}
+ | \meta{T}\\
+: \meta{T} ::= \meta{F} $*$ \meta{T}
+ | \meta{F} $\backslash$ \meta{T}
+ | \meta{F}\\
+: \meta{F} ::= ( \meta{E} )
+ | \meta{Id}
+ | \meta{Num}\\
\end{plstx}
+
+
+\noindent where \meta{Id} stands for variables and
+\meta{Num} for number. For the moment let us omit variables from
+arithmetic expressions. Our parser will take this grammar and
+produce abstract syntax trees, for
+example for the expression $1 + ((2 * 3) + (4 - 3))$ it will
+produce the following tree.
+
+\begin{center}
+\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
+\end{center}
+
+\noindent
+To generate code for this expression, we need to traverse this
+tree in post-order fashion---this will produce code for a
+stack-machine, like the JVM. Doing so gives the instructions
+
+\begin{lstlisting}[numbers=none]
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
+iadd
+\end{lstlisting}
+
+\noindent If we ``run'' these instructions, the result $8$
+will be on top of the stack. This will be a convention we
+always observe, namely that the results of arithmetic
+expressions will always be on top of the stack. Note, that a
+different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$,
+produces a different abstract syntax tree and thus potentially
+also a different list of instructions. Generating code in this
+fashion is rather simple: it can be done by the following
+\textit{compile}-function:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
+$\textit{compile}(a_1 + a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
+\end{tabular}
+\end{center}
+
+\noindent However, our arithmetic expressions can also contain
+variables. We will represent them as \emph{local variables} in
+the JVM. Essentially, local variables are an array or pointers
+to the memory containing in our case only integers. Looking up
+a variable can be done by the instruction
+
+\begin{lstlisting}[mathescape,numbers=none]
+iload $index$
+\end{lstlisting}
+
+\noindent
+which places the content of the local variable $index$ onto
+thestack. Storing the top of the stack into a local variable
+can be done by the instruction
+
+\begin{lstlisting}[mathescape,numbers=none]
+istore $index$
+\end{lstlisting}
+
+\noindent Note that this also pops off the top of the stack.
+One problem we have to overcome is that local variables are
+addressed, not by identifiers, but by numbers (starting from
+$0$). Therefore our compiler needs to maintain a kind of
+environment (similar to the interpreter) where variables are
+associated to numbers. This association needs to be unique: if
+we muddle up the numbers, then we essentially confuse
+variables and the result will usually be an erroneous result.
+Therefore our \textit{compile}-function will take two
+arguments: the abstract syntax tree and the environment, $E$,
+that maps identifiers to index-numbers.
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
+$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
+$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
+\end{tabular}
+\end{center}
+
+\noindent In the last line we generate the code for variables
+where $E(x)$ stands for looking up to which index the variable
+$x$ maps to.
+
\end{document}