--- a/handouts/ho07.tex Tue Nov 17 04:53:14 2015 +0000
+++ b/handouts/ho07.tex Tue Nov 17 13:31:19 2015 +0000
@@ -10,19 +10,19 @@
\section*{Handout 7 (Compilation)}
The purpose of a compiler is to transform a program, a human
-can write, into code the machine can run as fast as possible.
-The fastest code would be machine code the CPU can run
-directly, but it is often enough to improve the speed of a
+can read and write, into code the machine can run as fast as
+possible. The fastest code would be machine code the CPU can
+run directly, but it is often enough to improve the speed of a
program by just targeting a virtual machine. This produces not
the fastest possible code, but code that is fast enough and
-has the advantage that the virtual machine care of things a
-compiler would normally need to take care of (like explicit
-memory management).
+has the advantage that the virtual machine takes care of
+things a compiler would normally need to take care of (like
+explicit memory management).
-As an example we will implement a compiler for the very simple
-While-language. We will be generating code for the Java
-Virtual Machine. This is a stack-based virtual machine, a fact
-which will make it easy to generate code for arithmetic
+As a first example we will implement a compiler for the very
+simple While-language. It will generate code for the Java
+Virtual Machine (JVM). This is a stack-based virtual machine,
+a fact which will make it easy to generate code for arithmetic
expressions. For example for generating code for the
expression $1 + 2$ we need to generate the following three
instructions
@@ -35,13 +35,13 @@
\noindent The first instruction loads the constant $1$ onto
the stack, the next one $2$, the third instruction adds both
-numbers together replacing the top elements of the stack with
-the result $3$. For simplicity, we will throughout consider
-only integer numbers and results. Therefore we can use the
-instructions \code{iadd}, \code{isub}, \code{imul},
-\code{idiv} and so on. The \code{i} stands for integer
-instructions in the JVM (alternatives are \code{d} for doubles,
-\code{l} for longs and \code{f} for floats).
+numbers together replacing the top two elements of the stack
+with the result $3$. For simplicity, we will throughout
+consider only integer numbers and results. Therefore we can
+use the JVM instructions \code{iadd}, \code{isub},
+\code{imul}, \code{idiv} and so on. The \code{i} stands for
+integer instructions in the JVM (alternatives are \code{d} for
+doubles, \code{l} for longs and \code{f} for floats).
Recall our grammar for arithmetic expressions ($E$ is the
starting symbol):
@@ -58,12 +58,12 @@
| \meta{Num}\\
\end{plstx}
-\noindent where \meta{Id} stands for variables and
-\meta{Num} for numbers. For the moment let us omit variables from
+\noindent where \meta{Id} stands for variables and \meta{Num}
+for numbers. For the moment let us omit variables from
arithmetic expressions. Our parser will take this grammar and
-produce abstract syntax trees. For
-example for the expression $1 + ((2 * 3) + (4 - 3))$ it will
-produce the following tree.
+given an input produce abstract syntax trees. For example for
+the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
+following tree.
\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
@@ -97,8 +97,9 @@
3)) + (4 - 3)$, produces a different abstract syntax tree and
thus potentially also a different list of instructions.
Generating code in this fashion is rather easy to implement:
-it can be done with the following \textit{compile}-function,
-which takes the abstract syntax tree as argument:
+it can be done with the following recursive
+\textit{compile}-function, which takes the abstract syntax
+tree as argument:
\begin{center}
\begin{tabular}{lcl}
@@ -167,7 +168,7 @@
There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
\pcode{if}- and \pcode{while}-statements. To explain the issue
-let us explain first the compilation of statements of the
+let us first describe the compilation of statements of the
While-language. The clause for \pcode{skip} is trivial, since
we do not have to generate any instruction
@@ -177,19 +178,19 @@
\end{tabular}
\end{center}
-\noindent Note that the \textit{compile}-function for
-statements returns a pair, a list of instructions (in this
-case the empty list) and an environment for variables. The
-reason for the environment is that assignments in the
-While-language might change the environment---clearly if a
-variable is used for the first time, we need to allocate a new
-index and if it has been used before, we need to be able to
-retrieve the associated index. This is reflected in
-the clause for compiling assignments:
+\noindent $[]$ is the empty list of instructions. Note that
+the \textit{compile}-function for statements returns a pair, a
+list of instructions (in this case the empty list) and an
+environment for variables. The reason for the environment is
+that assignments in the While-language might change the
+environment---clearly if a variable is used for the first
+time, we need to allocate a new index and if it has been used
+before, we need to be able to retrieve the associated index.
+This is reflected in the clause for compiling assignments:
\begin{center}
\begin{tabular}{lcl}
-$\text{compile}(x := a, E)$ & $\dn$ &
+$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
\end{tabular}
\end{center}
@@ -225,8 +226,8 @@
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}
-\noindent where $b$ is a boolean expression and the $cs_i$
-are the instructions for each \pcode{if}-branch. Lets assume
+\noindent where $b$ is a boolean expression and the $cs_{1/2}$
+are the statements for each \pcode{if}-branch. Lets assume
we already generated code for $b$ and $cs_{1/2}$. Then in the
true-case the control-flow of the program needs to be
@@ -289,13 +290,14 @@
are finished with running $cs_2$ we can continue with whatever
code comes after the if-statement.
-The \pcode{goto} and conditional jumps need addresses to where
-the jump should go. Since we are generating assembly code for
-the JVM, we do not actually have to give addresses, but need
-to attach labels to our code. These labels specify a target
-for a jump. Therefore the labels need to be unique, as
-otherwise it would be ambiguous where a jump should go.
-A labels, say \pcode{L}, is attached to code like
+The \pcode{goto} and the conditional jumps need addresses to
+where the jump should go. Since we are generating assembly
+code for the JVM, we do not actually have to give (numeric)
+addresses, but can just attach (symbolic) labels to our code.
+These labels specify a target for a jump. Therefore the labels
+need to be unique, as otherwise it would be ambiguous where a
+jump should go to. A label, say \pcode{L}, is attached to code
+like
\begin{lstlisting}[mathescape,numbers=none]
L:
@@ -304,6 +306,8 @@
$\vdots$
\end{lstlisting}
+\noindent where a label is indicated by a colon.
+
Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
arguments: an abstract syntax tree, an environment for
@@ -318,14 +322,15 @@
\end{tabular}
\end{center}
-\noindent
-We are generating code for the subexpressions $a_1$ and $a_2$.
-This will mean after running the corresponding code there will
-be two integers on top of the stack. If they are equal, we do
-not have to do anything and just continue with the next
-instructions (see control-flow of ifs above). However if they
-are \emph{not} equal, then we need to (conditionally) jump to
-the label $lab$. This can be done with the instruction
+\noindent where we are first generating code for the
+subexpressions $a_1$ and $a_2$. This will mean after running
+the corresponding code there will be two integers on top of
+the stack. If they are equal, we do not have to do anything
+(except for popping them off from the stack) and just continue
+with the next instructions (see control-flow of ifs above).
+However if they are \emph{not} equal, then we need to
+(conditionally) jump to the label $lab$. This can be done with
+the instruction
\begin{lstlisting}[mathescape,numbers=none]
if_icmpne $lab$
@@ -335,7 +340,6 @@
\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
-$=$ & $\Rightarrow$ & \pcode{if_icmpne}\\
$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
@@ -345,15 +349,17 @@
\noindent and so on. I leave it to you to extend the
\textit{compile}-function for the other boolean expressions.
Note that we need to jump whenever the boolean is \emph{not}
-true, which means we have to ``negate'' the jump---equals
-becomes not-equal, less becomes greater-or-equal. If you do
-not like this design (it can be the source of some nasty,
-hard-to-detect errors), you can also change the layout of the
-code and first give the code for the else-branch and then for
-the if-branch.
+true, which means we have to ``negate'' the jump
+condition---equals becomes not-equal, less becomes
+greater-or-equal. If you do not like this design (it can be
+the source of some nasty, hard-to-detect errors), you can also
+change the layout of the code and first give the code for the
+else-branch and then for the if-branch. However in the case
+of while-loops this way of generating code still seems
+the most convenient.
We are now ready to give the compile function for
-if-statments--remember this function returns for staments a
+if-statments---remember this function returns for staments a
pair consisting of the code and an environment:
\begin{center}