afl-material: changeset 378:e8ac05fe2630

Binary file handouts/ho08.pdf has changed

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/handouts/ho08.tex	Tue Nov 17 15:03:08 2015 +0000
@@ -0,0 +1,644 @@
+\documentclass{article}
+\usepackage{../style}
+\usepackage{../langs}
+\usepackage{../grammar}
+\usepackage{../graphics}
+
+
+\begin{document}
+
+\section*{Handout 8 (A Functional Language)}
+
+The purpose of a compiler is to transform a program, a human
+can read and write, into code the machine can run as fast as
+possible. The fastest code would be machine code the CPU can
+run directly, but it is often enough to improve the speed of a
+program by just targeting a virtual machine. This produces not
+the fastest possible code, but code that is fast enough and
+has the advantage that the virtual machine takes care of
+things a compiler would normally need to take care of (like
+explicit memory management).
+
+As a first example we will implement a compiler for the very
+simple While-language. It will generate code for the Java
+Virtual Machine (JVM). This is a stack-based virtual machine,
+a fact which will make it easy to generate code for arithmetic
+expressions. For example for generating code for the
+expression $1 + 2$ we need to generate the following three
+instructions
+
+\begin{lstlisting}[numbers=none]
+ldc 1
+ldc 2
+iadd 
+\end{lstlisting}
+
+\noindent The first instruction loads the constant $1$ onto
+the stack, the next one $2$, the third instruction adds both
+numbers together replacing the top two elements of the stack
+with the result $3$. For simplicity, we will throughout
+consider only integer numbers and results. Therefore we can
+use the JVM instructions \code{iadd}, \code{isub},
+\code{imul}, \code{idiv} and so on. The \code{i} stands for
+integer instructions in the JVM (alternatives are \code{d} for
+doubles, \code{l} for longs and \code{f} for floats).
+
+Recall our grammar for arithmetic expressions ($E$ is the
+starting symbol):
+
+
+\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
+         | \meta{T} $-$ \meta{E}
+         | \meta{T}\\
+: \meta{T} ::= \meta{F} $*$ \meta{T}
+          | \meta{F} $\backslash$ \meta{T}
+          | \meta{F}\\
+: \meta{F} ::= ( \meta{E} )
+          | \meta{Id}
+          | \meta{Num}\\
\end{plstx}
+
+
+\noindent where \meta{Id} stands for variables and \meta{Num}
+for numbers. For the moment let us omit variables from
+arithmetic expressions. Our parser will take this grammar and
+given an input produce abstract syntax trees. For example for
+the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
+following tree.
+
+\begin{center}
+\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
+\end{center}
+
+\noindent To generate code for this expression, we need to
+traverse this tree in post-order fashion and emit code for
+each node---this traversal in post-order fashion will produce
+code for a stack-machine (what the JVM is). Doing so for the
+tree above generates the instructions
+
+\begin{lstlisting}[numbers=none]
+ldc 1 
+ldc 2 
+ldc 3 
+imul 
+ldc 4 
+ldc 3 
+isub 
+iadd 
+iadd
+\end{lstlisting}
+
+\noindent If we ``run'' these instructions, the result $8$
+will be on top of the stack (I leave this to you to verify;
+the meaning of each instruction should be clear). The result
+being on the top of the stack will be a convention we always
+observe in our compiler, that is the results of arithmetic
+expressions will always be on top of the stack. Note, that a
+different bracketing of the expression, for example $(1 + (2 *
+3)) + (4 - 3)$, produces a different abstract syntax tree and
+thus potentially also a different list of instructions.
+Generating code in this fashion is rather easy to implement:
+it can be done with the following recursive
+\textit{compile}-function, which takes the abstract syntax
+tree as argument:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
+$\textit{compile}(a_1 + a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2)$ & $\dn$ & 
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2)$ & $\dn$ & 
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
+\end{tabular}
+\end{center}
+
+However, our arithmetic expressions can also contain
+variables. We will represent them as \emph{local variables} in
+the JVM. Essentially, local variables are an array or pointers
+to memory cells, containing in our case only integers. Looking
+up a variable can be done with the instruction
+
+\begin{lstlisting}[mathescape,numbers=none]
+iload $index$
+\end{lstlisting}
+
+\noindent 
+which places the content of the local variable $index$ onto 
+the stack. Storing the top of the stack into a local variable 
+can be done by the instruction
+
+\begin{lstlisting}[mathescape,numbers=none]
+istore $index$
+\end{lstlisting}
+
+\noindent Note that this also pops off the top of the stack.
+One problem we have to overcome, however, is that local
+variables are addressed, not by identifiers, but by numbers
+(starting from $0$). Therefore our compiler needs to maintain
+a kind of environment where variables are associated to
+numbers. This association needs to be unique: if we muddle up
+the numbers, then we essentially confuse variables and the
+consequence will usually be an erroneous result. Our extended
+\textit{compile}-function for arithmetic expressions will
+therefore take two arguments: the abstract syntax tree and the
+environment, $E$, that maps identifiers to index-numbers.
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
+$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
+$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
+$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
+\end{tabular}
+\end{center}
+
+\noindent In the last line we generate the code for variables
+where $E(x)$ stands for looking up the environment to which
+index the variable $x$ maps to.
+
+There is a similar \textit{compile}-function for boolean
+expressions, but it includes a ``trick'' to do with
+\pcode{if}- and \pcode{while}-statements. To explain the issue
+let us first describe the compilation of statements of the
+While-language. The clause for \pcode{skip} is trivial, since
+we do not have to generate any instruction
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
+\end{tabular}
+\end{center}
+
+\noindent $[]$ is the empty list of instructions. Note that
+the \textit{compile}-function for statements returns a pair, a
+list of instructions (in this case the empty list) and an
+environment for variables. The reason for the environment is
+that assignments in the While-language might change the
+environment---clearly if a variable is used for the first
+time, we need to allocate a new index and if it has been used
+before, we need to be able to retrieve the associated index.
+This is reflected in the clause for compiling assignments:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(x := a, E)$ & $\dn$ & 
+$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
+\end{tabular}
+\end{center}
+
+\noindent We first generate code for the right-hand side of
+the assignment and then add an \pcode{istore}-instruction at
+the end. By convention the result of the arithmetic expression
+$a$ will be on top of the stack. After the \pcode{istore}
+instruction, the result will be stored in the index
+corresponding to the variable $x$. If the variable $x$ has
+been used before in the program, we just need to look up what
+the index is and return the environment unchanged (that is in
+this case $E' = E$). However, if this is the first encounter 
+of the variable $x$ in the program, then we have to augment 
+the environment and assign $x$ with the largest index in $E$
+plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
+That means for the assignment $x := x + 1$ we generate the
+following code
+
+\begin{lstlisting}[mathescape,numbers=none]
+iload $n_x$
+ldc 1
+iadd
+istore $n_x$
+\end{lstlisting}
+
+\noindent 
+where $n_x$ is the index for the variable $x$.
+
+More complicated is the code for \pcode{if}-statments, say
+
+\begin{lstlisting}[mathescape,language={},numbers=none]
+if $b$ then $cs_1$ else $cs_2$
+\end{lstlisting}
+
+\noindent where $b$ is a boolean expression and the $cs_{1/2}$
+are the statements for each \pcode{if}-branch. Lets assume
+we already generated code for $b$ and $cs_{1/2}$. Then in the
+true-case the control-flow of the program needs to be
+
+\begin{center}
+\begin{tikzpicture}[node distance=2mm and 4mm,
+ block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
+ point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
+ skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
+\node (A1) [point] {};
+\node (b) [block, right=of A1] {code of $b$};
+\node (A2) [point, right=of b] {};
+\node (cs1) [block, right=of A2] {code of $cs_1$};
+\node (A3) [point, right=of cs1] {};
+\node (cs2) [block, right=of A3] {code of $cs_2$};
+\node (A4) [point, right=of cs2] {};
+
+\draw (A1) edge [->, black, line width=1mm] (b);
+\draw (b) edge [->, black, line width=1mm] (cs1);
+\draw (cs1) edge [->, black, line width=1mm] (A3);
+\draw (A3) edge [->, black, skip loop] (A4);
+\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
+\end{tikzpicture}
+\end{center}
+
+\noindent where we start with running the code for $b$; since
+we are in the true case we continue with running the code for
+$cs_1$. After this however, we must not run the code for
+$cs_2$, but always jump after the last instruction of $cs_2$
+(the code for the \pcode{else}-branch). Note that this jump is
+unconditional, meaning we always have to jump to the end of
+$cs_2$. The corresponding instruction of the JVM is
+\pcode{goto}. In case $b$ turns out to be false we need the
+control-flow
+
+\begin{center}
+\begin{tikzpicture}[node distance=2mm and 4mm,
+ block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
+ point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
+ skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
+\node (A1) [point] {};
+\node (b) [block, right=of A1] {code of $b$};
+\node (A2) [point, right=of b] {};
+\node (cs1) [block, right=of A2] {code of $cs_1$};
+\node (A3) [point, right=of cs1] {};
+\node (cs2) [block, right=of A3] {code of $cs_2$};
+\node (A4) [point, right=of cs2] {};
+
+\draw (A1) edge [->, black, line width=1mm] (b);
+\draw (b) edge [->, black, line width=1mm] (A2);
+\draw (A2) edge [skip loop] (A3);
+\draw (A3) edge [->, black, line width=1mm] (cs2);
+\draw (cs2) edge [->,black, line width=1mm] (A4);
+\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
+\end{tikzpicture}
+\end{center}
+
+\noindent where we now need a conditional jump (if the
+if-condition is false) from the end of the code for the 
+boolean to the beginning of the instructions $cs_2$. Once we 
+are finished with running $cs_2$ we can continue with whatever
+code comes after the if-statement.
+
+The \pcode{goto} and the conditional jumps need addresses to
+where the jump should go. Since we are generating assembly
+code for the JVM, we do not actually have to give (numeric)
+addresses, but can just attach (symbolic) labels to our code.
+These labels specify a target for a jump. Therefore the labels
+need to be unique, as otherwise it would be ambiguous where a
+jump should go to. A label, say \pcode{L}, is attached to code
+like
+
+\begin{lstlisting}[mathescape,numbers=none]
+L:
+  $instr_1$
+  $instr_2$
+    $\vdots$
+\end{lstlisting}
+ 
+\noindent where a label is indicated by a colon. 
+ 
+Recall the ``trick'' with compiling boolean expressions: the 
+\textit{compile}-function for boolean expressions takes three
+arguments: an abstract syntax tree, an environment for 
+variable indices and also the label, $lab$, to where an conditional 
+jump needs to go. The clause for the expression $a_1 = a_2$, 
+for example, is as follows:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
+\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}
+\end{tabular}
+\end{center}
+
+\noindent where we are first generating code for the
+subexpressions $a_1$ and $a_2$. This will mean after running
+the corresponding code there will be two integers on top of
+the stack. If they are equal, we do not have to do anything
+(except for popping them off from the stack) and just continue
+with the next instructions (see control-flow of ifs above).
+However if they are \emph{not} equal, then we need to
+(conditionally) jump to the label $lab$. This can be done with
+the instruction
+
+\begin{lstlisting}[mathescape,numbers=none]
+if_icmpne $lab$
+\end{lstlisting}
+
+\noindent Other jump instructions for boolean operators are
+
+\begin{center}
+\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
+$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
+$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
+$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
+\end{tabular}
+\end{center}
+
+\noindent and so on. I leave it to you to extend the
+\textit{compile}-function for the other boolean expressions.
+Note that we need to jump whenever the boolean is \emph{not}
+true, which means we have to ``negate'' the jump
+condition---equals becomes not-equal, less becomes
+greater-or-equal. If you do not like this design (it can be
+the source of some nasty, hard-to-detect errors), you can also
+change the layout of the code and first give the code for the
+else-branch and then for the if-branch. However in the case
+of while-loops this way of generating code still seems
+the most convenient.
+
+We are now ready to give the compile function for 
+if-statments---remember this function returns for staments a 
+pair consisting of the code and an environment:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
+\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
+\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
+\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
+\end{tabular}
+\end{center}
+
+\noindent In the first two lines we generate two fresh labels
+for the jump addresses (just before the else-branch and just
+after). In the next two lines we generate the instructions for
+the two branches, $is_1$ and $is_2$. The final code will
+be first the code for $b$ (including the label 
+just-before-the-else-branch), then the \pcode{goto} for after
+the else-branch, the label $L_\textit{ifesle}$, followed by
+the instructions for the else-branch, followed by the 
+after-the-else-branch label. Consider for example the 
+if-statement:
+
+\begin{lstlisting}[mathescape,numbers=none,language={}]
+if 1 = 1 then x := 2 else y := 3
+\end{lstlisting}
+
+\noindent 
+The generated code is as follows:
+
+\begin{lstlisting}[mathescape,language={}]
+   ldc 1
+   ldc 1
+   if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
+   ldc 2
+   istore 0
+   goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
+L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
+   ldc 3
+   istore 1
+L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
+\end{lstlisting}
+
+\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
+           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
+  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
+           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}
+
+\noindent The first three lines correspond to the the boolean
+expression $1 = 1$. The jump for when this boolean expression
+is false is in Line~3. Lines 4-6 corresponds to the if-branch;
+the else-branch is in Lines 8 and 9. Note carefully how the
+environment $E$ is threaded through the recursive calls of
+\textit{compile}. The function receives an environment $E$,
+but it might extend it when compiling the if-branch, yielding
+$E'$. This happens for example in the if-statement above
+whenever the variable \code{x} has not been used before.
+Similarly with the environment $E''$ for the second call to
+\textit{compile}. $E''$ is also the environment that needs to
+be returned as part of the answer.
+
+The compilation of the while-loops, say 
+\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
+the condition is true and we need to do another iteration, 
+and the control-flow needs to be as follows
+
+\begin{center}
+\begin{tikzpicture}[node distance=2mm and 4mm,
+ block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
+ point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
+ skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
+\node (A0) [point, left=of A1] {};
+\node (A1) [point] {};
+\node (b) [block, right=of A1] {code of $b$};
+\node (A2) [point, right=of b] {};
+\node (cs1) [block, right=of A2] {code of $cs$};
+\node (A3) [point, right=of cs1] {};
+\node (A4) [point, right=of A3] {};
+
+\draw (A0) edge [->, black, line width=1mm] (b);
+\draw (b) edge [->, black, line width=1mm] (cs1);
+\draw (cs1) edge [->, black, line width=1mm] (A3);
+\draw (A3) edge [->,skip loop] (A1);
+\end{tikzpicture}
+\end{center}
+
+\noindent Whereas if the condition is \emph{not} true, we
+need to jump out of the loop, which gives the following
+control flow.
+
+\begin{center}
+\begin{tikzpicture}[node distance=2mm and 4mm,
+ block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
+ point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
+ skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
+\node (A0) [point, left=of A1] {};
+\node (A1) [point] {};
+\node (b) [block, right=of A1] {code of $b$};
+\node (A2) [point, right=of b] {};
+\node (cs1) [block, right=of A2] {code of $cs$};
+\node (A3) [point, right=of cs1] {};
+\node (A4) [point, right=of A3] {};
+
+\draw (A0) edge [->, black, line width=1mm] (b);
+\draw (b) edge [->, black, line width=1mm] (A2);
+\draw (A2) edge [skip loop] (A3);
+\draw (A3) edge [->, black, line width=1mm] (A4);
+\end{tikzpicture}
+\end{center}
+
+\noindent Again we can use the \textit{compile}-function for
+boolean expressions to insert the appropriate jump to the
+end of the loop (label $L_{wend}$ below).
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
+\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
+\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
+\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
+\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
+\end{tabular}
+\end{center}
+
+\noindent I let you go through how this clause works. As an example
+you can consider the while-loop
+
+\begin{lstlisting}[mathescape,numbers=none,language={}]
+while x <= 10 do x := x + 1
+\end{lstlisting}
+
+\noindent yielding the following code
+
+\begin{lstlisting}[mathescape,language={}]
+L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
+   iload 0
+   ldc 10
+   if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
+   iload 0
+   ldc 1
+   iadd
+   istore 0
+   goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
+L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
+\end{lstlisting}
+ 
+\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
+           -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
+  \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
+           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}
+
+
+Next we need to consider the statement \pcode{write x}, which
+can be used to print out the content of a variable. For this
+we need to use a Java library function. In order to avoid
+having to generate a lot of code for each
+\pcode{write}-command, we use a separate helper-method and
+just call this method with an argument (which needs to be
+placed onto the stack). The code of the helper-method is as
+follows.
+
+
+\begin{lstlisting}[language=JVMIS]
+.method public static write(I)V 
+    .limit locals 1 
+    .limit stack 2 
+    getstatic java/lang/System/out Ljava/io/PrintStream; 
+    iload 0
+    invokevirtual java/io/PrintStream/println(I)V 
+    return 
+.end method
+\end{lstlisting}
+
+\noindent The first line marks the beginning of the method,
+called \pcode{write}. It takes a single integer argument
+indicated by the \pcode{(I)} and returns no result, indicated
+by the \pcode{V}. Since the method has only one argument, we
+only need a single local variable (Line~2) and a stack with
+two cells will be sufficient (Line 3). Line 4 instructs the
+JVM to get the value of the field \pcode{out} of the class
+\pcode{java/lang/System}. It expects the value to be of type
+\pcode{java/io/PrintStream}. A reference to this value will be
+placed on the stack. Line~5 copies the integer we want to
+print out onto the stack. In the next line we call the method
+\pcode{println} (from the class \pcode{java/io/PrintStream}).
+We want to print out an integer and do not expect anything
+back (that is why the type annotation is \pcode{(I)V}). The
+\pcode{return}-instruction in the next line changes the
+control-flow back to the place from where \pcode{write} was
+called. This method needs to be part of a header that is
+included in any code we generate. The helper-method
+\pcode{write} can be invoked with the two instructions
+
+\begin{lstlisting}[mathescape,language=JVMIS]
+iload $E(x)$ 
+invokestatic XXX/XXX/write(I)V
+\end{lstlisting}
+
+\noindent where we first place the variable to be printed on
+top of the stack and then call \pcode{write}. The \pcode{XXX}
+need to be replaced by an appropriate class name (this will be
+explained shortly).
+
+
+\begin{figure}[t]
+\begin{lstlisting}[mathescape,language=JVMIS]
+.class public XXX.XXX
+.super java/lang/Object
+
+.method public <init>()V
+    aload_0
+    invokenonvirtual java/lang/Object/<init>()V
+    return
+.end method
+
+.method public static main([Ljava/lang/String;)V
+    .limit locals 200
+    .limit stack 200
+
+      $\textit{\ldots{}here comes the compiled code\ldots}$
+
+    return
+.end method
+\end{lstlisting}
+\caption{Boilerplate code needed for running generated code.\label{boiler}}
+\end{figure}
+
+
+By generating code for a While-program, we end up with a list
+of (JVM assembly) instructions. Unfortunately, there is a bit
+more boilerplate code needed before these instructions can be
+run. The complete code is shown in Figure~\ref{boiler}. This
+boilerplate code is very specific to the JVM. If we target any
+other virtual machine or a machine language, then we would
+need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
+contain a method for object creation in the JVM; this method
+is called \emph{before} the \pcode{main}-method in Lines 10 to
+17. Interesting are the Lines 11 and 12 where we hardwire that
+the stack of our programs will never be larger than 200 and
+that the maximum number of variables is also 200. This seem to
+be conservative default values that allow is to run some
+simple While-programs. In a real compiler, we would of course
+need to work harder and find out appropriate values for the
+stack and local variables.
+
+To sum up, in Figure~\ref{test} is the complete code generated
+for the slightly non-sensical program
+
+\begin{lstlisting}[mathescape,language=While]
+x := 1 + 2;
+write x
+\end{lstlisting}
+
+\noindent Having this code at our disposal, we need the
+assembler to translate the generated code into JVM bytecode (a
+class file). This bytecode is understood by the JVM and can be
+run by just invoking the \pcode{java}-program.
+
+
+\begin{figure}[p]
+\lstinputlisting{../progs/test-small.j}
+\caption{Generated code for a test program. This code can be 
+processed by an Java assembler producing a class-file, which
+can be run by the \pcode{java}-program.\label{test}}
+\end{figure}
+
+\end{document}
+
+%%% Local Variables: 
+%%% mode: latex  
+%%% TeX-master: t
+%%% End:

author	Christian Urban <christian dot urban at kcl dot ac dot uk>
	Tue, 17 Nov 2015 15:03:08 +0000
changeset 378	e8ac05fe2630
parent 377	a052a83f562e
child 379	fa2589ec0fae

handouts/ho08.pdf		file \| annotate \| diff \| comparison \| revisions
handouts/ho08.tex		file \| annotate \| diff \| comparison \| revisions