# HG changeset patch # User Christian Urban # Date 1447772588 0 # Node ID e8ac05fe26303c74d2857e19e523eb2bc4bd863d # Parent a052a83f562e122af43455b9842ab7ac90ee47d6 updated diff -r a052a83f562e -r e8ac05fe2630 handouts/ho08.pdf Binary file handouts/ho08.pdf has changed diff -r a052a83f562e -r e8ac05fe2630 handouts/ho08.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/handouts/ho08.tex Tue Nov 17 15:03:08 2015 +0000 @@ -0,0 +1,644 @@ +\documentclass{article} +\usepackage{../style} +\usepackage{../langs} +\usepackage{../grammar} +\usepackage{../graphics} + + +\begin{document} + +\section*{Handout 8 (A Functional Language)} + +The purpose of a compiler is to transform a program, a human +can read and write, into code the machine can run as fast as +possible. The fastest code would be machine code the CPU can +run directly, but it is often enough to improve the speed of a +program by just targeting a virtual machine. This produces not +the fastest possible code, but code that is fast enough and +has the advantage that the virtual machine takes care of +things a compiler would normally need to take care of (like +explicit memory management). + +As a first example we will implement a compiler for the very +simple While-language. It will generate code for the Java +Virtual Machine (JVM). This is a stack-based virtual machine, +a fact which will make it easy to generate code for arithmetic +expressions. For example for generating code for the +expression $1 + 2$ we need to generate the following three +instructions + +\begin{lstlisting}[numbers=none] +ldc 1 +ldc 2 +iadd +\end{lstlisting} + +\noindent The first instruction loads the constant $1$ onto +the stack, the next one $2$, the third instruction adds both +numbers together replacing the top two elements of the stack +with the result $3$. For simplicity, we will throughout +consider only integer numbers and results. Therefore we can +use the JVM instructions \code{iadd}, \code{isub}, +\code{imul}, \code{idiv} and so on. The \code{i} stands for +integer instructions in the JVM (alternatives are \code{d} for +doubles, \code{l} for longs and \code{f} for floats). + +Recall our grammar for arithmetic expressions ($E$ is the +starting symbol): + + +\begin{plstx}[rhs style=, margin=3cm] : \meta{E} ::= \meta{T} $+$ \meta{E} + | \meta{T} $-$ \meta{E} + | \meta{T}\\ +: \meta{T} ::= \meta{F} $*$ \meta{T} + | \meta{F} $\backslash$ \meta{T} + | \meta{F}\\ +: \meta{F} ::= ( \meta{E} ) + | \meta{Id} + | \meta{Num}\\ \end{plstx} + + +\noindent where \meta{Id} stands for variables and \meta{Num} +for numbers. For the moment let us omit variables from +arithmetic expressions. Our parser will take this grammar and +given an input produce abstract syntax trees. For example for +the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the +following tree. + +\begin{center} +\begin{tikzpicture} \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] \end{tikzpicture} +\end{center} + +\noindent To generate code for this expression, we need to +traverse this tree in post-order fashion and emit code for +each node---this traversal in post-order fashion will produce +code for a stack-machine (what the JVM is). Doing so for the +tree above generates the instructions + +\begin{lstlisting}[numbers=none] +ldc 1 +ldc 2 +ldc 3 +imul +ldc 4 +ldc 3 +isub +iadd +iadd +\end{lstlisting} + +\noindent If we ``run'' these instructions, the result $8$ +will be on top of the stack (I leave this to you to verify; +the meaning of each instruction should be clear). The result +being on the top of the stack will be a convention we always +observe in our compiler, that is the results of arithmetic +expressions will always be on top of the stack. Note, that a +different bracketing of the expression, for example $(1 + (2 * +3)) + (4 - 3)$, produces a different abstract syntax tree and +thus potentially also a different list of instructions. +Generating code in this fashion is rather easy to implement: +it can be done with the following recursive +\textit{compile}-function, which takes the abstract syntax +tree as argument: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\ +$\textit{compile}(a_1 + a_2)$ & $\dn$ & +$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\ +$\textit{compile}(a_1 - a_2)$ & $\dn$ & +$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\ +$\textit{compile}(a_1 * a_2)$ & $\dn$ & +$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\ +$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & +$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\ +\end{tabular} +\end{center} + +However, our arithmetic expressions can also contain +variables. We will represent them as \emph{local variables} in +the JVM. Essentially, local variables are an array or pointers +to memory cells, containing in our case only integers. Looking +up a variable can be done with the instruction + +\begin{lstlisting}[mathescape,numbers=none] +iload $index$ +\end{lstlisting} + +\noindent +which places the content of the local variable $index$ onto +the stack. Storing the top of the stack into a local variable +can be done by the instruction + +\begin{lstlisting}[mathescape,numbers=none] +istore $index$ +\end{lstlisting} + +\noindent Note that this also pops off the top of the stack. +One problem we have to overcome, however, is that local +variables are addressed, not by identifiers, but by numbers +(starting from $0$). Therefore our compiler needs to maintain +a kind of environment where variables are associated to +numbers. This association needs to be unique: if we muddle up +the numbers, then we essentially confuse variables and the +consequence will usually be an erroneous result. Our extended +\textit{compile}-function for arithmetic expressions will +therefore take two arguments: the abstract syntax tree and the +environment, $E$, that maps identifiers to index-numbers. + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ +$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & +$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\ +$\textit{compile}(a_1 - a_2, E)$ & $\dn$ & +$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\ +$\textit{compile}(a_1 * a_2, E)$ & $\dn$ & +$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\ +$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & +$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\ +$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\ +\end{tabular} +\end{center} + +\noindent In the last line we generate the code for variables +where $E(x)$ stands for looking up the environment to which +index the variable $x$ maps to. + +There is a similar \textit{compile}-function for boolean +expressions, but it includes a ``trick'' to do with +\pcode{if}- and \pcode{while}-statements. To explain the issue +let us first describe the compilation of statements of the +While-language. The clause for \pcode{skip} is trivial, since +we do not have to generate any instruction + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\ +\end{tabular} +\end{center} + +\noindent $[]$ is the empty list of instructions. Note that +the \textit{compile}-function for statements returns a pair, a +list of instructions (in this case the empty list) and an +environment for variables. The reason for the environment is +that assignments in the While-language might change the +environment---clearly if a variable is used for the first +time, we need to allocate a new index and if it has been used +before, we need to be able to retrieve the associated index. +This is reflected in the clause for compiling assignments: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(x := a, E)$ & $\dn$ & +$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$ +\end{tabular} +\end{center} + +\noindent We first generate code for the right-hand side of +the assignment and then add an \pcode{istore}-instruction at +the end. By convention the result of the arithmetic expression +$a$ will be on top of the stack. After the \pcode{istore} +instruction, the result will be stored in the index +corresponding to the variable $x$. If the variable $x$ has +been used before in the program, we just need to look up what +the index is and return the environment unchanged (that is in +this case $E' = E$). However, if this is the first encounter +of the variable $x$ in the program, then we have to augment +the environment and assign $x$ with the largest index in $E$ +plus one (that is $E' = E(x \mapsto largest\_index + 1)$). +That means for the assignment $x := x + 1$ we generate the +following code + +\begin{lstlisting}[mathescape,numbers=none] +iload $n_x$ +ldc 1 +iadd +istore $n_x$ +\end{lstlisting} + +\noindent +where $n_x$ is the index for the variable $x$. + +More complicated is the code for \pcode{if}-statments, say + +\begin{lstlisting}[mathescape,language={},numbers=none] +if $b$ then $cs_1$ else $cs_2$ +\end{lstlisting} + +\noindent where $b$ is a boolean expression and the $cs_{1/2}$ +are the statements for each \pcode{if}-branch. Lets assume +we already generated code for $b$ and $cs_{1/2}$. Then in the +true-case the control-flow of the program needs to be + +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs_1$}; +\node (A3) [point, right=of cs1] {}; +\node (cs2) [block, right=of A3] {code of $cs_2$}; +\node (A4) [point, right=of cs2] {}; + +\draw (A1) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (cs1); +\draw (cs1) edge [->, black, line width=1mm] (A3); +\draw (A3) edge [->, black, skip loop] (A4); +\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}}; +\end{tikzpicture} +\end{center} + +\noindent where we start with running the code for $b$; since +we are in the true case we continue with running the code for +$cs_1$. After this however, we must not run the code for +$cs_2$, but always jump after the last instruction of $cs_2$ +(the code for the \pcode{else}-branch). Note that this jump is +unconditional, meaning we always have to jump to the end of +$cs_2$. The corresponding instruction of the JVM is +\pcode{goto}. In case $b$ turns out to be false we need the +control-flow + +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs_1$}; +\node (A3) [point, right=of cs1] {}; +\node (cs2) [block, right=of A3] {code of $cs_2$}; +\node (A4) [point, right=of cs2] {}; + +\draw (A1) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (A2); +\draw (A2) edge [skip loop] (A3); +\draw (A3) edge [->, black, line width=1mm] (cs2); +\draw (cs2) edge [->,black, line width=1mm] (A4); +\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; +\end{tikzpicture} +\end{center} + +\noindent where we now need a conditional jump (if the +if-condition is false) from the end of the code for the +boolean to the beginning of the instructions $cs_2$. Once we +are finished with running $cs_2$ we can continue with whatever +code comes after the if-statement. + +The \pcode{goto} and the conditional jumps need addresses to +where the jump should go. Since we are generating assembly +code for the JVM, we do not actually have to give (numeric) +addresses, but can just attach (symbolic) labels to our code. +These labels specify a target for a jump. Therefore the labels +need to be unique, as otherwise it would be ambiguous where a +jump should go to. A label, say \pcode{L}, is attached to code +like + +\begin{lstlisting}[mathescape,numbers=none] +L: + $instr_1$ + $instr_2$ + $\vdots$ +\end{lstlisting} + +\noindent where a label is indicated by a colon. + +Recall the ``trick'' with compiling boolean expressions: the +\textit{compile}-function for boolean expressions takes three +arguments: an abstract syntax tree, an environment for +variable indices and also the label, $lab$, to where an conditional +jump needs to go. The clause for the expression $a_1 = a_2$, +for example, is as follows: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ +\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$} +\end{tabular} +\end{center} + +\noindent where we are first generating code for the +subexpressions $a_1$ and $a_2$. This will mean after running +the corresponding code there will be two integers on top of +the stack. If they are equal, we do not have to do anything +(except for popping them off from the stack) and just continue +with the next instructions (see control-flow of ifs above). +However if they are \emph{not} equal, then we need to +(conditionally) jump to the label $lab$. This can be done with +the instruction + +\begin{lstlisting}[mathescape,numbers=none] +if_icmpne $lab$ +\end{lstlisting} + +\noindent Other jump instructions for boolean operators are + +\begin{center} +\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l} +$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\ +$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\ +$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ +\end{tabular} +\end{center} + +\noindent and so on. I leave it to you to extend the +\textit{compile}-function for the other boolean expressions. +Note that we need to jump whenever the boolean is \emph{not} +true, which means we have to ``negate'' the jump +condition---equals becomes not-equal, less becomes +greater-or-equal. If you do not like this design (it can be +the source of some nasty, hard-to-detect errors), you can also +change the layout of the code and first give the code for the +else-branch and then for the if-branch. However in the case +of while-loops this way of generating code still seems +the most convenient. + +We are now ready to give the compile function for +if-statments---remember this function returns for staments a +pair consisting of the code and an environment: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ +\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ +\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ +\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\ +\end{tabular} +\end{center} + +\noindent In the first two lines we generate two fresh labels +for the jump addresses (just before the else-branch and just +after). In the next two lines we generate the instructions for +the two branches, $is_1$ and $is_2$. The final code will +be first the code for $b$ (including the label +just-before-the-else-branch), then the \pcode{goto} for after +the else-branch, the label $L_\textit{ifesle}$, followed by +the instructions for the else-branch, followed by the +after-the-else-branch label. Consider for example the +if-statement: + +\begin{lstlisting}[mathescape,numbers=none,language={}] +if 1 = 1 then x := 2 else y := 3 +\end{lstlisting} + +\noindent +The generated code is as follows: + +\begin{lstlisting}[mathescape,language={}] + ldc 1 + ldc 1 + if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ + ldc 2 + istore 0 + goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$ +L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ + ldc 3 + istore 1 +L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ +\end{lstlisting} + +\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) + -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); + \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) + -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); \end{tikzpicture} + +\noindent The first three lines correspond to the the boolean +expression $1 = 1$. The jump for when this boolean expression +is false is in Line~3. Lines 4-6 corresponds to the if-branch; +the else-branch is in Lines 8 and 9. Note carefully how the +environment $E$ is threaded through the recursive calls of +\textit{compile}. The function receives an environment $E$, +but it might extend it when compiling the if-branch, yielding +$E'$. This happens for example in the if-statement above +whenever the variable \code{x} has not been used before. +Similarly with the environment $E''$ for the second call to +\textit{compile}. $E''$ is also the environment that needs to +be returned as part of the answer. + +The compilation of the while-loops, say +\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case +the condition is true and we need to do another iteration, +and the control-flow needs to be as follows + +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A0) [point, left=of A1] {}; +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs$}; +\node (A3) [point, right=of cs1] {}; +\node (A4) [point, right=of A3] {}; + +\draw (A0) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (cs1); +\draw (cs1) edge [->, black, line width=1mm] (A3); +\draw (A3) edge [->,skip loop] (A1); +\end{tikzpicture} +\end{center} + +\noindent Whereas if the condition is \emph{not} true, we +need to jump out of the loop, which gives the following +control flow. + +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A0) [point, left=of A1] {}; +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs$}; +\node (A3) [point, right=of cs1] {}; +\node (A4) [point, right=of A3] {}; + +\draw (A0) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (A2); +\draw (A2) edge [skip loop] (A3); +\draw (A3) edge [->, black, line width=1mm] (A4); +\end{tikzpicture} +\end{center} + +\noindent Again we can use the \textit{compile}-function for +boolean expressions to insert the appropriate jump to the +end of the loop (label $L_{wend}$ below). + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ +\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ +\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\ +\end{tabular} +\end{center} + +\noindent I let you go through how this clause works. As an example +you can consider the while-loop + +\begin{lstlisting}[mathescape,numbers=none,language={}] +while x <= 10 do x := x + 1 +\end{lstlisting} + +\noindent yielding the following code + +\begin{lstlisting}[mathescape,language={}] +L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ + iload 0 + ldc 10 + if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ + iload 0 + ldc 1 + iadd + istore 0 + goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ +L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ +\end{lstlisting} + +\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) + -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); + \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) + -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); \end{tikzpicture} + + +Next we need to consider the statement \pcode{write x}, which +can be used to print out the content of a variable. For this +we need to use a Java library function. In order to avoid +having to generate a lot of code for each +\pcode{write}-command, we use a separate helper-method and +just call this method with an argument (which needs to be +placed onto the stack). The code of the helper-method is as +follows. + + +\begin{lstlisting}[language=JVMIS] +.method public static write(I)V + .limit locals 1 + .limit stack 2 + getstatic java/lang/System/out Ljava/io/PrintStream; + iload 0 + invokevirtual java/io/PrintStream/println(I)V + return +.end method +\end{lstlisting} + +\noindent The first line marks the beginning of the method, +called \pcode{write}. It takes a single integer argument +indicated by the \pcode{(I)} and returns no result, indicated +by the \pcode{V}. Since the method has only one argument, we +only need a single local variable (Line~2) and a stack with +two cells will be sufficient (Line 3). Line 4 instructs the +JVM to get the value of the field \pcode{out} of the class +\pcode{java/lang/System}. It expects the value to be of type +\pcode{java/io/PrintStream}. A reference to this value will be +placed on the stack. Line~5 copies the integer we want to +print out onto the stack. In the next line we call the method +\pcode{println} (from the class \pcode{java/io/PrintStream}). +We want to print out an integer and do not expect anything +back (that is why the type annotation is \pcode{(I)V}). The +\pcode{return}-instruction in the next line changes the +control-flow back to the place from where \pcode{write} was +called. This method needs to be part of a header that is +included in any code we generate. The helper-method +\pcode{write} can be invoked with the two instructions + +\begin{lstlisting}[mathescape,language=JVMIS] +iload $E(x)$ +invokestatic XXX/XXX/write(I)V +\end{lstlisting} + +\noindent where we first place the variable to be printed on +top of the stack and then call \pcode{write}. The \pcode{XXX} +need to be replaced by an appropriate class name (this will be +explained shortly). + + +\begin{figure}[t] +\begin{lstlisting}[mathescape,language=JVMIS] +.class public XXX.XXX +.super java/lang/Object + +.method public ()V + aload_0 + invokenonvirtual java/lang/Object/()V + return +.end method + +.method public static main([Ljava/lang/String;)V + .limit locals 200 + .limit stack 200 + + $\textit{\ldots{}here comes the compiled code\ldots}$ + + return +.end method +\end{lstlisting} +\caption{Boilerplate code needed for running generated code.\label{boiler}} +\end{figure} + + +By generating code for a While-program, we end up with a list +of (JVM assembly) instructions. Unfortunately, there is a bit +more boilerplate code needed before these instructions can be +run. The complete code is shown in Figure~\ref{boiler}. This +boilerplate code is very specific to the JVM. If we target any +other virtual machine or a machine language, then we would +need to change this code. Lines 4 to 8 in Figure~\ref{boiler} +contain a method for object creation in the JVM; this method +is called \emph{before} the \pcode{main}-method in Lines 10 to +17. Interesting are the Lines 11 and 12 where we hardwire that +the stack of our programs will never be larger than 200 and +that the maximum number of variables is also 200. This seem to +be conservative default values that allow is to run some +simple While-programs. In a real compiler, we would of course +need to work harder and find out appropriate values for the +stack and local variables. + +To sum up, in Figure~\ref{test} is the complete code generated +for the slightly non-sensical program + +\begin{lstlisting}[mathescape,language=While] +x := 1 + 2; +write x +\end{lstlisting} + +\noindent Having this code at our disposal, we need the +assembler to translate the generated code into JVM bytecode (a +class file). This bytecode is understood by the JVM and can be +run by just invoking the \pcode{java}-program. + + +\begin{figure}[p] +\lstinputlisting{../progs/test-small.j} +\caption{Generated code for a test program. This code can be +processed by an Java assembler producing a class-file, which +can be run by the \pcode{java}-program.\label{test}} +\end{figure} + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: