% !TEX program = xelatex\documentclass{article}\usepackage{../style}\usepackage{../langs}\usepackage{../grammar}\usepackage{../graphics}\begin{document}\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}\section*{Handout 7 (Compilation)}The purpose of a compiler is to transform a program a human can read andwrite into code the machine can run as fast as possible. The fastestcode would be machine code the CPU can run directly, but it is oftengood enough for improving the speed of a program to target avirtual machine. This produces not the fastest possible code, but codethat is often pretty fast. This way of producing code has the advantage thatthe virtual machine takes care of things a compiler would normally needto take care of (like explicit memory management). As a first example in this module we will implement a compiler for thevery simple While-language. It will generate code for the Java VirtualMachine (JVM). Unfortunately the Java ecosystem does not come with anassembler which would be handy for our compiler-endeavour (unlikeMicrosoft's Common Language Infrastructure for the .Net platform whichhas an assembler out-of-the-box). As a substitute we use in this modulethe 3rd-party programs Jasmin and Krakatau\begin{itemize} \item \url{http://jasmin.sourceforge.net} \item \url{https://github.com/Storyyeller/Krakatau}\end{itemize}\noindentThe first is a Java program and the second a program written in Python.Each of them allow us to generate \emph{assembly} files that are stillreadable by humans, as opposed to class-files which are pretty much just(horrible) zeros and ones. Jasmin (respectively Krakatau) will then takean assembly file as input and generate the corresponding class file forus. Good about the JVM is that it is a stack-based virtual machine, a factwhich will make it easy to generate code for arithmetic expressions. Forexample when compiling the expression $1 + 2$ we need to generate thefollowing three instructions\begin{lstlisting}[language=JVMIS,numbers=none]ldc 1ldc 2iadd \end{lstlisting}\noindent The first instruction loads the constant $1$ ontothe stack, the next one loads $2$, the third instruction adds bothnumbers together replacing the top two elements of the stackwith the result $3$. For simplicity, we will throughoutconsider only integer numbers. Therefore we canuse the JVM instructions \code{iadd}, \code{isub},\code{imul}, \code{idiv} and so on. The \code{i} stands forinteger instructions in the JVM (alternatives are \code{d} fordoubles, \code{l} for longs and \code{f} for floats).Recall our grammar for arithmetic expressions (\meta{E} is thestarting symbol):\begin{plstx}[rhs style=, margin=3cm]: \meta{E} ::= \meta{T} $+$ \meta{E} | \meta{T} $-$ \meta{E} | \meta{T}\\: \meta{T} ::= \meta{F} $*$ \meta{T} | \meta{F} $\backslash$ \meta{T} | \meta{F}\\: \meta{F} ::= ( \meta{E} ) | \meta{Id} | \meta{Num}\\\end{plstx}\noindent where \meta{Id} stands for variables and \meta{Num}for numbers. For the moment let us omit variables from arithmeticexpressions. Our parser will take this grammar and given an inputproduce abstract syntax trees. For example we will obtain for theexpression $1 + ((2 * 3) + (4 - 3))$ the following tree.\begin{center}\begin{tikzpicture}\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]\end{tikzpicture}\end{center}\noindent To generate JVM code for this expression, we need totraverse this tree in post-order fashion and emit code foreach node---this traversal in post-order fashion will producecode for a stack-machine (what the JVM is). Doing so for thetree above generates the instructions\begin{lstlisting}[language=JVMIS,numbers=none]ldc 1 ldc 2 ldc 3 imul ldc 4 ldc 3 isub iadd iadd\end{lstlisting}\noindent If we ``run'' these instructions, the result $8$ will be ontop of the stack (I leave this to you to verify; the meaning of eachinstruction should be clear). The result being on the top of the stackwill be an important convention we always observe in our compiler. Note,that a different bracketing of the expression, for example $(1 + (2 *3)) + (4 - 3)$, produces a different abstract syntax tree and thus alsoa different list of instructions. Generating code in thispost-order-traversal fashion is rather easy to implement: it can be donewith the following recursive \textit{compile}-function, which takes theabstract syntax tree as argument:\begin{center}\begin{tabular}{lcl}$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\$\textit{compile}(a_1 + a_2)$ & $\dn$ &$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\$\textit{compile}(a_1 - a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\$\textit{compile}(a_1 * a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\\end{tabular}\end{center}However, our arithmetic expressions can also containvariables. We will represent them as \emph{local variables} inthe JVM. Essentially, local variables are an array or pointersto memory cells, containing in our case only integers. Lookingup a variable can be done with the instruction\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]iload $index$\end{lstlisting}\noindent which places the content of the local variable $index$ onto the stack. Storing the top of the stack into a local variable can be done by the instruction\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]istore $index$\end{lstlisting}\noindent Note that this also pops off the top of the stack.One problem we have to overcome, however, is that localvariables are addressed, not by identifiers, but by numbers(starting from $0$). Therefore our compiler needs to maintaina kind of environment where variables are associated tonumbers. This association needs to be unique: if we muddle upthe numbers, then we essentially confuse variables and theconsequence will usually be an erroneous result. Our extended\textit{compile}-function for arithmetic expressions willtherefore take two arguments: the abstract syntax tree and anenvironment, $E$, that maps identifiers to index-numbers.\begin{center}\begin{tabular}{lcl}$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\\end{tabular}\end{center}\noindent In the last line we generate the code for variableswhere $E(x)$ stands for looking up the environment to whichindex the variable $x$ maps to. This is similar to an interpreter,which also needs an environment: the difference is that the interpreter maintains a mapping from variables to current values (what is thecurrently the value of a variable), while compilers need a mappingfrom variables to memory locations (where can I find the current value for the variable in memory).There is a similar \textit{compile}-function for booleanexpressions, but it includes a ``trick'' to do with\pcode{if}- and \pcode{while}-statements. To explain the issuelet us first describe the compilation of statements of theWhile-language. The clause for \pcode{skip} is trivial, sincewe do not have to generate any instruction\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\\end{tabular}\end{center}\noindent whereby $[]$ is the empty list of instructions. Note thatthe \textit{compile}-function for statements returns a pair, alist of instructions (in this case the empty list) and anenvironment for variables. The reason for the environment isthat assignments in the While-language might change theenvironment---clearly if a variable is used for the firsttime, we need to allocate a new index and if it has been usedbefore, then we need to be able to retrieve the associated index.This is reflected in the clause for compiling assignments, say$\textit{x := a}$:\begin{center}\begin{tabular}{lcl}$\textit{compile}(x := a, E)$ & $\dn$ & $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$\end{tabular}\end{center}\noindent We first generate code for the right-hand side ofthe assignment and then add an \pcode{istore}-instruction atthe end. By convention the result of the arithmetic expression$a$ will be on top of the stack. After the \pcode{istore}instruction, the result will be stored in the indexcorresponding to the variable $x$. If the variable $x$ hasbeen used before in the program, we just need to look up whatthe index is and return the environment unchanged (that is inthis case $E' = E$). However, if this is the first encounter of the variable $x$ in the program, then we have to augment the environment and assign $x$ with the largest index in $E$plus one (that is $E' = E(x \mapsto largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we generate thefollowing code\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]iload $n_x$ldc 1iaddistore $n_x$\end{lstlisting}\noindent where $n_x$ is the index (or pointer to the memory) for the variable$x$. The code for looking-up the index for the variable is as follow:\begin{center}\begin{tabular}{lcl}$index \;=\; E\textit{.getOrElse}(x, |E|)$\end{tabular}\end{center}\noindentIn case the environment $E$ contains an index for $x$, we return it.Otherwise we ``create'' a new index by returning the size $|E|$ of theenvironment (that will be an index that is guaranteed to be not usedyet). In all this we take advantage of the JVM which provides us with a potentially limitless supply of places where we can store valuesof variables.A bit more complicated is the generation of code for\pcode{if}-statements, say\begin{lstlisting}[mathescape,language={},numbers=none]if $b$ then $cs_1$ else $cs_2$\end{lstlisting}\noindent where $b$ is a boolean expression and where both $cs_{1/2}$are the statements for each of the \pcode{if}-branches. Lets assume wealready generated code for $b$ and $cs_{1/2}$. Then in the true-case thecontrol-flow of the program needs to behave as \begin{center}\begin{tikzpicture}[node distance=2mm and 4mm, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs_1$};\node (A3) [point, right=of cs1] {};\node (cs2) [block, right=of A3] {code of $cs_2$};\node (A4) [point, right=of cs2] {};\draw (A1) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (cs1);\draw (cs1) edge [->, black, line width=1mm] (A3);\draw (A3) edge [->, black, skip loop] (A4);\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};\end{tikzpicture}\end{center}\noindent where we start with running the code for $b$; sincewe are in the true case we continue with running the code for$cs_1$. After this however, we must not run the code for$cs_2$, but always jump after the last instruction of $cs_2$(the code for the \pcode{else}-branch). Note that this jump isunconditional, meaning we always have to jump to the end of$cs_2$. The corresponding instruction of the JVM is\pcode{goto}. In case $b$ turns out to be false we need thecontrol-flow\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs_1$};\node (A3) [point, right=of cs1] {};\node (cs2) [block, right=of A3] {code of $cs_2$};\node (A4) [point, right=of cs2] {};\draw (A1) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (A2);\draw (A2) edge [skip loop] (A3);\draw (A3) edge [->, black, line width=1mm] (cs2);\draw (cs2) edge [->,black, line width=1mm] (A4);\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};\end{tikzpicture}\end{center}\noindent where we now need a conditional jump (if theif-condition is false) from the end of the code for the boolean to the beginning of the instructions $cs_2$. Once we are finished with running $cs_2$ we can continue with whatevercode comes after the if-statement.The \pcode{goto} and the conditional jumps need addresses towhere the jump should go. Since we are generating assemblycode for the JVM, we do not actually have to give (numeric)addresses, but can just attach (symbolic) labels to our code.These labels specify a target for a jump. Therefore the labelsneed to be unique, as otherwise it would be ambiguous where ajump should go to. A label, say \pcode{L}, is attached to codelike\begin{lstlisting}[mathescape,numbers=none]L: $instr_1$ $instr_2$ $\vdots$\end{lstlisting}\noindent where a label is indicated by a colon. The task of theassmbler (in our case Jasmin or Krakatau) is to resolve the labelsto actual addresses, for example jump 10 instructions forward,or 20 instructions backwards.Recall the ``trick'' with compiling boolean expressions: the \textit{compile}-function for boolean expressions takes threearguments: an abstract syntax tree, an environment for variable indices and also the label, $lab$, to where an conditional jump needs to go. The clause for the expression $a_1 = a_2$, for example, is as follows:\begin{center}\begin{tabular}{lcl}$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}\end{tabular}\end{center}\noindent where we are first generating code for thesubexpressions $a_1$ and $a_2$. This will mean after runningthe corresponding code there will be two integers on top ofthe stack. If they are equal, we do not have to do anything(except for popping them off from the stack) and just continuewith the next instructions (see control-flow of ifs above).However if they are \emph{not} equal, then we need to(conditionally) jump to the label $lab$. This can be done withthe instruction\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]if_icmpne $lab$\end{lstlisting}\noindent Other jump instructions for boolean operators are\begin{center}\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\\end{tabular}\end{center}\noindent and so on. I leave it to you to extend the\textit{compile}-function for the other boolean expressions. Note thatwe need to jump whenever the boolean is \emph{not} true, which means wehave to ``negate'' the jump condition---equals becomes not-equal, lessbecomes greater-or-equal. If you do not like this design (it can be thesource of some nasty, hard-to-detect errors), you can also change thelayout of the code and first give the code for the else-branch and thenfor the if-branch. However in the case of while-loops this``upside-down-inside-out'' way of generating code still seems the mostconvenient.We are now ready to give the compile function for if-statements---remember this function returns for statements a pair consisting of the code and an environment:\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\\end{tabular}\end{center}\noindent In the first two lines we generate two fresh labelsfor the jump addresses (just before the else-branch and justafter). In the next two lines we generate the instructions forthe two branches, $is_1$ and $is_2$. The final code willbe first the code for $b$ (including the label just-before-the-else-branch), then the \pcode{goto} for afterthe else-branch, the label $L_\textit{ifesle}$, followed bythe instructions for the else-branch, followed by the after-the-else-branch label. Consider for example the if-statement:\begin{lstlisting}[mathescape,numbers=none,language=While]if 1 = 1 then x := 2 else y := 3\end{lstlisting}\noindent The generated code is as follows:\begin{lstlisting}[language=JVMIS,mathescape,numbers=left] ldc 1 ldc 1 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ ldc 2 istore 0 goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ ldc 3 istore 1L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$\end{lstlisting}\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);\end{tikzpicture}\noindent The first three lines correspond to the the booleanexpression $1 = 1$. The jump for when this boolean expressionis false is in Line~3. Lines 4-6 corresponds to the if-branch;the else-branch is in Lines 8 and 9. Note carefully how theenvironment $E$ is threaded through the recursive calls of\textit{compile}. The function receives an environment $E$,but it might extend it when compiling the if-branch, yielding$E'$. This happens for example in the if-statement abovewhenever the variable \code{x} has not been used before.Similarly with the environment $E''$ for the second call to\textit{compile}. $E''$ is also the environment that needs tobe returned as part of the answer.The compilation of the while-loops, say \pcode{while} $b$ \pcode{do} $cs$, is very similar. In casethe condition is true and we need to do another iteration, and the control-flow needs to be as follows\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A0) [point, left=of A1] {};\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs$};\node (A3) [point, right=of cs1] {};\node (A4) [point, right=of A3] {};\draw (A0) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (cs1);\draw (cs1) edge [->, black, line width=1mm] (A3);\draw (A3) edge [->,skip loop] (A1);\end{tikzpicture}\end{center}\noindent Whereas if the condition is \emph{not} true, weneed to jump out of the loop, which gives the followingcontrol flow.\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A0) [point, left=of A1] {};\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs$};\node (A3) [point, right=of cs1] {};\node (A4) [point, right=of A3] {};\draw (A0) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (A2);\draw (A2) edge [skip loop] (A3);\draw (A3) edge [->, black, line width=1mm] (A4);\end{tikzpicture}\end{center}\noindent Again we can use the \textit{compile}-function forboolean expressions to insert the appropriate jump to theend of the loop (label $L_{wend}$ below).\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\\end{tabular}\end{center}\noindent I let you go through how this clause works. As an exampleyou can consider the while-loop\begin{lstlisting}[mathescape,numbers=none,language=While]while x <= 10 do x := x + 1\end{lstlisting}\noindent yielding the following code\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ iload 0 ldc 10 if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ iload 0 ldc 1 iadd istore 0 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$\end{lstlisting}\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);\end{tikzpicture}\noindentI leave it to you to read the code and follow its controlflow.Next we need to consider the statement \pcode{write x}, whichcan be used to print out the content of a variable. For thiswe need to use a Java library function. In order to avoidhaving to generate a lot of code for each\pcode{write}-command, we use a separate helper-method andjust call this method with an argument (which needs to beplaced onto the stack). The code of the helper-method is asfollows.\begin{lstlisting}[language=JVMIS,numbers=left].method public static write(I)V .limit locals 1 .limit stack 2 getstatic java/lang/System/out Ljava/io/PrintStream; iload 0 invokevirtual java/io/PrintStream/println(I)V return .end method\end{lstlisting}\noindent The first line marks the beginning of the method,called \pcode{write}. It takes a single integer argumentindicated by the \pcode{(I)} and returns no result, indicatedby the \pcode{V}. Since the method has only one argument, weonly need a single local variable (Line~2) and a stack withtwo cells will be sufficient (Line 3). Line 4 instructs theJVM to get the value of the field \pcode{out} of the class\pcode{java/lang/System}. It expects the value to be of type\pcode{java/io/PrintStream}. A reference to this value will beplaced on the stack. Line~5 copies the integer we want toprint out onto the stack. In the next line we call the method\pcode{println} (from the class \pcode{java/io/PrintStream}).We want to print out an integer and do not expect anythingback (that is why the type annotation is \pcode{(I)V}). The\pcode{return}-instruction in the next line changes thecontrol-flow back to the place from where \pcode{write} wascalled. This method needs to be part of a header that isincluded in any code we generate. The helper-method\pcode{write} can be invoked with the two instructions\begin{lstlisting}[mathescape,language=JVMIS]iload $E(x)$ invokestatic XXX/XXX/write(I)V\end{lstlisting}\noindent where we first place the variable to be printed ontop of the stack and then call \pcode{write}. The \pcode{XXX}need to be replaced by an appropriate class name (this will beexplained shortly).\begin{figure}[t]\begin{lstlisting}[mathescape,language=JVMIS,numbers=left].class public XXX.XXX.super java/lang/Object.method public <init>()V aload_0 invokenonvirtual java/lang/Object/<init>()V return.end method.method public static main([Ljava/lang/String;)V .limit locals 200 .limit stack 200 $\textit{\ldots{}here comes the compiled code\ldots}$ return.end method\end{lstlisting}\caption{Boilerplate code needed for running generated code.\label{boiler}}\end{figure}By generating code for a While-program, we end up with a listof (JVM assembly) instructions. Unfortunately, there is a bitmore boilerplate code needed before these instructions can berun. The complete code is shown in Figure~\ref{boiler}. Thisboilerplate code is very specific to the JVM. If we target anyother virtual machine or a machine language, then we wouldneed to change this code. Lines 4 to 8 in Figure~\ref{boiler}contain a method for object creation in the JVM; this methodis called \emph{before} the \pcode{main}-method in Lines 10 to17. Interesting are the Lines 11 and 12 where we hardwire thatthe stack of our programs will never be larger than 200 andthat the maximum number of variables is also 200. This seem tobe conservative default values that allow is to run somesimple While-programs. In a real compiler, we would of courseneed to work harder and find out appropriate values for thestack and local variables.To sum up, in Figure~\ref{test} is the complete code generatedfor the slightly nonsensical program\begin{lstlisting}[mathescape,language=While]x := 1 + 2;write x\end{lstlisting}\noindent I let you read the code and make sure the code behaves asexpected. Having this code at our disposal, we need the assembler totranslate the generated code into JVM bytecode (a class file). Thisbytecode is then understood by the JVM and can be run by just invokingthe \pcode{java}-program.\begin{figure}[p]\lstinputlisting[language=JVMIS]{../progs/test-small.j}\caption{Generated code for a test program. This code can be processed by an Java assembler producing a class-file, whichcan be run by the {\tt{}java}-program.\label{test}}\end{figure}\subsection*{Arrays}Maybe a useful addition to the While-language would be arrays. Thiswould let us generate more interesting While-programs by translatingBF*** programs into equivalent While-code. So in this section lets havea look at how we can support the following three constructions\begin{lstlisting}[mathescape,language=While]new arr[15000]x := 3 + arr[3 + y]arr[42 * n] := ...\end{lstlisting}\noindentThe first construct is for creating new arrays, in this instance thename of the array is \pcode{arr} and it can hold 15000 integers. Thesecond is for referencing an array cell inside an arithmeticexpression---we need to be able to look up the contents of an array atan index determined by an arithmetic expression. Similarly in the linebelow, we need to be able to update the content of an array at ancalculated index. For creating a new array we can generate the following three JVMinstructions:\begin{lstlisting}[mathescape,language=JVMIS]ldc number newarray int astore loc_var\end{lstlisting}\noindentFirst we need to put the dimension of the array onto the stack. The nextinstruction creates the array. With the last we can store the array as alocal variable (like the ``simple'' variables from the previoussection). The use of a local variable for each array allows us to havemultiple arrays in a While-program. For looking up an element in anarray we can use the following JVM code\begin{lstlisting}[mathescape,language=JVMIS]aload loc_var index_aexp iaload\end{lstlisting}\noindentThe first instruction loads the ``pointer'' to the array onto the stack.Then we have some instructions corresponding to the index where we wantto look up the array. The idea is that these instructions will leave aconcrete number on the stack, which will be the index into the array weneed. Finally we need to tell the JVM to load the corresponding elementonto the stack. Updating an array at an index with a value is asfollows.\begin{lstlisting}[mathescape,language=JVMIS]aload loc_var index_aexp value_aexp iastore\end{lstlisting}\noindentAgain the first instruction loads the ``pointer'' to the array onto thestack. Then we have some instructions corresponding to the index wherewe want to update the array. After that come the instructions for withwhat value we want to update the array. The last line contains theinstruction for updating the array.Next we need to modify our grammar rules for our While-language: itseems best to extend the rule for factors in arithmetic expressions witha rule for looking up an array.\begin{plstx}[rhs style=, margin=3cm]: \meta{E} ::= \meta{T} $+$ \meta{E} | \meta{T} $-$ \meta{E} | \meta{T}\\: \meta{T} ::= \meta{F} $*$ \meta{T} | \meta{F} $\backslash$ \meta{T} | \meta{F}\\: \meta{F} ::= ( \meta{E} ) | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$ | \meta{Id} | \meta{Num}\\\end{plstx}\noindentThere is no problem with left-recursion as the \meta{E} is ``protected''by an identifier and the brackets. There are two new rules for statements,one for creating an array and one for array assignment:\begin{plstx}[rhs style=, margin=2cm, one per line]: \meta{Stmt} ::= \ldots | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\\end{plstx}With this in place we can turn back to the idea of creating Whileprograms by translating BF programs. This is a relatively easy taskbecause BF only has eight instructions (we will actually only have sevenbecause we can omit the read-in instruction from BF). But also translatingBF-loops is going to be easy since they straightforwardly can be represented by While-loops. The Scala code for the translation isas follows:\begin{lstlisting}[language=Scala,numbers=left]def instr(c: Char) : String = c match { case '>' => "ptr := ptr + 1;" case '<' => "ptr := ptr - 1;" case '+' => "field[ptr] := field[ptr] + 1;" case '-' => "field[ptr] := field[ptr] - 1;" case '.' => "x := field[ptr]; write x;" case '[' => "while (field[ptr] != 0) do {" case ']' => "skip};" case _ => ""}\end{lstlisting}\noindent The idea behind the translation is that BF-programs operate on an array,called \texttt{field}. The BP-memory pointer into this array isrepresented as the variable \texttt{ptr}. The BF-instructions \code{>}and \code{<} increase, respectively decrease, \texttt{ptr}. Theinstructions \code{+} and \code{-} update a cell in \texttt{field}. InLine 6 we need to first assign a field-cell to an auxiliary variablesince we have not changed our write functions in order to cope withwriting out any array-content directly. Lines 7 and 8 are fortranslating BF-loops. Line 8 is interesting in the sense that we need togenerate a \code{skip} instruction just before finishing with the closing \code{"\}"}. The reason is that we are rather pedantic aboutsemicolons in our While-grammar: the last command cannot have asemicolon---adding a \code{skip} works around this snag. Puttingall this together is we can generate While-programs with more than400 instructions and then run the compiled JVM code for such programs.\end{document}%%% Local Variables: %%% mode: latex %%% TeX-master: t%%% End: