% !TEX program = xelatex\documentclass{article}\usepackage{../style}\usepackage{../langs}\usepackage{../grammar}\usepackage{../graphics}\usetikzlibrary{calc,shapes,arrows}\usepackage{framed}\usepackage[belowskip=7pt,aboveskip=0pt]{caption}\begin{document}\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020}\section*{Handout 7 (Compilation)}The purpose of a compiler is to transform a program a human can read andwrite into code the machine can run as fast as possible. The fastestcode would be machine code the CPU can run directly, but it is oftengood enough for improving the speed of a program to target a virtualmachine instead. This produces not the fastest possible code, but codethat is often pretty fast. This way of producing code has also theadvantage that the virtual machine takes care of things a compiler wouldnormally need to take care of (hairy things like explicit memorymanagement). As a first example in this module we will implement a compiler for thevery simple WHILE-language that we parsed in the last lecture. Thecompiler will target the Java Virtual Machine (JVM), but not directly.Pictorially the compiler will work as follows:\begin{center} \begin{tikzpicture}[scale=1,font=\bf, node/.style={ rectangle,rounded corners=3mm, ultra thick,draw=black!50,minimum height=18mm, minimum width=20mm, top color=white,bottom color=black!20}] \node (0) at (-3,0) {}; \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler}; \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau}; \node (C) at (7.5,0) [node] {JVM}; \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A); \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B); \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C); \end{tikzpicture} \end{center}\noindentThe input will be WHILE-programs; the output will be assembly files(with the file extension .j). Assembly files essentially containhuman-readable low-level code, meaning they are not just bits and bytes,but rather something you can read and understand---with a bit ofpractice of course. An \emph{assembler} will then translate the assemblyfiles into unreadable class- or binary-files the JVM or CPU can run.Unfortunately, the Java ecosystem does not come with an assembler whichwould be handy for our compiler-endeavour (unlike Microsoft's CommonLanguage Infrastructure for the .Net platform which has an assemblerout-of-the-box). As a substitute we shall use the 3rd-party programsJasmin and Krakatau \begin{itemize} \item \url{http://jasmin.sourceforge.net} \item \url{https://github.com/Storyyeller/Krakatau}\end{itemize}\noindentThe first is a Java program and the second a program written in Python.Each of them allow us to generate \emph{assembly} files that are stillreadable by humans, as opposed to class-files which are pretty much just(horrible) zeros and ones. Jasmin (respectively Krakatau) will then takeour assembly files as input and generate the corresponding class-files forus. What is good about the JVM is that it is a stack-based virtual machine,a fact which will make it easy to generate code for arithmeticexpressions. For example when compiling the expression $1 + 2$ we needto generate the following three instructions\begin{lstlisting}[language=JVMIS,numbers=none]ldc 1ldc 2iadd \end{lstlisting}\noindent The first instruction loads the constant $1$ onto the stack,the next one loads $2$, the third instruction adds both numbers togetherreplacing the top two elements of the stack with the result $3$. Forsimplicity, we will consider throughout only arithmetic involvinginteger numbers. This means our main JVM instructions for arithmeticwill be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.The \code{i} stands for integer instructions in the JVM (alternativesare \code{d} for doubles, \code{l} for longs and \code{f} for floatsetc).Recall our grammar for arithmetic expressions (\meta{E} is thestarting symbol):\begin{plstx}[rhs style=, margin=3cm]: \meta{E} ::= \meta{T} $+$ \meta{E} | \meta{T} $-$ \meta{E} | \meta{T}\\: \meta{T} ::= \meta{F} $*$ \meta{T} | \meta{F} $\backslash$ \meta{T} | \meta{F}\\: \meta{F} ::= ( \meta{E} ) | \meta{Id} | \meta{Num}\\\end{plstx}\noindent where \meta{Id} stands for variables and \meta{Num}for numbers. For the moment let us omit variables from arithmeticexpressions. Our parser will take this grammar and given an inputprogram produce an abstract syntax tree. For example we obtain forthe expression $1 + ((2 * 3) + (4 - 3))$ the following tree.\begin{center}\begin{tikzpicture}\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]\end{tikzpicture}\end{center}\noindent To generate JVM code for this expression, we need to traversethis tree in \emph{post-order} fashion and emit code for eachnode---this traversal in \emph{post-order} fashion will produce code fora stack-machine (which is what the JVM is). Doing so for the tree abovegenerates the instructions\begin{lstlisting}[language=JVMIS,numbers=none]ldc 1 ldc 2 ldc 3 imul ldc 4 ldc 3 isub iadd iadd\end{lstlisting}\noindent If we ``run'' these instructions, the result $8$ will be ontop of the stack (I leave this to you to verify; the meaning of eachinstruction should be clear). The result being on the top of the stackwill be an important convention we always observe in our compiler. Note,that a different bracketing of the expression, for example $(1 + (2 *3)) + (4 - 3)$, produces a different abstract syntax tree and thus alsoa different list of instructions. Generating code in this post-order-traversal fashion is rather easy toimplement: it can be done with the following recursive\textit{compile}-function, which takes the abstract syntax tree as anargument:\begin{center}\begin{tabular}{lcl}$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\$\textit{compile}(a_1 + a_2)$ & $\dn$ &$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\$\textit{compile}(a_1 - a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\$\textit{compile}(a_1 * a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\\end{tabular}\end{center}\noindentThis is all fine, but our arithmetic expressions can contain variablesand we have not considered them yet. To fix this we will represent ourvariables as \emph{local variables} of the JVM. Essentially, localvariables are an array or pointers to memory cells, containing in ourcase only integers. Looking up a variable can be done with theinstruction\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]iload $index$\end{lstlisting}\noindent which places the content of the local variable $index$ onto the stack. Storing the top of the stack into a local variable can be done by the instruction\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]istore $index$\end{lstlisting}\noindent Note that this also pops off the top of the stack. One problemwe have to overcome, however, is that local variables are addressed, notby identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers(starting from $0$). Therefore our compiler needs to maintain a kind ofenvironment where variables are associated to numbers. This associationneeds to be unique: if we muddle up the numbers, then we essentiallyconfuse variables and the consequence will usually be an erroneousresult. Our extended \textit{compile}-function for arithmeticexpressions will therefore take two arguments: the abstract syntax treeand an environment, $E$, that maps identifiers to index-numbers.\begin{center}\begin{tabular}{lcl}$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\\end{tabular}\end{center}\noindent In the last line we generate the code for variables where$E(x)$ stands for looking up the environment to which index the variable$x$ maps to. This is similar to the interpreter we saw earlier in themodule, which also needs an environment: the difference is that theinterpreter maintains a mapping from variables to current values (whatis the currently the value of a variable?), while compilers need amapping from variables to memory locations (where can I find the currentvalue for the variable in memory?).There is a similar \textit{compile}-function for booleanexpressions, but it includes a ``trick'' to do with\pcode{if}- and \pcode{while}-statements. To explain the issuelet us first describe the compilation of statements of theWHILE-language. The clause for \pcode{skip} is trivial, sincewe do not have to generate any instruction\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\\end{tabular}\end{center}\noindent whereby $[]$ is the empty list of instructions. Note thatthe \textit{compile}-function for statements returns a pair, alist of instructions (in this case the empty list) and anenvironment for variables. The reason for the environment isthat assignments in the WHILE-language might change theenvironment---clearly if a variable is used for the firsttime, we need to allocate a new index and if it has been usedbefore, then we need to be able to retrieve the associated index.This is reflected in the clause for compiling assignments, say$x := a$:\begin{center}\begin{tabular}{lcl}$\textit{compile}(x := a, E)$ & $\dn$ & $(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$\end{tabular}\end{center}\noindent We first generate code for the right-hand side of theassignment (that is the arithmetic expression $a$) and then add an\instr{istore}-instruction at the end. By convention running the codefor the arithmetic expression $a$ will leave the result on top of thestack. After that the \instr{istore}-instruction, the result will bestored in the index corresponding to the variable $x$. If the variable$x$ has been used before in the program, we just need to look up whatthe index is and return the environment unchanged (that is in this case$E' = E$). However, if this is the first encounter of the variable $x$in the program, then we have to augment the environment and assign $x$with the largest index in $E$ plus one (that is $E' = E(x \mapstolargest\_index + 1)$). To sum up, for the assignment $x := x + 1$ wegenerate the following code snippet\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]iload $n_x$ldc 1iaddistore $n_x$\end{lstlisting}\noindent where $n_x$ is the index (or pointer to the memory) for the variable$x$. The Scala code for looking-up the index for the variable is as follow:\begin{center}\begin{tabular}{lcl}$index \;=\; E\textit{.getOrElse}(x, |E|)$\end{tabular}\end{center}\noindentThis implements the idea that in case the environment $E$ contains anindex for $x$, we return it. Otherwise we ``create'' a new index byreturning the size $|E|$ of the environment (that will be an index thatis guaranteed not to be used yet). In all this we take advantage of theJVM which provides us with a potentially limitless supply of placeswhere we can store values of variables.A bit more complicated is the generation of code for\pcode{if}-statements, say\begin{lstlisting}[mathescape,language={WHILE},numbers=none]if $b$ then $cs_1$ else $cs_2$\end{lstlisting}\noindent where $b$ is a boolean expression and where both $cs_{1/2}$are the statements for each of the \pcode{if}-branches. Let us assume wealready generated code for $b$ and and the two if-branches $cs_{1/2}$.Then in the true-case the control-flow of the program needs to behave as\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, top color=white,bottom color=black!20}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs_1$};\node (A3) [point, right=of cs1] {};\node (cs2) [block, right=of A3] {code of $cs_2$};\node (A4) [point, right=of cs2] {};\draw (A1) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (cs1);\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);\draw (A3) edge [->, black, skip loop] (A4);\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};\end{tikzpicture}\end{center}\noindent where we start with running the code for $b$; sincewe are in the true case we continue with running the code for$cs_1$. After this however, we must not run the code for$cs_2$, but always jump to after the last instruction of $cs_2$(the code for the \pcode{else}-branch). Note that this jump isunconditional, meaning we always have to jump to the end of$cs_2$. The corresponding instruction of the JVM is\instr{goto}. In case $b$ turns out to be false we need thecontrol-flow\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, top color=white,bottom color=black!20}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs_1$};\node (A3) [point, right=of cs1] {};\node (cs2) [block, right=of A3] {code of $cs_2$};\node (A4) [point, right=of cs2] {};\draw (A1) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);\draw (A2) edge [skip loop] (A3);\draw (A3) edge [->, black, line width=1mm] (cs2);\draw (cs2) edge [->,black, line width=1mm] (A4);\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};\end{tikzpicture}\end{center}\noindent where we now need a conditional jump (if theif-condition is false) from the end of the code for the boolean to the beginning of the instructions $cs_2$. Once we are finished with running $cs_2$ we can continue with whatevercode comes after the if-statement.The \instr{goto} and the conditional jumps need addresses towhere the jump should go. Since we are generating assemblycode for the JVM, we do not actually have to give (numeric)addresses, but can just attach (symbolic) labels to our code.These labels specify a target for a jump. Therefore the labelsneed to be unique, as otherwise it would be ambiguous where ajump should go to. A label, say \pcode{L}, is attached to assembly code like\begin{lstlisting}[mathescape,numbers=none]L: $\textit{instr\_1}$ $\textit{instr\_2}$ $\vdots$\end{lstlisting}\noindent where the label needs to be followed by a colon. The task ofthe assembler (in our case Jasmin or Krakatau) is to resolve the labelsto actual (numeric) addresses, for example jump 10 instructions forward,or 20 instructions backwards.Recall the ``trick'' with compiling boolean expressions: the \textit{compile}-function for boolean expressions takes threearguments: an abstract syntax tree, an environment for variable indices and also the label, $lab$, to where an conditional jump needs to go. The clause for the expression $a_1 = a_2$, for example, is as follows:\begin{center}\begin{tabular}{lcl}$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}\end{tabular}\end{center}\noindent where we are first generating code for thesubexpressions $a_1$ and $a_2$. This will mean after runningthe corresponding code there will be two integers on top ofthe stack. If they are equal, we do not have to do anything(except for popping them off from the stack) and just continuewith the next instructions (see control-flow of ifs above).However if they are \emph{not} equal, then we need to(conditionally) jump to the label $lab$. This can be done withthe instruction\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]if_icmpne $lab$\end{lstlisting}To sum up, the third argument in the compile function for booleansspecifies where to jump, in case the condition is \emph{not} true. Ileave it to you to extend the \textit{compile}-function for the otherboolean expressions. Note that we need to jump whenever the boolean is\emph{not} true, which means we have to ``negate'' the jumpcondition---equals becomes not-equal, less becomes greater-or-equal. Other jump instructions for boolean operators are\begin{center}\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\$<$ & $\Rightarrow$ & \instr{if_icmpge}\\$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\\end{tabular}\end{center}\noindent and so on. If you do not like this design (it can be thesource of some nasty, hard-to-detect errors), you can also change thelayout of the code and first give the code for the else-branch and thenfor the if-branch. However in the case of while-loops this``upside-down-inside-out'' way of generating code still seems the mostconvenient.We are now ready to give the compile function for if-statements---remember this function returns for statements a pair consisting of the code and an environment:\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\\end{tabular}\end{center}\noindent In the first two lines we generate two fresh labelsfor the jump addresses (just before the else-branch and justafter). In the next two lines we generate the instructions forthe two branches, $is_1$ and $is_2$. The final code willbe first the code for $b$ (including the label just-before-the-else-branch), then the \pcode{goto} for afterthe else-branch, the label $L_\textit{ifelse}$, followed bythe instructions for the else-branch, followed by the after-the-else-branch label. Consider for example the if-statement:\begin{lstlisting}[mathescape,numbers=none,language=While]if 1 = 1 then x := 2 else y := 3\end{lstlisting}\noindent The generated code is as follows:\begin{lstlisting}[language=JVMIS,mathescape,numbers=left] ldc 1 ldc 1 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ ldc 2 istore 0 goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ ldc 3 istore 1L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$\end{lstlisting}\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);\end{tikzpicture}\noindent The first three lines correspond to the the booleanexpression $1 = 1$. The jump for when this boolean expressionis false is in Line~3. Lines 4-6 corresponds to the if-branch;the else-branch is in Lines 8 and 9. Note carefully how the environment $E$ is threaded through the recursivecalls of \textit{compile}. The function receives an environment $E$, butit might extend it when compiling the if-branch, yielding $E'$. Thishappens for example in the if-statement above whenever the variable\code{x} has not been used before. Similarly with the environment $E''$for the second call to \textit{compile}. $E''$ is also the environmentthat needs to be returned as part of the answer.The compilation of the while-loops, say \pcode{while} $b$ \pcode{do} $cs$, is very similar. In casethe condition is true and we need to do another iteration, and the control-flow needs to be as follows\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, top color=white,bottom color=black!20}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A0) [point, left=of A1] {};\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs$};\node (A3) [point, right=of cs1] {};\node (A4) [point, right=of A3] {};\draw (A0) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm] (cs1);\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);\draw (A3) edge [->,skip loop] (A1);\end{tikzpicture}\end{center}\noindent Whereas if the condition is \emph{not} true, weneed to jump out of the loop, which gives the followingcontrol flow.\begin{center}\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, top color=white,bottom color=black!20}, point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]\node (A0) [point, left=of A1] {};\node (A1) [point] {};\node (b) [block, right=of A1] {code of $b$};\node (A2) [point, right=of b] {};\node (cs1) [block, right=of A2] {code of $cs$};\node (A3) [point, right=of cs1] {};\node (A4) [point, right=of A3] {};\draw (A0) edge [->, black, line width=1mm] (b);\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);\draw (A2) edge [skip loop] (A3);\draw (A3) edge [->, black, line width=1mm] (A4);\end{tikzpicture}\end{center}\noindent Again we can use the \textit{compile}-function forboolean expressions to insert the appropriate jump to theend of the loop (label $L_{wend}$ below).\begin{center}\begin{tabular}{lcl}$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\\end{tabular}\end{center}\noindent I let you go through how this clause works. As an exampleyou can consider the while-loop\begin{lstlisting}[mathescape,numbers=none,language=While]while x <= 10 do x := x + 1\end{lstlisting}\noindent yielding the following code\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ iload 0 ldc 10 if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ iload 0 ldc 1 iadd istore 0 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$\end{lstlisting}\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);\end{tikzpicture}\noindentAs said, I leave it to you to decide whether the code implementsthe usual controlflow of while-loops.Next we need to consider the WHILE-statement \pcode{write x}, which canbe used to print out the content of a variable. For this we shall use aJava library function. In order to avoid having to generate a lot ofcode for each \pcode{write}-command, we use a separate helper-method andjust call this method with an appropriate argument (which of courseneeds to be placed onto the stack). The code of the helper-method is asfollows.\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small].method public static write(I)V .limit locals 1 .limit stack 2 getstatic java/lang/System/out Ljava/io/PrintStream; iload 0 invokevirtual java/io/PrintStream/println(I)V return .end method\end{lstlisting}\noindent The first line marks the beginning of the method, called\pcode{write}. It takes a single integer argument indicated by the\pcode{(I)} and returns no result, indicated by the \pcode{V} (forvoid). Since the method has only one argument, we only need a singlelocal variable (Line~2) and a stack with two cells will be sufficient(Line 3). Line 4 instructs the JVM to get the value of the member\pcode{out} from the class \pcode{java/lang/System}. It expects the valueto be of type \pcode{java/io/PrintStream}. A reference to this valuewill be placed on the stack.\footnote{Note the syntax \texttt{L\ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow thedesigners of Jasmin decided that this syntax is pleasing to the eye. Soif you wanted to have strings in your Jasmin code, you would need towrite \texttt{Ljava/lang/String;}\;. If you want arrays of onedimension, then use \texttt{[\ldots}; two dimensions, use\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5copies the integer we want to print out onto the stack. In the lineafter that we call the method \pcode{println} (from the class\pcode{java/io/PrintStream}). We want to print out an integer and do notexpect anything back (that is why the type annotation is \pcode{(I)V}).The \pcode{return}-instruction in the next line changes the control-flowback to the place from where \pcode{write} was called. This method needsto be part of a header that is included in any code we generate. Thehelper-method \pcode{write} can be invoked with the two instructions\begin{lstlisting}[mathescape,language=JVMIS]iload $E(x)$ invokestatic XXX/XXX/write(I)V\end{lstlisting}\noindent where we first place the variable to be printed ontop of the stack and then call \pcode{write}. The \pcode{XXX}need to be replaced by an appropriate class name (this will beexplained shortly).By generating code for a WHILE-program, we end up with a list of (JVMassembly) instructions. Unfortunately, there is a bit more boilerplatecode needed before these instructions can be run. Essentially we have toenclose them inside a Java \texttt{main}-method. The corresponding codeis shown in Figure~\ref{boiler}. This boilerplate code is very specificto the JVM. If we target any other virtual machine or a machinelanguage, then we would need to change this code. Interesting are theLines 5 and 6 where we hardwire that the stack of our programs willnever be larger than 200 and that the maximum number of variables isalso 200. This seem to be conservative default values that allow is torun some simple WHILE-programs. In a real compiler, we would of courseneed to work harder and find out appropriate values for the stack andlocal variables.\begin{figure}[t]\begin{framed}\begin{lstlisting}[mathescape,language=JVMIS,numbers=left].class public XXX.XXX.super java/lang/Object.method public static main([Ljava/lang/String;)V .limit locals 200 .limit stack 200 $\textit{\ldots{}here comes the compiled code\ldots}$ return.end method\end{lstlisting}\end{framed}\caption{The boilerplate code needed for running generated code. It hardwires limits for stack space and for the number of local variables.\label{boiler}}\end{figure}To sum up, in Figure~\ref{test} is the complete code generatedfor the slightly nonsensical program\begin{lstlisting}[mathescape,language=While]x := 1 + 2;write x\end{lstlisting}\noindent I let you read the code and make sure the code behaves asexpected. Having this code at our disposal, we need the assembler totranslate the generated code into JVM bytecode (a class file). Thisbytecode is then understood by the JVM and can be run by just invokingthe \pcode{java}-program. Again I let you do the work.\begin{figure}[p]\begin{framed}\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}\begin{tikzpicture}[remember picture,overlay] \draw[|<->|,very thick] (LA.north) -- (LB.south) node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; \draw[|<->|,very thick] (LC.north) -- (LD.south) node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};\end{tikzpicture}\end{framed}\caption{The generated code for the test program \texttt{x := 1 + 2; writex}. This code can be processed by a Java assembler producing aclass-file, which can then be run by the {\tt{}java}-program.\label{test}}\end{figure}\subsection*{Arrays}Maybe a useful addition to the WHILE-language would be arrays. Thiswould allow us to generate more interesting WHILE-programs bytranslating BF*** programs into equivalent WHILE-code. Therefore in thissection let us have a look at how we can support the following threeconstructions\begin{lstlisting}[mathescape,language=While]new(arr[15000])x := 3 + arr[3 + y]arr[42 * n] := ...\end{lstlisting}\noindentThe first construct is for creating new arrays. In this instance thename of the array is \pcode{arr} and it can hold 15000 integers. We donot support ``dynamic'' arrays, that is the size of our arrays willalways be fixed. The second construct is for referencing an array cellinside an arithmetic expression---we need to be able to look up thecontents of an array at an index determined by an arithmetic expression.Similarly in the line below, we need to be able to update the content ofan array at a calculated index. For creating a new array we can generate the following three JVMinstructions:\begin{lstlisting}[mathescape,language=JVMIS]ldc number newarray int astore loc_var\end{lstlisting}\noindentFirst we need to put the size of the array onto the stack. The nextinstruction creates the array. In this case the array contains\texttt{int}s. With the last instruction we can store the array as alocal variable (like the ``simple'' variables from the previoussection). The use of a local variable for each array allows us to havemultiple arrays in a WHILE-program. For looking up an element in anarray we can use the following JVM code\begin{lstlisting}[mathescape,language=JVMIS]aload loc_var $\textit{index\_aexp}$ iaload\end{lstlisting}\noindentThe first instruction loads the ``pointer'', or local variable, to thearray onto the stack. Then we have some instructions calculating theindex where we want to look up the array. The idea is that theseinstructions will leave a concrete number on the top of the stack, whichwill be the index into the array we need. Finally we need to tell theJVM to load the corresponding element onto the stack. Updating an arrayat an index with a value is as follows.\begin{lstlisting}[mathescape,language=JVMIS]aload loc_var $\textit{index\_aexp}$ $\textit{value\_aexp}$ iastore\end{lstlisting}\noindentAgain the first instruction loads the local variable ofthe array onto the stack. Then we have some instructions calculatingthe index where we want to update the array. After that come theinstructions for with which value we want to update the array. The lastline contains the instruction for updating the array.Next we need to modify our grammar rules for our WHILE-language: itseems best to extend the rule for factors in arithmetic expressions witha rule for looking up an array.\begin{plstx}[rhs style=, margin=3cm]: \meta{E} ::= \meta{T} $+$ \meta{E} | \meta{T} $-$ \meta{E} | \meta{T}\\: \meta{T} ::= \meta{F} $*$ \meta{T} | \meta{F} $\backslash$ \meta{T} | \meta{F}\\: \meta{F} ::= ( \meta{E} ) | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$ | \meta{Id} | \meta{Num}\\\end{plstx}\noindentThere is no problem with left-recursion as the \meta{E} is ``protected''by an identifier and the brackets. There are two new rules for statements,one for creating an array and one for array assignment:\begin{plstx}[rhs style=, margin=2cm, one per line]: \meta{Stmt} ::= \ldots | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\\end{plstx}With this in place we can turn back to the idea of creatingWHILE-programs by translating BF-programs. This is a relatively easytask because BF has only eight instructions (we will actually implementseven because we can omit the read-in instruction from BF). What makesthis translation easy is that BF-loops can be straightforwardlyrepresented as while-loops. The Scala code for the translation is asfollows:\begin{lstlisting}[language=Scala,numbers=left]def instr(c: Char) : String = c match { case '>' => "ptr := ptr + 1;" case '<' => "ptr := ptr - 1;" case '+' => "mem[ptr] := mem [ptr] + 1;" case '-' => "mem [ptr] := mem [ptr] - 1;" case '.' => "x := mem [ptr]; write x;" case '[' => "while (mem [ptr] != 0) do {" case ']' => "skip};" case _ => ""}\end{lstlisting}\noindent The idea behind the translation is that BF-programs operate on an array,called here \texttt{mem}. The BF-memory pointer into this array isrepresented as the variable \texttt{ptr}. As usual the BF-instructions\code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. Theinstructions \code{+} and \code{-} update a cell in \texttt{mem}. InLine 6 we need to first assign a \texttt{mem}-cell to an auxiliaryvariable since we have not changed our write functions in order to copewith writing out any array-content directly. Lines 7 and 8 are fortranslating BF-loops. Line 8 is interesting in the sense that we need togenerate a \code{skip} instruction just before finishing with theclosing \code{"\}"}. The reason is that we are rather pedantic aboutsemicolons in our WHILE-grammar: the last command cannot have asemicolon---adding a \code{skip} works around this snag. Putting this all together and we can generate WHILE-programs with morethan 15K JVM-instructions; run the compiled JVM code for suchprograms and marvel at the output\ldots\medskip\noindent\ldots{}Hooooray, after a few more tweaks we can finally run theBF-mandelbrot program on the JVM (after nearly 10 minutes of parsing thecorresponding WHILE-program; the size of the resulting class file isaround 32K---not too bad). The generation of the picture completeswithin 20 or so seconds. Try replicating this with an interpreter! Thegood point is that we now have a sufficiently complicated program in ourWHILE-language in order to do some benchmarking. Which means we now facethe question about what to do next\ldots\subsection*{Optimisations \& Co}Every compiler that deserves its name has to perform some optimisationson the code: if we put in the extra effort of writing a compiler for alanguage, then obviously we want to have our code to run as fast aspossible. So we should look into this in more detail.There is actually one aspect in our generated code where we can makeeasily efficiency gains. This has to do with some of the quirks of theJVM. Whenever we push a constant onto the stack, we used the JVMinstruction \instr{ldc some_const}. This is a rather generic instructionin the sense that it works not just for integers but also for strings,objects and so on. What this instruction does is putting the constantinto a \emph{constant pool} and then uses an index into this constantpool. This means \instr{ldc} will be represented by at least two bytesin the class file. While this is a sensible strategy for ``large''constants like strings, it is a bit of overkill for small integers(which many integers will be when compiling a BF-program). To counterthis ``waste'', the JVM has specific instructions for small integers,for example\begin{itemize}\item \instr{iconst_0},\ldots, \instr{iconst_5}\item \instr{bipush n}\end{itemize}\noindentwhere the \code{n} is \instr{bipush} is between -128 and 128. Byhaving dedicated instructions such as \instr{iconst_0} to\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated codesize smaller as these instructions only require 1 byte (as opposed thegeneric \instr{ldc} which needs 1 byte plus another for the index intothe constant pool). While in theory the use of such special instructionsshould make the code only smaller, it actually makes the code also runfaster. Probably because the JVM has to process less code and uses aspecific instruction for the underlying CPU. The story with\instr{bipush} is slightly different, because it also uses twobytes---so it does not necessarily result in a reduction of code size.Instead, it probably uses a specific instruction in the underlying CPUthat makes the JVM code run faster.\footnote{This is all ``probable''because I have not read the 700 pages of JVM documentation by Oracle andalso have no clue how the JVM is implemented.} This means whengenerating code for pushing constants onto the stack, we can use thefollowing Scala helper-function\begin{lstlisting}[language=Scala]def compile_num(i: Int) = if (0 <= i && i <= 5) i"iconst_$i" else if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i"\end{lstlisting}\noindent It generates the more efficient instructions when pushing a small integerconstant onto the stack. The default is \instr{ldc} for any other constants.The JVM also has such special instructions forloading and storing the first three local variables. The assumption isthat most operations and arguments in a method will only use very fewlocal variables. So we can use the following instructions:\begin{itemize}\item \instr{iload_0},\ldots, \instr{iload_3}\item \instr{istore_0},\ldots, \instr{istore_3}\item \instr{aload_0},\ldots, \instr{aload_3}\item \instr{astore_0},\ldots, \instr{astore_3}\end{itemize}\noindent Having implemented these optimisations, the code size of theBF-Mandelbrot program reduces and also the class-file runs faster (theparsing part is still very slow). According to my very roughexperiments:\begin{center}\begin{tabular}{lll}& class-size & runtime\\\hlineMandelbrot:\\\hspace{5mm}unoptimised: & 33296 & 21 secs\\\hspace{5mm}optimised: & 21787 & 16 secs\\\end{tabular}\end{center}\noindent Quite good! Such optimisations are called \emph{peephole optimisations},because they involve changing one or a small set of instructions into anequivalent set that has better performance. If you look careful at our generated code you will quickly find anothersource of inefficiency in programs like\begin{lstlisting}[mathescape,language=While]x := ...;write x\end{lstlisting}\noindentwhere our code first calculates the new result the for \texttt{x} on thestack, then pops off the result into a local variable, and after thatloads the local variable back onto the stack for writing out a number.\begin{lstlisting}[mathescape,language=JVMIS]... istore 0iload 0...\end{lstlisting}\noindentIf we can detect such situations, then we can leave the value of\texttt{x} on the stack with for example the much cheaper instruction\instr{dup}. Now the problem with this optimisation is that it is quiteeasy for the snippet above, but what about instances where there isfurther WHILE-code in \emph{between} these two statements? Sometimes wewill be able to optimise, sometimes we will not. The compiler needs tofind out which situation applies. This can quickly become much morecomplicated. So we leave this kind of optimisations here and look atsomething more interesting and possibly surprising.As you might have seen, the compiler writer has a lot of freedom abouthow to generate code from what the programmer wrote as program. The onlycondition is that generated code should behave as expected by theprogrammer. Then all is fine with the code above\ldots missionaccomplished! But sometimes the compiler writer is expected to go anextra mile, or even miles and change(!) the meaning of a program.Suppose we are given the following WHILE-program:\begin{lstlisting}[mathescape,language=While]new(arr[10]);arr[14] := 3 + arr[13]\end{lstlisting}\noindent Admittedly this is a contrived program, and probably not meant to belike this by any sane programmer, but it is supposed to make thefollowing point: The program generates an array of size 10, and thentries to access the non-existing element at index 13 and even updatingthe element with index 14. Obviously this is baloney. Still, ourcompiler generates code for this program without any questions asked. Wecan even run this code on the JVM\ldots of course the result is anexception trace where the JVM yells at us for doing naughtythings.\footnote{Still this is much better than C, for example, wheresuch errors are not prevented and as a result insidious attacks can bemounted against such kind C-programs. I assume everyone has heard about\emph{Buffer Overflow Attacks}.} Now what should we do in suchsituations? Over- and underflows of indices are notoriously difficult todetect statically (at compiletime). So it might seem raising anexception at run-time like the JVM is the best compromise.Well, imagine we do not want to rely in our compiler on the JVM forproducing an annoying, but safe exception trace, rather we want tohandle such situations ourselves according to what we think shouldhappen in such cases. Let us assume we want to handle them in thefollowing way: if the programmer access a field out-of-bounds, we justreturn a default 0, and if a programmer wants to update an out-of-boundsfield, we want to ``quietly'' ignore this update. One way to achievethis would be to rewrite the WHILE-programs and insert the necessaryif-conditions for safely reading and writing arrays. Another wayis to modify the code we generate.\begin{lstlisting}[mathescape,language=JVMIS2] $\textit{index\_aexp}$ aload loc_var dup2 arraylength if_icmple L1 pop2 iconst_0 goto L2L1: swap ialoadL2:\end{lstlisting} \begin{lstlisting}[mathescape,language=JVMIS2] $\textit{index\_aexp}$ aload loc_var dup2 arraylength if_icmple L1 pop2 goto L2L1: swap $\textit{value\_aexp}$ iastoreL2:\end{lstlisting}\begin{figure}[p]\begin{center}\begin{tikzpicture}[every text node part/.style={align=left}, stack/.style={rectangle split,rectangle split parts = 5, fill=black!20,draw,text width=1.6cm,line width=0.5mm}]\node (A) {};\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};\node[stack,right = 60pt] (1) at (0.east) {array\nodepart{two} $\textit{index}$\nodepart{three} \ldots\phantom{l}};\node[stack,below = 40pt] (2) at (1.south) {array\nodepart{two} $\textit{index}$ \nodepart{three} array \nodepart{four} $\textit{index}$\nodepart{five} \ldots\phantom{l}}; \node[stack,left = 90pt] (3) at (2.west) {array\_len\nodepart{two} $\textit{index}$ \nodepart{three} array \nodepart{four} $\textit{index}$\nodepart{five} \ldots\phantom{l}}; \node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south) {array\nodepart{two} $\textit{index}$\nodepart{three} \ldots\phantom{l}};\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south) {array\nodepart{two} $\textit{index}$\nodepart{three} \ldots\phantom{l}}; \node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south) {$\textit{index}$\nodepart{two} array\nodepart{three} \ldots\phantom{l}}; \node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south) {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south) {\ldots\phantom{l}}; \node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south) {0\nodepart{two} \ldots\phantom{l}}; \draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0); \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2); \draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);\path[->,draw,line width=2.5mm] let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north); \path[->,draw,line width=2.5mm] let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b); \draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);\end{tikzpicture} \end{center}\end{figure}goto\_w problem solved for too large jumps\end{document}%%% Local Variables: %%% mode: latex %%% TeX-master: t%%% End: