diff -r fa2589ec0fae -r 1e88390e81aa handouts/ho08.tex --- a/handouts/ho08.tex Tue Nov 17 19:12:51 2015 +0000 +++ b/handouts/ho08.tex Wed Nov 18 01:53:01 2015 +0000 @@ -13,10 +13,13 @@ \section*{Handout 8 (A Functional Language)} -The language we looked at in the previous lecture was rather -primitive and the compiler rather crude. In this handout -we like to have a look at a slightly more comfortable -language and a tiny-teeny bit more realistic compiler. A small +The language we looked at in the previous lecture was rather +primitive and the compiler rather crude---everything was +essentially compiled into a big monolithic chunk of code +inside the main function. In this handout we like to have a +look at a slightly more comfortable language, which I call +Fun-language, and a tiny-teeny bit more realistic compiler. +The Fun-language is a functional programming language. A small collection of programs we want to be able to write and compile is as follows: @@ -34,16 +37,25 @@ def gcd(a, b) = if b == 0 then a else gcd(b, a % b); \end{lstlisting} -\noindent We will still restrict us to just programs about -integers, that means for example that every function needs to -return an integer. The grammar of the Fun-language is slightly -simpler than the While-language, because almost everything is -an expression. The grammar rules are as follows: +\noindent Compare the code of the fib-program with the same +program written in the While-language\ldots Fun is definitely +more comfortable. We will still focus on programs involving +integers only, that means for example that every function is +expected to return an integer. The point of the Fun language +is to compile each function to a separate method in JVM +bytecode (not just a big monolithic code chunk). The means we +need to adapt to some of the conventions of the JVM about +methods. + +The grammar of the Fun-language is slightly simpler than the +While-language, because the main syntactic category are +expressions (we do not have statements). The grammar rules are +as follows: \begin{plstx}[rhs style=,margin=1.5cm] -: \meta{Exp} ::= \meta{Var} | \meta{Num} {\hspace{3cm}} - | \meta{Exp} + \meta{Exp} | ... | (\meta{ExP}) +: \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3cm}} + | \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) | \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp} | \code{write} \meta{Exp} {\hspace{5cm}} | \meta{Exp} ; \meta{Exp} {\hspace{5cm}} @@ -51,587 +63,151 @@ : \meta{BExp} ::= ...\\ : \meta{Decl} ::= \meta{Def} ; \meta{Decl} | \meta{Exp}\\ -: \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_2$)\\ +: \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_n$) = \meta{Exp}\\ \end{plstx} -\noindent where \meta{Id} stands for variables and \meta{Num} -for numbers. For the moment let us omit variables from -arithmetic expressions. Our parser will take this grammar and -given an input produce abstract syntax trees. For example for -the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the -following tree. - -\begin{center} -\begin{tikzpicture} \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] \end{tikzpicture} -\end{center} - -\noindent To generate code for this expression, we need to -traverse this tree in post-order fashion and emit code for -each node---this traversal in post-order fashion will produce -code for a stack-machine (what the JVM is). Doing so for the -tree above generates the instructions +\noindent where, as usual, \meta{Id} stands for variables and +\meta{Num} for numbers. We can call a function by applying the +arguments to a function name (as shown in the last clause of +\meta{Exp}). The arguments in such a function call can be +again expressions, including other function calls. In +contrast, when defining a function (see \meta{Def}-clause) the +arguments need to be variables, say $x_1$ to $x_n$. We call +the expression on the right of = in a function definition as +the \emph{body of the function}. We have the restriction that +the variables inside a function body can only be those that +are mentioned as arguments of the function. A Fun-program is +then a sequence of function definitions separated by +semicolons, and a final ``main'' call of a function that +starts the computation in the program. For example \begin{lstlisting}[numbers=none] -ldc 1 -ldc 2 -ldc 3 -imul -ldc 4 -ldc 3 -isub -iadd -iadd -\end{lstlisting} - -\noindent If we ``run'' these instructions, the result $8$ -will be on top of the stack (I leave this to you to verify; -the meaning of each instruction should be clear). The result -being on the top of the stack will be a convention we always -observe in our compiler, that is the results of arithmetic -expressions will always be on top of the stack. Note, that a -different bracketing of the expression, for example $(1 + (2 * -3)) + (4 - 3)$, produces a different abstract syntax tree and -thus potentially also a different list of instructions. -Generating code in this fashion is rather easy to implement: -it can be done with the following recursive -\textit{compile}-function, which takes the abstract syntax -tree as argument: - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\ -$\textit{compile}(a_1 + a_2)$ & $\dn$ & -$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\ -$\textit{compile}(a_1 - a_2)$ & $\dn$ & -$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\ -$\textit{compile}(a_1 * a_2)$ & $\dn$ & -$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\ -$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & -$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\ -\end{tabular} -\end{center} - -However, our arithmetic expressions can also contain -variables. We will represent them as \emph{local variables} in -the JVM. Essentially, local variables are an array or pointers -to memory cells, containing in our case only integers. Looking -up a variable can be done with the instruction - -\begin{lstlisting}[mathescape,numbers=none] -iload $index$ -\end{lstlisting} - -\noindent -which places the content of the local variable $index$ onto -the stack. Storing the top of the stack into a local variable -can be done by the instruction - -\begin{lstlisting}[mathescape,numbers=none] -istore $index$ -\end{lstlisting} - -\noindent Note that this also pops off the top of the stack. -One problem we have to overcome, however, is that local -variables are addressed, not by identifiers, but by numbers -(starting from $0$). Therefore our compiler needs to maintain -a kind of environment where variables are associated to -numbers. This association needs to be unique: if we muddle up -the numbers, then we essentially confuse variables and the -consequence will usually be an erroneous result. Our extended -\textit{compile}-function for arithmetic expressions will -therefore take two arguments: the abstract syntax tree and the -environment, $E$, that maps identifiers to index-numbers. - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ -$\textit{compile}(a_1 + a_2, E)$ & $\dn$ & -$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\ -$\textit{compile}(a_1 - a_2, E)$ & $\dn$ & -$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\ -$\textit{compile}(a_1 * a_2, E)$ & $\dn$ & -$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\ -$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & -$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\ -$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\ -\end{tabular} -\end{center} - -\noindent In the last line we generate the code for variables -where $E(x)$ stands for looking up the environment to which -index the variable $x$ maps to. - -There is a similar \textit{compile}-function for boolean -expressions, but it includes a ``trick'' to do with -\pcode{if}- and \pcode{while}-statements. To explain the issue -let us first describe the compilation of statements of the -While-language. The clause for \pcode{skip} is trivial, since -we do not have to generate any instruction - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\ -\end{tabular} -\end{center} - -\noindent $[]$ is the empty list of instructions. Note that -the \textit{compile}-function for statements returns a pair, a -list of instructions (in this case the empty list) and an -environment for variables. The reason for the environment is -that assignments in the While-language might change the -environment---clearly if a variable is used for the first -time, we need to allocate a new index and if it has been used -before, we need to be able to retrieve the associated index. -This is reflected in the clause for compiling assignments: - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(x := a, E)$ & $\dn$ & -$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$ -\end{tabular} -\end{center} - -\noindent We first generate code for the right-hand side of -the assignment and then add an \pcode{istore}-instruction at -the end. By convention the result of the arithmetic expression -$a$ will be on top of the stack. After the \pcode{istore} -instruction, the result will be stored in the index -corresponding to the variable $x$. If the variable $x$ has -been used before in the program, we just need to look up what -the index is and return the environment unchanged (that is in -this case $E' = E$). However, if this is the first encounter -of the variable $x$ in the program, then we have to augment -the environment and assign $x$ with the largest index in $E$ -plus one (that is $E' = E(x \mapsto largest\_index + 1)$). -That means for the assignment $x := x + 1$ we generate the -following code - -\begin{lstlisting}[mathescape,numbers=none] -iload $n_x$ -ldc 1 -iadd -istore $n_x$ -\end{lstlisting} - -\noindent -where $n_x$ is the index for the variable $x$. - -More complicated is the code for \pcode{if}-statments, say - -\begin{lstlisting}[mathescape,language={},numbers=none] -if $b$ then $cs_1$ else $cs_2$ +def fact(n) = if n == 0 then 1 else n * fact(n - 1); +write(fact(5)) \end{lstlisting} -\noindent where $b$ is a boolean expression and the $cs_{1/2}$ -are the statements for each \pcode{if}-branch. Lets assume -we already generated code for $b$ and $cs_{1/2}$. Then in the -true-case the control-flow of the program needs to be - -\begin{center} -\begin{tikzpicture}[node distance=2mm and 4mm, - block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, - point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, - skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] -\node (A1) [point] {}; -\node (b) [block, right=of A1] {code of $b$}; -\node (A2) [point, right=of b] {}; -\node (cs1) [block, right=of A2] {code of $cs_1$}; -\node (A3) [point, right=of cs1] {}; -\node (cs2) [block, right=of A3] {code of $cs_2$}; -\node (A4) [point, right=of cs2] {}; +\noindent would be a valid Fun-program. The parser of the +Fun-language produces abstract syntax trees which in Scala +can be represented as follows: -\draw (A1) edge [->, black, line width=1mm] (b); -\draw (b) edge [->, black, line width=1mm] (cs1); -\draw (cs1) edge [->, black, line width=1mm] (A3); -\draw (A3) edge [->, black, skip loop] (A4); -\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}}; -\end{tikzpicture} -\end{center} - -\noindent where we start with running the code for $b$; since -we are in the true case we continue with running the code for -$cs_1$. After this however, we must not run the code for -$cs_2$, but always jump after the last instruction of $cs_2$ -(the code for the \pcode{else}-branch). Note that this jump is -unconditional, meaning we always have to jump to the end of -$cs_2$. The corresponding instruction of the JVM is -\pcode{goto}. In case $b$ turns out to be false we need the -control-flow -\begin{center} -\begin{tikzpicture}[node distance=2mm and 4mm, - block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, - point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, - skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] -\node (A1) [point] {}; -\node (b) [block, right=of A1] {code of $b$}; -\node (A2) [point, right=of b] {}; -\node (cs1) [block, right=of A2] {code of $cs_1$}; -\node (A3) [point, right=of cs1] {}; -\node (cs2) [block, right=of A3] {code of $cs_2$}; -\node (A4) [point, right=of cs2] {}; +{\small +\begin{lstlisting}[numbers=none] +abstract class Exp +abstract class BExp +abstract class Decl -\draw (A1) edge [->, black, line width=1mm] (b); -\draw (b) edge [->, black, line width=1mm] (A2); -\draw (A2) edge [skip loop] (A3); -\draw (A3) edge [->, black, line width=1mm] (cs2); -\draw (cs2) edge [->,black, line width=1mm] (A4); -\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; -\end{tikzpicture} -\end{center} +case class Var(s: String) extends Exp +case class Num(i: Int) extends Exp +case class Aop(o: String, a1: Exp, a2: Exp) extends Exp +case class If(a: BExp, e1: Exp, e2: Exp) extends Exp +case class Write(e: Exp) extends Exp +case class Sequ(e1: Exp, e2: Exp) extends Exp +case class Call(name: String, args: List[Exp]) extends Exp -\noindent where we now need a conditional jump (if the -if-condition is false) from the end of the code for the -boolean to the beginning of the instructions $cs_2$. Once we -are finished with running $cs_2$ we can continue with whatever -code comes after the if-statement. - -The \pcode{goto} and the conditional jumps need addresses to -where the jump should go. Since we are generating assembly -code for the JVM, we do not actually have to give (numeric) -addresses, but can just attach (symbolic) labels to our code. -These labels specify a target for a jump. Therefore the labels -need to be unique, as otherwise it would be ambiguous where a -jump should go to. A label, say \pcode{L}, is attached to code -like +case class Bop(o: String, a1: Exp, a2: Exp) extends BExp -\begin{lstlisting}[mathescape,numbers=none] -L: - $instr_1$ - $instr_2$ - $\vdots$ -\end{lstlisting} - -\noindent where a label is indicated by a colon. - -Recall the ``trick'' with compiling boolean expressions: the -\textit{compile}-function for boolean expressions takes three -arguments: an abstract syntax tree, an environment for -variable indices and also the label, $lab$, to where an conditional -jump needs to go. The clause for the expression $a_1 = a_2$, -for example, is as follows: - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ -\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$} -\end{tabular} -\end{center} - -\noindent where we are first generating code for the -subexpressions $a_1$ and $a_2$. This will mean after running -the corresponding code there will be two integers on top of -the stack. If they are equal, we do not have to do anything -(except for popping them off from the stack) and just continue -with the next instructions (see control-flow of ifs above). -However if they are \emph{not} equal, then we need to -(conditionally) jump to the label $lab$. This can be done with -the instruction - -\begin{lstlisting}[mathescape,numbers=none] -if_icmpne $lab$ -\end{lstlisting} - -\noindent Other jump instructions for boolean operators are +case class Def(name: String, + args: List[String], + body: Exp) extends Decl +case class Main(e: Exp) extends Decl +\end{lstlisting}} -\begin{center} -\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l} -$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\ -$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\ -$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ -\end{tabular} -\end{center} - -\noindent and so on. I leave it to you to extend the -\textit{compile}-function for the other boolean expressions. -Note that we need to jump whenever the boolean is \emph{not} -true, which means we have to ``negate'' the jump -condition---equals becomes not-equal, less becomes -greater-or-equal. If you do not like this design (it can be -the source of some nasty, hard-to-detect errors), you can also -change the layout of the code and first give the code for the -else-branch and then for the if-branch. However in the case -of while-loops this way of generating code still seems -the most convenient. - -We are now ready to give the compile function for -if-statments---remember this function returns for staments a -pair consisting of the code and an environment: +Let us first look at some clauses for compiling expressions. +The compilation of arithmetic and boolean expressions is just +like for the While-language and do not need any modification. +(recall that the \textit{compile}-function for boolean +expression takes a third argument for the label where the +contro-flow should jump when the boolean expression is +\emph{not} true---this is needed for compiling \pcode{if}s). +One additional feature in the Fun-language are sequences. +Their purpose is to do one calculation after another. The +reason why we need to be careful however is the convention +that every expression can only produce s single result +(including sequences). Since this result will be on the +top of the stack, we need to generate a +\pcode{pop}-instruction in order to clean-up the stack. Given +the expression of the form \pcode{exp1 ; exp2} we need to +generate code where after the first code chunk +a \pcode{pop}-instruction is needed. -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ -\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ -\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ -\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ -\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ -\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\ -\end{tabular} -\end{center} - -\noindent In the first two lines we generate two fresh labels -for the jump addresses (just before the else-branch and just -after). In the next two lines we generate the instructions for -the two branches, $is_1$ and $is_2$. The final code will -be first the code for $b$ (including the label -just-before-the-else-branch), then the \pcode{goto} for after -the else-branch, the label $L_\textit{ifesle}$, followed by -the instructions for the else-branch, followed by the -after-the-else-branch label. Consider for example the -if-statement: - -\begin{lstlisting}[mathescape,numbers=none,language={}] -if 1 = 1 then x := 2 else y := 3 +\begin{lstlisting}[language=JVMIS, numbers=none,mathescape] +$\textrm{\textit{compile}}($exp1$)$ +pop +$\textrm{\textit{compile}}($exp2$)$ \end{lstlisting} -\noindent -The generated code is as follows: - -\begin{lstlisting}[mathescape,language={}] - ldc 1 - ldc 1 - if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ - ldc 2 - istore 0 - goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$ -L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ - ldc 3 - istore 1 -L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ -\end{lstlisting} - -\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) - -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); - \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) - -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); \end{tikzpicture} +\noindent In effect we ``forget'' about the result the first +expression calculates. I leave you to think about why this +sequence operator is still useful in the Fun-language, even +if the first result is just ``discarded''. -\noindent The first three lines correspond to the the boolean -expression $1 = 1$. The jump for when this boolean expression -is false is in Line~3. Lines 4-6 corresponds to the if-branch; -the else-branch is in Lines 8 and 9. Note carefully how the -environment $E$ is threaded through the recursive calls of -\textit{compile}. The function receives an environment $E$, -but it might extend it when compiling the if-branch, yielding -$E'$. This happens for example in the if-statement above -whenever the variable \code{x} has not been used before. -Similarly with the environment $E''$ for the second call to -\textit{compile}. $E''$ is also the environment that needs to -be returned as part of the answer. - -The compilation of the while-loops, say -\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case -the condition is true and we need to do another iteration, -and the control-flow needs to be as follows - -\begin{center} -\begin{tikzpicture}[node distance=2mm and 4mm, - block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, - point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, - skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] -\node (A0) [point, left=of A1] {}; -\node (A1) [point] {}; -\node (b) [block, right=of A1] {code of $b$}; -\node (A2) [point, right=of b] {}; -\node (cs1) [block, right=of A2] {code of $cs$}; -\node (A3) [point, right=of cs1] {}; -\node (A4) [point, right=of A3] {}; - -\draw (A0) edge [->, black, line width=1mm] (b); -\draw (b) edge [->, black, line width=1mm] (cs1); -\draw (cs1) edge [->, black, line width=1mm] (A3); -\draw (A3) edge [->,skip loop] (A1); -\end{tikzpicture} -\end{center} - -\noindent Whereas if the condition is \emph{not} true, we -need to jump out of the loop, which gives the following -control flow. +There is also one small modification we have to perform when +calling the write method. Remember in the Fun-language we have +the convention that every expression needs to return an +integer as a result (located on the top of the stack). Our +helper function implementing write, however, ``consumes'' the +top element of the stack and violates this convention. +Therefore before we call, say, \pcode{write(1+2)}, we need to +duplicate the top element of the stack like so -\begin{center} -\begin{tikzpicture}[node distance=2mm and 4mm, - block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, - point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, - skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] -\node (A0) [point, left=of A1] {}; -\node (A1) [point] {}; -\node (b) [block, right=of A1] {code of $b$}; -\node (A2) [point, right=of b] {}; -\node (cs1) [block, right=of A2] {code of $cs$}; -\node (A3) [point, right=of cs1] {}; -\node (A4) [point, right=of A3] {}; - -\draw (A0) edge [->, black, line width=1mm] (b); -\draw (b) edge [->, black, line width=1mm] (A2); -\draw (A2) edge [skip loop] (A3); -\draw (A3) edge [->, black, line width=1mm] (A4); -\end{tikzpicture} -\end{center} - -\noindent Again we can use the \textit{compile}-function for -boolean expressions to insert the appropriate jump to the -end of the loop (label $L_{wend}$ below). - -\begin{center} -\begin{tabular}{lcl} -$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ -\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\ -\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\ -\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ -\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\ -\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\ -\end{tabular} -\end{center} - -\noindent I let you go through how this clause works. As an example -you can consider the while-loop - -\begin{lstlisting}[mathescape,numbers=none,language={}] -while x <= 10 do x := x + 1 +\begin{figure}[t] +\begin{lstlisting}[language=JVMIS, + xleftmargin=2mm, + numbers=none] +.method public static write(I)V + .limit locals 1 + .limit stack 2 + getstatic java/lang/System/out Ljava/io/PrintStream; + iload 0 + invokevirtual java/io/PrintStream/println(I)V + return +.end method \end{lstlisting} - -\noindent yielding the following code - -\begin{lstlisting}[mathescape,language={}] -L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ - iload 0 - ldc 10 - if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ - iload 0 - ldc 1 - iadd - istore 0 - goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ -L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ -\end{lstlisting} - -\begin{tikzpicture}[remember picture,overlay] \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) - -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); - \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) - -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); \end{tikzpicture} +\caption{The helper function for printing out integers.\label{write}} +\end{figure} -Next we need to consider the statement \pcode{write x}, which -can be used to print out the content of a variable. For this -we need to use a Java library function. In order to avoid -having to generate a lot of code for each -\pcode{write}-command, we use a separate helper-method and -just call this method with an argument (which needs to be -placed onto the stack). The code of the helper-method is as -follows. - - -\begin{lstlisting}[language=JVMIS] -.method public static write(I)V - .limit locals 1 - .limit stack 2 - getstatic java/lang/System/out Ljava/io/PrintStream; - iload 0 - invokevirtual java/io/PrintStream/println(I)V - return -.end method -\end{lstlisting} - -\noindent The first line marks the beginning of the method, -called \pcode{write}. It takes a single integer argument -indicated by the \pcode{(I)} and returns no result, indicated -by the \pcode{V}. Since the method has only one argument, we -only need a single local variable (Line~2) and a stack with -two cells will be sufficient (Line 3). Line 4 instructs the -JVM to get the value of the field \pcode{out} of the class -\pcode{java/lang/System}. It expects the value to be of type -\pcode{java/io/PrintStream}. A reference to this value will be -placed on the stack. Line~5 copies the integer we want to -print out onto the stack. In the next line we call the method -\pcode{println} (from the class \pcode{java/io/PrintStream}). -We want to print out an integer and do not expect anything -back (that is why the type annotation is \pcode{(I)V}). The -\pcode{return}-instruction in the next line changes the -control-flow back to the place from where \pcode{write} was -called. This method needs to be part of a header that is -included in any code we generate. The helper-method -\pcode{write} can be invoked with the two instructions - -\begin{lstlisting}[mathescape,language=JVMIS] -iload $E(x)$ +\begin{lstlisting}[language=JVMIS, numbers=none,mathescape] +$\textrm{\textit{compile}}($1+2$)$ +dup invokestatic XXX/XXX/write(I)V \end{lstlisting} -\noindent where we first place the variable to be printed on -top of the stack and then call \pcode{write}. The \pcode{XXX} -need to be replaced by an appropriate class name (this will be -explained shortly). - - -\begin{figure}[t] -\begin{lstlisting}[mathescape,language=JVMIS] -.class public XXX.XXX -.super java/lang/Object - -.method public ()V - aload_0 - invokenonvirtual java/lang/Object/()V - return -.end method - -.method public static main([Ljava/lang/String;)V - .limit locals 200 - .limit stack 200 - - $\textit{\ldots{}here comes the compiled code\ldots}$ +\noindent We also need to first generate code for the +argument-expression of write, which in the While-language was +only allowed to be a single variable. - return -.end method -\end{lstlisting} -\caption{Boilerplate code needed for running generated code.\label{boiler}} -\end{figure} - +Most of the new code in the compiler for the Fun-language +comes from function definitions and function calls. For this +have a look again at the helper function in +Figure~\ref{write}. Assuming we have a function definition -By generating code for a While-program, we end up with a list -of (JVM assembly) instructions. Unfortunately, there is a bit -more boilerplate code needed before these instructions can be -run. The complete code is shown in Figure~\ref{boiler}. This -boilerplate code is very specific to the JVM. If we target any -other virtual machine or a machine language, then we would -need to change this code. Lines 4 to 8 in Figure~\ref{boiler} -contain a method for object creation in the JVM; this method -is called \emph{before} the \pcode{main}-method in Lines 10 to -17. Interesting are the Lines 11 and 12 where we hardwire that -the stack of our programs will never be larger than 200 and -that the maximum number of variables is also 200. This seem to -be conservative default values that allow is to run some -simple While-programs. In a real compiler, we would of course -need to work harder and find out appropriate values for the -stack and local variables. - -To sum up, in Figure~\ref{test} is the complete code generated -for the slightly non-sensical program - -\begin{lstlisting}[mathescape,language=While] -x := 1 + 2; -write x +\begin{lstlisting}[numbers=none,mathescape] +def fname (x1, ... , xn) = ... \end{lstlisting} -\noindent Having this code at our disposal, we need the -assembler to translate the generated code into JVM bytecode (a -class file). This bytecode is understood by the JVM and can be -run by just invoking the \pcode{java}-program. +\noindent then we have to generate +\begin{lstlisting}[numbers=none,mathescape,language=JVMIS] +.method public static fname (I...I)I + .limit locals ?? + .limit stack ?? + ... + ireturn +.method end +\end{lstlisting} -\begin{figure}[p] -\lstinputlisting{../progs/test-small.j} -\caption{Generated code for a test program. This code can be -processed by an Java assembler producing a class-file, which -can be run by the \pcode{java}-program.\label{test}} -\end{figure} - +\noindent where the number of \pcode{I}s corresponds to the +number of arguments the function has, say \pcode{x1} to +\pcode{xn}. The final \pcode{I} is needed in order to indicate +that the function returns an integer. Therefore we also have +to use \pcode{ireturn} instead of \pcode{return}. However, +more interesting are the two \pcode{.limit} lines. Locals +refers to the local variables of the method, which can be +queried and overwritten using \pcode{iload} and +\pcode{istore}, respectively. + \end{document} %%% Local Variables: