afl-material: comparison handouts/ho08.tex

equal deleted inserted replaced

-:fa2589ec0fae
+:1e88390e81aa
 \section*{Handout 8 (A Functional Language)}
 The language we looked at in the previous lecture was rather
-primitive and the compiler rather crude. In this handout
+primitive and the compiler rather crude---everything was
-we like to have a look at a slightly more comfortable
+essentially compiled into a big monolithic chunk of code
-language and a tiny-teeny bit more realistic compiler. A small
+inside the main function. In this handout we like to have a
+look at a slightly more comfortable language, which I call
+Fun-language, and a tiny-teeny bit more realistic compiler.
+The Fun-language is a functional programming language. A small
 collection of programs we want to be able to write and compile
 is as follows:
 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
 def fib(n) = if n == 0 then 0
 else ack(m - 1, ack(m, n - 1));
 def gcd(a, b) = if b == 0 then a else gcd(b, a % b);
 \end{lstlisting}
-\noindent We will still restrict us to just programs about
+\noindent Compare the code of the fib-program with the same
-integers, that means for example that every function needs to
+program written in the While-language\ldots Fun is definitely
-return an integer. The grammar of the Fun-language is slightly
+more comfortable. We will still focus on programs involving
-simpler than the While-language, because almost everything is
+integers only, that means for example that every function is
-an expression. The grammar rules are as follows:
+expected to return an integer. The point of the Fun language
+is to compile each function to a separate method in JVM
+bytecode (not just a big monolithic code chunk). The means we
+need to adapt to some of the conventions of the JVM about
+methods.
+The grammar of the Fun-language is slightly simpler than the
+While-language, because the main syntactic category are
+expressions (we do not have statements). The grammar rules are
+as follows:
 \begin{plstx}[rhs style=,margin=1.5cm]
-: \meta{Exp} ::= \meta{Var} | \meta{Num} {\hspace{3cm}}
+: \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3cm}}
-|   \meta{Exp} + \meta{Exp} | ... | (\meta{ExP})
+|   \meta{Exp} + \meta{Exp} | ... | (\meta{Exp})
 |   \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}
 |   \code{write} \meta{Exp} {\hspace{5cm}}
 |   \meta{Exp} ; \meta{Exp}  {\hspace{5cm}}
 |   \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
 : \meta{BExp} ::= ...\\
 : \meta{Decl} ::= \meta{Def} ; \meta{Decl}
 | \meta{Exp}\\
-: \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_2$)\\
+: \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_n$) = \meta{Exp}\\
 \end{plstx}
-\noindent where \meta{Id} stands for variables and \meta{Num}
+\noindent where, as usual, \meta{Id} stands for variables and
-for numbers. For the moment let us omit variables from
+\meta{Num} for numbers. We can call a function by applying the
-arithmetic expressions. Our parser will take this grammar and
+arguments to a function name (as shown in the last clause of
-given an input produce abstract syntax trees. For example for
+\meta{Exp}). The arguments in such a function call can be
-the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
+again expressions, including other function calls. In
-following tree.
+contrast, when defining a function (see \meta{Def}-clause) the
+arguments need to be variables, say $x_1$ to $x_n$. We call
-\begin{center}
+the expression on the right of = in a function definition as
-\begin{tikzpicture}
+the \emph{body of the function}. We have the restriction that
-\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
+the variables inside a function body can only be those that
-\end{tikzpicture}
+are mentioned as arguments of the function. A Fun-program is
-\end{center}
+then a sequence of function definitions separated by
+semicolons, and a final ``main'' call of a function that
-\noindent To generate code for this expression, we need to
+starts the computation in the program. For example
-traverse this tree in post-order fashion and emit code for
-each node---this traversal in post-order fashion will produce
-code for a stack-machine (what the JVM is). Doing so for the
-tree above generates the instructions
 \begin{lstlisting}[numbers=none]
-ldc 1
+def fact(n) = if n == 0 then 1 else n * fact(n - 1);
-ldc 2
+write(fact(5))
-ldc 3
+\end{lstlisting}
-imul
-ldc 4
+\noindent would be a valid Fun-program. The parser of the
-ldc 3
+Fun-language produces abstract syntax trees which in Scala
-isub
+can be represented as follows:
-iadd
-iadd
-\end{lstlisting}
+{\small
+\begin{lstlisting}[numbers=none]
-\noindent If we ``run'' these instructions, the result $8$
+abstract class Exp
-will be on top of the stack (I leave this to you to verify;
+abstract class BExp
-the meaning of each instruction should be clear). The result
+abstract class Decl
-being on the top of the stack will be a convention we always
-observe in our compiler, that is the results of arithmetic
+case class Var(s: String) extends Exp
-expressions will always be on top of the stack. Note, that a
+case class Num(i: Int) extends Exp
-different bracketing of the expression, for example $(1 + (2 *
+case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
-3)) + (4 - 3)$, produces a different abstract syntax tree and
+case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
-thus potentially also a different list of instructions.
+case class Write(e: Exp) extends Exp
-Generating code in this fashion is rather easy to implement:
+case class Sequ(e1: Exp, e2: Exp) extends Exp
-it can be done with the following recursive
+case class Call(name: String, args: List[Exp]) extends Exp
-\textit{compile}-function, which takes the abstract syntax
-tree as argument:
+case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
-\begin{center}
+case class Def(name: String,
-\begin{tabular}{lcl}
+args: List[String],
-$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
+body: Exp) extends Decl
-$\textit{compile}(a_1 + a_2)$ & $\dn$ &
+case class Main(e: Exp) extends Decl
-$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
+\end{lstlisting}}
-$\textit{compile}(a_1 - a_2)$ & $\dn$ &
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
+Let us first look at some clauses for compiling expressions.
-$\textit{compile}(a_1 * a_2)$ & $\dn$ &
+The compilation of arithmetic and boolean expressions is just
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
+like for the While-language and do not need any modification.
-$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
+(recall that the \textit{compile}-function for boolean
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
+expression takes a third argument for the label where the
-\end{tabular}
+contro-flow should jump when the boolean expression is
-\end{center}
+\emph{not} true---this is needed for compiling \pcode{if}s).
+One additional feature in the Fun-language are sequences.
-However, our arithmetic expressions can also contain
+Their purpose is to do one calculation after another. The
-variables. We will represent them as \emph{local variables} in
+reason why we need to be careful however is the convention
-the JVM. Essentially, local variables are an array or pointers
+that every expression can only produce s single result
-to memory cells, containing in our case only integers. Looking
+(including sequences). Since this result will be on the
-up a variable can be done with the instruction
+top of the stack, we need to generate a
+\pcode{pop}-instruction in order to clean-up the stack. Given
-\begin{lstlisting}[mathescape,numbers=none]
+the expression of the form \pcode{exp1 ; exp2} we need to
-iload $index$
+generate code where after the first code chunk
-\end{lstlisting}
+a \pcode{pop}-instruction is needed.
-\noindent
+\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
-which places the content of the local variable $index$ onto
+$\textrm{\textit{compile}}($exp1$)$
-the stack. Storing the top of the stack into a local variable
+pop
-can be done by the instruction
+$\textrm{\textit{compile}}($exp2$)$
+\end{lstlisting}
-\begin{lstlisting}[mathescape,numbers=none]
-istore $index$
+\noindent In effect we ``forget'' about the result the first
-\end{lstlisting}
+expression calculates. I leave you to think about why this
+sequence operator is still useful in the Fun-language, even
-\noindent Note that this also pops off the top of the stack.
+if the first result is just ``discarded''.
-One problem we have to overcome, however, is that local
-variables are addressed, not by identifiers, but by numbers
+There is also one small modification we have to perform when
-(starting from $0$). Therefore our compiler needs to maintain
+calling the write method. Remember in the Fun-language we have
-a kind of environment where variables are associated to
+the convention that every expression needs to return an
-numbers. This association needs to be unique: if we muddle up
+integer as a result (located on the top of the stack). Our
-the numbers, then we essentially confuse variables and the
+helper function implementing write, however, ``consumes'' the
-consequence will usually be an erroneous result. Our extended
+top element of the stack and violates this convention.
-\textit{compile}-function for arithmetic expressions will
+Therefore before we call, say, \pcode{write(1+2)}, we need to
-therefore take two arguments: the abstract syntax tree and the
+duplicate the top element of the stack like so
-environment, $E$, that maps identifiers to index-numbers.
+\begin{figure}[t]
-\begin{center}
+\begin{lstlisting}[language=JVMIS,
-\begin{tabular}{lcl}
+xleftmargin=2mm,
-$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
+numbers=none]
-$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
+.method public static write(I)V
-$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
+.limit locals 1
-$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
+.limit stack 2
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
+getstatic java/lang/System/out Ljava/io/PrintStream;
-$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
+iload 0
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
+invokevirtual java/io/PrintStream/println(I)V
-$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
+return
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
+.end method
-$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
+\end{lstlisting}
-\end{tabular}
+\caption{The helper function for printing out integers.\label{write}}
-\end{center}
+\end{figure}
-\noindent In the last line we generate the code for variables
-where $E(x)$ stands for looking up the environment to which
+\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
-index the variable $x$ maps to.
+$\textrm{\textit{compile}}($1+2$)$
+dup
-There is a similar \textit{compile}-function for boolean
+invokestatic XXX/XXX/write(I)V
-expressions, but it includes a ``trick'' to do with
+\end{lstlisting}
-\pcode{if}- and \pcode{while}-statements. To explain the issue
-let us first describe the compilation of statements of the
+\noindent We also need to first generate code for the
-While-language. The clause for \pcode{skip} is trivial, since
+argument-expression of write, which in the While-language was
-we do not have to generate any instruction
+only allowed to be a single variable.
-\begin{center}
+Most of the new code in the compiler for the Fun-language
-\begin{tabular}{lcl}
+comes from function definitions and function calls. For this
-$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
+have a look again at the helper function in
-\end{tabular}
+Figure~\ref{write}. Assuming we have a function definition
-\end{center}
+\begin{lstlisting}[numbers=none,mathescape]
-\noindent $[]$ is the empty list of instructions. Note that
+def fname (x1, ... , xn) = ...
-the \textit{compile}-function for statements returns a pair, a
+\end{lstlisting}
-list of instructions (in this case the empty list) and an
-environment for variables. The reason for the environment is
+\noindent then we have to generate
-that assignments in the While-language might change the
-environment---clearly if a variable is used for the first
+\begin{lstlisting}[numbers=none,mathescape,language=JVMIS]
-time, we need to allocate a new index and if it has been used
+.method public static fname (I...I)I
-before, we need to be able to retrieve the associated index.
+.limit locals ??
-This is reflected in the clause for compiling assignments:
+.limit stack ??
+...
-\begin{center}
+ireturn
-\begin{tabular}{lcl}
+.method end
-$\textit{compile}(x := a, E)$ & $\dn$ &
+\end{lstlisting}
-$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
-\end{tabular}
+\noindent where the number of \pcode{I}s corresponds to the
-\end{center}
+number of arguments the function has, say \pcode{x1} to
+\pcode{xn}. The final \pcode{I} is needed in order to indicate
-\noindent We first generate code for the right-hand side of
+that the function returns an integer. Therefore we also have
-the assignment and then add an \pcode{istore}-instruction at
+to use \pcode{ireturn} instead of \pcode{return}. However,
-the end. By convention the result of the arithmetic expression
+more interesting are the two \pcode{.limit} lines. Locals
-$a$ will be on top of the stack. After the \pcode{istore}
+refers to the local variables of the method, which can be
-instruction, the result will be stored in the index
+queried and overwritten using \pcode{iload} and
-corresponding to the variable $x$. If the variable $x$ has
+\pcode{istore}, respectively.
-been used before in the program, we just need to look up what
-the index is and return the environment unchanged (that is in
-this case $E' = E$). However, if this is the first encounter
-of the variable $x$ in the program, then we have to augment
-the environment and assign $x$ with the largest index in $E$
-plus one (that is $E' = E(x \mapsto largest\_index + 1)$).
-That means for the assignment $x := x + 1$ we generate the
-following code
-\begin{lstlisting}[mathescape,numbers=none]
-iload $n_x$
-ldc 1
-iadd
-istore $n_x$
-\end{lstlisting}
-\noindent
-where $n_x$ is the index for the variable $x$.
-More complicated is the code for \pcode{if}-statments, say
-\begin{lstlisting}[mathescape,language={},numbers=none]
-if $b$ then $cs_1$ else $cs_2$
-\end{lstlisting}
-\noindent where $b$ is a boolean expression and the $cs_{1/2}$
-are the statements for each \pcode{if}-branch. Lets assume
-we already generated code for $b$ and $cs_{1/2}$. Then in the
-true-case the control-flow of the program needs to be
-\begin{center}
-\begin{tikzpicture}[node distance=2mm and 4mm,
-block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
-point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
-skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
-\node (A1) [point] {};
-\node (b) [block, right=of A1] {code of $b$};
-\node (A2) [point, right=of b] {};
-\node (cs1) [block, right=of A2] {code of $cs_1$};
-\node (A3) [point, right=of cs1] {};
-\node (cs2) [block, right=of A3] {code of $cs_2$};
-\node (A4) [point, right=of cs2] {};
-\draw (A1) edge [->, black, line width=1mm] (b);
-\draw (b) edge [->, black, line width=1mm] (cs1);
-\draw (cs1) edge [->, black, line width=1mm] (A3);
-\draw (A3) edge [->, black, skip loop] (A4);
-\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
-\end{tikzpicture}
-\end{center}
-\noindent where we start with running the code for $b$; since
-we are in the true case we continue with running the code for
-$cs_1$. After this however, we must not run the code for
-$cs_2$, but always jump after the last instruction of $cs_2$
-(the code for the \pcode{else}-branch). Note that this jump is
-unconditional, meaning we always have to jump to the end of
-$cs_2$. The corresponding instruction of the JVM is
-\pcode{goto}. In case $b$ turns out to be false we need the
-control-flow
-\begin{center}
-\begin{tikzpicture}[node distance=2mm and 4mm,
-block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
-point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
-skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
-\node (A1) [point] {};
-\node (b) [block, right=of A1] {code of $b$};
-\node (A2) [point, right=of b] {};
-\node (cs1) [block, right=of A2] {code of $cs_1$};
-\node (A3) [point, right=of cs1] {};
-\node (cs2) [block, right=of A3] {code of $cs_2$};
-\node (A4) [point, right=of cs2] {};
-\draw (A1) edge [->, black, line width=1mm] (b);
-\draw (b) edge [->, black, line width=1mm] (A2);
-\draw (A2) edge [skip loop] (A3);
-\draw (A3) edge [->, black, line width=1mm] (cs2);
-\draw (cs2) edge [->,black, line width=1mm] (A4);
-\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
-\end{tikzpicture}
-\end{center}
-\noindent where we now need a conditional jump (if the
-if-condition is false) from the end of the code for the
-boolean to the beginning of the instructions $cs_2$. Once we
-are finished with running $cs_2$ we can continue with whatever
-code comes after the if-statement.
-The \pcode{goto} and the conditional jumps need addresses to
-where the jump should go. Since we are generating assembly
-code for the JVM, we do not actually have to give (numeric)
-addresses, but can just attach (symbolic) labels to our code.
-These labels specify a target for a jump. Therefore the labels
-need to be unique, as otherwise it would be ambiguous where a
-jump should go to. A label, say \pcode{L}, is attached to code
-like
-\begin{lstlisting}[mathescape,numbers=none]
-L:
-$instr_1$
-$instr_2$
-$\vdots$
-\end{lstlisting}
-\noindent where a label is indicated by a colon.
-Recall the ``trick'' with compiling boolean expressions: the
-\textit{compile}-function for boolean expressions takes three
-arguments: an abstract syntax tree, an environment for
-variable indices and also the label, $lab$, to where an conditional
-jump needs to go. The clause for the expression $a_1 = a_2$,
-for example, is as follows:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
-\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}
-\end{tabular}
-\end{center}
-\noindent where we are first generating code for the
-subexpressions $a_1$ and $a_2$. This will mean after running
-the corresponding code there will be two integers on top of
-the stack. If they are equal, we do not have to do anything
-(except for popping them off from the stack) and just continue
-with the next instructions (see control-flow of ifs above).
-However if they are \emph{not} equal, then we need to
-(conditionally) jump to the label $lab$. This can be done with
-the instruction
-\begin{lstlisting}[mathescape,numbers=none]
-if_icmpne $lab$
-\end{lstlisting}
-\noindent Other jump instructions for boolean operators are
-\begin{center}
-\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
-$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
-$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
-$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
-\end{tabular}
-\end{center}
-\noindent and so on. I leave it to you to extend the
-\textit{compile}-function for the other boolean expressions.
-Note that we need to jump whenever the boolean is \emph{not}
-true, which means we have to ``negate'' the jump
-condition---equals becomes not-equal, less becomes
-greater-or-equal. If you do not like this design (it can be
-the source of some nasty, hard-to-detect errors), you can also
-change the layout of the code and first give the code for the
-else-branch and then for the if-branch. However in the case
-of while-loops this way of generating code still seems
-the most convenient.
-We are now ready to give the compile function for
-if-statments---remember this function returns for staments a
-pair consisting of the code and an environment:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
-\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
-\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
-\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
-\end{tabular}
-\end{center}
-\noindent In the first two lines we generate two fresh labels
-for the jump addresses (just before the else-branch and just
-after). In the next two lines we generate the instructions for
-the two branches, $is_1$ and $is_2$. The final code will
-be first the code for $b$ (including the label
-just-before-the-else-branch), then the \pcode{goto} for after
-the else-branch, the label $L_\textit{ifesle}$, followed by
-the instructions for the else-branch, followed by the
-after-the-else-branch label. Consider for example the
-if-statement:
-\begin{lstlisting}[mathescape,numbers=none,language={}]
-if 1 = 1 then x := 2 else y := 3
-\end{lstlisting}
-\noindent
-The generated code is as follows:
-\begin{lstlisting}[mathescape,language={}]
-ldc 1
-ldc 1
-if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
-ldc 2
-istore 0
-goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
-L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
-ldc 3
-istore 1
-L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
-\end{lstlisting}
-\begin{tikzpicture}[remember picture,overlay]
-\draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
--- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
-\draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
--- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
-\end{tikzpicture}
-\noindent The first three lines correspond to the the boolean
-expression $1 = 1$. The jump for when this boolean expression
-is false is in Line~3. Lines 4-6 corresponds to the if-branch;
-the else-branch is in Lines 8 and 9. Note carefully how the
-environment $E$ is threaded through the recursive calls of
-\textit{compile}. The function receives an environment $E$,
-but it might extend it when compiling the if-branch, yielding
-$E'$. This happens for example in the if-statement above
-whenever the variable \code{x} has not been used before.
-Similarly with the environment $E''$ for the second call to
-\textit{compile}. $E''$ is also the environment that needs to
-be returned as part of the answer.
-The compilation of the while-loops, say
-\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
-the condition is true and we need to do another iteration,
-and the control-flow needs to be as follows
-\begin{center}
-\begin{tikzpicture}[node distance=2mm and 4mm,
-block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
-point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
-skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
-\node (A0) [point, left=of A1] {};
-\node (A1) [point] {};
-\node (b) [block, right=of A1] {code of $b$};
-\node (A2) [point, right=of b] {};
-\node (cs1) [block, right=of A2] {code of $cs$};
-\node (A3) [point, right=of cs1] {};
-\node (A4) [point, right=of A3] {};
-\draw (A0) edge [->, black, line width=1mm] (b);
-\draw (b) edge [->, black, line width=1mm] (cs1);
-\draw (cs1) edge [->, black, line width=1mm] (A3);
-\draw (A3) edge [->,skip loop] (A1);
-\end{tikzpicture}
-\end{center}
-\noindent Whereas if the condition is \emph{not} true, we
-need to jump out of the loop, which gives the following
-control flow.
-\begin{center}
-\begin{tikzpicture}[node distance=2mm and 4mm,
-block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
-point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
-skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
-\node (A0) [point, left=of A1] {};
-\node (A1) [point] {};
-\node (b) [block, right=of A1] {code of $b$};
-\node (A2) [point, right=of b] {};
-\node (cs1) [block, right=of A2] {code of $cs$};
-\node (A3) [point, right=of cs1] {};
-\node (A4) [point, right=of A3] {};
-\draw (A0) edge [->, black, line width=1mm] (b);
-\draw (b) edge [->, black, line width=1mm] (A2);
-\draw (A2) edge [skip loop] (A3);
-\draw (A3) edge [->, black, line width=1mm] (A4);
-\end{tikzpicture}
-\end{center}
-\noindent Again we can use the \textit{compile}-function for
-boolean expressions to insert the appropriate jump to the
-end of the loop (label $L_{wend}$ below).
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
-\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
-\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
-\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
-\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
-\end{tabular}
-\end{center}
-\noindent I let you go through how this clause works. As an example
-you can consider the while-loop
-\begin{lstlisting}[mathescape,numbers=none,language={}]
-while x <= 10 do x := x + 1
-\end{lstlisting}
-\noindent yielding the following code
-\begin{lstlisting}[mathescape,language={}]
-L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
-iload 0
-ldc 10
-if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
-iload 0
-ldc 1
-iadd
-istore 0
-goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
-L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
-\end{lstlisting}
-\begin{tikzpicture}[remember picture,overlay]
-\draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
--- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
-\draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
--- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
-\end{tikzpicture}
-Next we need to consider the statement \pcode{write x}, which
-can be used to print out the content of a variable. For this
-we need to use a Java library function. In order to avoid
-having to generate a lot of code for each
-\pcode{write}-command, we use a separate helper-method and
-just call this method with an argument (which needs to be
-placed onto the stack). The code of the helper-method is as
-follows.
-\begin{lstlisting}[language=JVMIS]
-.method public static write(I)V
-.limit locals 1
-.limit stack 2
-getstatic java/lang/System/out Ljava/io/PrintStream;
-iload 0
-invokevirtual java/io/PrintStream/println(I)V
-return
-.end method
-\end{lstlisting}
-\noindent The first line marks the beginning of the method,
-called \pcode{write}. It takes a single integer argument
-indicated by the \pcode{(I)} and returns no result, indicated
-by the \pcode{V}. Since the method has only one argument, we
-only need a single local variable (Line~2) and a stack with
-two cells will be sufficient (Line 3). Line 4 instructs the
-JVM to get the value of the field \pcode{out} of the class
-\pcode{java/lang/System}. It expects the value to be of type
-\pcode{java/io/PrintStream}. A reference to this value will be
-placed on the stack. Line~5 copies the integer we want to
-print out onto the stack. In the next line we call the method
-\pcode{println} (from the class \pcode{java/io/PrintStream}).
-We want to print out an integer and do not expect anything
-back (that is why the type annotation is \pcode{(I)V}). The
-\pcode{return}-instruction in the next line changes the
-control-flow back to the place from where \pcode{write} was
-called. This method needs to be part of a header that is
-included in any code we generate. The helper-method
-\pcode{write} can be invoked with the two instructions
-\begin{lstlisting}[mathescape,language=JVMIS]
-iload $E(x)$
-invokestatic XXX/XXX/write(I)V
-\end{lstlisting}
-\noindent where we first place the variable to be printed on
-top of the stack and then call \pcode{write}. The \pcode{XXX}
-need to be replaced by an appropriate class name (this will be
-explained shortly).
-\begin{figure}[t]
-\begin{lstlisting}[mathescape,language=JVMIS]
-.class public XXX.XXX
-.super java/lang/Object
-.method public <init>()V
-aload_0
-invokenonvirtual java/lang/Object/<init>()V
-return
-.end method
-.method public static main([Ljava/lang/String;)V
-.limit locals 200
-.limit stack 200
-$\textit{\ldots{}here comes the compiled code\ldots}$
-return
-.end method
-\end{lstlisting}
-\caption{Boilerplate code needed for running generated code.\label{boiler}}
-\end{figure}
-By generating code for a While-program, we end up with a list
-of (JVM assembly) instructions. Unfortunately, there is a bit
-more boilerplate code needed before these instructions can be
-run. The complete code is shown in Figure~\ref{boiler}. This
-boilerplate code is very specific to the JVM. If we target any
-other virtual machine or a machine language, then we would
-need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
-contain a method for object creation in the JVM; this method
-is called \emph{before} the \pcode{main}-method in Lines 10 to
-17. Interesting are the Lines 11 and 12 where we hardwire that
-the stack of our programs will never be larger than 200 and
-that the maximum number of variables is also 200. This seem to
-be conservative default values that allow is to run some
-simple While-programs. In a real compiler, we would of course
-need to work harder and find out appropriate values for the
-stack and local variables.
-To sum up, in Figure~\ref{test} is the complete code generated
-for the slightly non-sensical program
-\begin{lstlisting}[mathescape,language=While]
-x := 1 + 2;
-write x
-\end{lstlisting}
-\noindent Having this code at our disposal, we need the
-assembler to translate the generated code into JVM bytecode (a
-class file). This bytecode is understood by the JVM and can be
-run by just invoking the \pcode{java}-program.
-\begin{figure}[p]
-\lstinputlisting{../progs/test-small.j}
-\caption{Generated code for a test program. This code can be
-processed by an Java assembler producing a class-file, which
-can be run by the \pcode{java}-program.\label{test}}
-\end{figure}
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t

changeset 380	1e88390e81aa
parent 379	fa2589ec0fae
child 381	47eceea734c5