handouts/ho08.tex
changeset 380 1e88390e81aa
parent 379 fa2589ec0fae
child 381 47eceea734c5
equal deleted inserted replaced
379:fa2589ec0fae 380:1e88390e81aa
    11 
    11 
    12 
    12 
    13 \section*{Handout 8 (A Functional Language)}
    13 \section*{Handout 8 (A Functional Language)}
    14 
    14 
    15 
    15 
    16 The language we looked at in the previous lecture was rather 
    16 The language we looked at in the previous lecture was rather
    17 primitive and the compiler rather crude. In this handout
    17 primitive and the compiler rather crude---everything was
    18 we like to have a look at a slightly more comfortable
    18 essentially compiled into a big monolithic chunk of code
    19 language and a tiny-teeny bit more realistic compiler. A small
    19 inside the main function. In this handout we like to have a
       
    20 look at a slightly more comfortable language, which I call
       
    21 Fun-language, and a tiny-teeny bit more realistic compiler.
       
    22 The Fun-language is a functional programming language. A small
    20 collection of programs we want to be able to write and compile
    23 collection of programs we want to be able to write and compile
    21 is as follows:
    24 is as follows:
    22 
    25 
    23 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
    26 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
    24 def fib(n) = if n == 0 then 0 
    27 def fib(n) = if n == 0 then 0 
    32                 else ack(m - 1, ack(m, n - 1));
    35                 else ack(m - 1, ack(m, n - 1));
    33                 
    36                 
    34 def gcd(a, b) = if b == 0 then a else gcd(b, a % b);                
    37 def gcd(a, b) = if b == 0 then a else gcd(b, a % b);                
    35 \end{lstlisting}
    38 \end{lstlisting}
    36 
    39 
    37 \noindent We will still restrict us to just programs about
    40 \noindent Compare the code of the fib-program with the same
    38 integers, that means for example that every function needs to
    41 program written in the While-language\ldots Fun is definitely
    39 return an integer. The grammar of the Fun-language is slightly 
    42 more comfortable. We will still focus on programs involving
    40 simpler than the While-language, because almost everything is 
    43 integers only, that means for example that every function is
    41 an expression. The grammar rules are as follows:
    44 expected to return an integer. The point of the Fun language
       
    45 is to compile each function to a separate method in JVM
       
    46 bytecode (not just a big monolithic code chunk). The means we
       
    47 need to adapt to some of the conventions of the JVM about
       
    48 methods.
       
    49 
       
    50 The grammar of the Fun-language is slightly simpler than the
       
    51 While-language, because the main syntactic category are
       
    52 expressions (we do not have statements). The grammar rules are
       
    53 as follows:
    42 
    54 
    43 
    55 
    44 \begin{plstx}[rhs style=,margin=1.5cm]
    56 \begin{plstx}[rhs style=,margin=1.5cm]
    45 : \meta{Exp} ::= \meta{Var} | \meta{Num} {\hspace{3cm}}
    57 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3cm}}
    46              |   \meta{Exp} + \meta{Exp} | ... | (\meta{ExP})
    58              |   \meta{Exp} + \meta{Exp} | ... | (\meta{Exp})
    47              |   \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}
    59              |   \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}
    48              |   \code{write} \meta{Exp} {\hspace{5cm}}
    60              |   \code{write} \meta{Exp} {\hspace{5cm}}
    49              |   \meta{Exp} ; \meta{Exp}  {\hspace{5cm}}
    61              |   \meta{Exp} ; \meta{Exp}  {\hspace{5cm}}
    50              |   \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
    62              |   \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
    51 : \meta{BExp} ::= ...\\
    63 : \meta{BExp} ::= ...\\
    52 : \meta{Decl} ::= \meta{Def} ; \meta{Decl}
    64 : \meta{Decl} ::= \meta{Def} ; \meta{Decl}
    53              | \meta{Exp}\\
    65              | \meta{Exp}\\
    54 : \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_2$)\\               
    66 : \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_n$) = \meta{Exp}\\               
    55 \end{plstx}
    67 \end{plstx}
    56 
    68 
    57 \noindent where \meta{Id} stands for variables and \meta{Num}
    69 \noindent where, as usual, \meta{Id} stands for variables and
    58 for numbers. For the moment let us omit variables from
    70 \meta{Num} for numbers. We can call a function by applying the
    59 arithmetic expressions. Our parser will take this grammar and
    71 arguments to a function name (as shown in the last clause of
    60 given an input produce abstract syntax trees. For example for
    72 \meta{Exp}). The arguments in such a function call can be
    61 the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
    73 again expressions, including other function calls. In
    62 following tree.
    74 contrast, when defining a function (see \meta{Def}-clause) the
    63 
    75 arguments need to be variables, say $x_1$ to $x_n$. We call
    64 \begin{center}
    76 the expression on the right of = in a function definition as
    65 \begin{tikzpicture}
    77 the \emph{body of the function}. We have the restriction that
    66 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
    78 the variables inside a function body can only be those that
    67 \end{tikzpicture}
    79 are mentioned as arguments of the function. A Fun-program is
    68 \end{center}
    80 then a sequence of function definitions separated by
    69 
    81 semicolons, and a final ``main'' call of a function that
    70 \noindent To generate code for this expression, we need to
    82 starts the computation in the program. For example
    71 traverse this tree in post-order fashion and emit code for
       
    72 each node---this traversal in post-order fashion will produce
       
    73 code for a stack-machine (what the JVM is). Doing so for the
       
    74 tree above generates the instructions
       
    75 
    83 
    76 \begin{lstlisting}[numbers=none]
    84 \begin{lstlisting}[numbers=none]
    77 ldc 1 
    85 def fact(n) = if n == 0 then 1 else n * fact(n - 1);
    78 ldc 2 
    86 write(fact(5))
    79 ldc 3 
    87 \end{lstlisting}
    80 imul 
    88 
    81 ldc 4 
    89 \noindent would be a valid Fun-program. The parser of the
    82 ldc 3 
    90 Fun-language produces abstract syntax trees which in Scala
    83 isub 
    91 can be represented as follows:
    84 iadd 
    92 
    85 iadd
    93 
    86 \end{lstlisting}
    94 {\small
    87 
    95 \begin{lstlisting}[numbers=none]
    88 \noindent If we ``run'' these instructions, the result $8$
    96 abstract class Exp
    89 will be on top of the stack (I leave this to you to verify;
    97 abstract class BExp 
    90 the meaning of each instruction should be clear). The result
    98 abstract class Decl
    91 being on the top of the stack will be a convention we always
    99 
    92 observe in our compiler, that is the results of arithmetic
   100 case class Var(s: String) extends Exp
    93 expressions will always be on top of the stack. Note, that a
   101 case class Num(i: Int) extends Exp
    94 different bracketing of the expression, for example $(1 + (2 *
   102 case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
    95 3)) + (4 - 3)$, produces a different abstract syntax tree and
   103 case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
    96 thus potentially also a different list of instructions.
   104 case class Write(e: Exp) extends Exp
    97 Generating code in this fashion is rather easy to implement:
   105 case class Sequ(e1: Exp, e2: Exp) extends Exp
    98 it can be done with the following recursive
   106 case class Call(name: String, args: List[Exp]) extends Exp
    99 \textit{compile}-function, which takes the abstract syntax
   107 
   100 tree as argument:
   108 case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
   101 
   109 
   102 \begin{center}
   110 case class Def(name: String, 
   103 \begin{tabular}{lcl}
   111                args: List[String], 
   104 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
   112                body: Exp) extends Decl
   105 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
   113 case class Main(e: Exp) extends Decl
   106 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
   114 \end{lstlisting}}
   107 $\textit{compile}(a_1 - a_2)$ & $\dn$ & 
   115 
   108 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
   116 Let us first look at some clauses for compiling expressions.
   109 $\textit{compile}(a_1 * a_2)$ & $\dn$ & 
   117 The compilation of arithmetic and boolean expressions is just
   110 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
   118 like for the While-language and do not need any modification.
   111 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
   119 (recall that the \textit{compile}-function for boolean
   112 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
   120 expression takes a third argument for the label where the
   113 \end{tabular}
   121 contro-flow should jump when the boolean expression is
   114 \end{center}
   122 \emph{not} true---this is needed for compiling \pcode{if}s).
   115 
   123 One additional feature in the Fun-language are sequences.
   116 However, our arithmetic expressions can also contain
   124 Their purpose is to do one calculation after another. The
   117 variables. We will represent them as \emph{local variables} in
   125 reason why we need to be careful however is the convention
   118 the JVM. Essentially, local variables are an array or pointers
   126 that every expression can only produce s single result
   119 to memory cells, containing in our case only integers. Looking
   127 (including sequences). Since this result will be on the
   120 up a variable can be done with the instruction
   128 top of the stack, we need to generate a 
   121 
   129 \pcode{pop}-instruction in order to clean-up the stack. Given 
   122 \begin{lstlisting}[mathescape,numbers=none]
   130 the expression of the form \pcode{exp1 ; exp2} we need to 
   123 iload $index$
   131 generate code where after the first code chunk
   124 \end{lstlisting}
   132 a \pcode{pop}-instruction is needed. 
   125 
   133 
   126 \noindent 
   134 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   127 which places the content of the local variable $index$ onto 
   135 $\textrm{\textit{compile}}($exp1$)$
   128 the stack. Storing the top of the stack into a local variable 
   136 pop
   129 can be done by the instruction
   137 $\textrm{\textit{compile}}($exp2$)$
   130 
   138 \end{lstlisting}
   131 \begin{lstlisting}[mathescape,numbers=none]
   139 
   132 istore $index$
   140 \noindent In effect we ``forget'' about the result the first
   133 \end{lstlisting}
   141 expression calculates. I leave you to think about why this
   134 
   142 sequence operator is still useful in the Fun-language, even 
   135 \noindent Note that this also pops off the top of the stack.
   143 if the first result is just ``discarded''. 
   136 One problem we have to overcome, however, is that local
   144 
   137 variables are addressed, not by identifiers, but by numbers
   145 There is also one small modification we have to perform when
   138 (starting from $0$). Therefore our compiler needs to maintain
   146 calling the write method. Remember in the Fun-language we have
   139 a kind of environment where variables are associated to
   147 the convention that every expression needs to return an
   140 numbers. This association needs to be unique: if we muddle up
   148 integer as a result (located on the top of the stack). Our
   141 the numbers, then we essentially confuse variables and the
   149 helper function implementing write, however, ``consumes'' the
   142 consequence will usually be an erroneous result. Our extended
   150 top element of the stack and violates this convention.
   143 \textit{compile}-function for arithmetic expressions will
   151 Therefore before we call, say, \pcode{write(1+2)}, we need to
   144 therefore take two arguments: the abstract syntax tree and the
   152 duplicate the top element of the stack like so
   145 environment, $E$, that maps identifiers to index-numbers.
   153 
   146 
   154 \begin{figure}[t]
   147 \begin{center}
   155 \begin{lstlisting}[language=JVMIS, 
   148 \begin{tabular}{lcl}
   156                    xleftmargin=2mm, 
   149 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
   157                    numbers=none]
   150 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
   158 .method public static write(I)V 
   151 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
   159    .limit locals 1
   152 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
   160    .limit stack 2
   153 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
   161    getstatic java/lang/System/out Ljava/io/PrintStream; 
   154 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
   162    iload 0 
   155 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
   163    invokevirtual java/io/PrintStream/println(I)V 
   156 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
   164    return 
   157 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
   165 .end method
   158 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
   166 \end{lstlisting}
   159 \end{tabular}
   167 \caption{The helper function for printing out integers.\label{write}}
   160 \end{center}
   168 \end{figure}
   161 
   169 
   162 \noindent In the last line we generate the code for variables
   170 
   163 where $E(x)$ stands for looking up the environment to which
   171 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   164 index the variable $x$ maps to.
   172 $\textrm{\textit{compile}}($1+2$)$
   165 
   173 dup
   166 There is a similar \textit{compile}-function for boolean
   174 invokestatic XXX/XXX/write(I)V
   167 expressions, but it includes a ``trick'' to do with
   175 \end{lstlisting}
   168 \pcode{if}- and \pcode{while}-statements. To explain the issue
   176 
   169 let us first describe the compilation of statements of the
   177 \noindent We also need to first generate code for the
   170 While-language. The clause for \pcode{skip} is trivial, since
   178 argument-expression of write, which in the While-language was
   171 we do not have to generate any instruction
   179 only allowed to be a single variable.
   172 
   180 
   173 \begin{center}
   181 Most of the new code in the compiler for the Fun-language
   174 \begin{tabular}{lcl}
   182 comes from function definitions and function calls. For this
   175 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
   183 have a look again at the helper function in
   176 \end{tabular}
   184 Figure~\ref{write}. Assuming we have a function definition
   177 \end{center}
   185 
   178 
   186 \begin{lstlisting}[numbers=none,mathescape]
   179 \noindent $[]$ is the empty list of instructions. Note that
   187 def fname (x1, ... , xn) = ...
   180 the \textit{compile}-function for statements returns a pair, a
   188 \end{lstlisting}
   181 list of instructions (in this case the empty list) and an
   189 
   182 environment for variables. The reason for the environment is
   190 \noindent then we have to generate
   183 that assignments in the While-language might change the
   191 
   184 environment---clearly if a variable is used for the first
   192 \begin{lstlisting}[numbers=none,mathescape,language=JVMIS]
   185 time, we need to allocate a new index and if it has been used
   193 .method public static fname (I...I)I
   186 before, we need to be able to retrieve the associated index.
   194    .limit locals ??
   187 This is reflected in the clause for compiling assignments:
   195    .limit stack ?? 
   188 
   196    ...
   189 \begin{center}
   197   ireturn
   190 \begin{tabular}{lcl}
   198 .method end  
   191 $\textit{compile}(x := a, E)$ & $\dn$ & 
   199 \end{lstlisting}
   192 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
   200 
   193 \end{tabular}
   201 \noindent where the number of \pcode{I}s corresponds to the
   194 \end{center}
   202 number of arguments the function has, say \pcode{x1} to
   195 
   203 \pcode{xn}. The final \pcode{I} is needed in order to indicate
   196 \noindent We first generate code for the right-hand side of
   204 that the function returns an integer. Therefore we also have
   197 the assignment and then add an \pcode{istore}-instruction at
   205 to use \pcode{ireturn} instead of \pcode{return}. However,
   198 the end. By convention the result of the arithmetic expression
   206 more interesting are the two \pcode{.limit} lines. Locals
   199 $a$ will be on top of the stack. After the \pcode{istore}
   207 refers to the local variables of the method, which can be
   200 instruction, the result will be stored in the index
   208 queried and overwritten using \pcode{iload} and
   201 corresponding to the variable $x$. If the variable $x$ has
   209 \pcode{istore}, respectively.
   202 been used before in the program, we just need to look up what
       
   203 the index is and return the environment unchanged (that is in
       
   204 this case $E' = E$). However, if this is the first encounter 
       
   205 of the variable $x$ in the program, then we have to augment 
       
   206 the environment and assign $x$ with the largest index in $E$
       
   207 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
       
   208 That means for the assignment $x := x + 1$ we generate the
       
   209 following code
       
   210 
       
   211 \begin{lstlisting}[mathescape,numbers=none]
       
   212 iload $n_x$
       
   213 ldc 1
       
   214 iadd
       
   215 istore $n_x$
       
   216 \end{lstlisting}
       
   217 
       
   218 \noindent 
       
   219 where $n_x$ is the index for the variable $x$.
       
   220 
       
   221 More complicated is the code for \pcode{if}-statments, say
       
   222 
       
   223 \begin{lstlisting}[mathescape,language={},numbers=none]
       
   224 if $b$ then $cs_1$ else $cs_2$
       
   225 \end{lstlisting}
       
   226 
       
   227 \noindent where $b$ is a boolean expression and the $cs_{1/2}$
       
   228 are the statements for each \pcode{if}-branch. Lets assume
       
   229 we already generated code for $b$ and $cs_{1/2}$. Then in the
       
   230 true-case the control-flow of the program needs to be
       
   231 
       
   232 \begin{center}
       
   233 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   234  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   235  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   236  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   237 \node (A1) [point] {};
       
   238 \node (b) [block, right=of A1] {code of $b$};
       
   239 \node (A2) [point, right=of b] {};
       
   240 \node (cs1) [block, right=of A2] {code of $cs_1$};
       
   241 \node (A3) [point, right=of cs1] {};
       
   242 \node (cs2) [block, right=of A3] {code of $cs_2$};
       
   243 \node (A4) [point, right=of cs2] {};
       
   244 
       
   245 \draw (A1) edge [->, black, line width=1mm] (b);
       
   246 \draw (b) edge [->, black, line width=1mm] (cs1);
       
   247 \draw (cs1) edge [->, black, line width=1mm] (A3);
       
   248 \draw (A3) edge [->, black, skip loop] (A4);
       
   249 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
       
   250 \end{tikzpicture}
       
   251 \end{center}
       
   252 
       
   253 \noindent where we start with running the code for $b$; since
       
   254 we are in the true case we continue with running the code for
       
   255 $cs_1$. After this however, we must not run the code for
       
   256 $cs_2$, but always jump after the last instruction of $cs_2$
       
   257 (the code for the \pcode{else}-branch). Note that this jump is
       
   258 unconditional, meaning we always have to jump to the end of
       
   259 $cs_2$. The corresponding instruction of the JVM is
       
   260 \pcode{goto}. In case $b$ turns out to be false we need the
       
   261 control-flow
       
   262 
       
   263 \begin{center}
       
   264 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   265  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   266  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   267  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   268 \node (A1) [point] {};
       
   269 \node (b) [block, right=of A1] {code of $b$};
       
   270 \node (A2) [point, right=of b] {};
       
   271 \node (cs1) [block, right=of A2] {code of $cs_1$};
       
   272 \node (A3) [point, right=of cs1] {};
       
   273 \node (cs2) [block, right=of A3] {code of $cs_2$};
       
   274 \node (A4) [point, right=of cs2] {};
       
   275 
       
   276 \draw (A1) edge [->, black, line width=1mm] (b);
       
   277 \draw (b) edge [->, black, line width=1mm] (A2);
       
   278 \draw (A2) edge [skip loop] (A3);
       
   279 \draw (A3) edge [->, black, line width=1mm] (cs2);
       
   280 \draw (cs2) edge [->,black, line width=1mm] (A4);
       
   281 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
       
   282 \end{tikzpicture}
       
   283 \end{center}
       
   284 
       
   285 \noindent where we now need a conditional jump (if the
       
   286 if-condition is false) from the end of the code for the 
       
   287 boolean to the beginning of the instructions $cs_2$. Once we 
       
   288 are finished with running $cs_2$ we can continue with whatever
       
   289 code comes after the if-statement.
       
   290 
       
   291 The \pcode{goto} and the conditional jumps need addresses to
       
   292 where the jump should go. Since we are generating assembly
       
   293 code for the JVM, we do not actually have to give (numeric)
       
   294 addresses, but can just attach (symbolic) labels to our code.
       
   295 These labels specify a target for a jump. Therefore the labels
       
   296 need to be unique, as otherwise it would be ambiguous where a
       
   297 jump should go to. A label, say \pcode{L}, is attached to code
       
   298 like
       
   299 
       
   300 \begin{lstlisting}[mathescape,numbers=none]
       
   301 L:
       
   302   $instr_1$
       
   303   $instr_2$
       
   304     $\vdots$
       
   305 \end{lstlisting}
       
   306  
   210  
   307 \noindent where a label is indicated by a colon. 
       
   308  
       
   309 Recall the ``trick'' with compiling boolean expressions: the 
       
   310 \textit{compile}-function for boolean expressions takes three
       
   311 arguments: an abstract syntax tree, an environment for 
       
   312 variable indices and also the label, $lab$, to where an conditional 
       
   313 jump needs to go. The clause for the expression $a_1 = a_2$, 
       
   314 for example, is as follows:
       
   315 
       
   316 \begin{center}
       
   317 \begin{tabular}{lcl}
       
   318 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
       
   319 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}
       
   320 \end{tabular}
       
   321 \end{center}
       
   322 
       
   323 \noindent where we are first generating code for the
       
   324 subexpressions $a_1$ and $a_2$. This will mean after running
       
   325 the corresponding code there will be two integers on top of
       
   326 the stack. If they are equal, we do not have to do anything
       
   327 (except for popping them off from the stack) and just continue
       
   328 with the next instructions (see control-flow of ifs above).
       
   329 However if they are \emph{not} equal, then we need to
       
   330 (conditionally) jump to the label $lab$. This can be done with
       
   331 the instruction
       
   332 
       
   333 \begin{lstlisting}[mathescape,numbers=none]
       
   334 if_icmpne $lab$
       
   335 \end{lstlisting}
       
   336 
       
   337 \noindent Other jump instructions for boolean operators are
       
   338 
       
   339 \begin{center}
       
   340 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
       
   341 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
       
   342 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
       
   343 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
       
   344 \end{tabular}
       
   345 \end{center}
       
   346 
       
   347 \noindent and so on. I leave it to you to extend the
       
   348 \textit{compile}-function for the other boolean expressions.
       
   349 Note that we need to jump whenever the boolean is \emph{not}
       
   350 true, which means we have to ``negate'' the jump
       
   351 condition---equals becomes not-equal, less becomes
       
   352 greater-or-equal. If you do not like this design (it can be
       
   353 the source of some nasty, hard-to-detect errors), you can also
       
   354 change the layout of the code and first give the code for the
       
   355 else-branch and then for the if-branch. However in the case
       
   356 of while-loops this way of generating code still seems
       
   357 the most convenient.
       
   358 
       
   359 We are now ready to give the compile function for 
       
   360 if-statments---remember this function returns for staments a 
       
   361 pair consisting of the code and an environment:
       
   362 
       
   363 \begin{center}
       
   364 \begin{tabular}{lcl}
       
   365 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
       
   366 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
       
   367 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
       
   368 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
       
   369 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
       
   370 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
       
   371 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
       
   372 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
       
   373 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
       
   374 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
       
   375 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
       
   376 \end{tabular}
       
   377 \end{center}
       
   378 
       
   379 \noindent In the first two lines we generate two fresh labels
       
   380 for the jump addresses (just before the else-branch and just
       
   381 after). In the next two lines we generate the instructions for
       
   382 the two branches, $is_1$ and $is_2$. The final code will
       
   383 be first the code for $b$ (including the label 
       
   384 just-before-the-else-branch), then the \pcode{goto} for after
       
   385 the else-branch, the label $L_\textit{ifesle}$, followed by
       
   386 the instructions for the else-branch, followed by the 
       
   387 after-the-else-branch label. Consider for example the 
       
   388 if-statement:
       
   389 
       
   390 \begin{lstlisting}[mathescape,numbers=none,language={}]
       
   391 if 1 = 1 then x := 2 else y := 3
       
   392 \end{lstlisting}
       
   393 
       
   394 \noindent 
       
   395 The generated code is as follows:
       
   396 
       
   397 \begin{lstlisting}[mathescape,language={}]
       
   398    ldc 1
       
   399    ldc 1
       
   400    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
       
   401    ldc 2
       
   402    istore 0
       
   403    goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
       
   404 L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
       
   405    ldc 3
       
   406    istore 1
       
   407 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
       
   408 \end{lstlisting}
       
   409 
       
   410 \begin{tikzpicture}[remember picture,overlay]
       
   411   \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
       
   412            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
       
   413   \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
       
   414            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
       
   415 \end{tikzpicture}
       
   416 
       
   417 \noindent The first three lines correspond to the the boolean
       
   418 expression $1 = 1$. The jump for when this boolean expression
       
   419 is false is in Line~3. Lines 4-6 corresponds to the if-branch;
       
   420 the else-branch is in Lines 8 and 9. Note carefully how the
       
   421 environment $E$ is threaded through the recursive calls of
       
   422 \textit{compile}. The function receives an environment $E$,
       
   423 but it might extend it when compiling the if-branch, yielding
       
   424 $E'$. This happens for example in the if-statement above
       
   425 whenever the variable \code{x} has not been used before.
       
   426 Similarly with the environment $E''$ for the second call to
       
   427 \textit{compile}. $E''$ is also the environment that needs to
       
   428 be returned as part of the answer.
       
   429 
       
   430 The compilation of the while-loops, say 
       
   431 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
       
   432 the condition is true and we need to do another iteration, 
       
   433 and the control-flow needs to be as follows
       
   434 
       
   435 \begin{center}
       
   436 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   437  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   438  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   439  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   440 \node (A0) [point, left=of A1] {};
       
   441 \node (A1) [point] {};
       
   442 \node (b) [block, right=of A1] {code of $b$};
       
   443 \node (A2) [point, right=of b] {};
       
   444 \node (cs1) [block, right=of A2] {code of $cs$};
       
   445 \node (A3) [point, right=of cs1] {};
       
   446 \node (A4) [point, right=of A3] {};
       
   447 
       
   448 \draw (A0) edge [->, black, line width=1mm] (b);
       
   449 \draw (b) edge [->, black, line width=1mm] (cs1);
       
   450 \draw (cs1) edge [->, black, line width=1mm] (A3);
       
   451 \draw (A3) edge [->,skip loop] (A1);
       
   452 \end{tikzpicture}
       
   453 \end{center}
       
   454 
       
   455 \noindent Whereas if the condition is \emph{not} true, we
       
   456 need to jump out of the loop, which gives the following
       
   457 control flow.
       
   458 
       
   459 \begin{center}
       
   460 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   461  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   462  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   463  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   464 \node (A0) [point, left=of A1] {};
       
   465 \node (A1) [point] {};
       
   466 \node (b) [block, right=of A1] {code of $b$};
       
   467 \node (A2) [point, right=of b] {};
       
   468 \node (cs1) [block, right=of A2] {code of $cs$};
       
   469 \node (A3) [point, right=of cs1] {};
       
   470 \node (A4) [point, right=of A3] {};
       
   471 
       
   472 \draw (A0) edge [->, black, line width=1mm] (b);
       
   473 \draw (b) edge [->, black, line width=1mm] (A2);
       
   474 \draw (A2) edge [skip loop] (A3);
       
   475 \draw (A3) edge [->, black, line width=1mm] (A4);
       
   476 \end{tikzpicture}
       
   477 \end{center}
       
   478 
       
   479 \noindent Again we can use the \textit{compile}-function for
       
   480 boolean expressions to insert the appropriate jump to the
       
   481 end of the loop (label $L_{wend}$ below).
       
   482 
       
   483 \begin{center}
       
   484 \begin{tabular}{lcl}
       
   485 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
       
   486 \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
       
   487 \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
       
   488 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
       
   489 \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
       
   490 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
       
   491 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
       
   492 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
       
   493 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
       
   494 \end{tabular}
       
   495 \end{center}
       
   496 
       
   497 \noindent I let you go through how this clause works. As an example
       
   498 you can consider the while-loop
       
   499 
       
   500 \begin{lstlisting}[mathescape,numbers=none,language={}]
       
   501 while x <= 10 do x := x + 1
       
   502 \end{lstlisting}
       
   503 
       
   504 \noindent yielding the following code
       
   505 
       
   506 \begin{lstlisting}[mathescape,language={}]
       
   507 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
       
   508    iload 0
       
   509    ldc 10
       
   510    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
       
   511    iload 0
       
   512    ldc 1
       
   513    iadd
       
   514    istore 0
       
   515    goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
       
   516 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
       
   517 \end{lstlisting}
       
   518  
       
   519 \begin{tikzpicture}[remember picture,overlay]
       
   520   \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
       
   521            -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
       
   522   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
       
   523            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
       
   524 \end{tikzpicture}
       
   525 
       
   526 
       
   527 Next we need to consider the statement \pcode{write x}, which
       
   528 can be used to print out the content of a variable. For this
       
   529 we need to use a Java library function. In order to avoid
       
   530 having to generate a lot of code for each
       
   531 \pcode{write}-command, we use a separate helper-method and
       
   532 just call this method with an argument (which needs to be
       
   533 placed onto the stack). The code of the helper-method is as
       
   534 follows.
       
   535 
       
   536 
       
   537 \begin{lstlisting}[language=JVMIS]
       
   538 .method public static write(I)V 
       
   539     .limit locals 1 
       
   540     .limit stack 2 
       
   541     getstatic java/lang/System/out Ljava/io/PrintStream; 
       
   542     iload 0
       
   543     invokevirtual java/io/PrintStream/println(I)V 
       
   544     return 
       
   545 .end method
       
   546 \end{lstlisting}
       
   547 
       
   548 \noindent The first line marks the beginning of the method,
       
   549 called \pcode{write}. It takes a single integer argument
       
   550 indicated by the \pcode{(I)} and returns no result, indicated
       
   551 by the \pcode{V}. Since the method has only one argument, we
       
   552 only need a single local variable (Line~2) and a stack with
       
   553 two cells will be sufficient (Line 3). Line 4 instructs the
       
   554 JVM to get the value of the field \pcode{out} of the class
       
   555 \pcode{java/lang/System}. It expects the value to be of type
       
   556 \pcode{java/io/PrintStream}. A reference to this value will be
       
   557 placed on the stack. Line~5 copies the integer we want to
       
   558 print out onto the stack. In the next line we call the method
       
   559 \pcode{println} (from the class \pcode{java/io/PrintStream}).
       
   560 We want to print out an integer and do not expect anything
       
   561 back (that is why the type annotation is \pcode{(I)V}). The
       
   562 \pcode{return}-instruction in the next line changes the
       
   563 control-flow back to the place from where \pcode{write} was
       
   564 called. This method needs to be part of a header that is
       
   565 included in any code we generate. The helper-method
       
   566 \pcode{write} can be invoked with the two instructions
       
   567 
       
   568 \begin{lstlisting}[mathescape,language=JVMIS]
       
   569 iload $E(x)$ 
       
   570 invokestatic XXX/XXX/write(I)V
       
   571 \end{lstlisting}
       
   572 
       
   573 \noindent where we first place the variable to be printed on
       
   574 top of the stack and then call \pcode{write}. The \pcode{XXX}
       
   575 need to be replaced by an appropriate class name (this will be
       
   576 explained shortly).
       
   577 
       
   578 
       
   579 \begin{figure}[t]
       
   580 \begin{lstlisting}[mathescape,language=JVMIS]
       
   581 .class public XXX.XXX
       
   582 .super java/lang/Object
       
   583 
       
   584 .method public <init>()V
       
   585     aload_0
       
   586     invokenonvirtual java/lang/Object/<init>()V
       
   587     return
       
   588 .end method
       
   589 
       
   590 .method public static main([Ljava/lang/String;)V
       
   591     .limit locals 200
       
   592     .limit stack 200
       
   593 
       
   594       $\textit{\ldots{}here comes the compiled code\ldots}$
       
   595 
       
   596     return
       
   597 .end method
       
   598 \end{lstlisting}
       
   599 \caption{Boilerplate code needed for running generated code.\label{boiler}}
       
   600 \end{figure}
       
   601 
       
   602 
       
   603 By generating code for a While-program, we end up with a list
       
   604 of (JVM assembly) instructions. Unfortunately, there is a bit
       
   605 more boilerplate code needed before these instructions can be
       
   606 run. The complete code is shown in Figure~\ref{boiler}. This
       
   607 boilerplate code is very specific to the JVM. If we target any
       
   608 other virtual machine or a machine language, then we would
       
   609 need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
       
   610 contain a method for object creation in the JVM; this method
       
   611 is called \emph{before} the \pcode{main}-method in Lines 10 to
       
   612 17. Interesting are the Lines 11 and 12 where we hardwire that
       
   613 the stack of our programs will never be larger than 200 and
       
   614 that the maximum number of variables is also 200. This seem to
       
   615 be conservative default values that allow is to run some
       
   616 simple While-programs. In a real compiler, we would of course
       
   617 need to work harder and find out appropriate values for the
       
   618 stack and local variables.
       
   619 
       
   620 To sum up, in Figure~\ref{test} is the complete code generated
       
   621 for the slightly non-sensical program
       
   622 
       
   623 \begin{lstlisting}[mathescape,language=While]
       
   624 x := 1 + 2;
       
   625 write x
       
   626 \end{lstlisting}
       
   627 
       
   628 \noindent Having this code at our disposal, we need the
       
   629 assembler to translate the generated code into JVM bytecode (a
       
   630 class file). This bytecode is understood by the JVM and can be
       
   631 run by just invoking the \pcode{java}-program.
       
   632 
       
   633 
       
   634 \begin{figure}[p]
       
   635 \lstinputlisting{../progs/test-small.j}
       
   636 \caption{Generated code for a test program. This code can be 
       
   637 processed by an Java assembler producing a class-file, which
       
   638 can be run by the \pcode{java}-program.\label{test}}
       
   639 \end{figure}
       
   640 
       
   641 \end{document}
   211 \end{document}
   642 
   212 
   643 %%% Local Variables: 
   213 %%% Local Variables: 
   644 %%% mode: latex  
   214 %%% mode: latex  
   645 %%% TeX-master: t
   215 %%% TeX-master: t