handouts/ho08.tex
changeset 378 e8ac05fe2630
child 379 fa2589ec0fae
equal deleted inserted replaced
377:a052a83f562e 378:e8ac05fe2630
       
     1 \documentclass{article}
       
     2 \usepackage{../style}
       
     3 \usepackage{../langs}
       
     4 \usepackage{../grammar}
       
     5 \usepackage{../graphics}
       
     6 
       
     7 
       
     8 \begin{document}
       
     9 
       
    10 \section*{Handout 8 (A Functional Language)}
       
    11 
       
    12 The purpose of a compiler is to transform a program, a human
       
    13 can read and write, into code the machine can run as fast as
       
    14 possible. The fastest code would be machine code the CPU can
       
    15 run directly, but it is often enough to improve the speed of a
       
    16 program by just targeting a virtual machine. This produces not
       
    17 the fastest possible code, but code that is fast enough and
       
    18 has the advantage that the virtual machine takes care of
       
    19 things a compiler would normally need to take care of (like
       
    20 explicit memory management).
       
    21 
       
    22 As a first example we will implement a compiler for the very
       
    23 simple While-language. It will generate code for the Java
       
    24 Virtual Machine (JVM). This is a stack-based virtual machine,
       
    25 a fact which will make it easy to generate code for arithmetic
       
    26 expressions. For example for generating code for the
       
    27 expression $1 + 2$ we need to generate the following three
       
    28 instructions
       
    29 
       
    30 \begin{lstlisting}[numbers=none]
       
    31 ldc 1
       
    32 ldc 2
       
    33 iadd 
       
    34 \end{lstlisting}
       
    35 
       
    36 \noindent The first instruction loads the constant $1$ onto
       
    37 the stack, the next one $2$, the third instruction adds both
       
    38 numbers together replacing the top two elements of the stack
       
    39 with the result $3$. For simplicity, we will throughout
       
    40 consider only integer numbers and results. Therefore we can
       
    41 use the JVM instructions \code{iadd}, \code{isub},
       
    42 \code{imul}, \code{idiv} and so on. The \code{i} stands for
       
    43 integer instructions in the JVM (alternatives are \code{d} for
       
    44 doubles, \code{l} for longs and \code{f} for floats).
       
    45 
       
    46 Recall our grammar for arithmetic expressions ($E$ is the
       
    47 starting symbol):
       
    48 
       
    49 
       
    50 \begin{plstx}[rhs style=, margin=3cm]
       
    51 : \meta{E} ::= \meta{T} $+$ \meta{E}
       
    52          | \meta{T} $-$ \meta{E}
       
    53          | \meta{T}\\
       
    54 : \meta{T} ::= \meta{F} $*$ \meta{T}
       
    55           | \meta{F} $\backslash$ \meta{T}
       
    56           | \meta{F}\\
       
    57 : \meta{F} ::= ( \meta{E} )
       
    58           | \meta{Id}
       
    59           | \meta{Num}\\
       
    60 \end{plstx}
       
    61 
       
    62 
       
    63 \noindent where \meta{Id} stands for variables and \meta{Num}
       
    64 for numbers. For the moment let us omit variables from
       
    65 arithmetic expressions. Our parser will take this grammar and
       
    66 given an input produce abstract syntax trees. For example for
       
    67 the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
       
    68 following tree.
       
    69 
       
    70 \begin{center}
       
    71 \begin{tikzpicture}
       
    72 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
       
    73 \end{tikzpicture}
       
    74 \end{center}
       
    75 
       
    76 \noindent To generate code for this expression, we need to
       
    77 traverse this tree in post-order fashion and emit code for
       
    78 each node---this traversal in post-order fashion will produce
       
    79 code for a stack-machine (what the JVM is). Doing so for the
       
    80 tree above generates the instructions
       
    81 
       
    82 \begin{lstlisting}[numbers=none]
       
    83 ldc 1 
       
    84 ldc 2 
       
    85 ldc 3 
       
    86 imul 
       
    87 ldc 4 
       
    88 ldc 3 
       
    89 isub 
       
    90 iadd 
       
    91 iadd
       
    92 \end{lstlisting}
       
    93 
       
    94 \noindent If we ``run'' these instructions, the result $8$
       
    95 will be on top of the stack (I leave this to you to verify;
       
    96 the meaning of each instruction should be clear). The result
       
    97 being on the top of the stack will be a convention we always
       
    98 observe in our compiler, that is the results of arithmetic
       
    99 expressions will always be on top of the stack. Note, that a
       
   100 different bracketing of the expression, for example $(1 + (2 *
       
   101 3)) + (4 - 3)$, produces a different abstract syntax tree and
       
   102 thus potentially also a different list of instructions.
       
   103 Generating code in this fashion is rather easy to implement:
       
   104 it can be done with the following recursive
       
   105 \textit{compile}-function, which takes the abstract syntax
       
   106 tree as argument:
       
   107 
       
   108 \begin{center}
       
   109 \begin{tabular}{lcl}
       
   110 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
       
   111 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
       
   112 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
       
   113 $\textit{compile}(a_1 - a_2)$ & $\dn$ & 
       
   114 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
       
   115 $\textit{compile}(a_1 * a_2)$ & $\dn$ & 
       
   116 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
       
   117 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
       
   118 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
       
   119 \end{tabular}
       
   120 \end{center}
       
   121 
       
   122 However, our arithmetic expressions can also contain
       
   123 variables. We will represent them as \emph{local variables} in
       
   124 the JVM. Essentially, local variables are an array or pointers
       
   125 to memory cells, containing in our case only integers. Looking
       
   126 up a variable can be done with the instruction
       
   127 
       
   128 \begin{lstlisting}[mathescape,numbers=none]
       
   129 iload $index$
       
   130 \end{lstlisting}
       
   131 
       
   132 \noindent 
       
   133 which places the content of the local variable $index$ onto 
       
   134 the stack. Storing the top of the stack into a local variable 
       
   135 can be done by the instruction
       
   136 
       
   137 \begin{lstlisting}[mathescape,numbers=none]
       
   138 istore $index$
       
   139 \end{lstlisting}
       
   140 
       
   141 \noindent Note that this also pops off the top of the stack.
       
   142 One problem we have to overcome, however, is that local
       
   143 variables are addressed, not by identifiers, but by numbers
       
   144 (starting from $0$). Therefore our compiler needs to maintain
       
   145 a kind of environment where variables are associated to
       
   146 numbers. This association needs to be unique: if we muddle up
       
   147 the numbers, then we essentially confuse variables and the
       
   148 consequence will usually be an erroneous result. Our extended
       
   149 \textit{compile}-function for arithmetic expressions will
       
   150 therefore take two arguments: the abstract syntax tree and the
       
   151 environment, $E$, that maps identifiers to index-numbers.
       
   152 
       
   153 \begin{center}
       
   154 \begin{tabular}{lcl}
       
   155 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
       
   156 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
       
   157 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
       
   158 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
       
   159 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
       
   160 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
       
   161 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
       
   162 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
       
   163 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
       
   164 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
       
   165 \end{tabular}
       
   166 \end{center}
       
   167 
       
   168 \noindent In the last line we generate the code for variables
       
   169 where $E(x)$ stands for looking up the environment to which
       
   170 index the variable $x$ maps to.
       
   171 
       
   172 There is a similar \textit{compile}-function for boolean
       
   173 expressions, but it includes a ``trick'' to do with
       
   174 \pcode{if}- and \pcode{while}-statements. To explain the issue
       
   175 let us first describe the compilation of statements of the
       
   176 While-language. The clause for \pcode{skip} is trivial, since
       
   177 we do not have to generate any instruction
       
   178 
       
   179 \begin{center}
       
   180 \begin{tabular}{lcl}
       
   181 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
       
   182 \end{tabular}
       
   183 \end{center}
       
   184 
       
   185 \noindent $[]$ is the empty list of instructions. Note that
       
   186 the \textit{compile}-function for statements returns a pair, a
       
   187 list of instructions (in this case the empty list) and an
       
   188 environment for variables. The reason for the environment is
       
   189 that assignments in the While-language might change the
       
   190 environment---clearly if a variable is used for the first
       
   191 time, we need to allocate a new index and if it has been used
       
   192 before, we need to be able to retrieve the associated index.
       
   193 This is reflected in the clause for compiling assignments:
       
   194 
       
   195 \begin{center}
       
   196 \begin{tabular}{lcl}
       
   197 $\textit{compile}(x := a, E)$ & $\dn$ & 
       
   198 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
       
   199 \end{tabular}
       
   200 \end{center}
       
   201 
       
   202 \noindent We first generate code for the right-hand side of
       
   203 the assignment and then add an \pcode{istore}-instruction at
       
   204 the end. By convention the result of the arithmetic expression
       
   205 $a$ will be on top of the stack. After the \pcode{istore}
       
   206 instruction, the result will be stored in the index
       
   207 corresponding to the variable $x$. If the variable $x$ has
       
   208 been used before in the program, we just need to look up what
       
   209 the index is and return the environment unchanged (that is in
       
   210 this case $E' = E$). However, if this is the first encounter 
       
   211 of the variable $x$ in the program, then we have to augment 
       
   212 the environment and assign $x$ with the largest index in $E$
       
   213 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
       
   214 That means for the assignment $x := x + 1$ we generate the
       
   215 following code
       
   216 
       
   217 \begin{lstlisting}[mathescape,numbers=none]
       
   218 iload $n_x$
       
   219 ldc 1
       
   220 iadd
       
   221 istore $n_x$
       
   222 \end{lstlisting}
       
   223 
       
   224 \noindent 
       
   225 where $n_x$ is the index for the variable $x$.
       
   226 
       
   227 More complicated is the code for \pcode{if}-statments, say
       
   228 
       
   229 \begin{lstlisting}[mathescape,language={},numbers=none]
       
   230 if $b$ then $cs_1$ else $cs_2$
       
   231 \end{lstlisting}
       
   232 
       
   233 \noindent where $b$ is a boolean expression and the $cs_{1/2}$
       
   234 are the statements for each \pcode{if}-branch. Lets assume
       
   235 we already generated code for $b$ and $cs_{1/2}$. Then in the
       
   236 true-case the control-flow of the program needs to be
       
   237 
       
   238 \begin{center}
       
   239 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   240  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   241  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   242  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   243 \node (A1) [point] {};
       
   244 \node (b) [block, right=of A1] {code of $b$};
       
   245 \node (A2) [point, right=of b] {};
       
   246 \node (cs1) [block, right=of A2] {code of $cs_1$};
       
   247 \node (A3) [point, right=of cs1] {};
       
   248 \node (cs2) [block, right=of A3] {code of $cs_2$};
       
   249 \node (A4) [point, right=of cs2] {};
       
   250 
       
   251 \draw (A1) edge [->, black, line width=1mm] (b);
       
   252 \draw (b) edge [->, black, line width=1mm] (cs1);
       
   253 \draw (cs1) edge [->, black, line width=1mm] (A3);
       
   254 \draw (A3) edge [->, black, skip loop] (A4);
       
   255 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
       
   256 \end{tikzpicture}
       
   257 \end{center}
       
   258 
       
   259 \noindent where we start with running the code for $b$; since
       
   260 we are in the true case we continue with running the code for
       
   261 $cs_1$. After this however, we must not run the code for
       
   262 $cs_2$, but always jump after the last instruction of $cs_2$
       
   263 (the code for the \pcode{else}-branch). Note that this jump is
       
   264 unconditional, meaning we always have to jump to the end of
       
   265 $cs_2$. The corresponding instruction of the JVM is
       
   266 \pcode{goto}. In case $b$ turns out to be false we need the
       
   267 control-flow
       
   268 
       
   269 \begin{center}
       
   270 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   271  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   272  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   273  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   274 \node (A1) [point] {};
       
   275 \node (b) [block, right=of A1] {code of $b$};
       
   276 \node (A2) [point, right=of b] {};
       
   277 \node (cs1) [block, right=of A2] {code of $cs_1$};
       
   278 \node (A3) [point, right=of cs1] {};
       
   279 \node (cs2) [block, right=of A3] {code of $cs_2$};
       
   280 \node (A4) [point, right=of cs2] {};
       
   281 
       
   282 \draw (A1) edge [->, black, line width=1mm] (b);
       
   283 \draw (b) edge [->, black, line width=1mm] (A2);
       
   284 \draw (A2) edge [skip loop] (A3);
       
   285 \draw (A3) edge [->, black, line width=1mm] (cs2);
       
   286 \draw (cs2) edge [->,black, line width=1mm] (A4);
       
   287 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
       
   288 \end{tikzpicture}
       
   289 \end{center}
       
   290 
       
   291 \noindent where we now need a conditional jump (if the
       
   292 if-condition is false) from the end of the code for the 
       
   293 boolean to the beginning of the instructions $cs_2$. Once we 
       
   294 are finished with running $cs_2$ we can continue with whatever
       
   295 code comes after the if-statement.
       
   296 
       
   297 The \pcode{goto} and the conditional jumps need addresses to
       
   298 where the jump should go. Since we are generating assembly
       
   299 code for the JVM, we do not actually have to give (numeric)
       
   300 addresses, but can just attach (symbolic) labels to our code.
       
   301 These labels specify a target for a jump. Therefore the labels
       
   302 need to be unique, as otherwise it would be ambiguous where a
       
   303 jump should go to. A label, say \pcode{L}, is attached to code
       
   304 like
       
   305 
       
   306 \begin{lstlisting}[mathescape,numbers=none]
       
   307 L:
       
   308   $instr_1$
       
   309   $instr_2$
       
   310     $\vdots$
       
   311 \end{lstlisting}
       
   312  
       
   313 \noindent where a label is indicated by a colon. 
       
   314  
       
   315 Recall the ``trick'' with compiling boolean expressions: the 
       
   316 \textit{compile}-function for boolean expressions takes three
       
   317 arguments: an abstract syntax tree, an environment for 
       
   318 variable indices and also the label, $lab$, to where an conditional 
       
   319 jump needs to go. The clause for the expression $a_1 = a_2$, 
       
   320 for example, is as follows:
       
   321 
       
   322 \begin{center}
       
   323 \begin{tabular}{lcl}
       
   324 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
       
   325 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}
       
   326 \end{tabular}
       
   327 \end{center}
       
   328 
       
   329 \noindent where we are first generating code for the
       
   330 subexpressions $a_1$ and $a_2$. This will mean after running
       
   331 the corresponding code there will be two integers on top of
       
   332 the stack. If they are equal, we do not have to do anything
       
   333 (except for popping them off from the stack) and just continue
       
   334 with the next instructions (see control-flow of ifs above).
       
   335 However if they are \emph{not} equal, then we need to
       
   336 (conditionally) jump to the label $lab$. This can be done with
       
   337 the instruction
       
   338 
       
   339 \begin{lstlisting}[mathescape,numbers=none]
       
   340 if_icmpne $lab$
       
   341 \end{lstlisting}
       
   342 
       
   343 \noindent Other jump instructions for boolean operators are
       
   344 
       
   345 \begin{center}
       
   346 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
       
   347 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
       
   348 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
       
   349 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
       
   350 \end{tabular}
       
   351 \end{center}
       
   352 
       
   353 \noindent and so on. I leave it to you to extend the
       
   354 \textit{compile}-function for the other boolean expressions.
       
   355 Note that we need to jump whenever the boolean is \emph{not}
       
   356 true, which means we have to ``negate'' the jump
       
   357 condition---equals becomes not-equal, less becomes
       
   358 greater-or-equal. If you do not like this design (it can be
       
   359 the source of some nasty, hard-to-detect errors), you can also
       
   360 change the layout of the code and first give the code for the
       
   361 else-branch and then for the if-branch. However in the case
       
   362 of while-loops this way of generating code still seems
       
   363 the most convenient.
       
   364 
       
   365 We are now ready to give the compile function for 
       
   366 if-statments---remember this function returns for staments a 
       
   367 pair consisting of the code and an environment:
       
   368 
       
   369 \begin{center}
       
   370 \begin{tabular}{lcl}
       
   371 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
       
   372 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
       
   373 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
       
   374 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
       
   375 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
       
   376 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
       
   377 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
       
   378 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
       
   379 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
       
   380 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
       
   381 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
       
   382 \end{tabular}
       
   383 \end{center}
       
   384 
       
   385 \noindent In the first two lines we generate two fresh labels
       
   386 for the jump addresses (just before the else-branch and just
       
   387 after). In the next two lines we generate the instructions for
       
   388 the two branches, $is_1$ and $is_2$. The final code will
       
   389 be first the code for $b$ (including the label 
       
   390 just-before-the-else-branch), then the \pcode{goto} for after
       
   391 the else-branch, the label $L_\textit{ifesle}$, followed by
       
   392 the instructions for the else-branch, followed by the 
       
   393 after-the-else-branch label. Consider for example the 
       
   394 if-statement:
       
   395 
       
   396 \begin{lstlisting}[mathescape,numbers=none,language={}]
       
   397 if 1 = 1 then x := 2 else y := 3
       
   398 \end{lstlisting}
       
   399 
       
   400 \noindent 
       
   401 The generated code is as follows:
       
   402 
       
   403 \begin{lstlisting}[mathescape,language={}]
       
   404    ldc 1
       
   405    ldc 1
       
   406    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
       
   407    ldc 2
       
   408    istore 0
       
   409    goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
       
   410 L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
       
   411    ldc 3
       
   412    istore 1
       
   413 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
       
   414 \end{lstlisting}
       
   415 
       
   416 \begin{tikzpicture}[remember picture,overlay]
       
   417   \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
       
   418            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
       
   419   \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
       
   420            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
       
   421 \end{tikzpicture}
       
   422 
       
   423 \noindent The first three lines correspond to the the boolean
       
   424 expression $1 = 1$. The jump for when this boolean expression
       
   425 is false is in Line~3. Lines 4-6 corresponds to the if-branch;
       
   426 the else-branch is in Lines 8 and 9. Note carefully how the
       
   427 environment $E$ is threaded through the recursive calls of
       
   428 \textit{compile}. The function receives an environment $E$,
       
   429 but it might extend it when compiling the if-branch, yielding
       
   430 $E'$. This happens for example in the if-statement above
       
   431 whenever the variable \code{x} has not been used before.
       
   432 Similarly with the environment $E''$ for the second call to
       
   433 \textit{compile}. $E''$ is also the environment that needs to
       
   434 be returned as part of the answer.
       
   435 
       
   436 The compilation of the while-loops, say 
       
   437 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
       
   438 the condition is true and we need to do another iteration, 
       
   439 and the control-flow needs to be as follows
       
   440 
       
   441 \begin{center}
       
   442 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   443  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   444  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   445  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   446 \node (A0) [point, left=of A1] {};
       
   447 \node (A1) [point] {};
       
   448 \node (b) [block, right=of A1] {code of $b$};
       
   449 \node (A2) [point, right=of b] {};
       
   450 \node (cs1) [block, right=of A2] {code of $cs$};
       
   451 \node (A3) [point, right=of cs1] {};
       
   452 \node (A4) [point, right=of A3] {};
       
   453 
       
   454 \draw (A0) edge [->, black, line width=1mm] (b);
       
   455 \draw (b) edge [->, black, line width=1mm] (cs1);
       
   456 \draw (cs1) edge [->, black, line width=1mm] (A3);
       
   457 \draw (A3) edge [->,skip loop] (A1);
       
   458 \end{tikzpicture}
       
   459 \end{center}
       
   460 
       
   461 \noindent Whereas if the condition is \emph{not} true, we
       
   462 need to jump out of the loop, which gives the following
       
   463 control flow.
       
   464 
       
   465 \begin{center}
       
   466 \begin{tikzpicture}[node distance=2mm and 4mm,
       
   467  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
       
   468  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
       
   469  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
       
   470 \node (A0) [point, left=of A1] {};
       
   471 \node (A1) [point] {};
       
   472 \node (b) [block, right=of A1] {code of $b$};
       
   473 \node (A2) [point, right=of b] {};
       
   474 \node (cs1) [block, right=of A2] {code of $cs$};
       
   475 \node (A3) [point, right=of cs1] {};
       
   476 \node (A4) [point, right=of A3] {};
       
   477 
       
   478 \draw (A0) edge [->, black, line width=1mm] (b);
       
   479 \draw (b) edge [->, black, line width=1mm] (A2);
       
   480 \draw (A2) edge [skip loop] (A3);
       
   481 \draw (A3) edge [->, black, line width=1mm] (A4);
       
   482 \end{tikzpicture}
       
   483 \end{center}
       
   484 
       
   485 \noindent Again we can use the \textit{compile}-function for
       
   486 boolean expressions to insert the appropriate jump to the
       
   487 end of the loop (label $L_{wend}$ below).
       
   488 
       
   489 \begin{center}
       
   490 \begin{tabular}{lcl}
       
   491 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
       
   492 \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
       
   493 \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
       
   494 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\
       
   495 \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
       
   496 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
       
   497 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
       
   498 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
       
   499 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
       
   500 \end{tabular}
       
   501 \end{center}
       
   502 
       
   503 \noindent I let you go through how this clause works. As an example
       
   504 you can consider the while-loop
       
   505 
       
   506 \begin{lstlisting}[mathescape,numbers=none,language={}]
       
   507 while x <= 10 do x := x + 1
       
   508 \end{lstlisting}
       
   509 
       
   510 \noindent yielding the following code
       
   511 
       
   512 \begin{lstlisting}[mathescape,language={}]
       
   513 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
       
   514    iload 0
       
   515    ldc 10
       
   516    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
       
   517    iload 0
       
   518    ldc 1
       
   519    iadd
       
   520    istore 0
       
   521    goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
       
   522 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
       
   523 \end{lstlisting}
       
   524  
       
   525 \begin{tikzpicture}[remember picture,overlay]
       
   526   \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
       
   527            -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
       
   528   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
       
   529            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
       
   530 \end{tikzpicture}
       
   531 
       
   532 
       
   533 Next we need to consider the statement \pcode{write x}, which
       
   534 can be used to print out the content of a variable. For this
       
   535 we need to use a Java library function. In order to avoid
       
   536 having to generate a lot of code for each
       
   537 \pcode{write}-command, we use a separate helper-method and
       
   538 just call this method with an argument (which needs to be
       
   539 placed onto the stack). The code of the helper-method is as
       
   540 follows.
       
   541 
       
   542 
       
   543 \begin{lstlisting}[language=JVMIS]
       
   544 .method public static write(I)V 
       
   545     .limit locals 1 
       
   546     .limit stack 2 
       
   547     getstatic java/lang/System/out Ljava/io/PrintStream; 
       
   548     iload 0
       
   549     invokevirtual java/io/PrintStream/println(I)V 
       
   550     return 
       
   551 .end method
       
   552 \end{lstlisting}
       
   553 
       
   554 \noindent The first line marks the beginning of the method,
       
   555 called \pcode{write}. It takes a single integer argument
       
   556 indicated by the \pcode{(I)} and returns no result, indicated
       
   557 by the \pcode{V}. Since the method has only one argument, we
       
   558 only need a single local variable (Line~2) and a stack with
       
   559 two cells will be sufficient (Line 3). Line 4 instructs the
       
   560 JVM to get the value of the field \pcode{out} of the class
       
   561 \pcode{java/lang/System}. It expects the value to be of type
       
   562 \pcode{java/io/PrintStream}. A reference to this value will be
       
   563 placed on the stack. Line~5 copies the integer we want to
       
   564 print out onto the stack. In the next line we call the method
       
   565 \pcode{println} (from the class \pcode{java/io/PrintStream}).
       
   566 We want to print out an integer and do not expect anything
       
   567 back (that is why the type annotation is \pcode{(I)V}). The
       
   568 \pcode{return}-instruction in the next line changes the
       
   569 control-flow back to the place from where \pcode{write} was
       
   570 called. This method needs to be part of a header that is
       
   571 included in any code we generate. The helper-method
       
   572 \pcode{write} can be invoked with the two instructions
       
   573 
       
   574 \begin{lstlisting}[mathescape,language=JVMIS]
       
   575 iload $E(x)$ 
       
   576 invokestatic XXX/XXX/write(I)V
       
   577 \end{lstlisting}
       
   578 
       
   579 \noindent where we first place the variable to be printed on
       
   580 top of the stack and then call \pcode{write}. The \pcode{XXX}
       
   581 need to be replaced by an appropriate class name (this will be
       
   582 explained shortly).
       
   583 
       
   584 
       
   585 \begin{figure}[t]
       
   586 \begin{lstlisting}[mathescape,language=JVMIS]
       
   587 .class public XXX.XXX
       
   588 .super java/lang/Object
       
   589 
       
   590 .method public <init>()V
       
   591     aload_0
       
   592     invokenonvirtual java/lang/Object/<init>()V
       
   593     return
       
   594 .end method
       
   595 
       
   596 .method public static main([Ljava/lang/String;)V
       
   597     .limit locals 200
       
   598     .limit stack 200
       
   599 
       
   600       $\textit{\ldots{}here comes the compiled code\ldots}$
       
   601 
       
   602     return
       
   603 .end method
       
   604 \end{lstlisting}
       
   605 \caption{Boilerplate code needed for running generated code.\label{boiler}}
       
   606 \end{figure}
       
   607 
       
   608 
       
   609 By generating code for a While-program, we end up with a list
       
   610 of (JVM assembly) instructions. Unfortunately, there is a bit
       
   611 more boilerplate code needed before these instructions can be
       
   612 run. The complete code is shown in Figure~\ref{boiler}. This
       
   613 boilerplate code is very specific to the JVM. If we target any
       
   614 other virtual machine or a machine language, then we would
       
   615 need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
       
   616 contain a method for object creation in the JVM; this method
       
   617 is called \emph{before} the \pcode{main}-method in Lines 10 to
       
   618 17. Interesting are the Lines 11 and 12 where we hardwire that
       
   619 the stack of our programs will never be larger than 200 and
       
   620 that the maximum number of variables is also 200. This seem to
       
   621 be conservative default values that allow is to run some
       
   622 simple While-programs. In a real compiler, we would of course
       
   623 need to work harder and find out appropriate values for the
       
   624 stack and local variables.
       
   625 
       
   626 To sum up, in Figure~\ref{test} is the complete code generated
       
   627 for the slightly non-sensical program
       
   628 
       
   629 \begin{lstlisting}[mathescape,language=While]
       
   630 x := 1 + 2;
       
   631 write x
       
   632 \end{lstlisting}
       
   633 
       
   634 \noindent Having this code at our disposal, we need the
       
   635 assembler to translate the generated code into JVM bytecode (a
       
   636 class file). This bytecode is understood by the JVM and can be
       
   637 run by just invoking the \pcode{java}-program.
       
   638 
       
   639 
       
   640 \begin{figure}[p]
       
   641 \lstinputlisting{../progs/test-small.j}
       
   642 \caption{Generated code for a test program. This code can be 
       
   643 processed by an Java assembler producing a class-file, which
       
   644 can be run by the \pcode{java}-program.\label{test}}
       
   645 \end{figure}
       
   646 
       
   647 \end{document}
       
   648 
       
   649 %%% Local Variables: 
       
   650 %%% mode: latex  
       
   651 %%% TeX-master: t
       
   652 %%% End: