handouts/ho07.tex
changeset 708 4980f421b3b0
parent 705 bfc8703b1527
child 709 c112a6cb5e52
equal deleted inserted replaced
707:2fcd7c2da729 708:4980f421b3b0
     3 \usepackage{../style}
     3 \usepackage{../style}
     4 \usepackage{../langs}
     4 \usepackage{../langs}
     5 \usepackage{../grammar}
     5 \usepackage{../grammar}
     6 \usepackage{../graphics}
     6 \usepackage{../graphics}
     7 
     7 
     8 
       
     9 %% add safety check on references...whether it is above 0 and below the 
     8 %% add safety check on references...whether it is above 0 and below the 
    10 %% index
     9 %% index
    11 
    10 
       
    11 
       
    12 
       
    13 
    12 \begin{document}
    14 \begin{document}
    13 \fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
    15 \fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020}
    14 
    16 
    15 \section*{Handout 7 (Compilation)}
    17 \section*{Handout 7 (Compilation)}
    16 
    18 
    17 The purpose of a compiler is to transform a program a human can read and
    19 The purpose of a compiler is to transform a program a human can read and
    18 write into code the machine can run as fast as possible. The fastest
    20 write into code the machine can run as fast as possible. The fastest
    22 that is often pretty fast. This way of producing code has the advantage that
    24 that is often pretty fast. This way of producing code has the advantage that
    23 the virtual machine takes care of things a compiler would normally need
    25 the virtual machine takes care of things a compiler would normally need
    24 to take care of (like explicit memory management). 
    26 to take care of (like explicit memory management). 
    25 
    27 
    26 As a first example in this module we will implement a compiler for the
    28 As a first example in this module we will implement a compiler for the
    27 very simple While-language. It will generate code for the Java Virtual
    29 very simple WHILE-language that we parsed in the last lecture. The
    28 Machine (JVM). Unfortunately the Java ecosystem does not come with an
    30 compiler will target the Java Virtual Machine (JVM), but not directly.
    29 assembler which would be handy for our  compiler-endeavour  (unlike
    31 Pictorially the compiler will work as follows:
    30 Microsoft's  Common Language Infrastructure for the .Net platform which
    32 
    31 has an assembler out-of-the-box). As a substitute we use in this module
    33 \begin{center}
    32 the 3rd-party programs Jasmin and Krakatau
    34   \begin{tikzpicture}[scale=1,font=\bf,
       
    35                       node/.style={
       
    36                       rectangle,rounded corners=3mm,
       
    37                       ultra thick,draw=black!50,minimum height=18mm, 
       
    38                       minimum width=20mm,
       
    39                       top color=white,bottom color=black!20}]
       
    40 
       
    41   \node (0) at (-3,0) {};  
       
    42   \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
       
    43   \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
       
    44   \node (C) at (7.5,0) [node] {JVM};
       
    45  
       
    46   \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A); 
       
    47   \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B); 
       
    48   \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C); 
       
    49   \end{tikzpicture}
       
    50   \end{center}
       
    51 
       
    52 \noindent
       
    53 The input will be WHILE-programs; the output will be assembly files
       
    54 (with the ending .j). Assembly files essentially contain human-readable
       
    55 machine code, meaning they are not just bits and bytes, but rather
       
    56 something you can read and understand---with a bit of practice of
       
    57 course. An \emph{assembler} will then translate the assembly files into
       
    58 unreadable class or binary files the JVM can run. Unfortunately, the
       
    59 Java ecosystem does not come with an assembler which would be handy for
       
    60 our compiler-endeavour  (unlike Microsoft's  Common Language
       
    61 Infrastructure for the .Net platform which has an assembler
       
    62 out-of-the-box). As a substitute we shall therefore use the 3rd-party
       
    63 programs Jasmin and Krakatau 
    33 
    64 
    34 \begin{itemize}
    65 \begin{itemize}
    35   \item \url{http://jasmin.sourceforge.net}
    66   \item \url{http://jasmin.sourceforge.net}
    36   \item \url{https://github.com/Storyyeller/Krakatau}
    67   \item \url{https://github.com/Storyyeller/Krakatau}
    37 \end{itemize}
    68 \end{itemize}
    92 \begin{tikzpicture}
   123 \begin{tikzpicture}
    93 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
   124 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
    94 \end{tikzpicture}
   125 \end{tikzpicture}
    95 \end{center}
   126 \end{center}
    96 
   127 
    97 \noindent To generate JVM code for this expression, we need to
   128 \noindent To generate JVM code for this expression, we need to traverse
    98 traverse this tree in post-order fashion and emit code for
   129 this tree in \emph{post-order} fashion and emit code for each
    99 each node---this traversal in post-order fashion will produce
   130 node---this traversal in \emph{post-order} fashion will produce code for
   100 code for a stack-machine (what the JVM is). Doing so for the
   131 a stack-machine (which is what the JVM is). Doing so for the tree above
   101 tree above generates the instructions
   132 generates the instructions
   102 
   133 
   103 \begin{lstlisting}[language=JVMIS,numbers=none]
   134 \begin{lstlisting}[language=JVMIS,numbers=none]
   104 ldc 1 
   135 ldc 1 
   105 ldc 2 
   136 ldc 2 
   106 ldc 3 
   137 ldc 3 
   119 that a different bracketing of the expression, for example $(1 + (2 *
   150 that a different bracketing of the expression, for example $(1 + (2 *
   120 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
   151 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
   121 a different list of instructions. Generating code in this
   152 a different list of instructions. Generating code in this
   122 post-order-traversal fashion is rather easy to implement: it can be done
   153 post-order-traversal fashion is rather easy to implement: it can be done
   123 with the following recursive \textit{compile}-function, which takes the
   154 with the following recursive \textit{compile}-function, which takes the
   124 abstract syntax tree as argument:
   155 abstract syntax tree as an argument:
   125 
   156 
   126 \begin{center}
   157 \begin{center}
   127 \begin{tabular}{lcl}
   158 \begin{tabular}{lcl}
   128 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
   159 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
   129 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
   160 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
   135 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
   166 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
   136 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
   167 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
   137 \end{tabular}
   168 \end{tabular}
   138 \end{center}
   169 \end{center}
   139 
   170 
   140 However, our arithmetic expressions can also contain
   171 This is all fine, but our arithmetic expressions can clearly contain
   141 variables. We will represent them as \emph{local variables} in
   172 variables and then this code will not be good enough. To fix this we
   142 the JVM. Essentially, local variables are an array or pointers
   173 will represent our variables as the \emph{local variables} in the JVM.
   143 to memory cells, containing in our case only integers. Looking
   174 Essentially, local variables are an array or pointers to memory cells,
   144 up a variable can be done with the instruction
   175 containing in our case only integers. Looking up a variable can be done
       
   176 with the instruction
   145 
   177 
   146 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   178 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   147 iload $index$
   179 iload $index$
   148 \end{lstlisting}
   180 \end{lstlisting}
   149 
   181 
   154 
   186 
   155 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   187 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   156 istore $index$
   188 istore $index$
   157 \end{lstlisting}
   189 \end{lstlisting}
   158 
   190 
   159 \noindent Note that this also pops off the top of the stack.
   191 \noindent Note that this also pops off the top of the stack. One problem
   160 One problem we have to overcome, however, is that local
   192 we have to overcome, however, is that local variables are addressed, not
   161 variables are addressed, not by identifiers, but by numbers
   193 by identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers
   162 (starting from $0$). Therefore our compiler needs to maintain
   194 (starting from $0$). Therefore our compiler needs to maintain a kind of
   163 a kind of environment where variables are associated to
   195 environment where variables are associated to numbers. This association
   164 numbers. This association needs to be unique: if we muddle up
   196 needs to be unique: if we muddle up the numbers, then we essentially
   165 the numbers, then we essentially confuse variables and the
   197 confuse variables and the consequence will usually be an erroneous
   166 consequence will usually be an erroneous result. Our extended
   198 result. Our extended \textit{compile}-function for arithmetic
   167 \textit{compile}-function for arithmetic expressions will
   199 expressions will therefore take two arguments: the abstract syntax tree
   168 therefore take two arguments: the abstract syntax tree and an
   200 and an environment, $E$, that maps identifiers to index-numbers.
   169 environment, $E$, that maps identifiers to index-numbers.
       
   170 
   201 
   171 \begin{center}
   202 \begin{center}
   172 \begin{tabular}{lcl}
   203 \begin{tabular}{lcl}
   173 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
   204 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
   174 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
   205 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
   181 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
   212 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
   182 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
   213 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
   183 \end{tabular}
   214 \end{tabular}
   184 \end{center}
   215 \end{center}
   185 
   216 
   186 \noindent In the last line we generate the code for variables
   217 \noindent In the last line we generate the code for variables where
   187 where $E(x)$ stands for looking up the environment to which
   218 $E(x)$ stands for looking up the environment to which index the variable
   188 index the variable $x$ maps to. This is similar to an interpreter,
   219 $x$ maps to. This is similar to the interpreter we saw earlier in the
   189 which also needs an environment: the difference is that the 
   220 module, which also needs an environment: the difference is that the
   190 interpreter maintains a mapping from variables to current values (what is the
   221 interpreter maintains a mapping from variables to current values (what
   191 currently the value of a variable), while compilers need a mapping
   222 is the currently the value of a variable?), while compilers need a
   192 from variables to memory locations (where can I find the current 
   223 mapping from variables to memory locations (where can I find the current
   193 value for the variable in memory).
   224 value for the variable in memory?).
   194 
   225 
   195 There is a similar \textit{compile}-function for boolean
   226 There is a similar \textit{compile}-function for boolean
   196 expressions, but it includes a ``trick'' to do with
   227 expressions, but it includes a ``trick'' to do with
   197 \pcode{if}- and \pcode{while}-statements. To explain the issue
   228 \pcode{if}- and \pcode{while}-statements. To explain the issue
   198 let us first describe the compilation of statements of the
   229 let us first describe the compilation of statements of the
   199 While-language. The clause for \pcode{skip} is trivial, since
   230 WHILE-language. The clause for \pcode{skip} is trivial, since
   200 we do not have to generate any instruction
   231 we do not have to generate any instruction
   201 
   232 
   202 \begin{center}
   233 \begin{center}
   203 \begin{tabular}{lcl}
   234 \begin{tabular}{lcl}
   204 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
   235 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
   207 
   238 
   208 \noindent whereby $[]$ is the empty list of instructions. Note that
   239 \noindent whereby $[]$ is the empty list of instructions. Note that
   209 the \textit{compile}-function for statements returns a pair, a
   240 the \textit{compile}-function for statements returns a pair, a
   210 list of instructions (in this case the empty list) and an
   241 list of instructions (in this case the empty list) and an
   211 environment for variables. The reason for the environment is
   242 environment for variables. The reason for the environment is
   212 that assignments in the While-language might change the
   243 that assignments in the WHILE-language might change the
   213 environment---clearly if a variable is used for the first
   244 environment---clearly if a variable is used for the first
   214 time, we need to allocate a new index and if it has been used
   245 time, we need to allocate a new index and if it has been used
   215 before, then we need to be able to retrieve the associated index.
   246 before, then we need to be able to retrieve the associated index.
   216 This is reflected in the clause for compiling assignments, say
   247 This is reflected in the clause for compiling assignments, say
   217 $\textit{x := a}$:
   248 $\textit{x := a}$:
   221 $\textit{compile}(x := a, E)$ & $\dn$ & 
   252 $\textit{compile}(x := a, E)$ & $\dn$ & 
   222 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
   253 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
   223 \end{tabular}
   254 \end{tabular}
   224 \end{center}
   255 \end{center}
   225 
   256 
   226 \noindent We first generate code for the right-hand side of
   257 \noindent We first generate code for the right-hand side of the
   227 the assignment and then add an \pcode{istore}-instruction at
   258 assignment (that is the arithmetic expression $a$) and then add an
   228 the end. By convention the result of the arithmetic expression
   259 \pcode{istore}-instruction at the end. By convention running the code
   229 $a$ will be on top of the stack. After the \pcode{istore}
   260 for the arithmetic expression $a$ will leave the result on top of the
   230 instruction, the result will be stored in the index
   261 stack. After that the \pcode{istore} instruction, the result will be
   231 corresponding to the variable $x$. If the variable $x$ has
   262 stored in the index corresponding to the variable $x$. If the variable
   232 been used before in the program, we just need to look up what
   263 $x$ has been used before in the program, we just need to look up what
   233 the index is and return the environment unchanged (that is in
   264 the index is and return the environment unchanged (that is in this case
   234 this case $E' = E$). However, if this is the first encounter 
   265 $E' = E$). However, if this is the first encounter of the variable $x$
   235 of the variable $x$ in the program, then we have to augment 
   266 in the program, then we have to augment the environment and assign $x$
   236 the environment and assign $x$ with the largest index in $E$
   267 with the largest index in $E$ plus one (that is $E' = E(x \mapsto
   237 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   268 largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
   238 To sum up, for the assignment $x := x + 1$ we generate the
   269 generate the following code
   239 following code
       
   240 
   270 
   241 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   271 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   242 iload $n_x$
   272 iload $n_x$
   243 ldc 1
   273 ldc 1
   244 iadd
   274 iadd
   254 $index \;=\; E\textit{.getOrElse}(x, |E|)$
   284 $index \;=\; E\textit{.getOrElse}(x, |E|)$
   255 \end{tabular}
   285 \end{tabular}
   256 \end{center}
   286 \end{center}
   257 
   287 
   258 \noindent
   288 \noindent
   259 In case the environment $E$ contains an index for $x$, we return it.
   289 This implements the idea that in case the environment $E$ contains an
   260 Otherwise we ``create'' a new index by returning the size $|E|$ of the
   290 index for $x$, we return it. Otherwise we ``create'' a new index by
   261 environment (that will be an index that is guaranteed to be not used
   291 returning the size $|E|$ of the environment (that will be an index that
   262 yet). In all this we take advantage of the JVM which provides us with 
   292 is guaranteed not to be used yet). In all this we take advantage of the
   263 a potentially limitless supply of places where we can store values
   293 JVM which provides us with a potentially limitless supply of places
   264 of variables.
   294 where we can store values of variables.
   265 
   295 
   266 A bit more complicated is the generation of code for
   296 A bit more complicated is the generation of code for
   267 \pcode{if}-statements, say
   297 \pcode{if}-statements, say
   268 
   298 
   269 \begin{lstlisting}[mathescape,language={},numbers=none]
   299 \begin{lstlisting}[mathescape,language={},numbers=none]
   270 if $b$ then $cs_1$ else $cs_2$
   300 if $b$ then $cs_1$ else $cs_2$
   271 \end{lstlisting}
   301 \end{lstlisting}
   272 
   302 
   273 \noindent where $b$ is a boolean expression and where both $cs_{1/2}$
   303 \noindent where $b$ is a boolean expression and where both $cs_{1/2}$
   274 are the statements for each of the \pcode{if}-branches. Lets assume we
   304 are the statements for each of the \pcode{if}-branches. Let us assume we
   275 already generated code for $b$ and $cs_{1/2}$. Then in the true-case the
   305 already generated code for $b$ and and the two if-branches $cs_{1/2}$.
   276 control-flow of the program needs to behave as 
   306 Then in the true-case the control-flow of the program needs to behave as
   277 
   307 
   278 \begin{center}
   308 
   279 \begin{tikzpicture}[node distance=2mm and 4mm,
   309 \begin{center}
   280  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   310 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
       
   311  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
       
   312                top color=white,bottom color=black!20},
   281  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   313  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   282  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   314  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   283 \node (A1) [point] {};
   315 \node (A1) [point] {};
   284 \node (b) [block, right=of A1] {code of $b$};
   316 \node (b) [block, right=of A1] {code of $b$};
   285 \node (A2) [point, right=of b] {};
   317 \node (A2) [point, right=of b] {};
   288 \node (cs2) [block, right=of A3] {code of $cs_2$};
   320 \node (cs2) [block, right=of A3] {code of $cs_2$};
   289 \node (A4) [point, right=of cs2] {};
   321 \node (A4) [point, right=of cs2] {};
   290 
   322 
   291 \draw (A1) edge [->, black, line width=1mm] (b);
   323 \draw (A1) edge [->, black, line width=1mm] (b);
   292 \draw (b) edge [->, black, line width=1mm] (cs1);
   324 \draw (b) edge [->, black, line width=1mm] (cs1);
   293 \draw (cs1) edge [->, black, line width=1mm] (A3);
   325 \draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
   294 \draw (A3) edge [->, black, skip loop] (A4);
   326 \draw (A3) edge [->, black, skip loop] (A4);
   295 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
   327 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
   296 \end{tikzpicture}
   328 \end{tikzpicture}
   297 \end{center}
   329 \end{center}
   298 
   330 
   299 \noindent where we start with running the code for $b$; since
   331 \noindent where we start with running the code for $b$; since
   300 we are in the true case we continue with running the code for
   332 we are in the true case we continue with running the code for
   301 $cs_1$. After this however, we must not run the code for
   333 $cs_1$. After this however, we must not run the code for
   302 $cs_2$, but always jump after the last instruction of $cs_2$
   334 $cs_2$, but always jump to after the last instruction of $cs_2$
   303 (the code for the \pcode{else}-branch). Note that this jump is
   335 (the code for the \pcode{else}-branch). Note that this jump is
   304 unconditional, meaning we always have to jump to the end of
   336 unconditional, meaning we always have to jump to the end of
   305 $cs_2$. The corresponding instruction of the JVM is
   337 $cs_2$. The corresponding instruction of the JVM is
   306 \pcode{goto}. In case $b$ turns out to be false we need the
   338 \pcode{goto}. In case $b$ turns out to be false we need the
   307 control-flow
   339 control-flow
   308 
   340 
   309 \begin{center}
   341 \begin{center}
   310 \begin{tikzpicture}[node distance=2mm and 4mm,
   342 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
   311  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   343  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
       
   344                top color=white,bottom color=black!20},
   312  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   345  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   313  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   346  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   314 \node (A1) [point] {};
   347 \node (A1) [point] {};
   315 \node (b) [block, right=of A1] {code of $b$};
   348 \node (b) [block, right=of A1] {code of $b$};
   316 \node (A2) [point, right=of b] {};
   349 \node (A2) [point, right=of b] {};
   318 \node (A3) [point, right=of cs1] {};
   351 \node (A3) [point, right=of cs1] {};
   319 \node (cs2) [block, right=of A3] {code of $cs_2$};
   352 \node (cs2) [block, right=of A3] {code of $cs_2$};
   320 \node (A4) [point, right=of cs2] {};
   353 \node (A4) [point, right=of cs2] {};
   321 
   354 
   322 \draw (A1) edge [->, black, line width=1mm] (b);
   355 \draw (A1) edge [->, black, line width=1mm] (b);
   323 \draw (b) edge [->, black, line width=1mm] (A2);
   356 \draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
   324 \draw (A2) edge [skip loop] (A3);
   357 \draw (A2) edge [skip loop] (A3);
   325 \draw (A3) edge [->, black, line width=1mm] (cs2);
   358 \draw (A3) edge [->, black, line width=1mm] (cs2);
   326 \draw (cs2) edge [->,black, line width=1mm] (A4);
   359 \draw (cs2) edge [->,black, line width=1mm] (A4);
   327 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
   360 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
   328 \end{tikzpicture}
   361 \end{tikzpicture}
   348   $instr_1$
   381   $instr_1$
   349   $instr_2$
   382   $instr_2$
   350     $\vdots$
   383     $\vdots$
   351 \end{lstlisting}
   384 \end{lstlisting}
   352  
   385  
   353 \noindent where a label is indicated by a colon. The task of the
   386 \noindent where the label needs to be followed by a colon. The task of
   354 assmbler (in our case Jasmin or Krakatau) is to resolve the labels
   387 the assembler (in our case Jasmin or Krakatau) is to resolve the labels
   355 to actual addresses, for example jump 10 instructions forward,
   388 to actual (numeric) addresses, for example jump 10 instructions forward,
   356 or 20 instructions backwards.
   389 or 20 instructions backwards.
   357  
   390  
   358 Recall the ``trick'' with compiling boolean expressions: the 
   391 Recall the ``trick'' with compiling boolean expressions: the 
   359 \textit{compile}-function for boolean expressions takes three
   392 \textit{compile}-function for boolean expressions takes three
   360 arguments: an abstract syntax tree, an environment for 
   393 arguments: an abstract syntax tree, an environment for 
   381 
   414 
   382 \begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
   415 \begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
   383 if_icmpne $lab$
   416 if_icmpne $lab$
   384 \end{lstlisting}
   417 \end{lstlisting}
   385 
   418 
   386 \noindent Other jump instructions for boolean operators are
   419 To sum up, the third argument in the compile function for booleans
       
   420 specifies where to jump, in case the condition is \emph{not} true. I
       
   421 leave it to you to extend the \textit{compile}-function for the other
       
   422 boolean expressions. Note that we need to jump whenever the boolean is
       
   423 \emph{not} true, which means we have to ``negate'' the jump
       
   424 condition---equals becomes not-equal, less becomes greater-or-equal. 
       
   425 Other jump instructions for boolean operators are
   387 
   426 
   388 \begin{center}
   427 \begin{center}
   389 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
   428 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
   390 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
   429 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
   391 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
   430 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
   392 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
   431 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
   393 \end{tabular}
   432 \end{tabular}
   394 \end{center}
   433 \end{center}
   395 
   434 
   396 \noindent and so on. I leave it to you to extend the
   435 \noindent and so on.   If you do not like this design (it can be the
   397 \textit{compile}-function for the other boolean expressions. Note that
       
   398 we need to jump whenever the boolean is \emph{not} true, which means we
       
   399 have to ``negate'' the jump condition---equals becomes not-equal, less
       
   400 becomes greater-or-equal. If you do not like this design (it can be the
       
   401 source of some nasty, hard-to-detect errors), you can also change the
   436 source of some nasty, hard-to-detect errors), you can also change the
   402 layout of the code and first give the code for the else-branch and then
   437 layout of the code and first give the code for the else-branch and then
   403 for the if-branch. However in the case of while-loops this
   438 for the if-branch. However in the case of while-loops this
   404 ``upside-down-inside-out'' way of generating code still seems the most
   439 ``upside-down-inside-out'' way of generating code still seems the most
   405 convenient.
   440 convenient.
   479 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
   514 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
   480 the condition is true and we need to do another iteration, 
   515 the condition is true and we need to do another iteration, 
   481 and the control-flow needs to be as follows
   516 and the control-flow needs to be as follows
   482 
   517 
   483 \begin{center}
   518 \begin{center}
   484 \begin{tikzpicture}[node distance=2mm and 4mm,
   519 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
   485  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   520  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
       
   521                top color=white,bottom color=black!20},
   486  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   522  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   487  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   523  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   488 \node (A0) [point, left=of A1] {};
   524 \node (A0) [point, left=of A1] {};
   489 \node (A1) [point] {};
   525 \node (A1) [point] {};
   490 \node (b) [block, right=of A1] {code of $b$};
   526 \node (b) [block, right=of A1] {code of $b$};
   493 \node (A3) [point, right=of cs1] {};
   529 \node (A3) [point, right=of cs1] {};
   494 \node (A4) [point, right=of A3] {};
   530 \node (A4) [point, right=of A3] {};
   495 
   531 
   496 \draw (A0) edge [->, black, line width=1mm] (b);
   532 \draw (A0) edge [->, black, line width=1mm] (b);
   497 \draw (b) edge [->, black, line width=1mm] (cs1);
   533 \draw (b) edge [->, black, line width=1mm] (cs1);
   498 \draw (cs1) edge [->, black, line width=1mm] (A3);
   534 \draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
   499 \draw (A3) edge [->,skip loop] (A1);
   535 \draw (A3) edge [->,skip loop] (A1);
   500 \end{tikzpicture}
   536 \end{tikzpicture}
   501 \end{center}
   537 \end{center}
   502 
   538 
   503 \noindent Whereas if the condition is \emph{not} true, we
   539 \noindent Whereas if the condition is \emph{not} true, we
   504 need to jump out of the loop, which gives the following
   540 need to jump out of the loop, which gives the following
   505 control flow.
   541 control flow.
   506 
   542 
   507 \begin{center}
   543 \begin{center}
   508 \begin{tikzpicture}[node distance=2mm and 4mm,
   544 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
   509  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   545  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
       
   546                top color=white,bottom color=black!20},
   510  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   547  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   511  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   548  skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
   512 \node (A0) [point, left=of A1] {};
   549 \node (A0) [point, left=of A1] {};
   513 \node (A1) [point] {};
   550 \node (A1) [point] {};
   514 \node (b) [block, right=of A1] {code of $b$};
   551 \node (b) [block, right=of A1] {code of $b$};
   516 \node (cs1) [block, right=of A2] {code of $cs$};
   553 \node (cs1) [block, right=of A2] {code of $cs$};
   517 \node (A3) [point, right=of cs1] {};
   554 \node (A3) [point, right=of cs1] {};
   518 \node (A4) [point, right=of A3] {};
   555 \node (A4) [point, right=of A3] {};
   519 
   556 
   520 \draw (A0) edge [->, black, line width=1mm] (b);
   557 \draw (A0) edge [->, black, line width=1mm] (b);
   521 \draw (b) edge [->, black, line width=1mm] (A2);
   558 \draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
   522 \draw (A2) edge [skip loop] (A3);
   559 \draw (A2) edge [skip loop] (A3);
   523 \draw (A3) edge [->, black, line width=1mm] (A4);
   560 \draw (A3) edge [->, black, line width=1mm] (A4);
   524 \end{tikzpicture}
   561 \end{tikzpicture}
   525 \end{center}
   562 \end{center}
   526 
   563 
   570   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
   607   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
   571            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
   608            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
   572 \end{tikzpicture}
   609 \end{tikzpicture}
   573 
   610 
   574 \noindent
   611 \noindent
   575 I leave it to you to read the code and follow its controlflow.
   612 As said, I leave it to you to decide whether the code implements
   576 
   613 the usual controlflow of while-loops.
   577 Next we need to consider the statement \pcode{write x}, which
   614 
   578 can be used to print out the content of a variable. For this
   615 Next we need to consider the statement \pcode{write x}, which can be
   579 we need to use a Java library function. In order to avoid
   616 used to print out the content of a variable. For this we shall use a
   580 having to generate a lot of code for each
   617 Java library function. In order to avoid having to generate a lot of
   581 \pcode{write}-command, we use a separate helper-method and
   618 code for each \pcode{write}-command, we use a separate helper-method and
   582 just call this method with an argument (which needs to be
   619 just call this method with an appropriate argument (which of course
   583 placed onto the stack). The code of the helper-method is as
   620 needs to be placed onto the stack). The code of the helper-method is as
   584 follows.
   621 follows.
   585 
       
   586 
   622 
   587 \begin{lstlisting}[language=JVMIS,numbers=left]
   623 \begin{lstlisting}[language=JVMIS,numbers=left]
   588 .method public static write(I)V 
   624 .method public static write(I)V 
   589     .limit locals 1 
   625     .limit locals 1 
   590     .limit stack 2 
   626     .limit stack 2 
   599 called \pcode{write}. It takes a single integer argument
   635 called \pcode{write}. It takes a single integer argument
   600 indicated by the \pcode{(I)} and returns no result, indicated
   636 indicated by the \pcode{(I)} and returns no result, indicated
   601 by the \pcode{V}. Since the method has only one argument, we
   637 by the \pcode{V}. Since the method has only one argument, we
   602 only need a single local variable (Line~2) and a stack with
   638 only need a single local variable (Line~2) and a stack with
   603 two cells will be sufficient (Line 3). Line 4 instructs the
   639 two cells will be sufficient (Line 3). Line 4 instructs the
   604 JVM to get the value of the field \pcode{out} of the class
   640 JVM to get the value of the mem  \pcode{out} of the class
   605 \pcode{java/lang/System}. It expects the value to be of type
   641 \pcode{java/lang/System}. It expects the value to be of type
   606 \pcode{java/io/PrintStream}. A reference to this value will be
   642 \pcode{java/io/PrintStream}. A reference to this value will be
   607 placed on the stack. Line~5 copies the integer we want to
   643 placed on the stack. Line~5 copies the integer we want to
   608 print out onto the stack. In the next line we call the method
   644 print out onto the stack. In the next line we call the method
   609 \pcode{println} (from the class \pcode{java/io/PrintStream}).
   645 \pcode{println} (from the class \pcode{java/io/PrintStream}).
   624 top of the stack and then call \pcode{write}. The \pcode{XXX}
   660 top of the stack and then call \pcode{write}. The \pcode{XXX}
   625 need to be replaced by an appropriate class name (this will be
   661 need to be replaced by an appropriate class name (this will be
   626 explained shortly).
   662 explained shortly).
   627 
   663 
   628 
   664 
   629 \begin{figure}[t]
   665 By generating code for a WHILE-program, we end up with a list
   630 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
       
   631 .class public XXX.XXX
       
   632 .super java/lang/Object
       
   633 
       
   634 .method public <init>()V
       
   635     aload_0
       
   636     invokenonvirtual java/lang/Object/<init>()V
       
   637     return
       
   638 .end method
       
   639 
       
   640 .method public static main([Ljava/lang/String;)V
       
   641     .limit locals 200
       
   642     .limit stack 200
       
   643 
       
   644       $\textit{\ldots{}here comes the compiled code\ldots}$
       
   645 
       
   646     return
       
   647 .end method
       
   648 \end{lstlisting}
       
   649 \caption{Boilerplate code needed for running generated code.\label{boiler}}
       
   650 \end{figure}
       
   651 
       
   652 
       
   653 By generating code for a While-program, we end up with a list
       
   654 of (JVM assembly) instructions. Unfortunately, there is a bit
   666 of (JVM assembly) instructions. Unfortunately, there is a bit
   655 more boilerplate code needed before these instructions can be
   667 more boilerplate code needed before these instructions can be
   656 run. The complete code is shown in Figure~\ref{boiler}. This
   668 run. The complete code is shown in Figure~\ref{boiler}. This
   657 boilerplate code is very specific to the JVM. If we target any
   669 boilerplate code is very specific to the JVM. If we target any
   658 other virtual machine or a machine language, then we would
   670 other virtual machine or a machine language, then we would
   661 is called \emph{before} the \pcode{main}-method in Lines 10 to
   673 is called \emph{before} the \pcode{main}-method in Lines 10 to
   662 17. Interesting are the Lines 11 and 12 where we hardwire that
   674 17. Interesting are the Lines 11 and 12 where we hardwire that
   663 the stack of our programs will never be larger than 200 and
   675 the stack of our programs will never be larger than 200 and
   664 that the maximum number of variables is also 200. This seem to
   676 that the maximum number of variables is also 200. This seem to
   665 be conservative default values that allow is to run some
   677 be conservative default values that allow is to run some
   666 simple While-programs. In a real compiler, we would of course
   678 simple WHILE-programs. In a real compiler, we would of course
   667 need to work harder and find out appropriate values for the
   679 need to work harder and find out appropriate values for the
   668 stack and local variables.
   680 stack and local variables.
       
   681 
       
   682 \begin{figure}[t]
       
   683 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
       
   684 .class public XXX.XXX
       
   685 .super java/lang/Object
       
   686 
       
   687 .method public static main([Ljava/lang/String;)V
       
   688     .limit locals 200
       
   689     .limit stack 200
       
   690 
       
   691       $\textit{\ldots{}here comes the compiled code\ldots}$
       
   692 
       
   693     return
       
   694 .end method
       
   695 \end{lstlisting}
       
   696 \caption{The boilerplate code needed for running generated code.\label{boiler}}
       
   697 \end{figure}
       
   698 
   669 
   699 
   670 To sum up, in Figure~\ref{test} is the complete code generated
   700 To sum up, in Figure~\ref{test} is the complete code generated
   671 for the slightly nonsensical program
   701 for the slightly nonsensical program
   672 
   702 
   673 \begin{lstlisting}[mathescape,language=While]
   703 \begin{lstlisting}[mathescape,language=While]
   681 bytecode is then understood by the JVM and can be run by just invoking
   711 bytecode is then understood by the JVM and can be run by just invoking
   682 the \pcode{java}-program.
   712 the \pcode{java}-program.
   683 
   713 
   684 
   714 
   685 \begin{figure}[p]
   715 \begin{figure}[p]
   686 \lstinputlisting[language=JVMIS]{../progs/test-small.j}
   716 \lstinputlisting[language=JVMIS,mathescape]{../progs/test-small.j}
   687 \caption{Generated code for a test program. This code can be 
   717 \begin{tikzpicture}[remember picture,overlay]
   688 processed by an Java assembler producing a class-file, which
   718   \draw[|<->|,very thick] (LA.north) -- (LB.south) 
   689 can be run by the {\tt{}java}-program.\label{test}}
   719      node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
       
   720   \draw[|<->|,very thick] (LC.north) -- (LD.south)
       
   721      node[left=0mm,midway] {\footnotesize\texttt{write x}};
       
   722 \end{tikzpicture}
       
   723 \caption{The generated code for the test program \texttt{x := 1 + 2; write
       
   724 x}. This code can be processed by a Java assembler producing a
       
   725 class-file, which can then be run by the {\tt{}java}-program.\label{test}}
   690 \end{figure}
   726 \end{figure}
   691 
   727 
   692 \subsection*{Arrays}
   728 \subsection*{Arrays}
   693 
   729 
   694 Maybe a useful addition to the While-language would be arrays. This
   730 Maybe a useful addition to the WHILE-language would be arrays. This
   695 would let us generate more interesting While-programs by translating
   731 would allow us to generate more interesting WHILE-programs by
   696 BF*** programs into equivalent While-code. So in this section lets have
   732 translating BF*** programs into equivalent WHILE-code. Therefore in this
   697 a look at how we can support the following three constructions
   733 section let us have a look at how we can support the following three
       
   734 constructions
   698 
   735 
   699 \begin{lstlisting}[mathescape,language=While]
   736 \begin{lstlisting}[mathescape,language=While]
   700 new arr[15000]
   737 new(arr[15000])
   701 x := 3 + arr[3 + y]
   738 x := 3 + arr[3 + y]
   702 arr[42 * n] := ...
   739 arr[42 * n] := ...
   703 \end{lstlisting}
   740 \end{lstlisting}
   704 
   741 
   705 \noindent
   742 \noindent
   706 The first construct is for creating new arrays, in this instance the
   743 The first construct is for creating new arrays. In this instance the
   707 name of the array is \pcode{arr} and it can hold 15000 integers. The
   744 name of the array is \pcode{arr} and it can hold 15000 integers. We do
   708 second is for referencing an array cell inside an arithmetic
   745 not support ``dynamic'' arrays, that is the size of our arrays will
   709 expression---we need to be able to look up the contents of an array at
   746 always be fixed. The second construct is for referencing an array cell
   710 an index determined by an arithmetic expression. Similarly in the line
   747 inside an arithmetic expression---we need to be able to look up the
   711 below, we need to be able to update the content of an array at an
   748 contents of an array at an index determined by an arithmetic expression.
   712 calculated index.  
   749 Similarly in the line below, we need to be able to update the content of
       
   750 an array at an calculated index.  
   713 
   751 
   714 For creating a new array we can generate the following three JVM
   752 For creating a new array we can generate the following three JVM
   715 instructions:
   753 instructions:
   716 
   754 
   717 \begin{lstlisting}[mathescape,language=JVMIS]
   755 \begin{lstlisting}[mathescape,language=JVMIS]
   719 newarray int 
   757 newarray int 
   720 astore loc_var
   758 astore loc_var
   721 \end{lstlisting}
   759 \end{lstlisting}
   722 
   760 
   723 \noindent
   761 \noindent
   724 First we need to put the dimension of the array onto the stack. The next
   762 First we need to put the size of the array onto the stack. The next
   725 instruction creates the array. With the last we can store the array as a
   763 instruction creates the array. In this case the array contains
       
   764 \texttt{int}s. With the last instruction we can store the array as a
   726 local variable (like the ``simple'' variables from the previous
   765 local variable (like the ``simple'' variables from the previous
   727 section). The use of a local variable for each array allows us to have
   766 section). The use of a local variable for each array allows us to have
   728 multiple arrays in a While-program. For looking up an element in an
   767 multiple arrays in a WHILE-program. For looking up an element in an
   729 array we can use the following JVM code
   768 array we can use the following JVM code
   730 
   769 
   731 \begin{lstlisting}[mathescape,language=JVMIS]
   770 \begin{lstlisting}[mathescape,language=JVMIS]
   732 aload loc_var 
   771 aload loc_var 
   733 index_aexp 
   772 index_aexp 
   734 iaload
   773 iaload
   735 \end{lstlisting}
   774 \end{lstlisting}
   736 
   775 
   737 \noindent
   776 \noindent
   738 The first instruction loads the ``pointer'' to the array onto the stack.
   777 The first instruction loads the ``pointer'', or local variable, to the
   739 Then we have some instructions corresponding to the index where we want
   778 array onto the stack. Then we have some instructions calculating the
   740 to look up the array. The idea is that these instructions will leave a
   779 index where we want to look up the array. The idea is that these
   741 concrete number on the stack, which will be the index into the array we
   780 instructions will leave a concrete number on the top of the stack, which
   742 need. Finally we need to tell the JVM to load the corresponding element
   781 will be the index into the array we need. Finally we need to tell the
   743 onto the stack. Updating an array at an index with a value is as
   782 JVM to load the corresponding element onto the stack. Updating an array
   744 follows.
   783 at an index with a value is as follows.
   745 
   784 
   746 \begin{lstlisting}[mathescape,language=JVMIS]
   785 \begin{lstlisting}[mathescape,language=JVMIS]
   747 aload loc_var 
   786 aload loc_var 
   748 index_aexp 
   787 index_aexp 
   749 value_aexp 
   788 value_aexp 
   750 iastore
   789 iastore
   751 \end{lstlisting}
   790 \end{lstlisting}
   752 
   791 
   753 \noindent
   792 \noindent
   754 Again the first instruction loads the ``pointer'' to the array onto the
   793 Again the first instruction loads the local variable of
   755 stack. Then we have some instructions corresponding to the index where
   794 the array onto the stack. Then we have some instructions calculating
   756 we want to update the array. After that come the instructions for with
   795 the index where we want to update the array. After that come the
   757 what value we want to update the array. The last line contains the
   796 instructions for with which value we want to update the array. The last
   758 instruction for updating the array.
   797 line contains the instruction for updating the array.
   759 
   798 
   760 Next we need to modify our grammar rules for our While-language: it
   799 Next we need to modify our grammar rules for our WHILE-language: it
   761 seems best to extend the rule for factors in arithmetic expressions with
   800 seems best to extend the rule for factors in arithmetic expressions with
   762 a rule for looking up an array.
   801 a rule for looking up an array.
   763 
   802 
   764 \begin{plstx}[rhs style=, margin=3cm]
   803 \begin{plstx}[rhs style=, margin=3cm]
   765 : \meta{E} ::= \meta{T} $+$ \meta{E}
   804 : \meta{E} ::= \meta{T} $+$ \meta{E}
   779 by an identifier and the brackets. There are two new rules for statements,
   818 by an identifier and the brackets. There are two new rules for statements,
   780 one for creating an array and one for array assignment:
   819 one for creating an array and one for array assignment:
   781 
   820 
   782 \begin{plstx}[rhs style=, margin=2cm, one per line]
   821 \begin{plstx}[rhs style=, margin=2cm, one per line]
   783 : \meta{Stmt} ::=  \ldots
   822 : \meta{Stmt} ::=  \ldots
   784               | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] 
   823               | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
   785               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   824               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   786 \end{plstx}
   825 \end{plstx}
   787 
   826 
   788 With this in place we can turn back to the idea of creating While
   827 With this in place we can turn back to the idea of creating
   789 programs by translating BF programs. This is a relatively easy task
   828 WHILE-programs by translating BF programs. This is a relatively easy
   790 because BF only has eight instructions (we will actually only have seven
   829 task because BF has only eight instructions (we will actually implement
   791 because we can omit the read-in instruction from BF). But also translating
   830 seven because we can omit the read-in instruction from BF). What makes
   792 BF-loops is going to be easy since they straightforwardly can be 
   831 this translation easy is that BF-loops can be straightforwardly
   793 represented by While-loops. The Scala code for the translation is
   832 represented as while-loops. The Scala code for the translation is as
   794 as follows:
   833 follows:
   795 
   834 
   796 \begin{lstlisting}[language=Scala,numbers=left]
   835 \begin{lstlisting}[language=Scala,numbers=left]
   797 def instr(c: Char) : String = c match {
   836 def instr(c: Char) : String = c match {
   798   case '>' => "ptr := ptr + 1;"
   837   case '>' => "ptr := ptr + 1;"
   799   case '<' => "ptr := ptr - 1;"
   838   case '<' => "ptr := ptr - 1;"
   800   case '+' => "field[ptr] := field[ptr] + 1;"
   839   case '+' => "mem[ptr] := mem [ptr] + 1;"
   801   case '-' => "field[ptr] := field[ptr] - 1;"
   840   case '-' => "mem [ptr] := mem [ptr] - 1;"
   802   case '.' => "x := field[ptr]; write x;"
   841   case '.' => "x := mem [ptr]; write x;"
   803   case '['  => "while (field[ptr] != 0) do {"
   842   case '['  => "while (mem [ptr] != 0) do {"
   804   case ']'  => "skip};"
   843   case ']'  => "skip};"
   805   case _ => ""
   844   case _ => ""
   806 }
   845 }
   807 \end{lstlisting}
   846 \end{lstlisting}
   808 
   847 
   809 \noindent 
   848 \noindent 
   810 The idea behind the translation is that BF-programs operate on an array,
   849 The idea behind the translation is that BF-programs operate on an array,
   811 called \texttt{field}. The BP-memory pointer into this array is
   850 called here \texttt{mem}. The BP-memory pointer into this array is
   812 represented as the variable \texttt{ptr}. The BF-instructions \code{>}
   851 represented as the variable \texttt{ptr}. As usual the BF-instructions
   813 and \code{<} increase, respectively decrease, \texttt{ptr}. The
   852 \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
   814 instructions \code{+} and \code{-} update a cell in \texttt{field}. In
   853 instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
   815 Line 6 we need to first assign a field-cell to an auxiliary variable
   854 Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary variable
   816 since we have not changed our write functions in order to cope with
   855 since we have not changed our write functions in order to cope with
   817 writing out any array-content directly. Lines 7 and 8 are for
   856 writing out any array-content directly. Lines 7 and 8 are for
   818 translating BF-loops. Line 8 is interesting in the sense that we need to
   857 translating BF-loops. Line 8 is interesting in the sense that we need to
   819 generate a \code{skip} instruction just before finishing with the 
   858 generate a \code{skip} instruction just before finishing with the
   820 closing \code{"\}"}. The reason is that we are rather pedantic about
   859 closing \code{"\}"}. The reason is that we are rather pedantic about
   821 semicolons in our While-grammar: the last command cannot have a
   860 semicolons in our WHILE-grammar: the last command cannot have a
   822 semicolon---adding a \code{skip} works around this snag. Putting
   861 semicolon---adding a \code{skip} works around this snag. Putting all
   823 all this together is we can generate While-programs with more than
   862 this together is we can generate WHILE-programs with more than 400
   824 400 instructions and then run the compiled JVM code for such programs.
   863 instructions and then run the compiled JVM code for such programs.
   825 
   864 \bigskip
       
   865 
       
   866 \noindent
       
   867 Hooooray, we can finally run the BF-mandelbrot program on the JVM and it
       
   868 completes within 20 seconds (after nearly 10 minutes of parsing the
       
   869 corresponding WHILE-program and generating 270K of a class file). Try
       
   870 replicating the 20 secs with an interpreter! OK, we now face the
       
   871 nagging question about what to do next\ldots
       
   872 
       
   873 \subsection*{Added Value}
       
   874 
       
   875 As you have probably seen, the compiler writer has a lot of freedom
       
   876 about how to generate code from what the progarmmer wrote as program.
       
   877 The only condition is that generated code should behave as expected by
       
   878 the programmer. Then all is fine\ldots mission accomplished! But
       
   879 sometimes the compiler writer is expected to go an extra mile, or even
       
   880 miles. Suppose we are given the following WHILE-program:
       
   881 
       
   882 \begin{lstlisting}[mathescape,language=While]
       
   883 new(arr[10]);
       
   884 arr[14] := 3 + arr[13]
       
   885 \end{lstlisting}
       
   886 
       
   887 \noindent 
       
   888 While admittedly this is a contrived program, and probably not meant to
       
   889 be like this by any sane programmer, it is supposed to make the
       
   890 following point: We generate an array of size 10, and then try to access
       
   891 the non-existing element at index 13 and even updating element with
       
   892 index 14. Obviously this is baloney. However, our compiler generates
       
   893 code for this program without any questions asked. We can even run this
       
   894 code on the JVM\ldots of course the result is an exception trace where
       
   895 the JVM yells at us for doing naughty things. (This is much better than
       
   896 C, for example, where such errors are not prevented and as a result
       
   897 insidious attacks can be mounted against such kind C-programs. I assume
       
   898 everyone has heard about \emph{Buffer Overflow Attacks}.) 
       
   899 
       
   900 Imagine we do not want to rely in our compiler on the JVM for producing
       
   901 an annoying, but safe exception trace, rather we want to handle such
       
   902 situations ourselves. Lets assume we want to handle them in the
       
   903 following way: if the programmer access a field out-of-bounds, we just
       
   904 return the default 0, and if a programmer wants to update an
       
   905 out-of-bounds filed, we quietly ignore this update.
   826 
   906 
   827 \end{document}
   907 \end{document}
   828 
   908 
   829 %%% Local Variables: 
   909 %%% Local Variables: 
   830 %%% mode: latex  
   910 %%% mode: latex