% author: Christian Urban <christian.urban@kcl.ac.uk>
% date: Wed, 29 Sep 2021 21:04:41 +0100
% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
\usetikzlibrary{calc,shapes,arrows}
\usepackage{framed}
\usepackage[belowskip=7pt,aboveskip=0pt]{caption}


\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020}

\section*{Handout 7 (Compilation)}

The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
that is often pretty fast. This way of producing code also has the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
management).

As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:

\begin{center}
  \begin{tikzpicture}[scale=1,font=\bf,
                      node/.style={
                      rectangle,rounded corners=3mm,
                      ultra thick,draw=black!50,minimum height=18mm,
                      minimum width=20mm,
                      top color=white,bottom color=black!20}]

  \node (0) at (-3,0) {};
  \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
  \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
  \node (C) at (7.5,0) [node] {JVM};

  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
  \end{tikzpicture}
  \end{center}

\noindent
The input will be WHILE-programs; the output will be assembly files
(with the file extension .j). Assembly files essentially contain
human-readable low-level code, meaning they are not just bits and bytes,
but rather something you can read and understand---with a bit of
practice of course. An \emph{assembler} will then translate the assembly
files into unreadable class- or binary-files the JVM or CPU can run.
Unfortunately, the Java ecosystem does not come with an assembler, which
would be handy for our compiler-endeavour (unlike Microsoft's Common
Language Infrastructure for the .Net platform which has an assembler
out-of-the-box). As a substitute we shall use the 3rd-party programs
Jasmin and Krakatau:

\begin{itemize}
  \item \url{http://jasmin.sourceforge.net}
  \item \url{https://github.com/Storyyeller/Krakatau}
\end{itemize}

\noindent
The first is a Java program and the second a program written in Python.
Each of them allows us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
us.

What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
expressions. For example, when compiling the expression $1 + 2$ we need
to generate the following three instructions:

\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd
\end{lstlisting}

\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers
together, replacing the top two elements of the stack with the result $3$.
For simplicity, we will consider throughout only arithmetic involving
integer numbers. This means our main JVM instructions for arithmetic
will be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.
The \code{i} stands for integer instructions in the JVM (alternatives
are \code{d} for doubles, \code{l} for longs and \code{f} for floats
etc).

Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):


\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}


\noindent where \meta{Id} stands for variables and \meta{Num}
for numbers. For the moment let us omit variables from arithmetic
expressions. Our parser will take this grammar and, given an input
program, produce an abstract syntax tree. For example, for
the expression $1 + ((2 * 3) + (4 - 3))$ we obtain the following tree.

\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
\end{center}

\noindent To generate JVM code for this expression, we need to traverse
this tree in \emph{post-order} fashion and emit code for each
node---this traversal in \emph{post-order} fashion will produce code for
a stack-machine (which is what the JVM is). Doing so for the tree above
generates the instructions

\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
ldc 3
imul
ldc 4
ldc 3
isub
iadd
iadd
\end{lstlisting}

\noindent If we ``run'' these instructions, the result $8$ will be on
top of the stack (I leave this to you to verify; the meaning of each
instruction should be clear). The result being on the top of the stack
will be an important convention we always observe in our compiler. Note
that a different bracketing of the expression, for example $(1 + (2 *
3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
a different list of instructions.

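The result $8$ can also be checked mechanically. The following is a
minimal sketch, in Scala, of a toy interpreter for the instructions used
above. It is not how the JVM is implemented; the names \texttt{Instr},
\texttt{Ldc}, \texttt{Iadd}, \texttt{Isub}, \texttt{Imul} and \texttt{run}
are made up for illustration only.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical representation of the instructions seen so far.
sealed trait Instr
case class Ldc(n: Int) extends Instr   // push the constant n
case object Iadd extends Instr         // pop two values, push their sum
case object Isub extends Instr         // pop two values, push their difference
case object Imul extends Instr         // pop two values, push their product

// Runs the instructions from left to right. The stack is a list
// whose head is the top of the stack.
def run(instrs: List[Instr], stack: List[Int] = Nil): List[Int] =
  (instrs, stack) match {
    case (Nil, _)                   => stack
    case (Ldc(n) :: is, st)         => run(is, n :: st)
    case (Iadd :: is, y :: x :: st) => run(is, (x + y) :: st)
    case (Isub :: is, y :: x :: st) => run(is, (x - y) :: st)
    case (Imul :: is, y :: x :: st) => run(is, (x * y) :: st)
    case _                          => sys.error("stack underflow")
  }

// The nine instructions generated for 1 + ((2 * 3) + (4 - 3)).
val prog: List[Instr] =
  List(Ldc(1), Ldc(2), Ldc(3), Imul, Ldc(4), Ldc(3), Isub, Iadd, Iadd)

val result = run(prog)   // the final stack
\end{lstlisting}

\noindent In this sketch \texttt{result} is a stack with a single element,
namely $8$. Note that for \texttt{Isub} the value pushed first is the left
operand, which is why the order of the two popped values matters.
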
Generating code in this post-order-traversal fashion is rather easy to
implement: it can be done with the following recursive
\textit{compile}-function, which takes the abstract syntax tree as an
argument:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}

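The clauses above translate almost literally into code. The following is
a minimal sketch in Scala, under the assumptions that abstract syntax
trees are represented by the (hypothetical) case classes \texttt{Num} and
\texttt{Aop}, that instructions are plain strings, and that division is
written \texttt{/}. None of these names are fixed by the handout.

\begin{lstlisting}[language=Scala,numbers=none]
// Assumed AST for arithmetic expressions without variables.
sealed trait AExp
case class Num(n: Int) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// Post-order traversal: code for the left operand, then the right
// operand, then the instruction for the operator itself.
def compile(a: AExp): List[String] = a match {
  case Num(n) => List("ldc " + n.toString)
  case Aop(op, a1, a2) =>
    val instr = op match {
      case "+" => "iadd"
      case "-" => "isub"
      case "*" => "imul"
      case "/" => "idiv"
    }
    compile(a1) ++ compile(a2) ++ List(instr)
}

// The tree for 1 + ((2 * 3) + (4 - 3)).
val tree: AExp =
  Aop("+", Num(1),
      Aop("+", Aop("*", Num(2), Num(3)), Aop("-", Num(4), Num(3))))

val code = compile(tree)
\end{lstlisting}

\noindent Here \texttt{code} contains the same nine instructions as listed
earlier for this expression, in the same order. The list append
\texttt{++} plays the role of $@$ in the definition.
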
\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
variables as \emph{local variables} of the JVM. Essentially, local
variables are an array or pointers to memory cells, containing in our
case only integers. Looking up a variable can be done with the
instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $index$
\end{lstlisting}

\noindent
which places the content of the local variable $index$ onto
the stack. Storing the top of the stack into a local variable
can be done by the instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
istore $index$
\end{lstlisting}

\noindent Note that this also pops off the top of the stack. One problem
we have to overcome, however, is that local variables are addressed, not
by identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers
(starting from $0$). Therefore our compiler needs to maintain a kind of
environment where variables are associated with numbers. This association
needs to be unique: if we muddle up the numbers, then we essentially
confuse variables and the consequence will usually be an erroneous
result. Our extended \textit{compile}-function for arithmetic
expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}

\noindent In the last line we generate the code for variables, where
$E(x)$ stands for looking up in the environment the index to which the
variable $x$ maps. This is similar to the interpreter we saw earlier in the
module, which also needs an environment: the difference is that the
interpreter maintains a mapping from variables to current values (what
is currently the value of a variable?), while compilers need a
mapping from variables to memory locations (where can I find the current
value for the variable in memory?).

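As a small worked example of these clauses, assume (purely for
illustration) an environment $E$ with $E(x) = 0$ and $E(y) = 1$.
Unfolding the definition for the expression $x * (y + 2)$ step by step
gives

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x * (y + 2), E)$ & $=$ &
$\textit{compile}(x, E) \;@\; \textit{compile}(y + 2, E) \;@\; \instr{imul}$\\
& $=$ &
$\instr{iload}\;0 \;@\; \textit{compile}(y, E) \;@\; \textit{compile}(2, E) \;@\; \instr{iadd} \;@\; \instr{imul}$\\
& $=$ &
$\instr{iload}\;0 \;@\; \instr{iload}\;1 \;@\; \instr{ldc}\;2 \;@\; \instr{iadd} \;@\; \instr{imul}$\\
\end{tabular}
\end{center}

\noindent Running these five instructions leaves the value of
$x * (y + 2)$ on top of the stack, in line with the convention that the
result of an expression is always found there.
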
There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
\pcode{if}- and \pcode{while}-statements. To explain the issue
let us first describe the compilation of statements of the
WHILE-language. The clause for \pcode{skip} is trivial, since
we do not have to generate any instruction

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
\end{tabular}
\end{center}

\noindent where $[]$ is the empty list of instructions. Note that
the \textit{compile}-function for statements returns a pair, a
list of instructions (in this case the empty list) and an
environment for variables. The reason for the environment is
that assignments in the WHILE-language might change the
environment---clearly if a variable is used for the first
time, we need to allocate a new index and if it has been used
before, then we need to be able to retrieve the associated index.
This is reflected in the clause for compiling assignments, say
$x := a$:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}

\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After the \instr{istore}-instruction, the result will be
stored in the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$
the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
generate the following code snippet

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
istore $n_x$
\end{lstlisting}

\noindent
where $n_x$ is the index (or pointer to the memory) for the variable
$x$. The Scala code for looking up the index for the variable is as follows:

\begin{center}
\begin{tabular}{lcl}
$index \;=\; E\textit{.getOrElse}(x, |E|)$
\end{tabular}
\end{center}

\noindent
This implements the idea that in case the environment $E$ contains an
index for $x$, we return it. Otherwise we ``create'' a new index by
returning the size $|E|$ of the environment (that will be an index that
is guaranteed not to be used yet). In all this we take advantage of the
JVM which provides us with a potentially limitless supply of places
where we can store values of variables.

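Putting the lookup and the assignment clause together might look as
follows. This is only a sketch in Scala under stated assumptions: the
environment is taken to be an immutable \texttt{Map} from variable names
to indices, instructions are plain strings, and the names \texttt{Env},
\texttt{Var}, \texttt{Assign}, \texttt{compileAExp} and
\texttt{compileAssign} are hypothetical.

\begin{lstlisting}[language=Scala,numbers=none]
// Assumed environment type: variable names mapped to JVM indices.
type Env = Map[String, Int]

// Assumed AST for arithmetic expressions, now with variables.
sealed trait AExp
case class Num(n: Int) extends AExp
case class Var(x: String) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// Assumed AST node for an assignment x := a.
case class Assign(x: String, a: AExp)

def compileAExp(a: AExp, env: Env): List[String] = a match {
  case Num(n) => List("ldc " + n.toString)
  // env(x) fails if x was never assigned before it is read
  case Var(x) => List("iload " + env(x).toString)
  case Aop(op, a1, a2) =>
    val instr = op match {
      case "+" => "iadd"
      case "-" => "isub"
      case "*" => "imul"
      case "/" => "idiv"
    }
    compileAExp(a1, env) ++ compileAExp(a2, env) ++ List(instr)
}

// Returns the generated instructions together with the
// (possibly extended) environment.
def compileAssign(s: Assign, env: Env): (List[String], Env) = {
  // reuse the index of x if there is one, otherwise take a fresh one
  val index = env.getOrElse(s.x, env.size)
  val code = compileAExp(s.a, env) ++ List("istore " + index.toString)
  (code, env + (s.x -> index))
}

// x := x + 1 in an environment where x already has index 0
val (code1, env1) =
  compileAssign(Assign("x", Aop("+", Var("x"), Num(1))), Map("x" -> 0))

// y := 5 where y is encountered for the first time
val (code2, env2) = compileAssign(Assign("y", Num(5)), env1)
\end{lstlisting}

\noindent In the first call the environment comes back unchanged and the
code is the four-instruction snippet for $x := x + 1$ with $n_x = 0$. In
the second call \texttt{y} receives the fresh index $1$, the size of the
environment at that point. Taking the size as the fresh index relies on
the indices in the environment being exactly $0, \ldots, |E| - 1$, which
holds as long as all indices are handed out this way.
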
A bit more complicated is the generation of code for
\pcode{if}-statements, say

\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}

\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we
already generated code for $b$ and the two if-branches $cs_{1/2}$.
Then in the true-case the control-flow of the program needs to behave as


\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
 top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->, black, skip loop] (A4);
\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
\end{tikzpicture}
\end{center}

\noindent where we start by running the code for $b$; since
we are in the true case, we continue with the code for
$cs_1$. After this, however, we must not run the code for
$cs_2$, but always jump to just after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding JVM instruction is
\instr{goto}. In case $b$ turns out to be false, we need the
control-flow

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (cs2);
\draw (cs2) edge [->,black, line width=1mm] (A4);
\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
\end{tikzpicture}
\end{center}

\noindent where we now need a conditional jump (taken if the
if-condition is false) from the end of the code for the
boolean to the beginning of the instructions for $cs_2$. Once we
are finished with running $cs_2$, we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need an address that
says where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go. A label, say \pcode{L}, is attached to assembly
code like

\begin{lstlisting}[mathescape,numbers=none]
L:
  $\textit{instr\_1}$
  $\textit{instr\_2}$
  $\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels
to actual (numeric) addresses, for example jump 10 instructions forward,
or 20 instructions backwards.

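\noindent To make the job of the assembler a bit more concrete, here is a
minimal Python sketch of label resolution. It is only an illustration:
the function \texttt{resolve\_labels} and the input format (one
instruction or label per string) are assumptions made for this example.
The sketch counts whole instructions, as in the description above,
whereas the branch offsets in real JVM bytecode are measured in bytes.

\begin{lstlisting}[language=Python,numbers=none]
def resolve_labels(lines):
    """Replace symbolic jump targets by relative instruction offsets."""
    addresses = {}   # label name -> index of the instruction it precedes
    instrs = []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            label = line[:-1]
            if label in addresses:   # labels must be unique
                raise ValueError("duplicate label: " + label)
            addresses[label] = len(instrs)
        else:
            instrs.append(line)
    jumps = ("goto", "if_icmpeq", "if_icmpne", "if_icmpge", "if_icmpgt")
    resolved = []
    for pos, instr in enumerate(instrs):
        parts = instr.split()
        if parts[0] in jumps:
            # positive offset: jump forward, negative offset: jump backwards
            offset = addresses[parts[1]] - pos
            resolved.append("{} {:+d}".format(parts[0], offset))
        else:
            resolved.append(instr)
    return resolved

# hypothetical input, chosen only for illustration
code = ["ldc 1", "ldc 1", "if_icmpne L_ifelse", "ldc 2", "istore 0",
        "goto L_ifend", "L_ifelse:", "ldc 3", "istore 1", "L_ifend:"]
resolved = resolve_labels(code)
\end{lstlisting}

\noindent In this example the conditional jump is resolved to
\texttt{if\_icmpne +4} and the unconditional jump to \texttt{goto +3}.
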
Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
arguments: an abstract syntax tree, an environment for
variable indices and also the label, $lab$, to which a conditional
jump needs to go. The clause for the expression $a_1 = a_2$,
for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This means that after running
the corresponding code, there will be two integers on top of
the stack. If they are equal, we do not have to do anything
(except for popping them off the stack) and just continue
with the next instructions (see control-flow of ifs above).
However, if they are \emph{not} equal, then we need to
(conditionally) jump to the label $lab$. This can be done with
the instruction

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
if_icmpne $lab$
\end{lstlisting}

To sum up, the third argument in the compile function for booleans
specifies where to jump, in case the condition is \emph{not} true. I
leave it to you to extend the \textit{compile}-function for the other
boolean expressions. Note that we need to jump whenever the boolean is
\emph{not} true, which means we have to ``negate'' the jump
condition: equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on. If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the
layout of the code and first give the code for the else-branch and then
for the if-branch. However in the case of while-loops this
``upside-down-inside-out'' way of generating code still seems the most
convenient.

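\noindent As an illustration of the negated jumps, the clause for
comparisons can be sketched in Python as follows. This is only a sketch
under assumptions, not the code used in the lectures: boolean
expressions are represented as triples \texttt{(op, a1, a2)}, arithmetic
expressions are restricted to numbers and variable names, and the names
\texttt{NEGATED\_JUMP}, \texttt{compile\_aexp} and
\texttt{compile\_bexp} are made up for this example.

\begin{lstlisting}[language=Python,numbers=none]
# The jump is taken when the comparison is FALSE, hence the negation.
NEGATED_JUMP = {"=": "if_icmpne", "!=": "if_icmpeq",
                "<": "if_icmpge", "<=": "if_icmpgt"}

def compile_aexp(a, env):
    """Arithmetic expressions: only numbers and variables in this sketch."""
    if isinstance(a, int):
        return ["ldc {}".format(a)]
    return ["iload {}".format(env[a])]

def compile_bexp(b, env, lab):
    """b is a triple (op, a1, a2); lab is the jump target for the false case."""
    op, a1, a2 = b
    return (compile_aexp(a1, env) + compile_aexp(a2, env)
            + ["{} {}".format(NEGATED_JUMP[op], lab)])

# x < 10, with x stored at index 0 and a hypothetical label L_false
code = compile_bexp(("<", "x", 10), {"x": 0}, "L_false")
\end{lstlisting}

\noindent For the condition \texttt{x < 10} this produces
\texttt{iload 0}, \texttt{ldc 10} and \texttt{if\_icmpge L\_false}: the
jump is taken exactly when \texttt{x} is \emph{not} less than 10.
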
We are now ready to give the compile function for
if-statements. Remember that for statements this function returns a
pair consisting of the code and an environment:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
\end{tabular}
\end{center}

\noindent In the first two lines we generate two fresh labels
for the jump addresses (just before the else-branch and just
after). In the next two lines we generate the instructions for
the two branches, $is_1$ and $is_2$. The final code consists of
the code for $b$ (which includes the jump to the label
just before the else-branch), then the instructions $is_1$ for
the if-branch, then the \pcode{goto} to the label after
the else-branch, then the label $L_\textit{ifelse}$, followed by
the instructions $is_2$ for the else-branch, followed by the
label $L_\textit{ifend}$. Consider for example the
if-statement:

\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 = 1 then x := 2 else y := 3
\end{lstlisting}

\noindent
The generated code is as follows:

\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
   ldc 1
   ldc 1
   if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   ldc 2
   istore 0
   goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
   ldc 3
   istore 1
L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}

\noindent The first three lines correspond to the boolean
expression $1 = 1$. The jump for when this boolean expression
is false is in Line~3. Lines 4--6 correspond to the if-branch;
the else-branch is in Lines 8 and 9.

Note carefully how the environment $E$ is threaded through the recursive
calls of \textit{compile}. The function receives an environment $E$, but
it might extend it when compiling the if-branch, yielding $E'$. This
happens for example in the if-statement above whenever the variable
\code{x} has not been used before. Similarly with the environment $E''$
for the second call to \textit{compile}. $E''$ is also the environment
that needs to be returned as part of the answer.

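\noindent The clause for if-statements, including the fresh labels and
the threading of the environment, can be sketched in Python as follows.
This is a sketch under assumptions rather than the code used in the
lectures: statements are represented as tuples, blocks as lists of
statements, fresh labels get a numeric suffix, and a new variable simply
receives the next free index. All function names are made up for this
example.

\begin{lstlisting}[language=Python,numbers=none]
import itertools

_counter = itertools.count()

def fresh(name):
    """Return a label that has not been handed out before."""
    return "{}_{}".format(name, next(_counter))

def compile_aexp(a, env):
    if isinstance(a, int):
        return ["ldc {}".format(a)]
    return ["iload {}".format(env[a])]

def compile_bexp(b, env, lab):
    op, a1, a2 = b
    jump = {"=": "if_icmpne", "!=": "if_icmpeq",
            "<": "if_icmpge", "<=": "if_icmpgt"}[op]
    return (compile_aexp(a1, env) + compile_aexp(a2, env)
            + ["{} {}".format(jump, lab)])

def compile_stmt(s, env):
    """Return (instructions, environment); env is never changed in place."""
    if s[0] == "assign":
        _, x, a = s
        index = env.get(x, len(env))   # assumption: next free index if x is new
        code = compile_aexp(a, env) + ["istore {}".format(index)]
        return code, {**env, x: index}
    if s[0] == "if":
        _, b, cs1, cs2 = s
        l_else, l_end = fresh("L_ifelse"), fresh("L_ifend")
        is1, env1 = compile_block(cs1, env)    # E  gives E'
        is2, env2 = compile_block(cs2, env1)   # E' gives E''
        code = (compile_bexp(b, env, l_else) + is1 + ["goto " + l_end]
                + [l_else + ":"] + is2 + [l_end + ":"])
        return code, env2
    raise ValueError("unknown statement")

def compile_block(stmts, env):
    code = []
    for s in stmts:
        instrs, env = compile_stmt(s, env)
        code = code + instrs
    return code, env

# if 1 = 1 then x := 2 else y := 3
prog = ("if", ("=", 1, 1), [("assign", "x", 2)], [("assign", "y", 3)])
code, env = compile_stmt(prog, {})
\end{lstlisting}

\noindent Apart from the numeric suffixes of the labels, \texttt{code}
contains the same ten lines as the listing above, and the returned
environment maps \texttt{x} to index 0 and \texttt{y} to index 1.
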
The compilation of while-loops, say
\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
the condition is true, we need to do another iteration,
and the control-flow needs to be as follows

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->,skip loop] (A1);
\end{tikzpicture}
\end{center}

\noindent Whereas if the condition is \emph{not} true, we
need to jump out of the loop, which gives the following
control-flow.

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (A4);
\end{tikzpicture}
\end{center}

| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 568 | \noindent Again we can use the \textit{compile}-function for
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 569 | boolean expressions to insert the appropriate jump to the | 
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 570 | end of the loop (label $L_{wend}$ below).
 | 
| 373 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 571 | |
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 572 | \begin{center}
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 573 | \begin{tabular}{lcl}
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 574 | $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 575 | \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 576 | \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 577 | \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs, E)$}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 578 | \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 579 | \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 580 | \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
 | 
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 581 | \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
 | 
| 373 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 582 | \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 583 | \end{tabular}
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 584 | \end{center}
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 585 | |
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 586 | \noindent I let you go through how this clause works. As an example | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 587 | you can consider the while-loop | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 588 | |
| 690 | 589 | \begin{lstlisting}[mathescape,numbers=none,language=While]
 | 
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 590 | while x <= 10 do x := x + 1 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 591 | \end{lstlisting}
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 592 | |
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 593 | \noindent yielding the following code | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 594 | |
| 709 | 595 | \begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
 | 
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 596 | L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 597 | iload 0 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 598 | ldc 10 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 599 |    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 600 | iload 0 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 601 | ldc 1 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 602 | iadd | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 603 | istore 0 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 604 |    goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 605 | L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 606 | \end{lstlisting}
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 607 | |
| 601 | 608 | \begin{tikzpicture}[remember picture,overlay]
 | 
| 609 |   \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
 | |
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 610 | -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 611 |   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
 | 
| 601 | 612 | -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); | 
| 613 | \end{tikzpicture}
 | |
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 614 | |
| 690 | 615 | \noindent | 
| 708 | 616 | As said, I leave it to you to decide whether the code implements | 
| 617 | the usual control flow of while-loops. | |
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 618 | |
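For comparison, the clause for while-loops could be sketched in Scala
roughly as follows. This is only a minimal sketch under assumptions: the
tiny abstract syntax, the object \texttt{WhileSketch} and the functions
\texttt{fresh}, \texttt{compileAExp}, \texttt{compileBExp},
\texttt{compileStmt} and \texttt{compileBlock} are hypothetical names
chosen for illustration, and only the comparison \texttt{<=} and the
operation \texttt{+} are covered.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch, not the handout's actual compiler.
// Hypothetical abstract syntax for a tiny fragment of WHILE.
sealed trait AExp
case class Var(x: String) extends AExp
case class Num(n: Int) extends AExp
case class Plus(a1: AExp, a2: AExp) extends AExp

case class Leq(a1: AExp, a2: AExp)   // the only boolean expression: a1 <= a2

sealed trait Stmt
case class Assign(x: String, a: AExp) extends Stmt
case class While(b: Leq, cs: List[Stmt]) extends Stmt

object WhileSketch {
  type Env = Map[String, Int]        // variable name -> local-variable index

  private var counter = -1
  def fresh(name: String): String = { counter += 1; s"${name}_$counter" }

  def compileAExp(a: AExp, env: Env): List[String] = a match {
    case Num(n)       => List(s"ldc $n")
    case Var(x)       => List(s"iload ${env(x)}")
    case Plus(a1, a2) =>
      compileAExp(a1, env) ++ compileAExp(a2, env) ++ List("iadd")
  }

  // jumps to the label jmp if the condition is NOT true
  def compileBExp(b: Leq, env: Env, jmp: String): List[String] =
    compileAExp(b.a1, env) ++ compileAExp(b.a2, env) ++
      List(s"if_icmpgt $jmp")

  def compileStmt(s: Stmt, env: Env): (List[String], Env) = s match {
    case Assign(x, a) =>
      val idx = env.getOrElse(x, env.size)
      (compileAExp(a, env) ++ List(s"istore $idx"), env + (x -> idx))
    case While(b, cs) =>
      val lBegin = fresh("L_wbegin")           // fresh label
      val lEnd   = fresh("L_wend")             // fresh label
      val (is, env1) = compileBlock(cs, env)   // code of the loop body
      (List(s"$lBegin:") ++ compileBExp(b, env, lEnd) ++ is ++
         List(s"goto $lBegin", s"$lEnd:"), env1)
  }

  def compileBlock(cs: List[Stmt], env: Env): (List[String], Env) =
    cs.foldLeft((List.empty[String], env)) { case ((is, e), s) =>
      val (is1, e1) = compileStmt(s, e)
      (is ++ is1, e1)
    }
}
\end{lstlisting}

\noindent
Compiling the loop \texttt{while x <= 10 do x := x + 1} with \texttt{x}
stored at index 0 produces the ten lines of the JVM listing above, except
that the two labels carry a numeric suffix to keep them fresh.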
| 709 | 619 | Next we need to consider the WHILE-statement \pcode{write x}, which can
 | 
| 620 | be used to print out the content of a variable. For this we shall use a | |
| 708 | 621 | Java library function. In order to avoid having to generate a lot of | 
| 622 | code for each \pcode{write}-command, we use a separate helper-method and
 | |
| 623 | just call this method with an appropriate argument (which of course | |
| 624 | needs to be placed onto the stack). The code of the helper-method is as | |
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 625 | follows. | 
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 626 | |
| 709 | 627 | \begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
 | 
| 373 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 628 | .method public static write(I)V | 
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 629 | .limit locals 1 | 
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 630 | .limit stack 2 | 
| 373 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 631 | getstatic java/lang/System/out Ljava/io/PrintStream; | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 632 | iload 0 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 633 | invokevirtual java/io/PrintStream/println(I)V | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 634 | return | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 635 | .end method | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 636 | \end{lstlisting}
 | 
| 
b018234c9126
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
372diff
changeset | 637 | |
| 709 | 638 | \noindent The first line marks the beginning of the method, called | 
| 639 | \pcode{write}. It takes a single integer argument indicated by the
 | |
| 640 | \pcode{(I)} and returns no result, indicated by the \pcode{V} (for
 | |
| 641 | void). Since the method has only one argument, we only need a single | |
| 642 | local variable (Line~2) and a stack with two cells will be sufficient | |
| 643 | (Line 3). Line 4 instructs the JVM to get the value of the member | |
| 712 | 644 | \pcode{out} from the class \pcode{java/lang/System}. It expects the value
 | 
| 709 | 645 | to be of type \pcode{java/io/PrintStream}. A reference to this value
 | 
| 646 | will be placed on the stack.\footnote{Note the syntax \texttt{L
 | |
| 647 | \ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
 | |
| 648 | designers of Jasmin decided that this syntax is pleasing to the eye. So | |
| 649 | if you wanted to have strings in your Jasmin code, you would need to | |
| 710 | 650 | write \texttt{Ljava/lang/String;}\;. If you want arrays of one
 | 
| 651 | dimension, then use \texttt{[\ldots}; two dimensions, use
 | |
| 652 | \texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
 | |
| 653 | copies the integer we want to print out onto the stack. In the line | |
| 654 | after that we call the method \pcode{println} (from the class
 | |
| 655 | \pcode{java/io/PrintStream}). We want to print out an integer and do not
 | |
| 656 | expect anything back (that is why the type annotation is \pcode{(I)V}).
 | |
| 657 | The \pcode{return}-instruction in the next line changes the control-flow
 | |
| 658 | back to the place from where \pcode{write} was called. This method needs
 | |
| 659 | to be part of a header that is included in any code we generate. The | |
| 660 | helper-method \pcode{write} can be invoked with the two instructions
 | |
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 661 | |
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 662 | \begin{lstlisting}[mathescape,language=JVMIS]
 | 
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 663 | iload $E(x)$ | 
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 664 | invokestatic XXX/XXX/write(I)V | 
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 665 | \end{lstlisting}
 | 
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 666 | |
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 667 | \noindent where we first place the variable to be printed on | 
| 377 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 668 | top of the stack and then call \pcode{write}. The \pcode{XXX}
 | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 669 | need to be replaced by an appropriate class name (this will be | 
| 
a052a83f562e
update
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
376diff
changeset | 670 | explained shortly). | 
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 671 | |
| 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 672 | |
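The two instructions for \pcode{write} could be generated by a small
Scala helper along the following lines. This is a sketch under
assumptions: the object \texttt{WriteSketch}, the function
\texttt{compileWrite} and its parameter \texttt{className} are
hypothetical, and the class name is passed in by the caller because the
placeholder \pcode{XXX} has not been fixed yet.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch; className stands for the XXX/XXX placeholder
// and is supplied by the caller (hypothetical parameter).
object WriteSketch {
  type Env = Map[String, Int]     // variable name -> local-variable index

  // load the variable onto the stack, then call the helper-method
  def compileWrite(x: String, env: Env, className: String): List[String] =
    List(s"iload ${env(x)}",
         s"invokestatic $className/write(I)V")
}
\end{lstlisting}

\noindent
In this sketch a variable that has never been assigned is not in the
environment, so \texttt{env(x)} raises an exception rather than
generating code for it.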
| 709 | 673 | By generating code for a WHILE-program, we end up with a list of (JVM | 
| 674 | assembly) instructions. Unfortunately, there is a bit more boilerplate | |
| 675 | code needed before these instructions can be run. Essentially we have to | |
| 676 | enclose them inside a Java \texttt{main}-method. The corresponding code
 | |
| 677 | is shown in Figure~\ref{boiler}. This boilerplate code is very specific
 | |
| 678 | to the JVM. If we target any other virtual machine or a machine | |
| 679 | language, then we would need to change this code. Interesting are the | |
| 680 | Lines 5 and 6 where we hardwire that the stack of our programs will | |
| 681 | never be larger than 200 and that the maximum number of variables is | |
| 682 | also 200. These seem to be conservative default values that allow us to | |
| 683 | run some simple WHILE-programs. In a real compiler, we would of course | |
| 684 | need to work harder and find out appropriate values for the stack and | |
| 685 | local variables. | |
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 686 | |
| 708 | 687 | \begin{figure}[t]
 | 
| 710 | 688 | \begin{framed}
 | 
| 708 | 689 | \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
 | 
| 690 | .class public XXX.XXX | |
| 691 | .super java/lang/Object | |
| 692 | ||
| 693 | .method public static main([Ljava/lang/String;)V | |
| 694 | .limit locals 200 | |
| 695 | .limit stack 200 | |
| 696 | ||
| 697 |       $\textit{\ldots{}here comes the compiled code\ldots}$
 | |
| 698 | ||
| 699 | return | |
| 700 | .end method | |
| 701 | \end{lstlisting}
 | |
| 710 | 702 | \end{framed}
 | 
| 709 | 703 | \caption{The boilerplate code needed for running generated code. It
 | 
| 711 | 704 | hardwires limits for stack space and for the number of local | 
| 709 | 705 |   variables.\label{boiler}}
 | 
| 708 | 706 | \end{figure}
 | 
| 707 | ||
| 708 | ||
| 375 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 709 | To sum up, Figure~\ref{test} shows the complete code generated
 | 
| 601 | 710 | for the slightly nonsensical program | 
| 375 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 711 | |
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 712 | \begin{lstlisting}[mathescape,language=While]
 | 
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 713 | x := 1 + 2; | 
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 714 | write x | 
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 715 | \end{lstlisting}
 | 
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 716 | |
| 692 | 717 | \noindent I let you read the code and make sure the code behaves as | 
| 718 | expected. Having this code at our disposal, we need the assembler to | |
| 719 | translate the generated code into JVM bytecode (a class file). This | |
| 720 | bytecode is then understood by the JVM and can be run by just invoking | |
| 709 | 721 | the \pcode{java}-program. Again I let you do the work.
 | 
| 375 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 722 | |
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 723 | |
| 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 724 | \begin{figure}[p]
 | 
| 710 | 725 | \begin{framed}
 | 
| 709 | 726 | \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 | 
| 708 | 727 | \begin{tikzpicture}[remember picture,overlay]
 | 
| 728 | \draw[|<->|,very thick] (LA.north) -- (LB.south) | |
| 710 | 729 |      node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
 | 
| 708 | 730 | \draw[|<->|,very thick] (LC.north) -- (LD.south) | 
| 710 | 731 |      node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
 | 
| 708 | 732 | \end{tikzpicture}
 | 
| 710 | 733 | \end{framed}
 | 
| 708 | 734 | \caption{The generated code for the test program \texttt{x := 1 + 2; write
 | 
| 735 | x}. This code can be processed by a Java assembler producing a | |
| 736 | class-file, which can then be run by the {\tt{}java}-program.\label{test}}
 | |
| 375 
bf36664a3196
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
374diff
changeset | 737 | \end{figure}
 | 
| 374 
0e25fb72d339
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
373diff
changeset | 738 | |
| 690 | 739 | \subsection*{Arrays}
 | 
| 740 | ||
| 708 | 741 | Maybe a useful addition to the WHILE-language would be arrays. This | 
| 742 | would allow us to generate more interesting WHILE-programs by | |
| 743 | translating BF*** programs into equivalent WHILE-code. Therefore in this | |
| 744 | section let us have a look at how we can support the following three | |
| 745 | constructions | |
| 690 | 746 | |
| 747 | \begin{lstlisting}[mathescape,language=While]
 | |
| 708 | 748 | new(arr[15000]) | 
| 690 | 749 | x := 3 + arr[3 + y] | 
| 750 | arr[42 * n] := ... | |
| 751 | \end{lstlisting}
 | |
| 752 | ||
| 753 | \noindent | |
| 708 | 754 | The first construct is for creating new arrays. In this instance the | 
| 755 | name of the array is \pcode{arr} and it can hold 15000 integers. We do
 | |
| 756 | not support ``dynamic'' arrays, that is the size of our arrays will | |
| 757 | always be fixed. The second construct is for referencing an array cell | |
| 758 | inside an arithmetic expression---we need to be able to look up the | |
| 759 | contents of an array at an index determined by an arithmetic expression. | |
| 760 | Similarly in the line below, we need to be able to update the content of | |
| 712 | 761 | an array at a calculated index. | 
| 691 | 762 | |
| 763 | For creating a new array we can generate the following three JVM | |
| 764 | instructions: | |
| 690 | 765 | |
| 766 | \begin{lstlisting}[mathescape,language=JVMIS]
 | |
| 767 | ldc number | |
| 768 | newarray int | |
| 769 | astore loc_var | |
| 770 | \end{lstlisting}
 | |
| 771 | ||
| 772 | \noindent | |
| 708 | 773 | First we need to put the size of the array onto the stack. The next | 
| 774 | instruction creates the array. In this case the array contains | |
| 775 | \texttt{int}s. With the last instruction we can store the array as a
 | |
| 691 | 776 | local variable (like the ``simple'' variables from the previous | 
| 692 | 777 | section). The use of a local variable for each array allows us to have | 
| 708 | 778 | multiple arrays in a WHILE-program. For looking up an element in an | 
| 692 | 779 | array we can use the following JVM code | 
| 690 | 780 | |
| 781 | \begin{lstlisting}[mathescape,language=JVMIS]
 | |
| 782 | aload loc_var | |
| 711 | 783 | $\textit{index\_aexp}$ 
 | 
| 690 | 784 | iaload | 
| 785 | \end{lstlisting}
 | |
| 786 | ||
| 787 | \noindent | |
| 708 | 788 | The first instruction loads the ``pointer'', or local variable, to the | 
| 789 | array onto the stack. Then we have some instructions calculating the | |
| 790 | index where we want to look up the array. The idea is that these | |
| 791 | instructions will leave a concrete number on the top of the stack, which | |
| 792 | will be the index into the array we need. Finally we need to tell the | |
| 793 | JVM to load the corresponding element onto the stack. Updating an array | |
| 794 | at an index with a value is as follows. | |
| 691 | 795 | |
| 796 | \begin{lstlisting}[mathescape,language=JVMIS]
 | |
| 797 | aload loc_var | |
| 711 | 798 | $\textit{index\_aexp}$ 
 | 
| 799 | $\textit{value\_aexp}$ 
 | |
| 691 | 800 | iastore | 
| 801 | \end{lstlisting}
 | |
| 802 | ||
| 803 | \noindent | |
| 708 | 804 | Again the first instruction loads the local variable of | 
| 805 | the array onto the stack. Then we have some instructions calculating | |
| 806 | the index where we want to update the array. After that come the | |
| 807 | instructions calculating the value with which we want to update the array. The last | |
| 808 | line contains the instruction for updating the array. | |
| 691 | 809 | |
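The three instruction sequences for arrays can be summarised as Scala
helpers. This is a minimal sketch under assumptions: the object
\texttt{ArraySketch} and the functions \texttt{newArray},
\texttt{lookup} and \texttt{update} are hypothetical names, arrays share
one environment with the ``simple'' variables, and the instructions for
the index and the value are assumed to be compiled already.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch: env maps names to local-variable indices;
// index and value are the already compiled instructions of the
// corresponding arithmetic expressions.
object ArraySketch {
  type Env = Map[String, Int]

  // new(arr[size]): allocate the array and record its local variable
  def newArray(arr: String, size: Int, env: Env): (List[String], Env) = {
    val idx = env.getOrElse(arr, env.size)
    (List(s"ldc $size", "newarray int", s"astore $idx"),
     env + (arr -> idx))
  }

  // arr[index] inside an arithmetic expression
  def lookup(arr: String, index: List[String], env: Env): List[String] =
    List(s"aload ${env(arr)}") ++ index ++ List("iaload")

  // arr[index] := value
  def update(arr: String, index: List[String], value: List[String],
             env: Env): List[String] =
    List(s"aload ${env(arr)}") ++ index ++ value ++ List("iastore")
}
\end{lstlisting}

\noindent
Keeping the array reference in the same environment as the integer
variables is just one possible design; it means an array and a variable
cannot share a name.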
| 708 | 810 | Next we need to modify our grammar rules for our WHILE-language: it | 
| 692 | 811 | seems best to extend the rule for factors in arithmetic expressions with | 
| 812 | a rule for looking up an array. | |
| 691 | 813 | |
| 814 | \begin{plstx}[rhs style=, margin=3cm]
 | |
| 815 | : \meta{E} ::= \meta{T} $+$ \meta{E}
 | |
| 816 |          | \meta{T} $-$ \meta{E}
 | |
| 817 |          | \meta{T}\\
 | |
| 818 | : \meta{T} ::= \meta{F} $*$ \meta{T}
 | |
| 819 |           | \meta{F} $\backslash$ \meta{T}
 | |
| 820 |           | \meta{F}\\
 | |
| 821 | : \meta{F} ::= ( \meta{E} )
 | |
| 822 |           | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$
 | |
| 823 |           | \meta{Id}
 | |
| 824 |           | \meta{Num}\\
 | |
| 825 | \end{plstx}
 | |
| 826 | ||
| 827 | \noindent | |
| 828 | There is no problem with left-recursion as the \meta{E} is ``protected''
 | |
| 692 | 829 | by an identifier and the brackets. There are two new rules for statements, | 
| 830 | one for creating an array and one for array assignment: | |
| 691 | 831 | |
| 832 | \begin{plstx}[rhs style=, margin=2cm, one per line]
 | |
| 833 | : \meta{Stmt} ::=  \ldots
 | |
| 708 | 834 |               | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
 | 
| 691 | 835 |               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
 | 
| 836 | \end{plstx}
 | |
| 690 | 837 | |
| 708 | 838 | With this in place we can turn back to the idea of creating | 
| 712 | 839 | WHILE-programs by translating BF-programs. This is a relatively easy | 
| 708 | 840 | task because BF has only eight instructions (we will actually implement | 
| 841 | seven because we can omit the read-in instruction from BF). What makes | |
| 842 | this translation easy is that BF-loops can be straightforwardly | |
| 843 | represented as while-loops. The Scala code for the translation is as | |
| 844 | follows: | |
| 692 | 845 | |
| 846 | \begin{lstlisting}[language=Scala,numbers=left]
 | |
| 847 | def instr(c: Char) : String = c match {
 | |
| 848 | case '>' => "ptr := ptr + 1;" | |
| 849 | case '<' => "ptr := ptr - 1;" | |
| 708 | 850 | case '+' => "mem[ptr] := mem [ptr] + 1;" | 
| 851 | case '-' => "mem [ptr] := mem [ptr] - 1;" | |
| 852 | case '.' => "x := mem [ptr]; write x;" | |
| 853 |   case '['  => "while (mem [ptr] != 0) do {"
 | |
| 692 | 854 | case ']' => "skip};" | 
| 855 | case _ => "" | |
| 856 | } | |
| 857 | \end{lstlisting}
 | |
| 858 | ||
| 859 | \noindent | |
| 860 | The idea behind the translation is that BF-programs operate on an array, | |
| 710 | 861 | called here \texttt{mem}. The BF-memory pointer into this array is
 | 
| 708 | 862 | represented as the variable \texttt{ptr}. As usual the BF-instructions
 | 
| 863 | \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
 | |
| 864 | instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
 | |
| 710 | 865 | Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
 | 
| 866 | variable since we have not changed our write functions in order to cope | |
| 867 | with writing out any array-content directly. Lines 7 and 8 are for | |
| 692 | 868 | translating BF-loops. Line 8 is interesting in the sense that we need to | 
| 708 | 869 | generate a \code{skip} instruction just before finishing with the
 | 
| 692 | 870 | closing \code{"\}"}. The reason is that we are rather pedantic about
 | 
| 708 | 871 | semicolons in our WHILE-grammar: the last command cannot have a | 
| 710 | 872 | semicolon---adding a \code{skip} works around this snag. 
 | 
| 873 | ||
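To obtain a complete WHILE-program, the translated instructions still
need to be glued together and preceded by the allocation of the memory.
The following sketch shows one way of doing this. The names
\texttt{BfSketch}, \texttt{instrs} and \texttt{bfToWhile} are
hypothetical, and the array size of 30000 and the initial pointer of
15000 are assumptions of this sketch; \texttt{instr} is repeated only to
keep the snippet self-contained.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch; array size and initial pointer are assumptions.
object BfSketch {
  def instr(c: Char): String = c match {
    case '>' => "ptr := ptr + 1;"
    case '<' => "ptr := ptr - 1;"
    case '+' => "mem[ptr] := mem[ptr] + 1;"
    case '-' => "mem[ptr] := mem[ptr] - 1;"
    case '.' => "x := mem[ptr]; write x;"
    case '[' => "while (mem[ptr] != 0) do {"
    case ']' => "skip};"
    case _   => ""
  }

  // translate every BF character and glue the pieces together
  def instrs(prog: String): String = prog.toList.map(instr).mkString

  // allocate the memory, set the pointer and finish with a final
  // skip, because the last command must not end in a semicolon
  def bfToWhile(prog: String): String =
    "new(mem[30000]); ptr := 15000;" + instrs(prog) + "skip"
}
\end{lstlisting}

\noindent
For instance, the BF-program \texttt{[-]} becomes a while-loop that
decrements the current cell until it is zero, followed by the final
\code{skip}.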
| 711 | 874 | Putting this all together, we can generate WHILE-programs with more |
| 710 | 875 | than 15K JVM-instructions; run the compiled JVM code for such | 
| 876 | programs and marvel at the output\ldots\medskip | |
| 708 | 877 | |
| 878 | \noindent | |
| 711 | 879 | \ldots{}Hooooray, after a few more tweaks we can finally run the
 | 
| 880 | BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the | |
| 881 | corresponding WHILE-program; the size of the resulting class file is | |
| 882 | around 32K---not too bad). The generation of the picture completes | |
| 883 | within 20 or so seconds. Try replicating this with an interpreter! The | |
| 710 | 884 | good point is that we now have a sufficiently complicated program in our | 
| 885 | WHILE-language in order to do some benchmarking. Which means we now face | |
| 886 | the question about what to do next\ldots | |
| 887 | ||
| 888 | \subsection*{Optimisations \& Co}
 | |
| 889 | ||
| 712 | 890 | Every compiler that deserves its name has to perform some optimisations | 
| 891 | on the code: if we put in the extra effort of writing a compiler for a | |
| 892 | language, then obviously we want to have our code to run as fast as | |
| 893 | possible. So we should look into this in more detail. | |
| 708 | 894 | |
| 711 | 895 | There is actually one aspect in our generated code where we can easily |
| 712 | 896 | make efficiency gains. This has to do with some of the quirks of the |
| 711 | 897 | JVM. Whenever we pushed a constant onto the stack, we used the JVM |
| 898 | instruction \instr{ldc some_const}. This is a rather generic instruction
 | |
| 899 | in the sense that it works not just for integers but also for strings, | |
| 900 | objects and so on. What this instruction does is to put the constant | |
| 712 | 901 | into a \emph{constant pool} and then use an index into this constant
 | 
| 711 | 902 | pool. This means \instr{ldc} will be represented by at least two bytes
 | 
| 712 | 903 | in the class file. While this is a sensible strategy for ``large'' | 
| 904 | constants like strings, it is a bit of overkill for small integers | |
| 905 | (which many integers will be when compiling a BF-program). To counter | |
| 906 | this ``waste'', the JVM has specific instructions for small integers, | |
| 907 | for example | |
| 710 | 908 | |
| 909 | \begin{itemize}
 | |
| 711 | 910 | \item \instr{iconst_0},\ldots, \instr{iconst_5}
 | 
| 911 | \item \instr{bipush n}
 | |
| 710 | 912 | \end{itemize}
 | 
| 708 | 913 | |
| 710 | 914 | \noindent | 
| 711 | 915 | where the \code{n} in \instr{bipush} is between -128 and 127. By
 | 
| 916 | having dedicated instructions such as \instr{iconst_0} to
 | |
| 917 | \instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
 | |
| 918 | size smaller as these instructions only require 1 byte (as opposed to the | |
| 919 | generic \instr{ldc} which needs 1 byte plus another for the index into
 | |
| 920 | the constant pool). While in theory the use of such special instructions | |
| 921 | should make the code only smaller, it actually makes the code also run | |
| 922 | faster. Probably because the JVM has to process less code and uses a | |
| 712 | 923 | specific instruction for the underlying CPU. The story with | 
| 711 | 924 | \instr{bipush} is slightly different, because it also uses two
 | 
| 712 | 925 | bytes---so it does not necessarily result in a reduction of code size. | 
| 926 | Instead, it probably uses a specific instruction in the underlying CPU | |
| 927 | that makes the JVM code run faster.\footnote{This is all ``probable''
 | |
| 928 | because I have not read the 700 pages of JVM documentation by Oracle and | |
| 929 | also have no clue how the JVM is implemented.} This means when | |
| 930 | generating code for pushing constants onto the stack, we can use the | |
| 931 | following Scala helper-function | |
| 711 | 932 | |
| 933 | \begin{lstlisting}[language=Scala]
 | |
| 934 | def compile_num(i: Int) = | |
| 935 | if (0 <= i && i <= 5) i"iconst_$i" else | |
| 712 | 936 | if (-128 <= i && i <= 127) i"bipush $i" | 
| 937 | else i"ldc $i" | |
| 711 | 938 | \end{lstlisting}
 | 
| 939 | ||
| 940 | \noindent | |
| 712 | 941 | It generates the more efficient instructions when pushing a small integer | 
| 942 | constant onto the stack. The default is \instr{ldc} for any other constants.
 | |
| 943 | ||
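The listing above uses an \texttt{i}-interpolator, which is not part of
the Scala standard library and is presumably defined elsewhere in the
compiler. A self-contained variant with the standard
\texttt{s}-interpolator might look as follows. The names
\texttt{NumSketch} and \texttt{compileNum} are hypothetical, and the
extra case for $-1$ is an addition of this sketch.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch using only the standard s-interpolator.
object NumSketch {
  def compileNum(i: Int): String =
    if (i == -1) "iconst_m1"                      // dedicated instruction for -1
    else if (0 <= i && i <= 5) s"iconst_$i"       // one-byte instructions
    else if (-128 <= i && i <= 127) s"bipush $i"  // signed byte operand
    else s"ldc $i"                                // generic fallback
}
\end{lstlisting}

\noindent
The order of the tests matters: the cases for $-1$ and for 0 to 5 must
come before the \texttt{bipush} case, since those numbers also fit into
a signed byte.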
| 944 | The JVM also has such special instructions for | |
| 945 | loading and storing the first four local variables. The assumption is | |
| 946 | that most operations and arguments in a method will only use very few | |
| 947 | local variables. So we can use the following instructions: | |
| 711 | 948 | |
| 949 | \begin{itemize}
 | |
| 950 | \item \instr{iload_0},\ldots, \instr{iload_3}
 | |
| 951 | \item \instr{istore_0},\ldots, \instr{istore_3}
 | |
| 952 | \item \instr{aload_0},\ldots, \instr{aload_3}
 | |
| 953 | \item \instr{astore_0},\ldots, \instr{astore_3}
 | |
| 954 | \end{itemize}
 | |
| 710 | 955 | |
| 956 | ||
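Choosing between the short and the generic form of these instructions
can be done with a helper in the same spirit as the one for constants.
The following is only a sketch; the object \texttt{LocalVarSketch} and
its function names are hypothetical.

\begin{lstlisting}[language=Scala,numbers=none]
// A minimal sketch; the helper names are hypothetical.
object LocalVarSketch {
  // short one-byte form for indices 0..3, generic form otherwise
  private def withIndex(op: String, idx: Int): String =
    if (0 <= idx && idx <= 3) s"${op}_$idx" else s"$op $idx"

  def compileILoad(idx: Int): String  = withIndex("iload", idx)
  def compileIStore(idx: Int): String = withIndex("istore", idx)
  def compileALoad(idx: Int): String  = withIndex("aload", idx)
  def compileAStore(idx: Int): String = withIndex("astore", idx)
}
\end{lstlisting}

\noindent
With these helpers a variable at index 0 is loaded with the one-byte
form, while a variable at index 4 or above still uses the generic
instruction with an explicit index.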
\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program shrinks and the class-file also runs faster (the
parsing part is still very slow). According to my very rough
experiments:

\begin{center}
\begin{tabular}{lll}
 & class-size & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised:   & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! The class-file shrinks by roughly a third and the runtime by
about a quarter. Such optimisations are called \emph{peephole optimisations},
because they involve changing one or a small set of instructions into an
equivalent set that has better performance.

If you look carefully at our generated code, you will quickly find another
source of inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.

\begin{lstlisting}[mathescape,language=JVMIS]
...
istore 0
iload 0
...
\end{lstlisting}

\noindent
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with, for example, the much cheaper instruction
\instr{dup}. Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become much more
complicated. So we leave this kind of optimisation here and look at
something more interesting and possibly surprising.

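For the easy case only, where the store is immediately followed by a load
of the same variable, a peephole pass could be sketched as below. This is
not part of our compiler: it assumes the generated code is available as a
list of strings with one instruction or label per element, and it only
recognises the long form \texttt{istore n}.

\begin{lstlisting}[language=Scala]
import scala.annotation.tailrec

// Replaces   istore n; iload n   by   dup; istore n.
// A label between the two instructions is a separate list element,
// so the pattern does not match and nothing is rewritten.
@tailrec
def peephole(instrs: List[String],
             acc: List[String] = Nil): List[String] =
  instrs match {
    case st :: ld :: rest
        if st.startsWith("istore ") &&
           ld == "iload " + st.stripPrefix("istore ") =>
      peephole(rest, st :: "dup" :: acc)   // acc is built in reverse
    case i :: rest => peephole(rest, i :: acc)
    case Nil => acc.reverse
  }

println(peephole(List("iconst_1", "istore 0", "iload 0", "iload 1")))
// List(iconst_1, dup, istore 0, iload 1)
\end{lstlisting}

\noindent
Both versions leave the value on the stack and in the local variable, so
the rewrite does not change the meaning of the code.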
As you might have seen, the compiler writer has a lot of freedom in
how to generate code from the program the programmer wrote. The only
condition is that the generated code should behave as expected by the
programmer. Then all is fine with the code above\ldots mission
accomplished! But sometimes the compiler writer is expected to go an
extra mile, or even miles and change(!) the meaning of a program.
Suppose we are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: The program generates an array of size 10, and then
tries to access the non-existing element at index 13 and even tries to
update the element with index 14. Obviously this is baloney. Still, our
compiler generates code for this program without any questions asked. We
can even run this code on the JVM\ldots of course the result is an
exception trace where the JVM yells at us for doing naughty
things.\footnote{Still this is much better than C, for example, where
such errors are not prevented and as a result insidious attacks can be
mounted against such C-programs. I assume everyone has heard about
\emph{Buffer Overflow Attacks}.} Now what should we do in such
situations? Over- and underflows of indices are notoriously difficult to
detect statically (at compile-time). So it might seem that raising an
exception at run-time, like the JVM does, is the best compromise.

Well, imagine that in our compiler we do not want to rely on the JVM for
producing an annoying, but safe exception trace; rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let us assume we want to handle them in the
following way: if the programmer accesses a field out-of-bounds, we just
return a default 0, and if a programmer wants to update an out-of-bounds
field, we want to ``quietly'' ignore this update. One way to achieve
this would be to rewrite the WHILE-programs and insert the necessary
if-conditions for safely reading and writing arrays. Another way
is to modify the code we generate.

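To pin down the intended behaviour before looking at any JVM code, here is
a sketch of it in Scala. The function names are made up for this
illustration, and treating negative indices as out-of-bounds as well is an
assumption of the sketch.

\begin{lstlisting}[language=Scala]
// Out-of-bounds reads return the default 0.
def safe_read(arr: Array[Int], i: Int): Int =
  if (0 <= i && i < arr.length) arr(i) else 0

// Out-of-bounds writes are quietly ignored.
def safe_write(arr: Array[Int], i: Int, v: Int): Unit =
  if (0 <= i && i < arr.length) arr(i) = v

val arr = new Array[Int](10)                 // new(arr[10])
safe_write(arr, 14, 3 + safe_read(arr, 13))  // arr[14] := 3 + arr[13]
println(arr.mkString(","))                   // the array is unchanged
\end{lstlisting}
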
\noindent
For reading an array element, the code can first compare the index
against the array length and only then perform the \instr{iaload};
otherwise it pushes the default 0:

\begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmplt L1
  pop2
  iconst_0
  goto L2
L1:
  swap
  iaload
L2:
\end{lstlisting}

\noindent
For updating an array element the idea is the same, except that in the
out-of-bounds case nothing is left on the stack:

\begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmplt L1
  pop2
  goto L2
L1:
  swap
  $\textit{value\_aexp}$
  iastore
L2:
\end{lstlisting}

\noindent
Note that both snippets only guard against indices that are too large;
catching negative indices as well would need a second comparison against 0.

\begin{figure}[p]
\begin{center}
\begin{tikzpicture}[every text node part/.style={align=left},
                    stack/.style={rectangle split,rectangle split parts = 5,
                    fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
\node (A)  {};
\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
\node[stack,right = 60pt] (1) at (0.east)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below = 40pt] (2) at (1.south)
   {array\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,left = 90pt] (3) at (2.west)
   {array\_len\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
   {$\textit{index}$\nodepart{two}
    array\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
   {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
   {\ldots\phantom{l}};
\node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
   {0\nodepart{two} \ldots\phantom{l}};

\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
\draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
\draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmplt}} -| (4a.north);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
\draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
\end{tikzpicture}
\end{center}
\caption{Stack contents during the bounds-checked array read; the top of
the stack is drawn at the top of each box.}
\end{figure}

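Finally, the problem of jumps that are too large for \instr{goto} can be
solved with \instr{goto_w}, which takes a wider (32-bit instead of 16-bit)
branch offset.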
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: