author | Christian Urban <christian.urban@kcl.ac.uk> |
Fri, 13 Oct 2023 15:07:37 +0100 | |
changeset 941 | 66adcae6c762 |
parent 714 | 8a50ccea59e8 |
child 943 | 5365ef60707e |
permissions | -rw-r--r-- |
% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
\usetikzlibrary{calc,shapes,arrows}
\usepackage{framed}
\usepackage[belowskip=7pt,aboveskip=0pt]{caption}

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020, 2023}

\section*{Handout 7 (Compilation)}
The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
that is often pretty fast. This way of producing code also has the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
management).

As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:
\begin{center}
  \begin{tikzpicture}[scale=1,font=\bf,
                      node/.style={
                      rectangle,rounded corners=3mm,
                      ultra thick,draw=black!50,minimum height=18mm,
                      minimum width=20mm,
                      top color=white,bottom color=black!20}]

  \node (0) at (-3,0) {};
  \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
  \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
  \node (C) at (7.5,0) [node] {JVM};

  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
  \end{tikzpicture}
\end{center}
\noindent
The input will be WHILE-programs; the output will be assembly files
(with the file extension .j). Assembly files essentially contain
human-readable low-level code, meaning they are not just bits and
bytes, but rather something you can read and understand---with a bit
of practice of course. An \emph{assembler} will then translate the
assembly files into unreadable class- or binary-files the JVM or CPU
can run. Unfortunately, the Java ecosystem does not come with an
assembler which would be handy for our compiler-endeavour (unlike
Microsoft's Common Language Infrastructure for the .NET platform which
has an assembler out-of-the-box). As a substitute we shall use the
3rd-party programs Jasmin or Krakatau (Jasmin is the preferred
option---a \texttt{jasmin.jar}-file is available on KEATS):

\begin{itemize}
  \item \url{http://jasmin.sourceforge.net}
  \item \url{https://github.com/Storyyeller/Krakatau}
\end{itemize}
\noindent
The first is a Java program and the second a program written in Python.
Each of them allows us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
us.

What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
expressions. For example when compiling the expression $1 + 2$ we need
to generate the following three instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd
\end{lstlisting}
\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers
together, replacing the top two elements of the stack with the result
$3$. For simplicity, we will consider throughout only arithmetic involving
integer numbers. This means our main JVM instructions for arithmetic
will be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.
The \code{i} stands for integer instructions in the JVM (alternatives
are \code{d} for doubles, \code{l} for longs and \code{f} for floats
etc).
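
The effect of such instructions on the stack can be mimicked in a few
lines of Scala. The following is only a sketch for illustration, not how
the JVM is implemented: the names \texttt{Instr}, \texttt{Ldc},
\texttt{step} and \texttt{run} are made up for this example, and a Scala
list stands in for the stack of the JVM (the head of the list is the top
of the stack).

\begin{lstlisting}[language=Scala,numbers=none]
// hypothetical model of a few JVM instructions (integers only)
sealed trait Instr
case class Ldc(n: Int) extends Instr
case object Iadd extends Instr
case object Isub extends Instr
case object Imul extends Instr
case object Idiv extends Instr

// one execution step; the head of the list is the top of the stack
def step(stack: List[Int], i: Instr): List[Int] = (i, stack) match {
  case (Ldc(n), st)         => n :: st
  case (Iadd, y :: x :: st) => (x + y) :: st
  case (Isub, y :: x :: st) => (x - y) :: st
  case (Imul, y :: x :: st) => (x * y) :: st
  case (Idiv, y :: x :: st) => (x / y) :: st
  case _ => sys.error("not enough elements on the stack")
}

// run a list of instructions starting from the empty stack
def run(is: List[Instr]): List[Int] =
  is.foldLeft(List[Int]())(step)

val result = run(List(Ldc(1), Ldc(2), Iadd))   // List(3)
\end{lstlisting}

\noindent
Note the order in which the two operands are popped: the second operand
is on top of the stack, which matters for \instr{isub} and \instr{idiv}.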

Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):

\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}
\noindent where \meta{Id} stands for variables and \meta{Num}
for numbers. For the moment let us omit variables from arithmetic
expressions. Our parser will take this grammar and, given an input
program, produce an abstract syntax tree. For example, for
the expression $1 + ((2 * 3) + (4 - 3))$ we obtain the following tree:
\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
\end{center}
\noindent To generate JVM code for this expression, we need to traverse
this tree in \emph{post-order} fashion and emit code for each
node---this traversal in \emph{post-order} fashion will produce code for
a stack-machine (which is what the JVM is). Doing so for the tree above
generates the instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
ldc 3
imul
ldc 4
ldc 3
isub
iadd
iadd
\end{lstlisting}
\noindent If we ``run'' these instructions, the result $8$ will be on
top of the stack (I leave this to you to verify; the meaning of each
instruction should be clear). The result being on the top of the stack
will be an important convention we always observe in our compiler. Note
that a different bracketing of the expression, for example $(1 + (2 *
3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
a different list of instructions.

Generating code in this post-order-traversal fashion is rather easy to
implement: it can be done with the following recursive
\textit{compile}-function, which takes the abstract syntax tree as an
argument:
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}
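
\noindent
These clauses translate almost literally into Scala. The sketch below
assumes a hypothetical representation of abstract syntax trees (the names
\texttt{AExp}, \texttt{Num}, \texttt{Aop} and \texttt{instrFor} are chosen
for this example only and need not be the ones used by our parser),
writes division as \texttt{"/"}, and represents instructions simply as
strings; the $@$ becomes list concatenation.

\begin{lstlisting}[language=Scala,numbers=none]
// assumed AST for arithmetic expressions without variables
sealed trait AExp
case class Num(n: Int) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// the JVM instruction for each arithmetic operator
def instrFor(op: String): String = op match {
  case "+" => "iadd"
  case "-" => "isub"
  case "*" => "imul"
  case "/" => "idiv"
  case _   => sys.error("unknown operator " + op)
}

// post-order traversal: left subtree, right subtree, then the node
def compile(a: AExp): List[String] = a match {
  case Num(n) => List("ldc " + n)
  case Aop(op, a1, a2) =>
    compile(a1) ::: compile(a2) ::: List(instrFor(op))
}

// the tree for 1 + ((2 * 3) + (4 - 3))
val tree =
  Aop("+", Num(1), Aop("+", Aop("*", Num(2), Num(3)),
                            Aop("-", Num(4), Num(3))))
val code = compile(tree)
\end{lstlisting}

\noindent
Here \texttt{code} is the list of nine instructions shown earlier. For
the differently bracketed expression $(1 + (2 * 3)) + (4 - 3)$ the same
function produces \texttt{ldc 1}, \texttt{ldc 2}, \texttt{ldc 3},
\texttt{imul}, \texttt{iadd}, \texttt{ldc 4}, \texttt{ldc 3},
\texttt{isub}, \texttt{iadd}.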
\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
variables as \emph{local variables} of the JVM. Essentially, local
variables are an array or pointers to memory cells, containing in our
case only integers. Looking up a variable can be done with the
instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $index$
\end{lstlisting}

\noindent
which places the content of the local variable $index$ onto
the stack. Storing the top of the stack into a local variable
can be done by the instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
istore $index$
\end{lstlisting}
\noindent Note that this also pops off the top of the stack. One problem
we have to overcome, however, is that local variables are addressed, not
by identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers
(starting from $0$). Therefore our compiler needs to maintain a kind of
environment where variables are associated with numbers. This association
needs to be unique: if we muddle up the numbers, then we essentially
confuse variables and the consequence will usually be an erroneous
result. Our extended \textit{compile}-function for arithmetic
expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}
\noindent In the last line we generate the code for variables where
$E(x)$ stands for looking up in the environment the index to which the
variable $x$ maps. This is similar to the interpreter we saw earlier in the
module, which also needs an environment: the difference is that the
interpreter maintains a mapping from variables to current values (what
is currently the value of a variable?), while compilers need a
mapping from variables to memory locations (where can I find the current
value for the variable in memory?).
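
In Scala the environment $E$ can be modelled by an immutable map from
variable names to indices. Again this is only a sketch under assumptions:
the constructors \texttt{Num}, \texttt{Var} and \texttt{Aop}, the type
alias \texttt{Env} and the helper \texttt{instrFor} are invented for this
example, and instructions are represented as plain strings.

\begin{lstlisting}[language=Scala,numbers=none]
// assumed AST for arithmetic expressions, now with variables
sealed trait AExp
case class Num(n: Int) extends AExp
case class Var(x: String) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// environment: from identifiers to local-variable indices
type Env = Map[String, Int]

def instrFor(op: String): String = op match {
  case "+" => "iadd"
  case "-" => "isub"
  case "*" => "imul"
  case "/" => "idiv"
  case _   => sys.error("unknown operator " + op)
}

def compile(a: AExp, env: Env): List[String] = a match {
  case Num(n) => List("ldc " + n)
  // env(x) fails with an exception if x has no index yet
  case Var(x) => List("iload " + env(x))
  case Aop(op, a1, a2) =>
    compile(a1, env) ::: compile(a2, env) ::: List(instrFor(op))
}

// x + 1 where x lives at index 0
val code = compile(Aop("+", Var("x"), Num(1)), Map("x" -> 0))
\end{lstlisting}

\noindent
The environment is only read here, never changed: compiling an
arithmetic expression cannot introduce new variables.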

There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
\pcode{if}- and \pcode{while}-statements. To explain the issue
let us first describe the compilation of statements of the
WHILE-language. The clause for \pcode{skip} is trivial, since
we do not have to generate any instruction
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
\end{tabular}
\end{center}
\noindent whereby $[]$ is the empty list of instructions. Note that
the \textit{compile}-function for statements returns a pair, a
list of instructions (in this case the empty list) and an
environment for variables. The reason for the environment is
that assignments in the WHILE-language might change the
environment---clearly if a variable is used for the first
time, we need to allocate a new index and if it has been used
before, then we need to be able to retrieve the associated index.
This is reflected in the clause for compiling assignments, say
$x := a$:
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}
\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After the \instr{istore}-instruction, the result will be
stored in the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$
the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
generate the following code snippet
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
istore $n_x$
\end{lstlisting}
\noindent
where $n_x$ is the index (or pointer to the memory) for the variable
$x$. The Scala code for looking up the index for the variable is as follows:

\begin{center}
\begin{tabular}{lcl}
$index \;=\; E\textit{.getOrElse}(x, |E|)$
\end{tabular}
\end{center}
\noindent
This implements the idea that in case the environment $E$ contains an
index for $x$, we return it. Otherwise we ``create'' a new index by
returning the size $|E|$ of the environment (that will be an index that
is guaranteed not to be used yet). In all this we take advantage of the
JVM which provides us with a potentially limitless supply of places
where we can store values of variables.
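
Combining this lookup with the clause for assignments might look as
follows. This is a sketch rather than the definitive implementation: it
assumes that instructions are plain strings and that the code for the
right-hand side $a$ has already been generated; the names
\texttt{compileAssign} and \texttt{Env} are hypothetical.

\begin{lstlisting}[language=Scala,numbers=none]
// environment: from identifiers to local-variable indices
type Env = Map[String, Int]

// code: the instructions already generated for the right-hand side
def compileAssign(x: String, code: List[String],
                  env: Env): (List[String], Env) = {
  // reuse the index of x, or take the next free one
  val index = env.getOrElse(x, env.size)
  (code ::: List("istore " + index), env + (x -> index))
}

// x := x + 1, with x already stored at index 0
val r1 =
  compileAssign("x", List("iload 0", "ldc 1", "iadd"), Map("x" -> 0))

// y := 5, with y seen for the first time
val r2 = compileAssign("y", List("ldc 5"), r1._2)
\end{lstlisting}

\noindent
Using $|E|$ as the fresh index works because indices are handed out
consecutively starting from $0$: an environment with $n$ entries uses
exactly the indices $0$ to $n - 1$. If $x$ is already in the environment,
then adding the same mapping again leaves the environment unchanged.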

A bit more complicated is the generation of code for
\pcode{if}-statements, say

\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}
\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we
already generated code for $b$ and the two \pcode{if}-branches $cs_{1/2}$.
Then in the true-case the control-flow of the program needs to behave as
\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->, black, skip loop] (A4);
\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
\end{tikzpicture}
\end{center}
\noindent where we start with running the code for $b$; since
we are in the true case we continue with running the code for
$cs_1$. After this however, we must not run the code for
$cs_2$, but always jump to after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding instruction of the JVM is
\instr{goto}. In case $b$ turns out to be false we need the
control-flow

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (cs2);
\draw (cs2) edge [->,black, line width=1mm] (A4);
\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
\end{tikzpicture}
\end{center}

\noindent where we now need a conditional jump (if the
if-condition is false) from the end of the code for the
boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$, we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need addresses
specifying where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go to. A label, say \pcode{L}, is attached to assembly
code like

\begin{lstlisting}[mathescape,numbers=none]
L:
$\textit{instr\_1}$
$\textit{instr\_2}$
$\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels
to actual (numeric) addresses, for example jump 10 instructions forward,
or 20 instructions backwards.
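
To make the job of the assembler a bit more concrete, here is a minimal
Python sketch of resolving symbolic labels in two passes. It is only an
illustration and not how Jasmin or Krakatau are implemented: as a simplifying
assumption it uses absolute instruction indices as addresses, whereas the JVM
itself uses relative byte offsets. The function name
\texttt{resolve\_labels} and the representation of a program as a list of
strings are made up for this example.

```python
# Hypothetical helper: resolve symbolic labels to instruction indices.
# Assumption: addresses are absolute instruction indices (the real JVM
# uses relative byte offsets instead).
JUMPS = {"goto", "if_icmpeq", "if_icmpne", "if_icmpge", "if_icmpgt"}


def resolve_labels(lines):
    addresses = {}      # label name -> index of the instruction after it
    instructions = []   # the program with all label lines removed

    # Pass 1: record where each label points and insist on uniqueness.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            name = line[:-1]
            if name in addresses:
                raise ValueError(f"duplicate label: {name}")
            addresses[name] = len(instructions)
        else:
            instructions.append(line)

    # Pass 2: replace the label operand of every jump by its index.
    resolved = []
    for instr in instructions:
        parts = instr.split()
        if parts[0] in JUMPS:
            parts[1] = str(addresses[parts[1]])
        resolved.append(" ".join(parts))
    return resolved


program = ["ldc 1", "ldc 1", "if_icmpne L_else", "ldc 2", "istore 0",
           "goto L_end", "L_else:", "ldc 3", "istore 1", "L_end:"]
print(resolve_labels(program))
```

The check for duplicate labels in the first pass reflects the requirement
that labels must be unique: with two definitions of the same label the table
of addresses would no longer determine a single jump target.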

Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
arguments: an abstract syntax tree, an environment for
variable indices and also the label, $lab$, to where a conditional
jump needs to go. The clause for the expression $a_1 = a_2$,
for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This means that after running
the corresponding code there will be two integers on top of
the stack. If they are equal, we do not have to do anything
(except for popping them off from the stack) and just continue
with the next instructions (see control-flow of ifs above).
However, if they are \emph{not} equal, then we need to
(conditionally) jump to the label $lab$. This can be done with
the instruction

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
if_icmpne $lab$
\end{lstlisting}

To sum up, the third argument in the compile function for booleans
specifies where to jump, in case the condition is \emph{not} true. I
leave it to you to extend the \textit{compile}-function for the other
boolean expressions. Note that we need to jump whenever the boolean is
\emph{not} true, which means we have to ``negate'' the jump
condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on. If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the
layout of the code and first give the code for the else-branch and then
for the if-branch. However, in the case of while-loops this
``upside-down-inside-out'' way of generating code still seems the most
convenient.
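
The ``negated'' jumps can be collected in a small lookup table. The
following Python sketch shows one possible way to write the
\textit{compile}-function for boolean expressions; it is not the code of the
accompanying compiler. The tuple-based syntax trees, the dictionary used as
environment and the names \texttt{compile\_aexp} and \texttt{compile\_bexp}
are assumptions made for this example, and arithmetic expressions are
restricted to numbers and variables.

```python
# Each comparison is mapped to the jump that tests the *negated* condition,
# because we only jump when the boolean expression is not true.
NEGATED_JUMP = {
    "=": "if_icmpne",
    "!=": "if_icmpeq",
    "<": "if_icmpge",
    "<=": "if_icmpgt",
}


def compile_aexp(a, env):
    """Compile a (very small) arithmetic expression: numbers and variables."""
    kind, value = a
    if kind == "num":
        return [f"ldc {value}"]
    if kind == "var":
        return [f"iload {env[value]}"]
    raise ValueError(f"unknown arithmetic expression: {kind}")


def compile_bexp(b, env, lab):
    """Compile (op, a1, a2); jump to lab if the comparison does not hold."""
    op, a1, a2 = b
    return (compile_aexp(a1, env)
            + compile_aexp(a2, env)
            + [f"{NEGATED_JUMP[op]} {lab}"])


# x < 10, where x is stored at index 0
code = compile_bexp(("<", ("var", "x"), ("num", 10)), {"x": 0}, "L_end")
print(code)  # ['iload 0', 'ldc 10', 'if_icmpge L_end']
```

Keeping the negation in a single table means there is only one place to check
when a jump seems to go the wrong way.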

We are now ready to give the compile function for
if-statements---remember that for statements this function returns a
pair consisting of the code and an environment:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
\end{tabular}
\end{center}

\noindent In the first two lines we generate two fresh labels
for the jump addresses (just before the else-branch and just
after). In the next two lines we generate the instructions for
the two branches, $is_1$ and $is_2$. The final code will
be first the code for $b$ (including the conditional jump to the
label just before the else-branch), then the instructions for the
if-branch, then the \pcode{goto} to after
the else-branch, the label $L_\textit{ifelse}$, followed by
the instructions for the else-branch, followed by the
after-the-else-branch label. Consider for example the
if-statement:

\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 = 1 then x := 2 else y := 3
\end{lstlisting}

\noindent
The generated code is as follows:

\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
ldc 1
ldc 1
if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
ldc 2
istore 0
goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
ldc 3
istore 1
L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
\draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
\draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}

\noindent The first three lines correspond to the boolean
expression $1 = 1$. The jump for when this boolean expression
is false is in Line~3. Lines 4--6 correspond to the if-branch;
the else-branch is in Lines 8 and 9.

Note carefully how the environment $E$ is threaded through the recursive
calls of \textit{compile}. The function receives an environment $E$, but
it might extend it when compiling the if-branch, yielding $E'$. This
happens for example in the if-statement above whenever the variable
\code{x} has not been used before. Similarly with the environment $E''$
for the second call to \textit{compile}. $E''$ is also the environment
that needs to be returned as part of the answer.
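
The compile function for if-statements and the threading of the environment
can be sketched in Python as follows. This is only an illustration under
stated assumptions: syntax trees are tuples, the environment is a dictionary
from variable names to indices, a new variable receives the next free index,
and fresh labels are obtained by appending a counter (so the labels come out
as \texttt{L\_ifelse\_0} rather than \texttt{L\_ifelse}). None of the
function names are taken from the actual compiler.

```python
import itertools

_label_ids = itertools.count()   # source of fresh label numbers


def compile_aexp(a, env):
    kind, value = a
    if kind == "num":
        return [f"ldc {value}"]
    if kind == "var":
        return [f"iload {env[value]}"]
    raise ValueError(f"unknown arithmetic expression: {kind}")


def compile_bexp(b, env, lab):
    # Only equality is handled here; jump to lab if a1 and a2 differ.
    op, a1, a2 = b
    if op != "=":
        raise ValueError(f"unsupported comparison: {op}")
    return compile_aexp(a1, env) + compile_aexp(a2, env) + [f"if_icmpne {lab}"]


def compile_stmt(s, env):
    """Return a pair (instructions, possibly extended environment)."""
    if s[0] == "assign":
        _, x, a = s
        code = compile_aexp(a, env)
        if x not in env:                    # first use: extend the environment
            env = {**env, x: len(env)}
        return code + [f"istore {env[x]}"], env
    if s[0] == "if":
        _, b, cs1, cs2 = s
        n = next(_label_ids)
        l_else, l_end = f"L_ifelse_{n}", f"L_ifend_{n}"
        is1, env1 = compile_block(cs1, env)     # E  -> E'
        is2, env2 = compile_block(cs2, env1)    # E' -> E''
        code = (compile_bexp(b, env, l_else) + is1 + [f"goto {l_end}"]
                + [f"{l_else}:"] + is2 + [f"{l_end}:"])
        return code, env2
    raise ValueError(f"unknown statement: {s[0]}")


def compile_block(cs, env):
    code = []
    for s in cs:
        instrs, env = compile_stmt(s, env)
        code += instrs
    return code, env


# if 1 = 1 then x := 2 else y := 3
stmt = ("if", ("=", ("num", 1), ("num", 1)),
        [("assign", "x", ("num", 2))],
        [("assign", "y", ("num", 3))])
code, env = compile_stmt(stmt, {})
print("\n".join(code))
print(env)  # {'x': 0, 'y': 1}
```

Starting from the empty environment, the if-branch extends it with
\texttt{x} at index 0 and the else-branch with \texttt{y} at index 1, which
matches the \texttt{istore 0} and \texttt{istore 1} in the generated code
shown earlier.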

The compilation of the while-loops, say
\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
the condition is true, we need to do another iteration,
and the control-flow needs to be as follows

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->,skip loop] (A1);
\end{tikzpicture}
\end{center}

\noindent Whereas if the condition is \emph{not} true, we
need to jump out of the loop, which gives the following
control flow.

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (A4);
\end{tikzpicture}
\end{center}

\noindent Again we can use the \textit{compile}-function for
boolean expressions to insert the appropriate jump to the
end of the loop (label $L_{wend}$ below).

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs, E)$}\\
\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_{wbegin}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
\end{tabular}
\end{center}

\noindent I let you go through how this clause works. As an example
you can consider the while-loop

\begin{lstlisting}[mathescape,numbers=none,language=While]
while x <= 10 do x := x + 1
\end{lstlisting}

\noindent yielding the following code

\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
iload 0
ldc 10
if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
iload 0
ldc 1
iadd
istore 0
goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
\draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
\draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm)
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}

\noindent
As said, I leave it to you to decide whether the code implements
the usual control flow of while-loops.
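
\noindent To make the clause above a bit more concrete, here is a minimal
Scala sketch that assembles the instruction list for a while-loop from
already compiled pieces. The names \texttt{fresh} and
\texttt{compileWhile} are assumptions made for this sketch (they are not
necessarily the names used in the accompanying compiler), and for brevity
the environment $E$ is left out. The condition is passed in as a function
that receives the label to jump to when the test fails.

\begin{lstlisting}[language=Scala,numbers=none]
// Hypothetical helper: a counter for generating fresh labels.
var counter = -1

def fresh(prefix: String): String = {
  counter += 1
  s"${prefix}_$counter"
}

// cond: given the jump target for a failing test, returns the
//       instructions for the boolean expression
// body: the already compiled instructions of the loop body
def compileWhile(cond: String => List[String],
                 body: List[String]): List[String] = {
  val begin = fresh("Loop_begin")
  val end   = fresh("Loop_end")
  List(s"$begin:") ++ cond(end) ++ body ++
    List(s"goto $begin", s"$end:")
}

// while x <= 10 do x := x + 1   (x stored at index 0)
val loop = compileWhile(
  end => List("iload 0", "ldc 10", s"if_icmpgt $end"),
  List("iload 0", "ldc 1", "iadd", "istore 0"))
\end{lstlisting}

\noindent The value \texttt{loop} then holds the same ten lines as the
listing above, only with numbered label names.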

Next we need to consider the WHILE-statement \pcode{write x}, which can
be used to print out the content of a variable. For this we shall use a
Java library function. In order to avoid having to generate a lot of
code for each \pcode{write}-command, we use a separate helper-method and
just call this method with an appropriate argument (which of course
needs to be placed onto the stack). The code of the helper-method is as
follows.

\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
.method public static write(I)V
.limit locals 1
.limit stack 2
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 0
invokevirtual java/io/PrintStream/println(I)V
return
.end method
\end{lstlisting}

\noindent The first line marks the beginning of the method, called
\pcode{write}. It takes a single integer argument indicated by the
\pcode{(I)} and returns no result, indicated by the \pcode{V} (for
void). Since the method has only one argument, we only need a single
local variable (Line~2) and a stack with two cells will be sufficient
(Line~3). Line~4 instructs the JVM to get the value of the member
\pcode{out} from the class \pcode{java/lang/System}. It expects the value
to be of type \pcode{java/io/PrintStream}. A reference to this value
will be placed on the stack.\footnote{Note the syntax \texttt{L
\ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
designers of Jasmin decided that this syntax is pleasing to the eye. So
if you wanted to have strings in your Jasmin code, you would need to
write \texttt{Ljava/lang/String;}\;. If you want arrays of one
dimension, then use \texttt{[\ldots}; two dimensions, use
\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
copies the integer we want to print out onto the stack. In the line
after that we call the method \pcode{println} (from the class
\pcode{java/io/PrintStream}). We want to print out an integer and do not
expect anything back (that is why the type annotation is \pcode{(I)V}).
The \pcode{return}-instruction in the next line changes the control-flow
back to the place from where \pcode{write} was called. This method needs
to be part of a header that is included in any code we generate. The
helper-method \pcode{write} can be invoked with the two instructions

\begin{lstlisting}[mathescape,language=JVMIS]
iload $E(x)$
invokestatic XXX/XXX/write(I)V
\end{lstlisting}

\noindent where we first place the variable to be printed on
top of the stack and then call \pcode{write}. The \pcode{XXX}
need to be replaced by an appropriate class name (this will be
explained shortly).
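
\noindent A minimal Scala sketch of how these two instructions could be
generated for \pcode{write x}. The function name \texttt{compileWrite},
the representation of the environment as a map from variable names to
local-variable indices, and the extra parameter for the class name are
all assumptions of this sketch.

\begin{lstlisting}[language=Scala,numbers=none]
// env maps WHILE-variables to JVM local-variable indices;
// className stands for the XXX/XXX placeholder (assumed parameter)
def compileWrite(x: String, env: Map[String, Int],
                 className: String): List[String] =
  List(s"iload ${env(x)}",
       s"invokestatic $className/write(I)V")

val code = compileWrite("x", Map("x" -> 0), "XXX/XXX")
\end{lstlisting}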


By generating code for a WHILE-program, we end up with a list of (JVM
assembly) instructions. Unfortunately, there is a bit more boilerplate
code needed before these instructions can be run. Essentially we have to
enclose them inside a Java \texttt{main}-method. The corresponding code
is shown in Figure~\ref{boiler}. This boilerplate code is very specific
to the JVM. If we target any other virtual machine or a machine
language, then we would need to change this code. Of interest are
Lines 5 and 6 where we hardwire that the stack of our programs will
never be larger than 200 and that the maximum number of variables is
also 200. These seem to be conservative default values that allow us to
run some simple WHILE-programs. In a real compiler, we would of course
need to work harder and find out appropriate values for the stack and
local variables.

\begin{figure}[t]
\begin{framed}
\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
.limit locals 200
.limit stack 200

$\textit{\ldots{}here comes the compiled code\ldots}$

return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
hardwires limits for stack space and for the number of local
variables.\label{boiler}}
\end{figure}
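
\noindent The boilerplate of Figure~\ref{boiler} could be attached to a
list of generated instructions with a small helper like the one sketched
below. The name \texttt{wrap} and its parameters are assumptions of this
sketch; the \pcode{write} helper-method, which also belongs into the
header, is left out to keep the example short.

\begin{lstlisting}[language=Scala,numbers=none]
// className is an assumed parameter standing for XXX.XXX
def wrap(className: String, instrs: List[String]): String = {
  val header = List(
    s".class public $className",
    ".super java/lang/Object",
    "",
    ".method public static main([Ljava/lang/String;)V",
    ".limit locals 200",   // hardwired, as in the figure
    ".limit stack 200",
    "")
  val footer = List("", "return", ".end method")
  (header ++ instrs ++ footer).mkString("\n")
}

// x := 1 + 2   (x stored at index 0)
val prog = wrap("Test", List("ldc 1", "ldc 2", "iadd", "istore 0"))
\end{lstlisting}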


To sum up, Figure~\ref{test} shows the complete code generated
for the slightly nonsensical program

\begin{lstlisting}[mathescape,language=While]
x := 1 + 2;
write x
\end{lstlisting}

\noindent I let you read the code and make sure the code behaves as
expected. Having this code at our disposal, we need the assembler to
translate the generated code into JVM bytecode (a class file). This
bytecode is then understood by the JVM and can be run by just invoking
the \pcode{java}-program. Again I let you do the work.


\begin{figure}[p]
\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
\draw[|<->|,very thick] (LA.north) -- (LB.south)
node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
\draw[|<->|,very thick] (LC.north) -- (LD.south)
node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
\end{framed}
\caption{The generated code for the test program \texttt{x := 1 + 2; write
x}. This code can be processed by a Java assembler producing a
class-file, which can then be run by the {\tt{}java}-program.\label{test}}
\end{figure}

\subsection*{Arrays}

Maybe a useful addition to the WHILE-language would be arrays. This
would allow us to generate more interesting WHILE-programs by
translating BF*** programs into equivalent WHILE-code. Therefore in this
section let us have a look at how we can support the following three
constructions

\begin{lstlisting}[mathescape,language=While]
new(arr[15000])
x := 3 + arr[3 + y]
arr[42 * n] := ...
\end{lstlisting}

\noindent
The first construct is for creating new arrays. In this instance the
name of the array is \pcode{arr} and it can hold 15000 integers. We do
not support ``dynamic'' arrays, that is the size of our arrays will
always be fixed. The second construct is for referencing an array cell
inside an arithmetic expression---we need to be able to look up the
contents of an array at an index determined by an arithmetic expression.
Similarly in the line below, we need to be able to update the content of
an array at a calculated index.

For creating a new array we can generate the following three JVM
instructions:

\begin{lstlisting}[mathescape,language=JVMIS]
ldc number
newarray int
astore loc_var
\end{lstlisting}

\noindent
First we need to put the size of the array onto the stack. The next
instruction creates the array. In this case the array contains
\texttt{int}s. With the last instruction we can store the array as a
local variable (like the ``simple'' variables from the previous
section). The use of a local variable for each array allows us to have
multiple arrays in a WHILE-program. For looking up an element in an
array we can use the following JVM code

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
iaload
\end{lstlisting}

\noindent
The first instruction loads the ``pointer'', or local variable, to the
array onto the stack. Then we have some instructions calculating the
index where we want to look up the array. The idea is that these
instructions will leave a concrete number on the top of the stack, which
will be the index into the array we need. Finally we need to tell the
JVM to load the corresponding element onto the stack. Updating an array
at an index with a value is as follows.

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
$\textit{value\_aexp}$
iastore
\end{lstlisting}

\noindent
Again the first instruction loads the local variable of
the array onto the stack. Then we have some instructions calculating
the index where we want to update the array. After that come the
instructions calculating the value with which we want to update the
array. The last
line contains the instruction for updating the array.
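
\noindent The three instruction patterns can be sketched as small Scala
functions. The function names, and the choice of passing the already
compiled index and value expressions as lists of strings, are
assumptions of this sketch; \texttt{loc} stands for the local-variable
index that holds the array.

\begin{lstlisting}[language=Scala,numbers=none]
// new(arr[size])
def newArray(size: Int, loc: Int): List[String] =
  List(s"ldc $size", "newarray int", s"astore $loc")

// arr[index] inside an arithmetic expression
def lookup(loc: Int, index: List[String]): List[String] =
  List(s"aload $loc") ++ index ++ List("iaload")

// arr[index] := value
def update(loc: Int, index: List[String],
           value: List[String]): List[String] =
  List(s"aload $loc") ++ index ++ value ++ List("iastore")

// new(arr[10]); arr[2] := 7   (arr stored at index 0)
val arrCode =
  newArray(10, 0) ++ update(0, List("ldc 2"), List("ldc 7"))
\end{lstlisting}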

Next we need to modify our grammar rules for our WHILE-language: it
seems best to extend the rule for factors in arithmetic expressions with
a rule for looking up an array.

\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
         | \meta{F} $\backslash$ \meta{T}
         | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
         | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$
         | \meta{Id}
         | \meta{Num}\\
\end{plstx}

\noindent
There is no problem with left-recursion as the \meta{E} is ``protected''
by an identifier and the brackets. There are two new rules for statements,
one for creating an array and one for array assignment:

\begin{plstx}[rhs style=, margin=2cm, one per line]
: \meta{Stmt} ::= \ldots
         | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,])
         | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
\end{plstx}

With this in place we can turn back to the idea of creating
WHILE-programs by translating BF-programs. This is a relatively easy
task because BF has only eight instructions (we will actually implement
seven because we can omit the read-in instruction from BF). What makes
this translation easy is that BF-loops can be straightforwardly
represented as while-loops. The Scala code for the translation is as
follows:

\begin{lstlisting}[language=Scala,numbers=left]
def instr(c: Char) : String = c match {
  case '>' => "ptr := ptr + 1;"
  case '<' => "ptr := ptr - 1;"
  case '+' => "mem[ptr] := mem [ptr] + 1;"
  case '-' => "mem [ptr] := mem [ptr] - 1;"
  case '.' => "x := mem [ptr]; write x;"
  case '[' => "while (mem [ptr] != 0) do {"
  case ']' => "skip};"
  case _ => ""
}
\end{lstlisting}

\noindent
The idea behind the translation is that BF-programs operate on an array,
called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
\code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
variable since we have not changed our write functions in order to cope
with writing out any array-content directly. Lines 7 and 8 are for
translating BF-loops. Line 8 is interesting in the sense that we need to
generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag.
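
\noindent As a hedged sketch, \texttt{instr} might be applied to a whole
BF-program as follows. The wrapper \texttt{translate}, the explicit
memory size, the initialisation of \texttt{ptr} with 0 and the final
\code{skip} (needed because every translated command ends with a
semicolon) are assumptions of this sketch. The function \texttt{instr} is
repeated in compact form only to keep the example self-contained.

\begin{lstlisting}[language=Scala,numbers=none]
// copy of the translation function from above
def instr(c: Char): String = c match {
  case '>' => "ptr := ptr + 1;"
  case '<' => "ptr := ptr - 1;"
  case '+' => "mem[ptr] := mem[ptr] + 1;"
  case '-' => "mem[ptr] := mem[ptr] - 1;"
  case '.' => "x := mem[ptr]; write x;"
  case '[' => "while (mem[ptr] != 0) do {"
  case ']' => "skip};"
  case _   => ""
}

// hypothetical wrapper: create the memory, translate each
// character, and finish with a skip without semicolon
def translate(prog: String, memSize: Int): String = {
  val body = prog.toList.map(instr).filter(_.nonEmpty)
  (List(s"new(mem[$memSize]);", "ptr := 0;") ++
    body ++ List("skip")).mkString("\n")
}

// the BF-program [-] sets the current cell to zero
val whileProg = translate("[-]", 10)
\end{lstlisting}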

Putting this all together, we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, after a few more tweaks we can finally run the
BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the
corresponding WHILE-program; the size of the resulting class file is
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good point is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking. Which means we now face
the question about what to do next\ldots

\subsection*{Optimisations \& Co}

Every compiler that deserves its name has to perform some optimisations
on the code: if we put in the extra effort of writing a compiler for a
language, then obviously we want our code to run as fast as
possible. So we should look into this in more detail.

There is actually one aspect in our generated code where we can
easily make efficiency gains. This has to do with some of the quirks of the
JVM. Whenever we push a constant onto the stack, we use the JVM
instruction \instr{ldc some_const}. This is a rather generic instruction
in the sense that it works not just for integers but also for strings,
objects and so on. What this instruction does is put the constant
into a \emph{constant pool} and then use an index into this constant
pool. This means \instr{ldc} will be represented by at least two bytes
in the class file. While this is a sensible strategy for ``large''
constants like strings, it is a bit of overkill for small integers
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example

\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
\end{itemize}

\noindent
where the \code{n} in \instr{bipush} is between -128 and 127. By
having dedicated instructions such as \instr{iconst_0} to
\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
size smaller as these instructions only require 1 byte (as opposed to the
generic \instr{ldc} which needs 1 byte plus another for the index into
the constant pool). While in theory the use of such special instructions
should make the code only smaller, it actually makes the code also run
faster, probably because the JVM has to process less code and uses a
specific instruction for the underlying CPU. The story with
\instr{bipush} is slightly different, because it also uses two
bytes---so it does not necessarily result in a reduction of code size.
Instead, it probably uses a specific instruction in the underlying CPU
that makes the JVM code run faster.\footnote{This is all ``probable''
because I have not read the 700 pages of JVM documentation by Oracle and
also have no clue how the JVM is implemented.} This means when
generating code for pushing constants onto the stack, we can use the
following Scala helper-function

\begin{lstlisting}[language=Scala]
def compile_num(i: Int) =
  if (0 <= i && i <= 5) i"iconst_$i" else
  if (-128 <= i && i <= 127) i"bipush $i"
  else i"ldc $i"
\end{lstlisting}

\noindent
It generates the more efficient instructions when pushing a small integer
constant onto the stack. The default is \instr{ldc} for any other constants.

The JVM also has such special instructions for
loading and storing the first four local variables. The assumption is
that most operations and arguments in a method will only use very few
local variables. So we can use the following instructions:

\begin{itemize}
\item \instr{iload_0},\ldots, \instr{iload_3}
\item \instr{istore_0},\ldots, \instr{istore_3}
\item \instr{aload_0},\ldots, \instr{aload_3}
\item \instr{astore_0},\ldots, \instr{astore_3}
\end{itemize}
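
\noindent In the spirit of the helper-function for constants, the choice
between the short and the generic form for integer variables can be
sketched as below. The function names are assumptions of this sketch, and
plain strings are returned in order to keep it self-contained; the
\texttt{aload}/\texttt{astore} variants would follow the same pattern.

\begin{lstlisting}[language=Scala,numbers=none]
// one-byte form for the indices 0 to 3, otherwise the
// generic instruction with an explicit index
def compileLoad(idx: Int): String =
  if (0 <= idx && idx <= 3) s"iload_$idx" else s"iload $idx"

def compileStore(idx: Int): String =
  if (0 <= idx && idx <= 3) s"istore_$idx" else s"istore $idx"

val small = compileLoad(2)    // short form
val large = compileStore(17)  // generic form
\end{lstlisting}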


\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program shrinks and the class-file also runs faster (the
parsing part is still very slow). According to my very rough
experiments:

\begin{center}
\begin{tabular}{lll}
& class-size (bytes) & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised: & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they involve changing one or a small set of instructions into an
equivalent set that has better performance.

If you look carefully at our generated code you will quickly find another
source of inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.

\begin{lstlisting}[mathescape,language=JVMIS]
...
istore 0
iload 0
...
\end{lstlisting}

\noindent
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with for example the much cheaper instruction
\instr{dup}. Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become much more
complicated. So we leave this kind of optimisation here and look at
something more interesting and possibly surprising.
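
\noindent Before moving on, the easy adjacent case can be sketched as a
small pass over the generated instructions, here assumed to be
represented as a list of strings. It only handles an \texttt{istore}
that is directly followed by an \texttt{iload} of the same variable and
rewrites the pair into \texttt{dup} followed by the store. The function
name \texttt{peephole} is an assumption of this sketch; it is not taken
from the accompanying compiler.

\begin{lstlisting}[language=Scala,numbers=none]
import scala.annotation.tailrec

// replaces "istore n; iload n" by "dup; istore n", which
// stores the value and keeps a copy on the stack
@tailrec
def peephole(instrs: List[String],
             acc: List[String] = Nil): List[String] =
  instrs match {
    case st :: ld :: rest
        if st.startsWith("istore ") &&
           ld == "iload " + st.stripPrefix("istore ") =>
      peephole(rest, st :: "dup" :: acc)
    case i :: rest => peephole(rest, i :: acc)
    case Nil => acc.reverse
  }

val optimised = peephole(List("ldc 3", "istore 0", "iload 0",
                              "invokestatic XXX/XXX/write(I)V"))
\end{lstlisting}

\noindent Pairs that refer to different variables are left untouched.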

As you might have seen, the compiler writer has a lot of freedom in
how to generate code from the program the programmer wrote. The only
condition is that the generated code should behave as expected by the
programmer. Then all is fine with the code above\ldots mission
accomplished! But sometimes the compiler writer is expected to go an
extra mile, or even miles and change(!) the meaning of a program.
Suppose we are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: The program generates an array of size 10, and then
tries to access the non-existing element at index 13 and even to update
the element with index 14. Obviously this is baloney. Still, our
compiler generates code for this program without any questions asked. We
can even run this code on the JVM\ldots of course the result is an
exception trace where the JVM yells at us for doing naughty
things.\footnote{Still this is much better than C, for example, where
such errors are not prevented and as a result insidious attacks can be
mounted against such C-programs. I assume everyone has heard about
\emph{Buffer Overflow Attacks}.} Now what should we do in such
situations? Over- and underflows of indices are notoriously difficult to
detect statically (at compile-time). So it might seem that raising an
exception at run-time, as the JVM does, is the best compromise.
|
Well, imagine we do not want to rely in our compiler on the JVM for
producing an annoying, but safe exception trace; rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let us assume we want to handle them in the
following way: if the programmer accesses a field out-of-bounds, we just
return a default 0, and if the programmer wants to update an out-of-bounds
field, we want to ``quietly'' ignore this update. One way to achieve
this would be to rewrite the WHILE-programs and insert the necessary
if-conditions for safely reading and writing arrays. Another way
is to modify the code we generate.
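
\noindent
To illustrate the first option, the contrived program from above could
be rewritten by hand roughly as follows. This is only a sketch: the
variables \texttt{i}, \texttt{j} and \texttt{tmp} are made up for the
example, the bound 10 is copied from the \texttt{new}-statement, and it
is assumed that the WHILE-language offers \texttt{<}-comparisons,
\texttt{skip} and nested if-statements.

\begin{lstlisting}[mathescape,language=While]
// i, j and tmp are made-up helper variables
new(arr[10]);
i := 13;
if (i < 0) then { tmp := 0 } else {
  if (i < 10) then { tmp := arr[i] } else { tmp := 0 }
};
j := 14;
if (j < 0) then { skip } else {
  if (j < 10) then { arr[j] := 3 + tmp } else { skip }
}
\end{lstlisting}

\noindent
Both array accesses are now guarded: the read yields the default 0 and
the update is skipped. The price is that every array access in the
source is blown up into several statements, which is one argument for
doing the check in the generated code instead.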
|
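For reading an array element, the generated code first compares the
index with the length of the array. If the index is too large, the code
discards the index and the array reference and pushes the default 0;
otherwise it loads the element as usual. The comparison has to be
strict, since the valid indices of an array of length $n$ are
$0,\ldots,n-1$.

\begin{lstlisting}[mathescape,language=JVMIS2]
$\textit{index\_aexp}$
aload loc_var
dup2
arraylength
if_icmplt L1
pop2
iconst_0
goto L2
L1:
swap
iaload
L2:
\end{lstlisting}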
|
For updating an array element, the check is the same, but in the
out-of-bounds case the index and the array reference are simply
discarded, so that the update is ignored. In this case
$\textit{value\_aexp}$ is not evaluated at all.

\begin{lstlisting}[mathescape,language=JVMIS2]
$\textit{index\_aexp}$
aload loc_var
dup2
arraylength
if_icmplt L1
pop2
goto L2
L1:
swap
$\textit{value\_aexp}$
iastore
L2:
\end{lstlisting}
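
\noindent
The two instruction sequences above only compare the index with the
length of the array. A negative index would therefore still reach
\texttt{iaload} or \texttt{iastore} and raise an exception. One possible
way to also guard the lower bound when reading is sketched below. It is
an assumption about how the check could be extended, not code taken from
the compiler, and the label \texttt{L0} is made up for the example.

\begin{lstlisting}[mathescape,language=JVMIS2]
$\textit{index\_aexp}$
dup
iflt L0           ; index < 0: out of bounds (L0 is a made-up label)
aload loc_var
dup2
arraylength
if_icmplt L1      ; index < length: in bounds
pop               ; drop the array reference
L0:
pop               ; drop the index
iconst_0          ; default value
goto L2
L1:
swap
iaload
L2:
\end{lstlisting}

\noindent
The extra \texttt{pop} before \texttt{L0} removes the array reference,
so that on both paths into \texttt{L0} only the index is left on the
stack. The JVM's bytecode verifier requires the stack to have the same
shape wherever control flow merges.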
|
\begin{figure}[p]
\begin{center}
\begin{tikzpicture}[every text node part/.style={align=left},
     stack/.style={rectangle split,rectangle split parts = 5,
     fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
\node (A) {};
\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
\node[stack,right = 60pt] (1) at (0.east)
  {array\nodepart{two}
   $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below = 40pt] (2) at (1.south)
  {array\nodepart{two}
   $\textit{index}$ \nodepart{three}
   array \nodepart{four}
   $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,left = 90pt] (3) at (2.west)
  {array\_len\nodepart{two}
   $\textit{index}$ \nodepart{three}
   array \nodepart{four}
   $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
  {array\nodepart{two}
   $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
  {array\nodepart{two}
   $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
  {$\textit{index}$\nodepart{two}
   array\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
  {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
  {\ldots\phantom{l}};
\node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
  {0\nodepart{two} \ldots\phantom{l}};

\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
\draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
\draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmplt}} -| (4a.north);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
\draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
\end{tikzpicture}
\end{center}
\caption{Stack contents during a bounds-checked array read. The left
  branch is taken when the index is within bounds; the right branch
  discards index and array and pushes the default 0.}
\end{figure}
|
Jumps whose target is too far away for \texttt{goto} can be handled
with \texttt{goto\_w}, which takes a wider (32-bit) branch offset.

\end{document} |

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: