% author: Christian Urban <christian.urban@kcl.ac.uk>
% date: Mon, 29 Jun 2020 21:13:49 +0100
% changeset: 726 (f6c2e8c48a1c)
% parent: 714 (9d06a8863898)
% child: 940 (1c1fbf45a03c)
% permissions: -rw-r--r--
% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
\usetikzlibrary{calc,shapes,arrows}
\usepackage{framed}
\usepackage[belowskip=7pt,aboveskip=0pt]{caption}

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019, 2020}

\section*{Handout 7 (Compilation)}
The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
that is often pretty fast. This way of producing code also has the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
management).

As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:
\begin{center}
  \begin{tikzpicture}[scale=1,font=\bf,
                      node/.style={
                      rectangle,rounded corners=3mm,
                      ultra thick,draw=black!50,minimum height=18mm,
                      minimum width=20mm,
                      top color=white,bottom color=black!20}]

  \node (0) at (-3,0) {};
  \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler};
  \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau};
  \node (C) at (7.5,0) [node] {JVM};

  \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A);
  \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B);
  \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C);
  \end{tikzpicture}
  \end{center}
\noindent
The input will be WHILE-programs; the output will be assembly files
(with the file extension .j). Assembly files essentially contain
human-readable low-level code, meaning they are not just bits and bytes,
but rather something you can read and understand---with a bit of
practice of course. An \emph{assembler} will then translate the assembly
files into unreadable class- or binary-files the JVM or CPU can run.
Unfortunately, the Java ecosystem does not come with an assembler, which
would be handy for our compiler-endeavour (unlike Microsoft's Common
Language Infrastructure for the .Net platform, which has an assembler
out-of-the-box). As a substitute we shall use the 3rd-party programs
Jasmin and Krakatau
\begin{itemize}
  \item \url{http://jasmin.sourceforge.net}
  \item \url{https://github.com/Storyyeller/Krakatau}
\end{itemize}
\noindent
The first is a Java program and the second a program written in Python.
Each of them allows us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
us.

What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
expressions. For example, when compiling the expression $1 + 2$ we need
to generate the following three instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd
\end{lstlisting}
\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers together,
replacing the top two elements of the stack with the result $3$. For
simplicity, we will consider throughout only arithmetic involving
integer numbers. This means our main JVM instructions for arithmetic
will be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.
The \code{i} stands for integer instructions in the JVM (alternatives
are \code{d} for doubles, \code{l} for longs and \code{f} for floats,
etc.).

Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):
\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}
\noindent where \meta{Id} stands for variables and \meta{Num}
for numbers. For the moment let us omit variables from arithmetic
expressions. Our parser will take this grammar and, given an input
program, produce an abstract syntax tree. For example, we obtain for
the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
\begin{center}
\begin{tikzpicture}
\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
\end{tikzpicture}
\end{center}
\noindent To generate JVM code for this expression, we need to traverse
this tree in \emph{post-order} fashion (children first, then the node
itself) and emit code for each node---this order of traversal produces
code for a stack-machine (which is what the JVM is). Doing so for the
tree above generates the instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
ldc 3
imul
ldc 4
ldc 3
isub
iadd
iadd
\end{lstlisting}
\noindent If we ``run'' these instructions, the result $8$ will be on
top of the stack (I leave this to you to verify; the meaning of each
instruction should be clear). The result being on the top of the stack
will be an important convention we always observe in our compiler. Note
that a different bracketing of the expression, for example $(1 + (2 *
3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
a different list of instructions.
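\noindent If you would rather let a program do the verification, the
following is a minimal sketch of a stack machine for the four arithmetic
instructions. It is written in Python purely for illustration; the names
\texttt{run} and \texttt{BINOPS} are made up for this example and are not
part of the JVM or of our compiler. The sketch also ignores the 32-bit
overflow behaviour of real JVM integers.

```python
from typing import Callable, Dict, List

# Binary instructions (names as in the JVM); the Python semantics are an
# approximation that ignores 32-bit overflow.
BINOPS: Dict[str, Callable[[int, int], int]] = {
    "iadd": lambda l, r: l + r,
    "isub": lambda l, r: l - r,
    "imul": lambda l, r: l * r,
    "idiv": lambda l, r: int(l / r),  # truncates toward zero, like idiv
}

def run(instrs: List[str]) -> List[int]:
    """Execute ldc and the binary instructions on a stack; return the stack."""
    stack: List[int] = []
    for instr in instrs:
        op, *args = instr.split()
        if op == "ldc":
            stack.append(int(args[0]))
        elif op in BINOPS:
            right = stack.pop()  # the top of the stack is the second operand
            left = stack.pop()
            stack.append(BINOPS[op](left, right))
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack

code = ["ldc 1", "ldc 2", "ldc 3", "imul",
        "ldc 4", "ldc 3", "isub", "iadd", "iadd"]
print(run(code))  # [8]
```

\noindent Feeding \texttt{run} the instruction list for the other
bracketing, $(1 + (2 * 3)) + (4 - 3)$, also leaves $8$ on the stack,
even though the instructions appear in a different order.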

Generating code in this post-order-traversal fashion is rather easy to
implement: it can be done with the following recursive
\textit{compile}-function, which takes the abstract syntax tree as an
argument:
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}
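\noindent The clauses above translate almost line by line into code.
Below is a minimal sketch in Python (the module itself uses Scala); the
classes \texttt{Num} and \texttt{Aop} and the function name
\texttt{compile\_aexp} are assumptions made for this illustration, and
Python's list concatenation \texttt{+} plays the role of $@$. Division
is written \texttt{"/"} in the sketch.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Num:
    n: int

@dataclass
class Aop:
    op: str          # one of "+", "-", "*", "/"
    a1: "AExp"
    a2: "AExp"

AExp = Union[Num, Aop]

# JVM instruction emitted for each arithmetic operator
OPS = {"+": "iadd", "-": "isub", "*": "imul", "/": "idiv"}

def compile_aexp(a: AExp) -> List[str]:
    """Post-order traversal: code for both operands, then the operator."""
    if isinstance(a, Num):
        return [f"ldc {a.n}"]
    return compile_aexp(a.a1) + compile_aexp(a.a2) + [OPS[a.op]]

# the tree for 1 + ((2 * 3) + (4 - 3))
tree = Aop("+", Num(1),
           Aop("+", Aop("*", Num(2), Num(3)), Aop("-", Num(4), Num(3))))
print("\n".join(compile_aexp(tree)))
```

\noindent For the example tree this prints the same nine instructions
as listed earlier, from \texttt{ldc 1} down to the two final
\texttt{iadd}s.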
\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
variables as \emph{local variables} of the JVM. Essentially, local
variables form an array of memory cells (you can think of the indices
as pointers to them), containing in our case only integers. Looking up
a variable can be done with the instruction
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $index$
\end{lstlisting}
\noindent
which places the content of the local variable $index$ onto
the stack. Storing the top of the stack into a local variable
can be done by the instruction
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
istore $index$
\end{lstlisting}
\noindent Note that this also pops off the top of the stack. One problem
we have to overcome, however, is that local variables are addressed, not
by identifiers (like \texttt{x}, \texttt{foo} and so on), but by numbers
(starting from $0$). Therefore our compiler needs to maintain a kind of
environment where variables are associated with numbers. This association
needs to be unique: if we muddle up the numbers, then we essentially
confuse variables and the consequence will usually be an erroneous
result. Our extended \textit{compile}-function for arithmetic
expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}
\noindent In the last line we generate the code for variables where
$E(x)$ stands for looking up in the environment which index the variable
$x$ maps to. This is similar to the interpreter we saw earlier in the
module, which also needs an environment: the difference is that the
interpreter maintains a mapping from variables to current values (what
is currently the value of a variable?), while compilers need a
mapping from variables to memory locations (where can I find the current
value for the variable in memory?).
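\noindent As an illustration of the extended function, here is a
minimal Python sketch in which the environment $E$ is an ordinary
dictionary from variable names to indices. The classes \texttt{Num},
\texttt{Var} and \texttt{Aop} and the name \texttt{compile\_aexp} are
assumptions made for this example; a variable that is missing from the
environment simply raises a \texttt{KeyError} here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Num:
    n: int

@dataclass
class Var:
    name: str

@dataclass
class Aop:
    op: str          # one of "+", "-", "*", "/"
    a1: "AExp"
    a2: "AExp"

AExp = Union[Num, Var, Aop]
Env = Dict[str, int]  # variable name -> index of the JVM local variable

OPS = {"+": "iadd", "-": "isub", "*": "imul", "/": "idiv"}

def compile_aexp(a: AExp, env: Env) -> List[str]:
    """Compile an arithmetic expression; env is passed down unchanged."""
    if isinstance(a, Num):
        return [f"ldc {a.n}"]
    if isinstance(a, Var):
        return [f"iload {env[a.name]}"]   # the lookup E(x)
    return compile_aexp(a.a1, env) + compile_aexp(a.a2, env) + [OPS[a.op]]

env = {"x": 0, "y": 1}
# x + (2 * y)
print(compile_aexp(Aop("+", Var("x"), Aop("*", Num(2), Var("y"))), env))
# ['iload 0', 'ldc 2', 'iload 1', 'imul', 'iadd']
```

\noindent Note that compiling an arithmetic expression only reads the
environment; it never needs to extend it.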

There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
\pcode{if}- and \pcode{while}-statements. To explain the issue
let us first describe the compilation of statements of the
WHILE-language. The clause for \pcode{skip} is trivial, since
we do not have to generate any instruction
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
\end{tabular}
\end{center}
\noindent whereby $[]$ is the empty list of instructions. Note that
the \textit{compile}-function for statements returns a pair: a
list of instructions (in this case the empty list) and an
environment for variables. The reason for the environment is
that assignments in the WHILE-language might change the
environment---clearly if a variable is used for the first
time, we need to allocate a new index, and if it has been used
before, then we need to be able to retrieve the associated index.
This is reflected in the clause for compiling assignments, say
$x := a$:
\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}
\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After the \instr{istore}-instruction, the result will be
stored at the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$
the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
generate the following code snippet
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
istore $n_x$
\end{lstlisting}
\noindent
where $n_x$ is the index (or pointer to the memory) for the variable
$x$. The Scala code for looking up the index for the variable is as follows:
\begin{center}
\begin{tabular}{lcl}
$index \;=\; E\textit{.getOrElse}(x, |E|)$
\end{tabular}
\end{center}
\noindent
This implements the idea that in case the environment $E$ contains an
index for $x$, we return it. Otherwise we ``create'' a new index by
returning the size $|E|$ of the environment (since indices are handed
out consecutively starting from $0$, this is an index that is
guaranteed not to be used yet). In all this we take advantage of the
JVM, which provides us with a potentially limitless supply of places
where we can store values of variables.
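\noindent The clause for assignments, together with the
\textit{getOrElse}-idea, can be sketched as follows, again in Python
and only as an illustration: \texttt{env.get(x, len(env))} is the
Python counterpart of $E\textit{.getOrElse}(x, |E|)$. The function
\texttt{compile\_assign} is hypothetical; to keep the sketch
self-contained it receives the already generated code for the
right-hand side $a$ as a list of strings.

```python
from typing import Dict, List, Tuple

Env = Dict[str, int]  # maps variable names to local-variable indices

def compile_assign(x: str, rhs_code: List[str], env: Env) -> Tuple[List[str], Env]:
    """Clause for x := a, where rhs_code is the code generated for a."""
    # reuse the index of x if it has one, otherwise take the next free one, |E|
    index = env.get(x, len(env))
    # return a new environment E' instead of mutating the old one
    return rhs_code + [f"istore {index}"], {**env, x: index}

code1, env1 = compile_assign("x", ["ldc 5"], {})                       # x := 5
code2, env2 = compile_assign("y", ["iload 0", "ldc 1", "iadd"], env1)  # y := x + 1
code3, env3 = compile_assign("x", ["iload 0", "ldc 1", "iadd"], env2)  # x := x + 1
print(code1, env1)
print(code2, env2)
print(code3, env3)
```

\noindent The first two assignments extend the environment (\texttt{x}
gets index $0$, \texttt{y} gets index $1$), while the third one finds
\texttt{x} already present and returns the environment unchanged. The
size of the environment is only a fresh index as long as all indices
were handed out in this way, starting from the empty environment.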

A bit more complicated is the generation of code for
\pcode{if}-statements, say
\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}
\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we
already generated code for $b$ and the two if-branches $cs_{1/2}$.
Then in the true-case the control-flow of the program needs to behave as
\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
parents: 
370 
diff
changeset
 | 
328  | 
\draw (b) edge [->, black, line width=1mm] (cs1);  | 
| 708 | 329  | 
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);  | 
| 
372
 
d6af4b1239de
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
370 
diff
changeset
 | 
330  | 
\draw (A3) edge [->, black, skip loop] (A4);  | 
| 
 
d6af4b1239de
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
370 
diff
changeset
 | 
331  | 
\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}};
 | 
| 
 
d6af4b1239de
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
370 
diff
changeset
 | 
332  | 
\end{tikzpicture}
 | 
| 
 
d6af4b1239de
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
370 
diff
changeset
 | 
333  | 
\end{center}

\noindent where we start with running the code for $b$; since
we are in the true case we continue with running the code for
$cs_1$. After this however, we must not run the code for
$cs_2$, but always jump to after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding instruction of the JVM is
\instr{goto}. In case $b$ turns out to be false we need the
control-flow

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs_1$};
\node (A3) [point, right=of cs1] {};
\node (cs2) [block, right=of A3] {code of $cs_2$};
\node (A4) [point, right=of cs2] {};

\draw (A1) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (cs2);
\draw (cs2) edge [->,black, line width=1mm] (A4);
\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}};
\end{tikzpicture}
\end{center}

\noindent where we now need a conditional jump (if the
if-condition is false) from the end of the code for the
boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$ we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need addresses for
where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go to. A label, say \pcode{L}, is attached to assembly
code like

\begin{lstlisting}[mathescape,numbers=none]
L:
  $\textit{instr\_1}$
  $\textit{instr\_2}$
    $\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels
to actual (numeric) addresses, for example jump 10 instructions forward,
or 20 instructions backwards.
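
To make this concrete, here is a small hand-written fragment with one
forward and one backward jump. It is only a sketch: the label names
\pcode{Loop} and \pcode{Done} are made up for this example, and we
assume that some integer variable lives at index $0$. The instruction
\instr{if_icmpge} jumps if the first of the two topmost stack values is
greater than or equal to the second.

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
Loop:
   iload 0
   ldc 10
   if_icmpge Done     ; forward jump, taken once the variable is >= 10
   iload 0
   ldc 1
   iadd
   istore 0
   goto Loop          ; backward jump to the beginning
Done:
\end{lstlisting}

\noindent When assembling this fragment, both labels are replaced by
numeric offsets; we never have to calculate them by hand.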

Recall the ``trick'' with compiling boolean expressions: the
\textit{compile}-function for boolean expressions takes three
arguments: an abstract syntax tree, an environment for
variable indices and also the label, $lab$, to where a conditional
jump needs to go. The clause for the expression $a_1 = a_2$,
for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This means that after running
the corresponding code there will be two integers on top of
the stack. If they are equal, we do not have to do anything
(except for popping them off from the stack) and just continue
with the next instructions (see control-flow of ifs above).
However if they are \emph{not} equal, then we need to
(conditionally) jump to the label $lab$. This can be done with
the instruction

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
if_icmpne $lab$
\end{lstlisting}

To sum up, the third argument in the compile function for booleans
specifies where to jump, in case the condition is \emph{not} true. I
leave it to you to extend the \textit{compile}-function for the other
boolean expressions. Note that we need to jump whenever the boolean is
\emph{not} true, which means we have to ``negate'' the jump
condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on. If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the
layout of the code and first give the code for the else-branch and then
for the if-branch. However in the case of while-loops this
``upside-down-inside-out'' way of generating code still seems the most
convenient.
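
For instance, following the table above, the clause for $a_1 < a_2$
could be written as below. This is a sketch in the style of the clause
for equality, not necessarily the exact formulation you will end up
with in your own compiler.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 < a_2, E, lab)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpge}\;lab$}
\end{tabular}
\end{center}

\noindent Assuming the variable \texttt{x} is stored at index $0$, the
test \texttt{x < 10} would then produce

\begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
iload 0
ldc 10
if_icmpge $lab$
\end{lstlisting}

\noindent so the jump to $lab$ is taken exactly when $x \ge 10$, that
is when the test fails.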

We are now ready to give the compile function for
if-statements---remember that for statements this function returns a
pair consisting of the code and an environment:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\
\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\
\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\
\end{tabular}
\end{center}

\noindent In the first two lines we generate two fresh labels
for the jump addresses (just before the else-branch and just
after). In the next two lines we generate the instructions for
the two branches, $is_1$ and $is_2$. The final code will
be first the code for $b$ (including the conditional jump to the
label just-before-the-else-branch), then the instructions for the
if-branch, then the \pcode{goto} for after
the else-branch, the label $L_\textit{ifelse}$, followed by
the instructions for the else-branch, followed by the
after-the-else-branch label. Consider for example the
if-statement:

\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 = 1 then x := 2 else y := 3
\end{lstlisting}

\noindent
The generated code is as follows:

\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
   ldc 1
   ldc 1
   if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   ldc 2
   istore 0
   goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$
L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$
   ldc 3
   istore 1
L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) 
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east);
  \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) 
           -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east);
\end{tikzpicture}

\noindent The first three lines correspond to the boolean
expression $1 = 1$. The jump for when this boolean expression
is false is in Line~3. Lines 4--6 correspond to the if-branch;
the else-branch is in Lines 8 and 9.

Note carefully how the environment $E$ is threaded through the recursive
calls of \textit{compile}. The function receives an environment $E$, but
it might extend it when compiling the if-branch, yielding $E'$. This
happens for example in the if-statement above whenever the variable
\code{x} has not been used before. Similarly with the environment $E''$
for the second call to \textit{compile}. $E''$ is also the environment
that needs to be returned as part of the answer.
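
For the if-statement above this threading can be traced step by step.
As an assumption for this illustration we start with the empty
environment; the set notation for environments is also only
illustrative.

\begin{center}
\begin{tabular}{lcl}
$E$ & $=$ & $\{\}$\\
$(is_1, E')$ & $=$ & $\textit{compile}(\texttt{x := 2}, E)$ \quad with $E' = \{\texttt{x} \mapsto 0\}$\\
$(is_2, E'')$ & $=$ & $\textit{compile}(\texttt{y := 3}, E')$ \quad with $E'' = \{\texttt{x} \mapsto 0,\; \texttt{y} \mapsto 1\}$\\
\end{tabular}
\end{center}

\noindent This matches the \texttt{istore 0} and \texttt{istore 1} in
the generated code: \texttt{y} receives the index $|E'| = 1$ because
\texttt{x} already occupies index $0$. Had the else-branch been compiled
with $E$ instead of $E'$, both variables would have ended up with
index $0$.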

The compilation of the while-loops, say
\pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
the condition is true, we need to do another iteration,
and the control-flow needs to be as follows

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm] (cs1);
\draw (cs1) edge [->, black, line width=1mm,shorten >= -0.5mm] (A3);
\draw (A3) edge [->,skip loop] (A1);
\end{tikzpicture}
\end{center}
 | 
| 
 
b018234c9126
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
372 
diff
changeset
 | 
542  | 
|
| 
377
 
a052a83f562e
update
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
376 
diff
changeset
 | 
543  | 
\noindent Whereas if the condition is \emph{not} true, we
need to jump out of the loop, which gives the following
control flow.

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,
               top color=white,bottom color=black!20},
 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}]
\node (A0) [point, left=of A1] {};
\node (A1) [point] {};
\node (b) [block, right=of A1] {code of $b$};
\node (A2) [point, right=of b] {};
\node (cs1) [block, right=of A2] {code of $cs$};
\node (A3) [point, right=of cs1] {};
\node (A4) [point, right=of A3] {};

\draw (A0) edge [->, black, line width=1mm] (b);
\draw (b) edge [->, black, line width=1mm,shorten >= -0.5mm] (A2);
\draw (A2) edge [skip loop] (A3);
\draw (A3) edge [->, black, line width=1mm] (A4);
\end{tikzpicture}
\end{center}

\noindent Again we can use the \textit{compile}-function for
boolean expressions to insert the appropriate jump to the
end of the loop (label $L_{wend}$ below).

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ 
\multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\
\multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs, E)$}\\
\multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\
\multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\
\end{tabular}
\end{center}

\noindent I let you go through how this clause works. As an example
you can consider the while-loop

\begin{lstlisting}[mathescape,numbers=none,language=While]
while x <= 10 do x := x + 1
\end{lstlisting}

\noindent yielding the following code

\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   iload 0
   ldc 10
   if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   iload 0
   ldc 1
   iadd
   istore 0
   goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$
L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$
\end{lstlisting}

\begin{tikzpicture}[remember picture,overlay]
  \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) 
    -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
  \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
    -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}

\noindent
As said, I leave it to you to decide whether the code implements
the usual control-flow of while-loops.

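\noindent
To make the clause for while-loops a bit more concrete, here is a
minimal Scala sketch of it. It is only an illustration under some
assumptions: the cut-down datatypes, the names \texttt{compileStmt},
\texttt{compileBExp} and \texttt{fresh}, the representation of
instructions as strings and the way new variables get their index are
all made up for this example and need not coincide with the actual
compiler.

\begin{lstlisting}[language=Scala,numbers=none]
object WhileSketch {
  // hypothetical, cut-down AST: just enough for the example loop
  sealed trait AExp
  case class Var(s: String) extends AExp
  case class Num(i: Int) extends AExp
  case class Aop(o: String, a1: AExp, a2: AExp) extends AExp
  case class Bop(o: String, a1: AExp, a2: AExp)  // boolean expressions

  sealed trait Stmt
  case class Assign(s: String, a: AExp) extends Stmt
  case class While(b: Bop, cs: List[Stmt]) extends Stmt

  type Env = Map[String, Int]   // variable name -> local-variable index

  private var counter = -1
  def fresh(x: String): String = { counter += 1; s"${x}_$counter" }

  def compileAExp(a: AExp, env: Env): List[String] = a match {
    case Num(i) => List(s"ldc $i")
    case Var(s) => List(s"iload ${env(s)}")
    case Aop("+", a1, a2) =>
      compileAExp(a1, env) ++ compileAExp(a2, env) ++ List("iadd")
    case Aop(o, _, _) => sys.error(s"operator $o not covered in this sketch")
  }

  // generates a jump to the label jmp in case the condition is false
  def compileBExp(b: Bop, env: Env, jmp: String): List[String] = b match {
    case Bop("<=", a1, a2) =>
      compileAExp(a1, env) ++ compileAExp(a2, env) ++ List(s"if_icmpgt $jmp")
    case Bop(o, _, _) => sys.error(s"comparison $o not covered in this sketch")
  }

  def compileStmt(s: Stmt, env: Env): (List[String], Env) = s match {
    case Assign(x, a) =>
      // assumption: a new variable gets the next free index
      val index = env.getOrElse(x, env.size)
      (compileAExp(a, env) ++ List(s"istore $index"), env + (x -> index))
    case While(b, cs) =>
      val wbegin = fresh("L_wbegin")
      val wend = fresh("L_wend")
      val (is, env1) = compileBlock(cs, env)
      (List(s"$wbegin:") ++ compileBExp(b, env, wend) ++ is ++
         List(s"goto $wbegin", s"$wend:"), env1)
  }

  def compileBlock(cs: List[Stmt], env: Env): (List[String], Env) =
    cs.foldLeft((List[String](), env)) { case ((is, e), s) =>
      val (is1, e1) = compileStmt(s, e)
      (is ++ is1, e1)
    }

  def main(args: Array[String]): Unit = {
    // while x <= 10 do x := x + 1, with x stored at index 0
    val loop = While(Bop("<=", Var("x"), Num(10)),
                     List(Assign("x", Aop("+", Var("x"), Num(1)))))
    println(compileStmt(loop, Map("x" -> 0))._1.mkString("\n"))
  }
}
\end{lstlisting}

\noindent
Running \texttt{main} should print the same ten instructions as in the
listing for \texttt{while x <= 10 do x := x + 1}, except that the
labels carry a numeric suffix, which is what makes them fresh.
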
Next we need to consider the WHILE-statement \pcode{write x}, which can
be used to print out the content of a variable. For this we shall use a
Java library function. In order to avoid having to generate a lot of
code for each \pcode{write}-command, we use a separate helper-method and
just call this method with an appropriate argument (which of course
needs to be placed onto the stack). The code of the helper-method is as
follows.

\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
.method public static write(I)V 
    .limit locals 1 
    .limit stack 2 
    getstatic java/lang/System/out Ljava/io/PrintStream; 
    iload 0
    invokevirtual java/io/PrintStream/println(I)V 
    return 
.end method
\end{lstlisting}

\noindent The first line marks the beginning of the method, called
\pcode{write}. It takes a single integer argument indicated by the
\pcode{(I)} and returns no result, indicated by the \pcode{V} (for
void). Since the method has only one argument, we only need a single
local variable (Line~2) and a stack with two cells will be sufficient
(Line~3). Line~4 instructs the JVM to get the value of the member
\pcode{out} from the class \pcode{java/lang/System}. It expects the value
to be of type \pcode{java/io/PrintStream}. A reference to this value
will be placed on the stack.\footnote{Note the syntax \texttt{L
\ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
designers of Jasmin decided that this syntax is pleasing to the eye. So
if you wanted to have strings in your Jasmin code, you would need to
write \texttt{Ljava/lang/String;}\;. If you want arrays of one
dimension, then use \texttt{[\ldots}; two dimensions, use
\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
copies the integer we want to print out onto the stack. In the line
after that we call the method \pcode{println} (from the class
\pcode{java/io/PrintStream}). We want to print out an integer and do not
expect anything back (that is why the type annotation is \pcode{(I)V}).
The \pcode{return}-instruction in the next line changes the control-flow
back to the place from where \pcode{write} was called. This method needs
to be part of a header that is included in any code we generate. The
helper-method \pcode{write} can be invoked with the two instructions

\begin{lstlisting}[mathescape,language=JVMIS]
iload $E(x)$ 
invokestatic XXX/XXX/write(I)V
\end{lstlisting}

\noindent where we first place the variable to be printed on
top of the stack and then call \pcode{write}. The \pcode{XXX}
need to be replaced by an appropriate class name (this will be
explained shortly).


By generating code for a WHILE-program, we end up with a list of (JVM
assembly) instructions. Unfortunately, there is a bit more boilerplate
code needed before these instructions can be run. Essentially we have to
enclose them inside a Java \texttt{main}-method. The corresponding code
is shown in Figure~\ref{boiler}. This boilerplate code is very specific
to the JVM. If we target any other virtual machine or a machine
language, then we would need to change this code. Of interest are
Lines 5 and 6, where we hardwire that the stack of our programs will
never be larger than 200 and that the maximum number of variables is
also 200. These seem to be conservative default values that allow us to
run some simple WHILE-programs. In a real compiler, we would of course
need to work harder and find out appropriate values for the stack and
local variables.

\begin{figure}[t]
\begin{framed}
\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit locals 200
    .limit stack 200

      $\textit{\ldots{}here comes the compiled code\ldots}$

    return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
  hardwires limits for stack space and for the number of local
  variables.\label{boiler}}
\end{figure}


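\noindent
As a rough illustration of how the pieces fit together, the following
Scala sketch wraps a list of generated instructions into the
boilerplate from Figure~\ref{boiler}, together with the
\pcode{write}-helper. The names \texttt{header}, \texttt{footer},
\texttt{compileWrite} and \texttt{assemble}, as well as the parameter
\texttt{cls} standing in for the \pcode{XXX}, are assumptions made for
this example only; the instruction strings are copied from the listings
above.

\begin{lstlisting}[language=Scala,numbers=none]
object Boilerplate {
  // cls is a hypothetical parameter; it plays the role of XXX
  def header(cls: String): List[String] = List(
    s".class public $cls.$cls",
    ".super java/lang/Object",
    "",
    ".method public static write(I)V",
    "    .limit locals 1",
    "    .limit stack 2",
    "    getstatic java/lang/System/out Ljava/io/PrintStream;",
    "    iload 0",
    "    invokevirtual java/io/PrintStream/println(I)V",
    "    return",
    ".end method",
    "",
    ".method public static main([Ljava/lang/String;)V",
    "    .limit locals 200",
    "    .limit stack 200",
    "")

  val footer: List[String] = List("", "    return", ".end method")

  // the two instructions for "write x", given the index of x
  def compileWrite(index: Int, cls: String): List[String] =
    List(s"iload $index", s"invokestatic $cls/$cls/write(I)V")

  // header, then the (indented) compiled code, then footer
  def assemble(cls: String, instrs: List[String]): String =
    (header(cls) ++ instrs.map("    " + _) ++ footer).mkString("\n")

  def main(args: Array[String]): Unit = {
    // instructions one would expect for: x := 1 + 2; write x
    val body = List("ldc 1", "ldc 2", "iadd", "istore 0") ++
               compileWrite(0, "test")
    println(assemble("test", body))
  }
}
\end{lstlisting}

\noindent
The string returned by \texttt{assemble} is what would be written into
a file and handed to the assembler.
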
To sum up, Figure~\ref{test} shows the complete code generated
for the slightly nonsensical program

\begin{lstlisting}[mathescape,language=While]
x := 1 + 2;
write x
\end{lstlisting}

\noindent I let you read the code and make sure the code behaves as
expected. Having this code at our disposal, we need the assembler to
translate the generated code into JVM bytecode (a class file). This
bytecode is then understood by the JVM and can be run by just invoking
the \pcode{java}-program. Again I let you do the work.


\begin{figure}[p]
\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
  \draw[|<->|,very thick] (LA.north) -- (LB.south) 
     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
  \draw[|<->|,very thick] (LC.north) -- (LD.south) 
     node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
\end{framed}
\caption{The generated code for the test program \texttt{x := 1 + 2; write
x}. This code can be processed by a Java assembler producing a
class-file, which can then be run by the {\tt{}java}-program.\label{test}}
\end{figure}

\subsection*{Arrays}

Maybe a useful addition to the WHILE-language would be arrays. This
would allow us to generate more interesting WHILE-programs by
translating BF*** programs into equivalent WHILE-code. Therefore in this
section let us have a look at how we can support the following three
constructions

\begin{lstlisting}[mathescape,language=While]
new(arr[15000])
x := 3 + arr[3 + y]
arr[42 * n] := ...
\end{lstlisting}

\noindent
The first construct is for creating new arrays. In this instance the
name of the array is \pcode{arr} and it can hold 15000 integers. We do
not support ``dynamic'' arrays, that is the size of our arrays will
always be fixed. The second construct is for referencing an array cell
inside an arithmetic expression---we need to be able to look up the
contents of an array at an index determined by an arithmetic expression.
Similarly in the line below, we need to be able to update the content of
an array at a calculated index.

For creating a new array we can generate the following three JVM
instructions:

\begin{lstlisting}[mathescape,language=JVMIS]
ldc number 
newarray int 
astore loc_var
\end{lstlisting}

\noindent
First we need to put the size of the array onto the stack. The next
instruction creates the array. In this case the array contains
\texttt{int}s. With the last instruction we can store the array as a
local variable (like the ``simple'' variables from the previous
section). The use of a local variable for each array allows us to have
multiple arrays in a WHILE-program. For looking up an element in an
array we can use the following JVM code

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var 
$\textit{index\_aexp}$ 
iaload
\end{lstlisting}

\noindent
The first instruction loads the ``pointer'', or local variable, to the
array onto the stack. Then we have some instructions calculating the
index where we want to look up the array. The idea is that these
instructions will leave a concrete number on the top of the stack, which
will be the index into the array we need. Finally we need to tell the
JVM to load the corresponding element onto the stack. Updating an array
at an index with a value is as follows.

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var 
$\textit{index\_aexp}$ 
$\textit{value\_aexp}$ 
iastore
\end{lstlisting}

\noindent
Again the first instruction loads the local variable of
the array onto the stack. Then we have some instructions calculating
the index where we want to update the array. After that come the
instructions calculating the value with which we want to update the
array. The last line contains the instruction for updating the array.

Next we need to modify our grammar rules for our WHILE-language: it
seems best to extend the rule for factors in arithmetic expressions with
a rule for looking up an array.

\begin{plstx}[rhs style=, margin=3cm]
: \meta{E} ::= \meta{T} $+$ \meta{E}
         | \meta{T} $-$ \meta{E}
         | \meta{T}\\
: \meta{T} ::= \meta{F} $*$ \meta{T}
          | \meta{F} $\backslash$ \meta{T}
          | \meta{F}\\
: \meta{F} ::= ( \meta{E} )
          | $\underbrace{\meta{Id}\,[\,\meta{E}\,]}_{new}$
          | \meta{Id}
          | \meta{Num}\\
\end{plstx}

\noindent
There is no problem with left-recursion as the \meta{E} is ``protected''
by an identifier and the brackets. There are two new rules for statements,
one for creating an array and one for array assignment:

\begin{plstx}[rhs style=, margin=2cm, one per line]
: \meta{Stmt} ::=  \ldots
              | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
              | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
\end{plstx}

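\noindent
The three instruction patterns for arrays can be turned into compiler
clauses roughly as in the following Scala sketch. As before, this is
only meant as an illustration: the constructors \texttt{Ref},
\texttt{ArrayDef} and \texttt{AssignA}, the function names and the
string representation of instructions are assumptions made for this
example, not necessarily what the actual compiler uses.

\begin{lstlisting}[language=Scala,numbers=none]
object ArraySketch {
  // hypothetical AST, extended with the three array constructs
  sealed trait AExp
  case class Var(s: String) extends AExp
  case class Num(i: Int) extends AExp
  case class Aop(o: String, a1: AExp, a2: AExp) extends AExp
  case class Ref(s: String, a: AExp) extends AExp      // arr[a]

  sealed trait Stmt
  case class Assign(s: String, a: AExp) extends Stmt   // x := a
  case class ArrayDef(s: String, n: Int) extends Stmt  // new(arr[n])
  case class AssignA(s: String, a1: AExp, a2: AExp) extends Stmt // arr[a1] := a2

  type Env = Map[String, Int]   // name -> local-variable index

  def compileAExp(a: AExp, env: Env): List[String] = a match {
    case Num(i) => List(s"ldc $i")
    case Var(s) => List(s"iload ${env(s)}")
    case Aop(o, a1, a2) =>
      val op = o match {
        case "+" => "iadd"
        case "-" => "isub"
        case "*" => "imul"
        case _   => sys.error(s"operator $o not covered in this sketch")
      }
      compileAExp(a1, env) ++ compileAExp(a2, env) ++ List(op)
    case Ref(s, a1) =>   // array reference, index code, then iaload
      List(s"aload ${env(s)}") ++ compileAExp(a1, env) ++ List("iaload")
  }

  def compileStmt(s: Stmt, env: Env): (List[String], Env) = s match {
    case Assign(x, a) =>
      // assumption: a new name gets the next free index
      val index = env.getOrElse(x, env.size)
      (compileAExp(a, env) ++ List(s"istore $index"), env + (x -> index))
    case ArrayDef(x, n) =>   // size, create array, store the reference
      val index = env.getOrElse(x, env.size)
      (List(s"ldc $n", "newarray int", s"astore $index"), env + (x -> index))
    case AssignA(x, a1, a2) =>   // reference, index, value, then iastore
      (List(s"aload ${env(x)}") ++ compileAExp(a1, env) ++
         compileAExp(a2, env) ++ List("iastore"), env)
  }

  def main(args: Array[String]): Unit = {
    val (is1, env1) = compileStmt(ArrayDef("arr", 15000), Map("y" -> 0))
    // x := 3 + arr[3 + y]
    val stmt = Assign("x", Aop("+", Num(3),
                 Ref("arr", Aop("+", Num(3), Var("y")))))
    val (is2, _) = compileStmt(stmt, env1)
    println((is1 ++ is2).mkString("\n"))
  }
}
\end{lstlisting}

\noindent
Note that arrays and ``simple'' variables share the same environment
in this sketch; only the load and store instructions differ
(\texttt{aload}/\texttt{astore} versus \texttt{iload}/\texttt{istore}).
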
With this in place we can turn back to the idea of creating
WHILE-programs by translating BF-programs. This is a relatively easy
task because BF has only eight instructions (we will actually implement
seven because we can omit the read-in instruction from BF). What makes
this translation easy is that BF-loops can be straightforwardly
represented as while-loops. The Scala code for the translation is as
follows:

\begin{lstlisting}[language=Scala,numbers=left]
def instr(c: Char) : String = c match {
  case '>' => "ptr := ptr + 1;"
  case '<' => "ptr := ptr - 1;"
  case '+' => "mem[ptr] := mem [ptr] + 1;"
  case '-' => "mem [ptr] := mem [ptr] - 1;"
  case '.' => "x := mem [ptr]; write x;"
  case '['  => "while (mem [ptr] != 0) do {"
  case ']'  => "skip};"
  case _ => ""
}
\end{lstlisting}

\noindent
The idea behind the translation is that BF-programs operate on an array,
called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
\code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
variable since we have not changed our write functions in order to cope
with writing out any array-content directly. Lines 7 and 8 are for
translating BF-loops. Line 8 is interesting in the sense that we need to
generate a \code{skip} instruction just before finishing with the 
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag. 

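\noindent
To translate a whole BF-program, the function \texttt{instr} only needs
to be applied to every character. The Scala sketch below does this. The
names \texttt{instrs} and \texttt{bfToWhile}, the preamble that creates
\texttt{mem} and initialises \texttt{ptr}, and the default size are
assumptions for the purpose of this example; the final \texttt{skip}
uses the same trick as above to avoid a trailing semicolon.

\begin{lstlisting}[language=Scala,numbers=none]
object BfToWhile {
  // same translation of single BF-instructions as above
  def instr(c: Char): String = c match {
    case '>' => "ptr := ptr + 1;"
    case '<' => "ptr := ptr - 1;"
    case '+' => "mem[ptr] := mem [ptr] + 1;"
    case '-' => "mem [ptr] := mem [ptr] - 1;"
    case '.' => "x := mem [ptr]; write x;"
    case '[' => "while (mem [ptr] != 0) do {"
    case ']' => "skip};"
    case _   => ""
  }

  // translate every character, dropping those that are just comments
  def instrs(prog: String): String =
    prog.toList.map(instr).filter(_.nonEmpty).mkString("\n")

  // hypothetical wrapper: set up the memory and the pointer first,
  // and end with skip, since the last command cannot have a semicolon
  def bfToWhile(prog: String, size: Int = 15000): String =
    s"new(mem[$size]);\nptr := 0;\n" + instrs(prog) + "\nskip"

  def main(args: Array[String]): Unit =
    println(bfToWhile("+[-]."))
}
\end{lstlisting}

\noindent
For the BF-program \texttt{+[-].} this generates an increment of the
current cell, a while-loop that decrements the cell until it is zero,
and finally a \texttt{write} of the cell.
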
Putting this all together, we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, after a few more tweaks we can finally run the
BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the
corresponding WHILE-program; the size of the resulting class file is
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good point is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking. Which means we now face
the question about what to do next\ldots

\subsection*{Optimisations \& Co}

Every compiler that deserves its name has to perform some optimisations
on the code: if we put in the extra effort of writing a compiler for a
language, then obviously we want our code to run as fast as
possible. So we should look into this in more detail.

There is actually one aspect in our generated code where we can easily
make efficiency gains. This has to do with some of the quirks of the
JVM. Whenever we push a constant onto the stack, we have used the JVM
instruction \instr{ldc some_const}. This is a rather generic instruction
in the sense that it works not just for integers but also for strings,
objects and so on. What this instruction does is put the constant
into a \emph{constant pool} and then use an index into this constant
pool. This means \instr{ldc} will be represented by at least two bytes
in the class file. While this is a sensible strategy for ``large''
constants like strings, it is a bit of overkill for small integers
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example

\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
\end{itemize}

\noindent
where the \code{n} in \instr{bipush} is between -128 and 127. By
having dedicated instructions such as \instr{iconst_0} to
\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
size smaller as these instructions only require 1 byte (as opposed to the
generic \instr{ldc} which needs 1 byte plus another for the index into
the constant pool). While in theory the use of such special instructions
should make the code only smaller, it actually makes the code also run
faster. This is probably because the JVM has to process less code and uses a
specific instruction for the underlying CPU. The story with
\instr{bipush} is slightly different, because it also uses two
bytes, so it does not necessarily result in a reduction of code size.
Instead, it probably uses a specific instruction in the underlying CPU
that makes the JVM code run faster.\footnote{This is all ``probable''
because I have not read the 700 pages of JVM documentation by Oracle and
also have no clue how the JVM is implemented.} This means when
generating code for pushing constants onto the stack, we can use the
following Scala helper-function

\begin{lstlisting}[language=Scala]
def compile_num(i: Int) =
  if (0 <= i && i <= 5) i"iconst_$i" else
  if (-128 <= i && i <= 127) i"bipush $i"
  else i"ldc $i"
\end{lstlisting}

\noindent
It generates the more efficient instructions when pushing a small integer
constant onto the stack. The default is \instr{ldc} for any other constants.

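The instruction \instr{iconst_m1} is mentioned above, but not covered
by \texttt{compile\_num}. A possible extension is sketched below. The
name \texttt{compile\_num\_ext} is hypothetical, and the sketch uses the
plain \texttt{s}-interpolator instead of the \texttt{i}-interpolator
only so that the snippet is self-contained.

\begin{lstlisting}[language=Scala]
// assumption: instructions are returned as plain strings
def compile_num_ext(i: Int) : String =
  if (i == -1) "iconst_m1"                     // 1 byte
  else if (0 <= i && i <= 5) s"iconst_$i"      // 1 byte
  else if (-128 <= i && i <= 127) s"bipush $i" // 2 bytes
  else s"ldc $i"                               // generic case
\end{lstlisting}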

The JVM also has such special instructions for
loading and storing the first four local variables. The assumption is
that most operations and arguments in a method will only use very few
local variables. So we can use the following instructions:

\begin{itemize}
\item \instr{iload_0},\ldots, \instr{iload_3}
\item \instr{istore_0},\ldots, \instr{istore_3}
\item \instr{aload_0},\ldots, \instr{aload_3}
\item \instr{astore_0},\ldots, \instr{astore_3}
\end{itemize}

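A helper in the style of \texttt{compile\_num} could select these short
forms for us. This is only a sketch: the names \texttt{compile\_load}
and \texttt{compile\_store} are hypothetical, plain strings are
produced, and only the integer instructions are covered.

\begin{lstlisting}[language=Scala]
// use the one-byte forms for the local variables 0 to 3,
// otherwise fall back to the form with an explicit index
def compile_load(idx: Int) : String =
  if (0 <= idx && idx <= 3) s"iload_$idx" else s"iload $idx"

def compile_store(idx: Int) : String =
  if (0 <= idx && idx <= 3) s"istore_$idx" else s"istore $idx"
\end{lstlisting}

\noindent
The instructions for arrays (\instr{aload} and \instr{astore}) could be
treated in the same way.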
\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program is reduced and the class file also runs faster (the
parsing part is still very slow). According to my very rough
experiments:

\begin{center}
\begin{tabular}{lll}
& class-size (bytes) & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised:   & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they involve changing one or a small set of instructions into an
equivalent set that has better performance.

If you look carefully at our generated code you will quickly find another
source of inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.

\begin{lstlisting}[mathescape,language=JVMIS]
...
istore 0
iload 0
...
\end{lstlisting}

\noindent
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with for example the much cheaper instruction
\instr{dup}. Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become much more
complicated. So we leave this kind of optimisation here and look at
something more interesting and possibly surprising.

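For the easy case, where the store and the load are directly adjacent,
such a peephole pass could be sketched as follows. Representing the
generated code as a list of strings, and the names \texttt{store\_load}
and \texttt{peephole}, are assumptions for illustration. The sketch
only recognises the long forms \texttt{istore n} and \texttt{iload n}
and deliberately ignores the harder cases with code in between.

\begin{lstlisting}[language=Scala]
import scala.annotation.tailrec

// true if ld reloads the variable that st has just stored
def store_load(st: String, ld: String) : Boolean =
  st.startsWith("istore ") &&
  ld == "iload " + st.stripPrefix("istore ")

// replaces "istore n" directly followed by "iload n" by
// "dup" followed by "istore n"; the rest is left unchanged
@tailrec
def peephole(is: List[String],
             acc: List[String] = Nil) : List[String] =
  is match {
    case st :: ld :: rest if store_load(st, ld) =>
      peephole(rest, st :: "dup" :: acc)
    case i :: rest => peephole(rest, i :: acc)
    case Nil => acc.reverse
  }
\end{lstlisting}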

As you might have seen, the compiler writer has a lot of freedom about
how to generate code from what the programmer wrote as a program. The only
condition is that generated code should behave as expected by the
programmer. Then all is fine with the code above\ldots mission
accomplished! But sometimes the compiler writer is expected to go an
extra mile, or even miles, and change(!) the meaning of a program.
Suppose we are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: The program generates an array of size 10, and then
tries to access the non-existing element at index 13 and even to update
the element with index 14. Obviously this is baloney. Still, our
compiler generates code for this program without any questions asked. We
can even run this code on the JVM\ldots of course the result is an
exception trace where the JVM yells at us for doing naughty
things.\footnote{Still this is much better than C, for example, where
such errors are not prevented and as a result insidious attacks can be
mounted against such C-programs. I assume everyone has heard about
\emph{Buffer Overflow Attacks}.} Now what should we do in such
situations? Over- and underflows of indices are notoriously difficult to
detect statically (at compile-time). So it might seem that raising an
exception at run-time, as the JVM does, is the best compromise.

Well, imagine that in our compiler we do not want to rely on the JVM for
producing an annoying, but safe exception trace; rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let us assume we want to handle them in the
following way: if the programmer accesses a field out-of-bounds, we just
return a default 0, and if a programmer wants to update an out-of-bounds
field, we want to ``quietly'' ignore this update. One way to achieve
this would be to rewrite the WHILE-programs and insert the necessary
if-conditions for safely reading and writing arrays. Another way
is to modify the code we generate. The first of the following two code
fragments is for reading an array element, the second for updating one.

\begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmple L1
  pop2
  iconst_0
  goto L2
L1:
  swap
  iaload
L2:
\end{lstlisting}

\begin{lstlisting}[mathescape,language=JVMIS2]
  $\textit{index\_aexp}$
  aload loc_var
  dup2
  arraylength
  if_icmple L1
  pop2
  goto L2
L1:
  swap
  $\textit{value\_aexp}$
  iastore
L2:
\end{lstlisting}

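\noindent
The behaviour these two code fragments are meant to implement can be
modelled directly in Scala. The following is only a sketch of the
intended semantics, not of the code generator, and the names
\texttt{safe\_read} and \texttt{safe\_write} are hypothetical. It
assumes that negative indices and an index equal to the array length
also count as out-of-bounds.

\begin{lstlisting}[language=Scala]
// out-of-bounds reads return the default 0
def safe_read(arr: Array[Int], i: Int) : Int =
  if (0 <= i && i < arr.length) arr(i) else 0

// out-of-bounds writes are quietly ignored
def safe_write(arr: Array[Int], i: Int, v: Int) : Unit =
  if (0 <= i && i < arr.length) arr(i) = v
\end{lstlisting}

\noindent
Under this model the contrived program from above, which updates index
14 of an array of size 10 with 3 plus the element at index 13, leaves
the array unchanged instead of raising an exception.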
\begin{figure}[p]
\begin{center}
\begin{tikzpicture}[every text node part/.style={align=left},
                    stack/.style={rectangle split,rectangle split parts = 5,
                                  fill=black!20,draw,text width=1.6cm,line width=0.5mm}]
\node (A)  {};
\node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}};
\node[stack,right = 60pt] (1) at (0.east)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below = 40pt] (2) at (1.south)
   {array\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,left = 90pt] (3) at (2.west)
   {array\_len\nodepart{two}
    $\textit{index}$ \nodepart{three}
    array \nodepart{four}
    $\textit{index}$\nodepart{five} \ldots\phantom{l}};
\node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south)
   {array\nodepart{two}
    $\textit{index}$\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south)
   {$\textit{index}$\nodepart{two}
    array\nodepart{three} \ldots\phantom{l}};
\node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south)
   {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}};
\node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south)
   {\ldots\phantom{l}};
\node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south)
   {0\nodepart{two} \ldots\phantom{l}};

\draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0);
\draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1);
\draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2);
\draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north);
\path[->,draw,line width=2.5mm]
  let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north);
\draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a);
\draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b);
\draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a);
\draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b);
\end{tikzpicture}
\end{center}
\end{figure}

goto\_w problem solved for too large jumps

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: