|
1 \documentclass{article} |
|
2 \usepackage{../style} |
|
3 \usepackage{../langs} |
|
4 \usepackage{../grammar} |
|
5 \usepackage{../graphics} |
|
6 |
|
7 |
|
8 \begin{document} |
|
9 |
|
10 \section*{Handout 8 (A Functional Language)} |
|
11 |
|
12 The purpose of a compiler is to transform a program, a human |
|
13 can read and write, into code the machine can run as fast as |
|
14 possible. The fastest code would be machine code the CPU can |
|
15 run directly, but it is often enough to improve the speed of a |
|
16 program by just targeting a virtual machine. This produces not |
|
17 the fastest possible code, but code that is fast enough and |
|
18 has the advantage that the virtual machine takes care of |
|
19 things a compiler would normally need to take care of (like |
|
20 explicit memory management). |
|
21 |
|
22 As a first example we will implement a compiler for the very |
|
23 simple While-language. It will generate code for the Java |
|
24 Virtual Machine (JVM). This is a stack-based virtual machine, |
|
25 a fact which will make it easy to generate code for arithmetic |
|
26 expressions. For example for generating code for the |
|
27 expression $1 + 2$ we need to generate the following three |
|
28 instructions |
|
29 |
|
30 \begin{lstlisting}[numbers=none] |
|
31 ldc 1 |
|
32 ldc 2 |
|
33 iadd |
|
34 \end{lstlisting} |
|
35 |
|
36 \noindent The first instruction loads the constant $1$ onto |
|
37 the stack, the next one $2$, the third instruction adds both |
|
38 numbers together replacing the top two elements of the stack |
|
39 with the result $3$. For simplicity, we will throughout |
|
40 consider only integer numbers and results. Therefore we can |
|
41 use the JVM instructions \code{iadd}, \code{isub}, |
|
42 \code{imul}, \code{idiv} and so on. The \code{i} stands for |
|
43 integer instructions in the JVM (alternatives are \code{d} for |
|
44 doubles, \code{l} for longs and \code{f} for floats). |
|
45 |
|
46 Recall our grammar for arithmetic expressions ($E$ is the |
|
47 starting symbol): |
|
48 |
|
49 |
|
50 \begin{plstx}[rhs style=, margin=3cm] |
|
51 : \meta{E} ::= \meta{T} $+$ \meta{E} |
|
52 | \meta{T} $-$ \meta{E} |
|
53 | \meta{T}\\ |
|
54 : \meta{T} ::= \meta{F} $*$ \meta{T} |
|
55 | \meta{F} $\backslash$ \meta{T} |
|
56 | \meta{F}\\ |
|
57 : \meta{F} ::= ( \meta{E} ) |
|
58 | \meta{Id} |
|
59 | \meta{Num}\\ |
|
60 \end{plstx} |
|
61 |
|
62 |
|
63 \noindent where \meta{Id} stands for variables and \meta{Num} |
|
64 for numbers. For the moment let us omit variables from |
|
65 arithmetic expressions. Our parser will take this grammar and |
|
66 given an input produce abstract syntax trees. For example for |
|
67 the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the |
|
68 following tree. |
|
69 |
|
70 \begin{center} |
|
71 \begin{tikzpicture} |
|
72 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] |
|
73 \end{tikzpicture} |
|
74 \end{center} |
|
75 |
|
76 \noindent To generate code for this expression, we need to |
|
77 traverse this tree in post-order fashion and emit code for |
|
78 each node---this traversal in post-order fashion will produce |
|
79 code for a stack-machine (what the JVM is). Doing so for the |
|
80 tree above generates the instructions |
|
81 |
|
82 \begin{lstlisting}[numbers=none] |
|
83 ldc 1 |
|
84 ldc 2 |
|
85 ldc 3 |
|
86 imul |
|
87 ldc 4 |
|
88 ldc 3 |
|
89 isub |
|
90 iadd |
|
91 iadd |
|
92 \end{lstlisting} |
|
93 |
|
94 \noindent If we ``run'' these instructions, the result $8$ |
|
95 will be on top of the stack (I leave this to you to verify; |
|
96 the meaning of each instruction should be clear). The result |
|
97 being on the top of the stack will be a convention we always |
|
98 observe in our compiler, that is the results of arithmetic |
|
99 expressions will always be on top of the stack. Note, that a |
|
100 different bracketing of the expression, for example $(1 + (2 * |
|
101 3)) + (4 - 3)$, produces a different abstract syntax tree and |
|
102 thus potentially also a different list of instructions. |
|
103 Generating code in this fashion is rather easy to implement: |
|
104 it can be done with the following recursive |
|
105 \textit{compile}-function, which takes the abstract syntax |
|
106 tree as argument: |
|
107 |
|
108 \begin{center} |
|
109 \begin{tabular}{lcl} |
|
110 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\ |
|
111 $\textit{compile}(a_1 + a_2)$ & $\dn$ & |
|
112 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\ |
|
113 $\textit{compile}(a_1 - a_2)$ & $\dn$ & |
|
114 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\ |
|
115 $\textit{compile}(a_1 * a_2)$ & $\dn$ & |
|
116 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\ |
|
117 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & |
|
118 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\ |
|
119 \end{tabular} |
|
120 \end{center} |
|
121 |
|
122 However, our arithmetic expressions can also contain |
|
123 variables. We will represent them as \emph{local variables} in |
|
124 the JVM. Essentially, local variables are an array or pointers |
|
125 to memory cells, containing in our case only integers. Looking |
|
126 up a variable can be done with the instruction |
|
127 |
|
128 \begin{lstlisting}[mathescape,numbers=none] |
|
129 iload $index$ |
|
130 \end{lstlisting} |
|
131 |
|
132 \noindent |
|
133 which places the content of the local variable $index$ onto |
|
134 the stack. Storing the top of the stack into a local variable |
|
135 can be done by the instruction |
|
136 |
|
137 \begin{lstlisting}[mathescape,numbers=none] |
|
138 istore $index$ |
|
139 \end{lstlisting} |
|
140 |
|
141 \noindent Note that this also pops off the top of the stack. |
|
142 One problem we have to overcome, however, is that local |
|
143 variables are addressed, not by identifiers, but by numbers |
|
144 (starting from $0$). Therefore our compiler needs to maintain |
|
145 a kind of environment where variables are associated to |
|
146 numbers. This association needs to be unique: if we muddle up |
|
147 the numbers, then we essentially confuse variables and the |
|
148 consequence will usually be an erroneous result. Our extended |
|
149 \textit{compile}-function for arithmetic expressions will |
|
150 therefore take two arguments: the abstract syntax tree and the |
|
151 environment, $E$, that maps identifiers to index-numbers. |
|
152 |
|
153 \begin{center} |
|
154 \begin{tabular}{lcl} |
|
155 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ |
|
156 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & |
|
157 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\ |
|
158 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ & |
|
159 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\ |
|
160 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ & |
|
161 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\ |
|
162 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & |
|
163 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\ |
|
164 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\ |
|
165 \end{tabular} |
|
166 \end{center} |
|
167 |
|
168 \noindent In the last line we generate the code for variables |
|
169 where $E(x)$ stands for looking up the environment to which |
|
170 index the variable $x$ maps to. |
|
171 |
|
172 There is a similar \textit{compile}-function for boolean |
|
173 expressions, but it includes a ``trick'' to do with |
|
174 \pcode{if}- and \pcode{while}-statements. To explain the issue |
|
175 let us first describe the compilation of statements of the |
|
176 While-language. The clause for \pcode{skip} is trivial, since |
|
177 we do not have to generate any instruction |
|
178 |
|
179 \begin{center} |
|
180 \begin{tabular}{lcl} |
|
181 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\ |
|
182 \end{tabular} |
|
183 \end{center} |
|
184 |
|
185 \noindent $[]$ is the empty list of instructions. Note that |
|
186 the \textit{compile}-function for statements returns a pair, a |
|
187 list of instructions (in this case the empty list) and an |
|
188 environment for variables. The reason for the environment is |
|
189 that assignments in the While-language might change the |
|
190 environment---clearly if a variable is used for the first |
|
191 time, we need to allocate a new index and if it has been used |
|
192 before, we need to be able to retrieve the associated index. |
|
193 This is reflected in the clause for compiling assignments: |
|
194 |
|
195 \begin{center} |
|
196 \begin{tabular}{lcl} |
|
197 $\textit{compile}(x := a, E)$ & $\dn$ & |
|
198 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$ |
|
199 \end{tabular} |
|
200 \end{center} |
|
201 |
|
202 \noindent We first generate code for the right-hand side of |
|
203 the assignment and then add an \pcode{istore}-instruction at |
|
204 the end. By convention the result of the arithmetic expression |
|
205 $a$ will be on top of the stack. After the \pcode{istore} |
|
206 instruction, the result will be stored in the index |
|
207 corresponding to the variable $x$. If the variable $x$ has |
|
208 been used before in the program, we just need to look up what |
|
209 the index is and return the environment unchanged (that is in |
|
210 this case $E' = E$). However, if this is the first encounter |
|
211 of the variable $x$ in the program, then we have to augment |
|
212 the environment and assign $x$ with the largest index in $E$ |
|
213 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). |
|
214 That means for the assignment $x := x + 1$ we generate the |
|
215 following code |
|
216 |
|
217 \begin{lstlisting}[mathescape,numbers=none] |
|
218 iload $n_x$ |
|
219 ldc 1 |
|
220 iadd |
|
221 istore $n_x$ |
|
222 \end{lstlisting} |
|
223 |
|
224 \noindent |
|
225 where $n_x$ is the index for the variable $x$. |
|
226 |
|
227 More complicated is the code for \pcode{if}-statments, say |
|
228 |
|
229 \begin{lstlisting}[mathescape,language={},numbers=none] |
|
230 if $b$ then $cs_1$ else $cs_2$ |
|
231 \end{lstlisting} |
|
232 |
|
233 \noindent where $b$ is a boolean expression and the $cs_{1/2}$ |
|
234 are the statements for each \pcode{if}-branch. Lets assume |
|
235 we already generated code for $b$ and $cs_{1/2}$. Then in the |
|
236 true-case the control-flow of the program needs to be |
|
237 |
|
238 \begin{center} |
|
239 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
240 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
241 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
242 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
243 \node (A1) [point] {}; |
|
244 \node (b) [block, right=of A1] {code of $b$}; |
|
245 \node (A2) [point, right=of b] {}; |
|
246 \node (cs1) [block, right=of A2] {code of $cs_1$}; |
|
247 \node (A3) [point, right=of cs1] {}; |
|
248 \node (cs2) [block, right=of A3] {code of $cs_2$}; |
|
249 \node (A4) [point, right=of cs2] {}; |
|
250 |
|
251 \draw (A1) edge [->, black, line width=1mm] (b); |
|
252 \draw (b) edge [->, black, line width=1mm] (cs1); |
|
253 \draw (cs1) edge [->, black, line width=1mm] (A3); |
|
254 \draw (A3) edge [->, black, skip loop] (A4); |
|
255 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}}; |
|
256 \end{tikzpicture} |
|
257 \end{center} |
|
258 |
|
259 \noindent where we start with running the code for $b$; since |
|
260 we are in the true case we continue with running the code for |
|
261 $cs_1$. After this however, we must not run the code for |
|
262 $cs_2$, but always jump after the last instruction of $cs_2$ |
|
263 (the code for the \pcode{else}-branch). Note that this jump is |
|
264 unconditional, meaning we always have to jump to the end of |
|
265 $cs_2$. The corresponding instruction of the JVM is |
|
266 \pcode{goto}. In case $b$ turns out to be false we need the |
|
267 control-flow |
|
268 |
|
269 \begin{center} |
|
270 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
271 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
272 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
273 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
274 \node (A1) [point] {}; |
|
275 \node (b) [block, right=of A1] {code of $b$}; |
|
276 \node (A2) [point, right=of b] {}; |
|
277 \node (cs1) [block, right=of A2] {code of $cs_1$}; |
|
278 \node (A3) [point, right=of cs1] {}; |
|
279 \node (cs2) [block, right=of A3] {code of $cs_2$}; |
|
280 \node (A4) [point, right=of cs2] {}; |
|
281 |
|
282 \draw (A1) edge [->, black, line width=1mm] (b); |
|
283 \draw (b) edge [->, black, line width=1mm] (A2); |
|
284 \draw (A2) edge [skip loop] (A3); |
|
285 \draw (A3) edge [->, black, line width=1mm] (cs2); |
|
286 \draw (cs2) edge [->,black, line width=1mm] (A4); |
|
287 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; |
|
288 \end{tikzpicture} |
|
289 \end{center} |
|
290 |
|
291 \noindent where we now need a conditional jump (if the |
|
292 if-condition is false) from the end of the code for the |
|
293 boolean to the beginning of the instructions $cs_2$. Once we |
|
294 are finished with running $cs_2$ we can continue with whatever |
|
295 code comes after the if-statement. |
|
296 |
|
297 The \pcode{goto} and the conditional jumps need addresses to |
|
298 where the jump should go. Since we are generating assembly |
|
299 code for the JVM, we do not actually have to give (numeric) |
|
300 addresses, but can just attach (symbolic) labels to our code. |
|
301 These labels specify a target for a jump. Therefore the labels |
|
302 need to be unique, as otherwise it would be ambiguous where a |
|
303 jump should go to. A label, say \pcode{L}, is attached to code |
|
304 like |
|
305 |
|
306 \begin{lstlisting}[mathescape,numbers=none] |
|
307 L: |
|
308 $instr_1$ |
|
309 $instr_2$ |
|
310 $\vdots$ |
|
311 \end{lstlisting} |
|
312 |
|
313 \noindent where a label is indicated by a colon. |
|
314 |
|
315 Recall the ``trick'' with compiling boolean expressions: the |
|
316 \textit{compile}-function for boolean expressions takes three |
|
317 arguments: an abstract syntax tree, an environment for |
|
318 variable indices and also the label, $lab$, to where an conditional |
|
319 jump needs to go. The clause for the expression $a_1 = a_2$, |
|
320 for example, is as follows: |
|
321 |
|
322 \begin{center} |
|
323 \begin{tabular}{lcl} |
|
324 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ |
|
325 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$} |
|
326 \end{tabular} |
|
327 \end{center} |
|
328 |
|
329 \noindent where we are first generating code for the |
|
330 subexpressions $a_1$ and $a_2$. This will mean after running |
|
331 the corresponding code there will be two integers on top of |
|
332 the stack. If they are equal, we do not have to do anything |
|
333 (except for popping them off from the stack) and just continue |
|
334 with the next instructions (see control-flow of ifs above). |
|
335 However if they are \emph{not} equal, then we need to |
|
336 (conditionally) jump to the label $lab$. This can be done with |
|
337 the instruction |
|
338 |
|
339 \begin{lstlisting}[mathescape,numbers=none] |
|
340 if_icmpne $lab$ |
|
341 \end{lstlisting} |
|
342 |
|
343 \noindent Other jump instructions for boolean operators are |
|
344 |
|
345 \begin{center} |
|
346 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l} |
|
347 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\ |
|
348 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\ |
|
349 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ |
|
350 \end{tabular} |
|
351 \end{center} |
|
352 |
|
353 \noindent and so on. I leave it to you to extend the |
|
354 \textit{compile}-function for the other boolean expressions. |
|
355 Note that we need to jump whenever the boolean is \emph{not} |
|
356 true, which means we have to ``negate'' the jump |
|
357 condition---equals becomes not-equal, less becomes |
|
358 greater-or-equal. If you do not like this design (it can be |
|
359 the source of some nasty, hard-to-detect errors), you can also |
|
360 change the layout of the code and first give the code for the |
|
361 else-branch and then for the if-branch. However in the case |
|
362 of while-loops this way of generating code still seems |
|
363 the most convenient. |
|
364 |
|
365 We are now ready to give the compile function for |
|
366 if-statments---remember this function returns for staments a |
|
367 pair consisting of the code and an environment: |
|
368 |
|
369 \begin{center} |
|
370 \begin{tabular}{lcl} |
|
371 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ |
|
372 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ |
|
373 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ |
|
374 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ |
|
375 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ |
|
376 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ |
|
377 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\ |
|
378 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\ |
|
379 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\ |
|
380 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\ |
|
381 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\ |
|
382 \end{tabular} |
|
383 \end{center} |
|
384 |
|
385 \noindent In the first two lines we generate two fresh labels |
|
386 for the jump addresses (just before the else-branch and just |
|
387 after). In the next two lines we generate the instructions for |
|
388 the two branches, $is_1$ and $is_2$. The final code will |
|
389 be first the code for $b$ (including the label |
|
390 just-before-the-else-branch), then the \pcode{goto} for after |
|
391 the else-branch, the label $L_\textit{ifesle}$, followed by |
|
392 the instructions for the else-branch, followed by the |
|
393 after-the-else-branch label. Consider for example the |
|
394 if-statement: |
|
395 |
|
396 \begin{lstlisting}[mathescape,numbers=none,language={}] |
|
397 if 1 = 1 then x := 2 else y := 3 |
|
398 \end{lstlisting} |
|
399 |
|
400 \noindent |
|
401 The generated code is as follows: |
|
402 |
|
403 \begin{lstlisting}[mathescape,language={}] |
|
404 ldc 1 |
|
405 ldc 1 |
|
406 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ |
|
407 ldc 2 |
|
408 istore 0 |
|
409 goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$ |
|
410 L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ |
|
411 ldc 3 |
|
412 istore 1 |
|
413 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ |
|
414 \end{lstlisting} |
|
415 |
|
416 \begin{tikzpicture}[remember picture,overlay] |
|
417 \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) |
|
418 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); |
|
419 \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) |
|
420 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); |
|
421 \end{tikzpicture} |
|
422 |
|
423 \noindent The first three lines correspond to the the boolean |
|
424 expression $1 = 1$. The jump for when this boolean expression |
|
425 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
|
426 the else-branch is in Lines 8 and 9. Note carefully how the |
|
427 environment $E$ is threaded through the recursive calls of |
|
428 \textit{compile}. The function receives an environment $E$, |
|
429 but it might extend it when compiling the if-branch, yielding |
|
430 $E'$. This happens for example in the if-statement above |
|
431 whenever the variable \code{x} has not been used before. |
|
432 Similarly with the environment $E''$ for the second call to |
|
433 \textit{compile}. $E''$ is also the environment that needs to |
|
434 be returned as part of the answer. |
|
435 |
|
436 The compilation of the while-loops, say |
|
437 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case |
|
438 the condition is true and we need to do another iteration, |
|
439 and the control-flow needs to be as follows |
|
440 |
|
441 \begin{center} |
|
442 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
443 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
444 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
445 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
446 \node (A0) [point, left=of A1] {}; |
|
447 \node (A1) [point] {}; |
|
448 \node (b) [block, right=of A1] {code of $b$}; |
|
449 \node (A2) [point, right=of b] {}; |
|
450 \node (cs1) [block, right=of A2] {code of $cs$}; |
|
451 \node (A3) [point, right=of cs1] {}; |
|
452 \node (A4) [point, right=of A3] {}; |
|
453 |
|
454 \draw (A0) edge [->, black, line width=1mm] (b); |
|
455 \draw (b) edge [->, black, line width=1mm] (cs1); |
|
456 \draw (cs1) edge [->, black, line width=1mm] (A3); |
|
457 \draw (A3) edge [->,skip loop] (A1); |
|
458 \end{tikzpicture} |
|
459 \end{center} |
|
460 |
|
461 \noindent Whereas if the condition is \emph{not} true, we |
|
462 need to jump out of the loop, which gives the following |
|
463 control flow. |
|
464 |
|
465 \begin{center} |
|
466 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
467 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
468 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
469 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
470 \node (A0) [point, left=of A1] {}; |
|
471 \node (A1) [point] {}; |
|
472 \node (b) [block, right=of A1] {code of $b$}; |
|
473 \node (A2) [point, right=of b] {}; |
|
474 \node (cs1) [block, right=of A2] {code of $cs$}; |
|
475 \node (A3) [point, right=of cs1] {}; |
|
476 \node (A4) [point, right=of A3] {}; |
|
477 |
|
478 \draw (A0) edge [->, black, line width=1mm] (b); |
|
479 \draw (b) edge [->, black, line width=1mm] (A2); |
|
480 \draw (A2) edge [skip loop] (A3); |
|
481 \draw (A3) edge [->, black, line width=1mm] (A4); |
|
482 \end{tikzpicture} |
|
483 \end{center} |
|
484 |
|
485 \noindent Again we can use the \textit{compile}-function for |
|
486 boolean expressions to insert the appropriate jump to the |
|
487 end of the loop (label $L_{wend}$ below). |
|
488 |
|
489 \begin{center} |
|
490 \begin{tabular}{lcl} |
|
491 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ |
|
492 \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\ |
|
493 \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\ |
|
494 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ |
|
495 \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\ |
|
496 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\ |
|
497 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ |
|
498 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\ |
|
499 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\ |
|
500 \end{tabular} |
|
501 \end{center} |
|
502 |
|
503 \noindent I let you go through how this clause works. As an example |
|
504 you can consider the while-loop |
|
505 |
|
506 \begin{lstlisting}[mathescape,numbers=none,language={}] |
|
507 while x <= 10 do x := x + 1 |
|
508 \end{lstlisting} |
|
509 |
|
510 \noindent yielding the following code |
|
511 |
|
512 \begin{lstlisting}[mathescape,language={}] |
|
513 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ |
|
514 iload 0 |
|
515 ldc 10 |
|
516 if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ |
|
517 iload 0 |
|
518 ldc 1 |
|
519 iadd |
|
520 istore 0 |
|
521 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ |
|
522 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ |
|
523 \end{lstlisting} |
|
524 |
|
525 \begin{tikzpicture}[remember picture,overlay] |
|
526 \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) |
|
527 -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); |
|
528 \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) |
|
529 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); |
|
530 \end{tikzpicture} |
|
531 |
|
532 |
|
533 Next we need to consider the statement \pcode{write x}, which |
|
534 can be used to print out the content of a variable. For this |
|
535 we need to use a Java library function. In order to avoid |
|
536 having to generate a lot of code for each |
|
537 \pcode{write}-command, we use a separate helper-method and |
|
538 just call this method with an argument (which needs to be |
|
539 placed onto the stack). The code of the helper-method is as |
|
540 follows. |
|
541 |
|
542 |
|
543 \begin{lstlisting}[language=JVMIS] |
|
544 .method public static write(I)V |
|
545 .limit locals 1 |
|
546 .limit stack 2 |
|
547 getstatic java/lang/System/out Ljava/io/PrintStream; |
|
548 iload 0 |
|
549 invokevirtual java/io/PrintStream/println(I)V |
|
550 return |
|
551 .end method |
|
552 \end{lstlisting} |
|
553 |
|
554 \noindent The first line marks the beginning of the method, |
|
555 called \pcode{write}. It takes a single integer argument |
|
556 indicated by the \pcode{(I)} and returns no result, indicated |
|
557 by the \pcode{V}. Since the method has only one argument, we |
|
558 only need a single local variable (Line~2) and a stack with |
|
559 two cells will be sufficient (Line 3). Line 4 instructs the |
|
560 JVM to get the value of the field \pcode{out} of the class |
|
561 \pcode{java/lang/System}. It expects the value to be of type |
|
562 \pcode{java/io/PrintStream}. A reference to this value will be |
|
563 placed on the stack. Line~5 copies the integer we want to |
|
564 print out onto the stack. In the next line we call the method |
|
565 \pcode{println} (from the class \pcode{java/io/PrintStream}). |
|
566 We want to print out an integer and do not expect anything |
|
567 back (that is why the type annotation is \pcode{(I)V}). The |
|
568 \pcode{return}-instruction in the next line changes the |
|
569 control-flow back to the place from where \pcode{write} was |
|
570 called. This method needs to be part of a header that is |
|
571 included in any code we generate. The helper-method |
|
572 \pcode{write} can be invoked with the two instructions |
|
573 |
|
574 \begin{lstlisting}[mathescape,language=JVMIS] |
|
575 iload $E(x)$ |
|
576 invokestatic XXX/XXX/write(I)V |
|
577 \end{lstlisting} |
|
578 |
|
579 \noindent where we first place the variable to be printed on |
|
580 top of the stack and then call \pcode{write}. The \pcode{XXX} |
|
581 need to be replaced by an appropriate class name (this will be |
|
582 explained shortly). |
|
583 |
|
584 |
|
585 \begin{figure}[t] |
|
586 \begin{lstlisting}[mathescape,language=JVMIS] |
|
587 .class public XXX.XXX |
|
588 .super java/lang/Object |
|
589 |
|
590 .method public <init>()V |
|
591 aload_0 |
|
592 invokenonvirtual java/lang/Object/<init>()V |
|
593 return |
|
594 .end method |
|
595 |
|
596 .method public static main([Ljava/lang/String;)V |
|
597 .limit locals 200 |
|
598 .limit stack 200 |
|
599 |
|
600 $\textit{\ldots{}here comes the compiled code\ldots}$ |
|
601 |
|
602 return |
|
603 .end method |
|
604 \end{lstlisting} |
|
605 \caption{Boilerplate code needed for running generated code.\label{boiler}} |
|
606 \end{figure} |
|
607 |
|
608 |
|
609 By generating code for a While-program, we end up with a list |
|
610 of (JVM assembly) instructions. Unfortunately, there is a bit |
|
611 more boilerplate code needed before these instructions can be |
|
612 run. The complete code is shown in Figure~\ref{boiler}. This |
|
613 boilerplate code is very specific to the JVM. If we target any |
|
614 other virtual machine or a machine language, then we would |
|
615 need to change this code. Lines 4 to 8 in Figure~\ref{boiler} |
|
616 contain a method for object creation in the JVM; this method |
|
617 is called \emph{before} the \pcode{main}-method in Lines 10 to |
|
618 17. Interesting are the Lines 11 and 12 where we hardwire that |
|
619 the stack of our programs will never be larger than 200 and |
|
620 that the maximum number of variables is also 200. This seem to |
|
621 be conservative default values that allow is to run some |
|
622 simple While-programs. In a real compiler, we would of course |
|
623 need to work harder and find out appropriate values for the |
|
624 stack and local variables. |
|
625 |
|
626 To sum up, in Figure~\ref{test} is the complete code generated |
|
627 for the slightly non-sensical program |
|
628 |
|
629 \begin{lstlisting}[mathescape,language=While] |
|
630 x := 1 + 2; |
|
631 write x |
|
632 \end{lstlisting} |
|
633 |
|
634 \noindent Having this code at our disposal, we need the |
|
635 assembler to translate the generated code into JVM bytecode (a |
|
636 class file). This bytecode is understood by the JVM and can be |
|
637 run by just invoking the \pcode{java}-program. |
|
638 |
|
639 |
|
640 \begin{figure}[p] |
|
641 \lstinputlisting{../progs/test-small.j} |
|
642 \caption{Generated code for a test program. This code can be |
|
643 processed by an Java assembler producing a class-file, which |
|
644 can be run by the \pcode{java}-program.\label{test}} |
|
645 \end{figure} |
|
646 |
|
647 \end{document} |
|
648 |
|
649 %%% Local Variables: |
|
650 %%% mode: latex |
|
651 %%% TeX-master: t |
|
652 %%% End: |