32 else ack(m - 1, ack(m, n - 1)); |
35 else ack(m - 1, ack(m, n - 1)); |
33 |
36 |
34 def gcd(a, b) = if b == 0 then a else gcd(b, a % b); |
37 def gcd(a, b) = if b == 0 then a else gcd(b, a % b); |
35 \end{lstlisting} |
38 \end{lstlisting} |
36 |
39 |
37 \noindent We will still restrict us to just programs about |
40 \noindent Compare the code of the fib-program with the same |
38 integers, that means for example that every function needs to |
41 program written in the While-language\ldots Fun is definitely |
39 return an integer. The grammar of the Fun-language is slightly |
42 more comfortable. We will still focus on programs involving |
40 simpler than the While-language, because almost everything is |
43 integers only, that means for example that every function is |
41 an expression. The grammar rules are as follows: |
44 expected to return an integer. The point of the Fun language |
|
45 is to compile each function to a separate method in JVM |
|
46 bytecode (not just a big monolithic code chunk). The means we |
|
47 need to adapt to some of the conventions of the JVM about |
|
48 methods. |
|
49 |
|
50 The grammar of the Fun-language is slightly simpler than the |
|
51 While-language, because the main syntactic category are |
|
52 expressions (we do not have statements). The grammar rules are |
|
53 as follows: |
42 |
54 |
43 |
55 |
44 \begin{plstx}[rhs style=,margin=1.5cm] |
56 \begin{plstx}[rhs style=,margin=1.5cm] |
45 : \meta{Exp} ::= \meta{Var} | \meta{Num} {\hspace{3cm}} |
57 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3cm}} |
46 | \meta{Exp} + \meta{Exp} | ... | (\meta{ExP}) |
58 | \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) |
47 | \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp} |
59 | \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp} |
48 | \code{write} \meta{Exp} {\hspace{5cm}} |
60 | \code{write} \meta{Exp} {\hspace{5cm}} |
49 | \meta{Exp} ; \meta{Exp} {\hspace{5cm}} |
61 | \meta{Exp} ; \meta{Exp} {\hspace{5cm}} |
50 | \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\ |
62 | \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\ |
51 : \meta{BExp} ::= ...\\ |
63 : \meta{BExp} ::= ...\\ |
52 : \meta{Decl} ::= \meta{Def} ; \meta{Decl} |
64 : \meta{Decl} ::= \meta{Def} ; \meta{Decl} |
53 | \meta{Exp}\\ |
65 | \meta{Exp}\\ |
54 : \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_2$)\\ |
66 : \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_n$) = \meta{Exp}\\ |
55 \end{plstx} |
67 \end{plstx} |
56 |
68 |
57 \noindent where \meta{Id} stands for variables and \meta{Num} |
69 \noindent where, as usual, \meta{Id} stands for variables and |
58 for numbers. For the moment let us omit variables from |
70 \meta{Num} for numbers. We can call a function by applying the |
59 arithmetic expressions. Our parser will take this grammar and |
71 arguments to a function name (as shown in the last clause of |
60 given an input produce abstract syntax trees. For example for |
72 \meta{Exp}). The arguments in such a function call can be |
61 the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the |
73 again expressions, including other function calls. In |
62 following tree. |
74 contrast, when defining a function (see \meta{Def}-clause) the |
63 |
75 arguments need to be variables, say $x_1$ to $x_n$. We call |
64 \begin{center} |
76 the expression on the right of = in a function definition as |
65 \begin{tikzpicture} |
77 the \emph{body of the function}. We have the restriction that |
66 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] |
78 the variables inside a function body can only be those that |
67 \end{tikzpicture} |
79 are mentioned as arguments of the function. A Fun-program is |
68 \end{center} |
80 then a sequence of function definitions separated by |
69 |
81 semicolons, and a final ``main'' call of a function that |
70 \noindent To generate code for this expression, we need to |
82 starts the computation in the program. For example |
71 traverse this tree in post-order fashion and emit code for |
|
72 each node---this traversal in post-order fashion will produce |
|
73 code for a stack-machine (what the JVM is). Doing so for the |
|
74 tree above generates the instructions |
|
75 |
83 |
76 \begin{lstlisting}[numbers=none] |
84 \begin{lstlisting}[numbers=none] |
77 ldc 1 |
85 def fact(n) = if n == 0 then 1 else n * fact(n - 1); |
78 ldc 2 |
86 write(fact(5)) |
79 ldc 3 |
87 \end{lstlisting} |
80 imul |
88 |
81 ldc 4 |
89 \noindent would be a valid Fun-program. The parser of the |
82 ldc 3 |
90 Fun-language produces abstract syntax trees which in Scala |
83 isub |
91 can be represented as follows: |
84 iadd |
92 |
85 iadd |
93 |
86 \end{lstlisting} |
94 {\small |
87 |
95 \begin{lstlisting}[numbers=none] |
88 \noindent If we ``run'' these instructions, the result $8$ |
96 abstract class Exp |
89 will be on top of the stack (I leave this to you to verify; |
97 abstract class BExp |
90 the meaning of each instruction should be clear). The result |
98 abstract class Decl |
91 being on the top of the stack will be a convention we always |
99 |
92 observe in our compiler, that is the results of arithmetic |
100 case class Var(s: String) extends Exp |
93 expressions will always be on top of the stack. Note, that a |
101 case class Num(i: Int) extends Exp |
94 different bracketing of the expression, for example $(1 + (2 * |
102 case class Aop(o: String, a1: Exp, a2: Exp) extends Exp |
95 3)) + (4 - 3)$, produces a different abstract syntax tree and |
103 case class If(a: BExp, e1: Exp, e2: Exp) extends Exp |
96 thus potentially also a different list of instructions. |
104 case class Write(e: Exp) extends Exp |
97 Generating code in this fashion is rather easy to implement: |
105 case class Sequ(e1: Exp, e2: Exp) extends Exp |
98 it can be done with the following recursive |
106 case class Call(name: String, args: List[Exp]) extends Exp |
99 \textit{compile}-function, which takes the abstract syntax |
107 |
100 tree as argument: |
108 case class Bop(o: String, a1: Exp, a2: Exp) extends BExp |
101 |
109 |
102 \begin{center} |
110 case class Def(name: String, |
103 \begin{tabular}{lcl} |
111 args: List[String], |
104 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\ |
112 body: Exp) extends Decl |
105 $\textit{compile}(a_1 + a_2)$ & $\dn$ & |
113 case class Main(e: Exp) extends Decl |
106 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\ |
114 \end{lstlisting}} |
107 $\textit{compile}(a_1 - a_2)$ & $\dn$ & |
115 |
108 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\ |
116 Let us first look at some clauses for compiling expressions. |
109 $\textit{compile}(a_1 * a_2)$ & $\dn$ & |
117 The compilation of arithmetic and boolean expressions is just |
110 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\ |
118 like for the While-language and do not need any modification. |
111 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & |
119 (recall that the \textit{compile}-function for boolean |
112 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\ |
120 expression takes a third argument for the label where the |
113 \end{tabular} |
121 contro-flow should jump when the boolean expression is |
114 \end{center} |
122 \emph{not} true---this is needed for compiling \pcode{if}s). |
115 |
123 One additional feature in the Fun-language are sequences. |
116 However, our arithmetic expressions can also contain |
124 Their purpose is to do one calculation after another. The |
117 variables. We will represent them as \emph{local variables} in |
125 reason why we need to be careful however is the convention |
118 the JVM. Essentially, local variables are an array or pointers |
126 that every expression can only produce s single result |
119 to memory cells, containing in our case only integers. Looking |
127 (including sequences). Since this result will be on the |
120 up a variable can be done with the instruction |
128 top of the stack, we need to generate a |
121 |
129 \pcode{pop}-instruction in order to clean-up the stack. Given |
122 \begin{lstlisting}[mathescape,numbers=none] |
130 the expression of the form \pcode{exp1 ; exp2} we need to |
123 iload $index$ |
131 generate code where after the first code chunk |
124 \end{lstlisting} |
132 a \pcode{pop}-instruction is needed. |
125 |
133 |
126 \noindent |
134 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape] |
127 which places the content of the local variable $index$ onto |
135 $\textrm{\textit{compile}}($exp1$)$ |
128 the stack. Storing the top of the stack into a local variable |
136 pop |
129 can be done by the instruction |
137 $\textrm{\textit{compile}}($exp2$)$ |
130 |
138 \end{lstlisting} |
131 \begin{lstlisting}[mathescape,numbers=none] |
139 |
132 istore $index$ |
140 \noindent In effect we ``forget'' about the result the first |
133 \end{lstlisting} |
141 expression calculates. I leave you to think about why this |
134 |
142 sequence operator is still useful in the Fun-language, even |
135 \noindent Note that this also pops off the top of the stack. |
143 if the first result is just ``discarded''. |
136 One problem we have to overcome, however, is that local |
144 |
137 variables are addressed, not by identifiers, but by numbers |
145 There is also one small modification we have to perform when |
138 (starting from $0$). Therefore our compiler needs to maintain |
146 calling the write method. Remember in the Fun-language we have |
139 a kind of environment where variables are associated to |
147 the convention that every expression needs to return an |
140 numbers. This association needs to be unique: if we muddle up |
148 integer as a result (located on the top of the stack). Our |
141 the numbers, then we essentially confuse variables and the |
149 helper function implementing write, however, ``consumes'' the |
142 consequence will usually be an erroneous result. Our extended |
150 top element of the stack and violates this convention. |
143 \textit{compile}-function for arithmetic expressions will |
151 Therefore before we call, say, \pcode{write(1+2)}, we need to |
144 therefore take two arguments: the abstract syntax tree and the |
152 duplicate the top element of the stack like so |
145 environment, $E$, that maps identifiers to index-numbers. |
153 |
146 |
154 \begin{figure}[t] |
147 \begin{center} |
155 \begin{lstlisting}[language=JVMIS, |
148 \begin{tabular}{lcl} |
156 xleftmargin=2mm, |
149 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ |
157 numbers=none] |
150 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & |
158 .method public static write(I)V |
151 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\ |
159 .limit locals 1 |
152 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ & |
160 .limit stack 2 |
153 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\ |
161 getstatic java/lang/System/out Ljava/io/PrintStream; |
154 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ & |
162 iload 0 |
155 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\ |
163 invokevirtual java/io/PrintStream/println(I)V |
156 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & |
164 return |
157 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\ |
165 .end method |
158 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\ |
166 \end{lstlisting} |
159 \end{tabular} |
167 \caption{The helper function for printing out integers.\label{write}} |
160 \end{center} |
168 \end{figure} |
161 |
169 |
162 \noindent In the last line we generate the code for variables |
170 |
163 where $E(x)$ stands for looking up the environment to which |
171 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape] |
164 index the variable $x$ maps to. |
172 $\textrm{\textit{compile}}($1+2$)$ |
165 |
173 dup |
166 There is a similar \textit{compile}-function for boolean |
174 invokestatic XXX/XXX/write(I)V |
167 expressions, but it includes a ``trick'' to do with |
175 \end{lstlisting} |
168 \pcode{if}- and \pcode{while}-statements. To explain the issue |
176 |
169 let us first describe the compilation of statements of the |
177 \noindent We also need to first generate code for the |
170 While-language. The clause for \pcode{skip} is trivial, since |
178 argument-expression of write, which in the While-language was |
171 we do not have to generate any instruction |
179 only allowed to be a single variable. |
172 |
180 |
173 \begin{center} |
181 Most of the new code in the compiler for the Fun-language |
174 \begin{tabular}{lcl} |
182 comes from function definitions and function calls. For this |
175 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\ |
183 have a look again at the helper function in |
176 \end{tabular} |
184 Figure~\ref{write}. Assuming we have a function definition |
177 \end{center} |
185 |
178 |
186 \begin{lstlisting}[numbers=none,mathescape] |
179 \noindent $[]$ is the empty list of instructions. Note that |
187 def fname (x1, ... , xn) = ... |
180 the \textit{compile}-function for statements returns a pair, a |
188 \end{lstlisting} |
181 list of instructions (in this case the empty list) and an |
189 |
182 environment for variables. The reason for the environment is |
190 \noindent then we have to generate |
183 that assignments in the While-language might change the |
191 |
184 environment---clearly if a variable is used for the first |
192 \begin{lstlisting}[numbers=none,mathescape,language=JVMIS] |
185 time, we need to allocate a new index and if it has been used |
193 .method public static fname (I...I)I |
186 before, we need to be able to retrieve the associated index. |
194 .limit locals ?? |
187 This is reflected in the clause for compiling assignments: |
195 .limit stack ?? |
188 |
196 ... |
189 \begin{center} |
197 ireturn |
190 \begin{tabular}{lcl} |
198 .method end |
191 $\textit{compile}(x := a, E)$ & $\dn$ & |
199 \end{lstlisting} |
192 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$ |
200 |
193 \end{tabular} |
201 \noindent where the number of \pcode{I}s corresponds to the |
194 \end{center} |
202 number of arguments the function has, say \pcode{x1} to |
195 |
203 \pcode{xn}. The final \pcode{I} is needed in order to indicate |
196 \noindent We first generate code for the right-hand side of |
204 that the function returns an integer. Therefore we also have |
197 the assignment and then add an \pcode{istore}-instruction at |
205 to use \pcode{ireturn} instead of \pcode{return}. However, |
198 the end. By convention the result of the arithmetic expression |
206 more interesting are the two \pcode{.limit} lines. Locals |
199 $a$ will be on top of the stack. After the \pcode{istore} |
207 refers to the local variables of the method, which can be |
200 instruction, the result will be stored in the index |
208 queried and overwritten using \pcode{iload} and |
201 corresponding to the variable $x$. If the variable $x$ has |
209 \pcode{istore}, respectively. |
202 been used before in the program, we just need to look up what |
|
203 the index is and return the environment unchanged (that is in |
|
204 this case $E' = E$). However, if this is the first encounter |
|
205 of the variable $x$ in the program, then we have to augment |
|
206 the environment and assign $x$ with the largest index in $E$ |
|
207 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). |
|
208 That means for the assignment $x := x + 1$ we generate the |
|
209 following code |
|
210 |
|
211 \begin{lstlisting}[mathescape,numbers=none] |
|
212 iload $n_x$ |
|
213 ldc 1 |
|
214 iadd |
|
215 istore $n_x$ |
|
216 \end{lstlisting} |
|
217 |
|
218 \noindent |
|
219 where $n_x$ is the index for the variable $x$. |
|
220 |
|
221 More complicated is the code for \pcode{if}-statments, say |
|
222 |
|
223 \begin{lstlisting}[mathescape,language={},numbers=none] |
|
224 if $b$ then $cs_1$ else $cs_2$ |
|
225 \end{lstlisting} |
|
226 |
|
227 \noindent where $b$ is a boolean expression and the $cs_{1/2}$ |
|
228 are the statements for each \pcode{if}-branch. Lets assume |
|
229 we already generated code for $b$ and $cs_{1/2}$. Then in the |
|
230 true-case the control-flow of the program needs to be |
|
231 |
|
232 \begin{center} |
|
233 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
234 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
235 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
236 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
237 \node (A1) [point] {}; |
|
238 \node (b) [block, right=of A1] {code of $b$}; |
|
239 \node (A2) [point, right=of b] {}; |
|
240 \node (cs1) [block, right=of A2] {code of $cs_1$}; |
|
241 \node (A3) [point, right=of cs1] {}; |
|
242 \node (cs2) [block, right=of A3] {code of $cs_2$}; |
|
243 \node (A4) [point, right=of cs2] {}; |
|
244 |
|
245 \draw (A1) edge [->, black, line width=1mm] (b); |
|
246 \draw (b) edge [->, black, line width=1mm] (cs1); |
|
247 \draw (cs1) edge [->, black, line width=1mm] (A3); |
|
248 \draw (A3) edge [->, black, skip loop] (A4); |
|
249 \node [below=of cs2] {\raisebox{-5mm}{\small{}jump}}; |
|
250 \end{tikzpicture} |
|
251 \end{center} |
|
252 |
|
253 \noindent where we start with running the code for $b$; since |
|
254 we are in the true case we continue with running the code for |
|
255 $cs_1$. After this however, we must not run the code for |
|
256 $cs_2$, but always jump after the last instruction of $cs_2$ |
|
257 (the code for the \pcode{else}-branch). Note that this jump is |
|
258 unconditional, meaning we always have to jump to the end of |
|
259 $cs_2$. The corresponding instruction of the JVM is |
|
260 \pcode{goto}. In case $b$ turns out to be false we need the |
|
261 control-flow |
|
262 |
|
263 \begin{center} |
|
264 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
265 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
266 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
267 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
268 \node (A1) [point] {}; |
|
269 \node (b) [block, right=of A1] {code of $b$}; |
|
270 \node (A2) [point, right=of b] {}; |
|
271 \node (cs1) [block, right=of A2] {code of $cs_1$}; |
|
272 \node (A3) [point, right=of cs1] {}; |
|
273 \node (cs2) [block, right=of A3] {code of $cs_2$}; |
|
274 \node (A4) [point, right=of cs2] {}; |
|
275 |
|
276 \draw (A1) edge [->, black, line width=1mm] (b); |
|
277 \draw (b) edge [->, black, line width=1mm] (A2); |
|
278 \draw (A2) edge [skip loop] (A3); |
|
279 \draw (A3) edge [->, black, line width=1mm] (cs2); |
|
280 \draw (cs2) edge [->,black, line width=1mm] (A4); |
|
281 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; |
|
282 \end{tikzpicture} |
|
283 \end{center} |
|
284 |
|
285 \noindent where we now need a conditional jump (if the |
|
286 if-condition is false) from the end of the code for the |
|
287 boolean to the beginning of the instructions $cs_2$. Once we |
|
288 are finished with running $cs_2$ we can continue with whatever |
|
289 code comes after the if-statement. |
|
290 |
|
291 The \pcode{goto} and the conditional jumps need addresses to |
|
292 where the jump should go. Since we are generating assembly |
|
293 code for the JVM, we do not actually have to give (numeric) |
|
294 addresses, but can just attach (symbolic) labels to our code. |
|
295 These labels specify a target for a jump. Therefore the labels |
|
296 need to be unique, as otherwise it would be ambiguous where a |
|
297 jump should go to. A label, say \pcode{L}, is attached to code |
|
298 like |
|
299 |
|
300 \begin{lstlisting}[mathescape,numbers=none] |
|
301 L: |
|
302 $instr_1$ |
|
303 $instr_2$ |
|
304 $\vdots$ |
|
305 \end{lstlisting} |
|
306 |
210 |
307 \noindent where a label is indicated by a colon. |
|
308 |
|
309 Recall the ``trick'' with compiling boolean expressions: the |
|
310 \textit{compile}-function for boolean expressions takes three |
|
311 arguments: an abstract syntax tree, an environment for |
|
312 variable indices and also the label, $lab$, to where an conditional |
|
313 jump needs to go. The clause for the expression $a_1 = a_2$, |
|
314 for example, is as follows: |
|
315 |
|
316 \begin{center} |
|
317 \begin{tabular}{lcl} |
|
318 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ |
|
319 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$} |
|
320 \end{tabular} |
|
321 \end{center} |
|
322 |
|
323 \noindent where we are first generating code for the |
|
324 subexpressions $a_1$ and $a_2$. This will mean after running |
|
325 the corresponding code there will be two integers on top of |
|
326 the stack. If they are equal, we do not have to do anything |
|
327 (except for popping them off from the stack) and just continue |
|
328 with the next instructions (see control-flow of ifs above). |
|
329 However if they are \emph{not} equal, then we need to |
|
330 (conditionally) jump to the label $lab$. This can be done with |
|
331 the instruction |
|
332 |
|
333 \begin{lstlisting}[mathescape,numbers=none] |
|
334 if_icmpne $lab$ |
|
335 \end{lstlisting} |
|
336 |
|
337 \noindent Other jump instructions for boolean operators are |
|
338 |
|
339 \begin{center} |
|
340 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l} |
|
341 $\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\ |
|
342 $<$ & $\Rightarrow$ & \pcode{if_icmpge}\\ |
|
343 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ |
|
344 \end{tabular} |
|
345 \end{center} |
|
346 |
|
347 \noindent and so on. I leave it to you to extend the |
|
348 \textit{compile}-function for the other boolean expressions. |
|
349 Note that we need to jump whenever the boolean is \emph{not} |
|
350 true, which means we have to ``negate'' the jump |
|
351 condition---equals becomes not-equal, less becomes |
|
352 greater-or-equal. If you do not like this design (it can be |
|
353 the source of some nasty, hard-to-detect errors), you can also |
|
354 change the layout of the code and first give the code for the |
|
355 else-branch and then for the if-branch. However in the case |
|
356 of while-loops this way of generating code still seems |
|
357 the most convenient. |
|
358 |
|
359 We are now ready to give the compile function for |
|
360 if-statments---remember this function returns for staments a |
|
361 pair consisting of the code and an environment: |
|
362 |
|
363 \begin{center} |
|
364 \begin{tabular}{lcl} |
|
365 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ |
|
366 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ |
|
367 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ |
|
368 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ |
|
369 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ |
|
370 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ |
|
371 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\ |
|
372 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;L_\textit{ifend}$}\\ |
|
373 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifelse}:$}\\ |
|
374 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\ |
|
375 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{ifend}:, E'')$}\\ |
|
376 \end{tabular} |
|
377 \end{center} |
|
378 |
|
379 \noindent In the first two lines we generate two fresh labels |
|
380 for the jump addresses (just before the else-branch and just |
|
381 after). In the next two lines we generate the instructions for |
|
382 the two branches, $is_1$ and $is_2$. The final code will |
|
383 be first the code for $b$ (including the label |
|
384 just-before-the-else-branch), then the \pcode{goto} for after |
|
385 the else-branch, the label $L_\textit{ifesle}$, followed by |
|
386 the instructions for the else-branch, followed by the |
|
387 after-the-else-branch label. Consider for example the |
|
388 if-statement: |
|
389 |
|
390 \begin{lstlisting}[mathescape,numbers=none,language={}] |
|
391 if 1 = 1 then x := 2 else y := 3 |
|
392 \end{lstlisting} |
|
393 |
|
394 \noindent |
|
395 The generated code is as follows: |
|
396 |
|
397 \begin{lstlisting}[mathescape,language={}] |
|
398 ldc 1 |
|
399 ldc 1 |
|
400 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ |
|
401 ldc 2 |
|
402 istore 0 |
|
403 goto L_ifend $\quad\tikz[remember picture] \node (A) {\mbox{}};$ |
|
404 L_ifelse: $\quad\tikz[remember picture] \node[] (D) {\mbox{}};$ |
|
405 ldc 3 |
|
406 istore 1 |
|
407 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ |
|
408 \end{lstlisting} |
|
409 |
|
410 \begin{tikzpicture}[remember picture,overlay] |
|
411 \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) |
|
412 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); |
|
413 \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) |
|
414 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); |
|
415 \end{tikzpicture} |
|
416 |
|
417 \noindent The first three lines correspond to the the boolean |
|
418 expression $1 = 1$. The jump for when this boolean expression |
|
419 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
|
420 the else-branch is in Lines 8 and 9. Note carefully how the |
|
421 environment $E$ is threaded through the recursive calls of |
|
422 \textit{compile}. The function receives an environment $E$, |
|
423 but it might extend it when compiling the if-branch, yielding |
|
424 $E'$. This happens for example in the if-statement above |
|
425 whenever the variable \code{x} has not been used before. |
|
426 Similarly with the environment $E''$ for the second call to |
|
427 \textit{compile}. $E''$ is also the environment that needs to |
|
428 be returned as part of the answer. |
|
429 |
|
430 The compilation of the while-loops, say |
|
431 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case |
|
432 the condition is true and we need to do another iteration, |
|
433 and the control-flow needs to be as follows |
|
434 |
|
435 \begin{center} |
|
436 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
437 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
438 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
439 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
440 \node (A0) [point, left=of A1] {}; |
|
441 \node (A1) [point] {}; |
|
442 \node (b) [block, right=of A1] {code of $b$}; |
|
443 \node (A2) [point, right=of b] {}; |
|
444 \node (cs1) [block, right=of A2] {code of $cs$}; |
|
445 \node (A3) [point, right=of cs1] {}; |
|
446 \node (A4) [point, right=of A3] {}; |
|
447 |
|
448 \draw (A0) edge [->, black, line width=1mm] (b); |
|
449 \draw (b) edge [->, black, line width=1mm] (cs1); |
|
450 \draw (cs1) edge [->, black, line width=1mm] (A3); |
|
451 \draw (A3) edge [->,skip loop] (A1); |
|
452 \end{tikzpicture} |
|
453 \end{center} |
|
454 |
|
455 \noindent Whereas if the condition is \emph{not} true, we |
|
456 need to jump out of the loop, which gives the following |
|
457 control flow. |
|
458 |
|
459 \begin{center} |
|
460 \begin{tikzpicture}[node distance=2mm and 4mm, |
|
461 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
|
462 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
|
463 skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] |
|
464 \node (A0) [point, left=of A1] {}; |
|
465 \node (A1) [point] {}; |
|
466 \node (b) [block, right=of A1] {code of $b$}; |
|
467 \node (A2) [point, right=of b] {}; |
|
468 \node (cs1) [block, right=of A2] {code of $cs$}; |
|
469 \node (A3) [point, right=of cs1] {}; |
|
470 \node (A4) [point, right=of A3] {}; |
|
471 |
|
472 \draw (A0) edge [->, black, line width=1mm] (b); |
|
473 \draw (b) edge [->, black, line width=1mm] (A2); |
|
474 \draw (A2) edge [skip loop] (A3); |
|
475 \draw (A3) edge [->, black, line width=1mm] (A4); |
|
476 \end{tikzpicture} |
|
477 \end{center} |
|
478 |
|
479 \noindent Again we can use the \textit{compile}-function for |
|
480 boolean expressions to insert the appropriate jump to the |
|
481 end of the loop (label $L_{wend}$ below). |
|
482 |
|
483 \begin{center} |
|
484 \begin{tabular}{lcl} |
|
485 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ |
|
486 \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\ |
|
487 \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\ |
|
488 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ |
|
489 \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\ |
|
490 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\ |
|
491 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ |
|
492 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\ |
|
493 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\ |
|
494 \end{tabular} |
|
495 \end{center} |
|
496 |
|
497 \noindent I let you go through how this clause works. As an example |
|
498 you can consider the while-loop |
|
499 |
|
500 \begin{lstlisting}[mathescape,numbers=none,language={}] |
|
501 while x <= 10 do x := x + 1 |
|
502 \end{lstlisting} |
|
503 |
|
504 \noindent yielding the following code |
|
505 |
|
506 \begin{lstlisting}[mathescape,language={}] |
|
507 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ |
|
508 iload 0 |
|
509 ldc 10 |
|
510 if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$ |
|
511 iload 0 |
|
512 ldc 1 |
|
513 iadd |
|
514 istore 0 |
|
515 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ |
|
516 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ |
|
517 \end{lstlisting} |
|
518 |
|
519 \begin{tikzpicture}[remember picture,overlay] |
|
520 \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) |
|
521 -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); |
|
522 \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) |
|
523 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); |
|
524 \end{tikzpicture} |
|
525 |
|
526 |
|
527 Next we need to consider the statement \pcode{write x}, which |
|
528 can be used to print out the content of a variable. For this |
|
529 we need to use a Java library function. In order to avoid |
|
530 having to generate a lot of code for each |
|
531 \pcode{write}-command, we use a separate helper-method and |
|
532 just call this method with an argument (which needs to be |
|
533 placed onto the stack). The code of the helper-method is as |
|
534 follows. |
|
535 |
|
536 |
|
537 \begin{lstlisting}[language=JVMIS] |
|
538 .method public static write(I)V |
|
539 .limit locals 1 |
|
540 .limit stack 2 |
|
541 getstatic java/lang/System/out Ljava/io/PrintStream; |
|
542 iload 0 |
|
543 invokevirtual java/io/PrintStream/println(I)V |
|
544 return |
|
545 .end method |
|
546 \end{lstlisting} |
|
547 |
|
548 \noindent The first line marks the beginning of the method, |
|
549 called \pcode{write}. It takes a single integer argument |
|
550 indicated by the \pcode{(I)} and returns no result, indicated |
|
551 by the \pcode{V}. Since the method has only one argument, we |
|
552 only need a single local variable (Line~2) and a stack with |
|
553 two cells will be sufficient (Line 3). Line 4 instructs the |
|
554 JVM to get the value of the field \pcode{out} of the class |
|
555 \pcode{java/lang/System}. It expects the value to be of type |
|
556 \pcode{java/io/PrintStream}. A reference to this value will be |
|
557 placed on the stack. Line~5 copies the integer we want to |
|
558 print out onto the stack. In the next line we call the method |
|
559 \pcode{println} (from the class \pcode{java/io/PrintStream}). |
|
560 We want to print out an integer and do not expect anything |
|
561 back (that is why the type annotation is \pcode{(I)V}). The |
|
562 \pcode{return}-instruction in the next line changes the |
|
563 control-flow back to the place from where \pcode{write} was |
|
564 called. This method needs to be part of a header that is |
|
565 included in any code we generate. The helper-method |
|
566 \pcode{write} can be invoked with the two instructions |
|
567 |
|
568 \begin{lstlisting}[mathescape,language=JVMIS] |
|
569 iload $E(x)$ |
|
570 invokestatic XXX/XXX/write(I)V |
|
571 \end{lstlisting} |
|
572 |
|
573 \noindent where we first place the variable to be printed on |
|
574 top of the stack and then call \pcode{write}. The \pcode{XXX} |
|
575 need to be replaced by an appropriate class name (this will be |
|
576 explained shortly). |
|
577 |
|
578 |
|
579 \begin{figure}[t] |
|
580 \begin{lstlisting}[mathescape,language=JVMIS] |
|
581 .class public XXX.XXX |
|
582 .super java/lang/Object |
|
583 |
|
584 .method public <init>()V |
|
585 aload_0 |
|
586 invokenonvirtual java/lang/Object/<init>()V |
|
587 return |
|
588 .end method |
|
589 |
|
590 .method public static main([Ljava/lang/String;)V |
|
591 .limit locals 200 |
|
592 .limit stack 200 |
|
593 |
|
594 $\textit{\ldots{}here comes the compiled code\ldots}$ |
|
595 |
|
596 return |
|
597 .end method |
|
598 \end{lstlisting} |
|
599 \caption{Boilerplate code needed for running generated code.\label{boiler}} |
|
600 \end{figure} |
|
601 |
|
602 |
|
603 By generating code for a While-program, we end up with a list |
|
604 of (JVM assembly) instructions. Unfortunately, there is a bit |
|
605 more boilerplate code needed before these instructions can be |
|
606 run. The complete code is shown in Figure~\ref{boiler}. This |
|
607 boilerplate code is very specific to the JVM. If we target any |
|
608 other virtual machine or a machine language, then we would |
|
609 need to change this code. Lines 4 to 8 in Figure~\ref{boiler} |
|
610 contain a method for object creation in the JVM; this method |
|
611 is called \emph{before} the \pcode{main}-method in Lines 10 to |
|
612 17. Interesting are the Lines 11 and 12 where we hardwire that |
|
613 the stack of our programs will never be larger than 200 and |
|
614 that the maximum number of variables is also 200. This seem to |
|
615 be conservative default values that allow is to run some |
|
616 simple While-programs. In a real compiler, we would of course |
|
617 need to work harder and find out appropriate values for the |
|
618 stack and local variables. |
|
619 |
|
620 To sum up, in Figure~\ref{test} is the complete code generated |
|
621 for the slightly non-sensical program |
|
622 |
|
623 \begin{lstlisting}[mathescape,language=While] |
|
624 x := 1 + 2; |
|
625 write x |
|
626 \end{lstlisting} |
|
627 |
|
628 \noindent Having this code at our disposal, we need the |
|
629 assembler to translate the generated code into JVM bytecode (a |
|
630 class file). This bytecode is understood by the JVM and can be |
|
631 run by just invoking the \pcode{java}-program. |
|
632 |
|
633 |
|
634 \begin{figure}[p] |
|
635 \lstinputlisting{../progs/test-small.j} |
|
636 \caption{Generated code for a test program. This code can be |
|
637 processed by an Java assembler producing a class-file, which |
|
638 can be run by the \pcode{java}-program.\label{test}} |
|
639 \end{figure} |
|
640 |
|
641 \end{document} |
211 \end{document} |
642 |
212 |
643 %%% Local Variables: |
213 %%% Local Variables: |
644 %%% mode: latex |
214 %%% mode: latex |
645 %%% TeX-master: t |
215 %%% TeX-master: t |