22 good enough for improving the speed of a program to target a virtual |
22 good enough for improving the speed of a program to target a virtual |
23 machine instead. This produces not the fastest possible code, but code |
23 machine instead. This produces not the fastest possible code, but code |
24 that is often pretty fast. This way of producing code has also the |
24 that is often pretty fast. This way of producing code has also the |
25 advantage that the virtual machine takes care of things a compiler would |
25 advantage that the virtual machine takes care of things a compiler would |
26 normally need to take care of (hairy things like explicit memory |
26 normally need to take care of (hairy things like explicit memory |
27 management). |
27 management). |
28 |
28 |
29 As a first example in this module we will implement a compiler for the |
29 As a first example in this module we will implement a compiler for the |
30 very simple WHILE-language that we parsed in the last lecture. The |
30 very simple WHILE-language that we parsed in the last lecture. The |
31 compiler will target the Java Virtual Machine (JVM), but not directly. |
31 compiler will target the Java Virtual Machine (JVM), but not directly. |
32 Pictorially the compiler will work as follows: |
32 Pictorially the compiler will work as follows: |
33 |
33 |
34 \begin{center} |
34 \begin{center} |
35 \begin{tikzpicture}[scale=1,font=\bf, |
35 \begin{tikzpicture}[scale=1,font=\bf, |
36 node/.style={ |
36 node/.style={ |
37 rectangle,rounded corners=3mm, |
37 rectangle,rounded corners=3mm, |
38 ultra thick,draw=black!50,minimum height=18mm, |
38 ultra thick,draw=black!50,minimum height=18mm, |
39 minimum width=20mm, |
39 minimum width=20mm, |
40 top color=white,bottom color=black!20}] |
40 top color=white,bottom color=black!20}] |
41 |
41 |
42 \node (0) at (-3,0) {}; |
42 \node (0) at (-3,0) {}; |
43 \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler}; |
43 \node (A) at (0,0) [node,text width=1.6cm,text centered] {our compiler}; |
44 \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau}; |
44 \node (B) at (3.5,0) [node,text width=1.6cm,text centered] {Jasmin / Krakatau}; |
45 \node (C) at (7.5,0) [node] {JVM}; |
45 \node (C) at (7.5,0) [node] {JVM}; |
46 |
46 |
47 \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A); |
47 \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {*.while} (A); |
48 \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B); |
48 \draw [->,line width=2.5mm] (A) -- node [above,pos=0.35] {*.j} (B); |
49 \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C); |
49 \draw [->,line width=2.5mm] (B) -- node [above,pos=0.35] {*.class} (C); |
50 \end{tikzpicture} |
50 \end{tikzpicture} |
51 \end{center} |
51 \end{center} |
52 |
52 |
53 \noindent |
53 \noindent |
54 The input will be WHILE-programs; the output will be assembly files |
54 The input will be WHILE-programs; the output will be assembly files |
55 (with the file extension .j). Assembly files essentially contain |
55 (with the file extension *.j). Assembly files essentially contain |
56 human-readable low-level code, meaning they are not just bits and |
56 human-readable low-level code, meaning they are not just bits and |
57 bytes, but rather something you can read and understand---with a bit |
57 bytes, but rather something you can read and understand---with a bit |
58 of practice of course. An \emph{assembler} will then translate the |
58 of practice of course. An \emph{assembler} will then translate the |
59 assembly files into unreadable class- or binary-files the JVM or CPU |
59 assembly files into unreadable class- or binary-files the JVM or CPU |
60 can run. Unfortunately, the Java ecosystem does not come with an |
60 can run, i.e.~bits and bytes. Unfortunately, the Java ecosystem does not come with an |
61 assembler which would be handy for our compiler-endeavour (unlike |
61 assembler which would be handy for our compiler-endeavour (unlike |
62 Microsoft's Common Language Infrastructure for the .Net platform which |
62 Microsoft's Common Language Infrastructure for the .Net platform which |
63 has an assembler out-of-the-box). As a substitute we shall use the |
63 has an assembler out-of-the-box). As a substitute we shall use the |
64 3rd-party programs Jasmin or Krakatau (Jasmin is the preferred |
64 3rd-party program Jasmin, or alternatively Krakatau (Jasmin is the preferred |
65 option---a \texttt{jasmin.jar}-file is available on KEATS): |
65 option---a \texttt{jasmin.jar}-file is available on KEATS): |
66 |
66 |
67 \begin{itemize} |
67 \begin{itemize} |
68 \item \url{http://jasmin.sourceforge.net} |
68 \item \url{http://jasmin.sourceforge.net} |
69 \item \url{https://github.com/Storyyeller/Krakatau} |
69 \item \url{https://github.com/Storyyeller/Krakatau} |
73 The first is a Java program and the second a program written in Python. |
73 The first is a Java program and the second a program written in Python. |
74 Each of them allow us to generate \emph{assembly} files that are still |
74 Each of them allow us to generate \emph{assembly} files that are still |
75 readable by humans, as opposed to class-files which are pretty much just |
75 readable by humans, as opposed to class-files which are pretty much just |
76 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take |
76 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take |
77 our assembly files as input and generate the corresponding class-files for |
77 our assembly files as input and generate the corresponding class-files for |
78 us. |
78 us. |
79 |
79 |
80 What is good about the JVM is that it is a stack-based virtual machine, |
80 What is good about the JVM is that it is a stack-based virtual machine, |
81 a fact which will make it easy to generate code for arithmetic |
81 a fact which will make it easy to generate code for arithmetic |
82 expressions. For example when compiling the expression $1 + 2$ we need |
82 expressions. For example when compiling the expression $1 + 2$ we need |
83 to generate the following three instructions |
83 to generate the following three instructions |
84 |
84 |
85 \begin{lstlisting}[language=JVMIS,numbers=none] |
85 \begin{lstlisting}[language=JVMIS,numbers=none] |
86 ldc 1 |
86 ldc 1 |
87 ldc 2 |
87 ldc 2 |
88 iadd |
88 iadd |
89 \end{lstlisting} |
89 \end{lstlisting} |
90 |
90 |
91 \noindent The first instruction loads the constant $1$ onto the stack, |
91 \noindent The first instruction loads the constant $1$ onto the stack, |
92 the next one loads $2$, the third instruction adds both numbers together |
92 the next one loads $2$, the third instruction adds both numbers together |
93 replacing the top two elements of the stack with the result $3$. For |
93 replacing the top two elements of the stack with the result $3$. For |
132 node---this traversal in \emph{post-order} fashion will produce code for |
132 node---this traversal in \emph{post-order} fashion will produce code for |
133 a stack-machine (which is what the JVM is). Doing so for the tree above |
133 a stack-machine (which is what the JVM is). Doing so for the tree above |
134 generates the instructions |
134 generates the instructions |
135 |
135 |
136 \begin{lstlisting}[language=JVMIS,numbers=none] |
136 \begin{lstlisting}[language=JVMIS,numbers=none] |
137 ldc 1 |
137 ldc 1 |
138 ldc 2 |
138 ldc 2 |
139 ldc 3 |
139 ldc 3 |
140 imul |
140 imul |
141 ldc 4 |
141 ldc 4 |
142 ldc 3 |
142 ldc 3 |
143 isub |
143 isub |
144 iadd |
144 iadd |
145 iadd |
145 iadd |
146 \end{lstlisting} |
146 \end{lstlisting} |
147 |
147 |
148 \noindent If we ``run'' these instructions, the result $8$ will be on |
148 \noindent If we ``run'' these instructions, the result $8$ will be on |
149 top of the stack (I leave this to you to verify; the meaning of each |
149 top of the stack (I leave this to you to verify; the meaning of each |
150 instruction should be clear). The result being on the top of the stack |
150 instruction should be clear). The result being on the top of the stack |
151 will be an important convention we always observe in our compiler. Note, |
151 will be an important convention we always observe in our compiler. Note, |
152 that a different bracketing of the expression, for example $(1 + (2 * |
152 that a different bracketing of the expression, for example $(1 + (2 * |
153 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also |
153 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also |
154 a different list of instructions. |
154 a different list of instructions. |
155 |
155 |
156 Generating code in this post-order-traversal fashion is rather easy to |
156 Generating code in this post-order-traversal fashion is rather easy to |
157 implement: it can be done with the following recursive |
157 implement: it can be done with the following recursive |
158 \textit{compile}-function, which takes the abstract syntax tree as an |
158 \textit{compile}-function, which takes the abstract syntax tree as an |
159 argument: |
159 argument: |
205 and an environment, $E$, that maps identifiers to index-numbers. |
205 and an environment, $E$, that maps identifiers to index-numbers. |
206 |
206 |
207 \begin{center} |
207 \begin{center} |
208 \begin{tabular}{lcl} |
208 \begin{tabular}{lcl} |
209 $\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\ |
209 $\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\ |
210 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & |
210 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & |
211 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\ |
211 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\ |
212 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ & |
212 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ & |
213 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\ |
213 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\ |
214 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ & |
214 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ & |
215 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\ |
215 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\ |
216 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & |
216 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & |
217 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\ |
217 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\ |
218 $\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\ |
218 $\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\ |
219 \end{tabular} |
219 \end{tabular} |
220 \end{center} |
220 \end{center} |
221 |
221 |
365 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; |
365 \node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; |
366 \end{tikzpicture} |
366 \end{tikzpicture} |
367 \end{center} |
367 \end{center} |
368 |
368 |
369 \noindent where we now need a conditional jump (if the |
369 \noindent where we now need a conditional jump (if the |
370 if-condition is false) from the end of the code for the |
370 if-condition is false) from the end of the code for the |
371 boolean to the beginning of the instructions $cs_2$. Once we |
371 boolean to the beginning of the instructions $cs_2$. Once we |
372 are finished with running $cs_2$ we can continue with whatever |
372 are finished with running $cs_2$ we can continue with whatever |
373 code comes after the if-statement. |
373 code comes after the if-statement. |
374 |
374 |
375 The \instr{goto} and the conditional jumps need addresses to |
375 The \instr{goto} and the conditional jumps need addresses to |
376 where the jump should go. Since we are generating assembly |
376 where the jump should go. Since we are generating assembly |
377 code for the JVM, we do not actually have to give (numeric) |
377 code for the JVM, we do not actually have to give (numeric) |
378 addresses, but can just attach (symbolic) labels to our code. |
378 addresses, but can just attach (symbolic) labels to our code. |
379 These labels specify a target for a jump. Therefore the labels |
379 These labels specify a target for a jump---essentially they mark |
|
380 a location in our code. Therefore the labels |
380 need to be unique, as otherwise it would be ambiguous where a |
381 need to be unique, as otherwise it would be ambiguous where a |
381 jump should go to. A label, say \pcode{L}, is attached to assembly |
382 jump should go to. A label, say \pcode{L}, is attached to assembly |
382 code like |
383 code like |
383 |
384 |
384 \begin{lstlisting}[mathescape,numbers=none] |
385 \begin{lstlisting}[mathescape,numbers=none] |
|
386 $\textit{instr\_1}$ |
385 L: |
387 L: |
386 $\textit{instr\_1}$ |
|
387 $\textit{instr\_2}$ |
388 $\textit{instr\_2}$ |
|
389 $\textit{instr\_3}$ |
388 $\vdots$ |
390 $\vdots$ |
389 \end{lstlisting} |
391 \end{lstlisting} |
390 |
392 |
391 \noindent where the label needs to be followed by a colon. The task of |
393 \noindent where the label needs to be followed by a colon. The task of |
392 the assembler (in our case Jasmin or Krakatau) is to resolve the labels |
394 the assembler (in our case Jasmin or Krakatau) is to resolve the labels |
393 to actual (numeric) addresses, for example jump 10 instructions forward, |
395 to actual (numeric) addresses, for example jump 10 instructions forward, |
394 or 20 instructions backwards. |
396 or 20 instructions backwards, or jump to this particular address. |
395 |
397 |
396 Recall the ``trick'' with compiling boolean expressions: the |
398 Recall the ``trick'' with compiling boolean expressions: the |
397 \textit{compile}-function for boolean expressions takes three |
399 \textit{compile}-function for boolean expressions takes three |
398 arguments: an abstract syntax tree, an environment for |
400 arguments: an abstract syntax tree, an environment for |
399 variable indices and also the label, $lab$, to where an conditional |
401 variable indices and also the label, which I called $lab$, to where an conditional |
400 jump needs to go. The clause for the expression $a_1 = a_2$, |
402 jump needs to go. The clause for the expression $a_1 = a_2$, |
401 for example, is as follows: |
403 for example, is as follows: |
402 |
404 |
403 \begin{center} |
405 \begin{center} |
404 \begin{tabular}{lcl} |
406 \begin{tabular}{lcl} |
405 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ |
407 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ |
406 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$} |
408 \multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$} |
407 \end{tabular} |
409 \end{tabular} |
408 \end{center} |
410 \end{center} |
409 |
411 |
410 \noindent where we are first generating code for the |
412 \noindent where we are first generating code for the |
442 layout of the code and first give the code for the else-branch and then |
444 layout of the code and first give the code for the else-branch and then |
443 for the if-branch. However in the case of while-loops this |
445 for the if-branch. However in the case of while-loops this |
444 ``upside-down-inside-out'' way of generating code still seems the most |
446 ``upside-down-inside-out'' way of generating code still seems the most |
445 convenient. |
447 convenient. |
446 |
448 |
447 We are now ready to give the compile function for |
449 We are now ready to give the compile function for |
448 if-statements---remember this function returns for statements a |
450 if-statements---remember this function returns for statements a |
449 pair consisting of the code and an environment: |
451 pair consisting of the code and an environment: |
450 |
452 |
451 \begin{center} |
453 \begin{center} |
452 \begin{tabular}{lcl} |
454 \begin{tabular}{lcl} |
453 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ |
455 $\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ |
454 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ |
456 \multicolumn{3}{l}{$\qquad L_\textit{ifelse}\;$ (fresh label)}\\ |
455 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ |
457 \multicolumn{3}{l}{$\qquad L_\textit{ifend}\;$ (fresh label)}\\ |
456 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ |
458 \multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ |
457 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ |
459 \multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ |
458 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ |
460 \multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, L_\textit{ifelse})$}\\ |
466 |
468 |
467 \noindent In the first two lines we generate two fresh labels |
469 \noindent In the first two lines we generate two fresh labels |
468 for the jump addresses (just before the else-branch and just |
470 for the jump addresses (just before the else-branch and just |
469 after). In the next two lines we generate the instructions for |
471 after). In the next two lines we generate the instructions for |
470 the two branches, $is_1$ and $is_2$. The final code will |
472 the two branches, $is_1$ and $is_2$. The final code will |
471 be first the code for $b$ (including the label |
473 be first the code for $b$ (including the label |
472 just-before-the-else-branch), then the \pcode{goto} for after |
474 just-before-the-else-branch), then the \pcode{goto} for after |
473 the else-branch, the label $L_\textit{ifelse}$, followed by |
475 the else-branch, the label $L_\textit{--ifelse}$, followed by |
474 the instructions for the else-branch, followed by the |
476 the instructions for the else-branch, followed by the |
475 after-the-else-branch label. Consider for example the |
477 after-the-else-branch label. Consider for example the |
476 if-statement: |
478 if-statement: |
477 |
479 |
478 \begin{lstlisting}[mathescape,numbers=none,language=While] |
480 \begin{lstlisting}[mathescape,numbers=none,language=While] |
479 if 1 == 1 then x := 2 else y := 3 |
481 if 1 == 1 then x := 2 else y := 3 |
480 \end{lstlisting} |
482 \end{lstlisting} |
481 |
483 |
482 \noindent |
484 \noindent |
483 The generated code is as follows: |
485 The generated code is as follows: |
484 |
486 |
485 \begin{lstlisting}[language=JVMIS,mathescape,numbers=left] |
487 \begin{lstlisting}[language=JVMIS,mathescape,numbers=left] |
486 ldc 1 |
488 ldc 1 |
487 ldc 1 |
489 ldc 1 |
494 istore 1 |
496 istore 1 |
495 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ |
497 L_ifend: $\quad\tikz[remember picture] \node[] (B) {\mbox{}};$ |
496 \end{lstlisting} |
498 \end{lstlisting} |
497 |
499 |
498 \begin{tikzpicture}[remember picture,overlay] |
500 \begin{tikzpicture}[remember picture,overlay] |
499 \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) |
501 \draw[->,very thick] (A) edge [->,to path={-- ++(10mm,0mm) |
500 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); |
502 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (B.east); |
501 \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) |
503 \draw[->,very thick] (C) edge [->,to path={-- ++(10mm,0mm) |
502 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); |
504 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (D.east); |
503 \end{tikzpicture} |
505 \end{tikzpicture} |
504 |
506 |
505 \noindent The first three lines correspond to the the boolean |
507 \noindent The first three lines correspond to the the boolean |
506 expression \texttt{1 == 1}. The jump for when this boolean expression |
508 expression \texttt{1 == 1}. The jump for when this boolean expression |
507 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
509 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
508 the else-branch is in Lines 8 and 9. |
510 the else-branch is in Lines 8 and 9. |
509 |
511 |
510 Note carefully how the environment $E$ is threaded through the recursive |
512 Note carefully how the environment $E$ is threaded through the recursive |
511 calls of \textit{compile}. The function receives an environment $E$, but |
513 calls of \textit{compile}. The function receives an environment $E$, but |
512 it might extend it when compiling the if-branch, yielding $E'$. This |
514 it might extend it when compiling the if-branch, yielding $E'$. This |
513 happens for example in the if-statement above whenever the variable |
515 happens for example in the if-statement above whenever the variable |
514 \code{x} has not been used before. Similarly with the environment $E''$ |
516 \code{x} has not been used before. Similarly with the environment $E''$ |
515 for the second call to \textit{compile}. $E''$ is also the environment |
517 for the second call to \textit{compile}. $E''$ is also the environment |
516 that needs to be returned as part of the answer. |
518 that needs to be returned as part of the answer. |
517 |
519 |
518 The compilation of the while-loops, say |
520 The compilation of the while-loops of the form |
519 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case |
521 \pcode{while} $b$ \pcode{do} $cs$ is very similar. In case |
520 the condition is true and we need to do another iteration, |
522 the condition is true and we need to do another iteration, |
521 and the control-flow needs to be as follows |
523 and the control-flow needs to be as follows |
522 |
524 |
523 \begin{center} |
525 \begin{center} |
524 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, |
526 \begin{tikzpicture}[node distance=2mm and 4mm,line cap=round, |
525 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, |
527 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm, |
566 \end{tikzpicture} |
568 \end{tikzpicture} |
567 \end{center} |
569 \end{center} |
568 |
570 |
569 \noindent Again we can use the \textit{compile}-function for |
571 \noindent Again we can use the \textit{compile}-function for |
570 boolean expressions to insert the appropriate jump to the |
572 boolean expressions to insert the appropriate jump to the |
571 end of the loop (label $L_{wend}$ below). |
573 end of the loop (label $L_\textit{--wend}$ below). |
572 |
574 |
573 \begin{center} |
575 \begin{center} |
574 \begin{tabular}{lcl} |
576 \begin{tabular}{lcl} |
575 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ |
577 $\textit{compile}(\pcode{while}\; b\; \pcode{do} \;cs, E)$ & $\dn$\\ |
576 \multicolumn{3}{l}{$\qquad L_{wbegin}\;$ (fresh label)}\\ |
578 \multicolumn{3}{l}{$\qquad L_\textit{--wbegin}\;$ (fresh label)}\\ |
577 \multicolumn{3}{l}{$\qquad L_{wend}\;$ (fresh label)}\\ |
579 \multicolumn{3}{l}{$\qquad L_\textit{--wend}\;$ (fresh label)}\\ |
578 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ |
580 \multicolumn{3}{l}{$\qquad (is, E') = \textit{compile}(cs_1, E)$}\\ |
579 \multicolumn{3}{l}{$\qquad(L_{wbegin}:$}\\ |
581 \multicolumn{3}{l}{$\qquad(L_\textit{--wbegin}$}\\ |
580 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_{wend})$}\\ |
582 \multicolumn{3}{l}{$\qquad\phantom{(}@\;\textit{compile}(b, E, L_\textit{--wend})$}\\ |
581 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ |
583 \multicolumn{3}{l}{$\qquad\phantom{(}@\;is$}\\ |
582 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_{wbegin}$}\\ |
584 \multicolumn{3}{l}{$\qquad\phantom{(}@\; \text{goto}\;L_\textit{--wbegin}$}\\ |
583 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_{wend}:, E')$}\\ |
585 \multicolumn{3}{l}{$\qquad\phantom{(}@\;L_\textit{--wend}, E')$}\\ |
584 \end{tabular} |
586 \end{tabular} |
585 \end{center} |
587 \end{center} |
586 |
588 |
587 \noindent I let you go through how this clause works. As an example |
589 \noindent I let you go through how this clause works. As an example |
588 you can consider the while-loop |
590 you can consider the while-loop |
603 iadd |
605 iadd |
604 istore 0 |
606 istore 0 |
605 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ |
607 goto L_wbegin $\quad\tikz[remember picture] \node (LA) {\mbox{}};$ |
606 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ |
608 L_wend: $\quad\tikz[remember picture] \node[] (LD) {\mbox{}};$ |
607 \end{lstlisting} |
609 \end{lstlisting} |
608 |
610 |
609 \begin{tikzpicture}[remember picture,overlay] |
611 \begin{tikzpicture}[remember picture,overlay] |
610 \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) |
612 \draw[->,very thick] (LA) edge [->,to path={-- ++(10mm,0mm) |
611 -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); |
613 -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east); |
612 \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) |
614 \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) |
613 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); |
615 -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); |
614 \end{tikzpicture} |
616 \end{tikzpicture} |
615 |
617 |
616 \noindent |
618 \noindent |
617 As said, I leave it to you to decide whether the code implements |
619 As said, I leave it to you to check that the code really implements |
618 the usual controlflow of while-loops. |
620 the usual controlflow of while-loops. |
619 |
621 |
620 Next we need to consider the WHILE-statement \pcode{write x}, which can |
622 Next we need to consider the WHILE-statement \pcode{write x}, which can |
621 be used to print out the content of a variable. For this we shall use a |
623 be used to print out the content of a variable. For this we shall use a |
622 Java library function. In order to avoid having to generate a lot of |
624 Java library function. In order to avoid having to generate a lot of |
740 |
742 |
741 \subsection*{Arrays} |
743 \subsection*{Arrays} |
742 |
744 |
743 Maybe a useful addition to the WHILE-language would be arrays. This |
745 Maybe a useful addition to the WHILE-language would be arrays. This |
744 would allow us to generate more interesting WHILE-programs by |
746 would allow us to generate more interesting WHILE-programs by |
745 translating BF*** programs into equivalent WHILE-code. Therefore in this |
747 translating BF*** programs into equivalent WHILE-code. Yeah! Therefore in this |
746 section let us have a look at how we can support the following three |
748 section let us have a look at how we can support the following three |
747 constructions |
749 constructions |
748 |
750 |
749 \begin{itemize} |
751 \begin{itemize} |
750 \item \lstinline|new(arr[15000])| |
752 \item \lstinline|new(arr[15000])| |
751 \item \lstinline|x := 3 + arr[3 + y]| |
753 \item \lstinline|x := 3 + arr[3 + y]| |
752 \item \lstinline|arr[42 * n] := ...| |
754 \item \lstinline|arr[42 * n] := ...| |
753 \end{itemize} |
755 \end{itemize} |
754 |
756 |
755 \noindent |
757 \noindent |
756 The first construct is for creating new arrays. In this instance the |
758 The first construct is for creating new arrays. In this instance the |
757 name of the array is \pcode{arr} and it can hold 15000 integers. We do |
759 name of the array is \pcode{arr} and it can hold 15000 integers. We do |
758 not support ``dynamic'' arrays, that is the size of our arrays will |
760 not support ``dynamic'' arrays, that is the size of our arrays will |
759 always be fixed. The second construct is for referencing an array cell |
761 always be fixed. The second construct is for referencing an array cell |
760 inside an arithmetic expression---we need to be able to look up the |
762 inside an arithmetic expression---we need to be able to look up the |
761 contents of an array at an index determined by an arithmetic expression. |
763 contents of an array at an index determined by an arithmetic expression. |
762 Similarly in the line below, we need to be able to update the content of |
764 Similarly in the line below, we need to be able to update the content of |
763 an array at a calculated index. |
765 an array at a calculated index. |
764 |
766 |
765 For creating a new array we can generate the following three JVM |
767 For creating a new array we can generate the following three JVM |
766 instructions: |
768 instructions: |
767 |
769 |
768 \begin{lstlisting}[mathescape,language=JVMIS] |
770 \begin{lstlisting}[mathescape,language=JVMIS] |
769 ldc number |
771 ldc number |
770 newarray int |
772 newarray int |
771 astore loc_var |
773 astore loc_var |
772 \end{lstlisting} |
774 \end{lstlisting} |
773 |
775 |
774 \noindent |
776 \noindent |
775 First we need to put the size of the array onto the stack. The next |
777 First we need to put the size of the array onto the stack. The next |
1082 \begin{tikzpicture}[every text node part/.style={align=left}, |
1086 \begin{tikzpicture}[every text node part/.style={align=left}, |
1083 stack/.style={rectangle split,rectangle split parts = 5, |
1087 stack/.style={rectangle split,rectangle split parts = 5, |
1084 fill=black!20,draw,text width=1.6cm,line width=0.5mm}] |
1088 fill=black!20,draw,text width=1.6cm,line width=0.5mm}] |
1085 \node (A) {}; |
1089 \node (A) {}; |
1086 \node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}}; |
1090 \node[stack,right = 80pt] (0) at (A.east) {$\textit{index}$\nodepart{two} \ldots\phantom{l}}; |
1087 \node[stack,right = 60pt] (1) at (0.east) |
1091 \node[stack,right = 60pt] (1) at (0.east) |
1088 {array\nodepart{two} |
1092 {array\nodepart{two} |
1089 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1093 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1090 \node[stack,below = 40pt] (2) at (1.south) |
1094 \node[stack,below = 40pt] (2) at (1.south) |
1091 {array\nodepart{two} |
1095 {array\nodepart{two} |
1092 $\textit{index}$ \nodepart{three} |
1096 $\textit{index}$ \nodepart{three} |
1093 array \nodepart{four} |
1097 array \nodepart{four} |
1094 $\textit{index}$\nodepart{five} \ldots\phantom{l}}; |
1098 $\textit{index}$\nodepart{five} \ldots\phantom{l}}; |
1095 \node[stack,left = 90pt] (3) at (2.west) |
1099 \node[stack,left = 90pt] (3) at (2.west) |
1096 {array\_len\nodepart{two} |
1100 {array\_len\nodepart{two} |
1097 $\textit{index}$ \nodepart{three} |
1101 $\textit{index}$ \nodepart{three} |
1098 array \nodepart{four} |
1102 array \nodepart{four} |
1099 $\textit{index}$\nodepart{five} \ldots\phantom{l}}; |
1103 $\textit{index}$\nodepart{five} \ldots\phantom{l}}; |
1100 \node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south) |
1104 \node[stack,below right of = 3,node distance = 130pt,rectangle split parts = 3] (4b) at (3.south) |
1101 {array\nodepart{two} |
1105 {array\nodepart{two} |
1102 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1106 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1103 \node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south) |
1107 \node[stack,below left of = 3,node distance = 130pt,rectangle split parts = 3] (4a) at (3.south) |
1104 {array\nodepart{two} |
1108 {array\nodepart{two} |
1105 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1109 $\textit{index}$\nodepart{three} \ldots\phantom{l}}; |
1106 \node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south) |
1110 \node[stack,below of = 4a,node distance = 70pt,rectangle split parts = 3] (5a) at (4a.south) |
1107 {$\textit{index}$\nodepart{two} |
1111 {$\textit{index}$\nodepart{two} |
1108 array\nodepart{three} \ldots\phantom{l}}; |
1112 array\nodepart{three} \ldots\phantom{l}}; |
1109 \node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south) |
1113 \node[stack,below of = 5a,node distance = 60pt,rectangle split parts = 2] (6a) at (5a.south) |
1110 {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}}; |
1114 {$\textit{array\_elem}$\nodepart{two} \ldots\phantom{l}}; |
1111 \node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south) |
1115 \node[stack,below of = 4b,node distance = 65pt,rectangle split parts = 2] (5b) at (4b.south) |
1112 {\ldots\phantom{l}}; |
1116 {\ldots\phantom{l}}; |
1113 \node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south) |
1117 \node[stack,below of = 5b,node distance = 60pt,rectangle split parts = 2] (6b) at (5b.south) |
1114 {0\nodepart{two} \ldots\phantom{l}}; |
1118 {0\nodepart{two} \ldots\phantom{l}}; |
1115 |
1119 |
1116 \draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0); |
1120 \draw [|->,line width=2.5mm] (A) -- node [above,pos=0.45] {$\textit{index\_aexp}$} (0); |
1117 \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1); |
1121 \draw [->,line width=2.5mm] (0) -- node [above,pos=0.35] {\instr{aload}} (1); |
1118 \draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2); |
1122 \draw [->,line width=2.5mm] (1) -- node [right,pos=0.35] {\instr{dup2}} (2); |
1119 \draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3); |
1123 \draw [->,line width=2.5mm] (2) -- node [above,pos=0.40] {\instr{arraylength}} (3); |
1120 \path[->,draw,line width=2.5mm] |
1124 \path[->,draw,line width=2.5mm] |
1121 let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north); |
1125 let \p1=(3.south), \p2=(4a.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) node [right,pos=0.50] {\instr{if_icmple}} -| (4a.north); |
1122 \path[->,draw,line width=2.5mm] |
1126 \path[->,draw,line width=2.5mm] |
1123 let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north); |
1127 let \p1=(3.south), \p2=(4b.north) in (3.south) -- +(0,0.5*\y2-0.5*\y1) -| (4b.north); |
1124 \draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a); |
1128 \draw [->,line width=2.5mm] (4a) -- node [right,pos=0.35] {\instr{swap}} (5a); |
1125 \draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b); |
1129 \draw [->,line width=2.5mm] (4b) -- node [right,pos=0.35] {\instr{pop2}} (5b); |
1126 \draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a); |
1130 \draw [->,line width=2.5mm] (5a) -- node [right,pos=0.35] {\instr{iaload}} (6a); |
1127 \draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b); |
1131 \draw [->,line width=2.5mm] (5b) -- node [right,pos=0.35] {\instr{iconst_0}} (6b); |
1128 \end{tikzpicture} |
1132 \end{tikzpicture} |
1129 \end{center} |
1133 \end{center} |
1130 \end{figure} |
1134 \end{figure} |
1131 |
1135 |
1132 goto\_w problem solved for too large jumps |
1136 goto\_w problem solved for too large jumps |
1133 \end{document} |
1137 \end{document} |
1134 |
1138 |
1135 %%% Local Variables: |
1139 %%% Local Variables: |
1136 %%% mode: latex |
1140 %%% mode: latex |
1137 %%% TeX-master: t |
1141 %%% TeX-master: t |
1138 %%% End: |
1142 %%% End: |