17 compiler would normally need to take care of (like explicit |
18 compiler would normally need to take care of (like explicit |
18 memory management). |
19 memory management). |
19 |
20 |
20 We will be generating code for the Java Virtual Machine. This |
21 We will be generating code for the Java Virtual Machine. This |
21 is a stack-based virtual machine which will make it easy to |
22 is a stack-based virtual machine which will make it easy to |
22 generate code for arithmetic expressions. Recall |
23 generate code for arithmetic expressions. For example for |
23 that our |
24 generating code for the expression $1 + 2$ we need to issue |
|
25 the following three instructions |
|
26 |
|
27 \begin{lstlisting}[numbers=none] |
|
28 ldc 1 |
|
29 ldc 2 |
|
30 iadd |
|
31 \end{lstlisting} |
|
32 |
|
33 \noindent The first instruction loads the constant $1$ on the |
|
34 stack, the next one $2$, the third instruction add both |
|
35 numbers together replacing the top elements of the stack with |
|
36 the result $3$. We will throughout consider only integer |
|
37 numbers and results, therefore we can use the instructions |
|
38 \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on. |
|
39 The \code{i} stands for integer instructions (alternatives are |
|
40 \code{d} for doubles, \code{l} for longs and \code{f} for |
|
41 floats). |
|
42 |
|
43 Recall our grammar for arithmetic expressions: |
|
44 |
|
45 |
|
46 \begin{plstx}[rhs style=, margin=3cm] |
|
47 : \meta{E} ::= \meta{T} $+$ \meta{E} |
|
48 | \meta{T} $-$ \meta{E} |
|
49 | \meta{T}\\ |
|
50 : \meta{T} ::= \meta{F} $*$ \meta{T} |
|
51 | \meta{F} $\backslash$ \meta{T} |
|
52 | \meta{F}\\ |
|
53 : \meta{F} ::= ( \meta{E} ) |
|
54 | \meta{Id} |
|
55 | \meta{Num}\\ |
|
56 \end{plstx} |
|
57 |
|
58 |
|
59 \noindent where \meta{Id} stands for variables and |
|
60 \meta{Num} for number. For the moment let us omit variables from |
|
61 arithmetic expressions. Our parser will take this grammar and |
|
62 produce abstract syntax trees, for |
|
63 example for the expression $1 + ((2 * 3) + (4 - 3))$ it will |
|
64 produce the following tree. |
|
65 |
|
66 \begin{center} |
|
67 \begin{tikzpicture} |
|
68 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] |
|
69 \end{tikzpicture} |
|
70 \end{center} |
|
71 |
|
72 \noindent |
|
73 To generate code for this expression, we need to traverse this |
|
74 tree in post-order fashion---this will produce code for a |
|
75 stack-machine, like the JVM. Doing so gives the instructions |
|
76 |
|
77 \begin{lstlisting}[numbers=none] |
|
78 ldc 1 |
|
79 ldc 2 |
|
80 ldc 3 |
|
81 imul |
|
82 ldc 4 |
|
83 ldc 3 |
|
84 isub |
|
85 iadd |
|
86 iadd |
|
87 \end{lstlisting} |
|
88 |
|
89 \noindent If we ``run'' these instructions, the result $8$ |
|
90 will be on top of the stack. This will be a convention we |
|
91 always observe, namely that the results of arithmetic |
|
92 expressions will always be on top of the stack. Note, that a |
|
93 different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$, |
|
94 produces a different abstract syntax tree and thus potentially |
|
95 also a different list of instructions. Generating code in this |
|
96 fashion is rather simple: it can be done by the following |
|
97 \textit{compile}-function: |
|
98 |
|
99 \begin{center} |
|
100 \begin{tabular}{lcl} |
|
101 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\ |
|
102 $\textit{compile}(a_1 + a_2)$ & $\dn$ & |
|
103 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\ |
|
104 $\textit{compile}(a_1 - a_2)$ & $\dn$ & |
|
105 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\ |
|
106 $\textit{compile}(a_1 * a_2)$ & $\dn$ & |
|
107 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\ |
|
108 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & |
|
109 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\ |
|
110 \end{tabular} |
|
111 \end{center} |
|
112 |
|
113 \noindent However, our arithmetic expressions can also contain |
|
114 variables. We will represent them as \emph{local variables} in |
|
115 the JVM. Essentially, local variables are an array or pointers |
|
116 to the memory containing in our case only integers. Looking up |
|
117 a variable can be done by the instruction |
|
118 |
|
119 \begin{lstlisting}[mathescape,numbers=none] |
|
120 iload $index$ |
|
121 \end{lstlisting} |
|
122 |
|
123 \noindent |
|
124 which places the content of the local variable $index$ onto |
|
125 thestack. Storing the top of the stack into a local variable |
|
126 can be done by the instruction |
|
127 |
|
128 \begin{lstlisting}[mathescape,numbers=none] |
|
129 istore $index$ |
|
130 \end{lstlisting} |
|
131 |
|
132 \noindent Note that this also pops off the top of the stack. |
|
133 One problem we have to overcome is that local variables are |
|
134 addressed, not by identifiers, but by numbers (starting from |
|
135 $0$). Therefore our compiler needs to maintain a kind of |
|
136 environment (similar to the interpreter) where variables are |
|
137 associated to numbers. This association needs to be unique: if |
|
138 we muddle up the numbers, then we essentially confuse |
|
139 variables and the result will usually be an erroneous result. |
|
140 Therefore our \textit{compile}-function will take two |
|
141 arguments: the abstract syntax tree and the environment, $E$, |
|
142 that maps identifiers to index-numbers. |
|
143 |
|
144 \begin{center} |
|
145 \begin{tabular}{lcl} |
|
146 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ |
|
147 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & |
|
148 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\ |
|
149 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ & |
|
150 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\ |
|
151 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ & |
|
152 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\ |
|
153 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & |
|
154 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\ |
|
155 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\ |
|
156 \end{tabular} |
|
157 \end{center} |
|
158 |
|
159 \noindent In the last line we generate the code for variables |
|
160 where $E(x)$ stands for looking up to which index the variable |
|
161 $x$ maps to. |
|
162 |
24 |
163 |
25 \end{document} |
164 \end{document} |
26 |
165 |
27 %%% Local Variables: |
166 %%% Local Variables: |
28 %%% mode: latex |
167 %%% mode: latex |