56 |
56 |
57 \noindent The first instruction loads the constant $1$ onto |
57 \noindent The first instruction loads the constant $1$ onto |
58 the stack, the next one loads $2$, the third instruction adds both |
58 the stack, the next one loads $2$, the third instruction adds both |
59 numbers together replacing the top two elements of the stack |
59 numbers together replacing the top two elements of the stack |
60 with the result $3$. For simplicity, we will throughout |
60 with the result $3$. For simplicity, we will throughout |
61 consider only integer numbers and results. Therefore we can |
61 consider only integer numbers. Therefore we can |
62 use the JVM instructions \code{iadd}, \code{isub}, |
62 use the JVM instructions \code{iadd}, \code{isub}, |
63 \code{imul}, \code{idiv} and so on. The \code{i} stands for |
63 \code{imul}, \code{idiv} and so on. The \code{i} stands for |
64 integer instructions in the JVM (alternatives are \code{d} for |
64 integer instructions in the JVM (alternatives are \code{d} for |
65 doubles, \code{l} for longs and \code{f} for floats). |
65 doubles, \code{l} for longs and \code{f} for floats). |
66 |
66 |
162 a kind of environment where variables are associated to |
162 a kind of environment where variables are associated to |
163 numbers. This association needs to be unique: if we muddle up |
163 numbers. This association needs to be unique: if we muddle up |
164 the numbers, then we essentially confuse variables and the |
164 the numbers, then we essentially confuse variables and the |
165 consequence will usually be an erroneous result. Our extended |
165 consequence will usually be an erroneous result. Our extended |
166 \textit{compile}-function for arithmetic expressions will |
166 \textit{compile}-function for arithmetic expressions will |
167 therefore take two arguments: the abstract syntax tree and the |
167 therefore take two arguments: the abstract syntax tree and an |
168 environment, $E$, that maps identifiers to index-numbers. |
168 environment, $E$, that maps identifiers to index-numbers. |
169 |
169 |
170 \begin{center} |
170 \begin{center} |
171 \begin{tabular}{lcl} |
171 \begin{tabular}{lcl} |
172 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ |
172 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\ |
232 the index is and return the environment unchanged (that is in |
232 the index is and return the environment unchanged (that is in |
233 this case $E' = E$). However, if this is the first encounter |
233 this case $E' = E$). However, if this is the first encounter |
234 of the variable $x$ in the program, then we have to augment |
234 of the variable $x$ in the program, then we have to augment |
235 the environment and assign $x$ with the largest index in $E$ |
235 the environment and assign $x$ with the largest index in $E$ |
236 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). |
236 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). |
237 This means for the assignment $x := x + 1$ we generate the |
237 To sum up, for the assignment $x := x + 1$ we generate the |
238 following code |
238 following code |
239 |
239 |
240 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none] |
240 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none] |
241 iload $n_x$ |
241 iload $n_x$ |
242 ldc 1 |
242 ldc 1 |
243 iadd |
243 iadd |
244 istore $n_x$ |
244 istore $n_x$ |
245 \end{lstlisting} |
245 \end{lstlisting} |
246 |
246 |
247 \noindent |
247 \noindent |
248 where $n_x$ is the index (or pointer to the memory) for the variable $x$ . The code for |
248 where $n_x$ is the index (or pointer to the memory) for the variable |
249 looking-up the index for the variable is as follow: |
249 $x$. The code for looking-up the index for the variable is as follow: |
250 |
250 |
251 \begin{center} |
251 \begin{center} |
252 \begin{tabular}{lcl} |
252 \begin{tabular}{lcl} |
253 $index \;=\; E\textit{.getOrElse}(x, |E|)$ |
253 $index \;=\; E\textit{.getOrElse}(x, |E|)$ |
254 \end{tabular} |
254 \end{tabular} |
256 |
256 |
257 \noindent |
257 \noindent |
258 In case the environment $E$ contains an index for $x$, we return it. |
258 In case the environment $E$ contains an index for $x$, we return it. |
259 Otherwise we ``create'' a new index by returning the size $|E|$ of the |
259 Otherwise we ``create'' a new index by returning the size $|E|$ of the |
260 environment (that will be an index that is guaranteed to be not used |
260 environment (that will be an index that is guaranteed to be not used |
261 yet). |
261 yet). In all this we take advantage of the JVM which provides us with |
262 |
262 a potentially limitless supply of places where we can store values |
263 A bit more complicated is the generation of code for \pcode{if}-statements, say |
263 of variables. |
|
264 |
|
265 A bit more complicated is the generation of code for |
|
266 \pcode{if}-statements, say |
264 |
267 |
265 \begin{lstlisting}[mathescape,language={},numbers=none] |
268 \begin{lstlisting}[mathescape,language={},numbers=none] |
266 if $b$ then $cs_1$ else $cs_2$ |
269 if $b$ then $cs_1$ else $cs_2$ |
267 \end{lstlisting} |
270 \end{lstlisting} |
268 |
271 |
269 \noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the |
272 \noindent where $b$ is a boolean expression and where both $cs_{1/2}$ |
270 statements for each of the \pcode{if}-branches. Lets assume we already |
273 are the statements for each of the \pcode{if}-branches. Lets assume we |
271 generated code for $b$ and $cs_{1/2}$. Then in the true-case the |
274 already generated code for $b$ and $cs_{1/2}$. Then in the true-case the |
272 control-flow of the program needs to be |
275 control-flow of the program needs to behave as |
273 |
276 |
274 \begin{center} |
277 \begin{center} |
275 \begin{tikzpicture}[node distance=2mm and 4mm, |
278 \begin{tikzpicture}[node distance=2mm and 4mm, |
276 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
279 block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, |
277 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
280 point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, |
344 $instr_1$ |
347 $instr_1$ |
345 $instr_2$ |
348 $instr_2$ |
346 $\vdots$ |
349 $\vdots$ |
347 \end{lstlisting} |
350 \end{lstlisting} |
348 |
351 |
349 \noindent where a label is indicated by a colon. |
352 \noindent where a label is indicated by a colon. The task of the |
|
353 assmbler (in our case Jasmin or Krakatau) is to resolve the labels |
|
354 to actual addresses, for example jump 10 instructions forward, |
|
355 or 20 instructions backwards. |
350 |
356 |
351 Recall the ``trick'' with compiling boolean expressions: the |
357 Recall the ``trick'' with compiling boolean expressions: the |
352 \textit{compile}-function for boolean expressions takes three |
358 \textit{compile}-function for boolean expressions takes three |
353 arguments: an abstract syntax tree, an environment for |
359 arguments: an abstract syntax tree, an environment for |
354 variable indices and also the label, $lab$, to where an conditional |
360 variable indices and also the label, $lab$, to where an conditional |
370 with the next instructions (see control-flow of ifs above). |
376 with the next instructions (see control-flow of ifs above). |
371 However if they are \emph{not} equal, then we need to |
377 However if they are \emph{not} equal, then we need to |
372 (conditionally) jump to the label $lab$. This can be done with |
378 (conditionally) jump to the label $lab$. This can be done with |
373 the instruction |
379 the instruction |
374 |
380 |
375 \begin{lstlisting}[mathescape,numbers=none] |
381 \begin{lstlisting}[mathescape,numbers=none,language=JVMIS] |
376 if_icmpne $lab$ |
382 if_icmpne $lab$ |
377 \end{lstlisting} |
383 \end{lstlisting} |
378 |
384 |
379 \noindent Other jump instructions for boolean operators are |
385 \noindent Other jump instructions for boolean operators are |
380 |
386 |
385 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ |
391 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ |
386 \end{tabular} |
392 \end{tabular} |
387 \end{center} |
393 \end{center} |
388 |
394 |
389 \noindent and so on. I leave it to you to extend the |
395 \noindent and so on. I leave it to you to extend the |
390 \textit{compile}-function for the other boolean expressions. |
396 \textit{compile}-function for the other boolean expressions. Note that |
391 Note that we need to jump whenever the boolean is \emph{not} |
397 we need to jump whenever the boolean is \emph{not} true, which means we |
392 true, which means we have to ``negate'' the jump |
398 have to ``negate'' the jump condition---equals becomes not-equal, less |
393 condition---equals becomes not-equal, less becomes |
399 becomes greater-or-equal. If you do not like this design (it can be the |
394 greater-or-equal. If you do not like this design (it can be |
400 source of some nasty, hard-to-detect errors), you can also change the |
395 the source of some nasty, hard-to-detect errors), you can also |
401 layout of the code and first give the code for the else-branch and then |
396 change the layout of the code and first give the code for the |
402 for the if-branch. However in the case of while-loops this |
397 else-branch and then for the if-branch. However in the case |
403 ``upside-down-inside-out'' way of generating code still seems the most |
398 of while-loops this way of generating code still seems |
404 convenient. |
399 the most convenient. |
|
400 |
405 |
401 We are now ready to give the compile function for |
406 We are now ready to give the compile function for |
402 if-statements---remember this function returns for statements a |
407 if-statements---remember this function returns for statements a |
403 pair consisting of the code and an environment: |
408 pair consisting of the code and an environment: |
404 |
409 |
667 \begin{lstlisting}[mathescape,language=While] |
672 \begin{lstlisting}[mathescape,language=While] |
668 x := 1 + 2; |
673 x := 1 + 2; |
669 write x |
674 write x |
670 \end{lstlisting} |
675 \end{lstlisting} |
671 |
676 |
672 \noindent Having this code at our disposal, we need the |
677 \noindent I let you read the code and make sure the code behaves as |
673 assembler to translate the generated code into JVM bytecode (a |
678 expected. Having this code at our disposal, we need the assembler to |
674 class file). This bytecode is understood by the JVM and can be |
679 translate the generated code into JVM bytecode (a class file). This |
675 run by just invoking the \pcode{java}-program. |
680 bytecode is then understood by the JVM and can be run by just invoking |
|
681 the \pcode{java}-program. |
676 |
682 |
677 |
683 |
678 \begin{figure}[p] |
684 \begin{figure}[p] |
679 \lstinputlisting[language=JVMIS]{../progs/test-small.j} |
685 \lstinputlisting[language=JVMIS]{../progs/test-small.j} |
680 \caption{Generated code for a test program. This code can be |
686 \caption{Generated code for a test program. This code can be |
698 \noindent |
704 \noindent |
699 The first construct is for creating new arrays, in this instance the |
705 The first construct is for creating new arrays, in this instance the |
700 name of the array is \pcode{arr} and it can hold 15000 integers. The |
706 name of the array is \pcode{arr} and it can hold 15000 integers. The |
701 second is for referencing an array cell inside an arithmetic |
707 second is for referencing an array cell inside an arithmetic |
702 expression---we need to be able to look up the contents of an array at |
708 expression---we need to be able to look up the contents of an array at |
703 an index determined by an arithmetic expression. Similarly, we need to |
709 an index determined by an arithmetic expression. Similarly in the line |
704 be able to update the content of an array at an calculated index---the |
710 below, we need to be able to update the content of an array at an |
705 3rd feature. |
711 calculated index. |
706 |
712 |
707 For creating a new array we can generate the following three JVM |
713 For creating a new array we can generate the following three JVM |
708 instructions: |
714 instructions: |
709 |
715 |
710 \begin{lstlisting}[mathescape,language=JVMIS] |
716 \begin{lstlisting}[mathescape,language=JVMIS] |
715 |
721 |
716 \noindent |
722 \noindent |
717 First we need to put the dimension of the array onto the stack. The next |
723 First we need to put the dimension of the array onto the stack. The next |
718 instruction creates the array. With the last we can store the array as a |
724 instruction creates the array. With the last we can store the array as a |
719 local variable (like the ``simple'' variables from the previous |
725 local variable (like the ``simple'' variables from the previous |
720 section). The use of a local variable allows us to have multiple arrays |
726 section). The use of a local variable for each array allows us to have |
721 in a While-program. For looking up an element in an array we can use the |
727 multiple arrays in a While-program. For looking up an element in an |
722 following JVM code |
728 array we can use the following JVM code |
723 |
729 |
724 \begin{lstlisting}[mathescape,language=JVMIS] |
730 \begin{lstlisting}[mathescape,language=JVMIS] |
725 aload loc_var |
731 aload loc_var |
726 index_aexp |
732 index_aexp |
727 iaload |
733 iaload |
744 \end{lstlisting} |
750 \end{lstlisting} |
745 |
751 |
746 \noindent |
752 \noindent |
747 Again the first instruction loads the ``pointer'' to the array onto the |
753 Again the first instruction loads the ``pointer'' to the array onto the |
748 stack. Then we have some instructions corresponding to the index where |
754 stack. Then we have some instructions corresponding to the index where |
749 we want to update the array. After that come the instructions for with what |
755 we want to update the array. After that come the instructions for with |
750 value we want to update the array. Finally the instruction for |
756 what value we want to update the array. The last line contains the |
751 updating the array. |
757 instruction for updating the array. |
752 |
758 |
753 Next we need to update our grammar rules: it seems best to extend the |
759 Next we need to modify our grammar rules for our While-language: it |
754 rule for factors in arithmetic expressions with a rule for looking up |
760 seems best to extend the rule for factors in arithmetic expressions with |
755 an array. |
761 a rule for looking up an array. |
756 |
762 |
757 \begin{plstx}[rhs style=, margin=3cm] |
763 \begin{plstx}[rhs style=, margin=3cm] |
758 : \meta{E} ::= \meta{T} $+$ \meta{E} |
764 : \meta{E} ::= \meta{T} $+$ \meta{E} |
759 | \meta{T} $-$ \meta{E} |
765 | \meta{T} $-$ \meta{E} |
760 | \meta{T}\\ |
766 | \meta{T}\\ |
767 | \meta{Num}\\ |
773 | \meta{Num}\\ |
768 \end{plstx} |
774 \end{plstx} |
769 |
775 |
770 \noindent |
776 \noindent |
771 There is no problem with left-recursion as the \meta{E} is ``protected'' |
777 There is no problem with left-recursion as the \meta{E} is ``protected'' |
772 by an identifier and brackets. There are two new rules in statements, |
778 by an identifier and the brackets. There are two new rules for statements, |
773 namely creation of an array and array assignment: |
779 one for creating an array and one for array assignment: |
774 |
780 |
775 \begin{plstx}[rhs style=, margin=2cm, one per line] |
781 \begin{plstx}[rhs style=, margin=2cm, one per line] |
776 : \meta{Stmt} ::= \ldots |
782 : \meta{Stmt} ::= \ldots |
777 | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] |
783 | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] |
778 | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\ |
784 | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\ |
779 \end{plstx} |
785 \end{plstx} |
780 |
786 |
|
787 With this in place we can turn back to the idea of creating While |
|
788 programs by translating BF programs. This is a relatively easy task |
|
789 because BF only has eight instructions (we will actually only have seven |
|
790 because we can omit the read-in instruction from BF). But also translating |
|
791 BF-loops is going to be easy since they straightforwardly can be |
|
792 represented by While-loops. The Scala code for the translation is |
|
793 as follows: |
|
794 |
|
795 \begin{lstlisting}[language=Scala,numbers=left] |
|
796 def instr(c: Char) : String = c match { |
|
797 case '>' => "ptr := ptr + 1;" |
|
798 case '<' => "ptr := ptr - 1;" |
|
799 case '+' => "field[ptr] := field[ptr] + 1;" |
|
800 case '-' => "field[ptr] := field[ptr] - 1;" |
|
801 case '.' => "x := field[ptr]; write x;" |
|
802 case '[' => "while (field[ptr] != 0) do {" |
|
803 case ']' => "skip};" |
|
804 case _ => "" |
|
805 } |
|
806 \end{lstlisting} |
|
807 |
|
808 \noindent |
|
809 The idea behind the translation is that BF-programs operate on an array, |
|
810 called \texttt{field}. The BP-memory pointer into this array is |
|
811 represented as the variable \texttt{ptr}. The BF-instructions \code{>} |
|
812 and \code{<} increase, respectively decrease, \texttt{ptr}. The |
|
813 instructions \code{+} and \code{-} update a cell in \texttt{field}. In |
|
814 Line 6 we need to first assign a field-cell to an auxiliary variable |
|
815 since we have not changed our write functions in order to cope with |
|
816 writing out any array-content directly. Lines 7 and 8 are for |
|
817 translating BF-loops. Line 8 is interesting in the sense that we need to |
|
818 generate a \code{skip} instruction just before finishing with the |
|
819 closing \code{"\}"}. The reason is that we are rather pedantic about |
|
820 semicolons in our While-grammar: the last command cannot have a |
|
821 semicolon---adding a \code{skip} works around this snag. Putting |
|
822 all this together is we can generate While-programs with more than |
|
823 400 instructions and then run the compiled JVM code for such programs. |
|
824 |
|
825 |
781 \end{document} |
826 \end{document} |
782 |
827 |
783 %%% Local Variables: |
828 %%% Local Variables: |
784 %%% mode: latex |
829 %%% mode: latex |
785 %%% TeX-master: t |
830 %%% TeX-master: t |