changeset 711: 6f3f3dd01786
parent 710: 183663740fb7
child 712: e71eb9ce2373
\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers
together, replacing the top two elements of the stack with the result
$3$. For simplicity, we will consider throughout only arithmetic
involving integer numbers. This means our main JVM instructions for
arithmetic will be \instr{iadd}, \instr{isub}, \instr{imul},
\instr{idiv} and so on. The \code{i} stands for integer instructions in
the JVM (alternatives are \code{d} for doubles, \code{l} for longs and
\code{f} for floats etc).

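To make the stack discipline concrete, here is a small Scala sketch, not
part of our compiler, that executes such instructions on a stack of
integers (the constructor names \texttt{Ldc}, \texttt{Iadd} and so on
are made up just for this illustration):

\begin{lstlisting}[language=Scala]
// a toy stack machine, only for illustration: Ldc pushes a constant,
// Iadd/Isub/Imul/Idiv replace the two topmost elements by the result
abstract class Instr
case class Ldc(n: Int) extends Instr
case object Iadd extends Instr
case object Isub extends Instr
case object Imul extends Instr
case object Idiv extends Instr

def run(is: List[Instr], st: List[Int]) : List[Int] = (is, st) match {
  case (Nil, _) => st
  case (Ldc(n) :: rest, _) => run(rest, n :: st)
  case (Iadd :: rest, n2 :: n1 :: s) => run(rest, (n1 + n2) :: s)
  case (Isub :: rest, n2 :: n1 :: s) => run(rest, (n1 - n2) :: s)
  case (Imul :: rest, n2 :: n1 :: s) => run(rest, (n1 * n2) :: s)
  case (Idiv :: rest, n2 :: n1 :: s) => run(rest, (n1 / n2) :: s)
}

// run(List(Ldc(1), Ldc(2), Iadd), Nil) gives List(3)
\end{lstlisting}

\noindent
Note that the value pushed second ends up on top of the stack and
therefore becomes the right operand of \instr{isub} and \instr{idiv}.
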
Recall our grammar for arithmetic expressions (\meta{E} is the

[...]

\textit{compile}-function, which takes the abstract syntax tree as an
argument:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}

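In Scala, this definition can be sketched as follows, assuming abstract
syntax trees for arithmetic expressions with constructors \texttt{Num}
for numbers and \texttt{Aop} for binary operations (the name
\texttt{compile\_aexp} and the choice of representing the generated code
as a plain string are ours):

\begin{lstlisting}[language=Scala]
// abstract syntax of arithmetic expressions (as assumed for this sketch)
abstract class AExp
case class Num(i: Int) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// compiling an arithmetic expression to a string of JVM instructions
def compile_aexp(a: AExp) : String = a match {
  case Num(i) => s"ldc $i\n"
  case Aop("+", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "iadd\n"
  case Aop("-", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "isub\n"
  case Aop("*", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "imul\n"
  case Aop("/", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "idiv\n"
}
\end{lstlisting}

\noindent
The order of the two recursive calls matters: the value of $a_2$ ends up
on top of the stack and so becomes the right operand of \instr{isub} and
\instr{idiv}.
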
\noindent
This is all fine, but our arithmetic expressions can contain variables

[...]

expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}

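Extending the Scala sketch from above, the environment can be
represented as a finite map from variable names to indices; every clause
just passes the environment on, and the new clause for variables (the
last line of the definition above) looks up the index. The constructor
\texttt{Var} is again only assumed for this sketch:

\begin{lstlisting}[language=Scala]
// variables in the abstract syntax; environments map names to indices
case class Var(s: String) extends AExp
type Env = Map[String, Int]

def compile_aexp(a: AExp, env: Env) : String = a match {
  case Num(i) => s"ldc $i\n"
  case Var(s) => s"iload ${env(s)}\n"
  case Aop(op, a1, a2) =>
    val jvm = Map("+" -> "iadd", "-" -> "isub", "*" -> "imul", "/" -> "idiv")
    compile_aexp(a1, env) ++ compile_aexp(a2, env) ++ jvm(op) ++ "\n"
}
\end{lstlisting}
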
\noindent In the last line we generate the code for variables where
$E(x)$ stands for looking up in the environment to which index the variable

[...]

$\textit{x := a}$:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}

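Continuing the Scala sketch from above, this clause can be rendered as
follows; compiling a statement now returns a pair of the generated
instructions and the (possibly extended) environment. The constructor
\texttt{Assign}, the name \texttt{compile\_stmt} and the scheme of
giving a new variable the next free index are our choices for this
sketch:

\begin{lstlisting}[language=Scala]
// statements (only the assignment case is sketched here)
abstract class Stmt
case class Assign(x: String, a: AExp) extends Stmt

def compile_stmt(s: Stmt, env: Env) : (String, Env) = s match {
  case Assign(x, a) =>
    // a known variable keeps its index, a new one gets the next free slot
    val index = env.getOrElse(x, env.size)
    (compile_aexp(a, env) ++ s"istore $index\n", env + (x -> index))
}
\end{lstlisting}

\noindent
If $x$ is already in the environment, then $E' = E$; otherwise the
environment grows by exactly one binding.
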
\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention, running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After the \instr{istore} instruction, the result will be
stored at the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is, in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$

[...]

where we can store values of variables.

A bit more complicated is the generation of code for
\pcode{if}-statements, say

\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}

\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we

[...]

$cs_1$. After this, however, we must not run the code for
$cs_2$, but always jump to after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding instruction of the JVM is
\instr{goto}. In case $b$ turns out to be false, we need the
control-flow

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,

[...]

if-condition is false) from the end of the code for the
boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$, we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need addresses to
where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go to.

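Since we will need a new, unique label for every jump target we
introduce, it is convenient to have a small helper that generates fresh
labels. One simple way to do this in Scala (the name \texttt{Fresh} and
the naming scheme are just our choice for this sketch) is a counter:

\begin{lstlisting}[language=Scala]
// a global counter: each call to Fresh returns the given base name
// with a new numeric suffix, so labels never clash
var counter = -1

def Fresh(x: String) : String = {
  counter += 1
  x ++ "_" ++ counter.toString
}

// e.g. Fresh("If_else") gives "If_else_0", then "If_else_1", ...
\end{lstlisting}
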
A label, say \pcode{L}, is attached to code
like

\begin{lstlisting}[mathescape,numbers=none]
L:
$\textit{instr\_1}$
$\textit{instr\_2}$
$\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels

[...]

for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This will mean after running

[...]

condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on. If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the

[...]

return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
hardwires limits for stack space and for the number of local
variables.\label{boiler}}
\end{figure}


To sum up, in Figure~\ref{test} is the complete code generated

[...]

multiple arrays in a WHILE-program. For looking up an element in an
array we can use the following JVM code

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
iaload
\end{lstlisting}

\noindent
The first instruction loads the ``pointer'', or local variable, to the

[...]

JVM to load the corresponding element onto the stack. Updating an array
at an index with a value is as follows.

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
$\textit{value\_aexp}$
iastore
\end{lstlisting}

\noindent
Again the first instruction loads the local variable of

[...]

generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag.

Putting this all together, we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, after a few more tweaks we can finally run the
BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the
corresponding WHILE-program; the size of the resulting class file is
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good news is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking. Which means we now face
the question of what to do next\ldots

\subsection*{Optimisations \& Co}

Every compiler that deserves its name performs optimisations on the
code. If we make the extra effort of writing a compiler for a language,
then obviously we want our code to run as fast as possible.
So we should look into this.

There is actually one aspect in our generated code where we can easily
make efficiency gains: this has to do with some of the quirks of the
JVM. Whenever we push a constant onto the stack, we have so far used the
JVM instruction \instr{ldc some_const}. This is a rather generic
instruction in the sense that it works not just for integers but also
for strings, objects and so on. What this instruction does is to put the
constant into a \emph{constant pool} and then to use an index into this
constant pool. This means \instr{ldc} will be represented by at least
two bytes in the class file. While this is sensible for ``large''
constants like strings, it is a bit of overkill for small integers
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example

\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
\end{itemize}

\noindent
where the \code{n} in \instr{bipush} is between -128 and 127. By
having dedicated instructions such as \instr{iconst_0} to
\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
size smaller, as these instructions only require 1 byte (as opposed to
the generic \instr{ldc}, which needs 1 byte plus another for the index
into the constant pool). While in theory the use of such special
instructions should only make the code smaller, it actually also makes
the code run faster. This is probably because the JVM has to process
less code and can use a specific instruction of the underlying CPU. The
story with \instr{bipush} is slightly different, because it also uses
two bytes---so it does not result in a reduction in code size. But
again, it probably uses a specific instruction of the underlying CPU
that makes the JVM code run faster. This means when generating code we
can use the following helper function

\begin{lstlisting}[language=Scala]
def compile_num(i: Int) =
  if (0 <= i && i <= 5) i"iconst_$i" else
  if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i"
\end{lstlisting}

\noindent
that generates the more efficient instructions for pushing a constant
onto the stack. Note that the JVM also has special instructions that
load and store the first four local variables. The assumption is that
most operations and arguments in a method will only use very few
local variables. So the JVM has the following instructions:

\begin{itemize}
\item \instr{iload_0},\ldots, \instr{iload_3}
\item \instr{istore_0},\ldots, \instr{istore_3}
\item \instr{aload_0},\ldots, \instr{aload_3}
\item \instr{astore_0},\ldots, \instr{astore_3}
\end{itemize}

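In the same spirit as \texttt{compile\_num} above, a small helper can
pick these short forms whenever the index allows it. Here is a sketch
for \instr{iload} only (the name \texttt{compile\_load} is our choice;
it uses the same \texttt{i}-string-interpolator as \texttt{compile\_num}):

\begin{lstlisting}[language=Scala]
// use the 1-byte short form iload_0 ... iload_3 when the index allows,
// otherwise fall back to the generic iload with an explicit index
def compile_load(idx: Int) =
  if (0 <= idx && idx <= 3) i"iload_$idx" else i"iload $idx"
\end{lstlisting}

\noindent
Analogous helpers can be written for \instr{istore}, \instr{aload} and
\instr{astore}.
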

\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program shrinks and the program also runs faster. According
to my very rough experiments:

\begin{center}
\begin{tabular}{lll}
 & class file size (bytes) & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised:   & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they are the kind of optimisation that involves changing a small
set of instructions into an equivalent set that has better performance.

If you look carefully at our code you will quickly find another source of
inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with, for example, the much cheaper instruction
\instr{dup}.

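For the simplest case, where the \instr{istore} for \texttt{x} and the
following \instr{iload} end up right next to each other in the generated
code, such a peephole rule can be sketched as follows (assuming, just
for this illustration, that the generated instructions are kept as a
list of strings rather than one big string):

\begin{lstlisting}[language=Scala]
// "istore n" immediately followed by "iload n" leaves the JVM in the
// same state as "dup" followed by "istore n", but avoids the reload
def peephole(is: List[String]) : List[String] = is match {
  case i1 :: i2 :: rest
    if i1.startsWith("istore") && i2 == i1.replace("istore", "iload") =>
      "dup" :: i1 :: peephole(rest)
  case i :: rest => i :: peephole(rest)
  case Nil => Nil
}
\end{lstlisting}
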
Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become much more
complicated. So we leave this kind of optimisation here and look at
something more interesting and possibly surprising.

As you have probably seen, the compiler writer has a lot of freedom
about how to generate code from what the programmer wrote as a program.
The only condition is that the generated code should behave as expected
by the programmer. Then all is fine\ldots mission accomplished! But
sometimes the compiler writer is expected to go an extra mile, or even
miles, and change the meaning of a program in unexpected ways. Suppose we
are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: We generate an array of size 10, and then try to access
the non-existing element at index 13, and even to update the element at
index 14. Obviously this is baloney. Still, our compiler generates code
for this program without any questions asked. We can even run this code
on the JVM\ldots of course the result is an exception trace where the
JVM yells at us for doing naughty things. (This is much better than C,
for example, where such errors are not prevented and as a result
insidious attacks can be mounted against such C-programs. I assume
everyone has heard about \emph{Buffer Overflow Attacks}.) Now what
should we do in such situations? Index over- or underflows are
notoriously difficult to detect statically (at compile-time), so it
seems that raising an exception at run-time, as the JVM does, is the
best compromise.

Well, imagine we do not want our compiler to rely on the JVM for
producing an annoying, but safe, exception trace; rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let's assume we want to handle them in the
following way: if the programmer accesses a field out-of-bounds, we just
return a default 0, and if the programmer wants to update an
out-of-bounds field, we ``quietly'' ignore this update.


arraylength
