changeset 711: 6f3f3dd01786
parent 710: 183663740fb7
child 712: e71eb9ce2373
\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers
together, replacing the top two elements of the stack with the result
$3$. For simplicity, we will consider throughout only arithmetic
involving integer numbers. This means our main JVM instructions for
arithmetic will be \instr{iadd}, \instr{isub}, \instr{imul},
\instr{idiv} and so on. The \code{i} stands for integer instructions in
the JVM (alternatives are \code{d} for doubles, \code{l} for longs and
\code{f} for floats etc).

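To make the stack discipline concrete, here is a small Scala sketch, not
part of our compiler, that executes such instructions on a stack of
integers (the constructor names \texttt{Ldc}, \texttt{Iadd} and so on
are made up just for this illustration):

\begin{lstlisting}[language=Scala]
// a toy stack machine, only for illustration: Ldc pushes a constant,
// Iadd/Isub/Imul/Idiv replace the two topmost elements by the result
abstract class Instr
case class Ldc(n: Int) extends Instr
case object Iadd extends Instr
case object Isub extends Instr
case object Imul extends Instr
case object Idiv extends Instr

def run(is: List[Instr], st: List[Int]) : List[Int] = (is, st) match {
  case (Nil, _) => st
  case (Ldc(n) :: rest, _) => run(rest, n :: st)
  case (Iadd :: rest, n2 :: n1 :: s) => run(rest, (n1 + n2) :: s)
  case (Isub :: rest, n2 :: n1 :: s) => run(rest, (n1 - n2) :: s)
  case (Imul :: rest, n2 :: n1 :: s) => run(rest, (n1 * n2) :: s)
  case (Idiv :: rest, n2 :: n1 :: s) => run(rest, (n1 / n2) :: s)
}

// run(List(Ldc(1), Ldc(2), Iadd), Nil) gives List(3)
\end{lstlisting}

\noindent
Note that the value pushed second ends up on top of the stack and
therefore becomes the right operand of \instr{isub} and \instr{idiv}.
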
Recall our grammar for arithmetic expressions (\meta{E} is the

[...]

\textit{compile}-function, which takes the abstract syntax tree as an
argument:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
$\textit{compile}(a_1 + a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
\end{tabular}
\end{center}

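In Scala, this definition can be sketched as follows, assuming abstract
syntax trees for arithmetic expressions with constructors \texttt{Num}
for numbers and \texttt{Aop} for binary operations (the name
\texttt{compile\_aexp} and the choice of representing the generated code
as a plain string are ours):

\begin{lstlisting}[language=Scala]
// abstract syntax of arithmetic expressions (as assumed for this sketch)
abstract class AExp
case class Num(i: Int) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// compiling an arithmetic expression to a string of JVM instructions
def compile_aexp(a: AExp) : String = a match {
  case Num(i) => s"ldc $i\n"
  case Aop("+", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "iadd\n"
  case Aop("-", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "isub\n"
  case Aop("*", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "imul\n"
  case Aop("/", a1, a2) => compile_aexp(a1) ++ compile_aexp(a2) ++ "idiv\n"
}
\end{lstlisting}

\noindent
The order of the two recursive calls matters: the value of $a_2$ ends up
on top of the stack and so becomes the right operand of \instr{isub} and
\instr{idiv}.
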
\noindent
This is all fine, but our arithmetic expressions can contain variables

[...]

expressions will therefore take two arguments: the abstract syntax tree
and an environment, $E$, that maps identifiers to index-numbers.

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
\end{tabular}
\end{center}

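Extending the Scala sketch from above, the environment can be
represented as a finite map from variable names to indices; every clause
just passes the environment on, and the new clause for variables (the
last line of the definition above) looks up the index. The constructor
\texttt{Var} is again only assumed for this sketch:

\begin{lstlisting}[language=Scala]
// variables in the abstract syntax; environments map names to indices
case class Var(s: String) extends AExp
type Env = Map[String, Int]

def compile_aexp(a: AExp, env: Env) : String = a match {
  case Num(i) => s"ldc $i\n"
  case Var(s) => s"iload ${env(s)}\n"
  case Aop(op, a1, a2) =>
    val jvm = Map("+" -> "iadd", "-" -> "isub", "*" -> "imul", "/" -> "idiv")
    compile_aexp(a1, env) ++ compile_aexp(a2, env) ++ jvm(op) ++ "\n"
}
\end{lstlisting}
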
\noindent In the last line we generate the code for variables where
$E(x)$ stands for looking up in the environment to which index the variable

[...]

$\textit{x := a}$:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(x := a, E)$ & $\dn$ &
$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
\end{tabular}
\end{center}

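Continuing the Scala sketch from above, this clause can be rendered as
follows; compiling a statement now returns a pair of the generated
instructions and the (possibly extended) environment. The constructor
\texttt{Assign}, the name \texttt{compile\_stmt} and the scheme of
giving a new variable the next free index are our choices for this
sketch:

\begin{lstlisting}[language=Scala]
// statements (only the assignment case is sketched here)
abstract class Stmt
case class Assign(x: String, a: AExp) extends Stmt

def compile_stmt(s: Stmt, env: Env) : (String, Env) = s match {
  case Assign(x, a) =>
    // a known variable keeps its index, a new one gets the next free slot
    val index = env.getOrElse(x, env.size)
    (compile_aexp(a, env) ++ s"istore $index\n", env + (x -> index))
}
\end{lstlisting}

\noindent
If $x$ is already in the environment, then $E' = E$; otherwise the
environment grows by exactly one binding.
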
\noindent We first generate code for the right-hand side of the
assignment (that is the arithmetic expression $a$) and then add an
\instr{istore}-instruction at the end. By convention, running the code
for the arithmetic expression $a$ will leave the result on top of the
stack. After the \instr{istore} instruction, the result will be
stored at the index corresponding to the variable $x$. If the variable
$x$ has been used before in the program, we just need to look up what
the index is and return the environment unchanged (that is, in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$

[...]

where we can store values of variables.

A bit more complicated is the generation of code for
\pcode{if}-statements, say

\begin{lstlisting}[mathescape,language={WHILE},numbers=none]
if $b$ then $cs_1$ else $cs_2$
\end{lstlisting}

\noindent where $b$ is a boolean expression and where both $cs_{1/2}$
are the statements for each of the \pcode{if}-branches. Let us assume we

[...]

$cs_1$. After this, however, we must not run the code for
$cs_2$, but always jump to after the last instruction of $cs_2$
(the code for the \pcode{else}-branch). Note that this jump is
unconditional, meaning we always have to jump to the end of
$cs_2$. The corresponding instruction of the JVM is
\instr{goto}. In case $b$ turns out to be false, we need the
control-flow

\begin{center}
\begin{tikzpicture}[node distance=2mm and 4mm,line cap=round,
block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm,

[...]

if-condition is false) from the end of the code for the
boolean to the beginning of the instructions $cs_2$. Once we
are finished with running $cs_2$, we can continue with whatever
code comes after the if-statement.

The \instr{goto} and the conditional jumps need addresses to
where the jump should go. Since we are generating assembly
code for the JVM, we do not actually have to give (numeric)
addresses, but can just attach (symbolic) labels to our code.
These labels specify a target for a jump. Therefore the labels
need to be unique, as otherwise it would be ambiguous where a
jump should go to.

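Since we will need a new, unique label for every jump target we
introduce, it is convenient to have a small helper that generates fresh
labels. One simple way to do this in Scala (the name \texttt{Fresh} and
the naming scheme are just our choice for this sketch) is a counter:

\begin{lstlisting}[language=Scala]
// a global counter: each call to Fresh returns the given base name
// with a new numeric suffix, so labels never clash
var counter = -1

def Fresh(x: String) : String = {
  counter += 1
  x ++ "_" ++ counter.toString
}

// e.g. Fresh("If_else") gives "If_else_0", then "If_else_1", ...
\end{lstlisting}
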
A label, say \pcode{L}, is attached to code
like

\begin{lstlisting}[mathescape,numbers=none]
L:
$\textit{instr\_1}$
$\textit{instr\_2}$
$\vdots$
\end{lstlisting}

\noindent where the label needs to be followed by a colon. The task of
the assembler (in our case Jasmin or Krakatau) is to resolve the labels

[...]

for example, is as follows:

\begin{center}
\begin{tabular}{lcl}
$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\
\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
\end{tabular}
\end{center}

\noindent where we are first generating code for the
subexpressions $a_1$ and $a_2$. This will mean after running

[...]

condition---equals becomes not-equal, less becomes greater-or-equal.
Other jump instructions for boolean operators are

\begin{center}
\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
\end{tabular}
\end{center}

\noindent and so on. If you do not like this design (it can be the
source of some nasty, hard-to-detect errors), you can also change the

[...]

return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
hardwires limits for stack space and for the number of local
variables.\label{boiler}}
\end{figure}


To sum up, in Figure~\ref{test} is the complete code generated

[...]

multiple arrays in a WHILE-program. For looking up an element in an
array we can use the following JVM code

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
iaload
\end{lstlisting}

\noindent
The first instruction loads the ``pointer'', or local variable, to the

[...]

JVM to load the corresponding element onto the stack. Updating an array
at an index with a value is as follows.

\begin{lstlisting}[mathescape,language=JVMIS]
aload loc_var
$\textit{index\_aexp}$
$\textit{value\_aexp}$
iastore
\end{lstlisting}

\noindent
Again the first instruction loads the local variable of

[...]

generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag.

Putting this all together, we can generate WHILE-programs with more
than 15K JVM-instructions; run the compiled JVM code for such
programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, after a few more tweaks we can finally run the
BF-mandelbrot program on the JVM (after nearly 10 minutes of parsing the
corresponding WHILE-program; the size of the resulting class file is
around 32K---not too bad). The generation of the picture completes
within 20 or so seconds. Try replicating this with an interpreter! The
good news is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking. Which means we now face
the question of what to do next\ldots

\subsection*{Optimisations \& Co}

Every compiler that deserves its name performs optimisations on the
code. If we make the extra effort of writing a compiler for a language,
then obviously we want our code to run as fast as possible.
So we should look into this.

There is actually one aspect in our generated code where we can easily
make efficiency gains: this has to do with some of the quirks of the
JVM. Whenever we push a constant onto the stack, we have so far used the
JVM instruction \instr{ldc some_const}. This is a rather generic
instruction in the sense that it works not just for integers but also
for strings, objects and so on. What this instruction does is to put the
constant into a \emph{constant pool} and then to use an index into this
constant pool. This means \instr{ldc} will be represented by at least
two bytes in the class file. While this is sensible for ``large''
constants like strings, it is a bit of overkill for small integers
(which many integers will be when compiling a BF-program). To counter
this ``waste'', the JVM has specific instructions for small integers,
for example

\begin{itemize}
\item \instr{iconst_0},\ldots, \instr{iconst_5}
\item \instr{bipush n}
\end{itemize}

\noindent
where the \code{n} in \instr{bipush} is between -128 and 127. By
having dedicated instructions such as \instr{iconst_0} to
\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
size smaller, as these instructions only require 1 byte (as opposed to
the generic \instr{ldc}, which needs 1 byte plus another for the index
into the constant pool). While in theory the use of such special
instructions should only make the code smaller, it actually also makes
the code run faster. This is probably because the JVM has to process
less code and can use a specific instruction of the underlying CPU. The
story with \instr{bipush} is slightly different, because it also uses
two bytes---so it does not result in a reduction in code size. But
again, it probably uses a specific instruction of the underlying CPU
that makes the JVM code run faster. This means when generating code we
can use the following helper function

\begin{lstlisting}[language=Scala]
def compile_num(i: Int) =
  if (0 <= i && i <= 5) i"iconst_$i" else
  if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i"
\end{lstlisting}

\noindent
that generates the more efficient instructions for pushing a constant
onto the stack. Note that the JVM also has special instructions that
load and store the first four local variables. The assumption is that
most operations and arguments in a method will only use very few
local variables. So the JVM has the following instructions:

\begin{itemize}
\item \instr{iload_0},\ldots, \instr{iload_3}
\item \instr{istore_0},\ldots, \instr{istore_3}
\item \instr{aload_0},\ldots, \instr{aload_3}
\item \instr{astore_0},\ldots, \instr{astore_3}
\end{itemize}

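In the same spirit as \texttt{compile\_num} above, a small helper can
pick these short forms whenever the index allows it. Here is a sketch
for \instr{iload} only (the name \texttt{compile\_load} is our choice;
it uses the same \texttt{i}-string-interpolator as \texttt{compile\_num}):

\begin{lstlisting}[language=Scala]
// use the 1-byte short form iload_0 ... iload_3 when the index allows,
// otherwise fall back to the generic iload with an explicit index
def compile_load(idx: Int) =
  if (0 <= idx && idx <= 3) i"iload_$idx" else i"iload $idx"
\end{lstlisting}

\noindent
Analogous helpers can be written for \instr{istore}, \instr{aload} and
\instr{astore}.
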

\noindent Having implemented these optimisations, the code size of the
BF-Mandelbrot program shrinks and the program also runs faster. According
to my very rough experiments:

\begin{center}
\begin{tabular}{lll}
 & class file size (bytes) & runtime\\\hline
Mandelbrot:\\
\hspace{5mm}unoptimised: & 33296 & 21 secs\\
\hspace{5mm}optimised:   & 21787 & 16 secs\\
\end{tabular}
\end{center}

\noindent
Quite good! Such optimisations are called \emph{peephole optimisations},
because they are the kind of optimisation that involves changing a small
set of instructions into an equivalent set that has better performance.

If you look carefully at our code you will quickly find another source of
inefficiency in programs like

\begin{lstlisting}[mathescape,language=While]
x := ...;
write x
\end{lstlisting}

\noindent
where our code first calculates the new result for \texttt{x} on the
stack, then pops off the result into a local variable, and after that
loads the local variable back onto the stack for writing out a number.
If we can detect such situations, then we can leave the value of
\texttt{x} on the stack with, for example, the much cheaper instruction
\instr{dup}.

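For the simplest case, where the \instr{istore} for \texttt{x} and the
following \instr{iload} end up right next to each other in the generated
code, such a peephole rule can be sketched as follows (assuming, just
for this illustration, that the generated instructions are kept as a
list of strings rather than one big string):

\begin{lstlisting}[language=Scala]
// "istore n" immediately followed by "iload n" leaves the JVM in the
// same state as "dup" followed by "istore n", but avoids the reload
def peephole(is: List[String]) : List[String] = is match {
  case i1 :: i2 :: rest
    if i1.startsWith("istore") && i2 == i1.replace("istore", "iload") =>
      "dup" :: i1 :: peephole(rest)
  case i :: rest => i :: peephole(rest)
  case Nil => Nil
}
\end{lstlisting}
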
Now the problem with this optimisation is that it is quite
easy for the snippet above, but what about instances where there is
further WHILE-code in \emph{between} these two statements? Sometimes we
will be able to optimise, sometimes we will not. The compiler needs to
find out which situation applies. This can quickly become much more
complicated. So we leave this kind of optimisation here and look at
something more interesting and possibly surprising.

As you have probably seen, the compiler writer has a lot of freedom
about how to generate code from what the programmer wrote as a program.
The only condition is that the generated code should behave as expected
by the programmer. Then all is fine\ldots mission accomplished! But
sometimes the compiler writer is expected to go an extra mile, or even
miles, and change the meaning of a program in unexpected ways. Suppose we
are given the following WHILE-program:

\begin{lstlisting}[mathescape,language=While]
new(arr[10]);
arr[14] := 3 + arr[13]
\end{lstlisting}

\noindent
Admittedly this is a contrived program, and probably not meant to be
like this by any sane programmer, but it is supposed to make the
following point: We generate an array of size 10, and then try to access
the non-existing element at index 13, and even to update the element at
index 14. Obviously this is baloney. Still, our compiler generates code
for this program without any questions asked. We can even run this code
on the JVM\ldots of course the result is an exception trace where the
JVM yells at us for doing naughty things. (This is much better than C,
for example, where such errors are not prevented and as a result
insidious attacks can be mounted against such C-programs. I assume
everyone has heard about \emph{Buffer Overflow Attacks}.) Now what
should we do in such situations? Index over- or underflows are
notoriously difficult to detect statically (at compile-time), so it
seems that raising an exception at run-time, as the JVM does, is the
best compromise.

Well, imagine we do not want our compiler to rely on the JVM for
producing an annoying, but safe, exception trace; rather we want to
handle such situations ourselves according to what we think should
happen in such cases. Let's assume we want to handle them in the
following way: if the programmer accesses a field out-of-bounds, we just
return a default 0, and if the programmer wants to update an
out-of-bounds field, we ``quietly'' ignore this update.


arraylength
