handouts/ho07.tex
changeset 711 6f3f3dd01786
parent 710 183663740fb7
child 712 e71eb9ce2373
--- a/handouts/ho07.tex	Mon Feb 03 01:10:16 2020 +0000
+++ b/handouts/ho07.tex	Tue Feb 04 09:31:18 2020 +0000
@@ -91,7 +91,7 @@
 replacing the top two elements of the stack with the result $3$. For
 simplicity, we will consider throughout only arithmetic involving
 integer numbers. This means our main JVM instructions for arithmetic
-will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
+will be \instr{iadd}, \instr{isub}, \instr{imul}, \instr{idiv} and so on.
 The \code{i} stands for integer instructions in the JVM (alternatives
 are \code{d} for doubles, \code{l} for longs and \code{f} for floats
 etc).
@@ -158,15 +158,15 @@
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
+$\textit{compile}(n)$ & $\dn$ & $\instr{ldc}\; n$\\
 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
-$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \instr{iadd}$\\
 $\textit{compile}(a_1 - a_2)$ & $\dn$ & 
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{isub}$\\
 $\textit{compile}(a_1 * a_2)$ & $\dn$ & 
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{imul}$\\
 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
-$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \instr{idiv}$\\
 \end{tabular}
 \end{center}
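+
+\noindent
+In Scala, this compilation scheme can be implemented with a
+straightforward pattern match over the abstract syntax tree. The
+following is only a sketch: the constructor names \texttt{Num} and
+\texttt{Aop} are assumptions about the WHILE-AST, not fixed by this
+handout, and instructions are generated as plain strings.
+
+\begin{lstlisting}[language=Scala]
+// hypothetical AST for arithmetic expressions
+abstract class AExp
+case class Num(n: Int) extends AExp
+case class Aop(op: String, a1: AExp, a2: AExp) extends AExp
+
+// sketch of the compilation scheme above
+def compile(a: AExp): String = a match {
+  case Num(n) => s"ldc $n\n"
+  case Aop("+", a1, a2) => compile(a1) ++ compile(a2) ++ "iadd\n"
+  case Aop("-", a1, a2) => compile(a1) ++ compile(a2) ++ "isub\n"
+  case Aop("*", a1, a2) => compile(a1) ++ compile(a2) ++ "imul\n"
+  case Aop("/", a1, a2) => compile(a1) ++ compile(a2) ++ "idiv\n"
+}
+\end{lstlisting}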
 
@@ -204,16 +204,16 @@
 
 \begin{center}
 \begin{tabular}{lcl}
-$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
+$\textit{compile}(n, E)$ & $\dn$ & $\instr{ldc}\;n$\\
 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
-$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{iadd}$\\
 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{isub}$\\
 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{imul}$\\
 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
-$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
-$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \instr{idiv}$\\
+$\textit{compile}(x, E)$ & $\dn$ & $\instr{iload}\;E(x)$\\
 \end{tabular}
 \end{center}
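+
+\noindent
+A sketch of how the environment changes the implementation: the
+environment can be represented as a Scala \texttt{Map} from variable
+names to indices, and only the variable case actually consults it (the
+constructor name \texttt{Var} is again an assumption).
+
+\begin{lstlisting}[language=Scala]
+type Env = Map[String, Int]
+case class Var(x: String) extends AExp
+
+def compile(a: AExp, env: Env): String = a match {
+  case Num(n) => s"ldc $n\n"
+  case Var(x) => s"iload ${env(x)}\n"  // look up the index of x
+  case Aop(op, a1, a2) =>
+    // pick the arithmetic instruction for the operator
+    val instr = Map("+" -> "iadd", "-" -> "isub",
+                    "*" -> "imul", "/" -> "idiv")(op)
+    compile(a1, env) ++ compile(a2, env) ++ instr ++ "\n"
+}
+\end{lstlisting}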
 
@@ -253,15 +253,15 @@
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{compile}(x := a, E)$ & $\dn$ & 
-$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
+$(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
 \end{tabular}
 \end{center}
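+
+\noindent
+As a sketch in Scala (the helper name \texttt{compile\_assign} and the
+index-allocation policy are assumptions): compiling a statement returns
+a pair of the generated code and the possibly extended environment.
+
+\begin{lstlisting}[language=Scala]
+def compile_assign(x: String, a: AExp, env: Env): (String, Env) = {
+  // reuse the index of x if it exists, otherwise allocate
+  // the next free index
+  val index = env.getOrElse(x, env.size)
+  (compile(a, env) ++ s"istore $index\n", env + (x -> index))
+}
+\end{lstlisting}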
 
 \noindent We first generate code for the right-hand side of the
 assignment (that is the arithmetic expression $a$) and then add an
-\pcode{istore}-instruction at the end. By convention running the code
+\instr{istore}-instruction at the end. By convention running the code
 for the arithmetic expression $a$ will leave the result on top of the
-stack. After that the \pcode{istore} instruction, the result will be
+stack. After the \instr{istore} instruction, the result will be
 stored in the index corresponding to the variable $x$. If the variable
 $x$ has been used before in the program, we just need to look up what
 the index is and return the environment unchanged (that is in this case
@@ -299,7 +299,7 @@
 A bit more complicated is the generation of code for
 \pcode{if}-statements, say
 
-\begin{lstlisting}[mathescape,language={},numbers=none]
+\begin{lstlisting}[mathescape,language=While,numbers=none]
 if $b$ then $cs_1$ else $cs_2$
 \end{lstlisting}
 
@@ -338,7 +338,7 @@
 (the code for the \pcode{else}-branch). Note that this jump is
 unconditional, meaning we always have to jump to the end of
 $cs_2$. The corresponding instruction of the JVM is
-\pcode{goto}. In case $b$ turns out to be false we need the
+\instr{goto}. In case $b$ turns out to be false we need the
 control-flow
 
 \begin{center}
@@ -370,7 +370,7 @@
 are finished with running $cs_2$ we can continue with whatever
 code comes after the if-statement.
 
-The \pcode{goto} and the conditional jumps need addresses to
+The \instr{goto} and the conditional jumps need addresses to
 where the jump should go. Since we are generating assembly
 code for the JVM, we do not actually have to give (numeric)
 addresses, but can just attach (symbolic) labels to our code.
@@ -381,8 +381,8 @@
 
 \begin{lstlisting}[mathescape,numbers=none]
 L:
-  $instr_1$
-  $instr_2$
+  $\textit{instr\_1}$
+  $\textit{instr\_2}$
     $\vdots$
 \end{lstlisting}
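+
+\noindent
+For this we need a supply of fresh label names. A minimal sketch in
+Scala, based on a global counter (one of several possible ways to do
+this):
+
+\begin{lstlisting}[language=Scala]
+// generate fresh labels using a global counter
+var counter = -1
+
+def Fresh(x: String): String = {
+  counter += 1
+  s"${x}_$counter"   // Fresh("L") gives L_0, L_1, L_2, ...
+}
+\end{lstlisting}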
  
@@ -401,7 +401,7 @@
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ 
-\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$}
+\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \instr{if_icmpne}\;lab$}
 \end{tabular}
 \end{center}
 
@@ -429,9 +429,9 @@
 
 \begin{center}
 \begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l}
-$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\
-$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\
-$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
+$\not=$ & $\Rightarrow$ & \instr{if_icmpeq}\\
+$<$ & $\Rightarrow$ & \instr{if_icmpge}\\
+$\le$ & $\Rightarrow$ & \instr{if_icmpgt}\\
 \end{tabular}
 \end{center}
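+
+\noindent
+A Scala sketch of these cases (the constructor name \texttt{Bop} and
+the treatment of other operators are assumptions): the generated code
+jumps to $lab$ precisely when the boolean expression evaluates to
+\emph{false}.
+
+\begin{lstlisting}[language=Scala]
+abstract class BExp
+case class Bop(op: String, a1: AExp, a2: AExp) extends BExp
+
+def compile_bexp(b: BExp, env: Env, lab: String): String = b match {
+  case Bop(op, a1, a2) =>
+    // negated jump instruction for each comparison
+    val jmp = Map("=" -> "if_icmpne", "!=" -> "if_icmpeq",
+                  "<" -> "if_icmpge", "<=" -> "if_icmpgt")(op)
+    compile(a1, env) ++ compile(a2, env) ++ s"$jmp $lab\n"
+}
+\end{lstlisting}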
 
@@ -700,7 +700,7 @@
 \end{lstlisting}
 \end{framed}
 \caption{The boilerplate code needed for running generated code. It
-  hardwires limits for stack space and number of local
+  hardwires limits for stack space and for the number of local
   variables.\label{boiler}}
 \end{figure}
 
@@ -779,7 +779,7 @@
 
 \begin{lstlisting}[mathescape,language=JVMIS]
 aload loc_var 
-index_aexp 
+$\textit{index\_aexp}$ 
 iaload
 \end{lstlisting}
 
@@ -794,8 +794,8 @@
 
 \begin{lstlisting}[mathescape,language=JVMIS]
 aload loc_var 
-index_aexp 
-value_aexp 
+$\textit{index\_aexp}$ 
+$\textit{value\_aexp}$ 
 iastore
 \end{lstlisting}
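+
+\noindent
+As Scala helper functions, this could look as follows (a sketch; the
+arguments stand for the local variable holding the array reference and
+the already-compiled index and value expressions):
+
+\begin{lstlisting}[language=Scala]
+// read an array element: array ref, index, then iaload
+def compile_aread(loc_var: Int, index_code: String): String =
+  s"aload $loc_var\n" ++ index_code ++ "iaload\n"
+
+// update an array element: array ref, index, value, then iastore
+def compile_aupdate(loc_var: Int, index_code: String,
+                    value_code: String): String =
+  s"aload $loc_var\n" ++ index_code ++ value_code ++ "iastore\n"
+\end{lstlisting}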
 
@@ -870,77 +870,130 @@
 semicolons in our WHILE-grammar: the last command cannot have a
 semicolon---adding a \code{skip} works around this snag. 
 
-Putting all this together and we can generate WHILE-programs with more
+Putting this all together, we can generate WHILE-programs with more
 than 15K JVM-instructions; run the compiled JVM code for such
 programs and marvel at the output\ldots\medskip
 
 \noindent
-\ldots{}Hooooray, we can finally run the BF-mandelbrot program on the JVM: it
-completes within 20 or so seconds (after nearly 10 minutes of parsing
-the corresponding WHILE-program; the size of the resulting class files
-is around 32K). Try replicating the 20 secs with an interpreter! The
+\ldots{}Hooooray, after a few more tweaks we can finally run the
+BF-Mandelbrot program on the JVM (after nearly 10 minutes of parsing the
+corresponding WHILE-program; the size of the resulting class file is
+around 32K---not too bad). The generation of the picture completes
+within 20 or so seconds. Try replicating this with an interpreter! The
 good news is that we now have a sufficiently complicated program in our
 WHILE-language for doing some benchmarking. This means we now face the
 question of what to do next\ldots
 
 \subsection*{Optimisations \& Co}
 
-Every compiler that deserves its name performs some optimisation on the
+Every compiler that deserves its name performs optimisations on the
 code. If we make the extra effort of writing a compiler for a language,
-then obviously we want to have our code to run as fast as possible.
+then obviously we want our code to run as fast as possible. 
+So we should look into this.
 
-So let's optimise a bit the code we generate. There is actually one
-aspect in our generated code where we can make easily efficiency gains:
-this has to do with some of the quirks of the JVM. Whenever we push a
-constant onto the stack, we used the JVM instruction \code{ldc
-some_const}. This is a rather generic instructions in the sense that it
-works not just for integers but also for strings, objects and so on.
-What this instruction does is to put the constant into a constant pool
-and then to use an index to this constant pool. This means \code{ldc}
-will be represented by at least two bytes in the class file. While this
-is sensible for ``large'' constants like strings, it is a bit of
-overkill for small integers (which many integers will be when compiling
-a BF-program). To counter this ``waste'', the JVM has specific
-instructions for small integers, for example
+There is actually one aspect in our generated code where we can easily
+make efficiency gains: this has to do with some of the quirks of the
+JVM. Whenever we push a constant onto the stack, we have so far used
+the JVM instruction \instr{ldc some_const}. This is a rather generic
+instruction in the sense that it works not just for integers but also
+for strings, objects and so on. What this instruction does is to put
+the constant into a \emph{constant pool} and then use an index into
+this constant pool. This means \instr{ldc} will be represented by at
+least two bytes in the class file. While this is sensible for ``large''
+constants like strings, it is a bit of overkill for small integers
+(which many integers will be when compiling a BF-program). To counter
+this ``waste'', the JVM has specific instructions for small integers,
+for example
  
 \begin{itemize}
-\item \code{iconst_0},\ldots, \code{iconst_5}
-\item \code{bipush n}
+\item \instr{iconst_0},\ldots, \instr{iconst_5}
+\item \instr{bipush n}
 \end{itemize}
 
 \noindent
-where the \code{n} is \code{bipush} is between -128 and 128.   By having
-dedicated instructions such as \code{iconst_0} to \code{iconst_5} (and
-\code{iconst_m1}), we can make the generated code size smaller as these
-instructions only require 1 Byte (as opposed the generic \code{ldc}
-which needs 1 Byte plus another for the index into the constant pool).
-While in theory the use of such special instructions should make the
-code only smaller, it actually makes the code also run faster. Probably
-because the JVM has to process less code and uses a specific instruction
-in the underlying CPU.  The story with \code{bipush} is slightly
-different, because it also uses two Bytes---so it does not result in a
-reduction in code size. But probably it uses  specific instruction in
-the underlying CPU which make the JVM code run faster.  
+where the \code{n} in \instr{bipush} is between -128 and 127. By
+having dedicated instructions such as \instr{iconst_0} to
+\instr{iconst_5} (and \instr{iconst_m1}), we can make the generated code
+size smaller as these instructions only require 1 byte (as opposed to
+the generic \instr{ldc}, which needs 1 byte plus another for the index
+into the constant pool). While in theory the use of such special
+instructions should only make the code smaller, it actually also makes
+the code run faster, probably because the JVM has to process less code
+and can use a specific instruction of the underlying CPU. The story
+with \instr{bipush} is slightly different, because it also uses two
+bytes---so it does not result in a reduction in code size. But again,
+it probably uses a specific instruction of the underlying CPU that makes
+the JVM code run faster. This means when generating code we can use
+the following helper function
+
+\begin{lstlisting}[language=Scala]
+// pick the shortest JVM instruction for pushing the constant i
+def compile_num(i: Int) = 
+  if (0 <= i && i <= 5) i"iconst_$i" else 
+  if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i"
+\end{lstlisting}
+
+\noindent 
+that generates the more efficient instructions for pushing a constant
+onto the stack. Note that the JVM also has special instructions that
+load and store the first four local variables. The assumption is that
+most operations and arguments in a method will only use very few
+local variables. So the JVM has the following instructions, which we
+can exploit as sketched below:
+
+\begin{itemize}
+\item \instr{iload_0},\ldots, \instr{iload_3}
+\item \instr{istore_0},\ldots, \instr{istore_3}
+\item \instr{aload_0},\ldots, \instr{aload_3}
+\item \instr{astore_0},\ldots, \instr{astore_3}
+\end{itemize}
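+
+\noindent
+In the same style as \textit{compile\_num} above, we can sketch helpers
+that pick the shorter 1-byte forms whenever possible (shown here only
+for \instr{iload} and \instr{istore}):
+
+\begin{lstlisting}[language=Scala]
+// use the 1-byte forms iload_0 .. iload_3 where possible
+def compile_iload(idx: Int) =
+  if (0 <= idx && idx <= 3) i"iload_$idx" else i"iload $idx"
+
+def compile_istore(idx: Int) =
+  if (0 <= idx && idx <= 3) i"istore_$idx" else i"istore $idx"
+\end{lstlisting}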
 
 
+\noindent Having implemented these optimisations, the code size of the
+BF-Mandelbrot program shrinks and the code also runs faster. According
+to my very rough experiments:
 
+\begin{center}
+\begin{tabular}{lll}
+& class size (bytes) & runtime\\\hline
+Mandelbrot:\\
+\hspace{5mm}unoptimised: & 33296 & 21 secs\\
+\hspace{5mm}optimised:   & 21787 & 16 secs\\
+\end{tabular}
+\end{center}
+
+\noindent 
+Quite good! Such optimisations are called \emph{peephole
+optimisations}, because they are the type of optimisation that changes
+a small ``window'' of instructions into an equivalent sequence with
+better performance. 
 
-\begin{itemize}
-\item \code{iload_0},\ldots, \code{iload_3}
-\item \code{istore_0},\ldots, \code{istore_3}
-\item \code{aload_0},\ldots, \code{aload_3}
-\item \code{astore_0},\ldots, \code{astore_3}
-\end{itemize}
+If you look carefully at our code, you will quickly find another source
+of inefficiency in programs like
+
+\begin{lstlisting}[mathescape,language=While]
+x := ...;
+write x
+\end{lstlisting}
 
-% 33296 bytes -> 21787
-% 21 ->  16 seconds
+\noindent
+where our code first calculates the new result for \texttt{x} on the
+stack, then pops off the result into a local variable, and after that
+loads the local variable back onto the stack for writing out a number.
+If we can detect such situations, then we can leave the value of
+\texttt{x} on the stack using, for example, the much cheaper
+instruction \instr{dup}, which duplicates the top element of the stack.
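+
+\noindent
+For the snippet above, the optimised code could then look as follows
+(a sketch, where $n_x$ stands for the index of \texttt{x}):
+
+\begin{lstlisting}[mathescape,language=JVMIS]
+$\textit{code for the right-hand side of the assignment}$
+dup
+istore $n_x$
+$\textit{code that writes out the value still on the stack}$
+\end{lstlisting}
+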
+\noindent
+The problem with this optimisation is that it is easy for the snippet
+above, but what about instances where there is further WHILE-code in
+\emph{between} these two statements? Sometimes we will be able to
+optimise, sometimes we will not. The compiler needs to find out which
+situation applies. This can quickly become much more complicated. So we
+leave this kind of optimisation here and look at something more
+interesting and possibly surprising.
+
 
 As you have probably seen, the compiler writer has a lot of freedom
 about how to generate code from what the programmer wrote as program.
 The only condition is that generated code should behave as expected by
 the programmer. Then all is fine\ldots mission accomplished! But
 sometimes the compiler writer is expected to go an extra mile, or even
-miles. Suppose we are given the following WHILE-program:
+miles and change the meaning of a program in unexpected ways. Suppose we
+are given the following WHILE-program:
 
 \begin{lstlisting}[mathescape,language=While]
 new(arr[10]);
@@ -948,24 +1001,28 @@
 \end{lstlisting}
 
 \noindent 
-While admittedly this is a contrived program, and probably not meant to
-be like this by any sane programmer, it is supposed to make the
+Admittedly this is a contrived program, one that probably no sane
+programmer would write, but it is supposed to make the
 following point: We generate an array of size 10, and then try to access
 the non-existing element at index 13 and even update the element with
-index 14. Obviously this is baloney. However, our compiler generates
-code for this program without any questions asked. We can even run this
-code on the JVM\ldots of course the result is an exception trace where
-the JVM yells at us for doing naughty things. (This is much better than
-C, for example, where such errors are not prevented and as a result
+index 14. Obviously this is baloney. Still, our compiler generates code
+for this program without any questions asked. We can even run this code
+on the JVM\ldots of course the result is an exception trace where the
+JVM yells at us for doing naughty things. (This is much better than C,
+for example, where such errors are not prevented and as a result
 insidious attacks can be mounted against such kinds of C-programs. I assume
-everyone has heard about \emph{Buffer Overflow Attacks}.) 
+everyone has heard about \emph{Buffer Overflow Attacks}.) Now what
+should we do in such situations? Index over- or underflows are
+notoriously difficult to detect statically (at compile-time), so it
+seems raising an exception at run-time, like the JVM does, is the best
+compromise.
 
-Imagine we do not want to rely in our compiler on the JVM for producing
-an annoying, but safe exception trace, rather we want to handle such
-situations ourselves. Lets assume we want to handle them in the
+Well, imagine we do not want our compiler to rely on the JVM for
+producing an annoying, but safe, exception trace; rather we want to
+handle such situations ourselves according to what we think should
+happen in such cases. Let's assume we want to handle them in the
 following way: if the programmer accesses a field out-of-bounds, we just
-return the default 0, and if a programmer wants to update an
-out-of-bounds filed, we want to ``quietly'' ignore this update.
+return a default 0, and if a programmer wants to update an
+out-of-bounds field, we want to ``quietly'' ignore this update.
 
 
 arraylength