handouts/ho07.tex
changeset 710 183663740fb7
parent 709 c112a6cb5e52
child 711 6f3f3dd01786
--- a/handouts/ho07.tex	Tue Jan 28 12:23:53 2020 +0000
+++ b/handouts/ho07.tex	Mon Feb 03 01:10:16 2020 +0000
@@ -4,9 +4,8 @@
 \usepackage{../langs}
 \usepackage{../grammar}
 \usepackage{../graphics}
-
-%% add safety check on references...whether it is above 0 and below the 
-%% index
+\usepackage{framed}
+\usepackage[belowskip=7pt,aboveskip=0pt]{caption}
 
 
 
@@ -21,15 +20,16 @@
 code would be machine code the CPU can run directly, but it is often
 good enough for improving the speed of a program to target a virtual
 machine instead. This produces not the fastest possible code, but code
-that is often pretty fast. This way of producing code has the advantage
-that the virtual machine takes care of things a compiler would normally
-need to take care of (hairy things like explicit memory management). 
+that is often pretty fast. This way of producing code also has the
+advantage that the virtual machine takes care of things a compiler
+would normally need to take care of (hairy things like explicit memory
+management). 
 
 As a first example in this module we will implement a compiler for the
 very simple WHILE-language that we parsed in the last lecture. The
 compiler will target the Java Virtual Machine (JVM), but not directly.
 Pictorially the compiler will work as follows:
-
+ 
 \begin{center}
   \begin{tikzpicture}[scale=1,font=\bf,
                       node/.style={
@@ -55,11 +55,11 @@
 human-readable machine code, meaning they are not just bits and bytes,
 but rather something you can read and understand---with a bit of
 practice of course. An \emph{assembler} will then translate the assembly
-files into unreadable class or binary files the JVM can run.
+files into unreadable class or binary files the JVM can run.
 Unfortunately, the Java ecosystem does not come with an assembler which
 would be handy for our compiler-endeavour  (unlike Microsoft's  Common
 Language Infrastructure for the .Net platform which has an assembler
-out-of-the-box). As a substitute we shall therefore use the 3rd-party
+out-of-the-box). As a substitute we shall use the 3rd-party
 programs Jasmin and Krakatau 
 
 \begin{itemize}
@@ -72,13 +72,13 @@
 Each of them allows us to generate \emph{assembly} files that are still
 readable by humans, as opposed to class-files which are pretty much just
 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
-an assembly file as input and generate the corresponding class file for
+our assembly files as input and generate the corresponding class-files for
 us. 
 
-Good about the JVM is that it is a stack-based virtual machine, a fact
-which will make it easy to generate code for arithmetic expressions. For
-example when compiling the expression $1 + 2$ we need to generate the
-following three instructions
+What is good about the JVM is that it is a stack-based virtual machine,
+a fact which will make it easy to generate code for arithmetic
+expressions. For example, when compiling the expression $1 + 2$ we need
+to generate the following three instructions
 
 \begin{lstlisting}[language=JVMIS,numbers=none]
 ldc 1
@@ -89,11 +89,12 @@
 \noindent The first instruction loads the constant $1$ onto the stack,
 the next one loads $2$, the third instruction adds both numbers together
 replacing the top two elements of the stack with the result $3$. For
-simplicity, we will consider throughout only integer numbers. This means
-our main JVM instructions for arithmetic will be \code{iadd},
-\code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for
-integer instructions in the JVM (alternatives are \code{d} for doubles,
-\code{l} for longs and \code{f} for floats etc).
+simplicity, throughout we will only consider arithmetic involving
+integer numbers. This means our main JVM instructions for arithmetic
+will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
+The \code{i} stands for integer instructions in the JVM (alternatives
+are \code{d} for doubles, \code{l} for longs and \code{f} for floats
+etc).
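+
+To illustrate, the same addition performed on doubles would use the
+\code{d}-variants of the instructions (a sketch; \code{ldc2_w} is the
+JVM instruction for pushing double or long constants):
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc2_w 1.0
+ldc2_w 2.0
+dadd
+\end{lstlisting}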
 
 Recall our grammar for arithmetic expressions (\meta{E} is the
 starting symbol):
@@ -172,7 +173,7 @@
 \noindent
 This is all fine, but our arithmetic expressions can contain variables
 and we have not considered them yet. To fix this we will represent our
-variables as the \emph{local variables} of the JVM. Essentially, local
+variables as \emph{local variables} of the JVM. Essentially, local
 variables are an array of pointers to memory cells, containing in our
 case only integers. Looking up a variable can be done with the
 instruction
@@ -268,7 +269,7 @@
 in the program, then we have to augment the environment and assign $x$
 with the largest index in $E$ plus one (that is $E' = E(x \mapsto
 largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
-generate the following code
+generate the following code snippet
 
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $n_x$
@@ -645,17 +646,17 @@
 \ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
 designers of Jasmin decided that this syntax is pleasing to the eye. So
 if you wanted to have strings in your Jasmin code, you would need to
-write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension,
-then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and
-so on. Looks all very ugly to my eyes.} Line~5 copies the integer we
-want to print out onto the stack. In the line after that we call the
-method \pcode{println} (from the class \pcode{java/io/PrintStream}). We
-want to print out an integer and do not expect anything back (that is
-why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction
-in the next line changes the control-flow back to the place from where
-\pcode{write} was called. This method needs to be part of a header that
-is included in any code we generate. The helper-method \pcode{write} can
-be invoked with the two instructions
+write \texttt{Ljava/lang/String;}\;. If you want arrays of one
+dimension, then use \texttt{[\ldots}; two dimensions, use
+\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
+copies the integer we want to print out onto the stack. In the line
+after that we call the method \pcode{println} (from the class
+\pcode{java/io/PrintStream}). We want to print out an integer and do not
+expect anything back (that is why the type annotation is \pcode{(I)V}).
+The \pcode{return}-instruction in the next line changes the control-flow
+back to the place from where \pcode{write} was called. This method needs
+to be part of a header that is included in any code we generate. The
+helper-method \pcode{write} can be invoked with the two instructions
 
 \begin{lstlisting}[mathescape,language=JVMIS]
 iload $E(x)$ 
@@ -683,6 +684,7 @@
 local variables.
 
 \begin{figure}[t]
+\begin{framed}
 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
 .class public XXX.XXX
 .super java/lang/Object
@@ -696,6 +698,7 @@
     return
 .end method
 \end{lstlisting}
+\end{framed}
 \caption{The boilerplate code needed for running generated code. It
   hardwires limits for stack space and number of local
   variables.\label{boiler}}
@@ -718,13 +721,15 @@
 
 
 \begin{figure}[p]
+\begin{framed}
 \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 \begin{tikzpicture}[remember picture,overlay]
   \draw[|<->|,very thick] (LA.north) -- (LB.south) 
-     node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
+     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
   \draw[|<->|,very thick] (LC.north) -- (LD.south)
-     node[left=0mm,midway] {\footnotesize\texttt{write x}};
+     node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
 \end{tikzpicture}
+\end{framed}
 \caption{The generated code for the test program \texttt{x := 1 + 2; write
 x}. This code can be processed by a Java assembler producing a
 class-file, which can then be run by the {\tt{}java}-program.\label{test}}
@@ -852,36 +857,86 @@
 
 \noindent 
 The idea behind the translation is that BF-programs operate on an array,
-called here \texttt{mem}. The BP-memory pointer into this array is
+called here \texttt{mem}. The BF-memory pointer into this array is
 represented as the variable \texttt{ptr}. As usual the BF-instructions
 \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
 instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
-Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary variable
-since we have not changed our write functions in order to cope with
-writing out any array-content directly. Lines 7 and 8 are for
+Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
+variable since we have not changed our write functions in order to cope
+with writing out any array-content directly. Lines 7 and 8 are for
 translating BF-loops. Line 8 is interesting in the sense that we need to
 generate a \code{skip} instruction just before finishing with the
 closing \code{"\}"}. The reason is that we are rather pedantic about
 semicolons in our WHILE-grammar: the last command cannot have a
-semicolon---adding a \code{skip} works around this snag. Putting all
-this together is we can generate WHILE-programs with more than 400
-instructions and then run the compiled JVM code for such programs.
-\bigskip
+semicolon---adding a \code{skip} works around this snag. 
+
+Putting all this together, we can generate WHILE-programs with more
+than 15K JVM-instructions; run the compiled JVM code for such
+programs and marvel at the output\ldots\medskip
 
 \noindent
-Hooooray, we can finally run the BF-mandelbrot program on the JVM and it
-completes within 20 seconds (after nearly 10 minutes of parsing the
-corresponding WHILE-program and generating 270K of a class file). Try
-replicating the 20 secs with an interpreter! OK, we now face the
-nagging question about what to do next\ldots
+\ldots{}Hooooray, we can finally run the BF-mandelbrot program on the JVM: it
+completes within 20 or so seconds (after nearly 10 minutes of parsing
+the corresponding WHILE-program; the size of the resulting class files
+is around 32K). Try replicating the 20 secs with an interpreter! The
+good news is that we now have a sufficiently complicated program in our
+WHILE-language for doing some benchmarking. This means we now face the
+question of what to do next\ldots
+
+\subsection*{Optimisations \& Co}
+
+Every compiler that deserves its name performs some optimisation on the
+code. If we make the extra effort of writing a compiler for a language,
+then obviously we want our code to run as fast as possible.
 
-\subsection*{Added Value}
+So let's optimise the code we generate a bit. There is actually one
+aspect of our generated code where we can easily make efficiency gains:
+it has to do with some of the quirks of the JVM. Whenever we push a
+constant onto the stack, we use the JVM instruction \code{ldc
+some_const}. This is a rather generic instruction in the sense that it
+works not just for integers but also for strings, objects and so on.
+What this instruction does is put the constant into a constant pool
+and then use an index into this constant pool. This means \code{ldc}
+will be represented by at least two bytes in the class file. While this
+is sensible for ``large'' constants like strings, it is a bit of an
+overkill for small integers (which many integers will be when compiling
+a BF-program). To counter this ``waste'', the JVM has specific
+instructions for small integers, for example
+ 
+\begin{itemize}
+\item \code{iconst_0},\ldots, \code{iconst_5}
+\item \code{bipush n}
+\end{itemize}
 
-% 33296 bytes -> 21882
-% shave off 2 seconds
+\noindent
+where the \code{n} in \code{bipush} is between -128 and 127. By having
+dedicated instructions such as \code{iconst_0} to \code{iconst_5} (and
+\code{iconst_m1}), we can make the generated code smaller as these
+instructions only require one byte (as opposed to the generic
+\code{ldc}, which needs one byte plus another for the index into the
+constant pool). While in theory the use of such special instructions
+should only make the code smaller, it actually also makes the code run
+faster, probably because the JVM has to process less code and can use a
+specific instruction of the underlying CPU. The story with \code{bipush}
+is slightly different: it also uses two bytes, so it does not result in
+a reduction in code size. But it probably also maps to a specific
+instruction of the underlying CPU which makes the JVM code run faster.
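+
+For instance, the following three ways of pushing a constant illustrate
+the size difference (a sketch; the byte counts are the standard JVM
+encodings of these instructions):
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+iconst_3        ; 1 byte
+bipush 42       ; 2 bytes (opcode plus a signed byte)
+ldc 300         ; 2 bytes (opcode plus a constant-pool index)
+\end{lstlisting}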
+
+The JVM provides similar specialised one-byte instructions for
+accessing local variables with small indices, for example
+
+\begin{itemize}
+\item \code{iload_0},\ldots, \code{iload_3}
+\item \code{istore_0},\ldots, \code{istore_3}
+\item \code{aload_0},\ldots, \code{aload_3}
+\item \code{astore_0},\ldots, \code{astore_3}
+\end{itemize}
+
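+\noindent
+These behave exactly like their generic counterparts but occupy only
+one byte each. For example (a sketch, with the variable at index 1),
+the code for pushing and storing a small constant shrinks from four
+bytes to two:
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+iconst_1       ; 1 byte (instead of ldc 1)
+istore_1       ; 1 byte (instead of istore 1)
+\end{lstlisting}
+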
+% 33296 bytes -> 21787
+% 21 ->  16 seconds
 
 As you have probably seen, the compiler writer has a lot of freedom
-about how to generate code from what the progarmmer wrote as program.
+about how to generate code from the program the programmer wrote.
 The only condition is that generated code should behave as expected by
 the programmer. Then all is fine\ldots mission accomplished! But
 sometimes the compiler writer is expected to go an extra mile, or even