--- a/handouts/ho07.tex Tue Jan 28 12:23:53 2020 +0000
+++ b/handouts/ho07.tex Mon Feb 03 01:10:16 2020 +0000
@@ -4,9 +4,8 @@
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
-
-%% add safety check on references...whether it is above 0 and below the
-%% index
+\usepackage{framed}
+\usepackage[belowskip=7pt,aboveskip=0pt]{caption}
@@ -21,15 +20,16 @@
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
-that is often pretty fast. This way of producing code has the advantage
-that the virtual machine takes care of things a compiler would normally
-need to take care of (hairy things like explicit memory management).
+that is often pretty fast. This way of producing code also has the
+advantage that the virtual machine takes care of things a compiler
+would normally need to take care of (hairy things like explicit memory
+management).
As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:
-
+
\begin{center}
\begin{tikzpicture}[scale=1,font=\bf,
node/.style={
@@ -55,11 +55,11 @@
human-readable machine code, meaning they are not just bits and bytes,
but rather something you can read and understand---with a bit of
practice of course. An \emph{assembler} will then translate the assembly
-files into unreadable class or binary files the JVM can run.
+files into unreadable class- or binary-files the JVM can run.
Unfortunately, the Java ecosystem does not come with an assembler which
would be handy for our compiler-endeavour (unlike Microsoft's Common
Language Infrastructure for the .Net platform which has an assembler
-out-of-the-box). As a substitute we shall therefore use the 3rd-party
+out-of-the-box). As a substitute we shall use the 3rd-party
programs Jasmin and Krakatau
\begin{itemize}
@@ -72,13 +72,13 @@
Each of them allows us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
-an assembly file as input and generate the corresponding class file for
+our assembly files as input and generate the corresponding class-files for
us.
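+For example, assuming Jasmin comes as the usual \texttt{jasmin.jar} and
+our assembly file \texttt{foo.j} declares the class \texttt{foo}, the
+whole tool-chain can be run as
+
+\begin{lstlisting}[numbers=none]
+$ java -jar jasmin.jar foo.j
+$ java foo
+\end{lstlisting}
+
+\noindent
+where the first command produces \texttt{foo.class} and the second runs
+it on the JVM.
+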
-Good about the JVM is that it is a stack-based virtual machine, a fact
-which will make it easy to generate code for arithmetic expressions. For
-example when compiling the expression $1 + 2$ we need to generate the
-following three instructions
+What is good about the JVM is that it is a stack-based virtual machine,
+a fact which will make it easy to generate code for arithmetic
+expressions. For example, when compiling the expression $1 + 2$ we need
+to generate the following three instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
@@ -89,11 +89,12 @@
\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third adds both numbers together,
replacing the top two elements of the stack with the result $3$. For
-simplicity, we will consider throughout only integer numbers. This means
-our main JVM instructions for arithmetic will be \code{iadd},
-\code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for
-integer instructions in the JVM (alternatives are \code{d} for doubles,
-\code{l} for longs and \code{f} for floats etc).
+simplicity, we will throughout only consider arithmetic involving
+integer numbers. This means our main JVM instructions for arithmetic
+will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
+The \code{i} stands for integer instructions in the JVM (alternatives
+are \code{d} for doubles, \code{l} for longs, \code{f} for floats,
+etc.).
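+
+To see this scheme on a compound expression: for $(1 + 2) * 3$ we would
+generate
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1
+ldc 2
+iadd
+ldc 3
+imul
+\end{lstlisting}
+
+\noindent
+which leaves the result $9$ on top of the stack.
+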
Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):
@@ -172,7 +173,7 @@
\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
-variables as the \emph{local variables} of the JVM. Essentially, local
+variables as \emph{local variables} of the JVM. Essentially, local
variables are an array of pointers to memory cells, containing in our
case only integers. Looking up a variable can be done with the
instruction
@@ -268,7 +269,7 @@
in the program, then we have to augment the environment and assign to
$x$ the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
-generate the following code
+generate the following code snippet
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
@@ -645,17 +646,17 @@
\ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
designers of Jasmin decided that this syntax is pleasing to the eye. So
if you wanted to have strings in your Jasmin code, you would need to
-write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension,
-then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and
-so on. Looks all very ugly to my eyes.} Line~5 copies the integer we
-want to print out onto the stack. In the line after that we call the
-method \pcode{println} (from the class \pcode{java/io/PrintStream}). We
-want to print out an integer and do not expect anything back (that is
-why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction
-in the next line changes the control-flow back to the place from where
-\pcode{write} was called. This method needs to be part of a header that
-is included in any code we generate. The helper-method \pcode{write} can
-be invoked with the two instructions
+write \texttt{Ljava/lang/String;}\;. If you want arrays of one
+dimension, then use \texttt{[\ldots}; two dimensions, use
+\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
+copies the integer we want to print out onto the stack. In the line
+after that we call the method \pcode{println} (from the class
+\pcode{java/io/PrintStream}). We want to print out an integer and do not
+expect anything back (that is why the type annotation is \pcode{(I)V}).
+The \pcode{return}-instruction in the next line changes the control-flow
+back to the place from where \pcode{write} was called. This method needs
+to be part of a header that is included in any code we generate. The
+helper-method \pcode{write} can be invoked with the two instructions
\begin{lstlisting}[mathescape,language=JVMIS]
iload $E(x)$
@@ -683,6 +684,7 @@
local variables.
\begin{figure}[t]
+\begin{framed}
\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object
@@ -696,6 +698,7 @@
return
.end method
\end{lstlisting}
+\end{framed}
\caption{The boilerplate code needed for running generated code. It
hardwires limits for stack space and number of local
variables.\label{boiler}}
@@ -718,13 +721,15 @@
\begin{figure}[p]
+\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
\draw[|<->|,very thick] (LA.north) -- (LB.south)
- node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
+ node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
\draw[|<->|,very thick] (LC.north) -- (LD.south)
- node[left=0mm,midway] {\footnotesize\texttt{write x}};
+ node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
+\end{framed}
\caption{The generated code for the test program \texttt{x := 1 + 2; write
x}. This code can be processed by a Java assembler producing a
class-file, which can then be run by the {\tt{}java}-program.\label{test}}
@@ -852,36 +857,86 @@
\noindent
The idea behind the translation is that BF-programs operate on an array,
-called here \texttt{mem}. The BP-memory pointer into this array is
+called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
\code{>} and \code{<} increase and decrease \texttt{ptr}, respectively. The
instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
-Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary variable
-since we have not changed our write functions in order to cope with
-writing out any array-content directly. Lines 7 and 8 are for
+Line 6 we first need to assign a \texttt{mem}-cell to an auxiliary
+variable, since we have not changed our write functions to cope with
+writing out array-contents directly. Lines 7 and 8 are for
translating BF-loops. Line 8 is interesting in the sense that we need to
generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
-semicolon---adding a \code{skip} works around this snag. Putting all
-this together is we can generate WHILE-programs with more than 400
-instructions and then run the compiled JVM code for such programs.
-\bigskip
+semicolon---adding a \code{skip} works around this snag.
+
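+To make this concrete, here is a sketch of how the BF-loop \code{[-]},
+which clears the current cell, comes out (assuming the array syntax
+\texttt{mem[ptr]} and the loop syntax of our extended WHILE-language):
+
+\begin{lstlisting}[numbers=none]
+while (mem[ptr] != 0) do {
+  mem[ptr] := mem[ptr] - 1;
+  skip
+}
+\end{lstlisting}
+
+\noindent
+Note the \code{skip} inserted just before the closing brace, for the
+reason explained above.
+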
+Putting all this together, we can generate WHILE-programs with more
+than 15K JVM-instructions; run the compiled JVM code for such
+programs and marvel at the output\ldots\medskip
\noindent
-Hooooray, we can finally run the BF-mandelbrot program on the JVM and it
-completes within 20 seconds (after nearly 10 minutes of parsing the
-corresponding WHILE-program and generating 270K of a class file). Try
-replicating the 20 secs with an interpreter! OK, we now face the
-nagging question about what to do next\ldots
+\ldots{}Hooooray, we can finally run the BF-mandelbrot program on the
+JVM: it completes within 20 or so seconds (after nearly 10 minutes of
+parsing the corresponding WHILE-program; the size of the resulting
+class-files is around 32K). Try replicating the 20 secs with an
+interpreter! The good news is that we now have a sufficiently
+complicated program in our WHILE-language for doing some benchmarking.
+This means we now face the question of what to do next\ldots
+
+\subsection*{Optimisations \& Co}
+
+Every compiler that deserves its name performs some optimisation on the
+code. If we make the extra effort of writing a compiler for a language,
+then obviously we want our code to run as fast as possible.
-\subsection*{Added Value}
+So let's optimise the code we generate a bit. There is actually one
+aspect of our generated code where we can easily make efficiency gains:
+it has to do with some of the quirks of the JVM. Whenever we push a
+constant onto the stack, we have so far used the JVM instruction
+\code{ldc some_const}. This is a rather generic instruction in the
+sense that it works not just for integers but also for strings, objects
+and so on. What this instruction does is put the constant into a
+constant pool and then use an index into this constant pool. This means
+\code{ldc} will be represented by at least two bytes in the class file.
+While this is sensible for ``large'' constants like strings, it is a
+bit of overkill for small integers (which many integers will be when
+compiling a BF-program). To counter this ``waste'', the JVM has
+dedicated instructions for small integers, for example
+
+\begin{itemize}
+\item \code{iconst_0},\ldots, \code{iconst_5}
+\item \code{bipush n}
+\end{itemize}
-% 33296 bytes -> 21882
-% shave off 2 seconds
+\noindent
+where the \code{n} in \code{bipush} is between $-128$ and $127$. By
+having dedicated instructions such as \code{iconst_0} to \code{iconst_5}
+(and \code{iconst_m1}), we can make the generated code smaller as these
+instructions only require one byte (as opposed to the generic \code{ldc}
+which needs one byte plus another for the index into the constant pool).
+While in theory the use of such special instructions should only make
+the code smaller, it actually also makes the code run faster, probably
+because the JVM has to process less code and can map the instruction to
+a specific instruction of the underlying CPU. The story with
+\code{bipush} is slightly different: it also uses two bytes and
+therefore does not result in a reduction in code size, but it avoids
+an entry in the constant pool and probably also maps to faster code on
+the underlying CPU.
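+
+As an illustration (a sketch; the byte counts refer to the instruction
+encoding in the code stream), compare
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1        ; 2 bytes: opcode plus constant-pool index
+iconst_1     ; 1 byte:  dedicated opcode
+ldc 42       ; 2 bytes plus an entry in the constant pool
+bipush 42    ; 2 bytes: opcode plus the constant itself
+\end{lstlisting}
+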
+The same game can be played with the instructions for loading and
+storing local variables: the JVM provides dedicated one-byte versions
+for the first four local variables, namely
+
+\begin{itemize}
+\item \code{iload_0},\ldots, \code{iload_3}
+\item \code{istore_0},\ldots, \code{istore_3}
+\item \code{aload_0},\ldots, \code{aload_3}
+\item \code{astore_0},\ldots, \code{astore_3}
+\end{itemize}
+
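+\noindent
+Combining these shortcuts: assuming the variable \texttt{x} is assigned
+local variable index 0, the code for \texttt{x := 1 + 2} shrinks from
+seven bytes to four:
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+iconst_1
+iconst_2
+iadd
+istore_0
+\end{lstlisting}
+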
+% 33296 bytes -> 21787
+% 21 -> 16 seconds
As you have probably seen, the compiler writer has a lot of freedom
-about how to generate code from what the progarmmer wrote as program.
+about how to generate code from what the programmer wrote as a program.
The only condition is that generated code should behave as expected by
the programmer. Then all is fine\ldots mission accomplished! But
sometimes the compiler writer is expected to go an extra mile, or even