afl-material: comparison handouts/ho07.tex


-:c112a6cb5e52
+:183663740fb7
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
 \usepackage{../grammar}
 \usepackage{../graphics}
+\usepackage{framed}
-%% add safety check on references...whether it is above 0 and below the
+\usepackage[belowskip=7pt,aboveskip=0pt]{caption}
-%% index
 \begin{document}
 The purpose of a compiler is to transform a program a human can read and
 write into code the machine can run as fast as possible. The fastest
 code would be machine code the CPU can run directly, but it is often
 good enough for improving the speed of a program to target a virtual
 machine instead. This produces not the fastest possible code, but code
-that is often pretty fast. This way of producing code has the advantage
+that is often pretty fast. This way of producing code also has the
-that the virtual machine takes care of things a compiler would normally
+advantage that the virtual machine takes care of things a compiler would
-need to take care of (hairy things like explicit memory management).
+normally need to take care of (hairy things like explicit memory
+management).
 As a first example in this module we will implement a compiler for the
 very simple WHILE-language that we parsed in the last lecture. The
 compiler will target the Java Virtual Machine (JVM), but not directly.
 Pictorially the compiler will work as follows:
 \begin{center}
 \begin{tikzpicture}[scale=1,font=\bf,
 node/.style={
 rectangle,rounded corners=3mm,
 ultra thick,draw=black!50,minimum height=18mm,
 The input will be WHILE-programs; the output will be assembly files
 (with the file extension .j). Assembly files essentially contain
 human-readable machine code, meaning they are not just bits and bytes,
 but rather something you can read and understand---with a bit of
 practice of course. An \emph{assembler} will then translate the assembly
-files into unreadable class or binary files the JVM can run.
+files into unreadable class- or binary-files the JVM can run.
 Unfortunately, the Java ecosystem does not come with an assembler which
 would be handy for our compiler-endeavour  (unlike Microsoft's  Common
 Language Infrastructure for the .Net platform which has an assembler
-out-of-the-box). As a substitute we shall therefore use the 3rd-party
+out-of-the-box). As a substitute we shall use the 3rd-party
 programs Jasmin and Krakatau
 \begin{itemize}
 \item \url{http://jasmin.sourceforge.net}
 \item \url{https://github.com/Storyyeller/Krakatau}
 \noindent
 The first is a Java program and the second a program written in Python.
 Each of them allows us to generate \emph{assembly} files that are still
 readable by humans, as opposed to class-files which are pretty much just
 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
-an assembly file as input and generate the corresponding class file for
+our assembly files as input and generate the corresponding class-files for
 us.
-Good about the JVM is that it is a stack-based virtual machine, a fact
+What is good about the JVM is that it is a stack-based virtual machine,
-which will make it easy to generate code for arithmetic expressions. For
+a fact which will make it easy to generate code for arithmetic
-example when compiling the expression $1 + 2$ we need to generate the
+expressions. For example, when compiling the expression $1 + 2$ we need
-following three instructions
+to generate the following three instructions
 \begin{lstlisting}[language=JVMIS,numbers=none]
 ldc 1
 ldc 2
 iadd
 \end{lstlisting}
 \noindent The first instruction loads the constant $1$ onto the stack,
 the next one loads $2$, the third instruction adds both numbers together
 replacing the top two elements of the stack with the result $3$. For
-simplicity, we will consider throughout only integer numbers. This means
+simplicity, we will consider throughout only arithmetic involving
-our main JVM instructions for arithmetic will be \code{iadd},
+integer numbers. This means our main JVM instructions for arithmetic
-\code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for
+will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
-integer instructions in the JVM (alternatives are \code{d} for doubles,
+The \code{i} stands for integer instructions in the JVM (alternatives
-\code{l} for longs and \code{f} for floats etc).
+are \code{d} for doubles, \code{l} for longs and \code{f} for floats
+etc).
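+\noindent
+As a small additional sketch (the expression is chosen only for
+illustration and assumes the same left-to-right translation as for
+$1 + 2$), the nested expression $(1 + 2) * 3$ could be compiled by first
+generating the code for $1 + 2$, then pushing $3$ and finally
+multiplying. The comments after the semicolons show the stack contents
+after each instruction:
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1     ; stack: 1
+ldc 2     ; stack: 1, 2
+iadd      ; stack: 3
+ldc 3     ; stack: 3, 3
+imul      ; stack: 9
+\end{lstlisting}
+\noindent
+No parentheses are needed in the generated code: the order of the
+instructions already fixes the order of evaluation.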
 Recall our grammar for arithmetic expressions (\meta{E} is the
 starting symbol):
 \end{center}
 \noindent
 This is all fine, but our arithmetic expressions can contain variables
 and we have not considered them yet. To fix this we will represent our
-variables as the \emph{local variables} of the JVM. Essentially, local
+variables as \emph{local variables} of the JVM. Essentially, local
 variables are an array or pointers to memory cells, containing in our
 case only integers. Looking up a variable can be done with the
 instruction
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 the index is and return the environment unchanged (that is in this case
 $E' = E$). However, if this is the first encounter of the variable $x$
 in the program, then we have to augment the environment and assign $x$
 with the largest index in $E$ plus one (that is $E' = E(x \mapsto
 largest\_index + 1)$). To sum up, for the assignment $x := x + 1$ we
-generate the following code
+generate the following code snippet
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $n_x$
 ldc 1
 iadd
 to be of type \pcode{java/io/PrintStream}. A reference to this value
 will be placed on the stack.\footnote{Note the syntax \texttt{L
 \ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
 designers of Jasmin decided that this syntax is pleasing to the eye. So
 if you wanted to have strings in your Jasmin code, you would need to
-write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension,
+write \texttt{Ljava/lang/String;}\;. If you want arrays of one
-then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and
+dimension, then use \texttt{[\ldots}; two dimensions, use
-so on. Looks all very ugly to my eyes.} Line~5 copies the integer we
+\texttt{[[\ldots} and so on. It all looks very ugly to my eyes.} Line~5
-want to print out onto the stack. In the line after that we call the
+copies the integer we want to print out onto the stack. In the line
-method \pcode{println} (from the class \pcode{java/io/PrintStream}). We
+after that we call the method \pcode{println} (from the class
-want to print out an integer and do not expect anything back (that is
+\pcode{java/io/PrintStream}). We want to print out an integer and do not
-why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction
+expect anything back (that is why the type annotation is \pcode{(I)V}).
-in the next line changes the control-flow back to the place from where
+The \pcode{return}-instruction in the next line changes the control-flow
-\pcode{write} was called. This method needs to be part of a header that
+back to the place from where \pcode{write} was called. This method needs
-is included in any code we generate. The helper-method \pcode{write} can
+to be part of a header that is included in any code we generate. The
-be invoked with the two instructions
+helper-method \pcode{write} can be invoked with the two instructions
 \begin{lstlisting}[mathescape,language=JVMIS]
 iload $E(x)$
 invokestatic XXX/XXX/write(I)V
 \end{lstlisting}
 run some simple WHILE-programs. In a real compiler, we would of course
 need to work harder and find out appropriate values for the stack and
 local variables.
 \begin{figure}[t]
+\begin{framed}
 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
 .class public XXX.XXX
 .super java/lang/Object
 .method public static main([Ljava/lang/String;)V
 $\textit{\ldots{}here comes the compiled code\ldots}$
 return
 .end method
 \end{lstlisting}
+\end{framed}
 \caption{The boilerplate code needed for running generated code. It
 hardwires limits for stack space and number of local
 variables.\label{boiler}}
 \end{figure}
 bytecode is then understood by the JVM and can be run by just invoking
 the \pcode{java}-program. Again I let you do the work.
 \begin{figure}[p]
+\begin{framed}
 \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 \begin{tikzpicture}[remember picture,overlay]
 \draw[|<->|,very thick] (LA.north) -- (LB.south)
-node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
+node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
 \draw[|<->|,very thick] (LC.north) -- (LD.south)
-node[left=0mm,midway] {\footnotesize\texttt{write x}};
+node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
 \end{tikzpicture}
+\end{framed}
 \caption{The generated code for the test program \texttt{x := 1 + 2; write
 x}. This code can be processed by a Java assembler producing a
 class-file, which can then be run by the {\tt{}java}-program.\label{test}}
 \end{figure}
 }
 \end{lstlisting}
 \noindent
 The idea behind the translation is that BF-programs operate on an array,
-called here \texttt{mem}. The BP-memory pointer into this array is
+called here \texttt{mem}. The BF-memory pointer into this array is
 represented as the variable \texttt{ptr}. As usual the BF-instructions
 \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
 instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
-Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary variable
+Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
-since we have not changed our write functions in order to cope with
+variable since we have not changed our write functions in order to cope
-writing out any array-content directly. Lines 7 and 8 are for
+with writing out any array-content directly. Lines 7 and 8 are for
 translating BF-loops. Line 8 is interesting in the sense that we need to
 generate a \code{skip} instruction just before finishing with the
 closing \code{"\}"}. The reason is that we are rather pedantic about
 semicolons in our WHILE-grammar: the last command cannot have a
-semicolon---adding a \code{skip} works around this snag. Putting all
+semicolon---adding a \code{skip} works around this snag.
-this together is we can generate WHILE-programs with more than 400
-instructions and then run the compiled JVM code for such programs.
+Putting all this together, we can generate WHILE-programs with more
-\bigskip
+than 15K JVM-instructions; run the compiled JVM code for such
+programs and marvel at the output\ldots\medskip
-\noindent
-Hooooray, we can finally run the BF-mandelbrot program on the JVM and it
+\noindent
-completes within 20 seconds (after nearly 10 minutes of parsing the
+\ldots{}Hooooray, we can finally run the BF-mandelbrot program on the JVM: it
-corresponding WHILE-program and generating 270K of a class file). Try
+completes within 20 or so seconds (after nearly 10 minutes of parsing
-replicating the 20 secs with an interpreter! OK, we now face the
+the corresponding WHILE-program; the size of the resulting class files
-nagging question about what to do next\ldots
+is around 32K). Try replicating the 20 secs with an interpreter! The
+good point is that we now have a sufficiently complicated program in our
-\subsection*{Added Value}
+WHILE-language in order to do some benchmarking. Which means we now face
+the question about what to do next\ldots
-% 33296 bytes -> 21882
-% shave off 2 seconds
+\subsection*{Optimisations \& Co}
+Every compiler that deserves its name performs some optimisation on the
+code. If we make the extra effort of writing a compiler for a language,
+then obviously we want our code to run as fast as possible.
+So let's optimise the code we generate a bit. There is actually one
+aspect of our generated code where we can easily make efficiency gains:
+this has to do with some of the quirks of the JVM. Whenever we push a
+constant onto the stack, we use the JVM instruction \code{ldc
+some_const}. This is a rather generic instruction in the sense that it
+works not just for integers but also for strings, objects and so on.
+What this instruction does is to put the constant into a constant pool
+and then to use an index to this constant pool. This means \code{ldc}
+will be represented by at least two bytes in the class file. While this
+is sensible for ``large'' constants like strings, it is a bit of
+overkill for small integers (which many integers will be when compiling
+a BF-program). To counter this ``waste'', the JVM has specific
+instructions for small integers, for example
+\begin{itemize}
+\item \code{iconst_0},\ldots, \code{iconst_5}
+\item \code{bipush n}
+\end{itemize}
+\noindent
+where the \code{n} in \code{bipush} is between $-128$ and $127$. By having
+dedicated instructions such as \code{iconst_0} to \code{iconst_5} (and
+\code{iconst_m1}), we can make the generated code size smaller as these
+instructions only require 1 Byte (as opposed to the generic \code{ldc}
+which needs 1 Byte plus another for the index into the constant pool).
+While in theory the use of such special instructions should only make
+the code smaller, it actually also makes the code run faster, probably
+because the JVM has to process less code and can use a specific
+instruction of the underlying CPU. The story with \code{bipush} is
+slightly different, because it also uses two Bytes, so it does not
+result in a reduction in code size. But it probably maps to a specific
+instruction of the underlying CPU, which makes the JVM code run faster.
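+\noindent
+To make the size difference concrete, here is a sketch for the
+expression $1 + 100$ (the constants are chosen only for illustration).
+With the generic instruction, and assuming the constant-pool indices fit
+into one Byte, the code occupies $2 + 2 + 1 = 5$ Bytes:
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1        ; 2 Bytes: opcode plus constant-pool index
+ldc 100      ; 2 Bytes: opcode plus constant-pool index
+iadd         ; 1 Byte
+\end{lstlisting}
+\noindent
+With the dedicated instructions the same computation should need only
+$1 + 2 + 1 = 4$ Bytes:
+\begin{lstlisting}[language=JVMIS,numbers=none]
+iconst_1     ; 1 Byte: the constant is part of the opcode
+bipush 100   ; 2 Bytes: opcode plus one signed Byte
+iadd         ; 1 Byte
+\end{lstlisting}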
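+\noindent
+Similarly, the JVM has dedicated one-byte instructions for loading and
+storing the first four local variables, both for integers (\code{i})
+and for references (\code{a}):
+\begin{itemize}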
+\item \code{iload_0},\ldots, \code{iload_3}
+\item \code{istore_0},\ldots, \code{istore_3}
+\item \code{aload_0},\ldots, \code{aload_3}
+\item \code{astore_0},\ldots, \code{astore_3}
+\end{itemize}
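+\noindent
+The instructions listed above are short forms for accessing the local
+variables with index 0 to 3. As a sketch, assume (hypothetically) that
+the variable $x$ is stored at index $0$, that is $E(x) = 0$. Then the
+assignment $x := x + 1$ could be compiled to
+\begin{lstlisting}[language=JVMIS,numbers=none]
+iload_0      ; push the value of x (local variable 0)
+iconst_1     ; push the constant 1
+iadd         ; add: the stack now holds x + 1
+istore_0     ; pop the result and store it back into x
+\end{lstlisting}
+\noindent
+Each of these instructions occupies a single Byte, whereas the generic
+forms \code{iload 0} and \code{istore 0} need two Bytes each.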
+% 33296 bytes -> 21787
+% 21 ->  16 seconds
 As you have probably seen, the compiler writer has a lot of freedom
-about how to generate code from what the progarmmer wrote as program.
+about how to generate code from what the programmer wrote as a program.
 The only condition is that generated code should behave as expected by
 the programmer. Then all is fine\ldots mission accomplished! But
 sometimes the compiler writer is expected to go an extra mile, or even
 miles. Suppose we are given the following WHILE-program:

changeset 710	183663740fb7
parent 709	c112a6cb5e52
child 711	6f3f3dd01786