afl-material: comparison handouts/ho07.tex


-:4980f421b3b0
+:c112a6cb5e52
 \section*{Handout 7 (Compilation)}
 The purpose of a compiler is to transform a program a human can read and
 write into code the machine can run as fast as possible. The fastest
 code would be machine code the CPU can run directly, but it is often
-good enough for improving the speed of a program to target a
+good enough for improving the speed of a program to target a virtual
-virtual machine. This produces not the fastest possible code, but code
+machine instead. This produces not the fastest possible code, but code
-that is often pretty fast. This way of producing code has the advantage that
+that is often pretty fast. This way of producing code has the advantage
-the virtual machine takes care of things a compiler would normally need
+that the virtual machine takes care of things a compiler would normally
-to take care of (like explicit memory management).
+need to take care of (hairy things like explicit memory management).
 As a first example in this module we will implement a compiler for the
 very simple WHILE-language that we parsed in the last lecture. The
 compiler will target the Java Virtual Machine (JVM), but not directly.
 Pictorially the compiler will work as follows:
 \end{tikzpicture}
 \end{center}
 \noindent
 The input will be WHILE-programs; the output will be assembly files
-(with the ending .j). Assembly files essentially contain human-readable
+(with the file extension .j). Assembly files essentially contain
-machine code, meaning they are not just bits and bytes, but rather
+human-readable machine code, meaning they are not just bits and bytes,
-something you can read and understand---with a bit of practice of
+but rather something you can read and understand---with a bit of
-course. An \emph{assembler} will then translate the assembly files into
+practice of course. An \emph{assembler} will then translate the assembly
-unreadable class or binary files the JVM can run. Unfortunately, the
+files into unreadable class or binary files the JVM can run.
-Java ecosystem does not come with an assembler which would be handy for
+Unfortunately, the Java ecosystem does not come with an assembler which
-our compiler-endeavour  (unlike Microsoft's  Common Language
+would be handy for our compiler-endeavour  (unlike Microsoft's  Common
-Infrastructure for the .Net platform which has an assembler
+Language Infrastructure for the .Net platform which has an assembler
 out-of-the-box). As a substitute we shall therefore use the 3rd-party
 programs Jasmin and Krakatau
 \begin{itemize}
 \item \url{http://jasmin.sourceforge.net}
 ldc 1
 ldc 2
 iadd
 \end{lstlisting}
-\noindent The first instruction loads the constant $1$ onto
+\noindent The first instruction loads the constant $1$ onto the stack,
-the stack, the next one loads $2$, the third instruction adds both
+the next one loads $2$, the third instruction adds both numbers together
-numbers together replacing the top two elements of the stack
+replacing the top two elements of the stack with the result $3$. For
-with the result $3$. For simplicity, we will throughout
+simplicity, we will consider throughout only integer numbers. This means
-consider only integer numbers. Therefore we can
+our main JVM instructions for arithmetic will be \code{iadd},
-use the JVM instructions \code{iadd}, \code{isub},
+\code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for
-\code{imul}, \code{idiv} and so on. The \code{i} stands for
+integer instructions in the JVM (alternatives are \code{d} for doubles,
-integer instructions in the JVM (alternatives are \code{d} for
+\code{l} for longs and \code{f} for floats etc).
-doubles, \code{l} for longs and \code{f} for floats).
 Recall our grammar for arithmetic expressions (\meta{E} is the
 starting symbol):
 \noindent where \meta{Id} stands for variables and \meta{Num}
 for numbers. For the moment let us omit variables from arithmetic
 expressions. Our parser will take this grammar and given an input
-produce abstract syntax trees. For example we will obtain for the
+program produce an abstract syntax tree. For example we will obtain for
-expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
+the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
 \begin{center}
 \begin{tikzpicture}
 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
 \end{tikzpicture}
 top of the stack (I leave this to you to verify; the meaning of each
 instruction should be clear). The result being on the top of the stack
 will be an important convention we always observe in our compiler. Note
 that a different bracketing of the expression, for example $(1 + (2 *
 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
-a different list of instructions. Generating code in this
+a different list of instructions.
-post-order-traversal fashion is rather easy to implement: it can be done
-with the following recursive \textit{compile}-function, which takes the
+Generating code in this post-order-traversal fashion is rather easy to
-abstract syntax tree as an argument:
+implement: it can be done with the following recursive
+\textit{compile}-function, which takes the abstract syntax tree as an
+argument:
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
 \end{tabular}
 \end{center}
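 \noindent
 A minimal Scala sketch of this \textit{compile}-function might look as
 follows. The names \pcode{AExp}, \pcode{Num} and \pcode{Aop} are
 assumptions made for illustration only; the abstract syntax trees
 produced by the parser may be represented differently.
 \begin{lstlisting}[language=Scala,numbers=none]
 object CompileDemo {
   // hypothetical AST for arithmetic expressions without variables
   sealed trait AExp
   case class Num(n: Int) extends AExp
   case class Aop(op: String, a1: AExp, a2: AExp) extends AExp
 
   // post-order traversal: first the code for both operands,
   // then the instruction for the operator
   def compile(a: AExp): List[String] = a match {
     case Num(n) => List(s"ldc $n")
     case Aop(op, a1, a2) =>
       val instr = op match {
         case "+" => "iadd"
         case "-" => "isub"
         case "*" => "imul"
         case "/" => "idiv"
         // any other operator raises a MatchError in this sketch
       }
       compile(a1) ++ compile(a2) ++ List(instr)
   }
 
   def main(args: Array[String]): Unit = {
     // the expression 1 + ((2 * 3) + (4 - 3)) from above
     val e = Aop("+", Num(1),
               Aop("+", Aop("*", Num(2), Num(3)),
                        Aop("-", Num(4), Num(3))))
     println(compile(e).mkString("\n"))
   }
 }
 \end{lstlisting}
 \noindent
 Because the code for both operands is emitted before the operator
 instruction, the result of every subexpression ends up on top of the
 stack, which is the convention mentioned earlier.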
-This is all fine, but our arithmetic expressions can clearly contain
+\noindent
-variables and then this code will not be good enough. To fix this we
+This is all fine, but our arithmetic expressions can contain variables
-will represent our variables as the \emph{local variables} in the JVM.
+and we have not considered them yet. To fix this we will represent our
-Essentially, local variables are an array or pointers to memory cells,
+variables as the \emph{local variables} of the JVM. Essentially, local
-containing in our case only integers. Looking up a variable can be done
+variables are an array or pointers to memory cells, containing in our
-with the instruction
+case only integers. Looking up a variable can be done with the
+instruction
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $index$
 \end{lstlisting}
 istore $n_x$
 \end{lstlisting}
 \noindent
 where $n_x$ is the index (or pointer to the memory) for the variable
-$x$. The code for looking-up the index for the variable is as follow:
+$x$. The Scala code for looking up the index for the variable is as follows:
 \begin{center}
 \begin{tabular}{lcl}
 $index \;=\; E\textit{.getOrElse}(x, |E|)$
 \end{tabular}
 while x <= 10 do x := x + 1
 \end{lstlisting}
 \noindent yielding the following code
-\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
+\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
 iload 0
 ldc 10
 if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
 iload 0
 \noindent
 As said, I leave it to you to decide whether the code implements
 the usual control-flow of while-loops.
-Next we need to consider the statement \pcode{write x}, which can be
+Next we need to consider the WHILE-statement \pcode{write x}, which can
-used to print out the content of a variable. For this we shall use a
+be used to print out the content of a variable. For this we shall use a
 Java library function. In order to avoid having to generate a lot of
 code for each \pcode{write}-command, we use a separate helper-method and
 just call this method with an appropriate argument (which of course
 needs to be placed onto the stack). The code of the helper-method is as
 follows.
-\begin{lstlisting}[language=JVMIS,numbers=left]
+\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
 .method public static write(I)V
 .limit locals 1
 .limit stack 2
 getstatic java/lang/System/out Ljava/io/PrintStream;
 iload 0
 invokevirtual java/io/PrintStream/println(I)V
 return
 .end method
 \end{lstlisting}
-\noindent The first line marks the beginning of the method,
+\noindent The first line marks the beginning of the method, called
-called \pcode{write}. It takes a single integer argument
+\pcode{write}. It takes a single integer argument indicated by the
-indicated by the \pcode{(I)} and returns no result, indicated
+\pcode{(I)} and returns no result, indicated by the \pcode{V} (for
-by the \pcode{V}. Since the method has only one argument, we
+void). Since the method has only one argument, we only need a single
-only need a single local variable (Line~2) and a stack with
+local variable (Line~2) and a stack with two cells will be sufficient
-two cells will be sufficient (Line 3). Line 4 instructs the
+(Line 3). Line 4 instructs the JVM to get the value of the member
-JVM to get the value of the mem  \pcode{out} of the class
+\pcode{out} of the class \pcode{java/lang/System}. It expects the value
-\pcode{java/lang/System}. It expects the value to be of type
+to be of type \pcode{java/io/PrintStream}. A reference to this value
-\pcode{java/io/PrintStream}. A reference to this value will be
+will be placed on the stack.\footnote{Note the syntax \texttt{L
-placed on the stack. Line~5 copies the integer we want to
+\ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
-print out onto the stack. In the next line we call the method
+designers of Jasmin decided that this syntax is pleasing to the eye. So
-\pcode{println} (from the class \pcode{java/io/PrintStream}).
+if you wanted to have strings in your Jasmin code, you would need to
-We want to print out an integer and do not expect anything
+write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension,
-back (that is why the type annotation is \pcode{(I)V}). The
+then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and
-\pcode{return}-instruction in the next line changes the
+so on. Looks all very ugly to my eyes.} Line~5 copies the integer we
-control-flow back to the place from where \pcode{write} was
+want to print out onto the stack. In the line after that we call the
-called. This method needs to be part of a header that is
+method \pcode{println} (from the class \pcode{java/io/PrintStream}). We
-included in any code we generate. The helper-method
+want to print out an integer and do not expect anything back (that is
-\pcode{write} can be invoked with the two instructions
+why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction
+in the next line changes the control-flow back to the place from where
+\pcode{write} was called. This method needs to be part of a header that
+is included in any code we generate. The helper-method \pcode{write} can
+be invoked with the two instructions
 \begin{lstlisting}[mathescape,language=JVMIS]
 iload $E(x)$
 invokestatic XXX/XXX/write(I)V
 \end{lstlisting}
 top of the stack and then call \pcode{write}. The \pcode{XXX}
 need to be replaced by an appropriate class name (this will be
 explained shortly).
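 \noindent
 Generating these two instructions could be sketched in Scala as shown
 next. This is only an illustration: the function name
 \pcode{compileWrite}, the representation of the environment $E$ as a
 \pcode{Map} and the parameter \pcode{className} (standing in for the
 \pcode{XXX} placeholder) are assumptions and not fixed by the handout.
 \begin{lstlisting}[language=Scala,numbers=none]
 object WriteDemo {
   // env plays the role of E: it maps WHILE-variables to the
   // indices of JVM local variables; env(x) fails if x is unknown
   def compileWrite(x: String,
                    env: Map[String, Int],
                    className: String): List[String] =
     List(s"iload ${env(x)}",
          s"invokestatic $className/$className/write(I)V")
 }
 \end{lstlisting}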
-By generating code for a WHILE-program, we end up with a list
+By generating code for a WHILE-program, we end up with a list of (JVM
-of (JVM assembly) instructions. Unfortunately, there is a bit
+assembly) instructions. Unfortunately, there is a bit more boilerplate
-more boilerplate code needed before these instructions can be
+code needed before these instructions can be run. Essentially we have to
-run. The complete code is shown in Figure~\ref{boiler}. This
+enclose them inside a Java \texttt{main}-method. The corresponding code
-boilerplate code is very specific to the JVM. If we target any
+is shown in Figure~\ref{boiler}. This boilerplate code is very specific
-other virtual machine or a machine language, then we would
+to the JVM. If we target any other virtual machine or a machine
-need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
+language, then we would need to change this code.  Interesting are the
-contain a method for object creation in the JVM; this method
+Lines 5 and 6 where we hardwire that the stack of our programs will
-is called \emph{before} the \pcode{main}-method in Lines 10 to
+never be larger than 200 and that the maximum number of variables is
-17. Interesting are the Lines 11 and 12 where we hardwire that
+also 200. These seem to be conservative default values that allow us to
-the stack of our programs will never be larger than 200 and
+run some simple WHILE-programs. In a real compiler, we would of course
-that the maximum number of variables is also 200. This seem to
+need to work harder and find out appropriate values for the stack and
-be conservative default values that allow is to run some
+local variables.
-simple WHILE-programs. In a real compiler, we would of course
-need to work harder and find out appropriate values for the
-stack and local variables.
 \begin{figure}[t]
 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
 .class public XXX.XXX
 .super java/lang/Object
 $\textit{\ldots{}here comes the compiled code\ldots}$
 return
 .end method
 \end{lstlisting}
-\caption{The boilerplate code needed for running generated code.\label{boiler}}
+\caption{The boilerplate code needed for running generated code. It
+hardwires limits for stack space and number of local
+variables.\label{boiler}}
 \end{figure}
 To sum up, in Figure~\ref{test} is the complete code generated
 for the slightly nonsensical program
 \noindent I let you read the code and make sure the code behaves as
 expected. Having this code at our disposal, we need the assembler to
 translate the generated code into JVM bytecode (a class file). This
 bytecode is then understood by the JVM and can be run by just invoking
-the \pcode{java}-program.
+the \pcode{java}-program. Again I let you do the work.
 \begin{figure}[p]
-\lstinputlisting[language=JVMIS,mathescape]{../progs/test-small.j}
+\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 \begin{tikzpicture}[remember picture,overlay]
 \draw[|<->|,very thick] (LA.north) -- (LB.south)
 node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}};
 \draw[|<->|,very thick] (LC.north) -- (LD.south)
 node[left=0mm,midway] {\footnotesize\texttt{write x}};
 replicating the 20 secs with an interpreter! OK, we now face the
 nagging question about what to do next\ldots
 \subsection*{Added Value}
+% 33296 bytes -> 21882
+% shave off 2 seconds
 As you have probably seen, the compiler writer has a lot of freedom
 about how to generate code from what the programmer wrote as a program.
 The only condition is that generated code should behave as expected by
 the programmer. Then all is fine\ldots mission accomplished! But
 sometimes the compiler writer is expected to go an extra mile, or even
 Imagine we do not want to rely in our compiler on the JVM for producing
 an annoying, but safe exception trace, rather we want to handle such
 situations ourselves. Let's assume we want to handle them in the
 following way: if the programmer accesses a field out-of-bounds, we just
 return the default 0, and if a programmer wants to update an
-out-of-bounds filed, we quietly ignore this update.
+out-of-bounds field, we want to ``quietly'' ignore this update.
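 \noindent
 The intended behaviour can be pinned down with a small Scala model.
 This only describes the semantics we are after, not the JVM
 instructions the compiler has to generate; the helper names
 \pcode{safeLoad} and \pcode{safeStore} are hypothetical.
 \begin{lstlisting}[language=Scala,numbers=none]
 object BoundsDemo {
   // reading out-of-bounds yields the default value 0
   def safeLoad(arr: Array[Int], i: Int): Int =
     if (0 <= i && i < arr.length) arr(i) else 0
 
   // writing out-of-bounds is silently ignored
   def safeStore(arr: Array[Int], i: Int, v: Int): Unit =
     if (0 <= i && i < arr.length) arr(i) = v
 }
 \end{lstlisting}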
+arraylength
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 709	c112a6cb5e52
parent 708	4980f421b3b0
child 710	183663740fb7