handouts/ho07.tex
changeset 709 c112a6cb5e52
parent 708 4980f421b3b0
child 710 183663740fb7
--- a/handouts/ho07.tex	Mon Jan 27 10:11:44 2020 +0000
+++ b/handouts/ho07.tex	Tue Jan 28 12:23:53 2020 +0000
@@ -19,11 +19,11 @@
 The purpose of a compiler is to transform a program a human can read and
 write into code the machine can run as fast as possible. The fastest
 code would be machine code the CPU can run directly, but it is often
-good enough for improving the speed of a program to target a
-virtual machine. This produces not the fastest possible code, but code
-that is often pretty fast. This way of producing code has the advantage that
-the virtual machine takes care of things a compiler would normally need
-to take care of (like explicit memory management). 
+good enough for improving the speed of a program to target a virtual
+machine instead. This produces not the fastest possible code, but code
+that is often pretty fast. This way of producing code has the advantage
+that the virtual machine takes care of things a compiler would normally
+need to take care of (hairy things like explicit memory management). 
 
 As a first example in this module we will implement a compiler for the
 very simple WHILE-language that we parsed in the last lecture. The
@@ -51,14 +51,14 @@
 
 \noindent
 The input will be WHILE-programs; the output will be assembly files
-(with the ending .j). Assembly files essentially contain human-readable
-machine code, meaning they are not just bits and bytes, but rather
-something you can read and understand---with a bit of practice of
-course. An \emph{assembler} will then translate the assembly files into
-unreadable class or binary files the JVM can run. Unfortunately, the
-Java ecosystem does not come with an assembler which would be handy for
-our compiler-endeavour  (unlike Microsoft's  Common Language
-Infrastructure for the .Net platform which has an assembler
+(with the file extension .j). Assembly files essentially contain
+human-readable machine code, meaning they are not just bits and bytes,
+but rather something you can read and understand---with a bit of
+practice of course. An \emph{assembler} will then translate the assembly
+files into unreadable class or binary files the JVM can run.
+Unfortunately, the Java ecosystem does not come with an assembler which
+would be handy for our compiler-endeavour  (unlike Microsoft's  Common
+Language Infrastructure for the .Net platform which has an assembler
 out-of-the-box). As a substitute we shall therefore use the 3rd-party
 programs Jasmin and Krakatau 
 
@@ -86,15 +86,14 @@
 iadd 
 \end{lstlisting}
 
-\noindent The first instruction loads the constant $1$ onto
-the stack, the next one loads $2$, the third instruction adds both
-numbers together replacing the top two elements of the stack
-with the result $3$. For simplicity, we will throughout
-consider only integer numbers. Therefore we can
-use the JVM instructions \code{iadd}, \code{isub},
-\code{imul}, \code{idiv} and so on. The \code{i} stands for
-integer instructions in the JVM (alternatives are \code{d} for
-doubles, \code{l} for longs and \code{f} for floats).
+\noindent The first instruction loads the constant $1$ onto the stack,
+the next one loads $2$, the third instruction adds both numbers together
+replacing the top two elements of the stack with the result $3$. For
+simplicity, we will consider throughout only integer numbers. This means
+our main JVM instructions for arithmetic will be \code{iadd},
+\code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for
+integer instructions in the JVM (alternatives are \code{d} for doubles,
+\code{l} for longs and \code{f} for floats etc).
 
 Recall our grammar for arithmetic expressions (\meta{E} is the
 starting symbol):
@@ -116,8 +115,8 @@
 \noindent where \meta{Id} stands for variables and \meta{Num}
 for numbers. For the moment let us omit variables from arithmetic
 expressions. Our parser will take this grammar and given an input
-produce abstract syntax trees. For example we will obtain for the
-expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
+program produce an abstract syntax tree. For example we will obtain for
+the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
 
 \begin{center}
 \begin{tikzpicture}
@@ -149,10 +148,12 @@
 will be an important convention we always observe in our compiler. Note,
 that a different bracketing of the expression, for example $(1 + (2 *
 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
-a different list of instructions. Generating code in this
-post-order-traversal fashion is rather easy to implement: it can be done
-with the following recursive \textit{compile}-function, which takes the
-abstract syntax tree as an argument:
+a different list of instructions. 
+
+Generating code in this post-order-traversal fashion is rather easy to
+implement: it can be done with the following recursive
+\textit{compile}-function, which takes the abstract syntax tree as an
+argument:
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -168,12 +169,13 @@
 \end{tabular}
 \end{center}
 
-This is all fine, but our arithmetic expressions can clearly contain
-variables and then this code will not be good enough. To fix this we
-will represent our variables as the \emph{local variables} in the JVM.
-Essentially, local variables are an array or pointers to memory cells,
-containing in our case only integers. Looking up a variable can be done
-with the instruction
+\noindent
+This is all fine, but our arithmetic expressions can contain variables
+and we have not considered them yet. To fix this we will represent our
+variables as the \emph{local variables} of the JVM. Essentially, local
+variables are an array or pointers to memory cells, containing in our
+case only integers. Looking up a variable can be done with the
+instruction
 
 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $index$
@@ -277,7 +279,7 @@
 
 \noindent 
 where $n_x$ is the index (or pointer to the memory) for the variable
-$x$. The code for looking-up the index for the variable is as follow:
+$x$. The Scala code for looking-up the index for the variable is as follow:
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -588,7 +590,7 @@
 
 \noindent yielding the following code
 
-\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
+\begin{lstlisting}[language=JVMIS2,mathescape,numbers=left]
 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
    iload 0
    ldc 10
@@ -612,15 +614,15 @@
 As said, I leave it to you to decide whether the code implements
 the usual controlflow of while-loops.
 
-Next we need to consider the statement \pcode{write x}, which can be
-used to print out the content of a variable. For this we shall use a
+Next we need to consider the WHILE-statement \pcode{write x}, which can
+be used to print out the content of a variable. For this we shall use a
 Java library function. In order to avoid having to generate a lot of
 code for each \pcode{write}-command, we use a separate helper-method and
 just call this method with an appropriate argument (which of course
 needs to be placed onto the stack). The code of the helper-method is as
 follows.
 
-\begin{lstlisting}[language=JVMIS,numbers=left]
+\begin{lstlisting}[language=JVMIS,numbers=left,basicstyle=\ttfamily\small]
 .method public static write(I)V 
     .limit locals 1 
     .limit stack 2 
@@ -631,25 +633,29 @@
 .end method
 \end{lstlisting}
 
-\noindent The first line marks the beginning of the method,
-called \pcode{write}. It takes a single integer argument
-indicated by the \pcode{(I)} and returns no result, indicated
-by the \pcode{V}. Since the method has only one argument, we
-only need a single local variable (Line~2) and a stack with
-two cells will be sufficient (Line 3). Line 4 instructs the
-JVM to get the value of the mem  \pcode{out} of the class
-\pcode{java/lang/System}. It expects the value to be of type
-\pcode{java/io/PrintStream}. A reference to this value will be
-placed on the stack. Line~5 copies the integer we want to
-print out onto the stack. In the next line we call the method
-\pcode{println} (from the class \pcode{java/io/PrintStream}).
-We want to print out an integer and do not expect anything
-back (that is why the type annotation is \pcode{(I)V}). The
-\pcode{return}-instruction in the next line changes the
-control-flow back to the place from where \pcode{write} was
-called. This method needs to be part of a header that is
-included in any code we generate. The helper-method
-\pcode{write} can be invoked with the two instructions
+\noindent The first line marks the beginning of the method, called
+\pcode{write}. It takes a single integer argument indicated by the
+\pcode{(I)} and returns no result, indicated by the \pcode{V} (for
+void). Since the method has only one argument, we only need a single
+local variable (Line~2) and a stack with two cells will be sufficient
+(Line 3). Line 4 instructs the JVM to get the value of the member
+\pcode{out} of the class \pcode{java/lang/System}. It expects the value
+to be of type \pcode{java/io/PrintStream}. A reference to this value
+will be placed on the stack.\footnote{Note the syntax \texttt{L
+\ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the
+designers of Jasmin decided that this syntax is pleasing to the eye. So
+if you wanted to have strings in your Jasmin code, you would need to
+write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension,
+then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and
+so on. Looks all very ugly to my eyes.} Line~5 copies the integer we
+want to print out onto the stack. In the line after that we call the
+method \pcode{println} (from the class \pcode{java/io/PrintStream}). We
+want to print out an integer and do not expect anything back (that is
+why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction
+in the next line changes the control-flow back to the place from where
+\pcode{write} was called. This method needs to be part of a header that
+is included in any code we generate. The helper-method \pcode{write} can
+be invoked with the two instructions
 
 \begin{lstlisting}[mathescape,language=JVMIS]
 iload $E(x)$ 
@@ -662,22 +668,19 @@
 explained shortly).
 
 
-By generating code for a WHILE-program, we end up with a list
-of (JVM assembly) instructions. Unfortunately, there is a bit
-more boilerplate code needed before these instructions can be
-run. The complete code is shown in Figure~\ref{boiler}. This
-boilerplate code is very specific to the JVM. If we target any
-other virtual machine or a machine language, then we would
-need to change this code. Lines 4 to 8 in Figure~\ref{boiler}
-contain a method for object creation in the JVM; this method
-is called \emph{before} the \pcode{main}-method in Lines 10 to
-17. Interesting are the Lines 11 and 12 where we hardwire that
-the stack of our programs will never be larger than 200 and
-that the maximum number of variables is also 200. This seem to
-be conservative default values that allow is to run some
-simple WHILE-programs. In a real compiler, we would of course
-need to work harder and find out appropriate values for the
-stack and local variables.
+By generating code for a WHILE-program, we end up with a list of (JVM
+assembly) instructions. Unfortunately, there is a bit more boilerplate
+code needed before these instructions can be run. Essentially we have to
+enclose them inside a Java \texttt{main}-method. The corresponding code
+is shown in Figure~\ref{boiler}. This boilerplate code is very specific
+to the JVM. If we target any other virtual machine or a machine
+language, then we would need to change this code.  Interesting are the
+Lines 5 and 6 where we hardwire that the stack of our programs will
+never be larger than 200 and that the maximum number of variables is
+also 200. This seem to be conservative default values that allow is to
+run some simple WHILE-programs. In a real compiler, we would of course
+need to work harder and find out appropriate values for the stack and
+local variables.
 
 \begin{figure}[t]
 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
@@ -693,7 +696,9 @@
     return
 .end method
 \end{lstlisting}
-\caption{The boilerplate code needed for running generated code.\label{boiler}}
+\caption{The boilerplate code needed for running generated code. It
+  hardwires limits for stack space and number of local
+  variables.\label{boiler}}
 \end{figure}
 
 
@@ -709,11 +714,11 @@
 expected. Having this code at our disposal, we need the assembler to
 translate the generated code into JVM bytecode (a class file). This
 bytecode is then understood by the JVM and can be run by just invoking
-the \pcode{java}-program.
+the \pcode{java}-program. Again I let you do the work.
 
 
 \begin{figure}[p]
-\lstinputlisting[language=JVMIS,mathescape]{../progs/test-small.j}
+\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
 \begin{tikzpicture}[remember picture,overlay]
   \draw[|<->|,very thick] (LA.north) -- (LB.south) 
      node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
@@ -872,6 +877,9 @@
 
 \subsection*{Added Value}
 
+% 33296 bytes -> 21882
+% shave off 2 seconds
+
 As you have probably seen, the compiler writer has a lot of freedom
 about how to generate code from what the progarmmer wrote as program.
 The only condition is that generated code should behave as expected by
@@ -902,7 +910,11 @@
 situations ourselves. Lets assume we want to handle them in the
 following way: if the programmer access a field out-of-bounds, we just
 return the default 0, and if a programmer wants to update an
-out-of-bounds filed, we quietly ignore this update.
+out-of-bounds filed, we want to ``quietly'' ignore this update.
+
+
+arraylength
+
 
 \end{document}