handouts/ho07.tex
changeset 710 183663740fb7
parent 709 c112a6cb5e52
child 711 6f3f3dd01786
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../grammar}
\usepackage{../graphics}
\usepackage{framed}
\usepackage[belowskip=7pt,aboveskip=0pt]{caption}



\begin{document}
The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
good enough for improving the speed of a program to target a virtual
machine instead. This produces not the fastest possible code, but code
that is often pretty fast. This way of producing code also has the
advantage that the virtual machine takes care of things a compiler would
normally need to take care of (hairy things like explicit memory
management).

As a first example in this module we will implement a compiler for the
very simple WHILE-language that we parsed in the last lecture. The
compiler will target the Java Virtual Machine (JVM), but not directly.
Pictorially the compiler will work as follows:

\begin{center}
  \begin{tikzpicture}[scale=1,font=\bf,
                      node/.style={
                      rectangle,rounded corners=3mm,
                      ultra thick,draw=black!50,minimum height=18mm, 
The input will be WHILE-programs; the output will be assembly files
(with the file extension .j). Assembly files essentially contain
human-readable machine code, meaning they are not just bits and bytes,
but rather something you can read and understand---with a bit of
practice of course. An \emph{assembler} will then translate the assembly
files into unreadable class- or binary-files the JVM can run.
Unfortunately, the Java ecosystem does not come with an assembler which
would be handy for our compiler-endeavour (unlike Microsoft's Common
Language Infrastructure for the .Net platform which has an assembler
out-of-the-box). As a substitute we shall use the 3rd-party
programs Jasmin and Krakatau

\begin{itemize}
  \item \url{http://jasmin.sourceforge.net}
  \item \url{https://github.com/Storyyeller/Krakatau}
\noindent
The first is a Java program and the second a program written in Python.
Each of them allows us to generate \emph{assembly} files that are still
readable by humans, as opposed to class-files which are pretty much just
(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
our assembly files as input and generate the corresponding class-files for
us.

What is good about the JVM is that it is a stack-based virtual machine,
a fact which will make it easy to generate code for arithmetic
expressions. For example, when compiling the expression $1 + 2$ we need
to generate the following three instructions

\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
ldc 2
iadd 
\end{lstlisting}

\noindent The first instruction loads the constant $1$ onto the stack,
the next one loads $2$, and the third instruction adds both numbers
together, replacing the top two elements of the stack with the result
$3$. For simplicity, we will consider throughout only arithmetic
involving integer numbers. This means our main JVM instructions for
arithmetic will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv}
and so on. The \code{i} stands for integer instructions in the JVM
(alternatives are \code{d} for doubles, \code{l} for longs and \code{f}
for floats etc).

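Just to get a feel for how such instruction sequences can be generated,
below is a minimal sketch of a compilation function for arithmetic
expressions, written as a recursive Scala function over an abstract
syntax tree. The constructor names (\texttt{Num}, \texttt{Var},
\texttt{Aop}) and the environment \texttt{env}, which maps variable
names to local-variable indices, are assumptions made only for this
illustration, not necessarily the definitions used later on.

\begin{lstlisting}[language=Scala,numbers=none]
abstract class AExp
case class Num(i: Int) extends AExp
case class Var(s: String) extends AExp
case class Aop(op: String, a1: AExp, a2: AExp) extends AExp

// compile an arithmetic expression into JVM instructions;
// env maps variable names to local-variable indices
def compile_aexp(a: AExp, env: Map[String, Int]): String = a match {
  case Num(i) => s"ldc $i\n"
  case Var(x) => s"iload ${env(x)}\n"
  case Aop("+", a1, a2) =>
    compile_aexp(a1, env) + compile_aexp(a2, env) + "iadd\n"
  case Aop("-", a1, a2) =>
    compile_aexp(a1, env) + compile_aexp(a2, env) + "isub\n"
  case Aop("*", a1, a2) =>
    compile_aexp(a1, env) + compile_aexp(a2, env) + "imul\n"
  case Aop("/", a1, a2) =>
    compile_aexp(a1, env) + compile_aexp(a2, env) + "idiv\n"
}
\end{lstlisting}

\noindent
For the expression $1 + 2$ this sketch produces exactly the three
instructions shown above.
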
Recall our grammar for arithmetic expressions (\meta{E} is the
starting symbol):

\end{center}

\noindent
This is all fine, but our arithmetic expressions can contain variables
and we have not considered them yet. To fix this we will represent our
variables as \emph{local variables} of the JVM. Essentially, local
variables are an array of, or pointers to, memory cells, containing in
our case only integers. Looking up a variable can be done with the
instruction

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
the index is and return the environment unchanged (that is in this case
$E' = E$). However, if this is the first encounter of the variable $x$
in the program, then we have to augment the environment and assign $x$
the largest index in $E$ plus one (that is $E' = E(x \mapsto
largest\_index + 1)$).
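A rough sketch of this environment handling in Scala might look as
follows (the helper name and the representation of the environment as a
\texttt{Map} are assumptions for illustration only):

\begin{lstlisting}[language=Scala,numbers=none]
// return the index of x, extending the environment on the
// first encounter of x
def index_of(x: String, env: Map[String, Int]): (Int, Map[String, Int]) =
  env.get(x) match {
    case Some(n) => (n, env)   // seen before: env stays unchanged
    case None =>
      // env.size equals largest index + 1 when indices start at 0
      (env.size, env + (x -> env.size))
  }
\end{lstlisting}

\noindent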
To sum up, for the assignment $x := x + 1$ we
generate the following code snippet

\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
iload $n_x$
ldc 1
iadd
to be of type \pcode{java/io/PrintStream}. A reference to this value
will be placed on the stack.\footnote{Note the syntax \texttt{L
\ldots{};} for the \texttt{PrintStream} type is not a typo. Somehow the
designers of Jasmin decided that this syntax is pleasing to the eye. So
if you wanted to have strings in your Jasmin code, you would need to
write \texttt{Ljava/lang/String;}\;. If you want arrays of one
dimension, then use \texttt{[\ldots}; for two dimensions, use
\texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5
copies the integer we want to print out onto the stack. In the line
after that we call the method \pcode{println} (from the class
\pcode{java/io/PrintStream}). We want to print out an integer and do not
expect anything back (that is why the type annotation is \pcode{(I)V}).
The \pcode{return}-instruction in the next line changes the control-flow
back to the place from where \pcode{write} was called. This method needs
to be part of a header that is included in any code we generate. The
helper-method \pcode{write} can be invoked with the two instructions

\begin{lstlisting}[mathescape,language=JVMIS]
iload $E(x)$ 
invokestatic XXX/XXX/write(I)V
\end{lstlisting}
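\noindent
In the compiler this invocation might be generated by a small helper
along the following lines (a sketch only; \texttt{prefix} stands for
whatever string ends up in place of the \texttt{XXX}s and is an
assumption here):

\begin{lstlisting}[language=Scala,numbers=none]
// emit the two instructions for a statement `write x'
def compile_write(x: String, env: Map[String, Int], prefix: String): String =
  s"iload ${env(x)}\n" + s"invokestatic $prefix/write(I)V\n"
\end{lstlisting}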

run some simple WHILE-programs. In a real compiler, we would of course
need to work harder and find out appropriate values for the stack and
local variables.

\begin{figure}[t]
\begin{framed}
\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
      $\textit{\ldots{}here comes the compiled code\ldots}$

    return
.end method
\end{lstlisting}
\end{framed}
\caption{The boilerplate code needed for running generated code. It
  hardwires limits for stack space and number of local
  variables.\label{boiler}}
\end{figure}
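One simple way to make use of this boilerplate is to keep it as two
string constants in the compiler and splice the generated instructions
in between. A sketch in Scala (the names \texttt{beginning} and
\texttt{ending} are assumptions; the limit directives mentioned in the
caption of Figure~\ref{boiler} would also go into \texttt{beginning}):

\begin{lstlisting}[language=Scala,numbers=none]
val beginning = """.class public XXX.XXX
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
"""

val ending = """
    return
.end method
"""

// wrap the compiled instructions of a whole program into the boilerplate
def compile_program(instrs: String): String =
  beginning + instrs + ending
\end{lstlisting}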

bytecode is then understood by the JVM and can be run by just invoking
the \pcode{java}-program. Again I let you do the work.


\begin{figure}[p]
\begin{framed}
\lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j}
\begin{tikzpicture}[remember picture,overlay]
  \draw[|<->|,very thick] (LA.north) -- (LB.south) 
     node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; 
  \draw[|<->|,very thick] (LC.north) -- (LD.south)
     node[left=-0.5mm,midway] {\footnotesize\texttt{write x}};
\end{tikzpicture}
\end{framed}
\caption{The generated code for the test program \texttt{x := 1 + 2; write
x}. This code can be processed by a Java assembler producing a
class-file, which can then be run by the {\tt{}java}-program.\label{test}}
\end{figure}


}
\end{lstlisting}

\noindent 
The idea behind the translation is that BF-programs operate on an array,
called here \texttt{mem}. The BF-memory pointer into this array is
represented as the variable \texttt{ptr}. As usual the BF-instructions
\code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The
instructions \code{+} and \code{-} update a cell in \texttt{mem}. In
Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary
variable since we have not changed our write functions in order to cope
with writing out any array-content directly. Lines 7 and 8 are for
translating BF-loops. Line 8 is interesting in the sense that we need to
generate a \code{skip} instruction just before finishing with the
closing \code{"\}"}. The reason is that we are rather pedantic about
semicolons in our WHILE-grammar: the last command cannot have a
semicolon---adding a \code{skip} works around this snag. 

Putting all this together, we can generate WHILE-programs that compile
to more than 15K JVM-instructions; we can then run the compiled JVM code
for such programs and marvel at the output\ldots\medskip

\noindent
\ldots{}Hooooray, we can finally run the BF-mandelbrot program on the JVM: it
completes within 20 or so seconds (after nearly 10 minutes of parsing
the corresponding WHILE-program; the size of the resulting class files
is around 32K). Try replicating the 20 secs with an interpreter! The
good point is that we now have a sufficiently complicated program in our
WHILE-language in order to do some benchmarking. Which means we now face
the question of what to do next\ldots

\subsection*{Optimisations \& Co}

Every compiler that deserves its name performs some optimisation on the
code. If we make the extra effort of writing a compiler for a language,
then obviously we want our code to run as fast as possible.

       
So let's optimise the code we generate a bit. There is actually one
aspect of our generated code where we can easily make efficiency gains:
this has to do with some of the quirks of the JVM. Whenever we push a
constant onto the stack, we have so far used the JVM instruction
\code{ldc some_const}. This is a rather generic instruction in the sense
that it works not just for integers but also for strings, objects and so
on. What this instruction does is to put the constant into the constant
pool and then to use an index into this constant pool. This means
\code{ldc} will be represented by at least two bytes in the class file.
While this is sensible for ``large'' constants like strings, it is a bit
of overkill for small integers (which many integers will be when
compiling a BF-program). To counter this ``waste'', the JVM has specific
instructions for small integers, for example
       

\begin{itemize}
\item \code{iconst_0},\ldots, \code{iconst_5}
\item \code{bipush n}
\end{itemize}
       

\noindent
where the \code{n} in \code{bipush} is between -128 and 127. By having
dedicated instructions such as \code{iconst_0} to \code{iconst_5} (and
\code{iconst_m1}), we can make the generated code size smaller as these
instructions only require one byte (as opposed to the generic \code{ldc}
which needs one byte plus another for the index into the constant pool).
While in theory the use of such special instructions should only make
the code smaller, it actually also makes the code run faster, probably
because the JVM has to process less code and can use a specific
instruction in the underlying CPU. The story with \code{bipush} is
slightly different, because it also uses two bytes---so it does not
result in a reduction in code size. But it probably also uses a specific
instruction in the underlying CPU, which makes the JVM code run faster.
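A sketch of how a code generator can pick the cheapest instruction for
an integer constant might look as follows (the helper name is made up;
Scala again):

\begin{lstlisting}[language=Scala,numbers=none]
// choose the shortest JVM instruction for pushing an integer constant
def compile_num(i: Int): String =
  if (0 <= i && i <= 5) s"iconst_$i\n"
  else if (i == -1) "iconst_m1\n"
  else if (-128 <= i && i <= 127) s"bipush $i\n"
  else s"ldc $i\n"
\end{lstlisting}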
       

Similar one-byte shortcuts exist for loading and storing the first few
local variables (the \code{a}-versions are for references):

\begin{itemize}
\item \code{iload_0},\ldots, \code{iload_3}
\item \code{istore_0},\ldots, \code{istore_3}
\item \code{aload_0},\ldots, \code{aload_3}
\item \code{astore_0},\ldots, \code{astore_3}
\end{itemize}

% 33296 bytes -> 21787
% 21 ->  16 seconds
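\noindent
Analogously, a code generator can special-case the first four local
variables (again only a sketch with made-up helper names):

\begin{lstlisting}[language=Scala,numbers=none]
// choose the shortest instructions for integer loads and stores
def compile_load(n: Int): String =
  if (0 <= n && n <= 3) s"iload_$n\n" else s"iload $n\n"

def compile_store(n: Int): String =
  if (0 <= n && n <= 3) s"istore_$n\n" else s"istore $n\n"
\end{lstlisting}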

As you have probably seen, the compiler writer has a lot of freedom
about how to generate code from what the programmer wrote as a program.
The only condition is that the generated code should behave as expected
by the programmer. Then all is fine\ldots mission accomplished! But
sometimes the compiler writer is expected to go an extra mile, or even
miles. Suppose we are given the following WHILE-program: