19 The purpose of a compiler is to transform a program a human can read and |
18 The purpose of a compiler is to transform a program a human can read and |
20 write into code the machine can run as fast as possible. The fastest |
19 write into code the machine can run as fast as possible. The fastest |
21 code would be machine code the CPU can run directly, but it is often |
20 code would be machine code the CPU can run directly, but it is often |
22 good enough for improving the speed of a program to target a virtual |
21 good enough for improving the speed of a program to target a virtual |
23 machine instead. This produces not the fastest possible code, but code |
22 machine instead. This produces not the fastest possible code, but code |
24 that is often pretty fast. This way of producing code has the advantage |
23 that is often pretty fast. This way of producing code has also the |
25 that the virtual machine takes care of things a compiler would normally |
24 advantage that the virtual machine takes care of things a compiler would |
26 need to take care of (hairy things like explicit memory management). |
25 normally need to take care of (hairy things like explicit memory |
|
26 management). |
27 |
27 |
28 As a first example in this module we will implement a compiler for the |
28 As a first example in this module we will implement a compiler for the |
29 very simple WHILE-language that we parsed in the last lecture. The |
29 very simple WHILE-language that we parsed in the last lecture. The |
30 compiler will target the Java Virtual Machine (JVM), but not directly. |
30 compiler will target the Java Virtual Machine (JVM), but not directly. |
31 Pictorially the compiler will work as follows: |
31 Pictorially the compiler will work as follows: |
32 |
32 |
33 \begin{center} |
33 \begin{center} |
34 \begin{tikzpicture}[scale=1,font=\bf, |
34 \begin{tikzpicture}[scale=1,font=\bf, |
35 node/.style={ |
35 node/.style={ |
36 rectangle,rounded corners=3mm, |
36 rectangle,rounded corners=3mm, |
37 ultra thick,draw=black!50,minimum height=18mm, |
37 ultra thick,draw=black!50,minimum height=18mm, |
53 The input will be WHILE-programs; the output will be assembly files |
53 The input will be WHILE-programs; the output will be assembly files |
54 (with the file extension .j). Assembly files essentially contain |
54 (with the file extension .j). Assembly files essentially contain |
55 human-readable machine code, meaning they are not just bits and bytes, |
55 human-readable machine code, meaning they are not just bits and bytes, |
56 but rather something you can read and understand---with a bit of |
56 but rather something you can read and understand---with a bit of |
57 practice of course. An \emph{assembler} will then translate the assembly |
57 practice of course. An \emph{assembler} will then translate the assembly |
58 files into unreadable class or binary files the JVM can run. |
58 files into unreadable class- or binary-files the JVM can run. |
59 Unfortunately, the Java ecosystem does not come with an assembler which |
59 Unfortunately, the Java ecosystem does not come with an assembler which |
60 would be handy for our compiler-endeavour (unlike Microsoft's Common |
60 would be handy for our compiler-endeavour (unlike Microsoft's Common |
61 Language Infrastructure for the .Net platform which has an assembler |
61 Language Infrastructure for the .Net platform which has an assembler |
62 out-of-the-box). As a substitute we shall therefore use the 3rd-party |
62 out-of-the-box). As a substitute we shall use the 3rd-party |
63 programs Jasmin and Krakatau |
63 programs Jasmin and Krakatau |
64 |
64 |
65 \begin{itemize} |
65 \begin{itemize} |
66 \item \url{http://jasmin.sourceforge.net} |
66 \item \url{http://jasmin.sourceforge.net} |
67 \item \url{https://github.com/Storyyeller/Krakatau} |
67 \item \url{https://github.com/Storyyeller/Krakatau} |
70 \noindent |
70 \noindent |
71 The first is a Java program and the second a program written in Python. |
71 The first is a Java program and the second a program written in Python. |
72 Each of them allow us to generate \emph{assembly} files that are still |
72 Each of them allow us to generate \emph{assembly} files that are still |
73 readable by humans, as opposed to class-files which are pretty much just |
73 readable by humans, as opposed to class-files which are pretty much just |
74 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take |
74 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take |
75 an assembly file as input and generate the corresponding class file for |
75 our assembly files as input and generate the corresponding class-files for |
76 us. |
76 us. |
77 |
77 |
78 Good about the JVM is that it is a stack-based virtual machine, a fact |
78 What is good about the JVM is that it is a stack-based virtual machine, |
79 which will make it easy to generate code for arithmetic expressions. For |
79 a fact which will make it easy to generate code for arithmetic |
80 example when compiling the expression $1 + 2$ we need to generate the |
80 expressions. For example when compiling the expression $1 + 2$ we need |
81 following three instructions |
81 to generate the following three instructions |
82 |
82 |
83 \begin{lstlisting}[language=JVMIS,numbers=none] |
83 \begin{lstlisting}[language=JVMIS,numbers=none] |
84 ldc 1 |
84 ldc 1 |
85 ldc 2 |
85 ldc 2 |
86 iadd |
86 iadd |
87 \end{lstlisting} |
87 \end{lstlisting} |
88 |
88 |
89 \noindent The first instruction loads the constant $1$ onto the stack, |
89 \noindent The first instruction loads the constant $1$ onto the stack, |
90 the next one loads $2$, the third instruction adds both numbers together |
90 the next one loads $2$, the third instruction adds both numbers together |
91 replacing the top two elements of the stack with the result $3$. For |
91 replacing the top two elements of the stack with the result $3$. For |
92 simplicity, we will consider throughout only integer numbers. This means |
92 simplicity, we will consider throughout only arithmetic involving |
93 our main JVM instructions for arithmetic will be \code{iadd}, |
93 integer numbers. This means our main JVM instructions for arithmetic |
94 \code{isub}, \code{imul}, \code{idiv} and so on. The \code{i} stands for |
94 will be \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on. |
95 integer instructions in the JVM (alternatives are \code{d} for doubles, |
95 The \code{i} stands for integer instructions in the JVM (alternatives |
96 \code{l} for longs and \code{f} for floats etc). |
96 are \code{d} for doubles, \code{l} for longs and \code{f} for floats |
|
97 etc). |
97 |
98 |
98 Recall our grammar for arithmetic expressions (\meta{E} is the |
99 Recall our grammar for arithmetic expressions (\meta{E} is the |
99 starting symbol): |
100 starting symbol): |
100 |
101 |
101 |
102 |
643 to be of type \pcode{java/io/PrintStream}. A reference to this value |
644 to be of type \pcode{java/io/PrintStream}. A reference to this value |
644 will be placed on the stack.\footnote{Note the syntax \texttt{L |
645 will be placed on the stack.\footnote{Note the syntax \texttt{L |
645 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the |
646 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the |
646 designers of Jasmin decided that this syntax is pleasing to the eye. So |
647 designers of Jasmin decided that this syntax is pleasing to the eye. So |
647 if you wanted to have strings in your Jasmin code, you would need to |
648 if you wanted to have strings in your Jasmin code, you would need to |
648 write \texttt{Ljava/lang/String;}\;. If you want arrays of one dimension, |
649 write \texttt{Ljava/lang/String;}\;. If you want arrays of one |
649 then use \texttt{[\ldots}; two dimensions, use \texttt{[[\ldots} and |
650 dimension, then use \texttt{[\ldots}; two dimensions, use |
650 so on. Looks all very ugly to my eyes.} Line~5 copies the integer we |
651 \texttt{[[\ldots} and so on. Looks all very ugly to my eyes.} Line~5 |
651 want to print out onto the stack. In the line after that we call the |
652 copies the integer we want to print out onto the stack. In the line |
652 method \pcode{println} (from the class \pcode{java/io/PrintStream}). We |
653 after that we call the method \pcode{println} (from the class |
653 want to print out an integer and do not expect anything back (that is |
654 \pcode{java/io/PrintStream}). We want to print out an integer and do not |
654 why the type annotation is \pcode{(I)V}). The \pcode{return}-instruction |
655 expect anything back (that is why the type annotation is \pcode{(I)V}). |
655 in the next line changes the control-flow back to the place from where |
656 The \pcode{return}-instruction in the next line changes the control-flow |
656 \pcode{write} was called. This method needs to be part of a header that |
657 back to the place from where \pcode{write} was called. This method needs |
657 is included in any code we generate. The helper-method \pcode{write} can |
658 to be part of a header that is included in any code we generate. The |
658 be invoked with the two instructions |
659 helper-method \pcode{write} can be invoked with the two instructions |
659 |
660 |
660 \begin{lstlisting}[mathescape,language=JVMIS] |
661 \begin{lstlisting}[mathescape,language=JVMIS] |
661 iload $E(x)$ |
662 iload $E(x)$ |
662 invokestatic XXX/XXX/write(I)V |
663 invokestatic XXX/XXX/write(I)V |
663 \end{lstlisting} |
664 \end{lstlisting} |
716 bytecode is then understood by the JVM and can be run by just invoking |
719 bytecode is then understood by the JVM and can be run by just invoking |
717 the \pcode{java}-program. Again I let you do the work. |
720 the \pcode{java}-program. Again I let you do the work. |
718 |
721 |
719 |
722 |
720 \begin{figure}[p] |
723 \begin{figure}[p] |
|
724 \begin{framed} |
721 \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j} |
725 \lstinputlisting[language=JVMIS,mathescape,basicstyle=\ttfamily\small]{../progs/test-small.j} |
722 \begin{tikzpicture}[remember picture,overlay] |
726 \begin{tikzpicture}[remember picture,overlay] |
723 \draw[|<->|,very thick] (LA.north) -- (LB.south) |
727 \draw[|<->|,very thick] (LA.north) -- (LB.south) |
724 node[left=0mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; |
728 node[left=-0.5mm,midway] {\footnotesize\texttt{x\,:=\,1\,+\,2}}; |
725 \draw[|<->|,very thick] (LC.north) -- (LD.south) |
729 \draw[|<->|,very thick] (LC.north) -- (LD.south) |
726 node[left=0mm,midway] {\footnotesize\texttt{write x}}; |
730 node[left=-0.5mm,midway] {\footnotesize\texttt{write x}}; |
727 \end{tikzpicture} |
731 \end{tikzpicture} |
|
732 \end{framed} |
728 \caption{The generated code for the test program \texttt{x := 1 + 2; write |
733 \caption{The generated code for the test program \texttt{x := 1 + 2; write |
729 x}. This code can be processed by a Java assembler producing a |
734 x}. This code can be processed by a Java assembler producing a |
730 class-file, which can then be run by the {\tt{}java}-program.\label{test}} |
735 class-file, which can then be run by the {\tt{}java}-program.\label{test}} |
731 \end{figure} |
736 \end{figure} |
732 |
737 |
850 } |
855 } |
851 \end{lstlisting} |
856 \end{lstlisting} |
852 |
857 |
853 \noindent |
858 \noindent |
854 The idea behind the translation is that BF-programs operate on an array, |
859 The idea behind the translation is that BF-programs operate on an array, |
855 called here \texttt{mem}. The BP-memory pointer into this array is |
860 called here \texttt{mem}. The BF-memory pointer into this array is |
856 represented as the variable \texttt{ptr}. As usual the BF-instructions |
861 represented as the variable \texttt{ptr}. As usual the BF-instructions |
857 \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The |
862 \code{>} and \code{<} increase, respectively decrease, \texttt{ptr}. The |
858 instructions \code{+} and \code{-} update a cell in \texttt{mem}. In |
863 instructions \code{+} and \code{-} update a cell in \texttt{mem}. In |
859 Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary variable |
864 Line 6 we need to first assign a \texttt{mem}-cell to an auxiliary |
860 since we have not changed our write functions in order to cope with |
865 variable since we have not changed our write functions in order to cope |
861 writing out any array-content directly. Lines 7 and 8 are for |
866 with writing out any array-content directly. Lines 7 and 8 are for |
862 translating BF-loops. Line 8 is interesting in the sense that we need to |
867 translating BF-loops. Line 8 is interesting in the sense that we need to |
863 generate a \code{skip} instruction just before finishing with the |
868 generate a \code{skip} instruction just before finishing with the |
864 closing \code{"\}"}. The reason is that we are rather pedantic about |
869 closing \code{"\}"}. The reason is that we are rather pedantic about |
865 semicolons in our WHILE-grammar: the last command cannot have a |
870 semicolons in our WHILE-grammar: the last command cannot have a |
866 semicolon---adding a \code{skip} works around this snag. Putting all |
871 semicolon---adding a \code{skip} works around this snag. |
867 this together is we can generate WHILE-programs with more than 400 |
872 |
868 instructions and then run the compiled JVM code for such programs. |
873 Putting all this together and we can generate WHILE-programs with more |
869 \bigskip |
874 than 15K JVM-instructions; run the compiled JVM code for such |
870 |
875 programs and marvel at the output\ldots\medskip |
871 \noindent |
876 |
872 Hooooray, we can finally run the BF-mandelbrot program on the JVM and it |
877 \noindent |
873 completes within 20 seconds (after nearly 10 minutes of parsing the |
878 \ldots{}Hooooray, we can finally run the BF-mandelbrot program on the JVM: it |
874 corresponding WHILE-program and generating 270K of a class file). Try |
879 completes within 20 or so seconds (after nearly 10 minutes of parsing |
875 replicating the 20 secs with an interpreter! OK, we now face the |
880 the corresponding WHILE-program; the size of the resulting class files |
876 nagging question about what to do next\ldots |
881 is around 32K). Try replicating the 20 secs with an interpreter! The |
877 |
882 good point is that we now have a sufficiently complicated program in our |
878 \subsection*{Added Value} |
883 WHILE-language in order to do some benchmarking. Which means we now face |
879 |
884 the question about what to do next\ldots |
880 % 33296 bytes -> 21882 |
885 |
881 % shave off 2 seconds |
886 \subsection*{Optimisations \& Co} |
|
887 |
|
888 Every compiler that deserves its name performs some optimisation on the |
|
889 code. If we make the extra effort of writing a compiler for a language, |
|
890 then obviously we want to have our code to run as fast as possible. |
|
891 |
|
892 So let's optimise a bit the code we generate. There is actually one |
|
893 aspect in our generated code where we can make easily efficiency gains: |
|
894 this has to do with some of the quirks of the JVM. Whenever we push a |
|
895 constant onto the stack, we used the JVM instruction \code{ldc |
|
896 some_const}. This is a rather generic instructions in the sense that it |
|
897 works not just for integers but also for strings, objects and so on. |
|
898 What this instruction does is to put the constant into a constant pool |
|
899 and then to use an index to this constant pool. This means \code{ldc} |
|
900 will be represented by at least two bytes in the class file. While this |
|
901 is sensible for ``large'' constants like strings, it is a bit of |
|
902 overkill for small integers (which many integers will be when compiling |
|
903 a BF-program). To counter this ``waste'', the JVM has specific |
|
904 instructions for small integers, for example |
|
905 |
|
906 \begin{itemize} |
|
907 \item \code{iconst_0},\ldots, \code{iconst_5} |
|
908 \item \code{bipush n} |
|
909 \end{itemize} |
|
910 |
|
911 \noindent |
|
912 where the \code{n} is \code{bipush} is between -128 and 128. By having |
|
913 dedicated instructions such as \code{iconst_0} to \code{iconst_5} (and |
|
914 \code{iconst_m1}), we can make the generated code size smaller as these |
|
915 instructions only require 1 Byte (as opposed the generic \code{ldc} |
|
916 which needs 1 Byte plus another for the index into the constant pool). |
|
917 While in theory the use of such special instructions should make the |
|
918 code only smaller, it actually makes the code also run faster. Probably |
|
919 because the JVM has to process less code and uses a specific instruction |
|
920 in the underlying CPU. The story with \code{bipush} is slightly |
|
921 different, because it also uses two Bytes---so it does not result in a |
|
922 reduction in code size. But probably it uses specific instruction in |
|
923 the underlying CPU which make the JVM code run faster. |
|
924 |
|
925 |
|
926 |
|
927 |
|
928 \begin{itemize} |
|
929 \item \code{iload_0},\ldots, \code{iload_3} |
|
930 \item \code{istore_0},\ldots, \code{istore_3} |
|
931 \item \code{aload_0},\ldots, \code{aload_3} |
|
932 \item \code{astore_0},\ldots, \code{astore_3} |
|
933 \end{itemize} |
|
934 |
|
935 % 33296 bytes -> 21787 |
|
936 % 21 -> 16 seconds |
882 |
937 |
883 As you have probably seen, the compiler writer has a lot of freedom |
938 As you have probably seen, the compiler writer has a lot of freedom |
884 about how to generate code from what the progarmmer wrote as program. |
939 about how to generate code from what the programmer wrote as program. |
885 The only condition is that generated code should behave as expected by |
940 The only condition is that generated code should behave as expected by |
886 the programmer. Then all is fine\ldots mission accomplished! But |
941 the programmer. Then all is fine\ldots mission accomplished! But |
887 sometimes the compiler writer is expected to go an extra mile, or even |
942 sometimes the compiler writer is expected to go an extra mile, or even |
888 miles. Suppose we are given the following WHILE-program: |
943 miles. Suppose we are given the following WHILE-program: |
889 |
944 |