50 \end{center} |
50 \end{center} |
51 |
51 |
52 \noindent |
52 \noindent |
53 The input will be WHILE-programs; the output will be assembly files |
53 The input will be WHILE-programs; the output will be assembly files |
54 (with the file extension .j). Assembly files essentially contain |
54 (with the file extension .j). Assembly files essentially contain |
55 human-readable machine code, meaning they are not just bits and bytes, |
55 human-readable low-level code, meaning they are not just bits and bytes, |
56 but rather something you can read and understand---with a bit of |
56 but rather something you can read and understand---with a bit of |
57 practice of course. An \emph{assembler} will then translate the assembly |
57 practice of course. An \emph{assembler} will then translate the assembly |
58 files into unreadable class- or binary-files the JVM can run. |
58 files into unreadable class- or binary-files the JVM or CPU can run. |
59 Unfortunately, the Java ecosystem does not come with an assembler which |
59 Unfortunately, the Java ecosystem does not come with an assembler which |
60 would be handy for our compiler-endeavour (unlike Microsoft's Common |
60 would be handy for our compiler-endeavour (unlike Microsoft's Common |
61 Language Infrastructure for the .Net platform which has an assembler |
61 Language Infrastructure for the .Net platform which has an assembler |
62 out-of-the-box). As a substitute we shall use the 3rd-party |
62 out-of-the-box). As a substitute we shall use the 3rd-party programs |
63 programs Jasmin and Krakatau |
63 Jasmin and Krakatau |
64 |
64 |
65 \begin{itemize} |
65 \begin{itemize} |
66 \item \url{http://jasmin.sourceforge.net} |
66 \item \url{http://jasmin.sourceforge.net} |
67 \item \url{https://github.com/Storyyeller/Krakatau} |
67 \item \url{https://github.com/Storyyeller/Krakatau} |
68 \end{itemize} |
68 \end{itemize} |
501 \end{tikzpicture} |
501 \end{tikzpicture} |
502 |
502 |
503 \noindent The first three lines correspond to the the boolean |
503 \noindent The first three lines correspond to the the boolean |
504 expression $1 = 1$. The jump for when this boolean expression |
504 expression $1 = 1$. The jump for when this boolean expression |
505 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
505 is false is in Line~3. Lines 4-6 corresponds to the if-branch; |
506 the else-branch is in Lines 8 and 9. Note carefully how the |
506 the else-branch is in Lines 8 and 9. |
507 environment $E$ is threaded through the recursive calls of |
507 |
508 \textit{compile}. The function receives an environment $E$, |
508 Note carefully how the environment $E$ is threaded through the recursive |
509 but it might extend it when compiling the if-branch, yielding |
509 calls of \textit{compile}. The function receives an environment $E$, but |
510 $E'$. This happens for example in the if-statement above |
510 it might extend it when compiling the if-branch, yielding $E'$. This |
511 whenever the variable \code{x} has not been used before. |
511 happens for example in the if-statement above whenever the variable |
512 Similarly with the environment $E''$ for the second call to |
512 \code{x} has not been used before. Similarly with the environment $E''$ |
513 \textit{compile}. $E''$ is also the environment that needs to |
513 for the second call to \textit{compile}. $E''$ is also the environment |
514 be returned as part of the answer. |
514 that needs to be returned as part of the answer. |
515 |
515 |
516 The compilation of the while-loops, say |
516 The compilation of the while-loops, say |
517 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case |
517 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case |
518 the condition is true and we need to do another iteration, |
518 the condition is true and we need to do another iteration, |
519 and the control-flow needs to be as follows |
519 and the control-flow needs to be as follows |
638 \pcode{write}. It takes a single integer argument indicated by the |
638 \pcode{write}. It takes a single integer argument indicated by the |
639 \pcode{(I)} and returns no result, indicated by the \pcode{V} (for |
639 \pcode{(I)} and returns no result, indicated by the \pcode{V} (for |
640 void). Since the method has only one argument, we only need a single |
640 void). Since the method has only one argument, we only need a single |
641 local variable (Line~2) and a stack with two cells will be sufficient |
641 local variable (Line~2) and a stack with two cells will be sufficient |
642 (Line 3). Line 4 instructs the JVM to get the value of the member |
642 (Line 3). Line 4 instructs the JVM to get the value of the member |
643 \pcode{out} of the class \pcode{java/lang/System}. It expects the value |
643 \pcode{out} from the class \pcode{java/lang/System}. It expects the value |
644 to be of type \pcode{java/io/PrintStream}. A reference to this value |
644 to be of type \pcode{java/io/PrintStream}. A reference to this value |
645 will be placed on the stack.\footnote{Note the syntax \texttt{L |
645 will be placed on the stack.\footnote{Note the syntax \texttt{L |
646 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the |
646 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the |
647 designers of Jasmin decided that this syntax is pleasing to the eye. So |
647 designers of Jasmin decided that this syntax is pleasing to the eye. So |
648 if you wanted to have strings in your Jasmin code, you would need to |
648 if you wanted to have strings in your Jasmin code, you would need to |
884 WHILE-language in order to do some benchmarking. Which means we now face |
884 WHILE-language in order to do some benchmarking. Which means we now face |
885 the question about what to do next\ldots |
885 the question about what to do next\ldots |
886 |
886 |
887 \subsection*{Optimisations \& Co} |
887 \subsection*{Optimisations \& Co} |
888 |
888 |
889 Every compiler that deserves its name performs optimisations on the |
889 Every compiler that deserves its name has to perform some optimisations |
890 code. If we make the extra effort of writing a compiler for a language, |
890 on the code: if we put in the extra effort of writing a compiler for a |
891 then obviously we want to have our code to run as fast as possible. |
891 language, then obviously we want to have our code to run as fast as |
892 So we should look into this. |
892 possible. So we should look into this in more detail. |
893 |
893 |
894 There is actually one aspect in our generated code where we can make |
894 There is actually one aspect in our generated code where we can make |
895 easily efficiency gains: this has to do with some of the quirks of the |
895 easily efficiency gains. This has to do with some of the quirks of the |
896 JVM. Whenever we push a constant onto the stack, we used the JVM |
896 JVM. Whenever we push a constant onto the stack, we used the JVM |
897 instruction \instr{ldc some_const}. This is a rather generic instruction |
897 instruction \instr{ldc some_const}. This is a rather generic instruction |
898 in the sense that it works not just for integers but also for strings, |
898 in the sense that it works not just for integers but also for strings, |
899 objects and so on. What this instruction does is putting the constant |
899 objects and so on. What this instruction does is putting the constant |
900 into a \emph{constant pool} and then to use an index into this constant |
900 into a \emph{constant pool} and then uses an index into this constant |
901 pool. This means \instr{ldc} will be represented by at least two bytes |
901 pool. This means \instr{ldc} will be represented by at least two bytes |
902 in the class file. While this is sensible for ``large'' constants like |
902 in the class file. While this is a sensible strategy for ``large'' |
903 strings, it is a bit of overkill for small integers (which many integers |
903 constants like strings, it is a bit of overkill for small integers |
904 will be when compiling a BF-program). To counter this ``waste'', the JVM |
904 (which many integers will be when compiling a BF-program). To counter |
905 has specific instructions for small integers, for example |
905 this ``waste'', the JVM has specific instructions for small integers, |
|
906 for example |
906 |
907 |
907 \begin{itemize} |
908 \begin{itemize} |
908 \item \instr{iconst_0},\ldots, \instr{iconst_5} |
909 \item \instr{iconst_0},\ldots, \instr{iconst_5} |
909 \item \instr{bipush n} |
910 \item \instr{bipush n} |
910 \end{itemize} |
911 \end{itemize} |
916 size smaller as these instructions only require 1 byte (as opposed the |
917 size smaller as these instructions only require 1 byte (as opposed the |
917 generic \instr{ldc} which needs 1 byte plus another for the index into |
918 generic \instr{ldc} which needs 1 byte plus another for the index into |
918 the constant pool). While in theory the use of such special instructions |
919 the constant pool). While in theory the use of such special instructions |
919 should make the code only smaller, it actually makes the code also run |
920 should make the code only smaller, it actually makes the code also run |
920 faster. Probably because the JVM has to process less code and uses a |
921 faster. Probably because the JVM has to process less code and uses a |
921 specific instruction in the underlying CPU. The story with |
922 specific instruction for the underlying CPU. The story with |
922 \instr{bipush} is slightly different, because it also uses two |
923 \instr{bipush} is slightly different, because it also uses two |
923 bytes---so it does not result in a reduction in code size. But againm, |
924 bytes---so it does not necessarily result in a reduction of code size. |
924 it probably uses a specific instruction in the underlying CPU that make |
925 Instead, it probably uses a specific instruction in the underlying CPU |
925 the JVM code run faster. This means when generating code we can use |
926 that makes the JVM code run faster.\footnote{This is all ``probable'' |
926 the following helper function |
927 because I have not read the 700 pages of JVM documentation by Oracle and |
|
928 also have no clue how the JVM is implemented.} This means when |
|
929 generating code for pushing constants onto the stack, we can use the |
|
930 following Scala helper-function |
927 |
931 |
928 \begin{lstlisting}[language=Scala] |
932 \begin{lstlisting}[language=Scala] |
929 def compile_num(i: Int) = |
933 def compile_num(i: Int) = |
930 if (0 <= i && i <= 5) i"iconst_$i" else |
934 if (0 <= i && i <= 5) i"iconst_$i" else |
931 if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i" |
935 if (-128 <= i && i <= 127) i"bipush $i" |
|
936 else i"ldc $i" |
932 \end{lstlisting} |
937 \end{lstlisting} |
933 |
938 |
934 \noindent |
939 \noindent |
935 that generates the more efficient instructions for pushing a constant |
940 It generates the more efficient instructions when pushing a small integer |
936 onto the stack. Note the JVM also has special instructions that |
941 constant onto the stack. The default is \instr{ldc} for any other constants. |
937 load and store the first three local variables. The assumption is that |
942 |
938 most operations and arguments in a method will only use very few |
943 The JVM also has such special instructions for |
939 local variables. So the JVM has the following instructions: |
944 loading and storing the first three local variables. The assumption is |
|
945 that most operations and arguments in a method will only use very few |
|
946 local variables. So we can use the following instructions: |
940 |
947 |
941 \begin{itemize} |
948 \begin{itemize} |
942 \item \instr{iload_0},\ldots, \instr{iload_3} |
949 \item \instr{iload_0},\ldots, \instr{iload_3} |
943 \item \instr{istore_0},\ldots, \instr{istore_3} |
950 \item \instr{istore_0},\ldots, \instr{istore_3} |
944 \item \instr{aload_0},\ldots, \instr{aload_3} |
951 \item \instr{aload_0},\ldots, \instr{aload_3} |
945 \item \instr{astore_0},\ldots, \instr{astore_3} |
952 \item \instr{astore_0},\ldots, \instr{astore_3} |
946 \end{itemize} |
953 \end{itemize} |
947 |
954 |
948 |
955 |
949 \noindent Having implemented these optimisations, the code size of the |
956 \noindent Having implemented these optimisations, the code size of the |
950 BF-Mandelbrot program reduces and also it runs faster. According to my |
957 BF-Mandelbrot program reduces and also the class-file runs faster (the |
951 very rough experiments: |
958 parsing part is still very slow). According to my very rough |
|
959 experiments: |
952 |
960 |
953 \begin{center} |
961 \begin{center} |
954 \begin{tabular}{lll} |
962 \begin{tabular}{lll} |
955 & class-size & runtime\\\hline |
963 & class-size & runtime\\\hline |
956 Mandelbrot:\\ |
964 Mandelbrot:\\ |
959 \end{tabular} |
967 \end{tabular} |
960 \end{center} |
968 \end{center} |
961 |
969 |
962 \noindent |
970 \noindent |
963 Quite good! Such optimisations are called \emph{peephole optimisations}, |
971 Quite good! Such optimisations are called \emph{peephole optimisations}, |
964 because it is type of optimisations that involve changing a small set of |
972 because they involve changing one or a small set of instructions into an |
965 instructions into an equivalent set that has better performance. |
973 equivalent set that has better performance. |
966 |
974 |
967 If you look careful at our code you will quickly find another source of |
975 If you look careful at our generated code you will quickly find another |
968 inefficiency in programs like |
976 source of inefficiency in programs like |
969 |
977 |
970 \begin{lstlisting}[mathescape,language=While] |
978 \begin{lstlisting}[mathescape,language=While] |
971 x := ...; |
979 x := ...; |
972 write x |
980 write x |
973 \end{lstlisting} |
981 \end{lstlisting} |
974 |
982 |
975 \noindent |
983 \noindent |
976 where our code first calculates the new result the for \texttt{x} on the |
984 where our code first calculates the new result the for \texttt{x} on the |
977 stack, then pops off the result into a local variable, and after that |
985 stack, then pops off the result into a local variable, and after that |
978 loads the local variable back onto the stack for writing out a number. |
986 loads the local variable back onto the stack for writing out a number. |
|
987 |
|
988 \begin{lstlisting}[mathescape,language=JVMIS] |
|
989 ... |
|
990 istore 0 |
|
991 iload 0 |
|
992 ... |
|
993 \end{lstlisting} |
|
994 |
|
995 \noindent |
979 If we can detect such situations, then we can leave the value of |
996 If we can detect such situations, then we can leave the value of |
980 \texttt{x} on the stack with for example the much cheaper instruction |
997 \texttt{x} on the stack with for example the much cheaper instruction |
981 \instr{dup}. Now the problem with this optimisation is that it is quite |
998 \instr{dup}. Now the problem with this optimisation is that it is quite |
982 easy for the snippet above, but what about instances where there is |
999 easy for the snippet above, but what about instances where there is |
983 further WHILE-code in \emph{between} these two statements? Sometimes we |
1000 further WHILE-code in \emph{between} these two statements? Sometimes we |
984 will be able to optimise, sometimes we will not. The compiler needs to |
1001 will be able to optimise, sometimes we will not. The compiler needs to |
985 find out which situation applies. This can become quickly much more |
1002 find out which situation applies. This can quickly become much more |
986 complicated. So we leave this kind of optimisations here and look at |
1003 complicated. So we leave this kind of optimisations here and look at |
987 something more interesting and possibly surprising. |
1004 something more interesting and possibly surprising. |
988 |
1005 |
989 |
1006 As you might have seen, the compiler writer has a lot of freedom about |
990 As you have probably seen, the compiler writer has a lot of freedom |
1007 how to generate code from what the programmer wrote as program. The only |
991 about how to generate code from what the programmer wrote as program. |
1008 condition is that generated code should behave as expected by the |
992 The only condition is that generated code should behave as expected by |
1009 programmer. Then all is fine with the code above\ldots mission |
993 the programmer. Then all is fine\ldots mission accomplished! But |
1010 accomplished! But sometimes the compiler writer is expected to go an |
994 sometimes the compiler writer is expected to go an extra mile, or even |
1011 extra mile, or even miles and change(!) the meaning of a program. |
995 miles and change the meaning of a program in unexpected ways. Suppose we |
1012 Suppose we are given the following WHILE-program: |
996 are given the following WHILE-program: |
|
997 |
1013 |
998 \begin{lstlisting}[mathescape,language=While] |
1014 \begin{lstlisting}[mathescape,language=While] |
999 new(arr[10]); |
1015 new(arr[10]); |
1000 arr[14] := 3 + arr[13] |
1016 arr[14] := 3 + arr[13] |
1001 \end{lstlisting} |
1017 \end{lstlisting} |
1002 |
1018 |
1003 \noindent |
1019 \noindent |
1004 Admittedly this is a contrived program, and probably not meant to be |
1020 Admittedly this is a contrived program, and probably not meant to be |
1005 like this by any sane programmer, but it is supposed to make the |
1021 like this by any sane programmer, but it is supposed to make the |
1006 following point: We generate an array of size 10, and then try to access |
1022 following point: The program generates an array of size 10, and then |
1007 the non-existing element at index 13 and even updating element with |
1023 tries to access the non-existing element at index 13 and even updating |
1008 index 14. Obviously this is baloney. Still, our compiler generates code |
1024 the element with index 14. Obviously this is baloney. Still, our |
1009 for this program without any questions asked. We can even run this code |
1025 compiler generates code for this program without any questions asked. We |
1010 on the JVM\ldots of course the result is an exception trace where the |
1026 can even run this code on the JVM\ldots of course the result is an |
1011 JVM yells at us for doing naughty things. (This is much better than C, |
1027 exception trace where the JVM yells at us for doing naughty |
1012 for example, where such errors are not prevented and as a result |
1028 things.\footnote{Still this is much better than C, for example, where |
1013 insidious attacks can be mounted against such kind C-programs. I assume |
1029 such errors are not prevented and as a result insidious attacks can be |
1014 everyone has heard about \emph{Buffer Overflow Attacks}.) Now what |
1030 mounted against such kind C-programs. I assume everyone has heard about |
1015 should we do in such situations? Index over- or underflows are |
1031 \emph{Buffer Overflow Attacks}.} Now what should we do in such |
1016 notoriously difficult to detect statically (at compiletime), so it seem |
1032 situations? Over- and underflows of indices are notoriously difficult to |
1017 raising an exception at run-time like the JVM is the best compromise. |
1033 detect statically (at compiletime). So it might seem raising an |
|
1034 exception at run-time like the JVM is the best compromise. |
1018 |
1035 |
1019 Well, imagine we do not want to rely in our compiler on the JVM for |
1036 Well, imagine we do not want to rely in our compiler on the JVM for |
1020 producing an annoying, but safe exception trace, rather we want to |
1037 producing an annoying, but safe exception trace, rather we want to |
1021 handle such situations ourselves according to what we thing should |
1038 handle such situations ourselves according to what we think should |
1022 happen in such cases. Let's assume we want to handle them in the |
1039 happen in such cases. Let us assume we want to handle them in the |
1023 following way: if the programmer access a field out-of-bounds, we just |
1040 following way: if the programmer access a field out-of-bounds, we just |
1024 return a default 0, and if a programmer wants to update an |
1041 return a default 0, and if a programmer wants to update an out-of-bounds |
1025 out-of-bounds field, we want to ``quietly'' ignore this update. |
1042 field, we want to ``quietly'' ignore this update. One way to achieve |
1026 |
1043 this would be to rewrite the WHILE-programs and insert the necessary |
1027 |
1044 if-conditions for safely reading and writing arrays. Another way |
1028 arraylength |
1045 is to modify the code we generate. |
|
1046 |
|
1047 \begin{lstlisting}[mathescape,language=JVMIS2] |
|
1048 $\textit{index\_aexp}$ |
|
1049 aload loc_var |
|
1050 dup2 |
|
1051 arraylength |
|
1052 if_icmple L1 |
|
1053 pop2 |
|
1054 iconst_0 |
|
1055 goto L2 |
|
1056 L1: |
|
1057 swap |
|
1058 iaload |
|
1059 L2: |
|
1060 \end{lstlisting} |
|
1061 |
|
1062 \begin{lstlisting}[mathescape,language=JVMIS2] |
|
1063 $\textit{index\_aexp}$ |
|
1064 aload loc_var |
|
1065 dup2 |
|
1066 arraylength |
|
1067 if_icmple L1 |
|
1068 pop2 |
|
1069 goto L2 |
|
1070 L1: |
|
1071 swap |
|
1072 $\textit{value\_aexp}$ |
|
1073 iastore |
|
1074 L2: |
|
1075 \end{lstlisting} |
1029 |
1076 |
1030 |
1077 |
1031 \end{document} |
1078 \end{document} |
1032 |
1079 |
1033 %%% Local Variables: |
1080 %%% Local Variables: |