handouts/ho07.tex
changeset 712 e71eb9ce2373
parent 711 6f3f3dd01786
child 713 0ea14d84efe3
equal deleted inserted replaced
711:6f3f3dd01786 712:e71eb9ce2373
    50   \end{center}
    50   \end{center}
    51 
    51 
    52 \noindent
    52 \noindent
    53 The input will be WHILE-programs; the output will be assembly files
    53 The input will be WHILE-programs; the output will be assembly files
    54 (with the file extension .j). Assembly files essentially contain
    54 (with the file extension .j). Assembly files essentially contain
    55 human-readable machine code, meaning they are not just bits and bytes,
    55 human-readable low-level code, meaning they are not just bits and bytes,
    56 but rather something you can read and understand---with a bit of
    56 but rather something you can read and understand---with a bit of
    57 practice of course. An \emph{assembler} will then translate the assembly
    57 practice of course. An \emph{assembler} will then translate the assembly
    58 files into unreadable class- or binary-files the JVM can run.
    58 files into unreadable class- or binary-files the JVM or CPU can run.
    59 Unfortunately, the Java ecosystem does not come with an assembler which
    59 Unfortunately, the Java ecosystem does not come with an assembler which
    60 would be handy for our compiler-endeavour  (unlike Microsoft's  Common
    60 would be handy for our compiler-endeavour  (unlike Microsoft's  Common
    61 Language Infrastructure for the .Net platform which has an assembler
    61 Language Infrastructure for the .Net platform which has an assembler
    62 out-of-the-box). As a substitute we shall use the 3rd-party
    62 out-of-the-box). As a substitute we shall use the 3rd-party programs
    63 programs Jasmin and Krakatau 
    63 Jasmin and Krakatau 
    64 
    64 
    65 \begin{itemize}
    65 \begin{itemize}
    66   \item \url{http://jasmin.sourceforge.net}
    66   \item \url{http://jasmin.sourceforge.net}
    67   \item \url{https://github.com/Storyyeller/Krakatau}
    67   \item \url{https://github.com/Storyyeller/Krakatau}
    68 \end{itemize}
    68 \end{itemize}
   114 
   114 
   115 
   115 
   116 \noindent where \meta{Id} stands for variables and \meta{Num}
   116 \noindent where \meta{Id} stands for variables and \meta{Num}
   117 for numbers. For the moment let us omit variables from arithmetic
   117 for numbers. For the moment let us omit variables from arithmetic
   118 expressions. Our parser will take this grammar and given an input
   118 expressions. Our parser will take this grammar and given an input
   119 program produce an abstract syntax tree. For example we will obtain for
   119 program produce an abstract syntax tree. For example we obtain for
   120 the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
   120 the expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
   121 
   121 
   122 \begin{center}
   122 \begin{center}
   123 \begin{tikzpicture}
   123 \begin{tikzpicture}
   124 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
   124 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
   246 that assignments in the WHILE-language might change the
   246 that assignments in the WHILE-language might change the
   247 environment---clearly if a variable is used for the first
   247 environment---clearly if a variable is used for the first
   248 time, we need to allocate a new index and if it has been used
   248 time, we need to allocate a new index and if it has been used
   249 before, then we need to be able to retrieve the associated index.
   249 before, then we need to be able to retrieve the associated index.
   250 This is reflected in the clause for compiling assignments, say
   250 This is reflected in the clause for compiling assignments, say
   251 $\textit{x := a}$:
   251 $x := a$:
   252 
   252 
   253 \begin{center}
   253 \begin{center}
   254 \begin{tabular}{lcl}
   254 \begin{tabular}{lcl}
   255 $\textit{compile}(x := a, E)$ & $\dn$ & 
   255 $\textit{compile}(x := a, E)$ & $\dn$ & 
   256 $(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
   256 $(\textit{compile}(a, E) \;@\;\instr{istore}\;index, E')$
   259 
   259 
   260 \noindent We first generate code for the right-hand side of the
   260 \noindent We first generate code for the right-hand side of the
   261 assignment (that is the arithmetic expression $a$) and then add an
   261 assignment (that is the arithmetic expression $a$) and then add an
   262 \instr{istore}-instruction at the end. By convention running the code
   262 \instr{istore}-instruction at the end. By convention running the code
   263 for the arithmetic expression $a$ will leave the result on top of the
   263 for the arithmetic expression $a$ will leave the result on top of the
   264 stack. After that the \instr{istore} instruction, the result will be
   264 stack. After that the \instr{istore}-instruction, the result will be
   265 stored in the index corresponding to the variable $x$. If the variable
   265 stored in the index corresponding to the variable $x$. If the variable
   266 $x$ has been used before in the program, we just need to look up what
   266 $x$ has been used before in the program, we just need to look up what
   267 the index is and return the environment unchanged (that is in this case
   267 the index is and return the environment unchanged (that is in this case
   268 $E' = E$). However, if this is the first encounter of the variable $x$
   268 $E' = E$). However, if this is the first encounter of the variable $x$
   269 in the program, then we have to augment the environment and assign $x$
   269 in the program, then we have to augment the environment and assign $x$
   374 where the jump should go. Since we are generating assembly
   374 where the jump should go. Since we are generating assembly
   375 code for the JVM, we do not actually have to give (numeric)
   375 code for the JVM, we do not actually have to give (numeric)
   376 addresses, but can just attach (symbolic) labels to our code.
   376 addresses, but can just attach (symbolic) labels to our code.
   377 These labels specify a target for a jump. Therefore the labels
   377 These labels specify a target for a jump. Therefore the labels
   378 need to be unique, as otherwise it would be ambiguous where a
   378 need to be unique, as otherwise it would be ambiguous where a
   379 jump should go to. A label, say \pcode{L}, is attached to code
   379 jump should go to. A label, say \pcode{L}, is attached to assembly 
   380 like
   380 code like
   381 
   381 
   382 \begin{lstlisting}[mathescape,numbers=none]
   382 \begin{lstlisting}[mathescape,numbers=none]
   383 L:
   383 L:
   384   $\textit{instr\_1}$
   384   $\textit{instr\_1}$
   385   $\textit{instr\_2}$
   385   $\textit{instr\_2}$
   466 for the jump addresses (just before the else-branch and just
   466 for the jump addresses (just before the else-branch and just
   467 after). In the next two lines we generate the instructions for
   467 after). In the next two lines we generate the instructions for
   468 the two branches, $is_1$ and $is_2$. The final code will
   468 the two branches, $is_1$ and $is_2$. The final code will
   469 be first the code for $b$ (including the label 
   469 be first the code for $b$ (including the label 
   470 just-before-the-else-branch), then the \pcode{goto} for after
   470 just-before-the-else-branch), then the \pcode{goto} for after
   471 the else-branch, the label $L_\textit{ifesle}$, followed by
   471 the else-branch, the label $L_\textit{ifelse}$, followed by
   472 the instructions for the else-branch, followed by the 
   472 the instructions for the else-branch, followed by the 
   473 after-the-else-branch label. Consider for example the 
   473 after-the-else-branch label. Consider for example the 
   474 if-statement:
   474 if-statement:
   475 
   475 
   476 \begin{lstlisting}[mathescape,numbers=none,language=While]
   476 \begin{lstlisting}[mathescape,numbers=none,language=While]
   501 \end{tikzpicture}
   501 \end{tikzpicture}
   502 
   502 
   503 \noindent The first three lines correspond to the the boolean
   503 \noindent The first three lines correspond to the the boolean
   504 expression $1 = 1$. The jump for when this boolean expression
   504 expression $1 = 1$. The jump for when this boolean expression
   505 is false is in Line~3. Lines 4-6 corresponds to the if-branch;
   505 is false is in Line~3. Lines 4-6 corresponds to the if-branch;
   506 the else-branch is in Lines 8 and 9. Note carefully how the
   506 the else-branch is in Lines 8 and 9. 
   507 environment $E$ is threaded through the recursive calls of
   507 
   508 \textit{compile}. The function receives an environment $E$,
   508 Note carefully how the environment $E$ is threaded through the recursive
   509 but it might extend it when compiling the if-branch, yielding
   509 calls of \textit{compile}. The function receives an environment $E$, but
   510 $E'$. This happens for example in the if-statement above
   510 it might extend it when compiling the if-branch, yielding $E'$. This
   511 whenever the variable \code{x} has not been used before.
   511 happens for example in the if-statement above whenever the variable
   512 Similarly with the environment $E''$ for the second call to
   512 \code{x} has not been used before. Similarly with the environment $E''$
   513 \textit{compile}. $E''$ is also the environment that needs to
   513 for the second call to \textit{compile}. $E''$ is also the environment
   514 be returned as part of the answer.
   514 that needs to be returned as part of the answer.
   515 
   515 
   516 The compilation of the while-loops, say 
   516 The compilation of the while-loops, say 
   517 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
   517 \pcode{while} $b$ \pcode{do} $cs$, is very similar. In case
   518 the condition is true and we need to do another iteration, 
   518 the condition is true and we need to do another iteration, 
   519 and the control-flow needs to be as follows
   519 and the control-flow needs to be as follows
   638 \pcode{write}. It takes a single integer argument indicated by the
   638 \pcode{write}. It takes a single integer argument indicated by the
   639 \pcode{(I)} and returns no result, indicated by the \pcode{V} (for
   639 \pcode{(I)} and returns no result, indicated by the \pcode{V} (for
   640 void). Since the method has only one argument, we only need a single
   640 void). Since the method has only one argument, we only need a single
   641 local variable (Line~2) and a stack with two cells will be sufficient
   641 local variable (Line~2) and a stack with two cells will be sufficient
   642 (Line 3). Line 4 instructs the JVM to get the value of the member
   642 (Line 3). Line 4 instructs the JVM to get the value of the member
   643 \pcode{out} of the class \pcode{java/lang/System}. It expects the value
   643 \pcode{out} from the class \pcode{java/lang/System}. It expects the value
   644 to be of type \pcode{java/io/PrintStream}. A reference to this value
   644 to be of type \pcode{java/io/PrintStream}. A reference to this value
   645 will be placed on the stack.\footnote{Note the syntax \texttt{L
   645 will be placed on the stack.\footnote{Note the syntax \texttt{L
   646 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the
   646 \ldots{};} for the \texttt{PrintStream} type is not an typo. Somehow the
   647 designers of Jasmin decided that this syntax is pleasing to the eye. So
   647 designers of Jasmin decided that this syntax is pleasing to the eye. So
   648 if you wanted to have strings in your Jasmin code, you would need to
   648 if you wanted to have strings in your Jasmin code, you would need to
   755 not support ``dynamic'' arrays, that is the size of our arrays will
   755 not support ``dynamic'' arrays, that is the size of our arrays will
   756 always be fixed. The second construct is for referencing an array cell
   756 always be fixed. The second construct is for referencing an array cell
   757 inside an arithmetic expression---we need to be able to look up the
   757 inside an arithmetic expression---we need to be able to look up the
   758 contents of an array at an index determined by an arithmetic expression.
   758 contents of an array at an index determined by an arithmetic expression.
   759 Similarly in the line below, we need to be able to update the content of
   759 Similarly in the line below, we need to be able to update the content of
   760 an array at an calculated index.  
   760 an array at a calculated index.  
   761 
   761 
   762 For creating a new array we can generate the following three JVM
   762 For creating a new array we can generate the following three JVM
   763 instructions:
   763 instructions:
   764 
   764 
   765 \begin{lstlisting}[mathescape,language=JVMIS]
   765 \begin{lstlisting}[mathescape,language=JVMIS]
   833               | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
   833               | \texttt{new}(\meta{Id}\,[\,\meta{Num}\,]) 
   834               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   834               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   835 \end{plstx}
   835 \end{plstx}
   836 
   836 
   837 With this in place we can turn back to the idea of creating
   837 With this in place we can turn back to the idea of creating
   838 WHILE-programs by translating BF programs. This is a relatively easy
   838 WHILE-programs by translating BF-programs. This is a relatively easy
   839 task because BF has only eight instructions (we will actually implement
   839 task because BF has only eight instructions (we will actually implement
   840 seven because we can omit the read-in instruction from BF). What makes
   840 seven because we can omit the read-in instruction from BF). What makes
   841 this translation easy is that BF-loops can be straightforwardly
   841 this translation easy is that BF-loops can be straightforwardly
   842 represented as while-loops. The Scala code for the translation is as
   842 represented as while-loops. The Scala code for the translation is as
   843 follows:
   843 follows:
   884 WHILE-language in order to do some benchmarking. Which means we now face
   884 WHILE-language in order to do some benchmarking. Which means we now face
   885 the question about what to do next\ldots
   885 the question about what to do next\ldots
   886 
   886 
   887 \subsection*{Optimisations \& Co}
   887 \subsection*{Optimisations \& Co}
   888 
   888 
   889 Every compiler that deserves its name performs optimisations on the
   889 Every compiler that deserves its name has to perform some optimisations
   890 code. If we make the extra effort of writing a compiler for a language,
   890 on the code: if we put in the extra effort of writing a compiler for a
   891 then obviously we want to have our code to run as fast as possible. 
   891 language, then obviously we want to have our code to run as fast as
   892 So we should look into this.
   892 possible. So we should look into this in more detail.
   893 
   893 
   894 There is actually one aspect in our generated code where we can make
   894 There is actually one aspect in our generated code where we can make
   895 easily efficiency gains: this has to do with some of the quirks of the
   895 easily efficiency gains. This has to do with some of the quirks of the
   896 JVM. Whenever we push a constant onto the stack, we used the JVM
   896 JVM. Whenever we push a constant onto the stack, we used the JVM
   897 instruction \instr{ldc some_const}. This is a rather generic instruction
   897 instruction \instr{ldc some_const}. This is a rather generic instruction
   898 in the sense that it works not just for integers but also for strings,
   898 in the sense that it works not just for integers but also for strings,
   899 objects and so on. What this instruction does is putting the constant
   899 objects and so on. What this instruction does is putting the constant
   900 into a \emph{constant pool} and then to use an index into this constant
   900 into a \emph{constant pool} and then uses an index into this constant
   901 pool. This means \instr{ldc} will be represented by at least two bytes
   901 pool. This means \instr{ldc} will be represented by at least two bytes
   902 in the class file. While this is sensible for ``large'' constants like
   902 in the class file. While this is a sensible strategy for ``large''
   903 strings, it is a bit of overkill for small integers (which many integers
   903 constants like strings, it is a bit of overkill for small integers
   904 will be when compiling a BF-program). To counter this ``waste'', the JVM
   904 (which many integers will be when compiling a BF-program). To counter
   905 has specific instructions for small integers, for example
   905 this ``waste'', the JVM has specific instructions for small integers,
       
   906 for example
   906  
   907  
   907 \begin{itemize}
   908 \begin{itemize}
   908 \item \instr{iconst_0},\ldots, \instr{iconst_5}
   909 \item \instr{iconst_0},\ldots, \instr{iconst_5}
   909 \item \instr{bipush n}
   910 \item \instr{bipush n}
   910 \end{itemize}
   911 \end{itemize}
   916 size smaller as these instructions only require 1 byte (as opposed the
   917 size smaller as these instructions only require 1 byte (as opposed the
   917 generic \instr{ldc} which needs 1 byte plus another for the index into
   918 generic \instr{ldc} which needs 1 byte plus another for the index into
   918 the constant pool). While in theory the use of such special instructions
   919 the constant pool). While in theory the use of such special instructions
   919 should make the code only smaller, it actually makes the code also run
   920 should make the code only smaller, it actually makes the code also run
   920 faster. Probably because the JVM has to process less code and uses a
   921 faster. Probably because the JVM has to process less code and uses a
   921 specific instruction in the underlying CPU.  The story with
   922 specific instruction for the underlying CPU.  The story with
   922 \instr{bipush} is slightly different, because it also uses two
   923 \instr{bipush} is slightly different, because it also uses two
   923 bytes---so it does not result in a reduction in code size. But againm,
   924 bytes---so it does not necessarily result in a reduction of code size.
   924 it probably uses a specific instruction in the underlying CPU that make
   925 Instead, it probably uses a specific instruction in the underlying CPU
   925 the JVM code run faster. This means when generating code we can use
   926 that makes the JVM code run faster.\footnote{This is all ``probable''
   926 the following helper function
   927 because I have not read the 700 pages of JVM documentation by Oracle and
       
   928 also have no clue how the JVM is implemented.} This means when
       
   929 generating code for pushing constants onto the stack, we can use the
       
   930 following Scala helper-function
   927 
   931 
   928 \begin{lstlisting}[language=Scala]
   932 \begin{lstlisting}[language=Scala]
   929 def compile_num(i: Int) = 
   933 def compile_num(i: Int) = 
   930   if (0 <= i && i <= 5) i"iconst_$i" else 
   934   if (0 <= i && i <= 5) i"iconst_$i" else 
   931   if (-128 <= i && i <= 127) i"bipush $i" else i"ldc $i"
   935   if (-128 <= i && i <= 127) i"bipush $i" 
       
   936   else i"ldc $i"
   932 \end{lstlisting}
   937 \end{lstlisting}
   933 
   938 
   934 \noindent 
   939 \noindent 
   935 that generates the more efficient instructions for pushing a constant
   940 It generates the more efficient instructions when pushing a small integer
   936 onto the stack. Note the JVM also has special instructions  that
   941 constant onto the stack. The default is \instr{ldc} for any other constants.
   937 load and store the first three local variables. The assumption is that 
   942 
   938 most operations and arguments in a method will only use very few 
   943 The JVM also has such special instructions for
   939 local variables. So the JVM has the following instructions:
   944 loading and storing the first three local variables. The assumption is
       
   945 that most operations and arguments in a method will only use very few
       
   946 local variables. So we can use the following instructions:
   940 
   947 
   941 \begin{itemize}
   948 \begin{itemize}
   942 \item \instr{iload_0},\ldots, \instr{iload_3}
   949 \item \instr{iload_0},\ldots, \instr{iload_3}
   943 \item \instr{istore_0},\ldots, \instr{istore_3}
   950 \item \instr{istore_0},\ldots, \instr{istore_3}
   944 \item \instr{aload_0},\ldots, \instr{aload_3}
   951 \item \instr{aload_0},\ldots, \instr{aload_3}
   945 \item \instr{astore_0},\ldots, \instr{astore_3}
   952 \item \instr{astore_0},\ldots, \instr{astore_3}
   946 \end{itemize}
   953 \end{itemize}
   947 
   954 
   948 
   955 
   949 \noindent Having implemented these optimisations, the code size of the
   956 \noindent Having implemented these optimisations, the code size of the
   950 BF-Mandelbrot program reduces and also it runs faster. According to my
   957 BF-Mandelbrot program reduces and also the class-file runs faster (the
   951 very rough experiments:
   958 parsing part is still very slow). According to my very rough
       
   959 experiments:
   952 
   960 
   953 \begin{center}
   961 \begin{center}
   954 \begin{tabular}{lll}
   962 \begin{tabular}{lll}
   955 & class-size & runtime\\\hline
   963 & class-size & runtime\\\hline
   956 Mandelbrot:\\
   964 Mandelbrot:\\
   959 \end{tabular}
   967 \end{tabular}
   960 \end{center}
   968 \end{center}
   961 
   969 
   962 \noindent 
   970 \noindent 
   963 Quite good! Such optimisations are called \emph{peephole optimisations},
   971 Quite good! Such optimisations are called \emph{peephole optimisations},
   964 because it is type of optimisations that involve changing a small set of
   972 because they involve changing one or a small set of instructions into an
   965 instructions into an equivalent set that has better performance. 
   973 equivalent set that has better performance. 
   966 
   974 
   967 If you look careful at our code you will quickly find another source of
   975 If you look careful at our generated code you will quickly find another
   968 inefficiency in programs like
   976 source of inefficiency in programs like
   969 
   977 
   970 \begin{lstlisting}[mathescape,language=While]
   978 \begin{lstlisting}[mathescape,language=While]
   971 x := ...;
   979 x := ...;
   972 write x
   980 write x
   973 \end{lstlisting}
   981 \end{lstlisting}
   974 
   982 
   975 \noindent
   983 \noindent
   976 where our code first calculates the new result the for \texttt{x} on the
   984 where our code first calculates the new result the for \texttt{x} on the
   977 stack, then pops off the result into a local variable, and after that
   985 stack, then pops off the result into a local variable, and after that
   978 loads the local variable back onto the stack for writing out a number.
   986 loads the local variable back onto the stack for writing out a number.
       
   987 
       
   988 \begin{lstlisting}[mathescape,language=JVMIS]
       
   989 ... 
       
   990 istore 0
       
   991 iload 0
       
   992 ...
       
   993 \end{lstlisting}
       
   994 
       
   995 \noindent
   979 If we can detect such situations, then we can leave the value of
   996 If we can detect such situations, then we can leave the value of
   980 \texttt{x} on the stack with for example the much cheaper instruction
   997 \texttt{x} on the stack with for example the much cheaper instruction
   981 \instr{dup}. Now the problem with this optimisation is that it is quite
   998 \instr{dup}. Now the problem with this optimisation is that it is quite
   982 easy for the snippet above, but what about instances where there is
   999 easy for the snippet above, but what about instances where there is
   983 further WHILE-code in \emph{between} these two statements? Sometimes we
  1000 further WHILE-code in \emph{between} these two statements? Sometimes we
   984 will be able to optimise, sometimes we will not. The compiler needs to
  1001 will be able to optimise, sometimes we will not. The compiler needs to
   985 find out which situation applies. This can become quickly much more
  1002 find out which situation applies. This can quickly become  much more
   986 complicated. So we leave this kind of optimisations here and look at
  1003 complicated. So we leave this kind of optimisations here and look at
   987 something more interesting and possibly surprising.
  1004 something more interesting and possibly surprising.
   988 
  1005 
   989 
  1006 As you might have seen, the compiler writer has a lot of freedom about
   990 As you have probably seen, the compiler writer has a lot of freedom
  1007 how to generate code from what the programmer wrote as program. The only
   991 about how to generate code from what the programmer wrote as program.
  1008 condition is that generated code should behave as expected by the
   992 The only condition is that generated code should behave as expected by
  1009 programmer. Then all is fine with the code above\ldots mission
   993 the programmer. Then all is fine\ldots mission accomplished! But
  1010 accomplished! But sometimes the compiler writer is expected to go an
   994 sometimes the compiler writer is expected to go an extra mile, or even
  1011 extra mile, or even miles and change(!) the meaning of a program.
   995 miles and change the meaning of a program in unexpected ways. Suppose we
  1012 Suppose we are given the following WHILE-program:
   996 are given the following WHILE-program:
       
   997 
  1013 
   998 \begin{lstlisting}[mathescape,language=While]
  1014 \begin{lstlisting}[mathescape,language=While]
   999 new(arr[10]);
  1015 new(arr[10]);
  1000 arr[14] := 3 + arr[13]
  1016 arr[14] := 3 + arr[13]
  1001 \end{lstlisting}
  1017 \end{lstlisting}
  1002 
  1018 
  1003 \noindent 
  1019 \noindent 
  1004 Admittedly this is a contrived program, and probably not meant to be
  1020 Admittedly this is a contrived program, and probably not meant to be
  1005 like this by any sane programmer, but it is supposed to make the
  1021 like this by any sane programmer, but it is supposed to make the
  1006 following point: We generate an array of size 10, and then try to access
  1022 following point: The program generates an array of size 10, and then
  1007 the non-existing element at index 13 and even updating element with
  1023 tries to access the non-existing element at index 13 and even updating
  1008 index 14. Obviously this is baloney. Still, our compiler generates code
  1024 the element with index 14. Obviously this is baloney. Still, our
  1009 for this program without any questions asked. We can even run this code
  1025 compiler generates code for this program without any questions asked. We
  1010 on the JVM\ldots of course the result is an exception trace where the
  1026 can even run this code on the JVM\ldots of course the result is an
  1011 JVM yells at us for doing naughty things. (This is much better than C,
  1027 exception trace where the JVM yells at us for doing naughty
  1012 for example, where such errors are not prevented and as a result
  1028 things.\footnote{Still this is much better than C, for example, where
  1013 insidious attacks can be mounted against such kind C-programs. I assume
  1029 such errors are not prevented and as a result insidious attacks can be
  1014 everyone has heard about \emph{Buffer Overflow Attacks}.) Now what
  1030 mounted against such kind C-programs. I assume everyone has heard about
  1015 should we do in such situations? Index over- or underflows are
  1031 \emph{Buffer Overflow Attacks}.} Now what should we do in such
  1016 notoriously difficult to detect statically (at compiletime), so it seem
  1032 situations? Over- and underflows of indices are notoriously difficult to
  1017 raising an exception at run-time like the JVM is the best compromise.
  1033 detect statically (at compiletime). So it might seem raising an
       
  1034 exception at run-time like the JVM is the best compromise.
  1018 
  1035 
  1019 Well, imagine we do not want to rely in our compiler on the JVM for
  1036 Well, imagine we do not want to rely in our compiler on the JVM for
  1020 producing an annoying, but safe exception trace, rather we want to
  1037 producing an annoying, but safe exception trace, rather we want to
  1021 handle such situations ourselves according to what we thing should
  1038 handle such situations ourselves according to what we think should
  1022 happen in such cases. Let's assume we want to handle them in the
  1039 happen in such cases. Let us assume we want to handle them in the
  1023 following way: if the programmer access a field out-of-bounds, we just
  1040 following way: if the programmer access a field out-of-bounds, we just
  1024 return a default 0, and if a programmer wants to update an
  1041 return a default 0, and if a programmer wants to update an out-of-bounds
  1025 out-of-bounds field, we want to ``quietly'' ignore this update.
  1042 field, we want to ``quietly'' ignore this update. One way to achieve
  1026 
  1043 this would be to rewrite the WHILE-programs and insert the necessary
  1027 
  1044 if-conditions for safely reading and writing arrays. Another way
  1028 arraylength
  1045 is to modify the code we generate.
       
  1046 
       
  1047 \begin{lstlisting}[mathescape,language=JVMIS2]
       
  1048   $\textit{index\_aexp}$ 
       
  1049   aload loc_var 
       
  1050   dup2
       
  1051   arraylength
       
  1052   if_icmple L1
       
  1053   pop2
       
  1054   iconst_0
       
  1055   goto L2
       
  1056 L1:
       
  1057   swap
       
  1058   iaload
       
  1059 L2:
       
  1060 \end{lstlisting}
       
  1061 
       
  1062  \begin{lstlisting}[mathescape,language=JVMIS2]
       
  1063   $\textit{index\_aexp}$ 
       
  1064   aload loc_var 
       
  1065   dup2
       
  1066   arraylength
       
  1067   if_icmple L1
       
  1068   pop2
       
  1069   goto L2
       
  1070 L1:
       
  1071   swap
       
  1072   $\textit{value\_aexp}$
       
  1073   iastore
       
  1074 L2:
       
  1075 \end{lstlisting}
  1029 
  1076 
  1030 
  1077 
  1031 \end{document}
  1078 \end{document}
  1032 
  1079 
  1033 %%% Local Variables: 
  1080 %%% Local Variables: