handouts/ho07.tex
changeset 690 8d57433c7b5e
parent 668 9ce78065f68d
child 691 991849dfbcb1
equal deleted inserted replaced
689:d7c9ef381437 690:8d57433c7b5e
     9 \begin{document}
     9 \begin{document}
    10 \fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
    10 \fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
    11 
    11 
    12 \section*{Handout 7 (Compilation)}
    12 \section*{Handout 7 (Compilation)}
    13 
    13 
       
    14 
       
    15 
    14 The purpose of a compiler is to transform a program a human can read and
    16 The purpose of a compiler is to transform a program a human can read and
    15 write into code the machine can run as fast as possible. The fastest
    17 write into code the machine can run as fast as possible. The fastest
    16 code would be machine code the CPU can run directly, but it is often
    18 code would be machine code the CPU can run directly, but it is often
    17 good enough for improving the speed of a program to just target a
    19 good enough for improving the speed of a program to target a
    18 virtual machine. This produces not the fastest possible code, but code
    20 virtual machine. This produces not the fastest possible code, but code
    19 that is pretty enough. This way of producing code has the advantage that
    21 that is often pretty fast. This way of producing code has the advantage that
    20 the virtual machine takes care of things a compiler would normally need
    22 the virtual machine takes care of things a compiler would normally need
    21 to take care of (like explicit memory management). An interesting answer
    23 to take care of (like explicit memory management). 
    22 for why studying compilers is given by John Regher in his compiler
       
    23 blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
       
    24 
       
    25 \begin{quote}\it{}``We can start off with a couple of observations
       
    26   about the role of compilers. First, hardware is getting weirder
       
    27   rather than getting clocked faster: almost all processors are
       
    28   multicores and it looks like there is increasing asymmetry in
       
    29   resources across cores. Processors come with vector units, crypto
       
    30   accelerators, bit twiddling instructions, and lots of features to
       
    31   make virtualization and concurrency work. We have DSPs, GPUs,
       
    32   big.little, and Xeon Phi. This is only scratching the
       
    33   surface. Second, we’re getting tired of low-level languages and
       
    34   their associated security disasters, we want to write new code, to
       
    35   whatever extent possible, in safer, higher-level
       
    36   languages. Compilers are caught right in the middle of these
       
    37   opposing trends: one of their main jobs is to help bridge the large
       
    38   and growing gap between increasingly high-level languages and
       
    39   increasingly wacky platforms. It’s effectively a perpetual
       
    40   employment act for solid compiler hackers.''
       
    41 \end{quote}  
       
    42 
    24 
    43 As a first example in this module we will implement a compiler for the
    25 As a first example in this module we will implement a compiler for the
    44 very simple While-language. It will generate code for the Java Virtual
    26 very simple While-language. It will generate code for the Java Virtual
    45 Machine (JVM). This is a stack-based virtual machine, a fact which will
    27 Machine (JVM). Unfortunately the Java ecosystem does not come with an
    46 make it easy to generate code for arithmetic expressions. For example
    28 assembler which would be handy for our  compiler-endeavour  (unlike
    47 when compiling $1 + 2$ we need to generate the following three
    29 Microsoft's  Common Language Infrastructure for the .Net platform which
    48 instructions
    30 has an assembler out-of-the-box). As a substitute we use in this module
       
    31 the 3rd-party programs Jasmin and Krakatau
       
    32 
       
    33 \begin{itemize}
       
    34   \item \url{http://jasmin.sourceforge.net}
       
    35   \item \url{https://github.com/Storyyeller/Krakatau}
       
    36 \end{itemize}
       
    37 
       
    38 \noindent
       
    39 The first is a Java program and the second a program written in Python.
       
    40 Each of them allow us to generate \emph{assembly} files that are still
       
    41 readable by humans, as opposed to class-files which are pretty much just
       
    42 (horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
       
    43 an assembly file as input and generate the corresponding class file for
       
    44 us. 
       
    45 
       
    46 Good about the JVM is that it is a stack-based virtual machine, a fact
       
    47 which will make it easy to generate code for arithmetic expressions. For
       
    48 example when compiling the expression $1 + 2$ we need to generate the
       
    49 following three instructions
    49 
    50 
    50 \begin{lstlisting}[language=JVMIS,numbers=none]
    51 \begin{lstlisting}[language=JVMIS,numbers=none]
    51 ldc 1
    52 ldc 1
    52 ldc 2
    53 ldc 2
    53 iadd 
    54 iadd 
   111 \end{lstlisting}
   112 \end{lstlisting}
   112 
   113 
   113 \noindent If we ``run'' these instructions, the result $8$ will be on
   114 \noindent If we ``run'' these instructions, the result $8$ will be on
   114 top of the stack (I leave this to you to verify; the meaning of each
   115 top of the stack (I leave this to you to verify; the meaning of each
   115 instruction should be clear). The result being on the top of the stack
   116 instruction should be clear). The result being on the top of the stack
   116 will be a convention we always observe in our compiler, that is the
   117 will be an important convention we always observe in our compiler. Note,
   117 results of arithmetic expressions will always be on top of the stack.
   118 that a different bracketing of the expression, for example $(1 + (2 *
   118 Note, that a different bracketing of the expression, for example $(1 +
   119 3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
   119 (2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus
   120 a different list of instructions. Generating code in this
   120 also a different list of instructions. Generating code in this
       
   121 post-order-traversal fashion is rather easy to implement: it can be done
   121 post-order-traversal fashion is rather easy to implement: it can be done
   122 with the following recursive \textit{compile}-function, which takes the
   122 with the following recursive \textit{compile}-function, which takes the
   123 abstract syntax tree as argument:
   123 abstract syntax tree as argument:
   124 
   124 
   125 \begin{center}
   125 \begin{center}
   182 \end{tabular}
   182 \end{tabular}
   183 \end{center}
   183 \end{center}
   184 
   184 
   185 \noindent In the last line we generate the code for variables
   185 \noindent In the last line we generate the code for variables
   186 where $E(x)$ stands for looking up the environment to which
   186 where $E(x)$ stands for looking up the environment to which
   187 index the variable $x$ maps to.
   187 index the variable $x$ maps to. This is similar to an interpreter,
       
   188 which also needs an environment: the difference is that the 
       
   189 interpreter maintains a mapping from variables to current values (what is the
       
   190 currently the value of a variable), while compilers need a mapping
       
   191 from variables to memory locations (where can I find the current 
       
   192 value for the variable in memory).
   188 
   193 
   189 There is a similar \textit{compile}-function for boolean
   194 There is a similar \textit{compile}-function for boolean
   190 expressions, but it includes a ``trick'' to do with
   195 expressions, but it includes a ``trick'' to do with
   191 \pcode{if}- and \pcode{while}-statements. To explain the issue
   196 \pcode{if}- and \pcode{while}-statements. To explain the issue
   192 let us first describe the compilation of statements of the
   197 let us first describe the compilation of statements of the
   204 list of instructions (in this case the empty list) and an
   209 list of instructions (in this case the empty list) and an
   205 environment for variables. The reason for the environment is
   210 environment for variables. The reason for the environment is
   206 that assignments in the While-language might change the
   211 that assignments in the While-language might change the
   207 environment---clearly if a variable is used for the first
   212 environment---clearly if a variable is used for the first
   208 time, we need to allocate a new index and if it has been used
   213 time, we need to allocate a new index and if it has been used
   209 before, we need to be able to retrieve the associated index.
   214 before, then we need to be able to retrieve the associated index.
   210 This is reflected in the clause for compiling assignments:
   215 This is reflected in the clause for compiling assignments, say
       
   216 $\textit{x := a}$:
   211 
   217 
   212 \begin{center}
   218 \begin{center}
   213 \begin{tabular}{lcl}
   219 \begin{tabular}{lcl}
   214 $\textit{compile}(x := a, E)$ & $\dn$ & 
   220 $\textit{compile}(x := a, E)$ & $\dn$ & 
   215 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
   221 $(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$
   226 the index is and return the environment unchanged (that is in
   232 the index is and return the environment unchanged (that is in
   227 this case $E' = E$). However, if this is the first encounter 
   233 this case $E' = E$). However, if this is the first encounter 
   228 of the variable $x$ in the program, then we have to augment 
   234 of the variable $x$ in the program, then we have to augment 
   229 the environment and assign $x$ with the largest index in $E$
   235 the environment and assign $x$ with the largest index in $E$
   230 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   236 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   231 That means for the assignment $x := x + 1$ we generate the
   237 This means for the assignment $x := x + 1$ we generate the
   232 following code
   238 following code
   233 
   239 
   234 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   240 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   235 iload $n_x$
   241 iload $n_x$
   236 ldc 1
   242 ldc 1
   237 iadd
   243 iadd
   238 istore $n_x$
   244 istore $n_x$
   239 \end{lstlisting}
   245 \end{lstlisting}
   240 
   246 
   241 \noindent 
   247 \noindent 
   242 where $n_x$ is the index for the variable $x$. The code for 
   248 where $n_x$ is the index (or pointer to the memory) for the variable $x$ . The code for 
   243 looking-up the index for the variable is as follow:
   249 looking-up the index for the variable is as follow:
   244 
   250 
   245 \begin{center}
   251 \begin{center}
   246 \begin{tabular}{lcl}
   252 \begin{tabular}{lcl}
   247 $index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$
   253 $index \;=\; E\textit{.getOrElse}(x, |E|)$
   248 \end{tabular}
   254 \end{tabular}
   249 \end{center}
   255 \end{center}
   250 
   256 
   251 \noindent
   257 \noindent
   252 In case the environment $E$ contains an index for $x$, we return it.
   258 In case the environment $E$ contains an index for $x$, we return it.
   253 Otherwise we ``create'' a new index by returning the size $|E|$ of the
   259 Otherwise we ``create'' a new index by returning the size $|E|$ of the
   254 environment (that will be an index that is guaranteed to be not used
   260 environment (that will be an index that is guaranteed to be not used
   255 yet).
   261 yet).
   256 
   262 
   257 More complicated is the generation of code for \pcode{if}-statements, say
   263 A bit more complicated is the generation of code for \pcode{if}-statements, say
   258 
   264 
   259 \begin{lstlisting}[mathescape,language={},numbers=none]
   265 \begin{lstlisting}[mathescape,language={},numbers=none]
   260 if $b$ then $cs_1$ else $cs_2$
   266 if $b$ then $cs_1$ else $cs_2$
   261 \end{lstlisting}
   267 \end{lstlisting}
   262 
   268 
   421 the else-branch, the label $L_\textit{ifesle}$, followed by
   427 the else-branch, the label $L_\textit{ifesle}$, followed by
   422 the instructions for the else-branch, followed by the 
   428 the instructions for the else-branch, followed by the 
   423 after-the-else-branch label. Consider for example the 
   429 after-the-else-branch label. Consider for example the 
   424 if-statement:
   430 if-statement:
   425 
   431 
   426 \begin{lstlisting}[mathescape,numbers=none,language={}]
   432 \begin{lstlisting}[mathescape,numbers=none,language=While]
   427 if 1 = 1 then x := 2 else y := 3
   433 if 1 = 1 then x := 2 else y := 3
   428 \end{lstlisting}
   434 \end{lstlisting}
   429 
   435 
   430 \noindent 
   436 \noindent 
   431 The generated code is as follows:
   437 The generated code is as follows:
   432 
   438 
   433 \begin{lstlisting}[language=JVMIS,mathescape,language={}]
   439 \begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
   434    ldc 1
   440    ldc 1
   435    ldc 1
   441    ldc 1
   436    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   442    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   437    ldc 2
   443    ldc 2
   438    istore 0
   444    istore 0
   531 \end{center}
   537 \end{center}
   532 
   538 
   533 \noindent I let you go through how this clause works. As an example
   539 \noindent I let you go through how this clause works. As an example
   534 you can consider the while-loop
   540 you can consider the while-loop
   535 
   541 
   536 \begin{lstlisting}[mathescape,numbers=none,language={}]
   542 \begin{lstlisting}[mathescape,numbers=none,language=While]
   537 while x <= 10 do x := x + 1
   543 while x <= 10 do x := x + 1
   538 \end{lstlisting}
   544 \end{lstlisting}
   539 
   545 
   540 \noindent yielding the following code
   546 \noindent yielding the following code
   541 
   547 
   542 \begin{lstlisting}[language=JVMIS,mathescape,language={}]
   548 \begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
   543 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   549 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   544    iload 0
   550    iload 0
   545    ldc 10
   551    ldc 10
   546    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   552    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   547    iload 0
   553    iload 0
   557            -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
   563            -- ++(0mm,17.3mm) |- (\tikztotarget)},line width=1mm] (LB.east);
   558   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
   564   \draw[->,very thick] (LC) edge [->,to path={-- ++(10mm,0mm) 
   559            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
   565            -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
   560 \end{tikzpicture}
   566 \end{tikzpicture}
   561 
   567 
       
   568 \noindent
       
   569 I leave it to you to read the code and follow its controlflow.
   562 
   570 
   563 Next we need to consider the statement \pcode{write x}, which
   571 Next we need to consider the statement \pcode{write x}, which
   564 can be used to print out the content of a variable. For this
   572 can be used to print out the content of a variable. For this
   565 we need to use a Java library function. In order to avoid
   573 we need to use a Java library function. In order to avoid
   566 having to generate a lot of code for each
   574 having to generate a lot of code for each
   568 just call this method with an argument (which needs to be
   576 just call this method with an argument (which needs to be
   569 placed onto the stack). The code of the helper-method is as
   577 placed onto the stack). The code of the helper-method is as
   570 follows.
   578 follows.
   571 
   579 
   572 
   580 
   573 \begin{lstlisting}[language=JVMIS]
   581 \begin{lstlisting}[language=JVMIS,numbers=left]
   574 .method public static write(I)V 
   582 .method public static write(I)V 
   575     .limit locals 1 
   583     .limit locals 1 
   576     .limit stack 2 
   584     .limit stack 2 
   577     getstatic java/lang/System/out Ljava/io/PrintStream; 
   585     getstatic java/lang/System/out Ljava/io/PrintStream; 
   578     iload 0
   586     iload 0
   611 need to be replaced by an appropriate class name (this will be
   619 need to be replaced by an appropriate class name (this will be
   612 explained shortly).
   620 explained shortly).
   613 
   621 
   614 
   622 
   615 \begin{figure}[t]
   623 \begin{figure}[t]
   616 \begin{lstlisting}[mathescape,language=JVMIS]
   624 \begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
   617 .class public XXX.XXX
   625 .class public XXX.XXX
   618 .super java/lang/Object
   626 .super java/lang/Object
   619 
   627 
   620 .method public <init>()V
   628 .method public <init>()V
   621     aload_0
   629     aload_0
   672 \caption{Generated code for a test program. This code can be 
   680 \caption{Generated code for a test program. This code can be 
   673 processed by an Java assembler producing a class-file, which
   681 processed by an Java assembler producing a class-file, which
   674 can be run by the {\tt{}java}-program.\label{test}}
   682 can be run by the {\tt{}java}-program.\label{test}}
   675 \end{figure}
   683 \end{figure}
   676 
   684 
       
   685 \subsection*{Arrays}
       
   686 
       
   687 Maybe a useful addition to the While-language are arrays. This 
       
   688 lets us generate more interesting While-programs by translating
       
   689 BF*** programs into equivalent While-programs. This means we 
       
   690 want to support the following three constructions
       
   691 
       
   692 \begin{lstlisting}[mathescape,language=While]
       
   693 new arr[15000]
       
   694 x := 3 + arr[3 + y]
       
   695 arr[42 * n] := ...
       
   696 \end{lstlisting}
       
   697 
       
   698 \noindent
       
   699 The first one creates a new array with name \pcode{arr} that can 
       
   700 hold a given number of integers. The second is referencing an array
       
   701 inside a arithmetic expression. Essentially we have to be able to 
       
   702 look up the contents of an array at an index. Similarly we need to be
       
   703 able to update the content of an array (3rd line). The latter two
       
   704 constructions state that the index to an array can be given by 
       
   705 an arithmetic expression. For creating a new array we need to generate
       
   706 the following JVM instructions:
       
   707 
       
   708 \begin{lstlisting}[mathescape,language=JVMIS]
       
   709 ldc number 
       
   710 newarray int 
       
   711 astore loc_var
       
   712 \end{lstlisting}
       
   713 
       
   714 \noindent
       
   715 First we need to put the dimension of the array onto the
       
   716 stack, then next instruction creates the array and last we
       
   717 need to store the array in a local variable (like ``simple'' variables).
       
   718 For looking up an element in an array we can use the following code
       
   719 
       
   720 \begin{lstlisting}[mathescape,language=JVMIS]
       
   721 aload loc_var 
       
   722 index_aexp 
       
   723 iaload
       
   724 \end{lstlisting}
       
   725 
       
   726 \noindent
       
   727 This first loads the ``pointer'' to the array onto the stack. Then we have the
       
   728 instructions corresponding to the index where we want to look up the array. The
       
   729 idea is that these instructions will leave a concrete number on the stack, which is
       
   730 the index. Finally we just need to load the corresponding element onto the stack.
       
   731 
   677 \end{document}
   732 \end{document}
   678 
   733 
   679 %%% Local Variables: 
   734 %%% Local Variables: 
   680 %%% mode: latex  
   735 %%% mode: latex  
   681 %%% TeX-master: t
   736 %%% TeX-master: t