handouts/ho07.tex
changeset 370 a65767fe5d71
parent 369 43c0ed473720
child 372 d6af4b1239de
equal deleted inserted replaced
369:43c0ed473720 370:a65767fe5d71
     1 \documentclass{article}
     1 \documentclass{article}
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 
     4 \usepackage{../grammar}
       
     5 \usepackage{tikz-qtree}
     5 
     6 
     6 \begin{document}
     7 \begin{document}
     7 
     8 
     8 \section*{Handout 7 (Compilation)}
     9 \section*{Handout 7 (Compilation of the WHILE-Language)}
     9 
    10 
    10 The purpose of a compiler is to transform a program, a human
    11 The purpose of a compiler is to transform a program, a human
    11 can write, into code the machine can run as fast as possible.
    12 can write, into code the machine can run as fast as possible.
    12 The fastest code would be machine code the CPU can run
    13 The fastest code would be machine code the CPU can run
    13 directly, but it is often enough to improve the speed of a
    14 directly, but it is often enough to improve the speed of a
    17 compiler would normally need to take care of (like explicit
    18 compiler would normally need to take care of (like explicit
    18 memory management).
    19 memory management).
    19 
    20 
    20 We will be generating code for the Java Virtual Machine. This
    21 We will be generating code for the Java Virtual Machine. This
    21 is a stack-based virtual machine which will make it easy to
    22 is a stack-based virtual machine which will make it easy to
    22 generate code for arithmetic expressions. Recall
    23 generate code for arithmetic expressions. For example for 
    23 that our
    24 generating code for the expression $1 + 2$ we need to issue
       
    25 the following three instructions
       
    26 
       
    27 \begin{lstlisting}[numbers=none]
       
    28 ldc 1
       
    29 ldc 2
       
    30 iadd 
       
    31 \end{lstlisting}
       
    32 
       
    33 \noindent The first instruction loads the constant $1$ on the 
       
    34 stack, the next one $2$, the third instruction add both 
       
    35 numbers together replacing the top elements of the stack with 
       
    36 the result $3$. We will throughout consider only integer 
       
    37 numbers and results, therefore we can use the instructions
       
    38 \code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
       
    39 The \code{i} stands for integer instructions (alternatives are
       
    40 \code{d} for doubles, \code{l} for longs and \code{f} for 
       
    41 floats).
       
    42 
       
    43 Recall our grammar for arithmetic expressions:
       
    44 
       
    45 
       
    46 \begin{plstx}[rhs style=, margin=3cm]
       
    47 : \meta{E} ::= \meta{T} $+$ \meta{E}
       
    48          | \meta{T} $-$ \meta{E}
       
    49          | \meta{T}\\
       
    50 : \meta{T} ::= \meta{F} $*$ \meta{T}
       
    51           | \meta{F} $\backslash$ \meta{T}
       
    52           | \meta{F}\\
       
    53 : \meta{F} ::= ( \meta{E} )
       
    54           | \meta{Id}
       
    55           | \meta{Num}\\
       
    56 \end{plstx}
       
    57 
       
    58 
       
    59 \noindent where \meta{Id} stands for variables and
       
    60 \meta{Num} for number. For the moment let us omit variables from
       
    61 arithmetic expressions. Our parser will take this grammar and
       
    62 produce abstract syntax trees, for
       
    63 example for the expression $1 + ((2 * 3) + (4 - 3))$ it will
       
    64 produce the following tree.
       
    65 
       
    66 \begin{center}
       
    67 \begin{tikzpicture}
       
    68 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
       
    69 \end{tikzpicture}
       
    70 \end{center}
       
    71 
       
    72 \noindent 
       
    73 To generate code for this expression, we need to traverse this
       
    74 tree in post-order fashion---this will produce code for a 
       
    75 stack-machine, like the JVM. Doing so gives the instructions
       
    76 
       
    77 \begin{lstlisting}[numbers=none]
       
    78 ldc 1 
       
    79 ldc 2 
       
    80 ldc 3 
       
    81 imul 
       
    82 ldc 4 
       
    83 ldc 3 
       
    84 isub 
       
    85 iadd 
       
    86 iadd
       
    87 \end{lstlisting}
       
    88 
       
    89 \noindent If we ``run'' these instructions, the result $8$
       
    90 will be on top of the stack. This will be a convention we
       
    91 always observe, namely that the results of arithmetic
       
    92 expressions will always be on top of the stack. Note, that a
       
    93 different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$,
       
    94 produces a different abstract syntax tree and thus potentially
       
    95 also a different list of instructions. Generating code in this
       
    96 fashion is rather simple: it can be done by the following
       
    97 \textit{compile}-function:
       
    98 
       
    99 \begin{center}
       
   100 \begin{tabular}{lcl}
       
   101 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
       
   102 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
       
   103 $\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
       
   104 $\textit{compile}(a_1 - a_2)$ & $\dn$ & 
       
   105 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
       
   106 $\textit{compile}(a_1 * a_2)$ & $\dn$ & 
       
   107 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
       
   108 $\textit{compile}(a_1 \backslash a_2)$ & $\dn$ & 
       
   109 $\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
       
   110 \end{tabular}
       
   111 \end{center}
       
   112 
       
   113 \noindent However, our arithmetic expressions can also contain
       
   114 variables. We will represent them as \emph{local variables} in
       
   115 the JVM. Essentially, local variables are an array or pointers
       
   116 to the memory containing in our case only integers. Looking up
       
   117 a variable can be done by the instruction
       
   118 
       
   119 \begin{lstlisting}[mathescape,numbers=none]
       
   120 iload $index$
       
   121 \end{lstlisting}
       
   122 
       
   123 \noindent 
       
   124 which places the content of the local variable $index$ onto 
       
   125 thestack. Storing the top of the stack into a local variable 
       
   126 can be done by the instruction
       
   127 
       
   128 \begin{lstlisting}[mathescape,numbers=none]
       
   129 istore $index$
       
   130 \end{lstlisting}
       
   131 
       
   132 \noindent Note that this also pops off the top of the stack.
       
   133 One problem we have to overcome is that local variables are
       
   134 addressed, not by identifiers, but by numbers (starting from
       
   135 $0$). Therefore our compiler needs to maintain a kind of
       
   136 environment (similar to the interpreter) where variables are
       
   137 associated to numbers. This association needs to be unique: if
       
   138 we muddle up the numbers, then we essentially confuse
       
   139 variables and the result will usually be an erroneous result.
       
   140 Therefore our \textit{compile}-function will take two
       
   141 arguments: the abstract syntax tree and the environment, $E$,
       
   142 that maps identifiers to index-numbers.
       
   143 
       
   144 \begin{center}
       
   145 \begin{tabular}{lcl}
       
   146 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
       
   147 $\textit{compile}(a_1 + a_2, E)$ & $\dn$ & 
       
   148 $\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
       
   149 $\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
       
   150 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
       
   151 $\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
       
   152 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
       
   153 $\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ & 
       
   154 $\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
       
   155 $\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
       
   156 \end{tabular}
       
   157 \end{center}
       
   158 
       
   159 \noindent In the last line we generate the code for variables
       
   160 where $E(x)$ stands for looking up to which index the variable
       
   161 $x$ maps to.
       
   162 
    24 
   163 
    25 \end{document}
   164 \end{document}
    26 
   165 
    27 %%% Local Variables: 
   166 %%% Local Variables: 
    28 %%% mode: latex  
   167 %%% mode: latex