afl-material: comparison handouts/ho07.tex

equal deleted inserted replaced

-:43c0ed473720
+:a65767fe5d71
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
+\usepackage{../grammar}
+\usepackage{tikz-qtree}
 \begin{document}
-\section*{Handout 7 (Compilation)}
+\section*{Handout 7 (Compilation of the WHILE-Language)}
 The purpose of a compiler is to transform a program, a human
 can write, into code the machine can run as fast as possible.
 The fastest code would be machine code the CPU can run
 directly, but it is often enough to improve the speed of a
 compiler would normally need to take care of (like explicit
 memory management).
 We will be generating code for the Java Virtual Machine. This
 is a stack-based virtual machine which will make it easy to
-generate code for arithmetic expressions. Recall
+generate code for arithmetic expressions. For example for
-that our
+generating code for the expression $1 + 2$ we need to issue
+the following three instructions
+\begin{lstlisting}[numbers=none]
+ldc 1
+ldc 2
+iadd
+\end{lstlisting}
+\noindent The first instruction loads the constant $1$ on the
+stack, the next one $2$, the third instruction add both
+numbers together replacing the top elements of the stack with
+the result $3$. We will throughout consider only integer
+numbers and results, therefore we can use the instructions
+\code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on.
+The \code{i} stands for integer instructions (alternatives are
+\code{d} for doubles, \code{l} for longs and \code{f} for
+floats).
+Recall our grammar for arithmetic expressions:
+\begin{plstx}[rhs style=, margin=3cm]
+: \meta{E} ::= \meta{T} $+$ \meta{E}
+| \meta{T} $-$ \meta{E}
+| \meta{T}\\
+: \meta{T} ::= \meta{F} $*$ \meta{T}
+| \meta{F} $\backslash$ \meta{T}
+| \meta{F}\\
+: \meta{F} ::= ( \meta{E} )
+| \meta{Id}
+| \meta{Num}\\
+\end{plstx}
+\noindent where \meta{Id} stands for variables and
+\meta{Num} for number. For the moment let us omit variables from
+arithmetic expressions. Our parser will take this grammar and
+produce abstract syntax trees, for
+example for the expression $1 + ((2 * 3) + (4 - 3))$ it will
+produce the following tree.
+\begin{center}
+\begin{tikzpicture}
+\Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
+\end{tikzpicture}
+\end{center}
+\noindent
+To generate code for this expression, we need to traverse this
+tree in post-order fashion---this will produce code for a
+stack-machine, like the JVM. Doing so gives the instructions
+\begin{lstlisting}[numbers=none]
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
+iadd
+\end{lstlisting}
+\noindent If we ``run'' these instructions, the result $8$
+will be on top of the stack. This will be a convention we
+always observe, namely that the results of arithmetic
+expressions will always be on top of the stack. Note, that a
+different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$,
+produces a different abstract syntax tree and thus potentially
+also a different list of instructions. Generating code in this
+fashion is rather simple: it can be done by the following
+\textit{compile}-function:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
+$\textit{compile}(a_1 + a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\;\textit{compile}(a_2)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2)$ & $\dn$ &
+$\textit{compile}(a_1) \;@\; \textit{compile}(a_2)\;@\; \pcode{idiv}$\\
+\end{tabular}
+\end{center}
+\noindent However, our arithmetic expressions can also contain
+variables. We will represent them as \emph{local variables} in
+the JVM. Essentially, local variables are an array or pointers
+to the memory containing in our case only integers. Looking up
+a variable can be done by the instruction
+\begin{lstlisting}[mathescape,numbers=none]
+iload $index$
+\end{lstlisting}
+\noindent
+which places the content of the local variable $index$ onto
+thestack. Storing the top of the stack into a local variable
+can be done by the instruction
+\begin{lstlisting}[mathescape,numbers=none]
+istore $index$
+\end{lstlisting}
+\noindent Note that this also pops off the top of the stack.
+One problem we have to overcome is that local variables are
+addressed, not by identifiers, but by numbers (starting from
+$0$). Therefore our compiler needs to maintain a kind of
+environment (similar to the interpreter) where variables are
+associated to numbers. This association needs to be unique: if
+we muddle up the numbers, then we essentially confuse
+variables and the result will usually be an erroneous result.
+Therefore our \textit{compile}-function will take two
+arguments: the abstract syntax tree and the environment, $E$,
+that maps identifiers to index-numbers.
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
+$\textit{compile}(a_1 + a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{iadd}$\\
+$\textit{compile}(a_1 - a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{isub}$\\
+$\textit{compile}(a_1 * a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{imul}$\\
+$\textit{compile}(a_1 \backslash a_2, E)$ & $\dn$ &
+$\textit{compile}(a_1, E) \;@\; \textit{compile}(a_2, E)\;@\; \pcode{idiv}$\\
+$\textit{compile}(x, E)$ & $\dn$ & $\pcode{iload}\;E(x)$\\
+\end{tabular}
+\end{center}
+\noindent In the last line we generate the code for variables
+where $E(x)$ stands for looking up to which index the variable
+$x$ maps to.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 370	a65767fe5d71
parent 369	43c0ed473720
child 372	d6af4b1239de