# HG changeset patch # User Christian Urban # Date 1447725530 0 # Node ID d6af4b1239de9c5d8945006f52ed299542b93c7b # Parent 0b1a92b305cf75d0feda168e2db3b4466e24211a updated diff -r 0b1a92b305cf -r d6af4b1239de graphics.sty --- a/graphics.sty Mon Nov 16 15:16:26 2015 +0000 +++ b/graphics.sty Tue Nov 17 01:58:50 2015 +0000 @@ -6,6 +6,7 @@ \usetikzlibrary{arrows} \usetikzlibrary{backgrounds} \usetikzlibrary{fit} +\usepackage{tikz-qtree} \usepackage{graphicx} \usepackage{pgfplots} diff -r 0b1a92b305cf -r d6af4b1239de handouts/ho07.pdf Binary file handouts/ho07.pdf has changed diff -r 0b1a92b305cf -r d6af4b1239de handouts/ho07.tex --- a/handouts/ho07.tex Mon Nov 16 15:16:26 2015 +0000 +++ b/handouts/ho07.tex Tue Nov 17 01:58:50 2015 +0000 @@ -2,11 +2,12 @@ \usepackage{../style} \usepackage{../langs} \usepackage{../grammar} -\usepackage{tikz-qtree} +\usepackage{../graphics} + \begin{document} -\section*{Handout 7 (Compilation of the WHILE-Language)} +\section*{Handout 7 (Compilation)} The purpose of a compiler is to transform a program, a human can write, into code the machine can run as fast as possible. @@ -18,11 +19,13 @@ compiler would normally need to take care of (like explicit memory management). -We will be generating code for the Java Virtual Machine. This -is a stack-based virtual machine which will make it easy to -generate code for arithmetic expressions. For example for -generating code for the expression $1 + 2$ we need to issue -the following three instructions +As an example we will implement a compiler for the very simple +While-language. We will be generating code for the Java +Virtual Machine. This is a stack-based virtual machine, a fact +which will make it easy to generate code for arithmetic +expressions. For example for generating code for the +expression $1 + 2$ we need to generate the following three +instructions \begin{lstlisting}[numbers=none] ldc 1 @@ -30,17 +33,18 @@ iadd \end{lstlisting} -\noindent The first instruction loads the constant $1$ on the -stack, the next one $2$, the third instruction add both -numbers together replacing the top elements of the stack with -the result $3$. We will throughout consider only integer -numbers and results, therefore we can use the instructions -\code{iadd}, \code{isub}, \code{imul}, \code{idiv} and so on. -The \code{i} stands for integer instructions (alternatives are -\code{d} for doubles, \code{l} for longs and \code{f} for -floats). +\noindent The first instruction loads the constant $1$ onto +the stack, the next one $2$, the third instruction adds both +numbers together replacing the top elements of the stack with +the result $3$. For simplicity, we will throughout consider +only integer numbers and results. Therefore we can use the +instructions \code{iadd}, \code{isub}, \code{imul}, +\code{idiv} and so on. The \code{i} stands for integer +instructions in the JVM (alternatives are \code{d} for doubles, +\code{l} for longs and \code{f} for floats). -Recall our grammar for arithmetic expressions: +Recall our grammar for arithmetic expressions ($E$ is the +starting symbol): \begin{plstx}[rhs style=, margin=3cm] : \meta{E} ::= \meta{T} $+$ \meta{E} @@ -55,9 +59,9 @@ \noindent where \meta{Id} stands for variables and -\meta{Num} for number. For the moment let us omit variables from +\meta{Num} for numbers. For the moment let us omit variables from arithmetic expressions. Our parser will take this grammar and -produce abstract syntax trees, for +produce abstract syntax trees. For example for the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the following tree. @@ -65,10 +69,11 @@ \begin{tikzpicture} \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]] \end{tikzpicture} \end{center} -\noindent -To generate code for this expression, we need to traverse this -tree in post-order fashion---this will produce code for a -stack-machine, like the JVM. Doing so gives the instructions +\noindent To generate code for this expression, we need to +traverse this tree in post-order fashion and emit code for +each node---this traversal in post-order fashion will produce +code for a stack-machine (what the JVM is). Doing so for the +tree above generates the instructions \begin{lstlisting}[numbers=none] ldc 1 @@ -83,14 +88,17 @@ \end{lstlisting} \noindent If we ``run'' these instructions, the result $8$ -will be on top of the stack. This will be a convention we -always observe, namely that the results of arithmetic +will be on top of the stack (I leave this to you to verify; +the meaning of each instruction should be clear). The result +being on the top of the stack will be a convention we always +observe in our compiler, that is the results of arithmetic expressions will always be on top of the stack. Note, that a -different bracketing, for example $(1 + (2 * 3)) + (4 - 3)$, -produces a different abstract syntax tree and thus potentially -also a different list of instructions. Generating code in this -fashion is rather simple: it can be done by the following -\textit{compile}-function: +different bracketing of the expression, for example $(1 + (2 * +3)) + (4 - 3)$, produces a different abstract syntax tree and +thus potentially also a different list of instructions. +Generating code in this fashion is rather easy to implement: +it can be done with the following \textit{compile}-function, +which takes the abstract syntax tree as argument: \begin{center} \begin{tabular}{lcl} @@ -106,11 +114,11 @@ \end{tabular} \end{center} -\noindent However, our arithmetic expressions can also contain +However, our arithmetic expressions can also contain variables. We will represent them as \emph{local variables} in the JVM. Essentially, local variables are an array or pointers -to the memory containing in our case only integers. Looking up -a variable can be done by the instruction +to memory cells, containing in our case only integers. Looking +up a variable can be done with the instruction \begin{lstlisting}[mathescape,numbers=none] iload $index$ @@ -118,7 +126,7 @@ \noindent which places the content of the local variable $index$ onto -thestack. Storing the top of the stack into a local variable +the stack. Storing the top of the stack into a local variable can be done by the instruction \begin{lstlisting}[mathescape,numbers=none] @@ -126,16 +134,16 @@ \end{lstlisting} \noindent Note that this also pops off the top of the stack. -One problem we have to overcome is that local variables are -addressed, not by identifiers, but by numbers (starting from -$0$). Therefore our compiler needs to maintain a kind of -environment (similar to the interpreter) where variables are -associated to numbers. This association needs to be unique: if -we muddle up the numbers, then we essentially confuse -variables and the result will usually be an erroneous result. -Therefore our \textit{compile}-function will take two -arguments: the abstract syntax tree and the environment, $E$, -that maps identifiers to index-numbers. +One problem we have to overcome, however, is that local +variables are addressed, not by identifiers, but by numbers +(starting from $0$). Therefore our compiler needs to maintain +a kind of environment where variables are associated to +numbers. This association needs to be unique: if we muddle up +the numbers, then we essentially confuse variables and the +consequence will usually be an erroneous result. Our extended +\textit{compile}-function for arithmetic expressions will +therefore take two arguments: the abstract syntax tree and the +environment, $E$, that maps identifiers to index-numbers. \begin{center} \begin{tabular}{lcl} @@ -153,9 +161,216 @@ \end{center} \noindent In the last line we generate the code for variables -where $E(x)$ stands for looking up to which index the variable -$x$ maps to. +where $E(x)$ stands for looking up the environment to which +index the variable $x$ maps to. + +There is a similar \textit{compile}-function for boolean +expressions, but it includes a ``trick'' to do with +\pcode{if}- and \pcode{while}-statements. To explain the issue +let us explain first the compilation of statements of the +While-language. The clause for \pcode{skip} is trivial, since +we do not have to generate any instruction + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\ +\end{tabular} +\end{center} + +\noindent Note that the \textit{compile}-function for +statements returns a pair, a list of instructions (in this +case the empty list) and an environment for variables. The +reason for the environment is that assignments in the +While-language might change the environment---clearly if a +variable is used for the first time, we need to allocate a new +index and if it has been used before, we need to be able to +retrieve the associated index. This is reflected in +the clause for compiling assignments: + +\begin{center} +\begin{tabular}{lcl} +$\text{compile}(x := a, E)$ & $\dn$ & +$(\textit{compile}(a, E) \;@\;\pcode{istore}\;index, E')$ +\end{tabular} +\end{center} + +\noindent We first generate code for the right-hand side of +the assignment and then add an \pcode{istore}-instruction at +the end. By convention the result of the arithmetic expression +$a$ will be on top of the stack. After the \pcode{istore} +instruction, the result will be stored in the index +corresponding to the variable $x$. If the variable $x$ has +been used before in the program, we just need to look up what +the index is and return the environment unchanged (that is in +this case $E' = E$). However, if this is the first encounter +of the variable $x$ in the program, then we have to augment +the environment and assign $x$ with the largest index in $E$ +plus one (that is $E' = E(x \mapsto largest\_index + 1)$). +That means for the assignment $x := x + 1$ we generate the +following code + +\begin{lstlisting}[mathescape,numbers=none] +iload $n_x$ +ldc 1 +iadd +istore $n_x$ +\end{lstlisting} + +\noindent +where $n_x$ is the index for the variable $x$. + +More complicated is the code for \pcode{if}-statments, say + +\begin{lstlisting}[mathescape,language={},numbers=none] +if $b$ then $cs_1$ else $cs_2$ +\end{lstlisting} + +\noindent where $b$ is a boolean expression and the $cs_i$ +are the instructions for each \pcode{if}-branch. Lets assume +we already generated code for $b$ and $cs_{1/2}$. Then in the +true-case the control-flow of the program needs to be + +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs_1$}; +\node (A3) [point, right=of cs1] {}; +\node (cs2) [block, right=of A3] {code of $cs_2$}; +\node (A4) [point, right=of cs2] {}; + +\draw (A1) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (cs1); +\draw (cs1) edge [->, black, line width=1mm] (A3); +\draw (A3) edge [->, black, skip loop] (A4); +\node [below=of cs2] {\raisebox{-5mm}{\small{}jump}}; +\end{tikzpicture} +\end{center} + +\noindent where we start with running the code for $b$; since +we are in the true case we continue with running the code for +$cs_1$. After this however, we must not run the code for +$cs_2$, but always jump after the last instruction of $cs_2$ +(the code for the \pcode{else}-branch). Note that this jump is +unconditional, meaning we always have to jump to the end of +$cs_2$. The corresponding instruction of the JVM is +\pcode{goto}. In case $b$ turns out to be false we need the +control-flow +\begin{center} +\begin{tikzpicture}[node distance=2mm and 4mm, + block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm}, + point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red}, + skip loop/.style={black, line width=1mm, to path={-- ++(0,-10mm) -| (\tikztotarget)}}] +\node (A1) [point] {}; +\node (b) [block, right=of A1] {code of $b$}; +\node (A2) [point, right=of b] {}; +\node (cs1) [block, right=of A2] {code of $cs_1$}; +\node (A3) [point, right=of cs1] {}; +\node (cs2) [block, right=of A3] {code of $cs_2$}; +\node (A4) [point, right=of cs2] {}; + +\draw (A1) edge [->, black, line width=1mm] (b); +\draw (b) edge [->, black, line width=1mm] (A2); +\draw (A2) edge [skip loop] (A3); +\draw (A3) edge [->, black, line width=1mm] (cs2); +\draw (cs2) edge [->,black, line width=1mm] (A4); +\node [below=of cs1] {\raisebox{-5mm}{\small{}conditional jump}}; +\end{tikzpicture} +\end{center} + +\noindent where we now need a conditional jump (if the +if-condition is false) from the end of the code for the +boolean to the beginning of the instructions $cs_2$. Once we +are finished with running $cs_2$ we can continue with whatever +code comes after the if-statement. + +The \pcode{goto} and conditional jumps need addresses to where +the jump should go. Since we are generating assembly code for +the JVM, we do not actually have to give addresses, but need +to attach labels to our code. These labels specify a target +for a jump. Therefore the labels need to be unique, as +otherwise it would be ambiguous where a jump should go. +A labels, say \pcode{L}, is attached to code like + +\begin{lstlisting}[mathescape,numbers=none] +L: + $instr_1$ + $instr_2$ + $\vdots$ +\end{lstlisting} + +Recall the ``trick'' with compiling boolean expressions: the +\textit{compile}-function for boolean expressions takes three +arguments: an abstract syntax tree, an environment for +variable indices and also the label, $lab$, to where an conditional +jump needs to go. The clause for the expression $a_1 = a_2$, +for example, is as follows: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(a_1 = a_2, E, lab)$ & $\dn$\\ +\multicolumn{3}{l}{$\qquad\textit{compile}(a_1, E) \;@\;\textit{compile}(a_2, E)\;@\; \pcode{if_icmpne}\;lab$} +\end{tabular} +\end{center} + +\noindent +We are generating code for the subexpressions $a_1$ and $a_2$. +This will mean after running the corresponding code there will +be two integers on top of the stack. If they are equal, we do +not have to do anything and just continue with the next +instructions (see control-flow of ifs above). However if they +are \emph{not} equal, then we need to (conditionally) jump to +the label $lab$. This can be done with the instruction + +\begin{lstlisting}[mathescape,numbers=none] +if_icmpne $lab$ +\end{lstlisting} + +\noindent Other jump instructions for boolean operators are + +\begin{center} +\begin{tabular}{l@{\hspace{10mm}}c@{\hspace{10mm}}l} +$=$ & $\Rightarrow$ & \pcode{if_icmpne}\\ +$\not=$ & $\Rightarrow$ & \pcode{if_icmpeq}\\ +$<$ & $\Rightarrow$ & \pcode{if_icmpge}\\ +$\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\ +\end{tabular} +\end{center} + +\noindent and so on. I leave it to you to extend the +\textit{compile}-function for the other boolean expressions. +Note that we need to jump whenever the boolean is \emph{not} +true, which means we have to ``negate'' the jump---equals +becomes not-equal, less becomes greater-or-equal. If you do +not like this design (it can be the source of some nasty, +hard-to-detect errors), you can also change the layout of the +code and first give the code for the else-branch and then for +the if-branch. + +We are now ready to give the compile function for +if-statments--remember this function returns for staments a +pair consisting of the code and an environment: + +\begin{center} +\begin{tabular}{lcl} +$\textit{compile}(\pcode{if}\;b\;\pcode{then}\; cs_1\;\pcode{else}\; cs_2, E)$ & $\dn$\\ +\multicolumn{3}{l}{$\qquad l_\textit{ifelse}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad l_\textit{ifend}\;$ (fresh label)}\\ +\multicolumn{3}{l}{$\qquad (is_1, E') = \textit{compile}(cs_1, E)$}\\ +\multicolumn{3}{l}{$\qquad (is_2, E'') = \textit{compile}(cs_2, E')$}\\ +\multicolumn{3}{l}{$\qquad(\textit{compile}(b, E, l_\textit{ifelse})$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_1$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\; \pcode{goto}\;l_\textit{ifend}$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;l_\textit{ifelse}:$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;is_2$}\\ +\multicolumn{3}{l}{$\qquad\phantom{(}@\;l_\textit{ifend}:, E'')$}\\ +\end{tabular} +\end{center} \end{document} diff -r 0b1a92b305cf -r d6af4b1239de slides/slides07.pdf Binary file slides/slides07.pdf has changed diff -r 0b1a92b305cf -r d6af4b1239de slides/slides07.tex --- a/slides/slides07.tex Mon Nov 16 15:16:26 2015 +0000 +++ b/slides/slides07.tex Tue Nov 17 01:58:50 2015 +0000 @@ -365,48 +365,6 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\begin{frame}[t] -\frametitle{Compiling Statements} - -We return a list of instructions and an environment for the -variables - -\begin{center} -\bl{\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} -$\text{compile}(\text{skip}, E)$ & $\dn$ & $(\textit{Nil}, E)$\bigskip\\ -$\text{compile}(x := a, E)$ & $\dn$\\ -\multicolumn{3}{l}{$(\text{compile}(a, E) \;@\;\text{istore}\;index, E(x\mapsto index))$}\\ -\end{tabular}} -\end{center}\medskip - -where \bl{$index$} is \bl{$E(x)$} if it is already defined, -or if it is not then the largest index not yet seen - -\end{frame} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\mode{ -\begin{frame}[t] -\frametitle{\begin{tabular}{c}Compiling AExps\end{tabular}} - -{\Large\bl{$x := x + 1$}} - -\begin{center} -\bl{\begin{tabular}{l} -iload $n_x$\\ -ldc 1\\ -iadd\\ -istore $n_x$\\ -\end{tabular}} -\end{center} - -where \bl{$n_x$} is the index corresponding to the variable \bl{$x$} - -\end{frame}} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[c, fragile] \frametitle{Mathematical Functions} @@ -426,9 +384,50 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\mode{ \begin{frame}[t] -\frametitle{\begin{tabular}{c}Compiling Ifs\end{tabular}} +\frametitle{Compiling Statements} + +We return a list of instructions and an environment for the +variables + +\begin{center} +\bl{\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}} +$\text{compile}(\text{skip}, E)$ & $\dn$ & $(\textit{Nil}, E)$\bigskip\\ +$\text{compile}(x := a, E)$ & $\dn$\\ +\multicolumn{3}{l}{$(\text{compile}(a, E) \;@\;\text{istore}\;index, E(x\mapsto index))$}\\ +\end{tabular}} +\end{center}\medskip + +where \bl{$index$} is \bl{$E(x)$} if it is already defined, +or if it is not, then the largest index not yet seen + +\end{frame} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}[c] +\frametitle{Compiling Assignments} + +{\Large\bl{$x := x + 1$}} + +\begin{center} +\bl{\begin{tabular}{l} +iload $n_x$\\ +ldc 1\\ +iadd\\ +istore $n_x$\\ +\end{tabular}} +\end{center} + +where \bl{$n_x$} is the index corresponding to the variable~\bl{$x$} + +\end{frame} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}[t] +\frametitle{Compiling Ifs} {\Large\bl{$\text{if}\;b\;\text{then}\;cs_1\;\text{else}\;cs_2$}}\bigskip\bigskip @@ -463,8 +462,7 @@ \end{tikzpicture} \end{center} - -\end{frame}} +\end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -500,9 +498,8 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\mode{ \begin{frame}[t] -\frametitle{\begin{tabular}{c}Compiling BExps\end{tabular}} +\frametitle{Compiling BExps} {\Large\bl{$a_1 = a_2$}} @@ -513,7 +510,7 @@ \end{tabular}} \end{center} -\end{frame}} +\end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -559,9 +556,8 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\mode{ \begin{frame}[t] -\frametitle{\begin{tabular}{c}Compiling Ifs\end{tabular}} +\frametitle{Compiling Ifs} {\Large\bl{if $b$ then $cs_1$ else $cs_2$}} @@ -581,7 +577,7 @@ \end{tabular}} \end{center} -\end{frame}} +\end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%