# HG changeset patch # User Christian Urban # Date 1572020603 -3600 # Node ID 9ce78065f68d4d7f6ddacf4ed28f02f04835450e # Parent 412556272333505d988a090ba81735edde109feb updated diff -r 412556272333 -r 9ce78065f68d handouts/ho07.pdf Binary file handouts/ho07.pdf has changed diff -r 412556272333 -r 9ce78065f68d handouts/ho07.tex --- a/handouts/ho07.tex Fri Oct 25 14:55:31 2019 +0100 +++ b/handouts/ho07.tex Fri Oct 25 17:23:23 2019 +0100 @@ -7,19 +7,20 @@ \begin{document} +\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019} \section*{Handout 7 (Compilation)} -The purpose of a compiler is to transform a program, a human -can read and write, into code the machine can run as fast as -possible. The fastest code would be machine code the CPU can -run directly, but it is often enough to improve the speed of a -program by just targeting a virtual machine. This produces not -the fastest possible code, but code that is fast enough and -has the advantage that the virtual machine takes care of -things a compiler would normally need to take care of (like -explicit memory management). Why study compilers? John Regher -gives this answer in his compiler blog:\footnote{\url{http://blog.regehr.org/archives/1419}} +The purpose of a compiler is to transform a program a human can read and +write into code the machine can run as fast as possible. The fastest +code would be machine code the CPU can run directly, but it is often +good enough for improving the speed of a program to just target a +virtual machine. This produces not the fastest possible code, but code +that is pretty enough. This way of producing code has the advantage that +the virtual machine takes care of things a compiler would normally need +to take care of (like explicit memory management). An interesting answer +for why studying compilers is given by John Regher in his compiler +blog:\footnote{\url{http://blog.regehr.org/archives/1419}} \begin{quote}\it{}``We can start off with a couple of observations about the role of compilers. First, hardware is getting weirder @@ -41,19 +42,19 @@ As a first example in this module we will implement a compiler for the very simple While-language. It will generate code for the Java Virtual -Machine (JVM). This is a stack-based virtual machine, a fact which -will make it easy to generate code for arithmetic expressions. For -example for generating code for the expression $1 + 2$ we need to -generate the following three instructions +Machine (JVM). This is a stack-based virtual machine, a fact which will +make it easy to generate code for arithmetic expressions. For example +when compiling $1 + 2$ we need to generate the following three +instructions -\begin{lstlisting}[numbers=none] +\begin{lstlisting}[language=JVMIS,numbers=none] ldc 1 ldc 2 iadd \end{lstlisting} \noindent The first instruction loads the constant $1$ onto -the stack, the next one $2$, the third instruction adds both +the stack, the next one loads $2$, the third instruction adds both numbers together replacing the top two elements of the stack with the result $3$. For simplicity, we will throughout consider only integer numbers and results. Therefore we can @@ -80,11 +81,10 @@ \noindent where \meta{Id} stands for variables and \meta{Num} -for numbers. For the moment let us omit variables from -arithmetic expressions. Our parser will take this grammar and -given an input produce abstract syntax trees. For example for -the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the -following tree. +for numbers. For the moment let us omit variables from arithmetic +expressions. Our parser will take this grammar and given an input +produce abstract syntax trees. For example we will obtain for the +expression $1 + ((2 * 3) + (4 - 3))$ the following tree. \begin{center} \begin{tikzpicture} @@ -92,13 +92,13 @@ \end{tikzpicture} \end{center} -\noindent To generate code for this expression, we need to +\noindent To generate JVM code for this expression, we need to traverse this tree in post-order fashion and emit code for each node---this traversal in post-order fashion will produce code for a stack-machine (what the JVM is). Doing so for the tree above generates the instructions -\begin{lstlisting}[numbers=none] +\begin{lstlisting}[language=JVMIS,numbers=none] ldc 1 ldc 2 ldc 3 @@ -110,19 +110,17 @@ iadd \end{lstlisting} -\noindent If we ``run'' these instructions, the result $8$ -will be on top of the stack (I leave this to you to verify; -the meaning of each instruction should be clear). The result -being on the top of the stack will be a convention we always -observe in our compiler, that is the results of arithmetic -expressions will always be on top of the stack. Note, that a -different bracketing of the expression, for example $(1 + (2 * -3)) + (4 - 3)$, produces a different abstract syntax tree and -thus potentially also a different list of instructions. -Generating code in this fashion is rather easy to implement: -it can be done with the following recursive -\textit{compile}-function, which takes the abstract syntax -tree as argument: +\noindent If we ``run'' these instructions, the result $8$ will be on +top of the stack (I leave this to you to verify; the meaning of each +instruction should be clear). The result being on the top of the stack +will be a convention we always observe in our compiler, that is the +results of arithmetic expressions will always be on top of the stack. +Note, that a different bracketing of the expression, for example $(1 + +(2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus +also a different list of instructions. Generating code in this +post-order-traversal fashion is rather easy to implement: it can be done +with the following recursive \textit{compile}-function, which takes the +abstract syntax tree as argument: \begin{center} \begin{tabular}{lcl} @@ -144,7 +142,7 @@ to memory cells, containing in our case only integers. Looking up a variable can be done with the instruction -\begin{lstlisting}[mathescape,numbers=none] +\begin{lstlisting}[language=JVMIS,mathescape,numbers=none] iload $index$ \end{lstlisting} @@ -153,7 +151,7 @@ the stack. Storing the top of the stack into a local variable can be done by the instruction -\begin{lstlisting}[mathescape,numbers=none] +\begin{lstlisting}[language=JVMIS,mathescape,numbers=none] istore $index$ \end{lstlisting} @@ -201,7 +199,7 @@ \end{tabular} \end{center} -\noindent $[]$ is the empty list of instructions. Note that +\noindent whereby $[]$ is the empty list of instructions. Note that the \textit{compile}-function for statements returns a pair, a list of instructions (in this case the empty list) and an environment for variables. The reason for the environment is @@ -233,7 +231,7 @@ That means for the assignment $x := x + 1$ we generate the following code -\begin{lstlisting}[mathescape,numbers=none] +\begin{lstlisting}[language=JVMIS,mathescape,numbers=none] iload $n_x$ ldc 1 iadd @@ -241,18 +239,31 @@ \end{lstlisting} \noindent -where $n_x$ is the index for the variable $x$. +where $n_x$ is the index for the variable $x$. The code for +looking-up the index for the variable is as follow: -More complicated is the code for \pcode{if}-statements, say +\begin{center} +\begin{tabular}{lcl} +$index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$ +\end{tabular} +\end{center} + +\noindent +In case the environment $E$ contains an index for $x$, we return it. +Otherwise we ``create'' a new index by returning the size $|E|$ of the +environment (that will be an index that is guaranteed to be not used +yet). + +More complicated is the generation of code for \pcode{if}-statements, say \begin{lstlisting}[mathescape,language={},numbers=none] if $b$ then $cs_1$ else $cs_2$ \end{lstlisting} -\noindent where $b$ is a boolean expression and the $cs_{1/2}$ -are the statements for each \pcode{if}-branch. Lets assume -we already generated code for $b$ and $cs_{1/2}$. Then in the -true-case the control-flow of the program needs to be +\noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the +statements for each of the \pcode{if}-branches. Lets assume we already +generated code for $b$ and $cs_{1/2}$. Then in the true-case the +control-flow of the program needs to be \begin{center} \begin{tikzpicture}[node distance=2mm and 4mm, @@ -419,7 +430,7 @@ \noindent The generated code is as follows: -\begin{lstlisting}[mathescape,language={}] +\begin{lstlisting}[language=JVMIS,mathescape,language={}] ldc 1 ldc 1 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ @@ -528,7 +539,7 @@ \noindent yielding the following code -\begin{lstlisting}[mathescape,language={}] +\begin{lstlisting}[language=JVMIS,mathescape,language={}] L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ iload 0 ldc 10 diff -r 412556272333 -r 9ce78065f68d progs/compile.scala --- a/progs/compile.scala Fri Oct 25 14:55:31 2019 +0100 +++ b/progs/compile.scala Fri Oct 25 17:23:23 2019 +0100 @@ -120,7 +120,7 @@ // environments -type Env = Map[String, String] +type Env = Map[String, Int] // arithmetic expression compilation def compile_aexp(a: AExp, env : Env) : String = a match { @@ -150,8 +150,7 @@ def compile_stmt(s: Stmt, env: Env) : (String, Env) = s match { case Skip => ("", env) case Assign(x, a) => { - val index = if (env.isDefinedAt(x)) env(x) else - env.keys.size.toString + val index = if (env.isDefinedAt(x)) env(x) else env.keys.size (compile_aexp(a, env) ++ i"istore $index", env + (x -> index)) } case If(b, bl1, bl2) => { @@ -180,8 +179,7 @@ (i"iload ${env(x)}" ++ i"invokestatic XXX/XXX/write(I)V", env) case Read(x) => { - val index = if (env.isDefinedAt(x)) env(x) else - env.keys.size.toString + val index = if (env.isDefinedAt(x)) env(x) else env.keys.size (i"invokestatic XXX/XXX/read()I" ++ i"istore $index", env + (x -> index)) }