updated
authorChristian Urban <urbanc@in.tum.de>
Fri, 25 Oct 2019 17:23:23 +0100
changeset 668 9ce78065f68d
parent 667 412556272333
child 669 2f5a4d76756d
updated
handouts/ho07.pdf
handouts/ho07.tex
progs/compile.scala
Binary file handouts/ho07.pdf has changed
--- a/handouts/ho07.tex	Fri Oct 25 14:55:31 2019 +0100
+++ b/handouts/ho07.tex	Fri Oct 25 17:23:23 2019 +0100
@@ -7,19 +7,20 @@
 
 
 \begin{document}
+\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
 
 \section*{Handout 7 (Compilation)}
 
-The purpose of a compiler is to transform a program, a human
-can read and write, into code the machine can run as fast as
-possible. The fastest code would be machine code the CPU can
-run directly, but it is often enough to improve the speed of a
-program by just targeting a virtual machine. This produces not
-the fastest possible code, but code that is fast enough and
-has the advantage that the virtual machine takes care of
-things a compiler would normally need to take care of (like
-explicit memory management). Why study compilers? John Regher
-gives this answer in his compiler blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
+The purpose of a compiler is to transform a program a human can read and
+write into code the machine can run as fast as possible. The fastest
+code would be machine code the CPU can run directly, but it is often
+good enough for improving the speed of a program to just target a
+virtual machine. This produces not the fastest possible code, but code
+that is pretty enough. This way of producing code has the advantage that
+the virtual machine takes care of things a compiler would normally need
+to take care of (like explicit memory management). An interesting answer
+for why studying compilers is given by John Regher in his compiler
+blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
 
 \begin{quote}\it{}``We can start off with a couple of observations
   about the role of compilers. First, hardware is getting weirder
@@ -41,19 +42,19 @@
 
 As a first example in this module we will implement a compiler for the
 very simple While-language. It will generate code for the Java Virtual
-Machine (JVM). This is a stack-based virtual machine, a fact which
-will make it easy to generate code for arithmetic expressions. For
-example for generating code for the expression $1 + 2$ we need to
-generate the following three instructions
+Machine (JVM). This is a stack-based virtual machine, a fact which will
+make it easy to generate code for arithmetic expressions. For example
+when compiling $1 + 2$ we need to generate the following three
+instructions
 
-\begin{lstlisting}[numbers=none]
+\begin{lstlisting}[language=JVMIS,numbers=none]
 ldc 1
 ldc 2
 iadd 
 \end{lstlisting}
 
 \noindent The first instruction loads the constant $1$ onto
-the stack, the next one $2$, the third instruction adds both
+the stack, the next one loads $2$, the third instruction adds both
 numbers together replacing the top two elements of the stack
 with the result $3$. For simplicity, we will throughout
 consider only integer numbers and results. Therefore we can
@@ -80,11 +81,10 @@
 
 
 \noindent where \meta{Id} stands for variables and \meta{Num}
-for numbers. For the moment let us omit variables from
-arithmetic expressions. Our parser will take this grammar and
-given an input produce abstract syntax trees. For example for
-the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
-following tree.
+for numbers. For the moment let us omit variables from arithmetic
+expressions. Our parser will take this grammar and given an input
+produce abstract syntax trees. For example we will obtain for the
+expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
 
 \begin{center}
 \begin{tikzpicture}
@@ -92,13 +92,13 @@
 \end{tikzpicture}
 \end{center}
 
-\noindent To generate code for this expression, we need to
+\noindent To generate JVM code for this expression, we need to
 traverse this tree in post-order fashion and emit code for
 each node---this traversal in post-order fashion will produce
 code for a stack-machine (what the JVM is). Doing so for the
 tree above generates the instructions
 
-\begin{lstlisting}[numbers=none]
+\begin{lstlisting}[language=JVMIS,numbers=none]
 ldc 1 
 ldc 2 
 ldc 3 
@@ -110,19 +110,17 @@
 iadd
 \end{lstlisting}
 
-\noindent If we ``run'' these instructions, the result $8$
-will be on top of the stack (I leave this to you to verify;
-the meaning of each instruction should be clear). The result
-being on the top of the stack will be a convention we always
-observe in our compiler, that is the results of arithmetic
-expressions will always be on top of the stack. Note, that a
-different bracketing of the expression, for example $(1 + (2 *
-3)) + (4 - 3)$, produces a different abstract syntax tree and
-thus potentially also a different list of instructions.
-Generating code in this fashion is rather easy to implement:
-it can be done with the following recursive
-\textit{compile}-function, which takes the abstract syntax
-tree as argument:
+\noindent If we ``run'' these instructions, the result $8$ will be on
+top of the stack (I leave this to you to verify; the meaning of each
+instruction should be clear). The result being on the top of the stack
+will be a convention we always observe in our compiler, that is the
+results of arithmetic expressions will always be on top of the stack.
+Note, that a different bracketing of the expression, for example $(1 +
+(2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus
+also a different list of instructions. Generating code in this
+post-order-traversal fashion is rather easy to implement: it can be done
+with the following recursive \textit{compile}-function, which takes the
+abstract syntax tree as argument:
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -144,7 +142,7 @@
 to memory cells, containing in our case only integers. Looking
 up a variable can be done with the instruction
 
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $index$
 \end{lstlisting}
 
@@ -153,7 +151,7 @@
 the stack. Storing the top of the stack into a local variable 
 can be done by the instruction
 
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 istore $index$
 \end{lstlisting}
 
@@ -201,7 +199,7 @@
 \end{tabular}
 \end{center}
 
-\noindent $[]$ is the empty list of instructions. Note that
+\noindent whereby $[]$ is the empty list of instructions. Note that
 the \textit{compile}-function for statements returns a pair, a
 list of instructions (in this case the empty list) and an
 environment for variables. The reason for the environment is
@@ -233,7 +231,7 @@
 That means for the assignment $x := x + 1$ we generate the
 following code
 
-\begin{lstlisting}[mathescape,numbers=none]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
 iload $n_x$
 ldc 1
 iadd
@@ -241,18 +239,31 @@
 \end{lstlisting}
 
 \noindent 
-where $n_x$ is the index for the variable $x$.
+where $n_x$ is the index for the variable $x$. The code for 
+looking-up the index for the variable is as follow:
 
-More complicated is the code for \pcode{if}-statements, say
+\begin{center}
+\begin{tabular}{lcl}
+$index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$
+\end{tabular}
+\end{center}
+
+\noindent
+In case the environment $E$ contains an index for $x$, we return it.
+Otherwise we ``create'' a new index by returning the size $|E|$ of the
+environment (that will be an index that is guaranteed to be not used
+yet).
+
+More complicated is the generation of code for \pcode{if}-statements, say
 
 \begin{lstlisting}[mathescape,language={},numbers=none]
 if $b$ then $cs_1$ else $cs_2$
 \end{lstlisting}
 
-\noindent where $b$ is a boolean expression and the $cs_{1/2}$
-are the statements for each \pcode{if}-branch. Lets assume
-we already generated code for $b$ and $cs_{1/2}$. Then in the
-true-case the control-flow of the program needs to be
+\noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the
+statements for each of the \pcode{if}-branches. Lets assume we already
+generated code for $b$ and $cs_{1/2}$. Then in the true-case the
+control-flow of the program needs to be
 
 \begin{center}
 \begin{tikzpicture}[node distance=2mm and 4mm,
@@ -419,7 +430,7 @@
 \noindent 
 The generated code is as follows:
 
-\begin{lstlisting}[mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,language={}]
    ldc 1
    ldc 1
    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
@@ -528,7 +539,7 @@
 
 \noindent yielding the following code
 
-\begin{lstlisting}[mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,language={}]
 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
    iload 0
    ldc 10
--- a/progs/compile.scala	Fri Oct 25 14:55:31 2019 +0100
+++ b/progs/compile.scala	Fri Oct 25 17:23:23 2019 +0100
@@ -120,7 +120,7 @@
 
 
 // environments 
-type Env = Map[String, String]
+type Env = Map[String, Int]
 
 // arithmetic expression compilation
 def compile_aexp(a: AExp, env : Env) : String = a match {
@@ -150,8 +150,7 @@
 def compile_stmt(s: Stmt, env: Env) : (String, Env) = s match {
   case Skip => ("", env)
   case Assign(x, a) => {
-    val index = if (env.isDefinedAt(x)) env(x) else 
-                    env.keys.size.toString
+    val index = if (env.isDefinedAt(x)) env(x) else env.keys.size
     (compile_aexp(a, env) ++ i"istore $index", env + (x -> index))
   } 
   case If(b, bl1, bl2) => {
@@ -180,8 +179,7 @@
     (i"iload ${env(x)}" ++ 
      i"invokestatic XXX/XXX/write(I)V", env)
   case Read(x) => {
-    val index = if (env.isDefinedAt(x)) env(x) else 
-                    env.keys.size.toString
+    val index = if (env.isDefinedAt(x)) env(x) else env.keys.size
     (i"invokestatic XXX/XXX/read()I" ++ 
      i"istore $index", env + (x -> index))
   }