Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex Thu Nov 14 13:50:39 2019 +0000
+++ b/handouts/ho01.tex Fri Nov 15 14:58:16 2019 +0000
@@ -43,8 +43,35 @@
\section*{Handout 1}
+The purpose of a compiler is to transform a program a human can read
+and write into code the machine can run as fast as
+possible. Developing a compiler is an old craft, going back to 1952
+with the first compiler written by Grace Hopper. Why study
+compilers nowadays? An interesting answer is given by John Regehr in
+his compiler
+blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
+
+\begin{quote}\it{}``We can start off with a couple of observations
+ about the role of compilers. First, hardware is getting weirder
+ rather than getting clocked faster: almost all processors are
+ multicores and it looks like there is increasing asymmetry in
+ resources across cores. Processors come with vector units, crypto
+ accelerators, bit twiddling instructions, and lots of features to
+ make virtualization and concurrency work. We have DSPs, GPUs,
+ big.little, and Xeon Phi. This is only scratching the
+ surface. Second, we’re getting tired of low-level languages and
+ their associated security disasters, we want to write new code, to
+ whatever extent possible, in safer, higher-level
+ languages. Compilers are caught right in the middle of these
+ opposing trends: one of their main jobs is to help bridge the large
+ and growing gap between increasingly high-level languages and
+ increasingly wacky platforms. It’s effectively a perpetual
+ employment act for solid compiler hackers.''
+\end{quote}
+
+
The first part of this module is about text processing, be it for
-web-crawlers, compilers, dictionaries, DNA-data, spam-filters and so on.
+compilers, dictionaries, DNA-data, spam-filters and so on.
When looking for a particular string, say \pcode{"foobar"}, in a large
text we can use the Knuth-Morris-Pratt algorithm, which is currently the
most efficient general string search algorithm. But often we do
Binary file handouts/ho07.pdf has changed
--- a/handouts/ho07.tex Thu Nov 14 13:50:39 2019 +0000
+++ b/handouts/ho07.tex Fri Nov 15 14:58:16 2019 +0000
@@ -11,41 +11,42 @@
\section*{Handout 7 (Compilation)}
+
+
The purpose of a compiler is to transform a program a human can read and
write into code the machine can run as fast as possible. The fastest
code would be machine code the CPU can run directly, but it is often
-good enough for improving the speed of a program to just target a
+good enough, for improving the speed of a program, to target a
virtual machine. This produces not the fastest possible code, but code
-that is pretty enough. This way of producing code has the advantage that
+that is often pretty fast. This way of producing code has the advantage that
the virtual machine takes care of things a compiler would normally need
-to take care of (like explicit memory management). An interesting answer
-for why studying compilers is given by John Regher in his compiler
-blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
-
-\begin{quote}\it{}``We can start off with a couple of observations
- about the role of compilers. First, hardware is getting weirder
- rather than getting clocked faster: almost all processors are
- multicores and it looks like there is increasing asymmetry in
- resources across cores. Processors come with vector units, crypto
- accelerators, bit twiddling instructions, and lots of features to
- make virtualization and concurrency work. We have DSPs, GPUs,
- big.little, and Xeon Phi. This is only scratching the
- surface. Second, we’re getting tired of low-level languages and
- their associated security disasters, we want to write new code, to
- whatever extent possible, in safer, higher-level
- languages. Compilers are caught right in the middle of these
- opposing trends: one of their main jobs is to help bridge the large
- and growing gap between increasingly high-level languages and
- increasingly wacky platforms. It’s effectively a perpetual
- employment act for solid compiler hackers.''
-\end{quote}
+to take care of (like explicit memory management).
As a first example in this module we will implement a compiler for the
very simple While-language. It will generate code for the Java Virtual
-Machine (JVM). This is a stack-based virtual machine, a fact which will
-make it easy to generate code for arithmetic expressions. For example
-when compiling $1 + 2$ we need to generate the following three
-instructions
+Machine (JVM). Unfortunately, the Java ecosystem does not come with an
+assembler, which would be handy for our compiler endeavour (unlike
+Microsoft's Common Language Infrastructure for the .Net platform, which
+comes with an assembler out-of-the-box). As a substitute we use in this
+module the third-party programs Jasmin and Krakatau:
+
+\begin{itemize}
+ \item \url{http://jasmin.sourceforge.net}
+ \item \url{https://github.com/Storyyeller/Krakatau}
+\end{itemize}
+
+\noindent
+The first is a Java program and the second a program written in Python.
+Each of them allows us to generate \emph{assembly} files that are still
+readable by humans, as opposed to class-files, which are pretty much just
+(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take
+an assembly file as input and generate the corresponding class file for
+us.
+
+What is good about the JVM is that it is a stack-based virtual
+machine, a fact which will make it easy to generate code for
+arithmetic expressions. For
+example when compiling the expression $1 + 2$ we need to generate the
+following three instructions
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc 1
@@ -113,11 +114,10 @@
\noindent If we ``run'' these instructions, the result $8$ will be on
top of the stack (I leave this to you to verify; the meaning of each
instruction should be clear). The result being on the top of the stack
-will be a convention we always observe in our compiler, that is the
-results of arithmetic expressions will always be on top of the stack.
-Note, that a different bracketing of the expression, for example $(1 +
-(2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus
-also a different list of instructions. Generating code in this
+will be an important convention we always observe in our compiler. Note
+that a different bracketing of the expression, for example $(1 + (2 *
+3)) + (4 - 3)$, produces a different abstract syntax tree and thus also
+a different list of instructions. Generating code in this
post-order-traversal fashion is rather easy to implement: it can be done
with the following recursive \textit{compile}-function, which takes the
abstract syntax tree as argument:
@@ -184,7 +184,12 @@
\noindent In the last line we generate the code for variables
where $E(x)$ stands for looking up the environment to which
-index the variable $x$ maps to.
+index the variable $x$ maps. This is similar to an interpreter,
+which also needs an environment: the difference is that the
+interpreter maintains a mapping from variables to current values (what is
+currently the value of a variable), while a compiler needs a mapping
+from variables to memory locations (where can I find the current
+value for the variable in memory).
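+
+As a small illustration (the type names below are not part of the
+actual compiler; they only serve to make the difference explicit):
+
+\begin{lstlisting}[language=Scala,numbers=none]
+// a compiler environment: variable name -> index of a JVM local variable
+type CompilerEnv    = Map[String, Int]
+
+// an interpreter environment: variable name -> current value
+type InterpreterEnv = Map[String, Int]
+
+// for example, x is stored in JVM local 0 and y in JVM local 1
+val E : CompilerEnv = Map("x" -> 0, "y" -> 1)
+\end{lstlisting}
+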
There is a similar \textit{compile}-function for boolean
expressions, but it includes a ``trick'' to do with
@@ -206,8 +211,9 @@
that assignments in the While-language might change the
environment---clearly if a variable is used for the first
time, we need to allocate a new index and if it has been used
-before, we need to be able to retrieve the associated index.
-This is reflected in the clause for compiling assignments:
+before, then we need to be able to retrieve the associated index.
+This is reflected in the clause for compiling assignments, say
+$\textit{x := a}$:
\begin{center}
\begin{tabular}{lcl}
@@ -228,7 +234,7 @@
of the variable $x$ in the program, then we have to augment
the environment and map $x$ to the largest index in $E$
plus one (that is $E' = E(x \mapsto largest\_index + 1)$).
-That means for the assignment $x := x + 1$ we generate the
+This means for the assignment $x := x + 1$ we generate the
following code
\begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
@@ -239,12 +245,12 @@
\end{lstlisting}
\noindent
-where $n_x$ is the index for the variable $x$. The code for
+where $n_x$ is the index (or pointer to the memory location) for the variable $x$. The code for
looking up the index for the variable is as follows:
\begin{center}
\begin{tabular}{lcl}
-$index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$
+$index \;=\; E\textit{.getOrElse}(x, |E|)$
\end{tabular}
\end{center}
@@ -254,7 +260,7 @@
environment (that will be an index that is guaranteed to be not used
yet).
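+
+Putting these pieces together, the assignment clause might look roughly
+as follows in Scala, in the style of the accompanying
+\pcode{compile.scala} (there the \texttt{i}-string-interpolator
+produces a single instruction line; the helper \textit{compile\_aexp},
+which generates the code for the right-hand side, is assumed):
+
+\begin{lstlisting}[language=Scala,numbers=none]
+// a sketch of the clause for compiling  x := a
+case Assign(x, a) => {
+  val index = env.getOrElse(x, env.keys.size)   // |E| if x is new
+  (compile_aexp(a, env) ++ i"istore $index", env + (x -> index))
+}
+\end{lstlisting}
+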
-More complicated is the generation of code for \pcode{if}-statements, say
+A bit more complicated is the generation of code for \pcode{if}-statements, say
\begin{lstlisting}[mathescape,language={},numbers=none]
if $b$ then $cs_1$ else $cs_2$
@@ -423,14 +429,14 @@
after-the-else-branch label. Consider for example the
if-statement:
-\begin{lstlisting}[mathescape,numbers=none,language={}]
+\begin{lstlisting}[mathescape,numbers=none,language=While]
if 1 = 1 then x := 2 else y := 3
\end{lstlisting}
\noindent
The generated code is as follows:
-\begin{lstlisting}[language=JVMIS,mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
ldc 1
ldc 1
if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
@@ -533,13 +539,13 @@
\noindent I let you go through how this clause works. As an example
you can consider the while-loop
-\begin{lstlisting}[mathescape,numbers=none,language={}]
+\begin{lstlisting}[mathescape,numbers=none,language=While]
while x <= 10 do x := x + 1
\end{lstlisting}
\noindent yielding the following code
-\begin{lstlisting}[language=JVMIS,mathescape,language={}]
+\begin{lstlisting}[language=JVMIS,mathescape,numbers=left]
L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
iload 0
ldc 10
@@ -559,6 +565,8 @@
-- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east);
\end{tikzpicture}
+\noindent
+I leave it to you to read the code and follow its control flow.
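+
+For comparison, the while-clause could be rendered in Scala roughly as
+below, again in the style of the accompanying \pcode{compile.scala}:
+the \texttt{i}- and \texttt{l}-string-interpolators produce an
+instruction and a label, while \textit{Fresh}, \textit{compile\_block}
+and \textit{compile\_bexp} are assumed helpers for generating fresh
+label names and for compiling the loop body and the test.
+
+\begin{lstlisting}[language=Scala,numbers=none]
+// a sketch of the clause for compiling  while b do bl
+case While(b, bl) => {
+  val loop_begin = Fresh("Loop_begin")
+  val loop_end   = Fresh("Loop_end")
+  val (instrs, env1) = compile_block(bl, env)
+  (l"$loop_begin" ++
+     compile_bexp(b, env, loop_end) ++   // jump to loop_end if b is false
+     instrs ++
+     i"goto $loop_begin" ++
+     l"$loop_end", env1)
+}
+\end{lstlisting}
+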
Next we need to consider the statement \pcode{write x}, which
can be used to print out the content of a variable. For this
@@ -570,7 +578,7 @@
follows.
-\begin{lstlisting}[language=JVMIS]
+\begin{lstlisting}[language=JVMIS,numbers=left]
.method public static write(I)V
.limit locals 1
.limit stack 2
@@ -613,7 +621,7 @@
\begin{figure}[t]
-\begin{lstlisting}[mathescape,language=JVMIS]
+\begin{lstlisting}[mathescape,language=JVMIS,numbers=left]
.class public XXX.XXX
.super java/lang/Object
@@ -674,6 +682,53 @@
can be run by the {\tt{}java}-program.\label{test}}
\end{figure}
+\subsection*{Arrays}
+
+A useful addition to the While-language might be arrays. This
+lets us generate more interesting While-programs by translating
+BF*** programs into equivalent While-programs. This means we
+want to support the following three constructions
+
+\begin{lstlisting}[mathescape,language=While]
+new arr[15000]
+x := 3 + arr[3 + y]
+arr[42 * n] := ...
+\end{lstlisting}
+
+\noindent
+The first one creates a new array with name \pcode{arr} that can
+hold a given number of integers. The second references an array
+inside an arithmetic expression. Essentially we have to be able to
+look up the contents of an array at an index. Similarly we need to be
+able to update the content of an array (third line). The latter two
+constructions state that the index into an array can be given by
+an arithmetic expression. For creating a new array we need to generate
+the following JVM instructions:
+
+\begin{lstlisting}[mathescape,language=JVMIS]
+ldc number
+newarray int
+astore loc_var
+\end{lstlisting}
+
+\noindent
+First we need to put the dimension of the array onto the
+stack, then the next instruction creates the array, and finally we
+need to store (a reference to) the array in a local variable, just
+like ``simple'' variables. For looking up an element in an array we
+can use the following code:
+
+\begin{lstlisting}[mathescape,language=JVMIS]
+aload loc_var
+index_aexp
+iaload
+\end{lstlisting}
+
+\noindent
+This first loads the ``pointer'' to the array onto the stack. Then come
+the instructions corresponding to the index at which we want to access
+the array; the idea is that they leave a concrete number, the index, on
+the stack. Finally, \pcode{iaload} loads the element at that index onto
+the stack.
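+
+For the update in the third construction above we can use the same
+pattern: first load the ``pointer'' to the array, then the instructions
+for the index, then the instructions for the new value, and finally an
+\pcode{iastore} instruction, which pops the array reference, the index
+and the value off the stack and stores the value at that index. Below
+is a rough Scala sketch of both array clauses, in the style of the
+accompanying \pcode{compile.scala}; the constructor names
+\textit{ArrayRef} and \textit{AssignArr}, as well as the assumption
+that the environment records the local variable holding the array, are
+only meant as illustrations.
+
+\begin{lstlisting}[language=Scala,numbers=none]
+// a clause of the compile-function for arithmetic expressions:
+// looking up  arr[index]
+case ArrayRef(arr, index) =>
+  i"aload ${env(arr)}" ++ compile_aexp(index, env) ++ i"iaload"
+
+// a clause of the compile-function for statements:
+// updating  arr[index] := value
+case AssignArr(arr, index, value) =>
+  (i"aload ${env(arr)}" ++ compile_aexp(index, env) ++
+     compile_aexp(value, env) ++ i"iastore", env)
+\end{lstlisting}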
+
\end{document}
%%% Local Variables:
--- a/langs.sty Thu Nov 14 13:50:39 2019 +0000
+++ b/langs.sty Fri Nov 15 14:58:16 2019 +0000
@@ -56,7 +56,7 @@
}[keywords,comments,strings]
\lstdefinelanguage{While}{
- morekeywords={if,then,else,while,do,true,false,write,upto,read,for,skip},
+ morekeywords={if,then,else,while,do,true,false,write,upto,read,for,skip,new},
morecomment=[l]{//},
morecomment=[n]{/*}{*/},
morestring=[b]",
--- a/progs/compile.scala Thu Nov 14 13:50:39 2019 +0000
+++ b/progs/compile.scala Fri Nov 15 14:58:16 2019 +0000
@@ -119,7 +119,7 @@
}
// this allows you to write things like
-// i"add" and l"Lable"
+// i"add" and l"Label"
// environments
@@ -205,7 +205,7 @@
// main compilation function for blocks
def compile(bl: Block, class_name: String) : String = {
val instructions = compile_block(bl, Map.empty)._1
- (beginning ++ instructions.mkString ++ ending).replaceAllLiterally("XXX", class_name)
+ (beginning ++ instructions ++ ending).replaceAllLiterally("XXX", class_name)
}
@@ -257,7 +257,7 @@
// Fibonacci numbers as a bare-bone test-case
val fib_test =
- List(Assign("n", Num(10)), // n := 10;
+ List(Assign("n", Num(9)), // n := 10;
Assign("minus1",Num(0)), // minus1 := 0;
Assign("minus2",Num(1)), // minus2 := 1;
Assign("temp",Num(0)), // temp := 0;