# HG changeset patch # User Christian Urban # Date 1573829896 0 # Node ID 8d57433c7b5ee61ce8596f427d938964e13a27a3 # Parent d7c9ef381437c9e5bb5592ce10ffa8df1094e6ec updated diff -r d7c9ef381437 -r 8d57433c7b5e handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r d7c9ef381437 -r 8d57433c7b5e handouts/ho01.tex --- a/handouts/ho01.tex Thu Nov 14 13:50:39 2019 +0000 +++ b/handouts/ho01.tex Fri Nov 15 14:58:16 2019 +0000 @@ -43,8 +43,35 @@ \section*{Handout 1} +The purpose of a compiler is to transform a program a human can read +and write into code the machine can run as fast as +possible. Developping a compiler is an old craft going back to 1952 +with the first compiler written by Grace Hopper. Why studiying +compilers nowadays? An interesting answer is given by John Regher in +his compiler +blog:\footnote{\url{http://blog.regehr.org/archives/1419}} + +\begin{quote}\it{}``We can start off with a couple of observations + about the role of compilers. First, hardware is getting weirder + rather than getting clocked faster: almost all processors are + multicores and it looks like there is increasing asymmetry in + resources across cores. Processors come with vector units, crypto + accelerators, bit twiddling instructions, and lots of features to + make virtualization and concurrency work. We have DSPs, GPUs, + big.little, and Xeon Phi. This is only scratching the + surface. Second, we’re getting tired of low-level languages and + their associated security disasters, we want to write new code, to + whatever extent possible, in safer, higher-level + languages. Compilers are caught right in the middle of these + opposing trends: one of their main jobs is to help bridge the large + and growing gap between increasingly high-level languages and + increasingly wacky platforms. It’s effectively a perpetual + employment act for solid compiler hackers.'' +\end{quote} + + The first part of this module is about text processing, be it for -web-crawlers, compilers, dictionaries, DNA-data, spam-filters and so on. +compilers, dictionaries, DNA-data, spam-filters and so on. When looking for a particular string, say \pcode{"foobar"}, in a large text we can use the Knuth-Morris-Pratt algorithm, which is currently the most efficient general string search algorithm. But often we do diff -r d7c9ef381437 -r 8d57433c7b5e handouts/ho07.pdf Binary file handouts/ho07.pdf has changed diff -r d7c9ef381437 -r 8d57433c7b5e handouts/ho07.tex --- a/handouts/ho07.tex Thu Nov 14 13:50:39 2019 +0000 +++ b/handouts/ho07.tex Fri Nov 15 14:58:16 2019 +0000 @@ -11,41 +11,42 @@ \section*{Handout 7 (Compilation)} + + The purpose of a compiler is to transform a program a human can read and write into code the machine can run as fast as possible. The fastest code would be machine code the CPU can run directly, but it is often -good enough for improving the speed of a program to just target a +good enough for improving the speed of a program to target a virtual machine. This produces not the fastest possible code, but code -that is pretty enough. This way of producing code has the advantage that +that is often pretty fast. This way of producing code has the advantage that the virtual machine takes care of things a compiler would normally need -to take care of (like explicit memory management). An interesting answer -for why studying compilers is given by John Regher in his compiler -blog:\footnote{\url{http://blog.regehr.org/archives/1419}} - -\begin{quote}\it{}``We can start off with a couple of observations - about the role of compilers. First, hardware is getting weirder - rather than getting clocked faster: almost all processors are - multicores and it looks like there is increasing asymmetry in - resources across cores. Processors come with vector units, crypto - accelerators, bit twiddling instructions, and lots of features to - make virtualization and concurrency work. We have DSPs, GPUs, - big.little, and Xeon Phi. This is only scratching the - surface. Second, we’re getting tired of low-level languages and - their associated security disasters, we want to write new code, to - whatever extent possible, in safer, higher-level - languages. Compilers are caught right in the middle of these - opposing trends: one of their main jobs is to help bridge the large - and growing gap between increasingly high-level languages and - increasingly wacky platforms. It’s effectively a perpetual - employment act for solid compiler hackers.'' -\end{quote} +to take care of (like explicit memory management). As a first example in this module we will implement a compiler for the very simple While-language. It will generate code for the Java Virtual -Machine (JVM). This is a stack-based virtual machine, a fact which will -make it easy to generate code for arithmetic expressions. For example -when compiling $1 + 2$ we need to generate the following three -instructions +Machine (JVM). Unfortunately the Java ecosystem does not come with an +assembler which would be handy for our compiler-endeavour (unlike +Microsoft's Common Language Infrastructure for the .Net platform which +has an assembler out-of-the-box). As a substitute we use in this module +the 3rd-party programs Jasmin and Krakatau + +\begin{itemize} + \item \url{http://jasmin.sourceforge.net} + \item \url{https://github.com/Storyyeller/Krakatau} +\end{itemize} + +\noindent +The first is a Java program and the second a program written in Python. +Each of them allow us to generate \emph{assembly} files that are still +readable by humans, as opposed to class-files which are pretty much just +(horrible) zeros and ones. Jasmin (respectively Krakatau) will then take +an assembly file as input and generate the corresponding class file for +us. + +Good about the JVM is that it is a stack-based virtual machine, a fact +which will make it easy to generate code for arithmetic expressions. For +example when compiling the expression $1 + 2$ we need to generate the +following three instructions \begin{lstlisting}[language=JVMIS,numbers=none] ldc 1 @@ -113,11 +114,10 @@ \noindent If we ``run'' these instructions, the result $8$ will be on top of the stack (I leave this to you to verify; the meaning of each instruction should be clear). The result being on the top of the stack -will be a convention we always observe in our compiler, that is the -results of arithmetic expressions will always be on top of the stack. -Note, that a different bracketing of the expression, for example $(1 + -(2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus -also a different list of instructions. Generating code in this +will be an important convention we always observe in our compiler. Note, +that a different bracketing of the expression, for example $(1 + (2 * +3)) + (4 - 3)$, produces a different abstract syntax tree and thus also +a different list of instructions. Generating code in this post-order-traversal fashion is rather easy to implement: it can be done with the following recursive \textit{compile}-function, which takes the abstract syntax tree as argument: @@ -184,7 +184,12 @@ \noindent In the last line we generate the code for variables where $E(x)$ stands for looking up the environment to which -index the variable $x$ maps to. +index the variable $x$ maps to. This is similar to an interpreter, +which also needs an environment: the difference is that the +interpreter maintains a mapping from variables to current values (what is the +currently the value of a variable), while compilers need a mapping +from variables to memory locations (where can I find the current +value for the variable in memory). There is a similar \textit{compile}-function for boolean expressions, but it includes a ``trick'' to do with @@ -206,8 +211,9 @@ that assignments in the While-language might change the environment---clearly if a variable is used for the first time, we need to allocate a new index and if it has been used -before, we need to be able to retrieve the associated index. -This is reflected in the clause for compiling assignments: +before, then we need to be able to retrieve the associated index. +This is reflected in the clause for compiling assignments, say +$\textit{x := a}$: \begin{center} \begin{tabular}{lcl} @@ -228,7 +234,7 @@ of the variable $x$ in the program, then we have to augment the environment and assign $x$ with the largest index in $E$ plus one (that is $E' = E(x \mapsto largest\_index + 1)$). -That means for the assignment $x := x + 1$ we generate the +This means for the assignment $x := x + 1$ we generate the following code \begin{lstlisting}[language=JVMIS,mathescape,numbers=none] @@ -239,12 +245,12 @@ \end{lstlisting} \noindent -where $n_x$ is the index for the variable $x$. The code for +where $n_x$ is the index (or pointer to the memory) for the variable $x$ . The code for looking-up the index for the variable is as follow: \begin{center} \begin{tabular}{lcl} -$index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$ +$index \;=\; E\textit{.getOrElse}(x, |E|)$ \end{tabular} \end{center} @@ -254,7 +260,7 @@ environment (that will be an index that is guaranteed to be not used yet). -More complicated is the generation of code for \pcode{if}-statements, say +A bit more complicated is the generation of code for \pcode{if}-statements, say \begin{lstlisting}[mathescape,language={},numbers=none] if $b$ then $cs_1$ else $cs_2$ @@ -423,14 +429,14 @@ after-the-else-branch label. Consider for example the if-statement: -\begin{lstlisting}[mathescape,numbers=none,language={}] +\begin{lstlisting}[mathescape,numbers=none,language=While] if 1 = 1 then x := 2 else y := 3 \end{lstlisting} \noindent The generated code is as follows: -\begin{lstlisting}[language=JVMIS,mathescape,language={}] +\begin{lstlisting}[language=JVMIS,mathescape,numbers=left] ldc 1 ldc 1 if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$ @@ -533,13 +539,13 @@ \noindent I let you go through how this clause works. As an example you can consider the while-loop -\begin{lstlisting}[mathescape,numbers=none,language={}] +\begin{lstlisting}[mathescape,numbers=none,language=While] while x <= 10 do x := x + 1 \end{lstlisting} \noindent yielding the following code -\begin{lstlisting}[language=JVMIS,mathescape,language={}] +\begin{lstlisting}[language=JVMIS,mathescape,numbers=left] L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$ iload 0 ldc 10 @@ -559,6 +565,8 @@ -- ++(0mm,-17.3mm) |- (\tikztotarget)},line width=1mm] (LD.east); \end{tikzpicture} +\noindent +I leave it to you to read the code and follow its controlflow. Next we need to consider the statement \pcode{write x}, which can be used to print out the content of a variable. For this @@ -570,7 +578,7 @@ follows. -\begin{lstlisting}[language=JVMIS] +\begin{lstlisting}[language=JVMIS,numbers=left] .method public static write(I)V .limit locals 1 .limit stack 2 @@ -613,7 +621,7 @@ \begin{figure}[t] -\begin{lstlisting}[mathescape,language=JVMIS] +\begin{lstlisting}[mathescape,language=JVMIS,numbers=left] .class public XXX.XXX .super java/lang/Object @@ -674,6 +682,53 @@ can be run by the {\tt{}java}-program.\label{test}} \end{figure} +\subsection*{Arrays} + +Maybe a useful addition to the While-language are arrays. This +lets us generate more interesting While-programs by translating +BF*** programs into equivalent While-programs. This means we +want to support the following three constructions + +\begin{lstlisting}[mathescape,language=While] +new arr[15000] +x := 3 + arr[3 + y] +arr[42 * n] := ... +\end{lstlisting} + +\noindent +The first one creates a new array with name \pcode{arr} that can +hold a given number of integers. The second is referencing an array +inside a arithmetic expression. Essentially we have to be able to +look up the contents of an array at an index. Similarly we need to be +able to update the content of an array (3rd line). The latter two +constructions state that the index to an array can be given by +an arithmetic expression. For creating a new array we need to generate +the following JVM instructions: + +\begin{lstlisting}[mathescape,language=JVMIS] +ldc number +newarray int +astore loc_var +\end{lstlisting} + +\noindent +First we need to put the dimension of the array onto the +stack, then next instruction creates the array and last we +need to store the array in a local variable (like ``simple'' variables). +For looking up an element in an array we can use the following code + +\begin{lstlisting}[mathescape,language=JVMIS] +aload loc_var +index_aexp +iaload +\end{lstlisting} + +\noindent +This first loads the ``pointer'' to the array onto the stack. Then we have the +instructions corresponding to the index where we want to look up the array. The +idea is that these instructions will leave a concrete number on the stack, which is +the index. Finally we just need to load the corresponding element onto the stack. + \end{document} %%% Local Variables: diff -r d7c9ef381437 -r 8d57433c7b5e langs.sty --- a/langs.sty Thu Nov 14 13:50:39 2019 +0000 +++ b/langs.sty Fri Nov 15 14:58:16 2019 +0000 @@ -56,7 +56,7 @@ }[keywords,comments,strings] \lstdefinelanguage{While}{ - morekeywords={if,then,else,while,do,true,false,write,upto,read,for,skip}, + morekeywords={if,then,else,while,do,true,false,write,upto,read,for,skip,new}, morecomment=[l]{//}, morecomment=[n]{/*}{*/}, morestring=[b]", diff -r d7c9ef381437 -r 8d57433c7b5e progs/compile.scala --- a/progs/compile.scala Thu Nov 14 13:50:39 2019 +0000 +++ b/progs/compile.scala Fri Nov 15 14:58:16 2019 +0000 @@ -119,7 +119,7 @@ } // this allows you to write things like -// i"add" and l"Lable" +// i"add" and l"Label" // environments @@ -205,7 +205,7 @@ // main compilation function for blocks def compile(bl: Block, class_name: String) : String = { val instructions = compile_block(bl, Map.empty)._1 - (beginning ++ instructions.mkString ++ ending).replaceAllLiterally("XXX", class_name) + (beginning ++ instructions ++ ending).replaceAllLiterally("XXX", class_name) } @@ -257,7 +257,7 @@ // Fibonacci numbers as a bare-bone test-case val fib_test = - List(Assign("n", Num(10)), // n := 10; + List(Assign("n", Num(9)), // n := 10; Assign("minus1",Num(0)), // minus1 := 0; Assign("minus2",Num(1)), // minus2 := 1; Assign("temp",Num(0)), // temp := 0;