handouts/ho07.tex
changeset 692 8c7ccdebcb89
parent 691 991849dfbcb1
child 705 bfc8703b1527
equal deleted inserted replaced
691:991849dfbcb1 692:8c7ccdebcb89
    56 
    56 
    57 \noindent The first instruction loads the constant $1$ onto
    57 \noindent The first instruction loads the constant $1$ onto
    58 the stack, the next one loads $2$, the third instruction adds both
    58 the stack, the next one loads $2$, the third instruction adds both
    59 numbers together replacing the top two elements of the stack
    59 numbers together replacing the top two elements of the stack
    60 with the result $3$. For simplicity, we will throughout
    60 with the result $3$. For simplicity, we will throughout
    61 consider only integer numbers and results. Therefore we can
    61 consider only integer numbers. Therefore we can
    62 use the JVM instructions \code{iadd}, \code{isub},
    62 use the JVM instructions \code{iadd}, \code{isub},
    63 \code{imul}, \code{idiv} and so on. The \code{i} stands for
    63 \code{imul}, \code{idiv} and so on. The \code{i} stands for
    64 integer instructions in the JVM (alternatives are \code{d} for
    64 integer instructions in the JVM (alternatives are \code{d} for
    65 doubles, \code{l} for longs and \code{f} for floats).
    65 doubles, \code{l} for longs and \code{f} for floats).
    66 
    66 
   162 a kind of environment where variables are associated to
   162 a kind of environment where variables are associated to
   163 numbers. This association needs to be unique: if we muddle up
   163 numbers. This association needs to be unique: if we muddle up
   164 the numbers, then we essentially confuse variables and the
   164 the numbers, then we essentially confuse variables and the
   165 consequence will usually be an erroneous result. Our extended
   165 consequence will usually be an erroneous result. Our extended
   166 \textit{compile}-function for arithmetic expressions will
   166 \textit{compile}-function for arithmetic expressions will
   167 therefore take two arguments: the abstract syntax tree and the
   167 therefore take two arguments: the abstract syntax tree and an
   168 environment, $E$, that maps identifiers to index-numbers.
   168 environment, $E$, that maps identifiers to index-numbers.
   169 
   169 
   170 \begin{center}
   170 \begin{center}
   171 \begin{tabular}{lcl}
   171 \begin{tabular}{lcl}
   172 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
   172 $\textit{compile}(n, E)$ & $\dn$ & $\pcode{ldc}\;n$\\
   232 the index is and return the environment unchanged (that is in
   232 the index is and return the environment unchanged (that is in
   233 this case $E' = E$). However, if this is the first encounter 
   233 this case $E' = E$). However, if this is the first encounter 
   234 of the variable $x$ in the program, then we have to augment 
   234 of the variable $x$ in the program, then we have to augment 
   235 the environment and assign $x$ with the largest index in $E$
   235 the environment and assign $x$ with the largest index in $E$
   236 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   236 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   237 This means for the assignment $x := x + 1$ we generate the
   237 To sum up, for the assignment $x := x + 1$ we generate the
   238 following code
   238 following code
   239 
   239 
   240 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   240 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   241 iload $n_x$
   241 iload $n_x$
   242 ldc 1
   242 ldc 1
   243 iadd
   243 iadd
   244 istore $n_x$
   244 istore $n_x$
   245 \end{lstlisting}
   245 \end{lstlisting}
   246 
   246 
   247 \noindent 
   247 \noindent 
   248 where $n_x$ is the index (or pointer to the memory) for the variable $x$ . The code for 
   248 where $n_x$ is the index (or pointer to the memory) for the variable
   249 looking-up the index for the variable is as follow:
   249 $x$. The code for looking-up the index for the variable is as follow:
   250 
   250 
   251 \begin{center}
   251 \begin{center}
   252 \begin{tabular}{lcl}
   252 \begin{tabular}{lcl}
   253 $index \;=\; E\textit{.getOrElse}(x, |E|)$
   253 $index \;=\; E\textit{.getOrElse}(x, |E|)$
   254 \end{tabular}
   254 \end{tabular}
   256 
   256 
   257 \noindent
   257 \noindent
   258 In case the environment $E$ contains an index for $x$, we return it.
   258 In case the environment $E$ contains an index for $x$, we return it.
   259 Otherwise we ``create'' a new index by returning the size $|E|$ of the
   259 Otherwise we ``create'' a new index by returning the size $|E|$ of the
   260 environment (that will be an index that is guaranteed to be not used
   260 environment (that will be an index that is guaranteed to be not used
   261 yet).
   261 yet). In all this we take advantage of the JVM which provides us with 
   262 
   262 a potentially limitless supply of places where we can store values
   263 A bit more complicated is the generation of code for \pcode{if}-statements, say
   263 of variables.
       
   264 
       
   265 A bit more complicated is the generation of code for
       
   266 \pcode{if}-statements, say
   264 
   267 
   265 \begin{lstlisting}[mathescape,language={},numbers=none]
   268 \begin{lstlisting}[mathescape,language={},numbers=none]
   266 if $b$ then $cs_1$ else $cs_2$
   269 if $b$ then $cs_1$ else $cs_2$
   267 \end{lstlisting}
   270 \end{lstlisting}
   268 
   271 
   269 \noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the
   272 \noindent where $b$ is a boolean expression and where both $cs_{1/2}$
   270 statements for each of the \pcode{if}-branches. Lets assume we already
   273 are the statements for each of the \pcode{if}-branches. Lets assume we
   271 generated code for $b$ and $cs_{1/2}$. Then in the true-case the
   274 already generated code for $b$ and $cs_{1/2}$. Then in the true-case the
   272 control-flow of the program needs to be
   275 control-flow of the program needs to behave as 
   273 
   276 
   274 \begin{center}
   277 \begin{center}
   275 \begin{tikzpicture}[node distance=2mm and 4mm,
   278 \begin{tikzpicture}[node distance=2mm and 4mm,
   276  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   279  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   277  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   280  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   344   $instr_1$
   347   $instr_1$
   345   $instr_2$
   348   $instr_2$
   346     $\vdots$
   349     $\vdots$
   347 \end{lstlisting}
   350 \end{lstlisting}
   348  
   351  
   349 \noindent where a label is indicated by a colon. 
   352 \noindent where a label is indicated by a colon. The task of the
       
   353 assmbler (in our case Jasmin or Krakatau) is to resolve the labels
       
   354 to actual addresses, for example jump 10 instructions forward,
       
   355 or 20 instructions backwards.
   350  
   356  
   351 Recall the ``trick'' with compiling boolean expressions: the 
   357 Recall the ``trick'' with compiling boolean expressions: the 
   352 \textit{compile}-function for boolean expressions takes three
   358 \textit{compile}-function for boolean expressions takes three
   353 arguments: an abstract syntax tree, an environment for 
   359 arguments: an abstract syntax tree, an environment for 
   354 variable indices and also the label, $lab$, to where an conditional 
   360 variable indices and also the label, $lab$, to where an conditional 
   370 with the next instructions (see control-flow of ifs above).
   376 with the next instructions (see control-flow of ifs above).
   371 However if they are \emph{not} equal, then we need to
   377 However if they are \emph{not} equal, then we need to
   372 (conditionally) jump to the label $lab$. This can be done with
   378 (conditionally) jump to the label $lab$. This can be done with
   373 the instruction
   379 the instruction
   374 
   380 
   375 \begin{lstlisting}[mathescape,numbers=none]
   381 \begin{lstlisting}[mathescape,numbers=none,language=JVMIS]
   376 if_icmpne $lab$
   382 if_icmpne $lab$
   377 \end{lstlisting}
   383 \end{lstlisting}
   378 
   384 
   379 \noindent Other jump instructions for boolean operators are
   385 \noindent Other jump instructions for boolean operators are
   380 
   386 
   385 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
   391 $\le$ & $\Rightarrow$ & \pcode{if_icmpgt}\\
   386 \end{tabular}
   392 \end{tabular}
   387 \end{center}
   393 \end{center}
   388 
   394 
   389 \noindent and so on. I leave it to you to extend the
   395 \noindent and so on. I leave it to you to extend the
   390 \textit{compile}-function for the other boolean expressions.
   396 \textit{compile}-function for the other boolean expressions. Note that
   391 Note that we need to jump whenever the boolean is \emph{not}
   397 we need to jump whenever the boolean is \emph{not} true, which means we
   392 true, which means we have to ``negate'' the jump
   398 have to ``negate'' the jump condition---equals becomes not-equal, less
   393 condition---equals becomes not-equal, less becomes
   399 becomes greater-or-equal. If you do not like this design (it can be the
   394 greater-or-equal. If you do not like this design (it can be
   400 source of some nasty, hard-to-detect errors), you can also change the
   395 the source of some nasty, hard-to-detect errors), you can also
   401 layout of the code and first give the code for the else-branch and then
   396 change the layout of the code and first give the code for the
   402 for the if-branch. However in the case of while-loops this
   397 else-branch and then for the if-branch. However in the case
   403 ``upside-down-inside-out'' way of generating code still seems the most
   398 of while-loops this way of generating code still seems
   404 convenient.
   399 the most convenient.
       
   400 
   405 
   401 We are now ready to give the compile function for 
   406 We are now ready to give the compile function for 
   402 if-statements---remember this function returns for statements a 
   407 if-statements---remember this function returns for statements a 
   403 pair consisting of the code and an environment:
   408 pair consisting of the code and an environment:
   404 
   409 
   667 \begin{lstlisting}[mathescape,language=While]
   672 \begin{lstlisting}[mathescape,language=While]
   668 x := 1 + 2;
   673 x := 1 + 2;
   669 write x
   674 write x
   670 \end{lstlisting}
   675 \end{lstlisting}
   671 
   676 
   672 \noindent Having this code at our disposal, we need the
   677 \noindent I let you read the code and make sure the code behaves as
   673 assembler to translate the generated code into JVM bytecode (a
   678 expected. Having this code at our disposal, we need the assembler to
   674 class file). This bytecode is understood by the JVM and can be
   679 translate the generated code into JVM bytecode (a class file). This
   675 run by just invoking the \pcode{java}-program.
   680 bytecode is then understood by the JVM and can be run by just invoking
       
   681 the \pcode{java}-program.
   676 
   682 
   677 
   683 
   678 \begin{figure}[p]
   684 \begin{figure}[p]
   679 \lstinputlisting[language=JVMIS]{../progs/test-small.j}
   685 \lstinputlisting[language=JVMIS]{../progs/test-small.j}
   680 \caption{Generated code for a test program. This code can be 
   686 \caption{Generated code for a test program. This code can be 
   698 \noindent
   704 \noindent
   699 The first construct is for creating new arrays, in this instance the
   705 The first construct is for creating new arrays, in this instance the
   700 name of the array is \pcode{arr} and it can hold 15000 integers. The
   706 name of the array is \pcode{arr} and it can hold 15000 integers. The
   701 second is for referencing an array cell inside an arithmetic
   707 second is for referencing an array cell inside an arithmetic
   702 expression---we need to be able to look up the contents of an array at
   708 expression---we need to be able to look up the contents of an array at
   703 an index determined by an arithmetic expression. Similarly, we need to
   709 an index determined by an arithmetic expression. Similarly in the line
   704 be able to update the content of an array at an calculated index---the
   710 below, we need to be able to update the content of an array at an
   705 3rd feature.  
   711 calculated index.  
   706 
   712 
   707 For creating a new array we can generate the following three JVM
   713 For creating a new array we can generate the following three JVM
   708 instructions:
   714 instructions:
   709 
   715 
   710 \begin{lstlisting}[mathescape,language=JVMIS]
   716 \begin{lstlisting}[mathescape,language=JVMIS]
   715 
   721 
   716 \noindent
   722 \noindent
   717 First we need to put the dimension of the array onto the stack. The next
   723 First we need to put the dimension of the array onto the stack. The next
   718 instruction creates the array. With the last we can store the array as a
   724 instruction creates the array. With the last we can store the array as a
   719 local variable (like the ``simple'' variables from the previous
   725 local variable (like the ``simple'' variables from the previous
   720 section). The use of a local variable allows us to have multiple arrays
   726 section). The use of a local variable for each array allows us to have
   721 in a While-program. For looking up an element in an array we can use the
   727 multiple arrays in a While-program. For looking up an element in an
   722 following JVM code
   728 array we can use the following JVM code
   723 
   729 
   724 \begin{lstlisting}[mathescape,language=JVMIS]
   730 \begin{lstlisting}[mathescape,language=JVMIS]
   725 aload loc_var 
   731 aload loc_var 
   726 index_aexp 
   732 index_aexp 
   727 iaload
   733 iaload
   744 \end{lstlisting}
   750 \end{lstlisting}
   745 
   751 
   746 \noindent
   752 \noindent
   747 Again the first instruction loads the ``pointer'' to the array onto the
   753 Again the first instruction loads the ``pointer'' to the array onto the
   748 stack. Then we have some instructions corresponding to the index where
   754 stack. Then we have some instructions corresponding to the index where
   749 we want to update the array. After that come the instructions for with what
   755 we want to update the array. After that come the instructions for with
   750 value we want to update the array. Finally the instruction for 
   756 what value we want to update the array. The last line contains the
   751 updating the array.
   757 instruction for updating the array.
   752 
   758 
   753 Next we need to update our grammar rules: it seems best to extend the 
   759 Next we need to modify our grammar rules for our While-language: it
   754 rule for factors in arithmetic expressions with a rule for looking up
   760 seems best to extend the rule for factors in arithmetic expressions with
   755 an array.
   761 a rule for looking up an array.
   756 
   762 
   757 \begin{plstx}[rhs style=, margin=3cm]
   763 \begin{plstx}[rhs style=, margin=3cm]
   758 : \meta{E} ::= \meta{T} $+$ \meta{E}
   764 : \meta{E} ::= \meta{T} $+$ \meta{E}
   759          | \meta{T} $-$ \meta{E}
   765          | \meta{T} $-$ \meta{E}
   760          | \meta{T}\\
   766          | \meta{T}\\
   767           | \meta{Num}\\
   773           | \meta{Num}\\
   768 \end{plstx}
   774 \end{plstx}
   769 
   775 
   770 \noindent
   776 \noindent
   771 There is no problem with left-recursion as the \meta{E} is ``protected''
   777 There is no problem with left-recursion as the \meta{E} is ``protected''
   772 by an identifier and brackets. There are two new rules in statements,
   778 by an identifier and the brackets. There are two new rules for statements,
   773 namely creation of an array and array assignment:
   779 one for creating an array and one for array assignment:
   774 
   780 
   775 \begin{plstx}[rhs style=, margin=2cm, one per line]
   781 \begin{plstx}[rhs style=, margin=2cm, one per line]
   776 : \meta{Stmt} ::=  \ldots
   782 : \meta{Stmt} ::=  \ldots
   777               | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] 
   783               | \texttt{new}\; \meta{Id}\,[\,\meta{Num}\,] 
   778               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   784               | \meta{Id}\,[\,\meta{E}\,]\,:=\,\meta{E}\\
   779 \end{plstx}
   785 \end{plstx}
   780 
   786 
       
   787 With this in place we can turn back to the idea of creating While
       
   788 programs by translating BF programs. This is a relatively easy task
       
   789 because BF only has eight instructions (we will actually only have seven
       
   790 because we can omit the read-in instruction from BF). But also translating
       
   791 BF-loops is going to be easy since they straightforwardly can be 
       
   792 represented by While-loops. The Scala code for the translation is
       
   793 as follows:
       
   794 
       
   795 \begin{lstlisting}[language=Scala,numbers=left]
       
   796 def instr(c: Char) : String = c match {
       
   797   case '>' => "ptr := ptr + 1;"
       
   798   case '<' => "ptr := ptr - 1;"
       
   799   case '+' => "field[ptr] := field[ptr] + 1;"
       
   800   case '-' => "field[ptr] := field[ptr] - 1;"
       
   801   case '.' => "x := field[ptr]; write x;"
       
   802   case '['  => "while (field[ptr] != 0) do {"
       
   803   case ']'  => "skip};"
       
   804   case _ => ""
       
   805 }
       
   806 \end{lstlisting}
       
   807 
       
   808 \noindent 
       
   809 The idea behind the translation is that BF-programs operate on an array,
       
   810 called \texttt{field}. The BP-memory pointer into this array is
       
   811 represented as the variable \texttt{ptr}. The BF-instructions \code{>}
       
   812 and \code{<} increase, respectively decrease, \texttt{ptr}. The
       
   813 instructions \code{+} and \code{-} update a cell in \texttt{field}. In
       
   814 Line 6 we need to first assign a field-cell to an auxiliary variable
       
   815 since we have not changed our write functions in order to cope with
       
   816 writing out any array-content directly. Lines 7 and 8 are for
       
   817 translating BF-loops. Line 8 is interesting in the sense that we need to
       
   818 generate a \code{skip} instruction just before finishing with the 
       
   819 closing \code{"\}"}. The reason is that we are rather pedantic about
       
   820 semicolons in our While-grammar: the last command cannot have a
       
   821 semicolon---adding a \code{skip} works around this snag. Putting
       
   822 all this together is we can generate While-programs with more than
       
   823 400 instructions and then run the compiled JVM code for such programs.
       
   824 
       
   825 
   781 \end{document}
   826 \end{document}
   782 
   827 
   783 %%% Local Variables: 
   828 %%% Local Variables: 
   784 %%% mode: latex  
   829 %%% mode: latex  
   785 %%% TeX-master: t
   830 %%% TeX-master: t