cws/cw05.tex
changeset 337 c0d9e6548b08
parent 336 25d9c3b2bc99
equal deleted inserted replaced
336:25d9c3b2bc99 337:c0d9e6548b08
    21 \mbox{}\hfill\textit{ I would pick implicits.''}\smallskip\\
    21 \mbox{}\hfill\textit{ I would pick implicits.''}\smallskip\\
    22 \mbox{}\hfill\textit{ --- Martin Odersky (creator of the Scala language)}\bigskip\bigskip
    22 \mbox{}\hfill\textit{ --- Martin Odersky (creator of the Scala language)}\bigskip\bigskip
    23 
    23 
    24 
    24 
    25 \noindent
    25 \noindent
    26 This part is about a small (esotheric) programming language called
    26 This part is about a small (esoteric) programming language called
    27 brainf***. Actually, we will implement an interpreter for our own version
    27 brainf***. Actually, we will implement an interpreter for our own version
    28 of this language called brainf*ck++.\bigskip
    28 of this language called brainf*ck++.\bigskip
    29 
    29 
    30 \IMPORTANT{This part is worth 10\% and you need to submit on \cwTEN{} at 4pm.}
    30 \IMPORTANT{This part is worth 10\% and you need to submit it on \cwTEN{} at 4pm.}
    31 
    31 
    32 \noindent
    32 \noindent
    33 Also note that the running time of each part will be restricted to a
    33 Also note that the running time of each part will be restricted to a
    34 maximum of 30 seconds on my laptop.
    34 maximum of 30 seconds on my laptop.
    35 
    35 
   128 and \texttt{']'} from the original, and in addition the commands
   128 and \texttt{']'} from the original, and in addition the commands
   129 \texttt{'@'}, \texttt{'*'} and \texttt{'\#'}.  Every other character
   129 \texttt{'@'}, \texttt{'*'} and \texttt{'\#'}.  Every other character
   130 is considered a comment.
   130 is considered a comment.
   131 
   131 
   132 Our interpreter for bf++ operates on memory cells containing
   132 Our interpreter for bf++ operates on memory cells containing
   133 integers. For this it uses a single memory pointer that points at each
   133 integers. For this it uses a single memory pointer, called
   134 stage to one memory cell. This pointer can be moved forward by one
   134 \texttt{mp}, that points at each stage to one memory cell.
   135 memory cell by using the command \texttt{'>'}, and backward by using
   135 
   136 \texttt{'<'}. The commands \texttt{'+'} and \texttt{'-'} increase,
   136 \begin{center}
   137 respectively decrease, by 1 the content of the memory cell to which
   137 \begin{tikzpicture}
   138 the memory pointer currently points to. The command for output is
   138   \draw [line width=1mm, rounded corners] (0,0) rectangle (5, 0.5);
   139 \texttt{'.'} whereby output works by reading the content of the memory
   139   \draw (0.5, 0) -- (0.5, 0.5);
   140 cell to which the memory pointer points to and printing it out as an
   140   \draw (1.0, 0) -- (1.0, 0.5);
   141 ASCII character.\footnote{In the originial version of bf, there is also a
   141 
   142   command for input, but we omit it here. All our programs will be
   142   \draw (2.5, 0) -- (2.5, 0.5);
   143   ``autonomous''.}  The
   143   \draw (2.0, 0) -- (2.0, 0.5);
       
   144 
       
   145   \draw (4.5, 0) -- (4.5, 0.5);
       
   146   \draw (4.0, 0) -- (4.0, 0.5);
       
   147 
       
   148   \draw (1.5,0.25) node {$\cdots$};
       
   149   \draw (3.0,0.25) node {$\cdots$};
       
   150 
       
   151   \draw [->, thick] (2.25, -0.5) -- (2.25, -0.15);
       
   152   \draw (2.25,-0.8) node {\texttt{mp}};
       
   153 
       
   154   \draw (0.7,0.7) node {\sf\footnotesize memory};
       
   155 \end{tikzpicture}    
       
   156 \end{center}  
       
   157 
       
   158 \noindent
       
   159 This pointer can be moved forward by one memory cell by using the
       
   160 command \texttt{'>'}, and backward by using \texttt{'<'}. The commands
       
   161 \texttt{'+'} and \texttt{'-'} increase, respectively decrease, by 1
       
   162 the content of the memory cell to which the memory pointer currently
       
   163 points to. The command for output in bf++ is \texttt{'.'} whereby output works
       
   164 by reading the content of the memory cell to which the memory pointer
       
   165 points to and printing it out as an ASCII character.\footnote{In the
       
   166   original version of bf, there is also a command for input, but we
       
   167   omit it here. All our programs will be ``autonomous''.}  The
   144 commands \texttt{'['} and \texttt{']'} are looping
   168 commands \texttt{'['} and \texttt{']'} are looping
   145 constructs. Everything in between \texttt{'['} and \texttt{']'} is
   169 constructs. Everything in between \texttt{'['} and \texttt{']'} is
   146 repeated until a counter (memory cell) reaches zero.  A typical
   170 repeated until a counter (memory cell) reaches zero.  A typical
   147 program in brainf*** looks as follows:
   171 program in brainf*** looks as follows:
   148 
   172 
   153 \end{verbatim}
   177 \end{verbatim}
   154 \end{center}  
   178 \end{center}  
   155 
   179 
   156 \noindent
   180 \noindent
   157 This one prints out Hello World\ldots{}obviously \texttt{;o)} We also
   181 This one prints out Hello World\ldots{}obviously \texttt{;o)} We also
   158 add 3 new commands to the bf++ langauge, whose purpose we explain later.
   182 add 3 new commands in the bf++-version of the bf-language. The purpose
       
   183 of these commands we explain later.
   159 
   184 
   160 
   185 
   161 \subsubsection*{Tasks (file bf.scala)}
   186 \subsubsection*{Tasks (file bf.scala)}
   162 
   187 
   163 \begin{itemize}
   188 \begin{itemize}
   164 \item[(1)]  Write a function that takes a filename (a string) as an argument
   189 \item[(1)]  Write a function that takes a filename (a string) as an argument
   165   and requests the corresponding file from disk. It returns the
   190   and requests the corresponding file from disk. It returns the
   166   content of the file as a string. If the file does not exists,
   191   content of the file as a string. If the file does not exists,
   167   the function should return the empty string.\\
   192   the function should return the empty string.
   168   \mbox{}\hfill[1 Mark]
   193   \mbox{}\hfill[1 Mark]
   169   
   194   
   170 \item[(2)] Brainf**k++ memory is represented by a \texttt{Map} from
   195 \item[(2)] Brainf**k++ memory is represented by a \texttt{Map} from
   171   integers to integers. The empty memory is represented by
   196   integers to integers. The empty memory is represented by
   172   \texttt{Map()}, that is nothing is stored in the
   197   \texttt{Map()}, that is nothing is stored in the
   276   Sierpinski triangle or the Hello world programs (they seem to be particularly
   301   Sierpinski triangle or the Hello world programs (they seem to be particularly
   277   useful for debugging purposes), or have a look at
   302   useful for debugging purposes), or have a look at
   278 
   303 
   279   \begin{center}
   304   \begin{center}
   280   \url{https://esolangs.org/wiki/Brainfuck}
   305   \url{https://esolangs.org/wiki/Brainfuck}
   281   \end{center}\hfill[2 Marks]
   306   \end{center}
       
   307 
       
   308   \noindent for more bf/bf++-programs and the test cases given in \texttt{bf.scala}.\\
       
   309   \mbox{}\hfill[2 Marks]
   282   
   310   
   283   \begin{figure}[p]
   311   \begin{figure}[p]
   284   \begin{center}
   312   \begin{center}
   285     \begin{tabular}{|@{\hspace{0.5mm}}p{0.8cm}|l|}
   313     \begin{tabular}{|@{\hspace{0.5mm}}p{0.8cm}|l|}
   286       \hline
   314       \hline
   330                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
   358                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
   331                        \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
   359                        \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
   332                        $\bullet$ & $\texttt{pc} + 1$\\
   360                        $\bullet$ & $\texttt{pc} + 1$\\
   333                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
   361                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
   334                            \end{tabular}\\\hline
   362                            \end{tabular}\\\hline
       
   363       \hfill\texttt{'*'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
       
   364                        $\bullet$ & $\texttt{pc} + 1$\\
       
   365                        $\bullet$ & $\texttt{mp}$ unchanged\\
       
   366                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) * mem(mp - 1)}\\
       
   367                              \multicolumn{2}{@{}l}{this multiplies the content of the memory cells at
       
   368                              \texttt{mp} and \texttt{mp - 1}}\\
       
   369                              \multicolumn{2}{@{}l}{and stores the result at \texttt{mp}}
       
   370                            \end{tabular}\\\hline
   335       \hfill\texttt{'@'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   371       \hfill\texttt{'@'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   336                        $\bullet$ & $\texttt{pc} + 1$\\
   372                        $\bullet$ & $\texttt{pc} + 1$\\
   337                        $\bullet$ & $\texttt{mp}$ unchanged\\
   373                        $\bullet$ & $\texttt{mp}$ unchanged\\
   338                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
   374                              $\bullet$ & \texttt{mem} updated with
   339                        \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
   375                                          \texttt{mem(mp) -> mem(mp - 1)}\\
   340                            \end{tabular}\\\hline
   376                              \multicolumn{2}{@{}l}{this updates the memory cell having the index stored at \texttt{mem(mp)},}\\
   341       \hfill\texttt{'*'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   377                              \multicolumn{2}{@{}l}{with the value stored at \texttt{mem(mp - 1)},}
   342                        $\bullet$ & $\texttt{pc} + 1$\\
       
   343                        $\bullet$ & $\texttt{mp}$ unchanged\\
       
   344                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
       
   345                        \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
       
   346                            \end{tabular}\\\hline
   378                            \end{tabular}\\\hline
   347       \hfill\texttt{'\#'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   379       \hfill\texttt{'\#'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   348                        $\bullet$ & $\texttt{pc} + 1$\\
   380                        $\bullet$ & $\texttt{pc} + 1$\\
   349                        $\bullet$ & $\texttt{mp}$ unchanged\\
   381                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
   350                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
   382                        $\bullet$ & print out \,\texttt{mem(mp)} as a number\\
   351                        \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
       
   352                      \end{tabular}\\\hline  
   383                      \end{tabular}\\\hline  
   353       any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   384       any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   354                          $\bullet$ & $\texttt{pc} + 1$\\
   385                          $\bullet$ & $\texttt{pc} + 1$\\
   355                          $\bullet$ & \texttt{mp} and \texttt{mem} unchanged
   386                          $\bullet$ & \texttt{mp} and \texttt{mem} unchanged
   356                        \end{tabular}\\
   387                        \end{tabular}\\
   357       \hline                 
   388       \hline                 
   358     \end{tabular}
   389     \end{tabular}
   359   \end{center}
   390     \\\mbox{}\\[-10mm]\mbox{}
   360   \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
   391   \end{center}
       
   392   \caption{The rules for how commands in the brainf***++ language update the
       
   393     program counter \texttt{pc},
   361     the memory pointer \texttt{mp} and the memory \texttt{mem}.\label{comms}}
   394     the memory pointer \texttt{mp} and the memory \texttt{mem}.\label{comms}}
   362   \end{figure}
   395   \end{figure}
   363 \end{itemize}\bigskip  
   396 \end{itemize}\bigskip  
   364 
   397 
   365 %%\newpage
   398 %%\newpage
   366 
   399 
   367 \subsection*{Part B (4 Marks)}
   400 \subsection*{Part B (4 Marks)}
   368 
   401 
   369 I am sure you agree while it is fun to look at bf-programs, like the
   402 I am sure you agree while it is fun to marvel at bf++-programs, like the
   370 Sierpinski triangle or the Mandelbrot program, being interpreted, it
   403 Sierpinski triangle or the Mandelbrot program, being interpreted, it
   371 is much more fun to write a compiler for the bf-language.
   404 is much more fun to write a compiler for the bf++-language.
   372 
   405 
   373 
   406 
   374 \subsubsection*{Tasks (file bfc.scala)}
   407 \subsubsection*{Tasks (file bfc.scala)}
   375 
   408 
   376 \begin{itemize}
   409 \begin{itemize}
   377 \item[(5)] Compilers in general attempt to make programs run
   410 \item[(5)] Compilers, in general, attempt to make programs run
   378   faster by precomputing as much information as possible
   411   faster by precomputing as much information as possible
   379   before running the program. In our case we can precompute the
   412   before running the program. In our case we can precompute the
   380   addresses where we need to jump at the beginning and end of
   413   addresses where we need to jump at the beginning and end of
   381   loops. 
   414   loops. 
   382 
   415 
   383   For this write a function \texttt{jtable} that precomputes the ``jump
   416   For this write a function \texttt{jtable} that precomputes the ``jump
   384   table'' for a bf-program. This function takes a bf-program 
   417   table'' for a bf++-program. This function takes a bf++-program 
   385   as an argument and returns a \texttt{Map[Int, Int]}. The 
   418   as an argument and returns a \texttt{Map[Int, Int]}. The 
   386   purpose of this Map is to record the information, in cases
   419   purpose of this Map is to record the information, in cases
   387   a pc-position points to a '\texttt{[}' or a '\texttt{]}',
   420   a pc-position points to a '\texttt{[}' or a '\texttt{]}',
   388   to which pc-position do we need to jump next?
   421   to which pc-position do we need to jump next?
   389  
   422  
   424   For the latter consider that it is difficult for compilers to
   457   For the latter consider that it is difficult for compilers to
   425   comprehend what is intended with whole programs, but they are very good
   458   comprehend what is intended with whole programs, but they are very good
   426   at finding out what small snippets of code do, and then try to
   459   at finding out what small snippets of code do, and then try to
   427   generate faster code for such snippets.
   460   generate faster code for such snippets.
   428 
   461 
   429   In our case, dead code is everything that is not a bf-command.
   462   In our case, dead code is everything that is not a bf++-command.
   430   Therefore write a function \texttt{optimise} which deletes such
   463   Therefore write a function \texttt{optimise} which deletes such
   431   dead code from a bf-program. Moreover this function should replace every substring
   464   dead code from a bf++-program. Moreover this function should replace every substring
   432   of the form \pcode{[-]} by a new command \texttt{0}. 
   465   of the form \pcode{[-]} by a new command \texttt{0}. 
   433   The idea is that the loop \pcode{[-]} just resets the
   466   The idea is that the loop \pcode{[-]} just resets the
   434   memory at the current location to 0. It is more efficient
   467   memory at the current location to 0. It is more efficient
   435   to do this in a single step, rather than stepwise in a loop as in
   468   to do this in a single step, rather than stepwise in a loop as in
   436   the original bf-programs.
   469   the original bf++-programs.
   437 
   470 
   438   In the extended \texttt{compute3} and \texttt{run3} functions you should
   471   In the extended \texttt{compute3} and \texttt{run3} functions you should
   439   implement this command by writing 0 to \pcode{mem(mp)}, that is use
   472   implement this command by writing 0 to \pcode{mem(mp)}, that is use
   440   \pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
   473   \pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
   441   The easiest way to modify a string in this way is to use the regular
   474   The easiest way to modify a string in this way is to use the regular
   442   expression \pcode{"""[^<>+-.,\\[\\]]"""}, which recognises everything that is 
   475   expression \pcode{"""[^<>+-.\\[\\]@\#*]"""}, which recognises everything that is 
   443   not a bf-command. Similarly, the
   476   not a bf++-command. Similarly, the
   444   regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]}.  By using the Scala method \pcode{.replaceAll} you can replace substrings
   477   regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]}.  By using the Scala method \pcode{.replaceAll} you can replace substrings
   445   with new strings.\\
   478   with new strings.\\
   446   \mbox{}\hfill{[1 Mark]}
   479   \mbox{}\hfill{[1 Mark]}
   447 
   480 
   448 \item[(7)] Finally, real compilers try to take advantage of CPUs which often
   481 \item[(7)] Finally, real compilers try to take advantage of modern
   449   provide complex operations in hardware that can combine many smaller
   482   CPUs which often provide complex operations in hardware that can
   450   instructions into a single faster instruction.
   483   combine many smaller instructions into a single faster instruction.
   451 
   484 
   452   In our case we can optimise the several single increments performed at a
   485   In our case we can optimise the several single increments performed at a
   453   memory cell, for example \pcode{++++}, by a single ``increment by
   486   memory cell, for example \pcode{++++}, by a single ``increment by
   454   4''. For this optimisation we just have to make sure these single
   487   4''. For this optimisation we just have to make sure these single
   455   increments are all next to each other. Similar optimisations should apply
   488   increments are all next to each other. Similar optimisations should apply
   456   for the bf-commands \pcode{-}, \pcode{<} and
   489   for the bf-commands \pcode{-}, \pcode{<} and
   457   \pcode{>}, which can all be replaced by extended versions that take
   490   \pcode{>}, which can all be replaced by extended versions that take
   458   the amount of the increment (decrement) into account. We will do
   491   the amount of the increment (decrement) into account. We will do
   459   this by introducing two-character bf-commands. For example
   492   this by introducing two-character bf++-commands. For example
   460 
   493 
   461   \begin{center}
   494   \begin{center}
   462     \begin{tabular}{l|l}
   495     \begin{tabular}{l|l}
   463       original bf-cmds & replacement\\
   496       original bf-cmds & replacement\\
   464       \hline
   497       \hline
   473 
   506 
   474 
   507 
   475   If there are more
   508   If there are more
   476   than 26 \pcode{+}'s in a row, then more than one ``two-character''
   509   than 26 \pcode{+}'s in a row, then more than one ``two-character''
   477   bf-commands need to be generated (the idea is that more than
   510   bf-commands need to be generated (the idea is that more than
   478   26 copies of a single bf-command in a row is a rare occurrence in
   511   26 copies of a single bf++-command in a row is a rare occurrence in
   479   actual bf-programs). Similar replacements apply
   512   actual bf++-programs). Similar replacements apply
   480   for \pcode{-}, \pcode{<} and \pcode{>}, but
   513   for \pcode{-}, \pcode{<} and \pcode{>}, but
   481   all other bf-commands should be unaffected by this
   514   all other bf++-commands should be unaffected by this
   482   change. 
   515   change. 
   483 
   516 
   484   For this write a function \texttt{combine} which replaces sequences
   517   For this write a function \texttt{combine} which replaces sequences
   485   of repeated increment and decrement commands by appropriate
   518   of repeated increment and decrement commands by appropriate
   486   two-character commands. In the functions \pcode{compute4} and
   519   two-character commands. In the functions \pcode{compute4} and
   487   \pcode{run4}, the ``combine'' and the optimisation from (6) should
   520   \pcode{run4}, the ``combine'' and the optimisation from (6) should
   488   be performed. Make sure that when a two-character bf-command is
   521   be performed. Make sure that when a two-character bf++-command is
   489   encountered you need to increase the \pcode{pc}-counter by two in
   522   encountered you need to increase the \pcode{pc}-counter by two in
   490   order to progress to the next command. For example
   523   order to progress to the next command. For example
   491 
   524 
   492   \begin{center}
   525   \begin{center}
   493   \pcode{combine(optimise(load_bff("benchmark.bf")))}  
   526   \pcode{combine(optimise(load_bff("benchmark.bf")))}  
   510   worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
   543   worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
   511   penalty). Luckily, after you have performed all
   544   penalty). Luckily, after you have performed all
   512   optimisations in (5) - (7), you can expect that the
   545   optimisations in (5) - (7), you can expect that the
   513   \pcode{benchmark.bf} program runs four to five times faster.
   546   \pcode{benchmark.bf} program runs four to five times faster.
   514   You can also test whether your compiler produces the correct result
   547   You can also test whether your compiler produces the correct result
   515   by for example testing
   548   by  testing for example
   516 
   549 
   517   \begin{center}
   550   \begin{center}
   518   \pcode{run(load_bff("sierpinski.bf")) == run4(load_bff("sierpinski.bf"))}
   551   \pcode{run(load_bff("sierpinski.bf")) == run4(load_bff("sierpinski.bf"))}
   519   \end{center}
   552   \end{center}
   520 
   553