cws/cw05.tex
changeset 237 db4d2fcd8063
parent 234 c51305a2217f
child 241 c650a91db720
equal deleted inserted replaced
236:e461b5325b5e 237:db4d2fcd8063
    95 language was already introduced in 1964 by Corado B\"ohm, an Italian
    95 language was already introduced in 1964 by Corado B\"ohm, an Italian
    96 computer pioneer. The main feature of brainf*** is its minimalistic
    96 computer pioneer. The main feature of brainf*** is its minimalistic
    97 set of instructions---just 8 instructions in total and all of which
    97 set of instructions---just 8 instructions in total and all of which
    98 are single characters. Despite the minimalism, this language has been
    98 are single characters. Despite the minimalism, this language has been
    99 shown to be Turing complete\ldots{}if this doesn't ring any bell with
    99 shown to be Turing complete\ldots{}if this doesn't ring any bell with
   100 you: it roughly means that every algorithm we know can, in principle,
   100 you: it roughly means that every(!) algorithm can, in principle,
   101 be implemented in brainf***. It just takes a lot of determination and
   101 be implemented in brainf***. It just takes a lot of determination and
   102 quite a lot of memory resources. Some relatively sophisticated sample
   102 quite a lot of memory resources. Some relatively sophisticated sample
   103 programs in brainf*** are given in the file \texttt{bf.scala}, including
   103 programs in brainf*** are given in the file \texttt{bf.scala}, including
   104 a brainf*** program for the Sierpinski triangle and the Mandelbrot set.\bigskip
   104 a brainf*** program for the Sierpinski triangle and the Mandelbrot set.\bigskip
   105 
   105 
   134 This one prints out Hello World\ldots{}obviously ;o) 
   134 This one prints out Hello World\ldots{}obviously ;o) 
   135 
   135 
   136 \subsubsection*{Tasks (file bf.scala)}
   136 \subsubsection*{Tasks (file bf.scala)}
   137 
   137 
   138 \begin{itemize}
   138 \begin{itemize}
   139 \item[(1)]  Write a function that takes a file name as an argument
   139 \item[(1)]  Write a function that takes a filename (a string) as an argument
   140   and requests the corresponding file from disk. It returns the
   140   and requests the corresponding file from disk. It returns the
   141   content of the file as a String. If the file does not exists,
   141   content of the file as a string. If the file does not exists,
   142   the function should return the empty string.\\
   142   the function should return the empty string.\\
   143   \mbox{}\hfill[1 Mark]
   143   \mbox{}\hfill[1 Mark]
   144   
   144   
   145 \item[(2)] Brainf*** memory is represented by a \texttt{Map} from
   145 \item[(2)] Brainf*** memory is represented by a \texttt{Map} from
   146   integers to integers. The empty memory is represented by
   146   integers to integers. The empty memory is represented by
   148   memory; \texttt{Map(0 -> 1, 2 -> 3)} stores \texttt{1} at
   148   memory; \texttt{Map(0 -> 1, 2 -> 3)} stores \texttt{1} at
   149   memory location \texttt{0}, and at \texttt{2} it stores \texttt{3}. The
   149   memory location \texttt{0}, and at \texttt{2} it stores \texttt{3}. The
   150   convention is that if we query the memory at a location that is
   150   convention is that if we query the memory at a location that is
   151   \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
   151   \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
   152   a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
   152   a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
   153   a memory pointer (an \texttt{Int}) as argument, and `safely' reads the
   153   a memory pointer (an \texttt{Int}) as arguments, and `safely' reads the
   154   corresponding memory location. If the \texttt{Map} is not defined at
   154   corresponding memory location. If the \texttt{Map} is not defined at
   155   the memory pointer, \texttt{sread} returns \texttt{0}.
   155   the memory pointer, \texttt{sread} returns \texttt{0}.
   156 
   156 
   157   Write another function \texttt{write}, which takes a memory, a
   157   Write another function \texttt{write}, which takes a memory, a
   158   memory pointer and an integer value as argument and updates the
   158   memory pointer and an integer value as arguments and updates the
   159   \texttt{Map} with the value at the given memory location. As usual
   159   \texttt{Map} with the value at the given memory location. As usual,
   160   the \texttt{Map} is not updated `in-place' but a new map is created
   160   the \texttt{Map} is not updated `in-place' but a new map is created
   161   with the same data, except the value is stored at the given memory
   161   with the same data, except the new value is stored at the given memory
   162   pointer.\hfill[1 Mark]
   162   pointer.\hfill[1 Mark]
   163 
   163 
   164 \item[(3)] Write two functions, \texttt{jumpRight} and
   164 \item[(3)] Write two functions, \texttt{jumpRight} and
   165   \texttt{jumpLeft} that are needed to implement the loop constructs
   165   \texttt{jumpLeft}, that are needed to implement the loop constructs
   166   of brainf***. They take a program (a \texttt{String}) and a program
   166   in brainf***. They take a program (a \texttt{String}) and a program
   167   counter (an \texttt{Int}) as argument and move right (respectively
   167   counter (an \texttt{Int}) as arguments and move right (respectively
   168   left) in the string in order to find the \textbf{matching}
   168   left) in the string in order to find the \textbf{matching}
   169   opening/closing bracket. For example, given the following program
   169   opening/closing bracket. For example, given the following program
   170   with the program counter indicated by an arrow:
   170   with the program counter indicated by an arrow:
   171 
   171 
   172   \begin{center}
   172   \begin{center}
   178   
   178   
   179   \begin{center}
   179   \begin{center}
   180   \texttt{--[..+>--]\barbelow{,}>,++}
   180   \texttt{--[..+>--]\barbelow{,}>,++}
   181   \end{center}
   181   \end{center}
   182 
   182 
   183   meaning it jumps to after the loop. Similarly, if you are in 8th position
   183   meaning it jumps to after the loop. Similarly, if you are in 8th position,
   184   then \texttt{jumpLeft} is supposed to jump to just after the opening
   184   then \texttt{jumpLeft} is supposed to jump to just after the opening
   185   bracket (that is jumping to the beginning of the loop):
   185   bracket (that is jumping to the beginning of the loop):
   186 
   186 
   187   \begin{center}
   187   \begin{center}
   188     \texttt{--[..+>-\barbelow{-}],>,++}
   188     \texttt{--[..+>-\barbelow{-}],>,++}
   338   loops. 
   338   loops. 
   339 
   339 
   340   For this write a function \texttt{jtable} that precomputes the ``jump
   340   For this write a function \texttt{jtable} that precomputes the ``jump
   341   table'' for a bf-program. This function takes a bf-program 
   341   table'' for a bf-program. This function takes a bf-program 
   342   as an argument and returns a \texttt{Map[Int, Int]}. The 
   342   as an argument and returns a \texttt{Map[Int, Int]}. The 
   343   purpose of this Map is to record the information
   343   purpose of this Map is to record the information, in cases
   344   that given on the pc-position $pc$ is a '\texttt{[}' or a '\texttt{]}',
   344   a pc-position points to a '\texttt{[}' or a '\texttt{]}',
   345   then to which pc-position do we need to jump next?
   345   to which pc-position do we need to jump next?
   346  
   346  
   347   For example for the program
   347   For example for the program
   348     
   348     
   349   \begin{center}
   349   \begin{center}
   350     \texttt{+++++[->++++++++++<]>--<+++[->>++++++++++}
   350     \texttt{+++++[->++++++++++<]>--<+++[->>++++++++++}
   372   and \texttt{jumpRight} was called previously, you should look
   372   and \texttt{jumpRight} was called previously, you should look
   373   up the jump address in the \texttt{jtable}. Feel free to reuse
   373   up the jump address in the \texttt{jtable}. Feel free to reuse
   374   the function \texttt{jumpLeft} and \texttt{jumpRight} for
   374   the function \texttt{jumpLeft} and \texttt{jumpRight} for
   375   calculating the \texttt{jtable}.\hfill{[1 Mark]}
   375   calculating the \texttt{jtable}.\hfill{[1 Mark]}
   376 
   376 
   377 \item[(6)] Compilers try to eliminate any ``dead'' code that could slow
   377 \item[(6)] Compilers try to eliminate any ``dead'' code that could
   378   down programs and also perform what is often called
   378   slow down programs and also perform what is often called
   379   \emph{peephole optimisations}.\footnote{\url{https://en.wikipedia.org/wiki/Peephole_optimization}} While it is difficult for compilers to comprehend what
   379   \emph{peephole
   380   is intended with whole programs, they are very good at finding out
   380     optimisations}.\footnote{\url{https://en.wikipedia.org/wiki/Peephole_optimization}}
   381   what small snippets of code do, and then try to generate
   381   For the latter consider that it is difficult for compilers to
   382   faster code for such snippets.
   382   comprehend what is intended with whole programs, they are very good
       
   383   at finding out what small snippets of code do, and then try to
       
   384   generate faster code for such snippets.
   383 
   385 
   384   In our case, dead code is everything that is not a bf-command.
   386   In our case, dead code is everything that is not a bf-command.
   385   Therefore write a function \texttt{optimise} which deletes such
   387   Therefore write a function \texttt{optimise} which deletes such
   386   dead code. Moreover this function should replace every substring
   388   dead code from a bf-program. Moreover this function should replace every substring
   387   of the form \pcode{[-]} by a new command \texttt{0}. 
   389   of the form \pcode{[-]} by a new command \texttt{0}. 
   388   The idea is that the loop \pcode{[-]} just resets the
   390   The idea is that the loop \pcode{[-]} just resets the
   389   memory at the current location to 0. It is more efficient
   391   memory at the current location to 0. It is more efficient
   390   to do this in a single step, rather than incrementally as in
   392   to do this in a single step, rather than stepwise in a loop as in
   391   the original bf-programs.
   393   the original bf-programs.
   392 
   394 
   393   In the extended \texttt{compute3} and \texttt{run3} functions you should
   395   In the extended \texttt{compute3} and \texttt{run3} functions you should
   394   implement this command by writing 0 to \pcode{mem(mp)}, that is use
   396   implement this command by writing 0 to \pcode{mem(mp)}, that is use
   395   \pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
   397   \pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
   396   The easiest way to modify a string in this way is to use the regular
   398   The easiest way to modify a string in this way is to use the regular
   397   expression \pcode{"""[^<>+-.,\\[\\]]"""}, which recognises everything that is 
   399   expression \pcode{"""[^<>+-.,\\[\\]]"""}, which recognises everything that is 
   398   not a bf-command. Similarly, the
   400   not a bf-command. Similarly, the
   399   regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]} and 
   401   regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]}.  By using the Scala method \pcode{.replaceAll} you can replace substrings
   400   by using the Scala method \pcode{.replaceAll} you can replace it with the 
   402   with new strings.\\
   401   string \pcode{"0"} standing for the new bf-command.\hfill{[1 Mark]}
   403   \mbox{}\hfill{[1 Mark]}
   402 
   404 
   403 \item[(7)] Finally, real compilers try to take advantage of CPUs which often
   405 \item[(7)] Finally, real compilers try to take advantage of CPUs which often
   404   provide complex operations in hardware that can combine many smaller
   406   provide complex operations in hardware that can combine many smaller
   405   instructions into a single faster instruction.
   407   instructions into a single faster instruction.
   406 
   408 
   407   In our case we can optimise the several single increments of a
   409   In our case we can optimise the several single increments performed at a
   408   memory cell, for example \pcode{++++}, by a single ``increment by
   410   memory cell, for example \pcode{++++}, by a single ``increment by
   409   4''. For this optimisation we just have to make sure these single
   411   4''. For this optimisation we just have to make sure these single
   410   increments are all next to each other. Similarly optimisations should apply
   412   increments are all next to each other. Similar optimisations should apply
   411   for the bf-commands \pcode{-}, \pcode{<} and
   413   for the bf-commands \pcode{-}, \pcode{<} and
   412   \pcode{>}, which can all be replaced by extended versions that take
   414   \pcode{>}, which can all be replaced by extended versions that take
   413   the amount of the increment (decrement) into account. We will do
   415   the amount of the increment (decrement) into account. We will do
   414   this by introducing two-character bf-commands. For example
   416   this by introducing two-character bf-commands. For example
   415 
   417 
   425       \hspace{5mm}(these are 26 \pcode{+}'s)\\
   427       \hspace{5mm}(these are 26 \pcode{+}'s)\\
   426     \end{tabular} 
   428     \end{tabular} 
   427   \end{center}  
   429   \end{center}  
   428 
   430 
   429 
   431 
   430   Similarly for \pcode{-}, \pcode{<} and \pcode{>}. If there are more
   432   If there are more
   431   than 26 \pcode{+}'s in a row, then more than on ``two-character''
   433   than 26 \pcode{+}'s in a row, then more than one ``two-character''
   432   bf-commands need to be generated (the idea is that more than
   434   bf-commands need to be generated (the idea is that more than
   433   26 copies of a single bf-command in a row is a rare occurrence in
   435   26 copies of a single bf-command in a row is a rare occurrence in
   434   actual bf-programs). All other bf-commands should be unaffected by this
   436   actual bf-programs). Similar replacements apply
       
   437   for \pcode{-}, \pcode{<} and \pcode{>}, but
       
   438   all other bf-commands should be unaffected by this
   435   change. 
   439   change. 
   436 
   440 
   437   For this write a function \texttt{combine} which replaces sequences
   441   For this write a function \texttt{combine} which replaces sequences
   438   of repeated increment and decrement commands by appropriate
   442   of repeated increment and decrement commands by appropriate
   439   two-character commands.  In the functions \pcode{compute4} and
   443   two-character commands. In the functions \pcode{compute4} and
   440   \pcode{run4}, the ``combine'' and the optimisation from (6) should
   444   \pcode{run4}, the ``combine'' and the optimisation from (6) should
   441   be performed. Make sure that when a two-character bf-command is
   445   be performed. Make sure that when a two-character bf-command is
   442   encountered you need to increase the \pcode{pc}-counter by two in
   446   encountered you need to increase the \pcode{pc}-counter by two in
   443   order to process the next command. For example
   447   order to progress to the next command. For example
   444 
   448 
   445   \begin{center}
   449   \begin{center}
   446   \pcode{combine(optimise(load_bff("benchmark.bf")))}  
   450   \pcode{combine(optimise(load_bff("benchmark.bf")))}  
   447   \end{center}  
   451   \end{center}  
   448 
   452 
   457   \begin{center}
   461   \begin{center}
   458     \pcode{>++[<+++++++++++++>-]<[[}\hspace{3mm}\ldots
   462     \pcode{>++[<+++++++++++++>-]<[[}\hspace{3mm}\ldots
   459   \end{center}    
   463   \end{center}    
   460 
   464 
   461   As you can see, the compiler bets on saving so much time on the
   465   As you can see, the compiler bets on saving so much time on the
   462   \pcode{+B} and \pcode{+M} steps so that the optimisations is
   466   \pcode{+B} and \pcode{+M} steps such that the optimisations is
   463   worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
   467   worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
   464   penalty). Luckily, after you have performed all
   468   penalty). Luckily, after you have performed all
   465   optimisations in (5) - (7), you can expect that the
   469   optimisations in (5) - (7), you can expect that the
   466   \pcode{benchmark.bf} program runs four to five times faster.
   470   \pcode{benchmark.bf} program runs four to five times faster.
   467   You can also test whether your compiler produces the correct result
   471   You can also test whether your compiler produces the correct result