\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{disclaimer}
\usepackage{tikz}
\usepackage{pgf}
\usepackage{pgfplots}
\usepackage{stackengine}
%% \usepackage{accents}
\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
\begin{document}
% BF IDE
% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
\section*{Coursework 10 (Scala)}
This coursework is worth 10\%. It is about a small programming
language called brainf***. The first part is due on 13 December at
11pm; the second, more advanced part, is due on 20 December at
11pm.\bigskip
\IMPORTANT{}
\noindent
Also note that the running time of each part will be restricted to a
maximum of 30 seconds on my laptop.
\DISCLAIMER{}
\subsection*{Reference Implementation}
As usual, this Scala assignment comes with a reference implementation in form of
two \texttt{jar}-files. You can download them from KEATS. They allow you to run any
test cases on your own computer. For example you can call Scala on the command line with the
option \texttt{-cp bf.jar} and then query any function from the
\texttt{bf.scala} template file. You have to
prefix the calls with \texttt{CW10a} and \texttt{CW10b}, respectively. For example
\begin{lstlisting}[language={},xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
$ scala -cp bf.jar
scala> import CW10a._
scala> run(load_bff("sierpinski.bf"))
*
* *
* *
* * * *
* *
* * * *
* * * *
* * * * * * * *
* *
* * * *
* * * *
* * * * * * * *
* * * *
* * * * * * * *
* * * * * * * *
* * * * * * * * * * * * * * * *
* *
* * * *
* * * *
* * * * * * * *
* * * *
* * * * * * * *
* * * * * * * *
* * * * * * * * * * * * * * * *
* * * *
* * * * * * * *
* * * * * * * *
* * * * * * * * * * * * * * * *
* * * * * * * *
* * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
\end{lstlisting}%$
\subsection*{Part 1 (6 Marks)}
Coming from Java or C++, you might think Scala is a rather esoteric
programming language. But remember, some serious companies have built
their business on
Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
I claim functional programming is not a fad. And there are far, far
more esoteric languages out there. One is called \emph{brainf***}. You
are asked in this part to implement an interpreter for
this language.
Urban M\"uller developed brainf*** in 1993. A close relative of this
language was already introduced in 1964 by Corado B\"ohm, an Italian
computer pioneer. The main feature of brainf*** is its minimalistic
set of instructions---just 8 instructions in total and all of which
are single characters. Despite the minimalism, this language has been
shown to be Turing complete\ldots{}if this doesn't ring any bell with
you: it roughly means that every(!) algorithm can, in principle,
be implemented in brainf***. It just takes a lot of determination and
quite a lot of memory resources. Some relatively sophisticated sample
programs in brainf*** are given in the file \texttt{bf.scala}, including
a brainf*** program for the Sierpinski triangle and the Mandelbrot set.\bigskip
\noindent
As mentioned above, brainf*** has 8 single-character commands, namely
\texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'},
\texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is
considered a comment. Brainf*** operates on memory cells containing
integers. For this it uses a single memory pointer that points at each
stage to one memory cell. This pointer can be moved forward by one
memory cell by using the command \texttt{'>'}, and backward by using
\texttt{'<'}. The commands \texttt{'+'} and \texttt{'-'} increase,
respectively decrease, by 1 the content of the memory cell to which
the memory pointer currently points to. The commands for input/output
are \texttt{','} and \texttt{'.'}. Output works by reading the content
of the memory cell to which the memory pointer points to and printing
it out as an ASCII character. Input works the other way, taking some
user input and storing it in the cell to which the memory pointer
points to. The commands \texttt{'['} and \texttt{']'} are looping
constructs. Everything in between \texttt{'['} and \texttt{']'} is
repeated until a counter (memory cell) reaches zero. A typical
program in brainf*** looks as follows:
\begin{center}
\begin{verbatim}
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.++
+++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
\end{verbatim}
\end{center}
\noindent
This one prints out Hello World\ldots{}obviously ;o)
\subsubsection*{Tasks (file bf.scala)}
\begin{itemize}
\item[(1)] Write a function that takes a filename (a string) as an argument
and requests the corresponding file from disk. It returns the
content of the file as a string. If the file does not exists,
the function should return the empty string.\\
\mbox{}\hfill[1 Mark]
\item[(2)] Brainf*** memory is represented by a \texttt{Map} from
integers to integers. The empty memory is represented by
\texttt{Map()}, that is nothing is stored in the
memory; \texttt{Map(0 -> 1, 2 -> 3)} stores \texttt{1} at
memory location \texttt{0}, and at \texttt{2} it stores \texttt{3}. The
convention is that if we query the memory at a location that is
\emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
a memory pointer (an \texttt{Int}) as arguments, and `safely' reads the
corresponding memory location. If the \texttt{Map} is not defined at
the memory pointer, \texttt{sread} returns \texttt{0}.
Write another function \texttt{write}, which takes a memory, a
memory pointer and an integer value as arguments and updates the
\texttt{Map} with the value at the given memory location. As usual,
the \texttt{Map} is not updated `in-place' but a new map is created
with the same data, except the new value is stored at the given memory
pointer.\hfill[1 Mark]
\item[(3)] Write two functions, \texttt{jumpRight} and
\texttt{jumpLeft}, that are needed to implement the loop constructs
in brainf***. They take a program (a \texttt{String}) and a program
counter (an \texttt{Int}) as arguments and move right (respectively
left) in the string in order to find the \textbf{matching}
opening/closing bracket. For example, given the following program
with the program counter indicated by an arrow:
\begin{center}
\texttt{--[\barbelow{.}.+>--],>,++}
\end{center}
then the matching closing bracket is in 9th position (counting from 0) and
\texttt{jumpRight} is supposed to return the position just after this
\begin{center}
\texttt{--[..+>--]\barbelow{,}>,++}
\end{center}
meaning it jumps to after the loop. Similarly, if you are in 8th position,
then \texttt{jumpLeft} is supposed to jump to just after the opening
bracket (that is jumping to the beginning of the loop):
\begin{center}
\texttt{--[..+>-\barbelow{-}],>,++}
\qquad$\stackrel{\texttt{jumpLeft}}{\longrightarrow}$\qquad
\texttt{--[\barbelow{.}.+>--],>,++}
\end{center}
Unfortunately we have to take into account that there might be
other opening and closing brackets on the `way' to find the
matching bracket. For example in the brainf*** program
\begin{center}
\texttt{--[\barbelow{.}.[+>]--],>,++}
\end{center}
we do not want to return the index for the \texttt{'-'} in the 9th
position, but the program counter for \texttt{','} in 12th
position. The easiest to find out whether a bracket is matched is by
using levels (which are the third argument in \texttt{jumpLeft} and
\texttt{jumpLeft}). In case of \texttt{jumpRight} you increase the
level by one whenever you find an opening bracket and decrease by
one for a closing bracket. Then in \texttt{jumpRight} you are looking
for the closing bracket on level \texttt{0}. For \texttt{jumpLeft} you
do the opposite. In this way you can find \textbf{matching} brackets
in strings such as
\begin{center}
\texttt{--[\barbelow{.}.[[-]+>[.]]--],>,++}
\end{center}
for which \texttt{jumpRight} should produce the position:
\begin{center}
\texttt{--[..[[-]+>[.]]--]\barbelow{,}>,++}
\end{center}
It is also possible that the position returned by \texttt{jumpRight} or
\texttt{jumpLeft} is outside the string in cases where there are
no matching brackets. For example
\begin{center}
\texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++}
\qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
\texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}}
\end{center}
\hfill[2 Marks]
\item[(4)] Write a recursive function \texttt{compute} that runs a
brainf*** program. It takes a program, a program counter, a memory
pointer and a memory as arguments. If the program counter is outside
the program string, the execution stops and \texttt{compute} returns the
memory. If the program counter is inside the string, it reads the
corresponding character and updates the program counter \texttt{pc},
memory pointer \texttt{mp} and memory \texttt{mem} according to the
rules shown in Figure~\ref{comms}. It then calls recursively
\texttt{compute} with the updated data. The most convenient way to
implement the brainf**k rules in Scala is to use pattern-matching
and to calculate a triple consisting of the updated \texttt{pc},
\texttt{mp} and \texttt{mem}.
Write another function \texttt{run} that calls \texttt{compute} with a
given brainfu** program and memory, and the program counter and memory pointer
set to~$0$. Like \texttt{compute}, it returns the memory after the execution
of the program finishes. You can test your brainf**k interpreter with the
Sierpinski triangle or the Hello world programs (they seem to be particularly
useful for debugging purposes), or have a look at
\begin{center}
\url{https://esolangs.org/wiki/Brainfuck}
\end{center}\hfill[2 Marks]
\begin{figure}[p]
\begin{center}
\begin{tabular}{|@{\hspace{0.5mm}}p{0.8cm}|l|}
\hline
\hfill\texttt{'>'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp} + 1$\\
$\bullet$ & \texttt{mem} unchanged
\end{tabular}\\\hline
\hfill\texttt{'<'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp} - 1$\\
$\bullet$ & \texttt{mem} unchanged
\end{tabular}\\\hline
\hfill\texttt{'+'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ unchanged\\
$\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) + 1}\\
\end{tabular}\\\hline
\hfill\texttt{'-'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ unchanged\\
$\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
\end{tabular}\\\hline
\hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
$\bullet$ & print out \,\texttt{mem(mp)} as a character\\
\end{tabular}\\\hline
\hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ unchanged\\
$\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
\multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
\end{tabular}\\\hline
\hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
\multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
$\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
\multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) != 0} then}\\
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
\end{tabular}
\\\hline
\hfill\texttt{']'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
\multicolumn{2}{@{}l}{if \texttt{mem(mp) != 0} then}\\
$\bullet$ & $\texttt{pc = jumpLeft(prog, pc - 1, 0)}$\\
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
\multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
\end{tabular}\\\hline
any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
$\bullet$ & $\texttt{pc} + 1$\\
$\bullet$ & \texttt{mp} and \texttt{mem} unchanged
\end{tabular}\\
\hline
\end{tabular}
\end{center}
\caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
the memory pointer \texttt{mp} and the memory \texttt{mem}.\label{comms}}
\end{figure}
\end{itemize}\bigskip
\newpage
\subsection*{Part 2 (4 Marks)}
I am sure you agree while it is fun to look at bf-programs, like the
Sierpinski triangle or the Mandelbrot program, being interpreted, it
is much more fun to write a compiler for the bf-language.
\subsubsection*{Tasks (file bfc.scala)}
\begin{itemize}
\item[(5)] Compilers in general attempt to make programs run
faster by precomputing as much information as possible
before running the program. In our case we can precompute the
addresses where we need to jump at the beginning and end of
loops.
For this write a function \texttt{jtable} that precomputes the ``jump
table'' for a bf-program. This function takes a bf-program
as an argument and returns a \texttt{Map[Int, Int]}. The
purpose of this Map is to record the information, in cases
a pc-position points to a '\texttt{[}' or a '\texttt{]}',
to which pc-position do we need to jump next?
For example for the program
\begin{center}
\texttt{+++++[->++++++++++<]>--<+++[->>++++++++++}
\texttt{<<]>>++<<----------[+>.>.<+<]}
\end{center}
we obtain the Map (note the precise numbers might differ depending on white
spaces etc.~in the bf-program):
\begin{center}
\texttt{Map(69 -> 61, 5 -> 20, 60 -> 70, 27 -> 44, 43 -> 28, 19 -> 6)}
\end{center}
This Map states that for the '\texttt{[}' on position 5, we need to
jump to position 20, which is just after the corresponding '\texttt{]}'.
Similarly, for the '\texttt{]}' on position 19, we need to jump to
position 6, which is just after the '\texttt{[}' on position 5, and so
on. The idea is to not calculate this information each time
we hit a bracket, but just look up this information in the
\texttt{jtable}.
Then adapt the \texttt{compute} and \texttt{run} functions
from Part 1 in order to take advantage of the information
stored in the \texttt{jtable}. This means whenever \texttt{jumpLeft}
and \texttt{jumpRight} was called previously, you should look
up the jump address in the \texttt{jtable}. Feel free to reuse
the function \texttt{jumpLeft} and \texttt{jumpRight} for
calculating the \texttt{jtable}.\hfill{[1 Mark]}
\item[(6)] Compilers try to eliminate any ``dead'' code that could
slow down programs and also perform what is often called
\emph{peephole
optimisations}.\footnote{\url{https://en.wikipedia.org/wiki/Peephole_optimization}}
For the latter consider that it is difficult for compilers to
comprehend what is intended with whole programs, but they are very good
at finding out what small snippets of code do, and then try to
generate faster code for such snippets.
In our case, dead code is everything that is not a bf-command.
Therefore write a function \texttt{optimise} which deletes such
dead code from a bf-program. Moreover this function should replace every substring
of the form \pcode{[-]} by a new command \texttt{0}.
The idea is that the loop \pcode{[-]} just resets the
memory at the current location to 0. It is more efficient
to do this in a single step, rather than stepwise in a loop as in
the original bf-programs.
In the extended \texttt{compute3} and \texttt{run3} functions you should
implement this command by writing 0 to \pcode{mem(mp)}, that is use
\pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
The easiest way to modify a string in this way is to use the regular
expression \pcode{"""[^<>+-.,\\[\\]]"""}, which recognises everything that is
not a bf-command. Similarly, the
regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]}. By using the Scala method \pcode{.replaceAll} you can replace substrings
with new strings.\\
\mbox{}\hfill{[1 Mark]}
\item[(7)] Finally, real compilers try to take advantage of CPUs which often
provide complex operations in hardware that can combine many smaller
instructions into a single faster instruction.
In our case we can optimise the several single increments performed at a
memory cell, for example \pcode{++++}, by a single ``increment by
4''. For this optimisation we just have to make sure these single
increments are all next to each other. Similar optimisations should apply
for the bf-commands \pcode{-}, \pcode{<} and
\pcode{>}, which can all be replaced by extended versions that take
the amount of the increment (decrement) into account. We will do
this by introducing two-character bf-commands. For example
\begin{center}
\begin{tabular}{l|l}
original bf-cmds & replacement\\
\hline
\pcode{+} & \pcode{+A}\\
\pcode{++} & \pcode{+B}\\
\pcode{+++} & \pcode{+C}\\
\ldots{} & \ldots{}\\
\pcode{+++....++} & \pcode{+Z}\\
\hspace{5mm}(these are 26 \pcode{+}'s)\\
\end{tabular}
\end{center}
If there are more
than 26 \pcode{+}'s in a row, then more than one ``two-character''
bf-commands need to be generated (the idea is that more than
26 copies of a single bf-command in a row is a rare occurrence in
actual bf-programs). Similar replacements apply
for \pcode{-}, \pcode{<} and \pcode{>}, but
all other bf-commands should be unaffected by this
change.
For this write a function \texttt{combine} which replaces sequences
of repeated increment and decrement commands by appropriate
two-character commands. In the functions \pcode{compute4} and
\pcode{run4}, the ``combine'' and the optimisation from (6) should
be performed. Make sure that when a two-character bf-command is
encountered you need to increase the \pcode{pc}-counter by two in
order to progress to the next command. For example
\begin{center}
\pcode{combine(optimise(load_bff("benchmark.bf")))}
\end{center}
generates the improved program
\begin{center}
\pcode{>A+B[<A+M>A-A]<A[[}\hspace{3mm}\ldots{}
\end{center}
for the original benchmark program
\begin{center}
\pcode{>++[<+++++++++++++>-]<[[}\hspace{3mm}\ldots
\end{center}
As you can see, the compiler bets on saving a lot of time on the
\pcode{+B} and \pcode{+M} steps so that the optimisations is
worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
penalty). Luckily, after you have performed all
optimisations in (5) - (7), you can expect that the
\pcode{benchmark.bf} program runs four to five times faster.
You can also test whether your compiler produces the correct result
by for example testing
\begin{center}
\pcode{run(load_bff("sierpinski.bf")) == run4(load_bff("sierpinski.bf"))}
\end{center}
which should return true for all the different compiler stages. \\
\mbox{}\hfill{[2 Marks]}
\end{itemize}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: