251
+ − 1
% !TEX program = xelatex
6
+ − 2
\documentclass{article}
426
+ − 3
\usepackage{../styles/style}
+ − 4
\usepackage{../styles/langs}
218
+ − 5
\usepackage{disclaimer}
+ − 6
\usepackage{tikz}
+ − 7
\usepackage{pgf}
+ − 8
\usepackage{pgfplots}
+ − 9
\usepackage{stackengine}
+ − 10
%% \usepackage{accents}
+ − 11
\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
+ − 12
325
+ − 13
%% change Console to scala.io.StdIn.readByte()
+ − 14
6
+ − 15
+ − 16
\begin{document}
+ − 17
429
+ − 18
\section*{Main Part 5 (Scala, 10 Marks)}
6
+ − 19
336
+ − 20
\mbox{}\hfill\textit{``If there's one feature that makes Scala, `Scala',}\\
+ − 21
\mbox{}\hfill\textit{ I would pick implicits.''}\smallskip\\
+ − 22
\mbox{}\hfill\textit{ --- Martin Odersky (creator of the Scala language)}\bigskip\bigskip
278
+ − 23
+ − 24
+ − 25
\noindent
337
+ − 26
This part is about a small (esoteric) programming language called
399
+ − 27
brainf***. We will implement an interpreter and compiler for
+ − 28
this language.\bigskip
218
+ − 29
411
+ − 30
%\IMPORTANT{This part is worth 10\% and you need to submit it on \cwTEN{} at 5pm.
+ − 31
%Any 1\% you achieve counts as your ``weekly engagement''.}
+ − 32
\IMPORTANTNONE{}
62
+ − 33
+ − 34
\noindent
218
+ − 35
Also note that the running time of each part will be restricted to a
+ − 36
maximum of 30 seconds on my laptop.
+ − 37
+ − 38
\DISCLAIMER{}
268
+ − 39
\newpage
86
+ − 40
230
+ − 41
\subsection*{Reference Implementation}
+ − 42
268
+ − 43
As usual, this Scala assignment comes with a reference implementation in
+ − 44
form of two \texttt{jar}-files. You can download them from KEATS. They
+ − 45
allow you to run any test cases on your own computer. For example you
+ − 46
can call Scala on the command line with the option \texttt{-cp bf.jar}
+ − 47
and then query any function from the \texttt{bf.scala} template file.
399
+ − 48
You have to prefix the calls with \texttt{M5a} and \texttt{M5b},
268
+ − 49
respectively. For example
230
+ − 50
+ − 51
+ − 52
\begin{lstlisting}[language={},xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
+ − 53
$ scala -cp bf.jar
399
+ − 54
scala> import M5a._
292
+ − 55
scala> run(load_bff("sierpinski.bf")) ; ()
230
+ − 56
*
+ − 57
* *
+ − 58
* *
+ − 59
* * * *
+ − 60
* *
+ − 61
* * * *
+ − 62
* * * *
+ − 63
* * * * * * * *
+ − 64
* *
+ − 65
* * * *
+ − 66
* * * *
+ − 67
* * * * * * * *
+ − 68
* * * *
+ − 69
* * * * * * * *
+ − 70
* * * * * * * *
+ − 71
* * * * * * * * * * * * * * * *
+ − 72
* *
+ − 73
* * * *
+ − 74
* * * *
+ − 75
* * * * * * * *
+ − 76
* * * *
+ − 77
* * * * * * * *
+ − 78
* * * * * * * *
+ − 79
* * * * * * * * * * * * * * * *
+ − 80
* * * *
+ − 81
* * * * * * * *
+ − 82
* * * * * * * *
+ − 83
* * * * * * * * * * * * * * * *
+ − 84
* * * * * * * *
+ − 85
* * * * * * * * * * * * * * * *
+ − 86
* * * * * * * * * * * * * * * *
+ − 87
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
+ − 88
\end{lstlisting}%$
+ − 89
268
+ − 90
\newpage
218
+ − 91
429
+ − 92
\subsection*{Part A (6 Marks)}
218
+ − 93
229
+ − 94
Coming from Java or C++, you might think Scala is a rather esoteric
218
+ − 95
programming language. But remember, some serious companies have built
+ − 96
their business on
+ − 97
Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
230
+ − 98
I claim functional programming is not a fad. And there are far, far
336
+ − 99
more esoteric languages out there. One is called \emph{brainf***}.
+ − 100
\here{https://esolangs.org/wiki/Brainfuck}
+ − 101
You
400
+ − 102
are asked in this part to implement an interpreter for this language.
218
+ − 103
336
+ − 104
Urban M\"uller developed the original version of brainf*** in 1993. A close
+ − 105
relative of this language was already introduced in 1964 by Corado
+ − 106
B\"ohm, an Italian computer pioneer. The main feature of brainf*** is
+ − 107
its minimalistic set of instructions---just 8 instructions in total
+ − 108
and all of which are single characters. Despite the minimalism, this
+ − 109
language has been shown to be Turing complete\ldots{}if this doesn't
+ − 110
ring any bell with you: it roughly means that every(!) algorithm can,
+ − 111
in principle, be implemented in brainf***. It just takes a lot of
429
+ − 112
determination and quite a lot of memory and time.
336
+ − 113
+ − 114
Some relatively sophisticated sample programs in brainf*** are given
+ − 115
in the file \texttt{bf.scala}, including a brainf*** program for the
+ − 116
Sierpinski triangle and the Mandelbrot set. There seems to be even a
+ − 117
dedicated Windows IDE for bf programs, though I am not sure whether
+ − 118
this is just an elaborate April fools' joke---judge yourself:
247
+ − 119
+ − 120
\begin{center}
+ − 121
\url{https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5}
268
+ − 122
\end{center} \bigskip
247
+ − 123
218
+ − 124
+ − 125
\noindent
336
+ − 126
As mentioned above, the original brainf*** has 8 single-character
399
+ − 127
commands. Our version of bf will contain the commands \texttt{'>'},
336
+ − 128
\texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'}, \texttt{'['}
399
+ − 129
and \texttt{']'}. Every other character
336
+ − 130
is considered a comment.
+ − 131
399
+ − 132
Our interpreter for bf operates on memory cells containing
337
+ − 133
integers. For this it uses a single memory pointer, called
+ − 134
\texttt{mp}, that points at each stage to one memory cell.
+ − 135
+ − 136
\begin{center}
+ − 137
\begin{tikzpicture}
+ − 138
\draw [line width=1mm, rounded corners] (0,0) rectangle (5, 0.5);
+ − 139
\draw (0.5, 0) -- (0.5, 0.5);
+ − 140
\draw (1.0, 0) -- (1.0, 0.5);
+ − 141
+ − 142
\draw (2.5, 0) -- (2.5, 0.5);
+ − 143
\draw (2.0, 0) -- (2.0, 0.5);
+ − 144
+ − 145
\draw (4.5, 0) -- (4.5, 0.5);
+ − 146
\draw (4.0, 0) -- (4.0, 0.5);
+ − 147
+ − 148
\draw (1.5,0.25) node {$\cdots$};
+ − 149
\draw (3.0,0.25) node {$\cdots$};
+ − 150
+ − 151
\draw [->, thick] (2.25, -0.5) -- (2.25, -0.15);
+ − 152
\draw (2.25,-0.8) node {\texttt{mp}};
+ − 153
+ − 154
\draw (0.7,0.7) node {\sf\footnotesize memory};
+ − 155
\end{tikzpicture}
+ − 156
\end{center}
+ − 157
+ − 158
\noindent
+ − 159
This pointer can be moved forward by one memory cell by using the
+ − 160
command \texttt{'>'}, and backward by using \texttt{'<'}. The commands
+ − 161
\texttt{'+'} and \texttt{'-'} increase, respectively decrease, by 1
+ − 162
the content of the memory cell to which the memory pointer currently
399
+ − 163
points to. The command for output in bf is \texttt{'.'} whereby output works
337
+ − 164
by reading the content of the memory cell to which the memory pointer
+ − 165
points to and printing it out as an ASCII character.\footnote{In the
+ − 166
original version of bf, there is also a command for input, but we
+ − 167
omit it here. All our programs will be ``autonomous''.} The
336
+ − 168
commands \texttt{'['} and \texttt{']'} are looping
218
+ − 169
constructs. Everything in between \texttt{'['} and \texttt{']'} is
+ − 170
repeated until a counter (memory cell) reaches zero. A typical
+ − 171
program in brainf*** looks as follows:
+ − 172
+ − 173
\begin{center}
+ − 174
\begin{verbatim}
230
+ − 175
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.++
+ − 176
+++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
218
+ − 177
\end{verbatim}
+ − 178
\end{center}
+ − 179
+ − 180
\noindent
399
+ − 181
This one prints out Hello World\ldots{}obviously \texttt{;o)}
336
+ − 182
218
+ − 183
+ − 184
\subsubsection*{Tasks (file bf.scala)}
109
+ − 185
+ − 186
\begin{itemize}
237
+ − 187
\item[(1)] Write a function that takes a filename (a string) as an argument
229
+ − 188
and requests the corresponding file from disk. It returns the
237
+ − 189
content of the file as a string. If the file does not exists,
337
+ − 190
the function should return the empty string.
399
+ − 191
\mbox{}\hfill[0.5 Marks]
229
+ − 192
399
+ − 193
\item[(2)] Brainf**k memory is represented by a \texttt{Map} from
218
+ − 194
integers to integers. The empty memory is represented by
+ − 195
\texttt{Map()}, that is nothing is stored in the
229
+ − 196
memory; \texttt{Map(0 -> 1, 2 -> 3)} stores \texttt{1} at
+ − 197
memory location \texttt{0}, and at \texttt{2} it stores \texttt{3}. The
218
+ − 198
convention is that if we query the memory at a location that is
+ − 199
\emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
268
+ − 200
a `safe-read' function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
237
+ − 201
a memory pointer (an \texttt{Int}) as arguments, and `safely' reads the
218
+ − 202
corresponding memory location. If the \texttt{Map} is not defined at
+ − 203
the memory pointer, \texttt{sread} returns \texttt{0}.
+ − 204
+ − 205
Write another function \texttt{write}, which takes a memory, a
237
+ − 206
memory pointer and an integer value as arguments and updates the
+ − 207
\texttt{Map} with the value at the given memory location. As usual,
218
+ − 208
the \texttt{Map} is not updated `in-place' but a new map is created
237
+ − 209
with the same data, except the new value is stored at the given memory
399
+ − 210
pointer.\hfill[0.5 Marks]
218
+ − 211
229
+ − 212
\item[(3)] Write two functions, \texttt{jumpRight} and
237
+ − 213
\texttt{jumpLeft}, that are needed to implement the loop constructs
399
+ − 214
in brainf**k. They take a program (a \texttt{String}) and a program
237
+ − 215
counter (an \texttt{Int}) as arguments and move right (respectively
218
+ − 216
left) in the string in order to find the \textbf{matching}
+ − 217
opening/closing bracket. For example, given the following program
+ − 218
with the program counter indicated by an arrow:
+ − 219
+ − 220
\begin{center}
336
+ − 221
\texttt{--[\barbelow{.}.+>--].>.++}
218
+ − 222
\end{center}
+ − 223
+ − 224
then the matching closing bracket is in 9th position (counting from 0) and
+ − 225
\texttt{jumpRight} is supposed to return the position just after this
109
+ − 226
218
+ − 227
\begin{center}
336
+ − 228
\texttt{--[..+>--]\barbelow{.}>.++}
218
+ − 229
\end{center}
+ − 230
237
+ − 231
meaning it jumps to after the loop. Similarly, if you are in 8th position,
218
+ − 232
then \texttt{jumpLeft} is supposed to jump to just after the opening
+ − 233
bracket (that is jumping to the beginning of the loop):
109
+ − 234
218
+ − 235
\begin{center}
336
+ − 236
\texttt{--[..+>-\barbelow{-}].>.++}
218
+ − 237
\qquad$\stackrel{\texttt{jumpLeft}}{\longrightarrow}$\qquad
336
+ − 238
\texttt{--[\barbelow{.}.+>--].>.++}
218
+ − 239
\end{center}
+ − 240
+ − 241
Unfortunately we have to take into account that there might be
+ − 242
other opening and closing brackets on the `way' to find the
399
+ − 243
matching bracket. For example in the brain*ck program
218
+ − 244
+ − 245
\begin{center}
336
+ − 246
\texttt{--[\barbelow{.}.[+>]--].>.++}
218
+ − 247
\end{center}
109
+ − 248
218
+ − 249
we do not want to return the index for the \texttt{'-'} in the 9th
336
+ − 250
position, but the program counter for \texttt{'.'} in 12th
218
+ − 251
position. The easiest to find out whether a bracket is matched is by
+ − 252
using levels (which are the third argument in \texttt{jumpLeft} and
+ − 253
\texttt{jumpLeft}). In case of \texttt{jumpRight} you increase the
+ − 254
level by one whenever you find an opening bracket and decrease by
+ − 255
one for a closing bracket. Then in \texttt{jumpRight} you are looking
+ − 256
for the closing bracket on level \texttt{0}. For \texttt{jumpLeft} you
+ − 257
do the opposite. In this way you can find \textbf{matching} brackets
+ − 258
in strings such as
+ − 259
+ − 260
\begin{center}
336
+ − 261
\texttt{--[\barbelow{.}.[[-]+>[.]]--].>.++}
218
+ − 262
\end{center}
109
+ − 263
218
+ − 264
for which \texttt{jumpRight} should produce the position:
+ − 265
+ − 266
\begin{center}
336
+ − 267
\texttt{--[..[[-]+>[.]]--]\barbelow{.}>.++}
218
+ − 268
\end{center}
+ − 269
+ − 270
It is also possible that the position returned by \texttt{jumpRight} or
+ − 271
\texttt{jumpLeft} is outside the string in cases where there are
+ − 272
no matching brackets. For example
+ − 273
+ − 274
\begin{center}
336
+ − 275
\texttt{--[\barbelow{.}.[[-]+>[.]]--.>.++}
218
+ − 276
\qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
336
+ − 277
\texttt{--[..[[-]+>[.]]-->.++\barbelow{\;\phantom{+}}}
218
+ − 278
\end{center}
229
+ − 279
\hfill[2 Marks]
109
+ − 280
+ − 281
230
+ − 282
\item[(4)] Write a recursive function \texttt{compute} that runs a
399
+ − 283
brain*u*k program. It takes a program, a program counter, a memory
218
+ − 284
pointer and a memory as arguments. If the program counter is outside
230
+ − 285
the program string, the execution stops and \texttt{compute} returns the
218
+ − 286
memory. If the program counter is inside the string, it reads the
+ − 287
corresponding character and updates the program counter \texttt{pc},
+ − 288
memory pointer \texttt{mp} and memory \texttt{mem} according to the
+ − 289
rules shown in Figure~\ref{comms}. It then calls recursively
230
+ − 290
\texttt{compute} with the updated data. The most convenient way to
399
+ − 291
implement the brainf**k rules in Scala is to use pattern-matching
230
+ − 292
and to calculate a triple consisting of the updated \texttt{pc},
229
+ − 293
\texttt{mp} and \texttt{mem}.
218
+ − 294
230
+ − 295
Write another function \texttt{run} that calls \texttt{compute} with a
399
+ − 296
given brainfu*k program and memory, and the program counter and memory pointer
230
+ − 297
set to~$0$. Like \texttt{compute}, it returns the memory after the execution
399
+ − 298
of the program finishes. You can test your brainf**k interpreter with the
229
+ − 299
Sierpinski triangle or the Hello world programs (they seem to be particularly
+ − 300
useful for debugging purposes), or have a look at
109
+ − 301
218
+ − 302
\begin{center}
+ − 303
\url{https://esolangs.org/wiki/Brainfuck}
337
+ − 304
\end{center}
+ − 305
399
+ − 306
\noindent for more bf-programs and the test cases given in \texttt{bf.scala}.\\
337
+ − 307
\mbox{}\hfill[2 Marks]
109
+ − 308
218
+ − 309
\begin{figure}[p]
+ − 310
\begin{center}
230
+ − 311
\begin{tabular}{|@{\hspace{0.5mm}}p{0.8cm}|l|}
218
+ − 312
\hline
+ − 313
\hfill\texttt{'>'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 314
$\bullet$ & $\texttt{pc} + 1$\\
+ − 315
$\bullet$ & $\texttt{mp} + 1$\\
+ − 316
$\bullet$ & \texttt{mem} unchanged
+ − 317
\end{tabular}\\\hline
+ − 318
\hfill\texttt{'<'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 319
$\bullet$ & $\texttt{pc} + 1$\\
+ − 320
$\bullet$ & $\texttt{mp} - 1$\\
+ − 321
$\bullet$ & \texttt{mem} unchanged
+ − 322
\end{tabular}\\\hline
+ − 323
\hfill\texttt{'+'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 324
$\bullet$ & $\texttt{pc} + 1$\\
+ − 325
$\bullet$ & $\texttt{mp}$ unchanged\\
+ − 326
$\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) + 1}\\
+ − 327
\end{tabular}\\\hline
+ − 328
\hfill\texttt{'-'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 329
$\bullet$ & $\texttt{pc} + 1$\\
+ − 330
$\bullet$ & $\texttt{mp}$ unchanged\\
+ − 331
$\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
+ − 332
\end{tabular}\\\hline
+ − 333
\hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 334
$\bullet$ & $\texttt{pc} + 1$\\
+ − 335
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ − 336
$\bullet$ & print out \,\texttt{mem(mp)} as a character\\
+ − 337
\end{tabular}\\\hline
336
+ − 338
%\hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 339
% $\bullet$ & $\texttt{pc} + 1$\\
+ − 340
% $\bullet$ & $\texttt{mp}$ unchanged\\
+ − 341
% $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
+ − 342
% \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
+ − 343
% \end{tabular}\\\hline
218
+ − 344
\hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 345
\multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
+ − 346
$\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
+ − 347
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
+ − 348
\multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) != 0} then}\\
+ − 349
$\bullet$ & $\texttt{pc} + 1$\\
+ − 350
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ − 351
\end{tabular}
+ − 352
\\\hline
+ − 353
\hfill\texttt{']'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 354
\multicolumn{2}{@{}l}{if \texttt{mem(mp) != 0} then}\\
+ − 355
$\bullet$ & $\texttt{pc = jumpLeft(prog, pc - 1, 0)}$\\
+ − 356
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
+ − 357
\multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
+ − 358
$\bullet$ & $\texttt{pc} + 1$\\
+ − 359
$\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
336
+ − 360
\end{tabular}\\\hline
399
+ − 361
%\hfill\texttt{'*'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 362
% $\bullet$ & $\texttt{pc} + 1$\\
+ − 363
% $\bullet$ & $\texttt{mp}$ unchanged\\
+ − 364
% $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) * mem(mp - 1)}\\
+ − 365
% \multicolumn{2}{@{}l}{this multiplies the content of the memory cells at
+ − 366
% \texttt{mp} and \texttt{mp - 1}}\\
+ − 367
% \multicolumn{2}{@{}l}{and stores the result at \texttt{mp}}
+ − 368
% \end{tabular}\\\hline
+ − 369
%\hfill\texttt{'@'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 370
% $\bullet$ & $\texttt{pc} + 1$\\
+ − 371
% $\bullet$ & $\texttt{mp}$ unchanged\\
+ − 372
% $\bullet$ & \texttt{mem} updated with
+ − 373
% \texttt{mem(mp) -> mem(mp - 1)}\\
+ − 374
% \multicolumn{2}{@{}l}{this updates the memory cell having the index stored at \texttt{mem(mp)},}\\
+ − 375
% \multicolumn{2}{@{}l}{with the value stored at \texttt{mem(mp - 1)},}
+ − 376
% \end{tabular}\\\hline
+ − 377
%\hfill\texttt{'\#'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ − 378
% $\bullet$ & $\texttt{pc} + 1$\\
+ − 379
% $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ − 380
% $\bullet$ & print out \,\texttt{mem(mp)} as a number\\
+ − 381
% \end{tabular}\\\hline
218
+ − 382
any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
399
+ − 383
% $\bullet$ & $\texttt{pc} + 1$\\
218
+ − 384
$\bullet$ & \texttt{mp} and \texttt{mem} unchanged
+ − 385
\end{tabular}\\
+ − 386
\hline
+ − 387
\end{tabular}
337
+ − 388
\\\mbox{}\\[-10mm]\mbox{}
218
+ − 389
\end{center}
399
+ − 390
\caption{The rules for how commands in the brainf*** language update the
337
+ − 391
program counter \texttt{pc},
230
+ − 392
the memory pointer \texttt{mp} and the memory \texttt{mem}.\label{comms}}
218
+ − 393
\end{figure}
429
+ − 394
218
+ − 395
429
+ − 396
\item[(5)] Let us also generate some BF programs ourselves:
+ − 397
Write a function
+ − 398
\texttt{generate} which takes a list of characters as input and generates
+ − 399
a corresponding BF program that prints out this list of characters. One
+ − 400
way to generate such a program is to consider each character in the list,
+ − 401
add as many \texttt{"+"} given by the ASCII code of the character, then add the
+ − 402
\texttt{"."} command for printing and add the loop \texttt{"[-]"} for ``zero-ing'' the memory
+ − 403
cell; then go to the next character in the list. For example
+ − 404
the list \texttt{"ABC".toString} produces (as a single string)
+ − 405
+ − 406
\begin{center}
+ − 407
\texttt{+++++++++++++++++++++++++++++++++++++++++++++++++++++}
+ − 408
\texttt{++++++++++++.[-]+++++++++++++++++++++++++++++++++++++}
+ − 409
\texttt{+++++++++++++++++++++++++++++.[-]++++++++++++++++++++}
+ − 410
\texttt{+++++++++++++++++++++++++++++++++++++++++++++++.[-]}
+ − 411
\end{center}\mbox{}\hfill[1 Mark]
+ − 412
+ − 413
\end{itemize}\bigskip
268
+ − 414
%%\newpage
233
+ − 415
307
+ − 416
\subsection*{Part B (4 Marks)}
218
+ − 417
399
+ − 418
I am sure you agree while it is fun to marvel at bf-programs, like the
233
+ − 419
Sierpinski triangle or the Mandelbrot program, being interpreted, it
399
+ − 420
is much more fun to write a compiler for the bf-language.
233
+ − 421
+ − 422
+ − 423
\subsubsection*{Tasks (file bfc.scala)}
+ − 424
+ − 425
\begin{itemize}
429
+ − 426
\item[(6)] Compilers, in general, attempt to make programs run
233
+ − 427
faster by precomputing as much information as possible
+ − 428
before running the program. In our case we can precompute the
+ − 429
addresses where we need to jump at the beginning and end of
+ − 430
loops.
+ − 431
+ − 432
For this write a function \texttt{jtable} that precomputes the ``jump
399
+ − 433
table'' for a bf-program. This function takes a bf-program
233
+ − 434
as an argument and returns a \texttt{Map[Int, Int]}. The
237
+ − 435
purpose of this Map is to record the information, in cases
+ − 436
a pc-position points to a '\texttt{[}' or a '\texttt{]}',
+ − 437
to which pc-position do we need to jump next?
233
+ − 438
+ − 439
For example for the program
+ − 440
+ − 441
\begin{center}
+ − 442
\texttt{+++++[->++++++++++<]>--<+++[->>++++++++++}
+ − 443
\texttt{<<]>>++<<----------[+>.>.<+<]}
+ − 444
\end{center}
+ − 445
+ − 446
we obtain the Map (note the precise numbers might differ depending on white
+ − 447
spaces etc.~in the bf-program):
+ − 448
+ − 449
\begin{center}
+ − 450
\texttt{Map(69 -> 61, 5 -> 20, 60 -> 70, 27 -> 44, 43 -> 28, 19 -> 6)}
+ − 451
\end{center}
+ − 452
+ − 453
This Map states that for the '\texttt{[}' on position 5, we need to
+ − 454
jump to position 20, which is just after the corresponding '\texttt{]}'.
+ − 455
Similarly, for the '\texttt{]}' on position 19, we need to jump to
+ − 456
position 6, which is just after the '\texttt{[}' on position 5, and so
+ − 457
on. The idea is to not calculate this information each time
+ − 458
we hit a bracket, but just look up this information in the
+ − 459
\texttt{jtable}.
+ − 460
+ − 461
Then adapt the \texttt{compute} and \texttt{run} functions
+ − 462
from Part 1 in order to take advantage of the information
+ − 463
stored in the \texttt{jtable}. This means whenever \texttt{jumpLeft}
+ − 464
and \texttt{jumpRight} was called previously, you should look
+ − 465
up the jump address in the \texttt{jtable}. Feel free to reuse
+ − 466
the function \texttt{jumpLeft} and \texttt{jumpRight} for
+ − 467
calculating the \texttt{jtable}.\hfill{[1 Mark]}
+ − 468
429
+ − 469
\item[(7)] Compilers try to eliminate any ``dead'' code that could
237
+ − 470
slow down programs and also perform what is often called
+ − 471
\emph{peephole
+ − 472
optimisations}.\footnote{\url{https://en.wikipedia.org/wiki/Peephole_optimization}}
+ − 473
For the latter consider that it is difficult for compilers to
241
+ − 474
comprehend what is intended with whole programs, but they are very good
237
+ − 475
at finding out what small snippets of code do, and then try to
+ − 476
generate faster code for such snippets.
233
+ − 477
399
+ − 478
In our case, dead code is everything that is not a bf-command.
233
+ − 479
Therefore write a function \texttt{optimise} which deletes such
399
+ − 480
dead code from a bf-program. Moreover this function should replace every substring
233
+ − 481
of the form \pcode{[-]} by a new command \texttt{0}.
+ − 482
The idea is that the loop \pcode{[-]} just resets the
+ − 483
memory at the current location to 0. It is more efficient
237
+ − 484
to do this in a single step, rather than stepwise in a loop as in
399
+ − 485
the original bf-programs.
218
+ − 486
233
+ − 487
In the extended \texttt{compute3} and \texttt{run3} functions you should
+ − 488
implement this command by writing 0 to \pcode{mem(mp)}, that is use
+ − 489
\pcode{write(mem, mp, 0)} as the rule for the command \texttt{0}.
+ − 490
The easiest way to modify a string in this way is to use the regular
420
+ − 491
expression \pcode{"""[^<>+\\-.\\[\\]]"""}, which recognises everything that is
399
+ − 492
not a bf-command. Similarly, the
237
+ − 493
regular expression \pcode{"""\\[-\\]"""} finds all occurrences of \pcode{[-]}. By using the Scala method \pcode{.replaceAll} you can replace substrings
+ − 494
with new strings.\\
+ − 495
\mbox{}\hfill{[1 Mark]}
233
+ − 496
429
+ − 497
\item[(8)] Finally, real compilers try to take advantage of modern
337
+ − 498
CPUs which often provide complex operations in hardware that can
+ − 499
combine many smaller instructions into a single faster instruction.
233
+ − 500
237
+ − 501
In our case we can optimise the several single increments performed at a
233
+ − 502
memory cell, for example \pcode{++++}, by a single ``increment by
+ − 503
4''. For this optimisation we just have to make sure these single
237
+ − 504
increments are all next to each other. Similar optimisations should apply
233
+ − 505
for the bf-commands \pcode{-}, \pcode{<} and
+ − 506
\pcode{>}, which can all be replaced by extended versions that take
+ − 507
the amount of the increment (decrement) into account. We will do
399
+ − 508
this by introducing two-character bf-commands. For example
233
+ − 509
+ − 510
\begin{center}
+ − 511
\begin{tabular}{l|l}
+ − 512
original bf-cmds & replacement\\
+ − 513
\hline
+ − 514
\pcode{+} & \pcode{+A}\\
+ − 515
\pcode{++} & \pcode{+B}\\
+ − 516
\pcode{+++} & \pcode{+C}\\
+ − 517
\ldots{} & \ldots{}\\
+ − 518
\pcode{+++....++} & \pcode{+Z}\\
+ − 519
\hspace{5mm}(these are 26 \pcode{+}'s)\\
+ − 520
\end{tabular}
+ − 521
\end{center}
+ − 522
+ − 523
237
+ − 524
If there are more
+ − 525
than 26 \pcode{+}'s in a row, then more than one ``two-character''
233
+ − 526
bf-commands need to be generated (the idea is that more than
399
+ − 527
26 copies of a single bf-command in a row is a rare occurrence in
+ − 528
actual bf-programs). Similar replacements apply
237
+ − 529
for \pcode{-}, \pcode{<} and \pcode{>}, but
399
+ − 530
all other bf-commands should be unaffected by this
233
+ − 531
change.
+ − 532
+ − 533
For this write a function \texttt{combine} which replaces sequences
+ − 534
of repeated increment and decrement commands by appropriate
237
+ − 535
two-character commands. In the functions \pcode{compute4} and
233
+ − 536
\pcode{run4}, the ``combine'' and the optimisation from (6) should
399
+ − 537
be performed. Make sure that when a two-character bf-command is
233
+ − 538
encountered you need to increase the \pcode{pc}-counter by two in
237
+ − 539
order to progress to the next command. For example
233
+ − 540
+ − 541
\begin{center}
+ − 542
\pcode{combine(optimise(load_bff("benchmark.bf")))}
+ − 543
\end{center}
+ − 544
+ − 545
generates the improved program
+ − 546
+ − 547
\begin{center}
+ − 548
\pcode{>A+B[<A+M>A-A]<A[[}\hspace{3mm}\ldots{}
+ − 549
\end{center}
+ − 550
+ − 551
for the original benchmark program
+ − 552
+ − 553
\begin{center}
+ − 554
\pcode{>++[<+++++++++++++>-]<[[}\hspace{3mm}\ldots
+ − 555
\end{center}
+ − 556
241
+ − 557
As you can see, the compiler bets on saving a lot of time on the
+ − 558
\pcode{+B} and \pcode{+M} steps so that the optimisations is
233
+ − 559
worthwhile overall (of course for the \pcode{>A}'s and so on, the compiler incurs a
+ − 560
penalty). Luckily, after you have performed all
+ − 561
optimisations in (5) - (7), you can expect that the
234
+ − 562
\pcode{benchmark.bf} program runs four to five times faster.
+ − 563
You can also test whether your compiler produces the correct result
337
+ − 564
by testing for example
234
+ − 565
+ − 566
\begin{center}
+ − 567
\pcode{run(load_bff("sierpinski.bf")) == run4(load_bff("sierpinski.bf"))}
+ − 568
\end{center}
+ − 569
+ − 570
which should return true for all the different compiler stages. \\
+ − 571
\mbox{}\hfill{[2 Marks]}
233
+ − 572
\end{itemize}
6
+ − 573
+ − 574
\end{document}
+ − 575
68
+ − 576
6
+ − 577
%%% Local Variables:
+ − 578
%%% mode: latex
+ − 579
%%% TeX-master: t
+ − 580
%%% End: