cws/cw05.tex
changeset 229 5549016ab10f
parent 221 9e7897f25e13
child 230 bebe34c975a8
equal deleted inserted replaced
228:33c2655be47d 229:5549016ab10f
     7 \usepackage{pgfplots}
     7 \usepackage{pgfplots}
     8 \usepackage{stackengine}
     8 \usepackage{stackengine}
     9 %% \usepackage{accents}
     9 %% \usepackage{accents}
    10 \newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
    10 \newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
    11 
    11 
    12 \begin{filecontents}{re-python2.data}
       
    13 1 0.033
       
    14 5 0.036
       
    15 10 0.034
       
    16 15 0.036
       
    17 18 0.059
       
    18 19 0.084 
       
    19 20 0.141
       
    20 21 0.248
       
    21 22 0.485
       
    22 23 0.878
       
    23 24 1.71
       
    24 25 3.40
       
    25 26 7.08
       
    26 27 14.12
       
    27 28 26.69
       
    28 \end{filecontents}
       
    29 
       
    30 \begin{filecontents}{re-java.data}
       
    31 5  0.00298
       
    32 10  0.00418
       
    33 15  0.00996
       
    34 16  0.01710
       
    35 17  0.03492
       
    36 18  0.03303
       
    37 19  0.05084
       
    38 20  0.10177
       
    39 21  0.19960
       
    40 22  0.41159
       
    41 23  0.82234
       
    42 24  1.70251
       
    43 25  3.36112
       
    44 26  6.63998
       
    45 27  13.35120
       
    46 28  29.81185
       
    47 \end{filecontents}
       
    48 
       
    49 \begin{filecontents}{re-js.data}
       
    50 5   0.061
       
    51 10  0.061
       
    52 15  0.061
       
    53 20  0.070
       
    54 23  0.131
       
    55 25  0.308
       
    56 26  0.564
       
    57 28  1.994
       
    58 30  7.648
       
    59 31  15.881 
       
    60 32  32.190
       
    61 \end{filecontents}
       
    62 
       
    63 \begin{filecontents}{re-java9.data}
       
    64 1000  0.01410
       
    65 2000  0.04882
       
    66 3000  0.10609
       
    67 4000  0.17456
       
    68 5000  0.27530
       
    69 6000  0.41116
       
    70 7000  0.53741
       
    71 8000  0.70261
       
    72 9000  0.93981
       
    73 10000 0.97419
       
    74 11000 1.28697
       
    75 12000 1.51387
       
    76 14000 2.07079
       
    77 16000 2.69846
       
    78 20000 4.41823
       
    79 24000 6.46077
       
    80 26000 7.64373
       
    81 30000 9.99446
       
    82 34000 12.966885
       
    83 38000 16.281621
       
    84 42000 19.180228
       
    85 46000 21.984721
       
    86 50000 26.950203
       
    87 60000 43.0327746
       
    88 \end{filecontents}
       
    89 
       
    90 
    12 
    91 \begin{document}
    13 \begin{document}
    92 
    14 
    93 % BF IDE
    15 % BF IDE
    94 % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
    16 % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
    95   
    17   
    96 \section*{Coursework 9 (Scala)}
    18 \section*{Coursework 9 (Scala)}
    97 
    19 
    98 This coursework is worth 10\%. It is about a regular expression
    20 This coursework is worth 10\%. It is about a small programming
    99 matcher and the shunting yard algorithm by Dijkstra. The first part is
    21 language called brainf***. The first part is due on 13 December at
   100 due on 6 December at 11pm; the second, more advanced part, is due on
    22 11pm; the second, more advanced part, is due on 20 December at
   101 21 December at 11pm. In the first part, you are asked to implement a
    23 11pm.\bigskip
   102 regular expression matcher based on derivatives of regular
       
   103 expressions. The reason is that regular expression matching in
       
   104 languages like Java, JavaScipt and Python can sometimes be extremely
       
   105 slow. The advanced part is about the shunting yard algorithm that
       
   106 transforms the usual infix notation of arithmetic expressions into the
       
   107 postfix notation, which is for example used in compilers.\bigskip
       
   108 
    24 
   109 \IMPORTANT{}
    25 \IMPORTANT{}
   110 
    26 
   111 \noindent
    27 \noindent
   112 Also note that the running time of each part will be restricted to a
    28 Also note that the running time of each part will be restricted to a
   113 maximum of 30 seconds on my laptop.
    29 maximum of 30 seconds on my laptop.
   114 
    30 
   115 \DISCLAIMER{}
    31 \DISCLAIMER{}
   116 
    32 
   117 
    33 
   118 \subsection*{Part 1 (6 Marks, Regular Expression Matcher)}
    34 
   119 
    35 \subsection*{Part 1 (6 Marks)}
   120 The task is to implement a regular expression matcher that is based on
    36 
   121 derivatives of regular expressions. Most of the functions are defined by
    37 Coming from Java or C++, you might think Scala is a rather esoteric
   122 recursion over regular expressions and can be elegantly implemented
       
   123 using Scala's pattern-matching. The implementation should deal with the
       
   124 following regular expressions, which have been predefined in the file
       
   125 \texttt{re.scala}:
       
   126 
       
   127 \begin{center}
       
   128 \begin{tabular}{lcll}
       
   129   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
       
   130       &   $|$ & $\ONE$      & can only match the empty string\\
       
   131       &   $|$ & $c$         & can match a single character (in this case $c$)\\
       
   132       &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
       
   133   &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
       
   134           &  & & then the second part with $r_2$\\
       
   135       &   $|$ & $r^*$       & can match a string with zero or more copies of $r$\\
       
   136 \end{tabular}
       
   137 \end{center}
       
   138 
       
   139 \noindent 
       
   140 Why? Knowing how to match regular expressions and strings will let you
       
   141 solve a lot of problems that vex other humans. Regular expressions are
       
   142 one of the fastest and simplest ways to match patterns in text, and
       
   143 are endlessly useful for searching, editing and analysing data in all
       
   144 sorts of places (for example analysing network traffic in order to
       
   145 detect security breaches). However, you need to be fast, otherwise you
       
   146 will stumble over problems such as recently reported at
       
   147 
       
   148 {\small
       
   149 \begin{itemize}
       
   150 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
       
   151 \item[$\bullet$] \url{https://vimeo.com/112065252}
       
   152 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
       
   153 \end{itemize}}
       
   154 
       
   155 \subsubsection*{Tasks (file re.scala)}
       
   156 
       
   157 The file \texttt{re.scala} has already a definition for regular
       
   158 expressions and also defines some handy shorthand notation for
       
   159 regular expressions. The notation in this document matches up
       
   160 with the code in the file as follows:
       
   161 
       
   162 \begin{center}
       
   163   \begin{tabular}{rcl@{\hspace{10mm}}l}
       
   164     & & code: & shorthand:\smallskip \\ 
       
   165   $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
       
   166   $\ONE$  & $\mapsto$ & \texttt{ONE}\\
       
   167   $c$     & $\mapsto$ & \texttt{CHAR(c)}\\
       
   168   $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
       
   169   $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
       
   170   $r^*$ & $\mapsto$ &  \texttt{STAR(r)} & \texttt{r.\%}
       
   171 \end{tabular}    
       
   172 \end{center}  
       
   173 
       
   174 
       
   175 \begin{itemize}
       
   176 \item[(1)] Implement a function, called \textit{nullable}, by
       
   177   recursion over regular expressions. This function tests whether a
       
   178   regular expression can match the empty string. This means given a
       
   179   regular expression it either returns true or false. The function
       
   180   \textit{nullable}
       
   181   is defined as follows:
       
   182 
       
   183 \begin{center}
       
   184 \begin{tabular}{lcl}
       
   185 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
       
   186 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
       
   187 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
       
   188 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
       
   189 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
       
   190 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
       
   191 \end{tabular}
       
   192 \end{center}~\hfill[1 Mark]
       
   193 
       
   194 \item[(2)] Implement a function, called \textit{der}, by recursion over
       
   195   regular expressions. It takes a character and a regular expression
       
   196   as arguments and calculates the derivative regular expression according
       
   197   to the rules:
       
   198 
       
   199 \begin{center}
       
   200 \begin{tabular}{lcl}
       
   201 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
       
   202 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
       
   203 $\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
       
   204 $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
       
   205 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
       
   206       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
       
   207       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
       
   208 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
       
   209 \end{tabular}
       
   210 \end{center}
       
   211 
       
   212 For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
       
   213 w.r.t.~the characters $a$, $b$ and $c$ are
       
   214 
       
   215 \begin{center}
       
   216   \begin{tabular}{lcll}
       
   217     $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & \quad($= r'$)\\
       
   218     $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
       
   219     $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
       
   220   \end{tabular}
       
   221 \end{center}
       
   222 
       
   223 Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
       
   224 w.r.t.~the characters $a$, $b$ and $c$ gives
       
   225 
       
   226 \begin{center}
       
   227   \begin{tabular}{lcll}
       
   228     $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
       
   229     $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & \quad($= r''$)\\
       
   230     $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
       
   231   \end{tabular}
       
   232 \end{center}
       
   233 
       
   234 One more example: Let $r''$ stand for the second derivative above,
       
   235 then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
       
   236 and $c$ gives
       
   237 
       
   238 \begin{center}
       
   239   \begin{tabular}{lcll}
       
   240     $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
       
   241     $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
       
   242     $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ &
       
   243     (is $\textit{nullable}$)                      
       
   244   \end{tabular}
       
   245 \end{center}
       
   246 
       
   247 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
       
   248 \mbox{}\hfill\mbox{[1 Mark]}
       
   249 
       
   250 \item[(3)] Implement the function \textit{simp}, which recursively
       
   251   traverses a regular expression from the inside to the outside, and
       
   252   on the way simplifies every regular expression on the left (see
       
   253   below) to the regular expression on the right, except it does not
       
   254   simplify inside ${}^*$-regular expressions.
       
   255 
       
   256   \begin{center}
       
   257 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
       
   258 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
       
   259 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
       
   260 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
       
   261 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
       
   262 $r + \ZERO$ & $\mapsto$ & $r$\\ 
       
   263 $\ZERO + r$ & $\mapsto$ & $r$\\ 
       
   264 $r + r$ & $\mapsto$ & $r$\\ 
       
   265 \end{tabular}
       
   266   \end{center}
       
   267 
       
   268   For example the regular expression
       
   269   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
       
   270 
       
   271   simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
       
   272   seen as trees and there are several methods for traversing
       
   273   trees. One of them corresponds to the inside-out traversal, which is
       
   274   sometimes also called post-order traversal'' you traverse inside the
       
   275   tree and on the way up, you apply simplification rules.
       
   276   Furthermore,
       
   277   remember numerical expressions from school times: there you had expressions
       
   278   like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
       
   279   and simplification rules that looked very similar to rules
       
   280   above. You would simplify such numerical expressions by replacing
       
   281   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
       
   282   look whether more rules are applicable. If you organise the
       
   283   simplification in an inside-out fashion, it is always clear which
       
   284   rule should be applied next.\hfill[1 Mark]
       
   285 
       
   286 \item[(4)] Implement two functions: The first, called \textit{ders},
       
   287   takes a list of characters and a regular expression as arguments, and
       
   288   builds the derivative w.r.t.~the list as follows:
       
   289 
       
   290 \begin{center}
       
   291 \begin{tabular}{lcl}
       
   292 $\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
       
   293   $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
       
   294     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
       
   295 \end{tabular}
       
   296 \end{center}
       
   297 
       
   298 Note that this function is different from \textit{der}, which only
       
   299 takes a single character.
       
   300 
       
   301 The second function, called \textit{matcher}, takes a string and a
       
   302 regular expression as arguments. It builds first the derivatives
       
   303 according to \textit{ders} and after that tests whether the resulting
       
   304 derivative regular expression can match the empty string (using
       
   305 \textit{nullable}).  For example the \textit{matcher} will produce
       
   306 true for the regular expression $(a\cdot b)\cdot c$ and the string
       
   307 $abc$, but false if you give it the string $ab$. \hfill[1 Mark]
       
   308 
       
   309 \item[(5)] Implement a function, called \textit{size}, by recursion
       
   310   over regular expressions. If a regular expression is seen as a tree,
       
   311   then \textit{size} should return the number of nodes in such a
       
   312   tree. Therefore this function is defined as follows:
       
   313 
       
   314 \begin{center}
       
   315 \begin{tabular}{lcl}
       
   316 $\textit{size}(\ZERO)$ & $\dn$ & $1$\\
       
   317 $\textit{size}(\ONE)$  & $\dn$ & $1$\\
       
   318 $\textit{size}(c)$     & $\dn$ & $1$\\
       
   319 $\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
       
   320 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
       
   321 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
       
   322 \end{tabular}
       
   323 \end{center}
       
   324 
       
   325 You can use \textit{size} in order to test how much the `evil' regular
       
   326 expression $(a^*)^* \cdot b$ grows when taking successive derivatives
       
   327 according the letter $a$ without simplification and then compare it to
       
   328 taking the derivative, but simplify the result.  The sizes
       
   329 are given in \texttt{re.scala}. \hfill[1 Mark]
       
   330 
       
   331 \item[(6)] You do not have to implement anything specific under this
       
   332   task.  The purpose is that you will be marked for some ``power''
       
   333   test cases. For example can your matcher decide withing 30 seconds
       
   334   whether the regular expression $(a^*)^*\cdot b$ matches strings of the
       
   335   form $aaa\ldots{}aaaa$, for say 1 Million $a$'s. And does simplification
       
   336   simplify the regular expression
       
   337 
       
   338   \[
       
   339   \texttt{SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)}
       
   340   \]  
       
   341 
       
   342   \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
       
   343   50 or more times.\\
       
   344   \mbox{}\hfill[1 Mark]
       
   345 \end{itemize}
       
   346 
       
   347 \subsection*{Background}
       
   348 
       
   349 Although easily implementable in Scala, the idea behind the derivative
       
   350 function might not so easy to be seen. To understand its purpose
       
   351 better, assume a regular expression $r$ can match strings of the form
       
   352 $c\!::\!cs$ (that means strings which start with a character $c$ and have
       
   353 some rest, or tail, $cs$). If you take the derivative of $r$ with
       
   354 respect to the character $c$, then you obtain a regular expression
       
   355 that can match all the strings $cs$.  In other words, the regular
       
   356 expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
       
   357 that can be matched by $r$, except that the $c$ is chopped off.
       
   358 
       
   359 Assume now $r$ can match the string $abc$. If you take the derivative
       
   360 according to $a$ then you obtain a regular expression that can match
       
   361 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now
       
   362 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
       
   363 obtain a regular expression that can match the string $c$ (it is $bc$
       
   364 where $b$ is chopped off). If you finally build the derivative of this
       
   365 according $c$, that is
       
   366 $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
       
   367 a regular expression that can match the empty string. You can test
       
   368 whether this is indeed the case using the function nullable, which is
       
   369 what your matcher is doing.
       
   370 
       
   371 The purpose of the $\textit{simp}$ function is to keep the regular
       
   372 expressions small. Normally the derivative function makes the regular
       
   373 expression bigger (see the SEQ case and the example in (2)) and the
       
   374 algorithm would be slower and slower over time. The $\textit{simp}$
       
   375 function counters this increase in size and the result is that the
       
   376 algorithm is fast throughout.  By the way, this algorithm is by Janusz
       
   377 Brzozowski who came up with the idea of derivatives in 1964 in his PhD
       
   378 thesis.
       
   379 
       
   380 \begin{center}\small
       
   381 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
       
   382 \end{center}
       
   383 
       
   384 
       
   385 If you want to see how badly the regular expression matchers do in
       
   386 Java\footnote{Version 8 and below; Version 9 and above does not seem to be as
       
   387   catastrophic, but still much worse than the regular expression
       
   388   matcher based on derivatives.}, JavaScript and in Python with the
       
   389 `evil' regular expression $(a^*)^*\cdot b$, then have a look at the
       
   390 graphs below (you can try it out for yourself: have a look at the file
       
   391 \texttt{catastrophic9.java}, \texttt{catastrophic.js} and
       
   392 \texttt{catastrophic.py} on KEATS). Compare this with the matcher you
       
   393 have implemented. How long can the string of $a$'s be in your matcher
       
   394 and still stay within the 30 seconds time limit?
       
   395 
       
   396 \begin{center}
       
   397 \begin{tabular}{@{}cc@{}}
       
   398 \multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings 
       
   399            $\underbrace{a\ldots a}_{n}$}\bigskip\\
       
   400   
       
   401 \begin{tikzpicture}
       
   402 \begin{axis}[
       
   403     xlabel={$n$},
       
   404     x label style={at={(1.05,0.0)}},
       
   405     ylabel={time in secs},
       
   406     y label style={at={(0.06,0.5)}},
       
   407     enlargelimits=false,
       
   408     xtick={0,5,...,30},
       
   409     xmax=33,
       
   410     ymax=45,
       
   411     ytick={0,5,...,40},
       
   412     scaled ticks=false,
       
   413     axis lines=left,
       
   414     width=6cm,
       
   415     height=5.5cm, 
       
   416     legend entries={Python, Java 8, JavaScript},  
       
   417     legend pos=north west]
       
   418 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
       
   419 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
       
   420 \addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
       
   421 \end{axis}
       
   422 \end{tikzpicture}
       
   423   & 
       
   424 \begin{tikzpicture}
       
   425 \begin{axis}[
       
   426     xlabel={$n$},
       
   427     x label style={at={(1.05,0.0)}},
       
   428     ylabel={time in secs},
       
   429     y label style={at={(0.06,0.5)}},
       
   430     %enlargelimits=false,
       
   431     %xtick={0,5000,...,30000},
       
   432     xmax=65000,
       
   433     ymax=45,
       
   434     ytick={0,5,...,40},
       
   435     scaled ticks=false,
       
   436     axis lines=left,
       
   437     width=6cm,
       
   438     height=5.5cm, 
       
   439     legend entries={Java 9},  
       
   440     legend pos=north west]
       
   441 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};
       
   442 \end{axis}
       
   443 \end{tikzpicture}
       
   444 \end{tabular}  
       
   445 \end{center}
       
   446 \newpage
       
   447 
       
   448 \subsection*{Part 2 (4 Marks)}
       
   449 
       
   450 Coming from Java or C++, you might think Scala is a quite esoteric
       
   451 programming language.  But remember, some serious companies have built
    38 programming language.  But remember, some serious companies have built
   452 their business on
    39 their business on
   453 Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
    40 Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
   454 And there are far, far more esoteric languages out there. One is
    41 And there are far, far more esoteric languages out there. One is
   455 called \emph{brainf***}. You are asked in this part to implement an
    42 called \emph{brainf***}. You are asked in this part to implement an
   456 interpreter for this language.
    43 interpreter and compiler for this language.
   457 
    44 
   458 Urban M\"uller developed brainf*** in 1993.  A close relative of this
    45 Urban M\"uller developed brainf*** in 1993.  A close relative of this
   459 language was already introduced in 1964 by Corado B\"ohm, an Italian
    46 language was already introduced in 1964 by Corado B\"ohm, an Italian
   460 computer pioneer, who unfortunately died a few months ago. The main
    47 computer pioneer. The main feature of brainf*** is its minimalistic
   461 feature of brainf*** is its minimalistic set of instructions---just 8
    48 set of instructions---just 8 instructions in total and all of which
   462 instructions in total and all of which are single characters. Despite
    49 are single characters. Despite the minimalism, this language has been
   463 the minimalism, this language has been shown to be Turing
    50 shown to be Turing complete\ldots{}if this doesn't ring any bell with
   464 complete\ldots{}if this doesn't ring any bell with you: it roughly
    51 you: it roughly means that every algorithm we know can, in principle,
   465 means that every algorithm we know can, in principle, be implemented in
    52 be implemented in brainf***. It just takes a lot of determination and
   466 brainf***. It just takes a lot of determination and quite a lot of
    53 quite a lot of memory resources. Some relatively sophisticated sample
   467 memory resources. Some relatively sophisticated sample programs in
    54 programs in brainf*** are given in the file \texttt{bf.scala}, including
   468 brainf*** are given in the file \texttt{bf.scala}.\bigskip
    55 a brainf*** program for the Sierpinski triangle and Mandelbot set.\bigskip
   469 
    56 
   470 \noindent
    57 \noindent
   471 As mentioned above, brainf*** has 8 single-character commands, namely
    58 As mentioned above, brainf*** has 8 single-character commands, namely
   472 \texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'},
    59 \texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'},
   473 \texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is
    60 \texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is
   498 This one prints out Hello World\ldots{}obviously. 
    85 This one prints out Hello World\ldots{}obviously. 
   499 
    86 
   500 \subsubsection*{Tasks (file bf.scala)}
    87 \subsubsection*{Tasks (file bf.scala)}
   501 
    88 
   502 \begin{itemize}
    89 \begin{itemize}
   503 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from
    90 \item[(1)]  Write a function that takes a file name as argument and
       
    91   and requests the corresponding file from disk. It returns the
       
    92   content of the file as a String. If the file does not exists,
       
    93   the function should return the empty string.\\
       
    94   \mbox{}\hfill[1 Mark]
       
    95   
       
    96 \item[(2)] Brainf*** memory is represented by a \texttt{Map} from
   504   integers to integers. The empty memory is represented by
    97   integers to integers. The empty memory is represented by
   505   \texttt{Map()}, that is nothing is stored in the
    98   \texttt{Map()}, that is nothing is stored in the
   506   memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at
    99   memory; \texttt{Map(0 -> 1, 2 -> 3)} stores \texttt{1} at
   507   memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The
   100   memory location \texttt{0}, and at \texttt{2} it stores \texttt{3}. The
   508   convention is that if we query the memory at a location that is
   101   convention is that if we query the memory at a location that is
   509   \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
   102   \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
   510   a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
   103   a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
   511   a memory pointer (an \texttt{Int}) as argument, and safely reads the
   104   a memory pointer (an \texttt{Int}) as argument, and `safely' reads the
   512   corresponding memory location. If the \texttt{Map} is not defined at
   105   corresponding memory location. If the \texttt{Map} is not defined at
   513   the memory pointer, \texttt{sread} returns \texttt{0}.
   106   the memory pointer, \texttt{sread} returns \texttt{0}.
   514 
   107 
   515   Write another function \texttt{write}, which takes a memory, a
   108   Write another function \texttt{write}, which takes a memory, a
   516   memory pointer and an integer value as argument and updates the
   109   memory pointer and an integer value as argument and updates the
   517   \texttt{Map} with the value at the given memory location. As usual
   110   \texttt{Map} with the value at the given memory location. As usual
   518   the \texttt{Map} is not updated `in-place' but a new map is created
   111   the \texttt{Map} is not updated `in-place' but a new map is created
   519   with the same data, except the value is stored at the given memory
   112   with the same data, except the value is stored at the given memory
   520   pointer.\hfill[1 Mark]
   113   pointer.\hfill[1 Mark]
   521 
   114 
   522 \item[(2b)] Write two functions, \texttt{jumpRight} and
   115 \item[(3)] Write two functions, \texttt{jumpRight} and
   523   \texttt{jumpLeft} that are needed to implement the loop constructs
   116   \texttt{jumpLeft} that are needed to implement the loop constructs
   524   of brainf***. They take a program (a \texttt{String}) and a program
   117   of brainf***. They take a program (a \texttt{String}) and a program
   525   counter (an \texttt{Int}) as argument and move right (respectively
   118   counter (an \texttt{Int}) as argument and move right (respectively
   526   left) in the string in order to find the \textbf{matching}
   119   left) in the string in order to find the \textbf{matching}
   527   opening/closing bracket. For example, given the following program
   120   opening/closing bracket. For example, given the following program
   584   \begin{center}
   177   \begin{center}
   585   \texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++}
   178   \texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++}
   586   \qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
   179   \qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
   587   \texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}}
   180   \texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}}
   588   \end{center}
   181   \end{center}
   589   \hfill[1 Mark]
   182   \hfill[2 Marks]
   590 
   183 
   591 
   184 
   592 \item[(2c)] Write a recursive function \texttt{run} that executes a
   185 \item[(4)] Write a recursive function \texttt{run} that executes a
   593   brainf*** program. It takes a program, a program counter, a memory
   186   brainf*** program. It takes a program, a program counter, a memory
   594   pointer and a memory as arguments. If the program counter is outside
   187   pointer and a memory as arguments. If the program counter is outside
   595   the program string, the execution stops and \texttt{run} returns the
   188   the program string, the execution stops and \texttt{run} returns the
   596   memory. If the program counter is inside the string, it reads the
   189   memory. If the program counter is inside the string, it reads the
   597   corresponding character and updates the program counter \texttt{pc},
   190   corresponding character and updates the program counter \texttt{pc},
   598   memory pointer \texttt{mp} and memory \texttt{mem} according to the
   191   memory pointer \texttt{mp} and memory \texttt{mem} according to the
   599   rules shown in Figure~\ref{comms}. It then calls recursively
   192   rules shown in Figure~\ref{comms}. It then calls recursively
   600   \texttt{run} with the updated data.
   193   \texttt{run} with the updated data. The most convenient way to
       
   194   implement the rules in \texttt{run} is to use pattern-matching
       
   195   and calculating a triple consisting of the new \texttt{pc},
       
   196   \texttt{mp} and \texttt{mem}.
   601 
   197 
   602   Write another function \texttt{start} that calls \texttt{run} with a
   198   Write another function \texttt{start} that calls \texttt{run} with a
   603   given brainfu** program and memory, and the program counter and memory pointer
   199   given brainfu** program and memory, and the program counter and memory pointer
   604   set to~$0$. Like \texttt{run} it returns the memory after the execution
   200   set to~$0$. Like \texttt{run} it returns the memory after the execution
   605   of the program finishes. You can test your brainf**k interpreter with the
   201   of the program finishes. You can test your brainf**k interpreter with the
   606   Sierpinski triangle or the Hello world programs or have a look at
   202   Sierpinski triangle or the Hello world programs (they seem to be particularly
       
   203   useful for debugging purposes), or have a look at
   607 
   204 
   608   \begin{center}
   205   \begin{center}
   609   \url{https://esolangs.org/wiki/Brainfuck}
   206   \url{https://esolangs.org/wiki/Brainfuck}
   610   \end{center}\hfill[2 Marks]
   207   \end{center}\hfill[2 Marks]
   611   
   208   
   671   \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
   268   \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
   672     memory pointer \texttt{mp} and memory \texttt{mem}.\label{comms}}
   269     memory pointer \texttt{mp} and memory \texttt{mem}.\label{comms}}
   673   \end{figure}
   270   \end{figure}
   674 \end{itemize}\bigskip  
   271 \end{itemize}\bigskip  
   675 
   272 
   676 
   273 \subsection*{Part 2 (4 Marks)}
       
   274 
       
   275 While it is fun to look at bf-programs, like the Sierpinski triangle or the Mandelbrot
       
   276 program, being interpreted, it is much more fun to write a compiler for the bf-language.
   677 
   277 
   678 
   278 
   679 \end{document}
   279 \end{document}
   680 
   280 
   681 
   281