77 \begin{document} |
91 \begin{document} |
78 |
92 |
79 % BF IDE |
93 % BF IDE |
80 % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5 |
94 % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5 |
81 |
95 |
82 \section*{Coursework 8 (Regular Expressions and Brainf***)} |
96 \section*{Coursework 9 (Scala)} |
83 |
97 |
84 This coursework is worth 10\%. It is about regular expressions, |
98 This coursework is worth 10\%. It is about a regular expression |
85 pattern matching and an interpreter. The first part is due on 30 |
99 matcher and the shunting yard algorithm by Dijkstra. The first part is |
86 November at 11pm; the second, more advanced part, is due on 21 |
100 due on 6 December at 11pm; the second, more advanced part, is due on |
87 December at 11pm. In the first part, you are asked to implement a |
101 21 December at 11pm. In the first part, you are asked to implement a |
88 regular expression matcher based on derivatives of regular |
102 regular expression matcher based on derivatives of regular |
89 expressions. The reason is that regular expression matching in Java |
103 expressions. The background is that regular expression matching in |
90 and Python can sometimes be extremely slow. The advanced part is about |
104 languages like Java, JavaScript and Python can sometimes be excruciatingly |
91 an interpreter for a very simple programming language.\bigskip |
105 slow. The advanced part is about the shunting yard algorithm that |
|
106 transforms the usual infix notation of arithmetic expressions into the |
|
107 postfix notation, which is for example used in compilers.\bigskip |
92 |
108 |
93 \IMPORTANT{} |
109 \IMPORTANT{} |
94 |
110 |
95 \noindent |
111 \noindent |
96 Also note that the running time of each part will be restricted to a |
112 Also note that the running time of each part will be restricted to a |
97 maximum of 30 seconds on my laptop. |
113 maximum of 30 seconds on my laptop. |
98 |
114 |
99 \DISCLAIMER{} |
115 \DISCLAIMER{} |
|
116 |
|
117 \subsection*{Reference Implementation} |
|
118 |
|
119 This Scala assignment comes with three reference implementations in form of |
|
120 \texttt{jar}-files. This allows you to run any test cases on your own |
|
121 computer. For example you can call Scala on the command line with the |
|
122 option \texttt{-cp re.jar} and then query any function from the |
|
123 \texttt{re.scala} template file. As usual you have to |
|
124 prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}. |
|
125 Since some tasks are time sensitive, you can check the reference |
|
126 implementation as follows: if you want to know how long it takes |
|
127 to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$ |
|
128 you can querry as follows: |
|
129 |
|
130 |
|
131 |
|
132 \begin{lstlisting}[language={},xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small] |
|
133 $ scala -cp re.jar |
|
134 scala> import CW9a._ |
|
135 scala> for (i <- 0 to 5000000 by 500000) { |
|
136 | println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL, "a" * i))) + "secs.") |
|
137 | } |
|
138 0 0.00002 secs. |
|
139 500000 0.10608 secs. |
|
140 1000000 0.22286 secs. |
|
141 1500000 0.35982 secs. |
|
142 2000000 0.45828 secs. |
|
143 2500000 0.59558 secs. |
|
144 3000000 0.73191 secs. |
|
145 3500000 0.83499 secs. |
|
146 4000000 0.99149 secs. |
|
147 4500000 1.15395 secs. |
|
148 5000000 1.29659 secs. |
|
149 \end{lstlisting}%$ |
100 |
150 |
101 |
151 |
102 \subsection*{Part 1 (6 Marks)} |
152 \subsection*{Part 1 (6 Marks)} |
103 |
153 |
104 The task is to implement a regular expression matcher that is based on |
154 The task is to implement a regular expression matcher that is based on |
411 \end{center} |
482 \end{center} |
412 \newpage |
483 \newpage |
413 |
484 |
414 \subsection*{Part 2 (4 Marks)} |
485 \subsection*{Part 2 (4 Marks)} |
415 |
486 |
416 Coming from Java or C++, you might think Scala is a quite esoteric |
487 The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra, |
417 programming language. But remember, some serious companies have built |
488 an influential computer scientist who developed many well-known |
418 their business on |
489 algorithms. This algorithm transforms the usual infix notation of |
419 Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}} |
490 arithmetic expressions into the postfix notation, sometimes also |
420 And there are far, far more esoteric languages out there. One is |
491 called reverse Polish notation. |
421 called \emph{brainf***}. You are asked in this part to implement an |
492 |
422 interpreter for this language. |
493 Why on Earth do people use the postfix notation? It is much more |
423 |
494 convenient to work with the usual infix notation for arithmetic |
424 Urban M\"uller developed brainf*** in 1993. A close relative of this |
495 expressions. Most modern calculators (as opposed to the ones used 20 |
425 language was already introduced in 1964 by Corado B\"ohm, an Italian |
496 years ago) understand infix notation. So why on Earth? \ldots{}Well, |
426 computer pioneer, who unfortunately died a few months ago. The main |
497 many computers under the hood, even nowadays, use postfix notation |
427 feature of brainf*** is its minimalistic set of instructions---just 8 |
498 extensively. For example if you give to the Java compiler the |
428 instructions in total and all of which are single characters. Despite |
499 expression $1 + ((2 * 3) + (4 - 3))$, it will generate for it the Java Byte |
429 the minimalism, this language has been shown to be Turing |
500 code |
430 complete\ldots{}if this doesn't ring any bell with you: it roughly |
501 |
431 means that every algorithm we know can, in principle, be implemented in |
502 \begin{lstlisting}[language=JVMIS,numbers=none] |
432 brainf***. It just takes a lot of determination and quite a lot of |
503 ldc 1 |
433 memory resources. Some relatively sophisticated sample programs in |
504 ldc 2 |
434 brainf*** are given in the file \texttt{bf.scala}.\bigskip |
505 ldc 3 |
|
506 imul |
|
507 ldc 4 |
|
508 ldc 3 |
|
509 isub |
|
510 iadd |
|
511 iadd |
|
512 \end{lstlisting} |
435 |
513 |
436 \noindent |
514 \noindent |
437 As mentioned above, brainf*** has 8 single-character commands, namely |
515 where the command \texttt{ldc} loads a constant onto a stack, and \texttt{imul}, |
438 \texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'}, |
516 \texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this |
439 \texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is |
517 is the arithmetic expression in postfix notation.\bigskip |
440 considered a comment. Brainf*** operates on memory cells containing |
|
441 integers. For this it uses a single memory pointer that points at each |
|
442 stage to one memory cell. This pointer can be moved forward by one |
|
443 memory cell by using the command \texttt{'>'}, and backward by using |
|
444 \texttt{'<'}. The commands \texttt{'+'} and \texttt{'-'} increase, |
|
445 respectively decrease, by 1 the content of the memory cell to which |
|
446 the memory pointer currently points to. The commands for input/output |
|
447 are \texttt{','} and \texttt{'.'}. Output works by reading the content |
|
448 of the memory cell to which the memory pointer points to and printing |
|
449 it out as an ASCII character. Input works the other way, taking some |
|
450 user input and storing it in the cell to which the memory pointer |
|
451 points to. The commands \texttt{'['} and \texttt{']'} are looping |
|
452 constructs. Everything in between \texttt{'['} and \texttt{']'} is |
|
453 repeated until a counter (memory cell) reaches zero. A typical |
|
454 program in brainf*** looks as follows: |
|
455 |
|
456 \begin{center} |
|
457 \begin{verbatim} |
|
458 ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++ |
|
459 ..+++.>>.<-.<.+++.------.--------.>>+.>++. |
|
460 \end{verbatim} |
|
461 \end{center} |
|
462 |
518 |
463 \noindent |
519 \noindent |
464 This one prints out Hello World\ldots{}obviously. |
520 The shunting yard algorithm processes an input token list using an |
465 |
521 operator stack and an output list. The input consists of numbers, |
466 \subsubsection*{Tasks (file bf.scala)} |
522 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of |
|
523 the assignment we assume the input is always a well-formed expression |
|
524 in infix notation. The algorithm uses information about the |
|
525 precedences of the operators (given in the template file). The |
|
526 algorithm processes the input token list as follows: |
467 |
527 |
468 \begin{itemize} |
528 \begin{itemize} |
469 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from |
529 \item If there is a number as input token, then this token is |
470 integers to integers. The empty memory is represented by |
530 transferred to the output list. Then the rest of the input is |
471 \texttt{Map()}, that is nothing is stored in the |
531 processed. |
472 memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at |
532 \item If there is an operator as input token, then you need to check |
473 memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The |
533 what is on top of the operator stack. If there are operators with |
474 convention is that if we query the memory at a location that is |
534 a higher or equal precedence, these operators are first popped off |
475 \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write |
535 the stack and moved to the output list. Then the operator from the input |
476 a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and |
536 is pushed onto the stack and the rest of the input is processed. |
477 a memory pointer (an \texttt{Int}) as argument, and safely reads the |
537 \item If the input is a left-parenthesis, you push it on to the stack |
478 corresponding memory location. If the \texttt{Map} is not defined at |
538 and continue processing the input. |
479 the memory pointer, \texttt{sread} returns \texttt{0}. |
539 \item If the input is a right-parenthesis, then you move all operators |
480 |
540 from the stack to the output list until you reach the left-parenthesis. |
481 Write another function \texttt{write}, which takes a memory, a |
541 Then you discharge the $($ and $)$ from the input and stack, and continue |
482 memory pointer and an integer value as argument and updates the |
542 processing. |
483 \texttt{Map} with the value at the given memory location. As usual |
543 \item If the input is empty, then you move all remaining operators |
484 the \texttt{Map} is not updated `in-place' but a new map is created |
544 from the stack to the output list. |
485 with the same data, except the value is stored at the given memory |
545 \end{itemize} |
486 pointer.\hfill[1 Mark] |
546 |
487 |
547 \subsubsection*{Tasks (file postfix.scala)} |
488 \item[(2b)] Write two functions, \texttt{jumpRight} and |
548 |
489 \texttt{jumpLeft} that are needed to implement the loop constructs |
549 \begin{itemize} |
490 of brainf***. They take a program (a \texttt{String}) and a program |
550 \item[(7)] Implement the shunting yard algorithm outlined above. The |
491 counter (an \texttt{Int}) as argument and move right (respectively |
551 function, called \texttt{syard}, takes a list of tokens as first |
492 left) in the string in order to find the \textbf{matching} |
552 argument. The second and third arguments are the stack and output |
493 opening/closing bracket. For example, given the following program |
553 list represented as Scala lists. The most convenient way to |
494 with the program counter indicated by an arrow: |
554 implement this algorithm is to analyse what the input list, stack |
495 |
555 and output look like in each step using pattern-matching. The |
496 \begin{center} |
556 algorithm transforms for example the input |
497 \texttt{--[\barbelow{.}.+>--],>,++} |
557 |
498 \end{center} |
558 \[ |
499 |
559 \texttt{List(3, +, 4, *, (, 2, -, 1, ))} |
500 then the matching closing bracket is in 9th position (counting from 0) and |
560 \] |
501 \texttt{jumpRight} is supposed to return the position just after this |
561 |
|
562 into the postfix output |
|
563 |
|
564 \[ |
|
565 \texttt{List(3, 4, 2, 1, -, *, +)} |
|
566 \] |
502 |
567 |
503 \begin{center} |
568 You can assume the input list is always a list representing |
504 \texttt{--[..+>--]\barbelow{,}>,++} |
569 a well-formed infix arithmetic expression.\hfill[1 Mark] |
505 \end{center} |
570 |
506 |
571 \item[(8)] Implement a compute function that takes a postfix expression |
507 meaning it jumps to after the loop. Similarly, if you are in 8th position |
572 as argument and evaluates it generating an integer as result. It uses a |
508 then \texttt{jumpLeft} is supposed to jump to just after the opening |
573 stack to evaluate the postfix expression. The operators $+$, $-$, $*$ |
509 bracket (that is jumping to the beginning of the loop): |
574 are as usual; $/$ is division on integers, for example $7 / 3 = 2$. |
510 |
|
511 \begin{center} |
|
512 \texttt{--[..+>-\barbelow{-}],>,++} |
|
513 \qquad$\stackrel{\texttt{jumpLeft}}{\longrightarrow}$\qquad |
|
514 \texttt{--[\barbelow{.}.+>--],>,++} |
|
515 \end{center} |
|
516 |
|
517 Unfortunately we have to take into account that there might be |
|
518 other opening and closing brackets on the `way' to find the |
|
519 matching bracket. For example in the brainf*** program |
|
520 |
|
521 \begin{center} |
|
522 \texttt{--[\barbelow{.}.[+>]--],>,++} |
|
523 \end{center} |
|
524 |
|
525 we do not want to return the index for the \texttt{'-'} in the 9th |
|
526 position, but the program counter for \texttt{','} in 12th |
|
527 position. The easiest to find out whether a bracket is matched is by |
|
528 using levels (which are the third argument in \texttt{jumpLeft} and |
|
529 \texttt{jumpLeft}). In case of \texttt{jumpRight} you increase the |
|
530 level by one whenever you find an opening bracket and decrease by |
|
531 one for a closing bracket. Then in \texttt{jumpRight} you are looking |
|
532 for the closing bracket on level \texttt{0}. For \texttt{jumpLeft} you |
|
533 do the opposite. In this way you can find \textbf{matching} brackets |
|
534 in strings such as |
|
535 |
|
536 \begin{center} |
|
537 \texttt{--[\barbelow{.}.[[-]+>[.]]--],>,++} |
|
538 \end{center} |
|
539 |
|
540 for which \texttt{jumpRight} should produce the position: |
|
541 |
|
542 \begin{center} |
|
543 \texttt{--[..[[-]+>[.]]--]\barbelow{,}>,++} |
|
544 \end{center} |
|
545 |
|
546 It is also possible that the position returned by \texttt{jumpRight} or |
|
547 \texttt{jumpLeft} is outside the string in cases where there are |
|
548 no matching brackets. For example |
|
549 |
|
550 \begin{center} |
|
551 \texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++} |
|
552 \qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad |
|
553 \texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}} |
|
554 \end{center} |
|
555 \hfill[1 Mark] |
575 \hfill[1 Mark] |
556 |
576 \end{itemize} |
557 |
577 |
558 \item[(2c)] Write a recursive function \texttt{run} that executes a |
578 \subsubsection*{Task (file postfix2.scala)} |
559 brainf*** program. It takes a program, a program counter, a memory |
579 |
560 pointer and a memory as arguments. If the program counter is outside |
580 \begin{itemize} |
561 the program string, the execution stops and \texttt{run} returns the |
581 \item[(9)] Extend the code in (7) and (8) to include the power |
562 memory. If the program counter is inside the string, it reads the |
582 operator. This requires proper account of associativity of |
563 corresponding character and updates the program counter \texttt{pc}, |
583 the operators. The power operator is right-associative, whereas the |
564 memory pointer \texttt{mp} and memory \texttt{mem} according to the |
584 other operators are left-associative. Left-associative operators |
565 rules shown in Figure~\ref{comms}. It then calls recursively |
585 are popped off if the precedence is bigger or equal, while |
566 \texttt{run} with the updated data. |
586 right-associative operators are only popped off if the precedence is |
567 |
587 bigger. The compute function in this task should use |
568 Write another function \texttt{start} that calls \texttt{run} with a |
588 \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks] |
569 given brainfu** program and memory, and the program counter and memory pointer |
589 \end{itemize} |
570 set to~$0$. Like \texttt{run} it returns the memory after the execution |
|
571 of the program finishes. You can test your brainf**k interpreter with the |
|
572 Sierpinski triangle or the Hello world programs or have a look at |
|
573 |
|
574 \begin{center} |
|
575 \url{https://esolangs.org/wiki/Brainfuck} |
|
576 \end{center}\hfill[2 Marks] |
|
577 |
|
578 \begin{figure}[p] |
|
579 \begin{center} |
|
580 \begin{tabular}{|@{}p{0.8cm}|l|} |
|
581 \hline |
|
582 \hfill\texttt{'>'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
583 $\bullet$ & $\texttt{pc} + 1$\\ |
|
584 $\bullet$ & $\texttt{mp} + 1$\\ |
|
585 $\bullet$ & \texttt{mem} unchanged |
|
586 \end{tabular}\\\hline |
|
587 \hfill\texttt{'<'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
588 $\bullet$ & $\texttt{pc} + 1$\\ |
|
589 $\bullet$ & $\texttt{mp} - 1$\\ |
|
590 $\bullet$ & \texttt{mem} unchanged |
|
591 \end{tabular}\\\hline |
|
592 \hfill\texttt{'+'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
593 $\bullet$ & $\texttt{pc} + 1$\\ |
|
594 $\bullet$ & $\texttt{mp}$ unchanged\\ |
|
595 $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) + 1}\\ |
|
596 \end{tabular}\\\hline |
|
597 \hfill\texttt{'-'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
598 $\bullet$ & $\texttt{pc} + 1$\\ |
|
599 $\bullet$ & $\texttt{mp}$ unchanged\\ |
|
600 $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\ |
|
601 \end{tabular}\\\hline |
|
602 \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
603 $\bullet$ & $\texttt{pc} + 1$\\ |
|
604 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\ |
|
605 $\bullet$ & print out \,\texttt{mem(mp)} as a character\\ |
|
606 \end{tabular}\\\hline |
|
607 \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
608 $\bullet$ & $\texttt{pc} + 1$\\ |
|
609 $\bullet$ & $\texttt{mp}$ unchanged\\ |
|
610 $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\ |
|
611 \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}} |
|
612 \end{tabular}\\\hline |
|
613 \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
614 \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\ |
|
615 $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\ |
|
616 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\ |
|
617 \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) != 0} then}\\ |
|
618 $\bullet$ & $\texttt{pc} + 1$\\ |
|
619 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\ |
|
620 \end{tabular} |
|
621 \\\hline |
|
622 \hfill\texttt{']'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
623 \multicolumn{2}{@{}l}{if \texttt{mem(mp) != 0} then}\\ |
|
624 $\bullet$ & $\texttt{pc = jumpLeft(prog, pc - 1, 0)}$\\ |
|
625 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\ |
|
626 \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\ |
|
627 $\bullet$ & $\texttt{pc} + 1$\\ |
|
628 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\ |
|
629 \end{tabular}\\\hline |
|
630 any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
|
631 $\bullet$ & $\texttt{pc} + 1$\\ |
|
632 $\bullet$ & \texttt{mp} and \texttt{mem} unchanged |
|
633 \end{tabular}\\ |
|
634 \hline |
|
635 \end{tabular} |
|
636 \end{center} |
|
637 \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc}, |
|
638 memory pointer \texttt{mp} and memory \texttt{mem}.\label{comms}} |
|
639 \end{figure} |
|
640 \end{itemize}\bigskip |
|
641 |
590 |
642 |
591 |
643 |
592 |
644 |
593 |
645 \end{document} |
594 \end{document} |