96 \section*{Coursework 9 (Scala)} |
96 \section*{Coursework 9 (Scala)} |
97 |
97 |
98 This coursework is worth 10\%. It is about a regular expression |
98 This coursework is worth 10\%. It is about a regular expression |
99 matcher and the shunting yard algorithm by Dijkstra. The first part is |
99 matcher and the shunting yard algorithm by Dijkstra. The first part is |
100 due on 6 December at 11pm; the second, more advanced part, is due on |
100 due on 6 December at 11pm; the second, more advanced part, is due on |
101 21 December at 11pm. In the first part, you are asked to implement a |
101 20 December at 11pm. In the first part, you are asked to implement a |
102 regular expression matcher based on derivatives of regular |
102 regular expression matcher based on derivatives of regular |
103 expressions. The background is that regular expression matching in |
103 expressions. The background is that regular expression matching in |
104 languages like Java, JavaScript and Python can sometimes be excruciatingly |
104 languages like Java, JavaScript and Python can sometimes be excruciatingly |
105 slow. The advanced part is about the shunting yard algorithm that |
105 slow. The advanced part is about the shunting yard algorithm that |
106 transforms the usual infix notation of arithmetic expressions into the |
106 transforms the usual infix notation of arithmetic expressions into the |
115 \DISCLAIMER{} |
115 \DISCLAIMER{} |
116 |
116 |
117 \subsection*{Reference Implementation} |
117 \subsection*{Reference Implementation} |
118 |
118 |
119 This Scala assignment comes with three reference implementations in form of |
119 This Scala assignment comes with three reference implementations in form of |
120 \texttt{jar}-files. This allows you to run any test cases on your own |
120 \texttt{jar}-files you can download from KEATS. This allows you to run any |
|
121 test cases on your own |
121 computer. For example you can call Scala on the command line with the |
122 computer. For example you can call Scala on the command line with the |
122 option \texttt{-cp re.jar} and then query any function from the |
123 option \texttt{-cp re.jar} and then query any function from the |
123 \texttt{re.scala} template file. As usual you have to |
124 \texttt{re.scala} template file. As usual you have to |
124 prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}. |
125 prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}. |
125 Since some tasks are time sensitive, you can check the reference |
126 Since some tasks are time sensitive, you can check the reference |
126 implementation as follows: if you want to know how long it takes |
127 implementation as follows: if you want to know, for example, how long it takes |
127 to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$ |
128 to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$ |
128 you can querry as follows: |
129 you can querry as follows: |
129 |
130 |
130 |
131 |
131 |
132 |
283 |
284 |
284 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ |
285 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ |
285 \mbox{}\hfill\mbox{[1 Mark]} |
286 \mbox{}\hfill\mbox{[1 Mark]} |
286 |
287 |
287 \item[(3)] Implement the function \textit{simp}, which recursively |
288 \item[(3)] Implement the function \textit{simp}, which recursively |
288 traverses a regular expression from the inside to the outside, and |
289 traverses a regular expression, and on the way up simplifies every |
289 on the way simplifies every regular expression on the left (see |
290 regular expression on the left (see below) to the regular expression |
290 below) to the regular expression on the right, except it does not |
291 on the right, except it does not simplify inside ${}^*$-regular |
291 simplify inside ${}^*$-regular expressions. |
292 expressions. |
292 |
293 |
293 \begin{center} |
294 \begin{center} |
294 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll} |
295 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll} |
295 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ |
296 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ |
296 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ |
297 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ |
306 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
307 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
307 |
308 |
308 simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be |
309 simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be |
309 seen as trees and there are several methods for traversing |
310 seen as trees and there are several methods for traversing |
310 trees. One of them corresponds to the inside-out traversal, which is |
311 trees. One of them corresponds to the inside-out traversal, which is |
311 sometimes also called post-order traversal'' you traverse inside the |
312 sometimes also called post-order traversal: you traverse inside the |
312 tree and on the way up, you apply simplification rules. |
313 tree and on the way up you apply simplification rules. |
313 Furthermore, |
314 Furthermore, |
314 remember numerical expressions from school times: there you had expressions |
315 remember numerical expressions from school times: there you had expressions |
315 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ |
316 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ |
316 and simplification rules that looked very similar to rules |
317 and simplification rules that looked very similar to rules |
317 above. You would simplify such numerical expressions by replacing |
318 above. You would simplify such numerical expressions by replacing |
318 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
319 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
319 look whether more rules are applicable. If you organise the |
320 look whether more rules are applicable. If you organise the |
320 simplification in an inside-out fashion, it is always clear which |
321 simplification in an inside-out fashion, it is always clear which |
321 rule should be applied next.\hfill[1 Mark] |
322 simplification should be applied next.\hfill[1 Mark] |
322 |
323 |
323 \item[(4)] Implement two functions: The first, called \textit{ders}, |
324 \item[(4)] Implement two functions: The first, called \textit{ders}, |
324 takes a list of characters and a regular expression as arguments, and |
325 takes a list of characters and a regular expression as arguments, and |
325 builds the derivative w.r.t.~the list as follows: |
326 builds the derivative w.r.t.~the list as follows: |
326 |
327 |
495 convenient to work with the usual infix notation for arithmetic |
496 convenient to work with the usual infix notation for arithmetic |
496 expressions. Most modern calculators (as opposed to the ones used 20 |
497 expressions. Most modern calculators (as opposed to the ones used 20 |
497 years ago) understand infix notation. So why on Earth? \ldots{}Well, |
498 years ago) understand infix notation. So why on Earth? \ldots{}Well, |
498 many computers under the hood, even nowadays, use postfix notation |
499 many computers under the hood, even nowadays, use postfix notation |
499 extensively. For example if you give to the Java compiler the |
500 extensively. For example if you give to the Java compiler the |
500 expression $1 + ((2 * 3) + (4 - 3))$, it will generate for it the Java Byte |
501 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte |
501 code |
502 code |
502 |
503 |
503 \begin{lstlisting}[language=JVMIS,numbers=none] |
504 \begin{lstlisting}[language=JVMIS,numbers=none] |
504 ldc 1 |
505 ldc 1 |
505 ldc 2 |
506 ldc 2 |
520 \noindent |
521 \noindent |
521 The shunting yard algorithm processes an input token list using an |
522 The shunting yard algorithm processes an input token list using an |
522 operator stack and an output list. The input consists of numbers, |
523 operator stack and an output list. The input consists of numbers, |
523 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of |
524 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of |
524 the assignment we assume the input is always a well-formed expression |
525 the assignment we assume the input is always a well-formed expression |
525 in infix notation. The algorithm uses information about the |
526 in infix notation. The calculation in the shunting yard algorithm uses |
|
527 information about the |
526 precedences of the operators (given in the template file). The |
528 precedences of the operators (given in the template file). The |
527 algorithm processes the input token list as follows: |
529 algorithm processes the input token list as follows: |
528 |
530 |
529 \begin{itemize} |
531 \begin{itemize} |
530 \item If there is a number as input token, then this token is |
532 \item If there is a number as input token, then this token is |
531 transferred to the output list. Then the rest of the input is |
533 transferred directly to the output list. Then the rest of the input is |
532 processed. |
534 processed. |
533 \item If there is an operator as input token, then you need to check |
535 \item If there is an operator as input token, then you need to check |
534 what is on top of the operator stack. If there are operators with |
536 what is on top of the operator stack. If there are operators with |
535 a higher or equal precedence, these operators are first popped off |
537 a higher or equal precedence, these operators are first popped off |
536 the stack and moved to the output list. Then the operator from the input |
538 from the stack and moved to the output list. Then the operator from the input |
537 is pushed onto the stack and the rest of the input is processed. |
539 is pushed onto the stack and the rest of the input is processed. |
538 \item If the input is a left-parenthesis, you push it on to the stack |
540 \item If the input is a left-parenthesis, you push it on to the stack |
539 and continue processing the input. |
541 and continue processing the input. |
540 \item If the input is a right-parenthesis, then you move all operators |
542 \item If the input is a right-parenthesis, then you pop off all operators |
541 from the stack to the output list until you reach the left-parenthesis. |
543 from the stack to the output list until you reach the left-parenthesis. |
542 Then you discharge the $($ and $)$ from the input and stack, and continue |
544 Then you discharge the $($ and $)$ from the input and stack, and continue |
543 processing. |
545 processing the input list. |
544 \item If the input is empty, then you move all remaining operators |
546 \item If the input is empty, then you move all remaining operators |
545 from the stack to the output list. |
547 from the stack to the output list. |
546 \end{itemize} |
548 \end{itemize} |
547 |
549 |
548 \subsubsection*{Tasks (file postfix.scala)} |
550 \subsubsection*{Tasks (file postfix.scala)} |
549 |
551 |
550 \begin{itemize} |
552 \begin{itemize} |
551 \item[(7)] Implement the shunting yard algorithm outlined above. The |
553 \item[(7)] Implement the shunting yard algorithm described above. The |
552 function, called \texttt{syard}, takes a list of tokens as first |
554 function, called \texttt{syard}, takes a list of tokens as first |
553 argument. The second and third arguments are the stack and output |
555 argument. The second and third arguments are the stack and output |
554 list represented as Scala lists. The most convenient way to |
556 list represented as Scala lists. The most convenient way to |
555 implement this algorithm is to analyse what the input list, stack |
557 implement this algorithm is to analyse what the input list, stack |
556 and output look like in each step using pattern-matching. The |
558 and output list look like in each step using pattern-matching. The |
557 algorithm transforms for example the input |
559 algorithm transforms for example the input |
558 |
560 |
559 \[ |
561 \[ |
560 \texttt{List(3, +, 4, *, (, 2, -, 1, ))} |
562 \texttt{List(3, +, 4, *, (, 2, -, 1, ))} |
561 \] |
563 \] |