cws/cw04.tex
changeset 224 42d760984496
parent 222 e52cc402caee
child 245 975d34506e88
equal deleted inserted replaced
223:c6453f3547ec 224:42d760984496
    96 \section*{Coursework 9 (Scala)}
    96 \section*{Coursework 9 (Scala)}
    97 
    97 
    98 This coursework is worth 10\%. It is about a regular expression
    98 This coursework is worth 10\%. It is about a regular expression
    99 matcher and the shunting yard algorithm by Dijkstra. The first part is
    99 matcher and the shunting yard algorithm by Dijkstra. The first part is
   100 due on 6 December at 11pm; the second, more advanced part, is due on
   100 due on 6 December at 11pm; the second, more advanced part, is due on
   101 21 December at 11pm. In the first part, you are asked to implement a
   101 20 December at 11pm. In the first part, you are asked to implement a
   102 regular expression matcher based on derivatives of regular
   102 regular expression matcher based on derivatives of regular
   103 expressions. The background is that regular expression matching in
   103 expressions. The background is that regular expression matching in
   104 languages like Java, JavaScript and Python can sometimes be excruciatingly
   104 languages like Java, JavaScript and Python can sometimes be excruciatingly
   105 slow. The advanced part is about the shunting yard algorithm that
   105 slow. The advanced part is about the shunting yard algorithm that
   106 transforms the usual infix notation of arithmetic expressions into the
   106 transforms the usual infix notation of arithmetic expressions into the
   115 \DISCLAIMER{}
   115 \DISCLAIMER{}
   116 
   116 
   117 \subsection*{Reference Implementation}
   117 \subsection*{Reference Implementation}
   118 
   118 
   119 This Scala assignment comes with three reference implementations in form of
   119 This Scala assignment comes with three reference implementations in form of
   120 \texttt{jar}-files. This allows you to run any test cases on your own
   120 \texttt{jar}-files you can download from KEATS. This allows you to run any
       
   121 test cases on your own
   121 computer. For example you can call Scala on the command line with the
   122 computer. For example you can call Scala on the command line with the
   122 option \texttt{-cp re.jar} and then query any function from the
   123 option \texttt{-cp re.jar} and then query any function from the
   123 \texttt{re.scala} template file. As usual you have to
   124 \texttt{re.scala} template file. As usual you have to
   124 prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}.
   125 prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}.
   125 Since some tasks are time sensitive, you can check the reference
   126 Since some tasks are time sensitive, you can check the reference
   126 implementation as follows: if you want to know how long it takes
   127 implementation as follows: if you want to know, for example, how long it takes
   127 to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$
   128 to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$
   128 you can querry as follows:
   129 you can querry as follows:
   129 
   130 
   130 
   131 
   131 
   132 
   283 
   284 
   284 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   285 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   285 \mbox{}\hfill\mbox{[1 Mark]}
   286 \mbox{}\hfill\mbox{[1 Mark]}
   286 
   287 
   287 \item[(3)] Implement the function \textit{simp}, which recursively
   288 \item[(3)] Implement the function \textit{simp}, which recursively
   288   traverses a regular expression from the inside to the outside, and
   289   traverses a regular expression, and on the way up simplifies every
   289   on the way simplifies every regular expression on the left (see
   290   regular expression on the left (see below) to the regular expression
   290   below) to the regular expression on the right, except it does not
   291   on the right, except it does not simplify inside ${}^*$-regular
   291   simplify inside ${}^*$-regular expressions.
   292   expressions.
   292 
   293 
   293   \begin{center}
   294   \begin{center}
   294 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
   295 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
   295 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   296 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   296 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   297 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   306   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   307   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   307 
   308 
   308   simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
   309   simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
   309   seen as trees and there are several methods for traversing
   310   seen as trees and there are several methods for traversing
   310   trees. One of them corresponds to the inside-out traversal, which is
   311   trees. One of them corresponds to the inside-out traversal, which is
   311   sometimes also called post-order traversal'' you traverse inside the
   312   sometimes also called post-order traversal: you traverse inside the
   312   tree and on the way up, you apply simplification rules.
   313   tree and on the way up you apply simplification rules.
   313   Furthermore,
   314   Furthermore,
   314   remember numerical expressions from school times: there you had expressions
   315   remember numerical expressions from school times: there you had expressions
   315   like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
   316   like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
   316   and simplification rules that looked very similar to rules
   317   and simplification rules that looked very similar to rules
   317   above. You would simplify such numerical expressions by replacing
   318   above. You would simplify such numerical expressions by replacing
   318   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   319   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   319   look whether more rules are applicable. If you organise the
   320   look whether more rules are applicable. If you organise the
   320   simplification in an inside-out fashion, it is always clear which
   321   simplification in an inside-out fashion, it is always clear which
   321   rule should be applied next.\hfill[1 Mark]
   322   simplification should be applied next.\hfill[1 Mark]
   322 
   323 
   323 \item[(4)] Implement two functions: The first, called \textit{ders},
   324 \item[(4)] Implement two functions: The first, called \textit{ders},
   324   takes a list of characters and a regular expression as arguments, and
   325   takes a list of characters and a regular expression as arguments, and
   325   builds the derivative w.r.t.~the list as follows:
   326   builds the derivative w.r.t.~the list as follows:
   326 
   327 
   357 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
   358 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
   358 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
   359 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
   359 \end{tabular}
   360 \end{tabular}
   360 \end{center}
   361 \end{center}
   361 
   362 
   362 You can use \textit{size} in order to test how much the `evil' regular
   363 You can use \textit{size} in order to test how much the ``evil'' regular
   363 expression $(a^*)^* \cdot b$ grows when taking successive derivatives
   364 expression $(a^*)^* \cdot b$ grows when taking successive derivatives
   364 according the letter $a$ without simplification and then compare it to
   365 according the letter $a$ without simplification and then compare it to
   365 taking the derivative, but simplify the result.  The sizes
   366 taking the derivative, but simplify the result.  The sizes
   366 are given in \texttt{re.scala}. \hfill[1 Mark]
   367 are given in \texttt{re.scala}. \hfill[1 Mark]
   367 
   368 
   495 convenient to work with the usual infix notation for arithmetic
   496 convenient to work with the usual infix notation for arithmetic
   496 expressions. Most modern calculators (as opposed to the ones used 20
   497 expressions. Most modern calculators (as opposed to the ones used 20
   497 years ago) understand infix notation. So why on Earth? \ldots{}Well,
   498 years ago) understand infix notation. So why on Earth? \ldots{}Well,
   498 many computers under the hood, even nowadays, use postfix notation
   499 many computers under the hood, even nowadays, use postfix notation
   499 extensively. For example if you give to the Java compiler the
   500 extensively. For example if you give to the Java compiler the
   500 expression $1 + ((2 * 3) + (4 - 3))$, it will generate for it the Java Byte
   501 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte
   501 code
   502 code
   502 
   503 
   503 \begin{lstlisting}[language=JVMIS,numbers=none]
   504 \begin{lstlisting}[language=JVMIS,numbers=none]
   504 ldc 1 
   505 ldc 1 
   505 ldc 2 
   506 ldc 2 
   520 \noindent
   521 \noindent
   521 The shunting yard algorithm processes an input token list using an
   522 The shunting yard algorithm processes an input token list using an
   522 operator stack and an output list. The input consists of numbers,
   523 operator stack and an output list. The input consists of numbers,
   523 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
   524 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
   524 the assignment we assume the input is always a well-formed expression
   525 the assignment we assume the input is always a well-formed expression
   525 in infix notation.  The algorithm uses information about the
   526 in infix notation.  The calculation in the shunting yard algorithm uses
       
   527 information about the
   526 precedences of the operators (given in the template file). The
   528 precedences of the operators (given in the template file). The
   527 algorithm processes the input token list as follows:
   529 algorithm processes the input token list as follows:
   528 
   530 
   529 \begin{itemize}
   531 \begin{itemize}
   530 \item If there is a number as input token, then this token is
   532 \item If there is a number as input token, then this token is
   531   transferred to the output list. Then the rest of the input is
   533   transferred directly to the output list. Then the rest of the input is
   532   processed.
   534   processed.
   533 \item If there is an operator as input token, then you need to check
   535 \item If there is an operator as input token, then you need to check
   534   what is on top of the operator stack. If there are operators with
   536   what is on top of the operator stack. If there are operators with
   535   a higher or equal precedence, these operators are first popped off
   537   a higher or equal precedence, these operators are first popped off
   536   the stack and moved to the output list. Then the operator from the input
   538   from the stack and moved to the output list. Then the operator from the input
   537   is pushed onto the stack and the rest of the input is processed.
   539   is pushed onto the stack and the rest of the input is processed.
   538 \item If the input is a left-parenthesis, you push it on to the stack
   540 \item If the input is a left-parenthesis, you push it on to the stack
   539   and continue processing the input.
   541   and continue processing the input.
   540 \item If the input is a right-parenthesis, then you move all operators
   542 \item If the input is a right-parenthesis, then you pop off all operators
   541   from the stack to the output list until you reach the left-parenthesis.
   543   from the stack to the output list until you reach the left-parenthesis.
   542   Then you discharge the $($ and $)$ from the input and stack, and continue
   544   Then you discharge the $($ and $)$ from the input and stack, and continue
   543   processing.
   545   processing the input list.
   544 \item If the input is empty, then you move all remaining operators
   546 \item If the input is empty, then you move all remaining operators
   545   from the stack to the output list.  
   547   from the stack to the output list.  
   546 \end{itemize}  
   548 \end{itemize}  
   547 
   549 
   548 \subsubsection*{Tasks (file postfix.scala)}
   550 \subsubsection*{Tasks (file postfix.scala)}
   549 
   551 
   550 \begin{itemize}
   552 \begin{itemize}
   551 \item[(7)] Implement the shunting yard algorithm outlined above. The
   553 \item[(7)] Implement the shunting yard algorithm described above. The
   552   function, called \texttt{syard}, takes a list of tokens as first
   554   function, called \texttt{syard}, takes a list of tokens as first
   553   argument. The second and third arguments are the stack and output
   555   argument. The second and third arguments are the stack and output
   554   list represented as Scala lists. The most convenient way to
   556   list represented as Scala lists. The most convenient way to
   555   implement this algorithm is to analyse what the input list, stack
   557   implement this algorithm is to analyse what the input list, stack
   556   and output look like in each step using pattern-matching. The
   558   and output list look like in each step using pattern-matching. The
   557   algorithm transforms for example the input
   559   algorithm transforms for example the input
   558 
   560 
   559   \[
   561   \[
   560   \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
   562   \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
   561   \]
   563   \]