diff -r 017f621f5835 -r 3ffe978a5664 cws/main_cw03.tex --- a/cws/main_cw03.tex Thu Nov 04 12:20:12 2021 +0000 +++ b/cws/main_cw03.tex Fri Nov 05 16:47:55 2021 +0000 @@ -119,7 +119,7 @@ % BF IDE % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5 -\section*{Main Part 3 (Scala, 7 Marks)} +\section*{Main Part 3 (Scala, 6 Marks)} %\mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\ %\mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\ @@ -150,7 +150,7 @@ computer. For example you can call Scala on the command line with the option \texttt{-cp re.jar} and then query any function from the \texttt{re.scala} template file. As usual you have to prefix the calls -with \texttt{CW8c} or import this object. Since some tasks +with \texttt{M3} or import this object. Since some tasks are time sensitive, you can check the reference implementation as follows: if you want to know, for example, how long it takes to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$ you can @@ -159,7 +159,7 @@ \begin{lstlisting}[xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small] $ scala -cp re.jar -scala> import CW8c._ +scala> import M3._ scala> for (i <- 0 to 5000000 by 500000) { | println(f"$i: ${time_needed(2, matcher(EVIL, "a" * i))}%.5f secs.") | } @@ -230,12 +230,21 @@ $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\ $\ONE$ & $\mapsto$ & \texttt{ONE}\\ $c$ & $\mapsto$ & \texttt{CHAR(c)}\\ + $\sum rs$ & $\mapsto$ & \texttt{ALTs(rs)}\\ $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\ $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\ $r^*$ & $\mapsto$ & \texttt{STAR(r)} & \texttt{r.\%} \end{tabular} \end{center} +\noindent +The alternative regular expressions comes in two versions: one is +binary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ / +\texttt{ALTs}). The latter takes a list of regular expressions as +arguments. In what follows we shall use $rs$ to stand for lists of +regular expressions. The binary alternative can be seen as an abbreviation, +that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore the +binary version and only implement the n-ary alternative. \begin{itemize} \item[(1)] Implement a function, called \textit{nullable}, by @@ -250,11 +259,11 @@ $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ $\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\ -$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ +$\textit{nullable}(\sum rs)$ & $\dn$ & $\exists r \in rs.\;\textit{nullable}(r)$\\ $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\ $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\ \end{tabular} -\end{center}~\hfill[1 Mark] +\end{center}~\hfill[0.5 Marks] \item[(2)] Implement a function, called \textit{der}, by recursion over regular expressions. It takes a character and a regular expression @@ -266,7 +275,7 @@ $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\ $\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\ $\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\ -$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\ +$\textit{der}\;c\;(\sum\;[r_1,..,r_n])$ & $\dn$ & $\sum\;[\textit{der}\;c\;r_1,..,\textit{der}\;c\;r_n]$\\ $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\ & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\ & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\ @@ -312,7 +321,32 @@ Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ \mbox{}\hfill\mbox{[1 Mark]} -\item[(3)] Implement the function \textit{simp}, which recursively +\item[(3)] We next want to simplify regular expressions: essentially + we want to remove $\ZERO$ in regular expressions like $r + \ZERO$ + and $\ZERO + r$. However, our n-ary alternative takes a list of + regular expressions as argument, we therefore need a more general + ``flatten'' function, called \texttt{flts}. This function should + analyse a list of regular expressions, say $rs$, as follows: + + \begin{center} + \begin{tabular}{lllll} + 1) &$rs = []$ & $\dn$ & $[]$ & (empty list)\\ + 2) &$rs = \ZERO :: rest$ & $\dn$ & $\textrm{flatten}\;rest$\\ + 3) &$rs = (\sum rs_1) :: rest$ & $\dn$ & $rs_1 ::: \textrm{flatten}\;rest$\\ + 4) &$rs = r :: rest$ & $\dn$ & $r :: \textrm{flatten}\;rest$ & (otherwise)\\ + \end{tabular} + \end{center} + + The first clause just states that empty lists cannot be further + flattened. The second removes all $\ZERO$s from the list. + The third is when the first regular expression is an \texttt{ALTs}, then + the content of this alternative should be spilled out and appended + with the flattened rest of the list. The last case is for all other + cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs}, + then we just keep the head of the list and flatten the rest. + \mbox{}\hfill\mbox{[1 Mark]} + +\item[(4)] Implement the function \textit{simp}, which recursively traverses a regular expression, and on the way up simplifies every regular expression on the left (see below) to the regular expression on the right, except it does not simplify inside ${}^*$-regular @@ -324,31 +358,35 @@ $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ $r \cdot \ONE$ & $\mapsto$ & $r$\\ $\ONE \cdot r$ & $\mapsto$ & $r$\\ -$r + \ZERO$ & $\mapsto$ & $r$\\ -$\ZERO + r$ & $\mapsto$ & $r$\\ -$r + r$ & $\mapsto$ & $r$\\ +$\sum\;[r_1,..,r_n]$ & $\mapsto$ & $\sum\;(\texttt{(flts + distinct)}[simp(r_1),..,simp(r_n)])$\\ \end{tabular} - \end{center} +\end{center} + + The last case is as follows: first apply $simp$ to all regular expressions + $r_1,.. ,r_n$; then flatten the resulting list using \texttt{flts}; + finally remove all duplicates in this list (this can be done in Scala + using the function \texttt{\_.distinct}). For example the regular expression \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be seen as trees and there are several methods for traversing - trees. One of them corresponds to the inside-out traversal, which is also - sometimes called post-order tra\-versal: you traverse inside the - tree and on the way up you apply simplification rules. - \textbf{Another Hint:} - Remember numerical expressions from school times---there you had expressions - like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ - and simplification rules that looked very similar to rules - above. You would simplify such numerical expressions by replacing - for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then - look whether more rules are applicable. If you organise the - simplification in an inside-out fashion, it is always clear which - simplification should be applied next.\hfill[1 Mark] + trees. One of them corresponds to the inside-out traversal, which is + also sometimes called post-order tra\-versal: you traverse inside + the tree and on the way up you apply simplification rules. + \textbf{Another Hint:} Remember numerical expressions from school + times---there you had expressions like + $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ and + simplification rules that looked very similar to rules above. You + would simplify such numerical expressions by replacing for example + the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then look whether + more rules are applicable. If you regard regular expressions as + trees and if you organise the simplification in an inside-out + fashion, it is always clear which simplification should be applied + next.\mbox{}\hfill\mbox{[1 Mark]} -\item[(4)] Implement two functions: The first, called \textit{ders}, +\item[(5)] Implement two functions: The first, called \textit{ders}, takes a list of characters and a regular expression as arguments, and builds the derivative w.r.t.~the list as follows: @@ -371,7 +409,7 @@ true for the regular expression $(a\cdot b)\cdot c$ and the string $abc$, but false if you give it the string $ab$. \hfill[1 Mark] -\item[(5)] Implement a function, called \textit{size}, by recursion +\item[(6)] Implement a function, called \textit{size}, by recursion over regular expressions. If a regular expression is seen as a tree, then \textit{size} should return the number of nodes in such a tree. Therefore this function is defined as follows: @@ -381,7 +419,7 @@ $\textit{size}(\ZERO)$ & $\dn$ & $1$\\ $\textit{size}(\ONE)$ & $\dn$ & $1$\\ $\textit{size}(c)$ & $\dn$ & $1$\\ -$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ +$\textit{size}(\sum\,[r_1,..,r_n]$ & $\dn$ & $1 + \textit{size}(r_1) + ... + \textit{size}(r_n)$\\ $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\ \end{tabular} @@ -391,9 +429,9 @@ expression $(a^*)^* \cdot b$ grows when taking successive derivatives according the letter $a$ without simplification and then compare it to taking the derivative, but simplify the result. The sizes -are given in \texttt{re.scala}. \hfill[1 Mark] +are given in \texttt{re.scala}. \hfill[0.5 Marks] -\item[(6)] You do not have to implement anything specific under this +\item[(7)] You do not have to implement anything specific under this task. The purpose here is that you will be marked for some ``power'' test cases. For example can your matcher decide within 30 seconds whether the regular expression $(a^*)^*\cdot b$ matches strings of the @@ -406,20 +444,22 @@ \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested 50 or more times?\\ - \mbox{}\hfill[2 Marks] + \mbox{}\hfill[1 Mark] \end{itemize} \subsection*{Background} -Although easily implementable in Scala, the idea behind the derivative -function might not so easy to be seen. To understand its purpose -better, assume a regular expression $r$ can match strings of the form -$c\!::\!cs$ (that means strings which start with a character $c$ and have -some rest, or tail, $cs$). If you take the derivative of $r$ with -respect to the character $c$, then you obtain a regular expression -that can match all the strings $cs$. In other words, the regular -expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$ -that can be matched by $r$, except that the $c$ is chopped off. +Although easily implementable in Scala (ok maybe the \texttt{simp} functions and +\texttt{ALTs} needs a bit more thinking), the idea behind the +derivative function might not so easy to be seen. To understand its +purpose better, assume a regular expression $r$ can match strings of +the form $c\!::\!cs$ (that means strings which start with a character +$c$ and have some rest, or tail, $cs$). If you take the derivative of +$r$ with respect to the character $c$, then you obtain a regular +expression that can match all the strings $cs$. In other words, the +regular expression $\textit{der}\;c\;r$ can match the same strings +$c\!::\!cs$ that can be matched by $r$, except that the $c$ is chopped +off. Assume now $r$ can match the string $abc$. If you take the derivative according to $a$ then you obtain a regular expression that can match @@ -451,12 +491,13 @@ Java\footnote{Version 8 and below; Version 9 and above does not seem to be as catastrophic, but still much worse than the regular expression matcher based on derivatives.}, JavaScript and Python with the -`evil' regular expression $(a^*)^*\cdot b$, then have a look at the +``evil'' regular expression $(a^*)^*\cdot b$, then have a look at the graphs below (you can try it out for yourself: have a look at the files \texttt{catastrophic9.java}, \texttt{catastrophic.js}, \texttt{catastrophic.py} etc on KEATS). Compare this with the matcher you -have implemented. How long can the string of $a$'s be in your matcher -and still stay within the 30 seconds time limit? +have implemented. How long can a string of $a$'s be in your matcher +and still stay within the 30 seconds time limit? It should be muuuch better +than your off-the-shelf matcher in your bog-standard language. \begin{center} \begin{tabular}{@{}cc@{}} @@ -477,7 +518,7 @@ scaled ticks=false, axis lines=left, width=6cm, - height=5.5cm, + height=5.0cm, legend entries={Python, Java 8, JavaScript, Swift, Dart}, legend pos=north west, legend cell align=left] @@ -503,7 +544,7 @@ scaled ticks=false, axis lines=left, width=6cm, - height=5.5cm, + height=5.0cm, legend entries={Java 9}, legend pos=north west] \addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};