--- a/cws/main_cw03.tex Thu Nov 04 12:20:12 2021 +0000
+++ b/cws/main_cw03.tex Fri Nov 05 16:47:55 2021 +0000
@@ -119,7 +119,7 @@
% BF IDE
% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
-\section*{Main Part 3 (Scala, 7 Marks)}
+\section*{Main Part 3 (Scala, 6 Marks)}
%\mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
%\mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
@@ -150,7 +150,7 @@
computer. For example you can call Scala on the command line with the
option \texttt{-cp re.jar} and then query any function from the
\texttt{re.scala} template file. As usual you have to prefix the calls
-with \texttt{CW8c} or import this object. Since some tasks
+with \texttt{M3} or import this object. Since some tasks
are time sensitive, you can check the reference implementation as
follows: if you want to know, for example, how long it takes to match
strings of $a$'s using the regular expression $(a^*)^*\cdot b$ you can
@@ -159,7 +159,7 @@
\begin{lstlisting}[xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
$ scala -cp re.jar
-scala> import CW8c._
+scala> import M3._
scala> for (i <- 0 to 5000000 by 500000) {
| println(f"$i: ${time_needed(2, matcher(EVIL, "a" * i))}%.5f secs.")
| }
@@ -230,12 +230,21 @@
$\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
$\ONE$ & $\mapsto$ & \texttt{ONE}\\
$c$ & $\mapsto$ & \texttt{CHAR(c)}\\
+ $\sum rs$ & $\mapsto$ & \texttt{ALTs(rs)}\\
$r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
$r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
$r^*$ & $\mapsto$ & \texttt{STAR(r)} & \texttt{r.\%}
\end{tabular}
\end{center}
+\noindent
+The alternative regular expressions comes in two versions: one is
+binary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ /
+\texttt{ALTs}). The latter takes a list of regular expressions as
+arguments. In what follows we shall use $rs$ to stand for lists of
+regular expressions. The binary alternative can be seen as an abbreviation,
+that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore the
+binary version and only implement the n-ary alternative.
\begin{itemize}
\item[(1)] Implement a function, called \textit{nullable}, by
@@ -250,11 +259,11 @@
$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\
$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\
-$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
+$\textit{nullable}(\sum rs)$ & $\dn$ & $\exists r \in rs.\;\textit{nullable}(r)$\\
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
\end{tabular}
-\end{center}~\hfill[1 Mark]
+\end{center}~\hfill[0.5 Marks]
\item[(2)] Implement a function, called \textit{der}, by recursion over
regular expressions. It takes a character and a regular expression
@@ -266,7 +275,7 @@
$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
-$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
+$\textit{der}\;c\;(\sum\;[r_1,..,r_n])$ & $\dn$ & $\sum\;[\textit{der}\;c\;r_1,..,\textit{der}\;c\;r_n]$\\
$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
& & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
& & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
@@ -312,7 +321,32 @@
Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
\mbox{}\hfill\mbox{[1 Mark]}
-\item[(3)] Implement the function \textit{simp}, which recursively
+\item[(3)] We next want to simplify regular expressions: essentially
+ we want to remove $\ZERO$ in regular expressions like $r + \ZERO$
+ and $\ZERO + r$. However, our n-ary alternative takes a list of
+ regular expressions as argument, we therefore need a more general
+ ``flatten'' function, called \texttt{flts}. This function should
+ analyse a list of regular expressions, say $rs$, as follows:
+
+ \begin{center}
+ \begin{tabular}{lllll}
+ 1) &$rs = []$ & $\dn$ & $[]$ & (empty list)\\
+ 2) &$rs = \ZERO :: rest$ & $\dn$ & $\textrm{flatten}\;rest$\\
+ 3) &$rs = (\sum rs_1) :: rest$ & $\dn$ & $rs_1 ::: \textrm{flatten}\;rest$\\
+ 4) &$rs = r :: rest$ & $\dn$ & $r :: \textrm{flatten}\;rest$ & (otherwise)\\
+ \end{tabular}
+ \end{center}
+
+ The first clause just states that empty lists cannot be further
+ flattened. The second removes all $\ZERO$s from the list.
+ The third is when the first regular expression is an \texttt{ALTs}, then
+ the content of this alternative should be spilled out and appended
+ with the flattened rest of the list. The last case is for all other
+ cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs},
+ then we just keep the head of the list and flatten the rest.
+ \mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(4)] Implement the function \textit{simp}, which recursively
traverses a regular expression, and on the way up simplifies every
regular expression on the left (see below) to the regular expression
on the right, except it does not simplify inside ${}^*$-regular
@@ -324,31 +358,35 @@
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
$r \cdot \ONE$ & $\mapsto$ & $r$\\
$\ONE \cdot r$ & $\mapsto$ & $r$\\
-$r + \ZERO$ & $\mapsto$ & $r$\\
-$\ZERO + r$ & $\mapsto$ & $r$\\
-$r + r$ & $\mapsto$ & $r$\\
+$\sum\;[r_1,..,r_n]$ & $\mapsto$ & $\sum\;(\texttt{(flts + distinct)}[simp(r_1),..,simp(r_n)])$\\
\end{tabular}
- \end{center}
+\end{center}
+
+ The last case is as follows: first apply $simp$ to all regular expressions
+ $r_1,.. ,r_n$; then flatten the resulting list using \texttt{flts};
+ finally remove all duplicates in this list (this can be done in Scala
+ using the function \texttt{\_.distinct}).
For example the regular expression
\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
seen as trees and there are several methods for traversing
- trees. One of them corresponds to the inside-out traversal, which is also
- sometimes called post-order tra\-versal: you traverse inside the
- tree and on the way up you apply simplification rules.
- \textbf{Another Hint:}
- Remember numerical expressions from school times---there you had expressions
- like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
- and simplification rules that looked very similar to rules
- above. You would simplify such numerical expressions by replacing
- for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
- look whether more rules are applicable. If you organise the
- simplification in an inside-out fashion, it is always clear which
- simplification should be applied next.\hfill[1 Mark]
+ trees. One of them corresponds to the inside-out traversal, which is
+ also sometimes called post-order tra\-versal: you traverse inside
+ the tree and on the way up you apply simplification rules.
+ \textbf{Another Hint:} Remember numerical expressions from school
+ times---there you had expressions like
+ $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ and
+ simplification rules that looked very similar to rules above. You
+ would simplify such numerical expressions by replacing for example
+ the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then look whether
+ more rules are applicable. If you regard regular expressions as
+ trees and if you organise the simplification in an inside-out
+ fashion, it is always clear which simplification should be applied
+ next.\mbox{}\hfill\mbox{[1 Mark]}
-\item[(4)] Implement two functions: The first, called \textit{ders},
+\item[(5)] Implement two functions: The first, called \textit{ders},
takes a list of characters and a regular expression as arguments, and
builds the derivative w.r.t.~the list as follows:
@@ -371,7 +409,7 @@
true for the regular expression $(a\cdot b)\cdot c$ and the string
$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
-\item[(5)] Implement a function, called \textit{size}, by recursion
+\item[(6)] Implement a function, called \textit{size}, by recursion
over regular expressions. If a regular expression is seen as a tree,
then \textit{size} should return the number of nodes in such a
tree. Therefore this function is defined as follows:
@@ -381,7 +419,7 @@
$\textit{size}(\ZERO)$ & $\dn$ & $1$\\
$\textit{size}(\ONE)$ & $\dn$ & $1$\\
$\textit{size}(c)$ & $\dn$ & $1$\\
-$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(\sum\,[r_1,..,r_n]$ & $\dn$ & $1 + \textit{size}(r_1) + ... + \textit{size}(r_n)$\\
$\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
\end{tabular}
@@ -391,9 +429,9 @@
expression $(a^*)^* \cdot b$ grows when taking successive derivatives
according the letter $a$ without simplification and then compare it to
taking the derivative, but simplify the result. The sizes
-are given in \texttt{re.scala}. \hfill[1 Mark]
+are given in \texttt{re.scala}. \hfill[0.5 Marks]
-\item[(6)] You do not have to implement anything specific under this
+\item[(7)] You do not have to implement anything specific under this
task. The purpose here is that you will be marked for some ``power''
test cases. For example can your matcher decide within 30 seconds
whether the regular expression $(a^*)^*\cdot b$ matches strings of the
@@ -406,20 +444,22 @@
\noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
50 or more times?\\
- \mbox{}\hfill[2 Marks]
+ \mbox{}\hfill[1 Mark]
\end{itemize}
\subsection*{Background}
-Although easily implementable in Scala, the idea behind the derivative
-function might not so easy to be seen. To understand its purpose
-better, assume a regular expression $r$ can match strings of the form
-$c\!::\!cs$ (that means strings which start with a character $c$ and have
-some rest, or tail, $cs$). If you take the derivative of $r$ with
-respect to the character $c$, then you obtain a regular expression
-that can match all the strings $cs$. In other words, the regular
-expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
-that can be matched by $r$, except that the $c$ is chopped off.
+Although easily implementable in Scala (ok maybe the \texttt{simp} functions and
+\texttt{ALTs} needs a bit more thinking), the idea behind the
+derivative function might not so easy to be seen. To understand its
+purpose better, assume a regular expression $r$ can match strings of
+the form $c\!::\!cs$ (that means strings which start with a character
+$c$ and have some rest, or tail, $cs$). If you take the derivative of
+$r$ with respect to the character $c$, then you obtain a regular
+expression that can match all the strings $cs$. In other words, the
+regular expression $\textit{der}\;c\;r$ can match the same strings
+$c\!::\!cs$ that can be matched by $r$, except that the $c$ is chopped
+off.
Assume now $r$ can match the string $abc$. If you take the derivative
according to $a$ then you obtain a regular expression that can match
@@ -451,12 +491,13 @@
Java\footnote{Version 8 and below; Version 9 and above does not seem to be as
catastrophic, but still much worse than the regular expression
matcher based on derivatives.}, JavaScript and Python with the
-`evil' regular expression $(a^*)^*\cdot b$, then have a look at the
+``evil'' regular expression $(a^*)^*\cdot b$, then have a look at the
graphs below (you can try it out for yourself: have a look at the files
\texttt{catastrophic9.java}, \texttt{catastrophic.js},
\texttt{catastrophic.py} etc on KEATS). Compare this with the matcher you
-have implemented. How long can the string of $a$'s be in your matcher
-and still stay within the 30 seconds time limit?
+have implemented. How long can a string of $a$'s be in your matcher
+and still stay within the 30 seconds time limit? It should be muuuch better
+than your off-the-shelf matcher in your bog-standard language.
\begin{center}
\begin{tabular}{@{}cc@{}}
@@ -477,7 +518,7 @@
scaled ticks=false,
axis lines=left,
width=6cm,
- height=5.5cm,
+ height=5.0cm,
legend entries={Python, Java 8, JavaScript, Swift, Dart},
legend pos=north west,
legend cell align=left]
@@ -503,7 +544,7 @@
scaled ticks=false,
axis lines=left,
width=6cm,
- height=5.5cm,
+ height=5.0cm,
legend entries={Java 9},
legend pos=north west]
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};