cws/main_cw03.tex
changeset 396 3ffe978a5664
parent 390 175a950470a9
child 415 fced9a61c881
--- a/cws/main_cw03.tex	Thu Nov 04 12:20:12 2021 +0000
+++ b/cws/main_cw03.tex	Fri Nov 05 16:47:55 2021 +0000
@@ -119,7 +119,7 @@
 % BF IDE
 % https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
   
-\section*{Main Part 3 (Scala, 7 Marks)}
+\section*{Main Part 3 (Scala, 6 Marks)}
 
 %\mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
 %\mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
@@ -150,7 +150,7 @@
 computer. For example you can call Scala on the command line with the
 option \texttt{-cp re.jar} and then query any function from the
 \texttt{re.scala} template file. As usual you have to prefix the calls
-with \texttt{CW8c} or import this object.  Since some tasks
+with \texttt{M3} or import this object.  Since some tasks
 are time sensitive, you can check the reference implementation as
 follows: if you want to know, for example, how long it takes to match
 strings of $a$'s using the regular expression $(a^*)^*\cdot b$ you can
@@ -159,7 +159,7 @@
 
 \begin{lstlisting}[xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
 $ scala -cp re.jar
-scala> import CW8c._  
+scala> import M3._  
 scala> for (i <- 0 to 5000000 by 500000) {
   | println(f"$i: ${time_needed(2, matcher(EVIL, "a" * i))}%.5f secs.")
   | }
@@ -230,12 +230,21 @@
   $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
   $\ONE$  & $\mapsto$ & \texttt{ONE}\\
   $c$     & $\mapsto$ & \texttt{CHAR(c)}\\
+  $\sum rs$ & $\mapsto$ & \texttt{ALTs(rs)}\\  
   $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
   $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
   $r^*$ & $\mapsto$ &  \texttt{STAR(r)} & \texttt{r.\%}
 \end{tabular}    
 \end{center}  
 
+\noindent
+The alternative regular expressions comes in two versions: one is
+binary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ /
+\texttt{ALTs}). The latter takes a list of regular expressions as
+arguments.  In what follows we shall use $rs$ to stand for lists of
+regular expressions.  The binary alternative can be seen as an abbreviation,
+that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore the
+binary version and only implement the n-ary alternative.
 
 \begin{itemize}
 \item[(1)] Implement a function, called \textit{nullable}, by
@@ -250,11 +259,11 @@
 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
-$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
+$\textit{nullable}(\sum rs)$ & $\dn$ & $\exists r \in rs.\;\textit{nullable}(r)$\\
 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
 \end{tabular}
-\end{center}~\hfill[1 Mark]
+\end{center}~\hfill[0.5 Marks]
 
 \item[(2)] Implement a function, called \textit{der}, by recursion over
   regular expressions. It takes a character and a regular expression
@@ -266,7 +275,7 @@
 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
 $\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
-$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
+$\textit{der}\;c\;(\sum\;[r_1,..,r_n])$ & $\dn$ & $\sum\;[\textit{der}\;c\;r_1,..,\textit{der}\;c\;r_n]$\\
 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
@@ -312,7 +321,32 @@
 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
 \mbox{}\hfill\mbox{[1 Mark]}
 
-\item[(3)] Implement the function \textit{simp}, which recursively
+\item[(3)] We next want to simplify regular expressions: essentially
+  we want to remove $\ZERO$ in regular expressions like $r + \ZERO$
+  and $\ZERO + r$.  However, our n-ary alternative takes a list of
+  regular expressions as argument, we therefore need a more general
+  ``flatten'' function, called \texttt{flts}. This function should
+  analyse a list of regular expressions, say $rs$, as follows:
+
+  \begin{center}
+    \begin{tabular}{lllll}
+      1) &$rs = []$                & $\dn$ & $[]$ & (empty list)\\
+      2) &$rs = \ZERO :: rest$     & $\dn$ & $\textrm{flatten}\;rest$\\
+      3) &$rs = (\sum rs_1) :: rest$ & $\dn$ & $rs_1 ::: \textrm{flatten}\;rest$\\
+      4) &$rs = r :: rest$         & $\dn$ & $r :: \textrm{flatten}\;rest$ & (otherwise)\\
+    \end{tabular}  
+  \end{center}  
+
+  The first clause just states that empty lists cannot be further
+  flattened. The second removes all $\ZERO$s from the list.
+  The third is when the first regular expression is an \texttt{ALTs}, then
+  the content of this alternative should be spilled out and appended
+  with the flattened rest of the list. The last case is for all other
+  cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs},
+  then we just keep the head of the list and flatten the rest.
+  \mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(4)] Implement the function \textit{simp}, which recursively
   traverses a regular expression, and on the way up simplifies every
   regular expression on the left (see below) to the regular expression
   on the right, except it does not simplify inside ${}^*$-regular
@@ -324,31 +358,35 @@
 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
-$r + \ZERO$ & $\mapsto$ & $r$\\ 
-$\ZERO + r$ & $\mapsto$ & $r$\\ 
-$r + r$ & $\mapsto$ & $r$\\ 
+$\sum\;[r_1,..,r_n]$ & $\mapsto$ & $\sum\;(\texttt{(flts + distinct)}[simp(r_1),..,simp(r_n)])$\\  
 \end{tabular}
-  \end{center}
+\end{center}
+
+  The last case is as follows: first apply $simp$ to all regular expressions
+  $r_1,.. ,r_n$; then flatten the resulting list using \texttt{flts};
+  finally remove all duplicates in this list (this can be done in Scala
+  using the function \texttt{\_.distinct}).
 
   For example the regular expression
   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
 
   simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
   seen as trees and there are several methods for traversing
-  trees. One of them corresponds to the inside-out traversal, which is also
-  sometimes called post-order tra\-versal: you traverse inside the
-  tree and on the way up you apply simplification rules.
-  \textbf{Another Hint:}
-  Remember numerical expressions from school times---there you had expressions
-  like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
-  and simplification rules that looked very similar to rules
-  above. You would simplify such numerical expressions by replacing
-  for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
-  look whether more rules are applicable. If you organise the
-  simplification in an inside-out fashion, it is always clear which
-  simplification should be applied next.\hfill[1 Mark]
+  trees. One of them corresponds to the inside-out traversal, which is
+  also sometimes called post-order tra\-versal: you traverse inside
+  the tree and on the way up you apply simplification rules.
+  \textbf{Another Hint:} Remember numerical expressions from school
+  times---there you had expressions like
+  $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ and
+  simplification rules that looked very similar to rules above. You
+  would simplify such numerical expressions by replacing for example
+  the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then look whether
+  more rules are applicable. If you regard regular expressions as
+  trees and if you organise the simplification in an inside-out
+  fashion, it is always clear which simplification should be applied
+  next.\mbox{}\hfill\mbox{[1 Mark]}
 
-\item[(4)] Implement two functions: The first, called \textit{ders},
+\item[(5)] Implement two functions: The first, called \textit{ders},
   takes a list of characters and a regular expression as arguments, and
   builds the derivative w.r.t.~the list as follows:
 
@@ -371,7 +409,7 @@
 true for the regular expression $(a\cdot b)\cdot c$ and the string
 $abc$, but false if you give it the string $ab$. \hfill[1 Mark]
 
-\item[(5)] Implement a function, called \textit{size}, by recursion
+\item[(6)] Implement a function, called \textit{size}, by recursion
   over regular expressions. If a regular expression is seen as a tree,
   then \textit{size} should return the number of nodes in such a
   tree. Therefore this function is defined as follows:
@@ -381,7 +419,7 @@
 $\textit{size}(\ZERO)$ & $\dn$ & $1$\\
 $\textit{size}(\ONE)$  & $\dn$ & $1$\\
 $\textit{size}(c)$     & $\dn$ & $1$\\
-$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(\sum\,[r_1,..,r_n]$ & $\dn$ & $1 + \textit{size}(r_1) + ... + \textit{size}(r_n)$\\
 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
 \end{tabular}
@@ -391,9 +429,9 @@
 expression $(a^*)^* \cdot b$ grows when taking successive derivatives
 according the letter $a$ without simplification and then compare it to
 taking the derivative, but simplify the result.  The sizes
-are given in \texttt{re.scala}. \hfill[1 Mark]
+are given in \texttt{re.scala}. \hfill[0.5 Marks]
 
-\item[(6)] You do not have to implement anything specific under this
+\item[(7)] You do not have to implement anything specific under this
   task.  The purpose here is that you will be marked for some ``power''
   test cases. For example can your matcher decide within 30 seconds
   whether the regular expression $(a^*)^*\cdot b$ matches strings of the
@@ -406,20 +444,22 @@
 
   \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
   50 or more times?\\
-  \mbox{}\hfill[2 Marks]
+  \mbox{}\hfill[1 Mark]
 \end{itemize}
 
 \subsection*{Background}
 
-Although easily implementable in Scala, the idea behind the derivative
-function might not so easy to be seen. To understand its purpose
-better, assume a regular expression $r$ can match strings of the form
-$c\!::\!cs$ (that means strings which start with a character $c$ and have
-some rest, or tail, $cs$). If you take the derivative of $r$ with
-respect to the character $c$, then you obtain a regular expression
-that can match all the strings $cs$.  In other words, the regular
-expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
-that can be matched by $r$, except that the $c$ is chopped off.
+Although easily implementable in Scala (ok maybe the \texttt{simp} functions and
+\texttt{ALTs} needs a bit more thinking), the idea behind the
+derivative function might not so easy to be seen. To understand its
+purpose better, assume a regular expression $r$ can match strings of
+the form $c\!::\!cs$ (that means strings which start with a character
+$c$ and have some rest, or tail, $cs$). If you take the derivative of
+$r$ with respect to the character $c$, then you obtain a regular
+expression that can match all the strings $cs$.  In other words, the
+regular expression $\textit{der}\;c\;r$ can match the same strings
+$c\!::\!cs$ that can be matched by $r$, except that the $c$ is chopped
+off.
 
 Assume now $r$ can match the string $abc$. If you take the derivative
 according to $a$ then you obtain a regular expression that can match
@@ -451,12 +491,13 @@
 Java\footnote{Version 8 and below; Version 9 and above does not seem to be as
   catastrophic, but still much worse than the regular expression
   matcher based on derivatives.}, JavaScript and Python with the
-`evil' regular expression $(a^*)^*\cdot b$, then have a look at the
+``evil'' regular expression $(a^*)^*\cdot b$, then have a look at the
 graphs below (you can try it out for yourself: have a look at the files
 \texttt{catastrophic9.java}, \texttt{catastrophic.js},
 \texttt{catastrophic.py} etc on KEATS). Compare this with the matcher you
-have implemented. How long can the string of $a$'s be in your matcher
-and still stay within the 30 seconds time limit?
+have implemented. How long can a string of $a$'s be in your matcher
+and still stay within the 30 seconds time limit? It should be muuuch better
+than your off-the-shelf matcher in your bog-standard language.
 
 \begin{center}
 \begin{tabular}{@{}cc@{}}
@@ -477,7 +518,7 @@
     scaled ticks=false,
     axis lines=left,
     width=6cm,
-    height=5.5cm, 
+    height=5.0cm, 
     legend entries={Python, Java 8, JavaScript, Swift, Dart},  
     legend pos=north west,
     legend cell align=left]
@@ -503,7 +544,7 @@
     scaled ticks=false,
     axis lines=left,
     width=6cm,
-    height=5.5cm, 
+    height=5.0cm, 
     legend entries={Java 9},  
     legend pos=north west]
 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};