diff -r c5ca7f8e21a5 -r 114a89518aea cws/cw03.tex --- a/cws/cw03.tex Fri Nov 17 14:11:58 2017 +0000 +++ b/cws/cw03.tex Tue Nov 21 16:31:11 2017 +0000 @@ -8,22 +8,39 @@ This coursework is worth 10\%. It is about regular expressions, pattern matching and polymorphism. The first part is due on 30 -November at 11pm; the second, more advanced part, is due on 7 December -at 11pm. You are asked to implement a regular expression matcher. Make -sure the files you submit can be processed by just calling -\texttt{scala <>}.\bigskip +November at 11pm; the second, more advanced part, is due on 21 +December at 11pm. You are asked to implement a regular expression +matcher based on derivatives of regular expressions. The reason is +that regular expression matching in Java can be extremely slow +sometimes.\bigskip \noindent -\textbf{Important:} Do not use any mutable data structures in your -submission! They are not needed. This menas you cannot use -\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your -code! It has a different meaning in Scala, than in Java. Do not use -\texttt{var}! This declares a mutable variable. Make sure the -functions you submit are defined on the ``top-level'' of Scala, not -inside a class or object. Also note that the running time of -each part will be restricted to a maximum of 360 seconds on my -laptop. +\textbf{Important:} + +\begin{itemize} +\item Make sure the files you submit can be processed by just calling\\ + \mbox{\texttt{scala <>}} on the commandline. Use the + template files provided and do not make any changes to arguments of + functions or to any types. You are free to implement any auxiliary + function you might need. + +\item Do not use any mutable data structures in your +submissions! They are not needed. This means you cannot create new +\texttt{Array}s or \texttt{ListBuffer}s, for example. +\item Do not use \texttt{return} in your code! It has a different + meaning in Scala, than in Java. + +\item Do not use \texttt{var}! This declares a mutable variable. Only + use \texttt{val}! + +\item Do not use any parallel collections! No \texttt{.par} therefore! + Our testing and marking infrastructure is not set up for it. +\end{itemize} + +\noindent +Also note that the running time of each part will be restricted to a +maximum of 360 seconds on my laptop \subsection*{Disclaimer} @@ -54,12 +71,13 @@ \end{center} \noindent -Why? Knowing how to match regular expressions and strings will -let you solve a lot of problems that vex other humans. Regular -expressions are one of the fastest and simplest ways to match patterns -in text, and are endlessly useful for searching, editing and -analysing text in all sorts of places. However, you need to be -fast, otherwise you will stumble over problems such as recently reported at +Why? Knowing how to match regular expressions and strings will let you +solve a lot of problems that vex other humans. Regular expressions are +one of the fastest and simplest ways to match patterns in text, and +are endlessly useful for searching, editing and analysing data in all +sorts of places (for example analysing network traffic in order to +detect security breaches). However, you need to be fast, otherwise you +will stumble over problems such as recently reported at {\small \begin{itemize} @@ -71,9 +89,10 @@ \subsubsection*{Tasks (file re.scala)} \begin{itemize} -\item[(1a)] Implement a function, called \textit{nullable}, by recursion over - regular expressions. This function tests whether a regular expression can match - the empty string. +\item[(1a)] Implement a function, called \textit{nullable}, by + recursion over regular expressions. This function tests whether a + regular expression can match the empty string, that is given a + regular expression it either returns true or false. \begin{center} \begin{tabular}{lcl} @@ -134,7 +153,8 @@ \begin{tabular}{lcll} $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\ $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\ - $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ + $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ & + (is $\textit{nullable}$) \end{tabular} \end{center} @@ -165,12 +185,12 @@ simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be seen as trees and there are several methods for traversing trees. One of them corresponds to the inside-out traversal. Also - remember numerical expressions from school: there you had exprssions - like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$ + remember numerical expressions from school: there you had expressions + like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ and simplification rules that looked very similar to rules above. You would simplify such numerical expressions by replacing for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then - look if more rules are applicable. If you organise this + look whether more rules are applicable. If you organise the simplification in an inside-out fashion, it is always clear which rule should applied next.\\\mbox{}\hfill[1 Mark] @@ -333,7 +353,7 @@ Although easily implementable in Scala, the idea behind the derivative function might not so easy to be seen. To understand its purpose better, assume a regular expression $r$ can match strings of the form -$c::cs$ (that means strings which start with a character $c$ and have +$c\!::\!cs$ (that means strings which start with a character $c$ and have some rest, or tail, $cs$). If you now take the derivative of $r$ with respect to the character $c$, then you obtain a regular expressions that can match all the strings $cs$. In other words, the regular