cws/cw03.tex
changeset 152 114a89518aea
parent 110 62389faa66e4
child 153 4383809c176a
equal deleted inserted replaced
151:c5ca7f8e21a5 152:114a89518aea
     6 
     6 
     7 \section*{Coursework 8 (Scala, Regular Expressions)}
     7 \section*{Coursework 8 (Scala, Regular Expressions)}
     8 
     8 
     9 This coursework is worth 10\%. It is about regular expressions,
     9 This coursework is worth 10\%. It is about regular expressions,
    10 pattern matching and polymorphism. The first part is due on 30
    10 pattern matching and polymorphism. The first part is due on 30
    11 November at 11pm; the second, more advanced part, is due on 7 December
    11 November at 11pm; the second, more advanced part, is due on 21
    12 at 11pm. You are asked to implement a regular expression matcher. Make
    12 December at 11pm. You are asked to implement a regular expression
    13 sure the files you submit can be processed by just calling
    13 matcher based on derivatives of regular expressions. The reason is
    14 \texttt{scala <<filename.scala>>}.\bigskip
    14 that regular expression matching in Java can be extremely slow
       
    15 sometimes.\bigskip
    15 
    16 
    16 \noindent
    17 \noindent
    17 \textbf{Important:} Do not use any mutable data structures in your
    18 \textbf{Important:}
    18 submission! They are not needed. This menas you cannot use 
    19 
    19 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    20 \begin{itemize}
    20 code! It has a different meaning in Scala, than in Java.  Do not use
    21 \item Make sure the files you submit can be processed by just calling\\
    21 \texttt{var}! This declares a mutable variable.  Make sure the
    22   \mbox{\texttt{scala <<filename.scala>>}} on the commandline. Use the
    22 functions you submit are defined on the ``top-level'' of Scala, not
    23   template files provided and do not make any changes to arguments of
    23 inside a class or object. Also note that the running time of
    24   functions or to any types. You are free to implement any auxiliary
    24 each part will be restricted to a maximum of 360 seconds on my
    25   function you might need.
    25 laptop.
    26 
    26 
    27 \item Do not use any mutable data structures in your
       
    28 submissions! They are not needed. This means you cannot create new 
       
    29 \texttt{Array}s or \texttt{ListBuffer}s, for example. 
       
    30 
       
    31 \item Do not use \texttt{return} in your code! It has a different
       
    32   meaning in Scala, than in Java.
       
    33 
       
    34 \item Do not use \texttt{var}! This declares a mutable variable. Only
       
    35   use \texttt{val}!
       
    36 
       
    37 \item Do not use any parallel collections! No \texttt{.par} therefore!
       
    38   Our testing and marking infrastructure is not set up for it.
       
    39 \end{itemize}
       
    40 
       
    41 \noindent
       
    42 Also note that the running time of each part will be restricted to a
       
    43 maximum of 360 seconds on my laptop
    27 
    44 
    28 
    45 
    29 \subsection*{Disclaimer}
    46 \subsection*{Disclaimer}
    30 
    47 
    31 It should be understood that the work you submit represents
    48 It should be understood that the work you submit represents
    52       &   $|$ & $r^*$       & can match zero or more times $r$\\
    69       &   $|$ & $r^*$       & can match zero or more times $r$\\
    53 \end{tabular}
    70 \end{tabular}
    54 \end{center}
    71 \end{center}
    55 
    72 
    56 \noindent 
    73 \noindent 
    57 Why? Knowing how to match regular expressions and strings will
    74 Why? Knowing how to match regular expressions and strings will let you
    58 let you solve a lot of problems that vex other humans. Regular
    75 solve a lot of problems that vex other humans. Regular expressions are
    59 expressions are one of the fastest and simplest ways to match patterns
    76 one of the fastest and simplest ways to match patterns in text, and
    60 in text, and are endlessly useful for searching, editing and
    77 are endlessly useful for searching, editing and analysing data in all
    61 analysing text in all sorts of places. However, you need to be
    78 sorts of places (for example analysing network traffic in order to
    62 fast, otherwise you will stumble over problems such as recently reported at
    79 detect security breaches). However, you need to be fast, otherwise you
       
    80 will stumble over problems such as recently reported at
    63 
    81 
    64 {\small
    82 {\small
    65 \begin{itemize}
    83 \begin{itemize}
    66 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
    84 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
    67 \item[$\bullet$] \url{https://vimeo.com/112065252}
    85 \item[$\bullet$] \url{https://vimeo.com/112065252}
    69 \end{itemize}}
    87 \end{itemize}}
    70 
    88 
    71 \subsubsection*{Tasks (file re.scala)}
    89 \subsubsection*{Tasks (file re.scala)}
    72 
    90 
    73 \begin{itemize}
    91 \begin{itemize}
    74 \item[(1a)] Implement a function, called \textit{nullable}, by recursion over
    92 \item[(1a)] Implement a function, called \textit{nullable}, by
    75   regular expressions. This function tests whether a regular expression can match
    93   recursion over regular expressions. This function tests whether a
    76   the empty string.
    94   regular expression can match the empty string, that is given a
       
    95   regular expression it either returns true or false.
    77 
    96 
    78 \begin{center}
    97 \begin{center}
    79 \begin{tabular}{lcl}
    98 \begin{tabular}{lcl}
    80 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    99 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    81 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
   100 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
   132 
   151 
   133 \begin{center}
   152 \begin{center}
   134   \begin{tabular}{lcll}
   153   \begin{tabular}{lcll}
   135     $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
   154     $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
   136     $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
   155     $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
   137     $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
   156     $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ &
       
   157     (is $\textit{nullable}$)                      
   138   \end{tabular}
   158   \end{tabular}
   139 \end{center}
   159 \end{center}
   140 
   160 
   141 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   161 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   142 \mbox{}\hfill\mbox{[1 Mark]}
   162 \mbox{}\hfill\mbox{[1 Mark]}
   163   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   183   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   164 
   184 
   165   simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
   185   simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
   166   seen as trees and there are several methods for traversing
   186   seen as trees and there are several methods for traversing
   167   trees. One of them corresponds to the inside-out traversal. Also
   187   trees. One of them corresponds to the inside-out traversal. Also
   168   remember numerical expressions from school: there you had exprssions
   188   remember numerical expressions from school: there you had expressions
   169   like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$
   189   like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
   170   and simplification rules that looked very similar to rules
   190   and simplification rules that looked very similar to rules
   171   above. You would simplify such numerical expressions by replacing
   191   above. You would simplify such numerical expressions by replacing
   172   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   192   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   173   look if more rules are applicable. If you organise this
   193   look whether more rules are applicable. If you organise the
   174   simplification in an inside-out fashion, it is always clear which
   194   simplification in an inside-out fashion, it is always clear which
   175   rule should applied next.\\\mbox{}\hfill[1 Mark]
   195   rule should applied next.\\\mbox{}\hfill[1 Mark]
   176 
   196 
   177 \item[(1d)] Implement two functions: The first, called \textit{ders},
   197 \item[(1d)] Implement two functions: The first, called \textit{ders},
   178   takes a list of characters and a regular expression as arguments, and
   198   takes a list of characters and a regular expression as arguments, and
   331 \subsection*{Background}
   351 \subsection*{Background}
   332 
   352 
   333 Although easily implementable in Scala, the idea behind the derivative
   353 Although easily implementable in Scala, the idea behind the derivative
   334 function might not so easy to be seen. To understand its purpose
   354 function might not so easy to be seen. To understand its purpose
   335 better, assume a regular expression $r$ can match strings of the form
   355 better, assume a regular expression $r$ can match strings of the form
   336 $c::cs$ (that means strings which start with a character $c$ and have
   356 $c\!::\!cs$ (that means strings which start with a character $c$ and have
   337 some rest, or tail, $cs$). If you now take the derivative of $r$ with
   357 some rest, or tail, $cs$). If you now take the derivative of $r$ with
   338 respect to the character $c$, then you obtain a regular expressions
   358 respect to the character $c$, then you obtain a regular expressions
   339 that can match all the strings $cs$.  In other words, the regular
   359 that can match all the strings $cs$.  In other words, the regular
   340 expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
   360 expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
   341 that can be matched by $r$, except that the $c$ is chopped off.
   361 that can be matched by $r$, except that the $c$ is chopped off.