cws/cw03.tex
changeset 68 8da9e0c16194
parent 62 2151c77e1e24
child 69 f1295a0ab4ed
equal deleted inserted replaced
67:ca5884c2e3bd 68:8da9e0c16194
     4 
     4 
     5 \begin{document}
     5 \begin{document}
     6 
     6 
     7 \section*{Coursework 8 (Scala, Regular Expressions}
     7 \section*{Coursework 8 (Scala, Regular Expressions}
     8 
     8 
     9 This coursework is worth 10\% and is due on XXXX at
     9 This coursework is worth 10\%. It is about regular expressions and
    10 16:00. You are asked to implement a regular expression matcher.
    10 pattern matching. The first part is due on 30 November at 11pm; the
    11 
    11 second, more advanced part, is due on 7 December at 11pm. The
    12 Make sure the files
    12 second part is not yet included. For the first part you are
       
    13 asked to implement a regular expression matcher. Make sure the files
    13 you submit can be processed by just calling \texttt{scala
    14 you submit can be processed by just calling \texttt{scala
    14   <<filename.scala>>}.\bigskip 
    15   <<filename.scala>>}.\bigskip
    15 
    16 
    16 \noindent
    17 \noindent
    17 \textbf{Important:} Do not use any mutable data structures in your
    18 \textbf{Important:} Do not use any mutable data structures in your
    18 submissions! They are not needed. This excluded the use of
    19 submissions! They are not needed. This excluded the use of
    19 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    20 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    20 code! It has a different meaning in Scala, than in Java.
    21 code! It has a different meaning in Scala, than in Java.  Do not use
    21 Do not use \texttt{var}! This declares a mutable variable. Feel free to
    22 \texttt{var}! This declares a mutable variable.  Make sure the
    22 copy any code you need from files \texttt{knight1.scala},
       
    23 \texttt{knight2.scala} and \texttt{knight3.scala}. Make sure the
       
    24 functions you submit are defined on the ``top-level'' of Scala, not
    23 functions you submit are defined on the ``top-level'' of Scala, not
    25 inside a class or object.
    24 inside a class or object.
    26 
    25 
    27 
    26 
    28 \subsection*{Disclaimer}
    27 \subsection*{Disclaimer!!!!!!!!}
    29 
    28 
    30 It should be understood that the work you submit represents
    29 It should be understood that the work you submit represents
    31 your own effort. You have not copied from anyone else. An
    30 your own effort! You have not copied from anyone else. An
    32 exception is the Scala code I showed during the lectures or
    31 exception is the Scala code I showed during the lectures or
    33 uploaded to KEATS, which you can freely use.\bigskip
    32 uploaded to KEATS, which you can freely use.\bigskip
    34 
    33 
    35 
    34 
    36 \subsubsection*{Task}
    35 \subsection*{Part 1 (6 Marks)}
    37 
    36 
    38 The task is to implement a regular expression matcher based on
    37 The task is to implement a regular expression matcher based on
    39 derivatives of regular expressions. The implementation should
    38 derivatives of regular expressions. The implementation can deal
    40 be able to deal with the usual (basic) regular expressions
    39 with the following regular expressions, which have been predefined
       
    40 file re.scala:
    41 
    41 
    42 \begin{center}
    42 \begin{center}
    43 \begin{tabular}{lcll}
    43 \begin{tabular}{lcll}
    44   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
    44   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
    45       &   $|$ & $\ONE$      & can only match the empty string\\
    45       &   $|$ & $\ONE$      & can only match the empty string\\
    46       &   $|$ & $c$         & can match a character $c$\\
    46       &   $|$ & $c$         & can match a character (in this case $c$)\\
    47       &   $|$ & $r_1 + r_2$ & can match either with $r_1$ or with $r_2$\\
    47       &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
    48       &   $|$ & $r_1 \cdot r_2$ & can match first with $r_1$ and then with $r_2$\\
    48   &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
       
    49           &  & & then the second part with $r_2$\\
    49       &   $|$ & $r^*$       & can match zero or more times $r$\\
    50       &   $|$ & $r^*$       & can match zero or more times $r$\\
    50       &   $|$ & $r^{\{\uparrow n\}}$ & can match zero upto $n$ times $r$\\
       
    51       &   $|$ & $r^{\{n\}}$ & can match exactly $n$ times $r$\\
       
    52 \end{tabular}
    51 \end{tabular}
    53 \end{center}
    52 \end{center}
    54 
    53 
    55 \noindent
    54 \noindent 
    56 Implement a function called \textit{nullable} by recursion over
    55 Why? Knowing how to match regular expressions and strings fast will
    57 regular expressions:
    56 let you solve a lot of problems that vex other humans. Regular
       
    57 expressions are one of the fastest and simplest ways to match patterns
       
    58 in text, and are endlessly useful for searching, editing and
       
    59 analysing text in all sorts of places. However, you need to be
       
    60 fast, otherwise you will stumble upon problems such as recently reported at
       
    61 
       
    62 {\small
       
    63 \begin{itemize}
       
    64 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
       
    65 \item[$\bullet$] \url{https://vimeo.com/112065252}
       
    66 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
       
    67 \end{itemize}}
       
    68 
       
    69 \subsection*{Tasks (file re.scala)}
       
    70 
       
    71 \begin{itemize}
       
    72 \item[(1a)] Implement a function, called \textit{nullable}, by recursion over
       
    73   regular expressions. This function test whether a regular expression can match
       
    74   the empty string.
    58 
    75 
    59 \begin{center}
    76 \begin{center}
    60 \begin{tabular}{lcl}
    77 \begin{tabular}{lcl}
    61 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    78 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    62 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
    79 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
    63 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
    80 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
    64 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
    81 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
    65 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
    82 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
    66 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
    83 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
    67 $\textit{nullable}(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{true}$\\
       
    68 $\textit{nullable}(r^{\{n\}})$ & $\dn$ & 
       
    69    $\textit{if}\;n = 0\; \textit{then} \; \textit{true} \; \textit{else} \; \textit{nullable}(r)$\\
       
    70 \end{tabular}
    84 \end{tabular}
    71 \end{center}
    85 \end{center}\hfill[1 Mark]
       
    86 
       
    87 \item[(1b)] Implement a function, called \textit{der}, by recursion over
       
    88   regular expressions. It takes a character and a regular expression
       
    89   as arguments and calculates the derivative regular expression.
    72 
    90 
    73 \begin{center}
    91 \begin{center}
    74 \begin{tabular}{lcl}
    92 \begin{tabular}{lcl}
    75 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
    93 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
    76 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
    94 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
    78 $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
    96 $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
    79 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
    97 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
    80       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
    98       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
    81       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
    99       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
    82 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
   100 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
    83 $\textit{der}\;c\;(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{if}\;n = 0\;\textit{then}\;\ZERO\;\text{else}\;
       
    84    (\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\
       
    85 $\textit{der}\;c\;(r^{\{n\}})$ & $\dn$ & 
       
    86    $\textit{if}\;n = 0\; \textit{then} \; \ZERO\; \textit{else}\;$\\
       
    87    & & $\textit{if} \;\textit{nullable}(r)\;\textit{then}\;(\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\
       
    88    & & $\textit{else}\;(\textit{der}\;c\;r)\cdot (r^{\{n-1\}})$
       
    89 \end{tabular}
   101 \end{tabular}
    90 \end{center}
   102 \end{center}\hfill[1 Mark]
    91 
   103 
       
   104 \item[(1c)] Implement the function \textit{simp}, which recursively
       
   105   traverses a regular expression from inside to outside, and
       
   106   simplifies every sub-regular-expressions on the left to
       
   107   the regular expression on the right, except it does not simplify inside
       
   108   ${}^*$-regular expressions.
    92 
   109 
    93 Be careful that your implementation of \textit{nullable} and
   110   \begin{center}
    94 \textit{der}\;c\; satisfies for every $r$ the following two
       
    95 properties (see also Question 2):
       
    96 
       
    97 \begin{itemize}
       
    98 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
    99 \item $L(der\,c\,r) = Der\,c\,(L(r))$
       
   100 \end{itemize}
       
   101 
       
   102 \noindent {\bf Important!} Your implementation should have
       
   103 explicit cases for the basic regular expressions, but also
       
   104 explicit cases for the extended regular expressions. That
       
   105 means do not treat the extended regular expressions by just
       
   106 translating them into the basic ones. See also Question 2,
       
   107 where you are asked to explicitly give the rules for
       
   108 \textit{nullable} and \textit{der}\;c\; for the extended regular
       
   109 expressions.
       
   110 
       
   111 
       
   112 \subsection*{Question 1}
       
   113 
       
   114 What is your King's email address (you will need it in
       
   115 Question 3)?
       
   116 
       
   117 \subsection*{Question 2}
       
   118 
       
   119 This question does not require any implementation. From the
       
   120 lectures you have seen the definitions for the functions
       
   121 \textit{nullable} and \textit{der}\;c\; for the basic regular
       
   122 expressions. Give the rules for the extended regular
       
   123 expressions:
       
   124 
       
   125 \begin{center}
       
   126 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   127 $\textit{nullable}([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
       
   128 $\textit{nullable}(r^+)$                   & $\dn$ & $?$\\
       
   129 $\textit{nullable}(r^?)$                   & $\dn$ & $?$\\
       
   130 $\textit{nullable}(r^{\{n,m\}})$            & $\dn$ & $?$\\
       
   131 $\textit{nullable}(\sim{}r)$               & $\dn$ & $?$\medskip\\
       
   132 $der\, c\, ([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
       
   133 $der\, c\, (r^+)$                   & $\dn$ & $?$\\
       
   134 $der\, c\, (r^?)$                   & $\dn$ & $?$\\
       
   135 $der\, c\, (r^{\{n,m\}})$            & $\dn$ & $?$\\
       
   136 $der\, c\, (\sim{}r)$               & $\dn$ & $?$\\
       
   137 \end{tabular}
       
   138 \end{center}
       
   139 
       
   140 \noindent
       
   141 Remember your definitions have to satisfy the two properties
       
   142 
       
   143 \begin{itemize}
       
   144 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
   145 \item $L(der\,c\,r)) = Der\,c\,(L(r))$
       
   146 \end{itemize}
       
   147 
       
   148 \subsection*{Question 3}
       
   149 
       
   150 Implement the following regular expression for email addresses
       
   151 
       
   152 \[
       
   153 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})
       
   154 \]
       
   155 
       
   156 \noindent and calculate the derivative according to your email
       
   157 address. When calculating the derivative, simplify all regular
       
   158 expressions as much as possible by applying the
       
   159 following 7 simplification rules:
       
   160 
       
   161 \begin{center}
       
   162 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
   111 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
   163 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   112 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   164 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   113 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   165 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
   114 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
   166 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
   115 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
   167 $r + \ZERO$ & $\mapsto$ & $r$\\ 
   116 $r + \ZERO$ & $\mapsto$ & $r$\\ 
   168 $\ZERO + r$ & $\mapsto$ & $r$\\ 
   117 $\ZERO + r$ & $\mapsto$ & $r$\\ 
   169 $r + r$ & $\mapsto$ & $r$\\ 
   118 $r + r$ & $\mapsto$ & $r$\\ 
   170 \end{tabular}
   119 \end{tabular}
       
   120   \end{center}
       
   121 
       
   122   For example
       
   123   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
       
   124 
       
   125   simplifies to just $r_1$. 
       
   126   \hfill[1 Mark]
       
   127 
       
   128 \item[(1d)] Implement two functions: The first, called \textit{ders},
       
   129   takes a list of characters as arguments and a regular expression and
       
   130   buids the derivative as follows:
       
   131 
       
   132 \begin{center}
       
   133 \begin{tabular}{lcl}
       
   134 $\textit{ders}\;Nil\;(r)$ & $\dn$ & $r$\\
       
   135   $\textit{ders}\;c::cs\;(r)$  & $\dn$ &
       
   136     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
       
   137 \end{tabular}
   171 \end{center}
   138 \end{center}
   172 
   139 
   173 \noindent Write down your simplified derivative in a readable
   140 The second, called \textit{matcher}, takes a string and a regular expression
   174 notation using parentheses where necessary. That means you
   141 as arguments. It builds first the derivatives according to \textit{ders}
   175 should use the infix notation $+$, $\cdot$, $^*$ and so on,
   142 and at the end tests whether the resulting redular expression can match
   176 instead of code.
   143 the empty string (using \textit{nullable}).
   177  
   144 For example the \textit{matcher} will produce true if given the
   178 \subsection*{Question 4}
   145 regular expression $a\cdot b\cdot c$ and the string $abc$.
       
   146 \hfill[1 Mark]
   179 
   147 
   180 Suppose \textit{[a-z]} stands for the range regular expression
   148 \item[(1e)] Implement the function \textit{replace}: it searches (from the left to 
   181 $[a,b,c,\ldots,z]$.  Consider the regular expression $/ \cdot * \cdot
   149 right) in string $s_1$ all the non-empty substrings that match the 
   182 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot *
   150 regular expression---these substrings are assumed to be
   183 \cdot /$ and decide wether the following four strings are matched by
   151 the longest substrings matched by the regular expression and
   184 this regular expression. Answer yes or no.
   152 assumed to be non-overlapping. All these substrings in $s_1$ are replaced
       
   153 by $s_2$. For example given the regular expression
   185 
   154 
   186 \begin{enumerate}
   155 \[(a \cdot a)^* + (b \cdot b)\]
   187 \item \texttt{"/**/"}
       
   188 \item \texttt{"/*foobar*/"}
       
   189 \item \texttt{"/*test*/test*/"}
       
   190 \item \texttt{"/*test/*test*/"}
       
   191 \end{enumerate}
       
   192 
   156 
   193 \noindent
   157 \noindent the string $aabbbaaaaaaabaaaaabbaaaabb$ and
   194 Also test your regular expression matcher with the regular
   158 replacement string $c$ yields the string
   195 expression $a^{\{3,5\}}$ and the strings
       
   196 
   159 
   197 \begin{enumerate}
   160 \[
   198 \setcounter{enumi}{4}
   161 ccbcabcaccc
   199 \item \texttt{aa}
   162 \]
   200 \item \texttt{aaa}
       
   201 \item \texttt{aaaaa}
       
   202 \item \texttt{aaaaaa}
       
   203 \end{enumerate}
       
   204 
   163 
   205 \noindent
   164 \hfill[2 Mark]
   206 Does your matcher produce the expected results?
   165 \end{itemize}
   207 
   166 
   208 \subsection*{Question 5}
   167 \subsection*{Part 2 (4 Marks)}
   209 
   168 
   210 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
   169 Coming soon.
   211 $(a^{\{19,19\}}) \cdot (a^?)$.  Decide whether the following three
       
   212 strings consisting of $a$s only can be matched by $(r_1^+)^+$.
       
   213 Similarly test them with $(r_2^+)^+$. Again answer in all six cases
       
   214 with yes or no. \medskip
       
   215 
       
   216 \noindent
       
   217 These are strings are meant to be entirely made up of $a$s. Be careful
       
   218 when copy-and-pasting the strings so as to not forgetting any $a$ and
       
   219 to not introducing any other character.
       
   220 
       
   221 \begin{enumerate}
       
   222 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   223 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   224 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   225 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   226 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   227 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   228 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   229 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   230 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   231 \end{enumerate}
       
   232 
       
   233 
   170 
   234 \end{document}
   171 \end{document}
       
   172 
   235 
   173 
   236 %%% Local Variables: 
   174 %%% Local Variables: 
   237 %%% mode: latex
   175 %%% mode: latex
   238 %%% TeX-master: t
   176 %%% TeX-master: t
   239 %%% End: 
   177 %%% End: