cws/cw03.tex
changeset 69 f1295a0ab4ed
parent 68 8da9e0c16194
child 74 fb2d8012ed08
equal deleted inserted replaced
68:8da9e0c16194 69:f1295a0ab4ed
    14 you submit can be processed by just calling \texttt{scala
    14 you submit can be processed by just calling \texttt{scala
    15   <<filename.scala>>}.\bigskip
    15   <<filename.scala>>}.\bigskip
    16 
    16 
    17 \noindent
    17 \noindent
    18 \textbf{Important:} Do not use any mutable data structures in your
    18 \textbf{Important:} Do not use any mutable data structures in your
    19 submissions! They are not needed. This excluded the use of
    19 submission! They are not needed. This excluded the use of
    20 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    20 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
    21 code! It has a different meaning in Scala, than in Java.  Do not use
    21 code! It has a different meaning in Scala, than in Java.  Do not use
    22 \texttt{var}! This declares a mutable variable.  Make sure the
    22 \texttt{var}! This declares a mutable variable.  Make sure the
    23 functions you submit are defined on the ``top-level'' of Scala, not
    23 functions you submit are defined on the ``top-level'' of Scala, not
    24 inside a class or object.
    24 inside a class or object.
    32 uploaded to KEATS, which you can freely use.\bigskip
    32 uploaded to KEATS, which you can freely use.\bigskip
    33 
    33 
    34 
    34 
    35 \subsection*{Part 1 (6 Marks)}
    35 \subsection*{Part 1 (6 Marks)}
    36 
    36 
    37 The task is to implement a regular expression matcher based on
    37 The task is to implement a regular expression matcher that is based on
    38 derivatives of regular expressions. The implementation can deal
    38 derivatives of regular expressions. The implementation can deal
    39 with the following regular expressions, which have been predefined
    39 with the following regular expressions, which have been predefined
    40 file re.scala:
    40 in the file re.scala:
    41 
    41 
    42 \begin{center}
    42 \begin{center}
    43 \begin{tabular}{lcll}
    43 \begin{tabular}{lcll}
    44   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
    44   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
    45       &   $|$ & $\ONE$      & can only match the empty string\\
    45       &   $|$ & $\ONE$      & can only match the empty string\\
    55 Why? Knowing how to match regular expressions and strings fast will
    55 Why? Knowing how to match regular expressions and strings fast will
    56 let you solve a lot of problems that vex other humans. Regular
    56 let you solve a lot of problems that vex other humans. Regular
    57 expressions are one of the fastest and simplest ways to match patterns
    57 expressions are one of the fastest and simplest ways to match patterns
    58 in text, and are endlessly useful for searching, editing and
    58 in text, and are endlessly useful for searching, editing and
    59 analysing text in all sorts of places. However, you need to be
    59 analysing text in all sorts of places. However, you need to be
    60 fast, otherwise you will stumble upon problems such as recently reported at
    60 fast, otherwise you will stumble over problems such as recently reported at
    61 
    61 
    62 {\small
    62 {\small
    63 \begin{itemize}
    63 \begin{itemize}
    64 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
    64 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
    65 \item[$\bullet$] \url{https://vimeo.com/112065252}
    65 \item[$\bullet$] \url{https://vimeo.com/112065252}
    68 
    68 
    69 \subsection*{Tasks (file re.scala)}
    69 \subsection*{Tasks (file re.scala)}
    70 
    70 
    71 \begin{itemize}
    71 \begin{itemize}
    72 \item[(1a)] Implement a function, called \textit{nullable}, by recursion over
    72 \item[(1a)] Implement a function, called \textit{nullable}, by recursion over
    73   regular expressions. This function test whether a regular expression can match
    73   regular expressions. This function tests whether a regular expression can match
    74   the empty string.
    74   the empty string.
    75 
    75 
    76 \begin{center}
    76 \begin{center}
    77 \begin{tabular}{lcl}
    77 \begin{tabular}{lcl}
    78 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    78 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
    84 \end{tabular}
    84 \end{tabular}
    85 \end{center}\hfill[1 Mark]
    85 \end{center}\hfill[1 Mark]
    86 
    86 
    87 \item[(1b)] Implement a function, called \textit{der}, by recursion over
    87 \item[(1b)] Implement a function, called \textit{der}, by recursion over
    88   regular expressions. It takes a character and a regular expression
    88   regular expressions. It takes a character and a regular expression
    89   as arguments and calculates the derivative regular expression.
    89   as arguments and calculates the derivative regular expression according
       
    90   to the rules:
    90 
    91 
    91 \begin{center}
    92 \begin{center}
    92 \begin{tabular}{lcl}
    93 \begin{tabular}{lcl}
    93 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
    94 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
    94 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
    95 $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
    97 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
    98 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
    98       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
    99       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
    99       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
   100       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
   100 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
   101 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
   101 \end{tabular}
   102 \end{tabular}
   102 \end{center}\hfill[1 Mark]
   103 \end{center}
       
   104 
       
   105 For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
       
   106 w.r.t.~the characters $a$, $b$ and $c$ are
       
   107 
       
   108 \begin{center}
       
   109   \begin{tabular}{lcll}
       
   110     $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
       
   111     $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
       
   112     $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
       
   113   \end{tabular}
       
   114 \end{center}
       
   115 
       
   116 Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
       
   117 w.r.t.~the characters $a$, $b$ and $c$ gives
       
   118 
       
   119 \begin{center}
       
   120   \begin{tabular}{lcll}
       
   121     $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
       
   122     $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
       
   123     $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
       
   124   \end{tabular}
       
   125 \end{center}
       
   126 
       
   127 One more example: Let $r''$ stand for the second derivative above,
       
   128 then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
       
   129 and $c$ gives
       
   130 
       
   131 \begin{center}
       
   132   \begin{tabular}{lcll}
       
   133     $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
       
   134     $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
       
   135     $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
       
   136   \end{tabular}
       
   137 \end{center}
       
   138 
       
   139 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
       
   140 \mbox{}\hfill\mbox{[1 Mark]}
   103 
   141 
   104 \item[(1c)] Implement the function \textit{simp}, which recursively
   142 \item[(1c)] Implement the function \textit{simp}, which recursively
   105   traverses a regular expression from inside to outside, and
   143   traverses a regular expression from the inside to the outside, and
   106   simplifies every sub-regular-expressions on the left to
   144   simplifies every sub-regular-expression on the left (see below) to
   107   the regular expression on the right, except it does not simplify inside
   145   the regular expression on the right, except it does not simplify inside
   108   ${}^*$-regular expressions.
   146   ${}^*$-regular expressions.
   109 
   147 
   110   \begin{center}
   148   \begin{center}
   111 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
   149 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
   112 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   150 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
   113 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   151 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
   114 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
   152 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
   115 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
   153 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
   116 $r + \ZERO$ & $\mapsto$ & $r$\\ 
   154 $r + \ZERO$ & $\mapsto$ & $r$\\ 
   117 $\ZERO + r$ & $\mapsto$ & $r$\\ 
   155 $\ZERO + r$ & $\mapsto$ & $r$\\ 
   118 $r + r$ & $\mapsto$ & $r$\\ 
   156 $r + r$ & $\mapsto$ & $r$\\ 
   119 \end{tabular}
   157 \end{tabular}
   120   \end{center}
   158   \end{center}
   121 
   159 
   122   For example
   160   For example the regular expression
   123   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   161   \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
   124 
   162 
   125   simplifies to just $r_1$. 
   163   simplifies to just $r_1$. 
   126   \hfill[1 Mark]
   164   \hfill[1 Mark]
   127 
   165 
   128 \item[(1d)] Implement two functions: The first, called \textit{ders},
   166 \item[(1d)] Implement two functions: The first, called \textit{ders},
   129   takes a list of characters as arguments and a regular expression and
   167   takes a list of characters and a regular expression as arguments, and
   130   buids the derivative as follows:
   168   builds the derivative w.r.t.~the list as follows:
   131 
   169 
   132 \begin{center}
   170 \begin{center}
   133 \begin{tabular}{lcl}
   171 \begin{tabular}{lcl}
   134 $\textit{ders}\;Nil\;(r)$ & $\dn$ & $r$\\
   172 $\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
   135   $\textit{ders}\;c::cs\;(r)$  & $\dn$ &
   173   $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
   136     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
   174     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
   137 \end{tabular}
   175 \end{tabular}
   138 \end{center}
   176 \end{center}
   139 
   177 
   140 The second, called \textit{matcher}, takes a string and a regular expression
   178 The second, called \textit{matcher}, takes a string and a regular expression
   141 as arguments. It builds first the derivatives according to \textit{ders}
   179 as arguments. It builds first the derivatives according to \textit{ders}
   142 and at the end tests whether the resulting redular expression can match
   180 and after that tests whether the resulting derivative regular expression can match
   143 the empty string (using \textit{nullable}).
   181 the empty string (using \textit{nullable}).
   144 For example the \textit{matcher} will produce true if given the
   182 For example the \textit{matcher} will produce true given the
   145 regular expression $a\cdot b\cdot c$ and the string $abc$.
   183 regular expression $(a\cdot b)\cdot c$ and the string $abc$.
   146 \hfill[1 Mark]
   184 \hfill[1 Mark]
   147 
   185 
   148 \item[(1e)] Implement the function \textit{replace}: it searches (from the left to 
   186 \item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
   149 right) in string $s_1$ all the non-empty substrings that match the 
   187   (from the left to 
   150 regular expression---these substrings are assumed to be
   188 right) in the string $s_1$ all the non-empty substrings that match the 
       
   189 regular expression $r$---these substrings are assumed to be
   151 the longest substrings matched by the regular expression and
   190 the longest substrings matched by the regular expression and
   152 assumed to be non-overlapping. All these substrings in $s_1$ are replaced
   191 assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
   153 by $s_2$. For example given the regular expression
   192 are replaced by $s_2$. For example given the regular expression
   154 
   193 
   155 \[(a \cdot a)^* + (b \cdot b)\]
   194 \[(a \cdot a)^* + (b \cdot b)\]
   156 
   195 
   157 \noindent the string $aabbbaaaaaaabaaaaabbaaaabb$ and
   196 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
   158 replacement string $c$ yields the string
   197 replacement string $s_2 = c$ yields the string
   159 
   198 
   160 \[
   199 \[
   161 ccbcabcaccc
   200 ccbcabcaccc
   162 \]
   201 \]
   163 
   202