| author | Christian Urban <christian dot urban at kcl dot ac dot uk> | 
| Wed, 11 Jan 2017 14:46:37 +0000 | |
| changeset 101 | 2acb1d25fe91 | 
| parent 94 | ae4708c851ee | 
| child 100 | e868732cab63 | 
| permissions | -rw-r--r-- | 
| 6 | 1 | \documentclass{article}
 | 
| 62 | 2 | \usepackage{../style}
 | 
| 78 | 3 | \usepackage{../langs}
 | 
| 6 | 4 | |
| 5 | \begin{document}
 | |
| 6 | ||
| 62 | 7 | \section*{Coursework 8 (Scala, Regular Expressions}
 | 
| 6 | 8 | |
| 79 | 9 | This coursework is worth 10\%. It is about regular expressions, | 
| 10 | pattern matching and polymorphism. The first part is due on 30 | |
| 11 | November at 11pm; the second, more advanced part, is due on 7 December | |
| 12 | at 11pm. You are asked to implement a regular expression matcher. Make | |
| 13 | sure the files you submit can be processed by just calling | |
| 14 | \texttt{scala <<filename.scala>>}.\bigskip
 | |
| 62 | 15 | |
| 16 | \noindent | |
| 17 | \textbf{Important:} Do not use any mutable data structures in your
 | |
| 74 | 18 | submission! They are not needed. This excludes the use of | 
| 62 | 19 | \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
 | 
| 68 | 20 | code! It has a different meaning in Scala, than in Java. Do not use | 
| 21 | \texttt{var}! This declares a mutable variable.  Make sure the
 | |
| 62 | 22 | functions you submit are defined on the ``top-level'' of Scala, not | 
| 86 | 23 | inside a class or object. Also note that the running time of | 
| 24 | each part will be restricted to a maximum of 360 seconds. | |
| 25 | ||
| 6 | 26 | |
| 27 | ||
| 68 | 28 | \subsection*{Disclaimer!!!!!!!!}
 | 
| 6 | 29 | |
| 30 | It should be understood that the work you submit represents | |
| 68 | 31 | your own effort! You have not copied from anyone else. An | 
| 6 | 32 | exception is the Scala code I showed during the lectures or | 
| 33 | uploaded to KEATS, which you can freely use.\bigskip | |
| 34 | ||
| 35 | ||
| 68 | 36 | \subsection*{Part 1 (6 Marks)}
 | 
| 6 | 37 | |
| 69 | 38 | The task is to implement a regular expression matcher that is based on | 
| 68 | 39 | derivatives of regular expressions. The implementation can deal | 
| 40 | with the following regular expressions, which have been predefined | |
| 69 | 41 | in the file re.scala: | 
| 6 | 42 | |
| 43 | \begin{center}
 | |
| 44 | \begin{tabular}{lcll}
 | |
| 45 | $r$ & $::=$ & $\ZERO$ & cannot match anything\\ | |
| 46 | & $|$ & $\ONE$ & can only match the empty string\\ | |
| 68 | 47 | & $|$ & $c$ & can match a character (in this case $c$)\\ | 
| 48 | & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ | |
| 49 | & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ | |
| 50 | & & & then the second part with $r_2$\\ | |
| 6 | 51 | & $|$ & $r^*$ & can match zero or more times $r$\\ | 
| 52 | \end{tabular}
 | |
| 53 | \end{center}
 | |
| 54 | ||
| 68 | 55 | \noindent | 
| 56 | Why? Knowing how to match regular expressions and strings fast will | |
| 57 | let you solve a lot of problems that vex other humans. Regular | |
| 58 | expressions are one of the fastest and simplest ways to match patterns | |
| 59 | in text, and are endlessly useful for searching, editing and | |
| 60 | analysing text in all sorts of places. However, you need to be | |
| 69 | 61 | fast, otherwise you will stumble over problems such as recently reported at | 
| 68 | 62 | |
| 63 | {\small
 | |
| 64 | \begin{itemize}
 | |
| 65 | \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
 | |
| 66 | \item[$\bullet$] \url{https://vimeo.com/112065252}
 | |
| 67 | \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
 | |
| 68 | \end{itemize}}
 | |
| 69 | ||
| 79 | 70 | \subsubsection*{Tasks (file re.scala)}
 | 
| 68 | 71 | |
| 72 | \begin{itemize}
 | |
| 73 | \item[(1a)] Implement a function, called \textit{nullable}, by recursion over
 | |
| 69 | 74 | regular expressions. This function tests whether a regular expression can match | 
| 68 | 75 | the empty string. | 
| 6 | 76 | |
| 77 | \begin{center}
 | |
| 78 | \begin{tabular}{lcl}
 | |
| 79 | $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
 | |
| 80 | $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
 | |
| 81 | $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
 | |
| 82 | $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
 | |
| 83 | $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 | |
| 84 | $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
 | |
| 85 | \end{tabular}
 | |
| 68 | 86 | \end{center}\hfill[1 Mark]
 | 
| 87 | ||
| 88 | \item[(1b)] Implement a function, called \textit{der}, by recursion over
 | |
| 89 | regular expressions. It takes a character and a regular expression | |
| 69 | 90 | as arguments and calculates the derivative regular expression according | 
| 91 | to the rules: | |
| 6 | 92 | |
| 93 | \begin{center}
 | |
| 94 | \begin{tabular}{lcl}
 | |
| 95 | $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
 | |
| 96 | $\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
 | |
| 97 | $\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
 | |
| 98 | $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
 | |
| 99 | $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
 | |
| 100 |       & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
 | |
| 101 |       & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
 | |
| 102 | $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
 | |
| 103 | \end{tabular}
 | |
| 69 | 104 | \end{center}
 | 
| 105 | ||
| 106 | For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives | |
| 107 | w.r.t.~the characters $a$, $b$ and $c$ are | |
| 108 | ||
| 109 | \begin{center}
 | |
| 110 |   \begin{tabular}{lcll}
 | |
| 111 |     $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
 | |
| 112 |     $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
 | |
| 113 |     $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
 | |
| 114 |   \end{tabular}
 | |
| 115 | \end{center}
 | |
| 116 | ||
| 117 | Let $r'$ stand for the first derivative, then taking the derivatives of $r'$ | |
| 118 | w.r.t.~the characters $a$, $b$ and $c$ gives | |
| 119 | ||
| 120 | \begin{center}
 | |
| 121 |   \begin{tabular}{lcll}
 | |
| 122 |     $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
 | |
| 123 |     $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
 | |
| 124 |     $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
 | |
| 125 |   \end{tabular}
 | |
| 126 | \end{center}
 | |
| 127 | ||
| 128 | One more example: Let $r''$ stand for the second derivative above, | |
| 129 | then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$ | |
| 130 | and $c$ gives | |
| 131 | ||
| 132 | \begin{center}
 | |
| 133 |   \begin{tabular}{lcll}
 | |
| 134 |     $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
 | |
| 135 |     $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
 | |
| 136 |     $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
 | |
| 137 |   \end{tabular}
 | |
| 138 | \end{center}
 | |
| 139 | ||
| 140 | Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
 | |
| 141 | \mbox{}\hfill\mbox{[1 Mark]}
 | |
| 6 | 142 | |
| 68 | 143 | \item[(1c)] Implement the function \textit{simp}, which recursively
 | 
| 69 | 144 | traverses a regular expression from the inside to the outside, and | 
| 145 | simplifies every sub-regular-expression on the left (see below) to | |
| 68 | 146 | the regular expression on the right, except it does not simplify inside | 
| 147 |   ${}^*$-regular expressions.
 | |
| 6 | 148 | |
| 68 | 149 |   \begin{center}
 | 
| 69 | 150 | \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
 | 
| 6 | 151 | $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ | 
| 152 | $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ | |
| 153 | $r \cdot \ONE$ & $\mapsto$ & $r$\\ | |
| 154 | $\ONE \cdot r$ & $\mapsto$ & $r$\\ | |
| 155 | $r + \ZERO$ & $\mapsto$ & $r$\\ | |
| 156 | $\ZERO + r$ & $\mapsto$ & $r$\\ | |
| 157 | $r + r$ & $\mapsto$ & $r$\\ | |
| 158 | \end{tabular}
 | |
| 68 | 159 |   \end{center}
 | 
| 160 | ||
| 69 | 161 | For example the regular expression | 
| 68 | 162 | \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] | 
| 163 | ||
| 79 | 164 |   simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
 | 
| 165 | seen as trees and there are several methods for traversing | |
| 166 | trees. One of them corresponds to the inside-out traversal. Also | |
| 167 | remember numerical expressions from school: there you had exprssions | |
| 168 | like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$ | |
| 169 | and simplification rules that looked very similar to rules | |
| 170 | above. You would simplify such numerical expressions by replacing | |
| 171 | for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then | |
| 172 | look if more rules are applicable. If you organise this | |
| 173 | simplification in an inside-out fashion, it is always clear which | |
| 174 |   rule should applied next.\\\mbox{}\hfill[1 Mark]
 | |
| 68 | 175 | |
| 176 | \item[(1d)] Implement two functions: The first, called \textit{ders},
 | |
| 69 | 177 | takes a list of characters and a regular expression as arguments, and | 
| 178 | builds the derivative w.r.t.~the list as follows: | |
| 68 | 179 | |
| 180 | \begin{center}
 | |
| 181 | \begin{tabular}{lcl}
 | |
| 69 | 182 | $\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
 | 
| 183 |   $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
 | |
| 68 | 184 |     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
 | 
| 185 | \end{tabular}
 | |
| 6 | 186 | \end{center}
 | 
| 187 | ||
| 78 | 188 | Note that this function is different from \textit{der}, which only
 | 
| 189 | takes a single character. | |
| 190 | ||
| 191 | The second function, called \textit{matcher}, takes a string and a
 | |
| 192 | regular expression as arguments. It builds first the derivatives | |
| 193 | according to \textit{ders} and after that tests whether the resulting
 | |
| 194 | derivative regular expression can match the empty string (using | |
| 195 | \textit{nullable}).  For example the \textit{matcher} will produce
 | |
| 196 | true given the regular expression $(a\cdot b)\cdot c$ and the string | |
| 197 | $abc$.\\  \mbox{}\hfill[1 Mark]
 | |
| 6 | 198 | |
| 69 | 199 | \item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
 | 
| 200 | (from the left to | |
| 201 | right) in the string $s_1$ all the non-empty substrings that match the | |
| 202 | regular expression $r$---these substrings are assumed to be | |
| 68 | 203 | the longest substrings matched by the regular expression and | 
| 69 | 204 | assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$ | 
| 205 | are replaced by $s_2$. For example given the regular expression | |
| 6 | 206 | |
| 68 | 207 | \[(a \cdot a)^* + (b \cdot b)\] | 
| 6 | 208 | |
| 69 | 209 | \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and | 
| 78 | 210 | replacement the string $s_2 = c$ yields the string | 
| 6 | 211 | |
| 68 | 212 | \[ | 
| 213 | ccbcabcaccc | |
| 214 | \] | |
| 6 | 215 | |
| 78 | 216 | \hfill[2 Marks] | 
| 68 | 217 | \end{itemize}
 | 
| 6 | 218 | |
| 78 | 219 | |
| 220 | ||
| 221 | ||
| 222 | \subsection*{Part 2 (4 Marks)}
 | |
| 223 | ||
| 224 | You need to copy all the code from \texttt{re.scala} into
 | |
| 225 | \texttt{re2.scala} in order to complete Part 2.  Parts (2a) and (2b)
 | |
| 226 | give you another method and datapoints for testing the \textit{der}
 | |
| 227 | and \textit{simp} functions from Part~1.
 | |
| 228 | ||
| 79 | 229 | \subsubsection*{Tasks (file re2.scala)}
 | 
| 78 | 230 | |
| 231 | \begin{itemize}
 | |
| 232 | \item[(2a)] Write a \textbf{polymorphic} function, called
 | |
| 233 |   \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
 | |
| 234 | integer $n$, a function $f$ and an $x$ as arguments. This function | |
| 235 | should iterate $f$ $n$-times starting with the argument $x$, like | |
| 236 | ||
| 237 |   \[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
 | |
| 238 | \] | |
| 239 | ||
| 240 |   More formally that means \textit{iterT} behaves as follows:
 | |
| 241 | ||
| 242 |   \begin{center}
 | |
| 243 |     \begin{tabular}{lcl}
 | |
| 244 |       $\textit{iterT}(n, f, x)$ & $\dn$ &
 | |
| 245 |       $\begin{cases}
 | |
| 246 |         \;x & \textit{if}\;n = 0\\
 | |
| 247 |         \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
 | |
| 248 |         \end{cases}$      
 | |
| 249 |     \end{tabular}
 | |
| 250 | \end{center}
 | |
| 251 | ||
| 252 |   Make sure you write a \textbf{tail-recursive} version of
 | |
| 253 |   \textit{iterT}.  If you add the annotation \texttt{@tailrec} (see
 | |
| 79 | 254 | below) your code should not produce an error message. | 
| 78 | 255 | |
| 256 |   \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
 | |
| 257 | import scala.annotation.tailrec | |
| 258 | ||
| 259 | @tailrec | |
| 260 | def iterT[A](n: Int, f: A => A, x: A): A = ... | |
| 261 |   \end{lstlisting}
 | |
| 262 | ||
| 263 |   You can assume that \textit{iterT} will only be called for positive
 | |
| 264 |   integers $0 \le n$. Given the type variable \texttt{A}, the type of
 | |
| 265 |   $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
 | |
| 266 |   \textit{iterT} can be used, for example, for functions from integers
 | |
| 79 | 267 | to integers, or strings to strings, or regular expressions to | 
| 268 |   regular expressions.  \\ \mbox{}\hfill[2 Marks]
 | |
| 78 | 269 | |
| 270 | \item[(2b)] Implement a function, called \textit{size}, by recursion
 | |
| 271 | over regular expressions. If a regular expression is seen as a tree, | |
| 272 |   then \textit{size} should return the number of nodes in such a
 | |
| 273 | tree. Therefore this function is defined as follows: | |
| 274 | ||
| 275 | \begin{center}
 | |
| 276 | \begin{tabular}{lcl}
 | |
| 277 | $\textit{size}(\ZERO)$ & $\dn$ & $1$\\
 | |
| 278 | $\textit{size}(\ONE)$  & $\dn$ & $1$\\
 | |
| 279 | $\textit{size}(c)$     & $\dn$ & $1$\\
 | |
| 280 | $\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 | |
| 281 | $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 | |
| 282 | $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
 | |
| 283 | \end{tabular}
 | |
| 284 | \end{center}
 | |
| 285 | ||
| 286 | You can use \textit{size} and \textit{iterT} in order to test how much
 | |
| 287 | the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking | |
| 79 | 288 | successive derivatives according the letter $a$ and then compare it to | 
| 78 | 289 | taking the derivative, but simlifying the derivative after each step. | 
| 290 | For example, the calls | |
| 291 | ||
| 292 |   \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
 | |
| 293 |   size(iterT(20, (r: Rexp) => der('a', r), EVIL))
 | |
| 294 |   size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
 | |
| 295 |   \end{lstlisting}
 | |
| 296 | ||
| 297 | produce without simplification a regular expression of size of | |
| 79 | 298 | 7340068 after 20 iterations, while the one with | 
| 299 | simplification gives | |
| 78 | 300 |   just 8.\\ \mbox{}\hfill[1 Mark]
 | 
| 301 | ||
| 302 | ||
| 303 | \item[(2c)] Write a \textbf{polymorphic} function, called
 | |
| 304 |   \textit{fixpT}, that takes
 | |
| 305 | a function $f$ and an $x$ as arguments. The purpose | |
| 306 |   of \textit{fixpT} is to calculate a fixpoint of the function $f$
 | |
| 307 | starting from the argument $x$. | |
| 308 | A fixpoint, say $y$, is when $f(y) = y$ holds. | |
| 309 |   That means \textit{fixpT} behaves as follows:
 | |
| 310 | ||
| 311 |   \begin{center}
 | |
| 312 |     \begin{tabular}{lcl}
 | |
| 313 |       $\textit{fixpT}(f, x)$ & $\dn$ &
 | |
| 314 |       $\begin{cases}
 | |
| 315 |         \;x & \textit{if}\;f(x) = x\\
 | |
| 316 |         \;\textit{fixpT}(f, f(x)) & \textit{otherwise}
 | |
| 317 |         \end{cases}$      
 | |
| 318 |     \end{tabular}
 | |
| 319 | \end{center}
 | |
| 320 | ||
| 321 |   Make sure you calculate in the code of $\textit{fixpT}$ the result
 | |
| 322 |   of $f(x)$ only once. Given the type variable \texttt{A} in
 | |
| 323 |   $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
 | |
| 324 |   $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
 | |
| 325 | function where in one the fixpoint is 1 and in the other | |
| 326 |   it is the string $a$.\\ \mbox{}\hfill[1 Mark]  
 | |
| 327 | \end{itemize}\bigskip  
 | |
| 328 | ||
| 329 | ||
| 94 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 330 | \subsection*{Background}
 | 
| 78 | 331 | |
| 94 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 332 | Although easily implementable in Scala, the idea behind the derivative | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 333 | function might not so easy to be seen. To understand its purpose | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 334 | better, assume a regular expression $r$ can match strings of the form | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 335 | $c::cs$ (that means strings which start with a character $c$ and have | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 336 | some rest, or tail, $cs$). If you now take the derivative of $r$ with | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 337 | respect to the character $c$, then you obtain a regular expressions | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 338 | that can match all the strings $cs$. In other words, the regular | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 339 | expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
 | 
| 
ae4708c851ee
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
86diff
changeset | 340 | that can be matched by $r$, except that the $c$ is chopped off. | 
| 75 | 341 | |
| 342 | Assume now $r$ can match the string $abc$. If you take the derivative | |
| 343 | according to $a$ then you obtain a regular expression that can match | |
| 344 | $bc$ (it is $abc$ where the $a$ has been chopped off). If you now | |
| 78 | 345 | build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
 | 
| 346 | obtain a regular expression that can match the string $c$ (it is $bc$ | |
| 347 | where $b$ is chopped off). If you finally build the derivative of this | |
| 348 | according $c$, that is | |
| 349 | $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
 | |
| 350 | obtain a regular expression that can match the empty string. You can | |
| 351 | test this using the function nullable, which is what your matcher is | |
| 352 | doing. | |
| 75 | 353 | |
| 78 | 354 | The purpose of the simp function is to keep the regular expression | 
| 355 | small. Normally the derivative function makes the regular expression | |
| 356 | bigger (see the SEQ case and the example in (2b)) and the algorithm | |
| 357 | would be slower and slower over time. The simp function counters this | |
| 358 | increase in size and the result is that the algorithm is fast | |
| 359 | throughout. By the way, this algorithm is by Janusz Brzozowski who | |
| 360 | came up with the idea of derivatives in 1964 in his PhD thesis. | |
| 75 | 361 | |
| 78 | 362 | \begin{center}\small
 | 
| 363 | \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
 | |
| 364 | \end{center}
 | |
| 6 | 365 | |
| 366 | \end{document}
 | |
| 367 | ||
| 68 | 368 | |
| 6 | 369 | %%% Local Variables: | 
| 370 | %%% mode: latex | |
| 371 | %%% TeX-master: t | |
| 372 | %%% End: |