201 \] |
201 \] |
202 |
202 |
203 \hfill[2 Mark] |
203 \hfill[2 Mark] |
204 \end{itemize} |
204 \end{itemize} |
205 |
205 |
|
206 \textbf{Background} Although easily implementable in Scala, the idea |
|
207 behind the derivative function might not so easy to be seen. To |
|
208 understand its purpose better, assume a regular expression $r$ can |
|
209 match strings of the form $c::cs$ (that means strings which start with |
|
210 a character $c$ and have some rest $cs$). If you now take the |
|
211 derivative of $r$ with respect to the character $c$, then you obtain a |
|
212 regular expressions that can match all the strings $cs$. In other |
|
213 words the regular expression $\textit{der}\;c\;r$ can match the same |
|
214 strings $c::cs$ that can be matched by $r$, except that the $c$ is |
|
215 chopped off. |
|
216 |
|
217 Assume now $r$ can match the string $abc$. If you take the derivative |
|
218 according to $a$ then you obtain a regular expression that can match |
|
219 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now |
|
220 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you obtain a regular |
|
221 expression that can match the string "c" (it is "bc" where 'b' is |
|
222 chopped off). If you finally build the derivative of this according |
|
223 'c', that is der('c', der('b', der('a', r))), you obtain a regular |
|
224 expression that can match the empty string. You can test this using |
|
225 the function nullable, which is what your matcher is doing. |
|
226 |
|
227 The purpose of the simp function is to keep the regular expression small. Normally the derivative function makes the regular expression bigger (see the SEQ case) and the algorithm would be slower and slower over time. The simp function counters this increase in size and the result is that the algorithm is fast throughout. |
|
228 By the way this whole idea is by Janusz Brzozowski who came up with this in 1964 in his PhD thesis. |
|
229 https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist) |
|
230 |
|
231 |
|
232 |
206 \subsection*{Part 2 (4 Marks)} |
233 \subsection*{Part 2 (4 Marks)} |
207 |
234 |
208 Coming soon. |
235 Coming soon. |
209 |
236 |
210 \end{document} |
237 \end{document} |