| author | Christian Urban <urbanc@in.tum.de> | 
| Sat, 21 Jan 2017 00:43:35 +0000 | |
| changeset 99 | 2b7a2dc0e18f | 
| parent 94 | ae4708c851ee | 
| child 100 | e868732cab63 | 
| permissions | -rw-r--r-- | 
| 6 | 1  | 
\documentclass{article}
 | 
| 62 | 2  | 
\usepackage{../style}
 | 
| 78 | 3  | 
\usepackage{../langs}
 | 
| 6 | 4  | 
|
5  | 
\begin{document}
 | 
|
6  | 
||
| 62 | 7  | 
\section*{Coursework 8 (Scala, Regular Expressions}
 | 
| 6 | 8  | 
|
| 79 | 9  | 
This coursework is worth 10\%. It is about regular expressions,  | 
10  | 
pattern matching and polymorphism. The first part is due on 30  | 
|
11  | 
November at 11pm; the second, more advanced part, is due on 7 December  | 
|
12  | 
at 11pm. You are asked to implement a regular expression matcher. Make  | 
|
13  | 
sure the files you submit can be processed by just calling  | 
|
14  | 
\texttt{scala <<filename.scala>>}.\bigskip
 | 
|
| 62 | 15  | 
|
16  | 
\noindent  | 
|
17  | 
\textbf{Important:} Do not use any mutable data structures in your
 | 
|
| 74 | 18  | 
submission! They are not needed. This excludes the use of  | 
| 62 | 19  | 
\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
 | 
| 68 | 20  | 
code! It has a different meaning in Scala, than in Java. Do not use  | 
21  | 
\texttt{var}! This declares a mutable variable.  Make sure the
 | 
|
| 62 | 22  | 
functions you submit are defined on the ``top-level'' of Scala, not  | 
| 86 | 23  | 
inside a class or object. Also note that the running time of  | 
24  | 
each part will be restricted to a maximum of 360 seconds.  | 
|
25  | 
||
| 6 | 26  | 
|
27  | 
||
| 68 | 28  | 
\subsection*{Disclaimer!!!!!!!!}
 | 
| 6 | 29  | 
|
30  | 
It should be understood that the work you submit represents  | 
|
| 68 | 31  | 
your own effort! You have not copied from anyone else. An  | 
| 6 | 32  | 
exception is the Scala code I showed during the lectures or  | 
33  | 
uploaded to KEATS, which you can freely use.\bigskip  | 
|
34  | 
||
35  | 
||
| 68 | 36  | 
\subsection*{Part 1 (6 Marks)}
 | 
| 6 | 37  | 
|
| 69 | 38  | 
The task is to implement a regular expression matcher that is based on  | 
| 68 | 39  | 
derivatives of regular expressions. The implementation can deal  | 
40  | 
with the following regular expressions, which have been predefined  | 
|
| 69 | 41  | 
in the file re.scala:  | 
| 6 | 42  | 
|
43  | 
\begin{center}
 | 
|
44  | 
\begin{tabular}{lcll}
 | 
|
45  | 
$r$ & $::=$ & $\ZERO$ & cannot match anything\\  | 
|
46  | 
& $|$ & $\ONE$ & can only match the empty string\\  | 
|
| 68 | 47  | 
& $|$ & $c$ & can match a character (in this case $c$)\\  | 
48  | 
& $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\  | 
|
49  | 
& $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\  | 
|
50  | 
& & & then the second part with $r_2$\\  | 
|
| 6 | 51  | 
& $|$ & $r^*$ & can match zero or more times $r$\\  | 
52  | 
\end{tabular}
 | 
|
53  | 
\end{center}
 | 
|
54  | 
||
| 68 | 55  | 
\noindent  | 
56  | 
Why? Knowing how to match regular expressions and strings fast will  | 
|
57  | 
let you solve a lot of problems that vex other humans. Regular  | 
|
58  | 
expressions are one of the fastest and simplest ways to match patterns  | 
|
59  | 
in text, and are endlessly useful for searching, editing and  | 
|
60  | 
analysing text in all sorts of places. However, you need to be  | 
|
| 69 | 61  | 
fast, otherwise you will stumble over problems such as recently reported at  | 
| 68 | 62  | 
|
63  | 
{\small
 | 
|
64  | 
\begin{itemize}
 | 
|
65  | 
\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
 | 
|
66  | 
\item[$\bullet$] \url{https://vimeo.com/112065252}
 | 
|
67  | 
\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
 | 
|
68  | 
\end{itemize}}
 | 
|
69  | 
||
| 79 | 70  | 
\subsubsection*{Tasks (file re.scala)}
 | 
| 68 | 71  | 
|
72  | 
\begin{itemize}
 | 
|
73  | 
\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
 | 
|
| 69 | 74  | 
regular expressions. This function tests whether a regular expression can match  | 
| 68 | 75  | 
the empty string.  | 
| 6 | 76  | 
|
77  | 
\begin{center}
 | 
|
78  | 
\begin{tabular}{lcl}
 | 
|
79  | 
$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
 | 
|
80  | 
$\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
 | 
|
81  | 
$\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
 | 
|
82  | 
$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
 | 
|
83  | 
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 | 
|
84  | 
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
 | 
|
85  | 
\end{tabular}
 | 
|
| 68 | 86  | 
\end{center}\hfill[1 Mark]
 | 
87  | 
||
88  | 
\item[(1b)] Implement a function, called \textit{der}, by recursion over
 | 
|
89  | 
regular expressions. It takes a character and a regular expression  | 
|
| 69 | 90  | 
as arguments and calculates the derivative regular expression according  | 
91  | 
to the rules:  | 
|
| 6 | 92  | 
|
93  | 
\begin{center}
 | 
|
94  | 
\begin{tabular}{lcl}
 | 
|
95  | 
$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
 | 
|
96  | 
$\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
 | 
|
97  | 
$\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
 | 
|
98  | 
$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
 | 
|
99  | 
$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
 | 
|
100  | 
      & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
 | 
|
101  | 
      & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
 | 
|
102  | 
$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
 | 
|
103  | 
\end{tabular}
 | 
|
| 69 | 104  | 
\end{center}
 | 
105  | 
||
106  | 
For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives  | 
|
107  | 
w.r.t.~the characters $a$, $b$ and $c$ are  | 
|
108  | 
||
109  | 
\begin{center}
 | 
|
110  | 
  \begin{tabular}{lcll}
 | 
|
111  | 
    $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
 | 
|
112  | 
    $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
 | 
|
113  | 
    $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
 | 
|
114  | 
  \end{tabular}
 | 
|
115  | 
\end{center}
 | 
|
116  | 
||
117  | 
Let $r'$ stand for the first derivative, then taking the derivatives of $r'$  | 
|
118  | 
w.r.t.~the characters $a$, $b$ and $c$ gives  | 
|
119  | 
||
120  | 
\begin{center}
 | 
|
121  | 
  \begin{tabular}{lcll}
 | 
|
122  | 
    $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
 | 
|
123  | 
    $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
 | 
|
124  | 
    $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
 | 
|
125  | 
  \end{tabular}
 | 
|
126  | 
\end{center}
 | 
|
127  | 
||
128  | 
One more example: Let $r''$ stand for the second derivative above,  | 
|
129  | 
then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$  | 
|
130  | 
and $c$ gives  | 
|
131  | 
||
132  | 
\begin{center}
 | 
|
133  | 
  \begin{tabular}{lcll}
 | 
|
134  | 
    $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
 | 
|
135  | 
    $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
 | 
|
136  | 
    $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
 | 
|
137  | 
  \end{tabular}
 | 
|
138  | 
\end{center}
 | 
|
139  | 
||
140  | 
Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
 | 
|
141  | 
\mbox{}\hfill\mbox{[1 Mark]}
 | 
|
| 6 | 142  | 
|
| 68 | 143  | 
\item[(1c)] Implement the function \textit{simp}, which recursively
 | 
| 69 | 144  | 
traverses a regular expression from the inside to the outside, and  | 
145  | 
simplifies every sub-regular-expression on the left (see below) to  | 
|
| 68 | 146  | 
the regular expression on the right, except it does not simplify inside  | 
147  | 
  ${}^*$-regular expressions.
 | 
|
| 6 | 148  | 
|
| 68 | 149  | 
  \begin{center}
 | 
| 69 | 150  | 
\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
 | 
| 6 | 151  | 
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\  | 
152  | 
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\  | 
|
153  | 
$r \cdot \ONE$ & $\mapsto$ & $r$\\  | 
|
154  | 
$\ONE \cdot r$ & $\mapsto$ & $r$\\  | 
|
155  | 
$r + \ZERO$ & $\mapsto$ & $r$\\  | 
|
156  | 
$\ZERO + r$ & $\mapsto$ & $r$\\  | 
|
157  | 
$r + r$ & $\mapsto$ & $r$\\  | 
|
158  | 
\end{tabular}
 | 
|
| 68 | 159  | 
  \end{center}
 | 
160  | 
||
| 69 | 161  | 
For example the regular expression  | 
| 68 | 162  | 
\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]  | 
163  | 
||
| 79 | 164  | 
  simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be
 | 
165  | 
seen as trees and there are several methods for traversing  | 
|
166  | 
trees. One of them corresponds to the inside-out traversal. Also  | 
|
167  | 
remember numerical expressions from school: there you had exprssions  | 
|
168  | 
like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$  | 
|
169  | 
and simplification rules that looked very similar to rules  | 
|
170  | 
above. You would simplify such numerical expressions by replacing  | 
|
171  | 
for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then  | 
|
172  | 
look if more rules are applicable. If you organise this  | 
|
173  | 
simplification in an inside-out fashion, it is always clear which  | 
|
174  | 
  rule should applied next.\\\mbox{}\hfill[1 Mark]
 | 
|
| 68 | 175  | 
|
176  | 
\item[(1d)] Implement two functions: The first, called \textit{ders},
 | 
|
| 69 | 177  | 
takes a list of characters and a regular expression as arguments, and  | 
178  | 
builds the derivative w.r.t.~the list as follows:  | 
|
| 68 | 179  | 
|
180  | 
\begin{center}
 | 
|
181  | 
\begin{tabular}{lcl}
 | 
|
| 69 | 182  | 
$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
 | 
183  | 
  $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
 | 
|
| 68 | 184  | 
    $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
 | 
185  | 
\end{tabular}
 | 
|
| 6 | 186  | 
\end{center}
 | 
187  | 
||
| 78 | 188  | 
Note that this function is different from \textit{der}, which only
 | 
189  | 
takes a single character.  | 
|
190  | 
||
191  | 
The second function, called \textit{matcher}, takes a string and a
 | 
|
192  | 
regular expression as arguments. It builds first the derivatives  | 
|
193  | 
according to \textit{ders} and after that tests whether the resulting
 | 
|
194  | 
derivative regular expression can match the empty string (using  | 
|
195  | 
\textit{nullable}).  For example the \textit{matcher} will produce
 | 
|
196  | 
true given the regular expression $(a\cdot b)\cdot c$ and the string  | 
|
197  | 
$abc$.\\  \mbox{}\hfill[1 Mark]
 | 
|
| 6 | 198  | 
|
| 69 | 199  | 
\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
 | 
200  | 
(from the left to  | 
|
201  | 
right) in the string $s_1$ all the non-empty substrings that match the  | 
|
202  | 
regular expression $r$---these substrings are assumed to be  | 
|
| 68 | 203  | 
the longest substrings matched by the regular expression and  | 
| 69 | 204  | 
assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$  | 
205  | 
are replaced by $s_2$. For example given the regular expression  | 
|
| 6 | 206  | 
|
| 68 | 207  | 
\[(a \cdot a)^* + (b \cdot b)\]  | 
| 6 | 208  | 
|
| 69 | 209  | 
\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and  | 
| 78 | 210  | 
replacement the string $s_2 = c$ yields the string  | 
| 6 | 211  | 
|
| 68 | 212  | 
\[  | 
213  | 
ccbcabcaccc  | 
|
214  | 
\]  | 
|
| 6 | 215  | 
|
| 78 | 216  | 
\hfill[2 Marks]  | 
| 68 | 217  | 
\end{itemize}
 | 
| 6 | 218  | 
|
| 78 | 219  | 
|
220  | 
||
221  | 
||
222  | 
\subsection*{Part 2 (4 Marks)}
 | 
|
223  | 
||
224  | 
You need to copy all the code from \texttt{re.scala} into
 | 
|
225  | 
\texttt{re2.scala} in order to complete Part 2.  Parts (2a) and (2b)
 | 
|
226  | 
give you another method and datapoints for testing the \textit{der}
 | 
|
227  | 
and \textit{simp} functions from Part~1.
 | 
|
228  | 
||
| 79 | 229  | 
\subsubsection*{Tasks (file re2.scala)}
 | 
| 78 | 230  | 
|
231  | 
\begin{itemize}
 | 
|
232  | 
\item[(2a)] Write a \textbf{polymorphic} function, called
 | 
|
233  | 
  \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
 | 
|
234  | 
integer $n$, a function $f$ and an $x$ as arguments. This function  | 
|
235  | 
should iterate $f$ $n$-times starting with the argument $x$, like  | 
|
236  | 
||
237  | 
  \[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
 | 
|
238  | 
\]  | 
|
239  | 
||
240  | 
  More formally that means \textit{iterT} behaves as follows:
 | 
|
241  | 
||
242  | 
  \begin{center}
 | 
|
243  | 
    \begin{tabular}{lcl}
 | 
|
244  | 
      $\textit{iterT}(n, f, x)$ & $\dn$ &
 | 
|
245  | 
      $\begin{cases}
 | 
|
246  | 
        \;x & \textit{if}\;n = 0\\
 | 
|
247  | 
        \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
 | 
|
248  | 
        \end{cases}$      
 | 
|
249  | 
    \end{tabular}
 | 
|
250  | 
\end{center}
 | 
|
251  | 
||
252  | 
  Make sure you write a \textbf{tail-recursive} version of
 | 
|
253  | 
  \textit{iterT}.  If you add the annotation \texttt{@tailrec} (see
 | 
|
| 79 | 254  | 
below) your code should not produce an error message.  | 
| 78 | 255  | 
|
256  | 
  \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
 | 
|
257  | 
import scala.annotation.tailrec  | 
|
258  | 
||
259  | 
@tailrec  | 
|
260  | 
def iterT[A](n: Int, f: A => A, x: A): A = ...  | 
|
261  | 
  \end{lstlisting}
 | 
|
262  | 
||
263  | 
  You can assume that \textit{iterT} will only be called for positive
 | 
|
264  | 
  integers $0 \le n$. Given the type variable \texttt{A}, the type of
 | 
|
265  | 
  $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
 | 
|
266  | 
  \textit{iterT} can be used, for example, for functions from integers
 | 
|
| 79 | 267  | 
to integers, or strings to strings, or regular expressions to  | 
268  | 
  regular expressions.  \\ \mbox{}\hfill[2 Marks]
 | 
|
| 78 | 269  | 
|
270  | 
\item[(2b)] Implement a function, called \textit{size}, by recursion
 | 
|
271  | 
over regular expressions. If a regular expression is seen as a tree,  | 
|
272  | 
  then \textit{size} should return the number of nodes in such a
 | 
|
273  | 
tree. Therefore this function is defined as follows:  | 
|
274  | 
||
275  | 
\begin{center}
 | 
|
276  | 
\begin{tabular}{lcl}
 | 
|
277  | 
$\textit{size}(\ZERO)$ & $\dn$ & $1$\\
 | 
|
278  | 
$\textit{size}(\ONE)$  & $\dn$ & $1$\\
 | 
|
279  | 
$\textit{size}(c)$     & $\dn$ & $1$\\
 | 
|
280  | 
$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 | 
|
281  | 
$\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
 | 
|
282  | 
$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
 | 
|
283  | 
\end{tabular}
 | 
|
284  | 
\end{center}
 | 
|
285  | 
||
286  | 
You can use \textit{size} and \textit{iterT} in order to test how much
 | 
|
287  | 
the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking  | 
|
| 79 | 288  | 
successive derivatives according the letter $a$ and then compare it to  | 
| 78 | 289  | 
taking the derivative, but simlifying the derivative after each step.  | 
290  | 
For example, the calls  | 
|
291  | 
||
292  | 
  \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
 | 
|
293  | 
  size(iterT(20, (r: Rexp) => der('a', r), EVIL))
 | 
|
294  | 
  size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
 | 
|
295  | 
  \end{lstlisting}
 | 
|
296  | 
||
297  | 
produce without simplification a regular expression of size of  | 
|
| 79 | 298  | 
7340068 after 20 iterations, while the one with  | 
299  | 
simplification gives  | 
|
| 78 | 300  | 
  just 8.\\ \mbox{}\hfill[1 Mark]
 | 
301  | 
||
302  | 
||
303  | 
\item[(2c)] Write a \textbf{polymorphic} function, called
 | 
|
304  | 
  \textit{fixpT}, that takes
 | 
|
305  | 
a function $f$ and an $x$ as arguments. The purpose  | 
|
306  | 
  of \textit{fixpT} is to calculate a fixpoint of the function $f$
 | 
|
307  | 
starting from the argument $x$.  | 
|
308  | 
A fixpoint, say $y$, is when $f(y) = y$ holds.  | 
|
309  | 
  That means \textit{fixpT} behaves as follows:
 | 
|
310  | 
||
311  | 
  \begin{center}
 | 
|
312  | 
    \begin{tabular}{lcl}
 | 
|
313  | 
      $\textit{fixpT}(f, x)$ & $\dn$ &
 | 
|
314  | 
      $\begin{cases}
 | 
|
315  | 
        \;x & \textit{if}\;f(x) = x\\
 | 
|
316  | 
        \;\textit{fixpT}(f, f(x)) & \textit{otherwise}
 | 
|
317  | 
        \end{cases}$      
 | 
|
318  | 
    \end{tabular}
 | 
|
319  | 
\end{center}
 | 
|
320  | 
||
321  | 
  Make sure you calculate in the code of $\textit{fixpT}$ the result
 | 
|
322  | 
  of $f(x)$ only once. Given the type variable \texttt{A} in
 | 
|
323  | 
  $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
 | 
|
324  | 
  $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
 | 
|
325  | 
function where in one the fixpoint is 1 and in the other  | 
|
326  | 
  it is the string $a$.\\ \mbox{}\hfill[1 Mark]  
 | 
|
327  | 
\end{itemize}\bigskip  
 | 
|
328  | 
||
329  | 
||
| 
94
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
330  | 
\subsection*{Background}
 | 
| 78 | 331  | 
|
| 
94
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
332  | 
Although easily implementable in Scala, the idea behind the derivative  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
333  | 
function might not so easy to be seen. To understand its purpose  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
334  | 
better, assume a regular expression $r$ can match strings of the form  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
335  | 
$c::cs$ (that means strings which start with a character $c$ and have  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
336  | 
some rest, or tail, $cs$). If you now take the derivative of $r$ with  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
337  | 
respect to the character $c$, then you obtain a regular expressions  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
338  | 
that can match all the strings $cs$. In other words, the regular  | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
339  | 
expression $\textit{der}\;c\;r$ can match the same strings $c::cs$
 | 
| 
 
ae4708c851ee
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
86 
diff
changeset
 | 
340  | 
that can be matched by $r$, except that the $c$ is chopped off.  | 
| 75 | 341  | 
|
342  | 
Assume now $r$ can match the string $abc$. If you take the derivative  | 
|
343  | 
according to $a$ then you obtain a regular expression that can match  | 
|
344  | 
$bc$ (it is $abc$ where the $a$ has been chopped off). If you now  | 
|
| 78 | 345  | 
build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
 | 
346  | 
obtain a regular expression that can match the string $c$ (it is $bc$  | 
|
347  | 
where $b$ is chopped off). If you finally build the derivative of this  | 
|
348  | 
according $c$, that is  | 
|
349  | 
$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
 | 
|
350  | 
obtain a regular expression that can match the empty string. You can  | 
|
351  | 
test this using the function nullable, which is what your matcher is  | 
|
352  | 
doing.  | 
|
| 75 | 353  | 
|
| 78 | 354  | 
The purpose of the simp function is to keep the regular expression  | 
355  | 
small. Normally the derivative function makes the regular expression  | 
|
356  | 
bigger (see the SEQ case and the example in (2b)) and the algorithm  | 
|
357  | 
would be slower and slower over time. The simp function counters this  | 
|
358  | 
increase in size and the result is that the algorithm is fast  | 
|
359  | 
throughout. By the way, this algorithm is by Janusz Brzozowski who  | 
|
360  | 
came up with the idea of derivatives in 1964 in his PhD thesis.  | 
|
| 75 | 361  | 
|
| 78 | 362  | 
\begin{center}\small
 | 
363  | 
\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
 | 
|
364  | 
\end{center}
 | 
|
| 6 | 365  | 
|
366  | 
\end{document}
 | 
|
367  | 
||
| 68 | 368  | 
|
| 6 | 369  | 
%%% Local Variables:  | 
370  | 
%%% mode: latex  | 
|
371  | 
%%% TeX-master: t  | 
|
372  | 
%%% End:  |