| author | Christian Urban <urbanc@in.tum.de> | 
| Tue, 11 Apr 2017 06:22:46 +0800 | |
| changeset 483 | faba5360372c | 
| parent 481 | e2e13cc2c9d7 | 
| child 488 | 057b4603b940 | 
| permissions | -rw-r--r-- | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1  | 
\documentclass{article}
 | 
| 
251
 
5b5a68df6d16
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
217 
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
217
 
cd6066f1056a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
140 
diff
changeset
 | 
3  | 
\usepackage{../langs}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
4  | 
\usepackage{../graphics}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
5  | 
\usepackage{../data}
 | 
| 480 | 6  | 
|
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
7  | 
|
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
8  | 
\begin{document}
 | 
| 471 | 9  | 
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
10  | 
|
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
11  | 
|
| 
272
 
1446bc47a294
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
268 
diff
changeset
 | 
12  | 
\section*{Handout 2 (Regular Expression Matching)}
 | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
13  | 
|
| 412 | 14  | 
This lecture is about implementing a more efficient regular expression  | 
| 478 | 15  | 
matcher (the plots on the right below)---more efficient than the  | 
16  | 
matchers from regular expression libraries in Ruby, Python and Java  | 
|
17  | 
(the plots on the left). The first pair of plots show the running time  | 
|
18  | 
for the regular expression $(a^*)^*\cdot b$ and strings composed of  | 
|
19  | 
$n$ \pcode{a}s (meaning this regular expression actually does not
 | 
|
20  | 
match the strings). The second pair of plots show the running time for  | 
|
21  | 
the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings
 | 
|
22  | 
also composed of $n$ \pcode{a}s (this time the regular expressions
 | 
|
| 412 | 23  | 
match the strings). To see the substantial differences in the left  | 
| 478 | 24  | 
and right plots below, note the different scales of the $x$-axes.  | 
25  | 
||
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
26  | 
|
| 478 | 27  | 
\begin{center}
 | 
28  | 
Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
 | 
|
29  | 
\begin{tabular}{@{}cc@{}}
 | 
|
30  | 
\begin{tikzpicture}
 | 
|
31  | 
\begin{axis}[
 | 
|
32  | 
    xlabel={$n$},
 | 
|
33  | 
    x label style={at={(1.05,0.0)}},
 | 
|
34  | 
    ylabel={time in secs},
 | 
|
35  | 
enlargelimits=false,  | 
|
36  | 
    xtick={0,5,...,30},
 | 
|
37  | 
xmax=33,  | 
|
38  | 
ymax=35,  | 
|
39  | 
    ytick={0,5,...,30},
 | 
|
40  | 
scaled ticks=false,  | 
|
41  | 
axis lines=left,  | 
|
42  | 
width=5cm,  | 
|
43  | 
height=5cm,  | 
|
44  | 
    legend entries={Java, Python},  
 | 
|
45  | 
legend pos=north west,  | 
|
46  | 
legend cell align=left]  | 
|
47  | 
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
 | 
|
48  | 
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
 | 
|
49  | 
\end{axis}
 | 
|
50  | 
\end{tikzpicture}
 | 
|
51  | 
&  | 
|
52  | 
\begin{tikzpicture}
 | 
|
53  | 
  \begin{axis}[
 | 
|
54  | 
    xlabel={$n$},
 | 
|
55  | 
    x label style={at={(1.05,0.0)}},
 | 
|
56  | 
    ylabel={time in secs},
 | 
|
57  | 
enlargelimits=false,  | 
|
58  | 
ymax=35,  | 
|
59  | 
    ytick={0,5,...,30},
 | 
|
60  | 
axis lines=left,  | 
|
61  | 
scaled ticks=false,  | 
|
62  | 
width=6.5cm,  | 
|
63  | 
height=5cm,  | 
|
64  | 
    legend entries={Scala V3},  
 | 
|
65  | 
legend pos=north east,  | 
|
66  | 
legend cell align=left]  | 
|
67  | 
%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};    
 | 
|
68  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
|
69  | 
\end{axis}
 | 
|
70  | 
\end{tikzpicture}
 | 
|
71  | 
\end{tabular}
 | 
|
72  | 
\end{center}
 | 
|
| 
263
 
92e6985018ae
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
262 
diff
changeset
 | 
73  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
74  | 
\begin{center}
 | 
| 415 | 75  | 
Graphs: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$\\
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
76  | 
\begin{tabular}{@{}cc@{}}
 | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
77  | 
\begin{tikzpicture}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
78  | 
\begin{axis}[
 | 
| 414 | 79  | 
    xlabel={$n$},
 | 
80  | 
    x label style={at={(1.05,0.0)}},
 | 
|
| 412 | 81  | 
    ylabel={\small time in secs},
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
82  | 
enlargelimits=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
83  | 
    xtick={0,5,...,30},
 | 
| 
291
 
201c2c6d8696
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
272 
diff
changeset
 | 
84  | 
xmax=33,  | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
85  | 
ymax=35,  | 
| 
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
86  | 
    ytick={0,5,...,30},
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
87  | 
scaled ticks=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
88  | 
axis lines=left,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
89  | 
width=5cm,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
90  | 
height=5cm,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
91  | 
    legend entries={Python,Ruby},  
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
92  | 
legend pos=north west,  | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
93  | 
legend cell align=left]  | 
| 
434
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
94  | 
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
95  | 
\addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
96  | 
\end{axis}
 | 
| 
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
97  | 
\end{tikzpicture}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
98  | 
&  | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
99  | 
\begin{tikzpicture}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
100  | 
  \begin{axis}[
 | 
| 414 | 101  | 
    xlabel={$n$},
 | 
102  | 
    x label style={at={(1.1,0.05)}},
 | 
|
| 412 | 103  | 
    ylabel={\small time in secs},
 | 
104  | 
enlargelimits=false,  | 
|
| 477 | 105  | 
    xtick={0,2500,...,11000},
 | 
106  | 
xmax=12000,  | 
|
| 412 | 107  | 
ymax=35,  | 
108  | 
    ytick={0,5,...,30},
 | 
|
109  | 
scaled ticks=false,  | 
|
110  | 
axis lines=left,  | 
|
111  | 
width=6.5cm,  | 
|
| 478 | 112  | 
height=5cm,  | 
113  | 
    legend entries={Scala V3},  
 | 
|
114  | 
legend pos=north east,  | 
|
115  | 
legend cell align=left]  | 
|
116  | 
%\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | 
|
| 412 | 117  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
118  | 
\end{axis}
 | 
|
119  | 
\end{tikzpicture}
 | 
|
120  | 
\end{tabular}
 | 
|
121  | 
\end{center}
 | 
|
| 478 | 122  | 
\medskip  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
123  | 
|
| 412 | 124  | 
\noindent  | 
| 478 | 125  | 
We will use these regular expressions and strings as running  | 
126  | 
examples. There will be several versions (V1, V2, V3,\ldots) of the  | 
|
127  | 
algorithm.\footnote{The corresponding files are \texttt{re1.scala},
 | 
|
128  | 
  \texttt{re2.scala} and so on. As usual, you can find the code on
 | 
|
129  | 
KEATS.}\bigskip  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
130  | 
|
| 478 | 131  | 
\noindent  | 
| 412 | 132  | 
Having specified in the previous lecture what  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
133  | 
problem our regular expression matcher is supposed to solve,  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
134  | 
namely for any given regular expression $r$ and string $s$  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
135  | 
answer \textit{true} if and only if
 | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
136  | 
|
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
137  | 
\[  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
138  | 
s \in L(r)  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
139  | 
\]  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
140  | 
|
| 412 | 141  | 
\noindent we can look at an algorithm to solve this problem. Clearly  | 
142  | 
we cannot use the function $L$ directly for this, because in general  | 
|
143  | 
the set of strings $L$ returns is infinite (recall what $L(a^*)$ is).  | 
|
144  | 
In such cases there is no way we can implement an exhaustive test for  | 
|
145  | 
whether a string is member of this set or not. In contrast our  | 
|
146  | 
matching algorithm will operate on the regular expression $r$ and  | 
|
| 414 | 147  | 
string $s$, only, which are both finite objects. Before we explain  | 
| 412 | 148  | 
the matching algorithm, however, let us have a closer look at what it  | 
149  | 
means when two regular expressions are equivalent.  | 
|
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
150  | 
|
| 
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
151  | 
\subsection*{Regular Expression Equivalences}
 | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
152  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
153  | 
We already defined in Handout 1 what it means for two regular  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
154  | 
expressions to be equivalent, namely if their meaning is the  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
155  | 
same language:  | 
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
156  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
157  | 
\[  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
158  | 
r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
159  | 
\]  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
160  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
161  | 
\noindent  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
162  | 
It is relatively easy to verify that some concrete equivalences  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
163  | 
hold, for example  | 
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
164  | 
|
| 
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
165  | 
\begin{center}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
166  | 
\begin{tabular}{rcl}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
167  | 
$(a + b) + c$ & $\equiv$ & $a + (b + c)$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
168  | 
$a + a$ & $\equiv$ & $a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
169  | 
$a + b$ & $\equiv$ & $b + a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
170  | 
$(a \cdot b) \cdot c$ & $\equiv$ & $a \cdot (b \cdot c)$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
171  | 
$c \cdot (a + b)$ & $\equiv$ & $(c \cdot a) + (c \cdot b)$\\  | 
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
172  | 
\end{tabular}
 | 
| 
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
173  | 
\end{center}
 | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
174  | 
|
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
175  | 
\noindent  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
176  | 
but also easy to verify that the following regular expressions  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
177  | 
are \emph{not} equivalent
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
178  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
179  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
180  | 
\begin{tabular}{rcl}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
181  | 
$a \cdot a$ & $\not\equiv$ & $a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
182  | 
$a + (b \cdot c)$ & $\not\equiv$ & $(a + b) \cdot (a + c)$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
183  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
184  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
185  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
186  | 
\noindent I leave it to you to verify these equivalences and  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
187  | 
non-equivalences. It is also interesting to look at some  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
188  | 
corner cases involving $\ONE$ and $\ZERO$:  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
189  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
190  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
191  | 
\begin{tabular}{rcl}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
192  | 
$a \cdot \ZERO$ & $\not\equiv$ & $a$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
193  | 
$a + \ONE$ & $\not\equiv$ & $a$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
194  | 
$\ONE$ & $\equiv$ & $\ZERO^*$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
195  | 
$\ONE^*$ & $\equiv$ & $\ONE$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
196  | 
$\ZERO^*$ & $\not\equiv$ & $\ZERO$\\  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
197  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
198  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
199  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
200  | 
\noindent Again I leave it to you to make sure you agree  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
201  | 
with these equivalences and non-equivalences.  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
202  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
203  | 
|
| 
318
 
7975e4f0d4de
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
296 
diff
changeset
 | 
204  | 
For our matching algorithm however the following seven  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
205  | 
equivalences will play an important role:  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
206  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
207  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
208  | 
\begin{tabular}{rcl}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
209  | 
$r + \ZERO$ & $\equiv$ & $r$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
210  | 
$\ZERO + r$ & $\equiv$ & $r$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
211  | 
$r \cdot \ONE$ & $\equiv$ & $r$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
212  | 
$\ONE \cdot r$ & $\equiv$ & $r$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
213  | 
$r \cdot \ZERO$ & $\equiv$ & $\ZERO$\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
214  | 
$\ZERO \cdot r$ & $\equiv$ & $\ZERO$\\  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
215  | 
$r + r$ & $\equiv$ & $r$  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
216  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
217  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
218  | 
|
| 412 | 219  | 
\noindent which always hold no matter what the regular expression $r$  | 
220  | 
looks like. The first two are easy to verify since $L(\ZERO)$ is the  | 
|
221  | 
empty set. The next two are also easy to verify since $L(\ONE) =  | 
|
222  | 
\{[]\}$ and appending the empty string to every string of another set,
 | 
|
223  | 
leaves the set unchanged. Be careful to fully comprehend the fifth and  | 
|
224  | 
sixth equivalence: if you concatenate two sets of strings and one is  | 
|
225  | 
the empty set, then the concatenation will also be the empty set. To  | 
|
226  | 
see this, check the definition of $\_ @ \_$ for sets. The last  | 
|
227  | 
equivalence is again trivial.  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
228  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
229  | 
What will be important later on is that we can orient these  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
230  | 
equivalences and read them from left to right. In this way we  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
231  | 
can view them as \emph{simplification rules}. Consider for 
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
232  | 
example the regular expression  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
233  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
234  | 
\begin{equation}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
235  | 
(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
236  | 
\label{big}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
237  | 
\end{equation}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
238  | 
|
| 412 | 239  | 
\noindent If we can find an equivalent regular expression that is  | 
240  | 
simpler (smaller for example), then this might potentially make our  | 
|
241  | 
matching algorithm run faster. We can look for such a simpler regular  | 
|
242  | 
expression $r'$ because whether a string $s$ is in $L(r)$ or in  | 
|
243  | 
$L(r')$ with $r\equiv r'$ will always give the same answer. In the  | 
|
244  | 
example above you will see that the regular expression is equivalent  | 
|
245  | 
to just $r_1$. You can verify this by iteratively applying the  | 
|
246  | 
simplification rules from above:  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
247  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
248  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
249  | 
\begin{tabular}{ll}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
250  | 
& $(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
251  | 
(\underline{r_4 \cdot \ZERO})$\smallskip\\
 | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
252  | 
$\equiv$ & $(r_1 + \ZERO) \cdot \ONE + \underline{((\ONE + r_2) + r_3) \cdot 
 | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
253  | 
\ZERO}$\smallskip\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
254  | 
$\equiv$ & $\underline{(r_1 + \ZERO) \cdot \ONE} + \ZERO$\smallskip\\
 | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
255  | 
$\equiv$ & $(\underline{r_1 + \ZERO}) + \ZERO$\smallskip\\
 | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
256  | 
$\equiv$ & $\underline{r_1 + \ZERO}$\smallskip\\
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
257  | 
$\equiv$ & $r_1$\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
258  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
259  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
260  | 
|
| 
296
 
796b9b81ac8d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
291 
diff
changeset
 | 
261  | 
\noindent In each step, I underlined where a simplification  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
262  | 
rule is applied. Our matching algorithm in the next section  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
263  | 
will often generate such ``useless'' $\ONE$s and  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
264  | 
$\ZERO$s, therefore simplifying them away will make the  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
265  | 
algorithm quite a bit faster.  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
266  | 
|
| 479 | 267  | 
Finally here are three equivalences between regulare expressions which are  | 
268  | 
not so obvious:  | 
|
269  | 
||
270  | 
\begin{center}
 | 
|
271  | 
\begin{tabular}{rcl}
 | 
|
272  | 
$r^*$ & $\equiv$ & $1 + r\cdot r^*$\\  | 
|
273  | 
$(r_1 + r_2)^*$ & $\equiv$ & $r_1^* \cdot (r_2\cdot r_1^*)^*$\\  | 
|
274  | 
$(r_1 \cdot r_2)^*$ & $\equiv$ & $1 + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\  | 
|
275  | 
\end{tabular}
 | 
|
276  | 
\end{center}
 | 
|
277  | 
||
278  | 
\noindent  | 
|
279  | 
You can try to establish them. As an aside, there has been a lot of research  | 
|
280  | 
in questions like: Can one always decide when two regular expressions are  | 
|
281  | 
equivalent or not? What does an algorithm look like to decide this?  | 
|
282  | 
||
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
283  | 
\subsection*{The Matching Algorithm}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
284  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
285  | 
The algorithm we will define below consists of two parts. One  | 
| 412 | 286  | 
is the function $\textit{nullable}$ which takes a regular expression as
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
287  | 
argument and decides whether it can match the empty string  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
288  | 
(this means it returns a boolean in Scala). This can be easily  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
289  | 
defined recursively as follows:  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
290  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
291  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
292  | 
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
 | 
| 412 | 293  | 
$\textit{nullable}(\ZERO)$      & $\dn$ & $\textit{false}$\\
 | 
294  | 
$\textit{nullable}(\ONE)$         & $\dn$ & $\textit{true}$\\
 | 
|
295  | 
$\textit{nullable}(c)$                & $\dn$ & $\textit{false}$\\
 | 
|
296  | 
$\textit{nullable}(r_1 + r_2)$     & $\dn$ &  $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ 
 | 
|
297  | 
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ &  $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 | 
|
298  | 
$\textit{nullable}(r^*)$              & $\dn$ & $\textit{true}$ \\
 | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
299  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
300  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
301  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
302  | 
\noindent The idea behind this function is that the following  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
303  | 
property holds:  | 
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
304  | 
|
| 
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
305  | 
\[  | 
| 412 | 306  | 
\textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
 | 
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
307  | 
\]  | 
| 
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
308  | 
|
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
309  | 
\noindent Note on the left-hand side of the if-and-only-if we  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
310  | 
have a function we can implement; on the right we have its  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
311  | 
specification (which we cannot implement in a programming  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
312  | 
language).  | 
| 
124
 
dd8b5a3dac0a
adde
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
123 
diff
changeset
 | 
313  | 
|
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
314  | 
The other function of our matching algorithm calculates a  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
315  | 
\emph{derivative} of a regular expression. This is a function
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
316  | 
which will take a regular expression, say $r$, and a  | 
| 412 | 317  | 
character, say $c$, as arguments and returns a new regular  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
318  | 
expression. Be careful that the intuition behind this function  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
319  | 
is not so easy to grasp on first reading. Essentially this  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
320  | 
function solves the following problem: if $r$ can match a  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
321  | 
string of the form $c\!::\!s$, what does the regular  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
322  | 
expression look like that can match just $s$? The definition  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
323  | 
of this function is as follows:  | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
324  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
325  | 
\begin{center}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
326  | 
\begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
 | 
| 414 | 327  | 
  $\textit{der}\, c\, (\ZERO)$      & $\dn$ & $\ZERO$\\
 | 
328  | 
  $\textit{der}\, c\, (\ONE)$         & $\dn$ & $\ZERO$ \\
 | 
|
329  | 
  $\textit{der}\, c\, (d)$                & $\dn$ & if $c = d$ then $\ONE$ else $\ZERO$\\
 | 
|
330  | 
  $\textit{der}\, c\, (r_1 + r_2)$        & $\dn$ & $\textit{der}\, c\, r_1 + \textit{der}\, c\, r_2$\\
 | 
|
331  | 
  $\textit{der}\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $\textit{nullable} (r_1)$\\
 | 
|
332  | 
  & & then $(\textit{der}\,c\,r_1) \cdot r_2 + \textit{der}\, c\, r_2$\\ 
 | 
|
333  | 
  & & else $(\textit{der}\, c\, r_1) \cdot r_2$\\
 | 
|
334  | 
  $\textit{der}\, c\, (r^*)$          & $\dn$ & $(\textit{der}\,c\,r) \cdot (r^*)$
 | 
|
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
335  | 
  \end{tabular}
 | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
336  | 
\end{center}
 | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
337  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
338  | 
\noindent The first two clauses can be rationalised as  | 
| 414 | 339  | 
follows: recall that $\textit{der}$ should calculate a regular
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
340  | 
expression so that given the ``input'' regular expression can  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
341  | 
match a string of the form $c\!::\!s$, we want a regular  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
342  | 
expression for $s$. Since neither $\ZERO$ nor $\ONE$  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
343  | 
can match a string of the form $c\!::\!s$, we return  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
344  | 
$\ZERO$. In the third case we have to make a  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
345  | 
case-distinction: In case the regular expression is $c$, then  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
346  | 
clearly it can recognise a string of the form $c\!::\!s$, just  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
347  | 
that $s$ is the empty string. Therefore we return the  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
348  | 
$\ONE$-regular expression. In the other case we again  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
349  | 
return $\ZERO$ since no string of the $c\!::\!s$ can be  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
350  | 
matched. Next come the recursive cases, which are a bit more  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
351  | 
involved. Fortunately, the $+$-case is still relatively  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
352  | 
straightforward: all strings of the form $c\!::\!s$ are either  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
353  | 
matched by the regular expression $r_1$ or $r_2$. So we just  | 
| 414 | 354  | 
have to recursively call $\textit{der}$ with these two regular
 | 
| 
332
 
4755ad4b457b
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
325 
diff
changeset
 | 
355  | 
expressions and compose the results again with $+$. Makes  | 
| 412 | 356  | 
sense?  | 
357  | 
||
358  | 
The $\cdot$-case is more complicated: if $r_1\cdot r_2$  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
359  | 
matches a string of the form $c\!::\!s$, then the first part  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
360  | 
must be matched by $r_1$. Consequently, it makes sense to  | 
| 414 | 361  | 
construct the regular expression for $s$ by calling $\textit{der}$ with
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
362  | 
$r_1$ and ``appending'' $r_2$. There is however one exception  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
363  | 
to this simple rule: if $r_1$ can match the empty string, then  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
364  | 
all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
365  | 
nullable (that is can match the empty string) we have to allow  | 
| 414 | 366  | 
the choice $\textit{der}\,c\,r_2$ for calculating the regular
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
367  | 
expression that can match $s$. Therefore we have to add the  | 
| 414 | 368  | 
regular expression $\textit{der}\,c\,r_2$ in the result. The $*$-case
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
369  | 
is again simple: if $r^*$ matches a string of the form  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
370  | 
$c\!::\!s$, then the first part must be ``matched'' by a  | 
| 414 | 371  | 
single copy of $r$. Therefore we call recursively $\textit{der}\,c\,r$
 | 
372  | 
and ``append'' $r^*$ in order to match the rest of $s$. Still  | 
|
373  | 
makes sense?  | 
|
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
374  | 
|
| 414 | 375  | 
If all this did not make sense yet, here is another way to rationalise  | 
376  | 
the definition of $\textit{der}$ by considering the following operation
 | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
377  | 
on sets:  | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
378  | 
|
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
379  | 
\begin{equation}\label{Der}
 | 
| 414 | 380  | 
\textit{Der}\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
381  | 
\end{equation}
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
382  | 
|
| 
291
 
201c2c6d8696
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
272 
diff
changeset
 | 
383  | 
\noindent This operation essentially transforms a set of  | 
| 
 
201c2c6d8696
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
272 
diff
changeset
 | 
384  | 
strings $A$ by filtering out all strings that do not start  | 
| 
 
201c2c6d8696
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
272 
diff
changeset
 | 
385  | 
with $c$ and then strips off the $c$ from all the remaining  | 
| 
 
201c2c6d8696
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
272 
diff
changeset
 | 
386  | 
strings. For example suppose $A = \{f\!oo, bar, f\!rak\}$ then
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
387  | 
|
| 414 | 388  | 
\[ \textit{Der}\,f\,A = \{oo, rak\}\quad,\quad 
 | 
389  | 
   \textit{Der}\,b\,A = \{ar\} \quad \text{and} \quad 
 | 
|
390  | 
   \textit{Der}\,a\,A = \{\} 
 | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
391  | 
\]  | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
392  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
393  | 
\noindent  | 
| 414 | 394  | 
Note that in the last case $\textit{Der}$ is empty, because no string in $A$
 | 
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
395  | 
starts with $a$. With this operation we can state the following  | 
| 414 | 396  | 
property about $\textit{der}$:
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
397  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
398  | 
\[  | 
| 414 | 399  | 
L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
400  | 
\]  | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
401  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
402  | 
\noindent  | 
| 414 | 403  | 
This property clarifies what regular expression $\textit{der}$ calculates,
 | 
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
404  | 
namely take the set of strings that $r$ can match (that is $L(r)$),  | 
| 
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
405  | 
filter out all strings not starting with $c$ and strip off the $c$  | 
| 
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
406  | 
from the remaining strings---this is exactly the language that  | 
| 414 | 407  | 
$\textit{der}\,c\,r$ can match.
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
408  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
409  | 
If we want to find out whether the string $abc$ is matched by  | 
| 414 | 410  | 
the regular expression $r_1$ then we can iteratively apply $\textit{der}$
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
411  | 
as follows  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
412  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
413  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
414  | 
\begin{tabular}{rll}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
415  | 
Input: $r_1$, $abc$\medskip\\  | 
| 414 | 416  | 
Step 1: & build derivative of $a$ and $r_1$ & $(r_2 = \textit{der}\,a\,r_1)$\smallskip\\
 | 
417  | 
Step 2: & build derivative of $b$ and $r_2$ & $(r_3 = \textit{der}\,b\,r_2)$\smallskip\\
 | 
|
| 
433
 
c08290ee4f1f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
416 
diff
changeset
 | 
418  | 
Step 3: & build derivative of $c$ and $r_3$ & $(r_4 = \textit{der}\,c\,r_3)$\smallskip\\
 | 
| 
 
c08290ee4f1f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
416 
diff
changeset
 | 
419  | 
Step 4: & the string is exhausted: & $(\textit{nullable}(r_4))$\\
 | 
| 
 
c08290ee4f1f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
416 
diff
changeset
 | 
420  | 
& test whether $r_4$ can recognise the\\  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
421  | 
& empty string\smallskip\\  | 
| 412 | 422  | 
Output: & result of this test $\Rightarrow \textit{true} \,\text{or}\, \textit{false}$\\        
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
423  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
424  | 
\end{center}
 | 
| 
140
 
1be892087df2
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
133 
diff
changeset
 | 
425  | 
|
| 414 | 426  | 
\noindent Again the operation $\textit{Der}$ might help to rationalise
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
427  | 
this algorithm. We want to know whether $abc \in L(r_1)$. We  | 
| 414 | 428  | 
do not know yet---but let us assume it is. Then $\textit{Der}\,a\,L(r_1)$
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
429  | 
builds the set where all the strings not starting with $a$ are  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
430  | 
filtered out. Of the remaining strings, the $a$ is stripped  | 
| 412 | 431  | 
off. So we should still have $bc$ in the set.  | 
432  | 
Then we continue with filtering out all strings not  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
433  | 
starting with $b$ and stripping off the $b$ from the remaining  | 
| 414 | 434  | 
strings, that means we build $\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1)))$.
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
435  | 
Finally we filter out all strings not starting with $c$ and  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
436  | 
strip off $c$ from the remaining string. This is  | 
| 414 | 437  | 
$\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$. Now if $abc$ was in the 
 | 
438  | 
original set ($L(r_1)$), then $\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$ 
 | 
|
| 412 | 439  | 
must contain the empty string. If not, then $abc$ was not in the  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
440  | 
language we started with.  | 
| 
140
 
1be892087df2
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
133 
diff
changeset
 | 
441  | 
|
| 414 | 442  | 
Our matching algorithm using $\textit{der}$ and $\textit{nullable}$ works
 | 
443  | 
similarly, just using regular expression instead of sets. In order to  | 
|
444  | 
define our algorithm we need to extend the notion of derivatives from single  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
445  | 
characters to strings. This can be done using the following  | 
| 414 | 446  | 
function, taking a string and a regular expression as input and  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
447  | 
a regular expression as output.  | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
448  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
449  | 
\begin{center}
 | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
450  | 
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
451  | 
  $\textit{ders}\, []\, r$     & $\dn$ & $r$ & \\
 | 
| 414 | 452  | 
  $\textit{ders}\, (c\!::\!s)\, r$ & $\dn$ & $\textit{ders}\,s\,(\textit{der}\,c\,r)$ & \\
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
453  | 
  \end{tabular}
 | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
454  | 
\end{center}
 | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
455  | 
|
| 414 | 456  | 
\noindent This function iterates $\textit{der}$ taking one character at
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
457  | 
the time from the original string until it is exhausted.  | 
| 414 | 458  | 
Having $\textit{der}s$ in place, we can finally define our matching
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
459  | 
algorithm:  | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
460  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
461  | 
\[  | 
| 414 | 462  | 
\textit{matches}\,s\,r \dn \textit{nullable}(\textit{ders}\,s\,r)
 | 
| 
125
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
463  | 
\]  | 
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
464  | 
|
| 
 
39c75cf4e079
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
124 
diff
changeset
 | 
465  | 
\noindent  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
466  | 
and we can claim that  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
467  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
468  | 
\[  | 
| 414 | 469  | 
\textit{matches}\,s\,r\quad\text{if and only if}\quad s\in L(r)
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
470  | 
\]  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
471  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
472  | 
\noindent holds, which means our algorithm satisfies the  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
473  | 
specification. Of course we can claim many things\ldots  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
474  | 
whether the claim holds any water is a different question,  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
475  | 
which for example is the point of the Strand-2 Coursework.  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
476  | 
|
| 414 | 477  | 
This algorithm was introduced by Janus Brzozowski in 1964, but  | 
478  | 
is more widely known only in the last 10 or so years. Its  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
479  | 
main attractions are simplicity and being fast, as well as  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
480  | 
being easily extendable for other regular expressions such as  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
481  | 
$r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
 | 
| 414 | 482  | 
Strand-1 Coursework 1).  | 
| 
258
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
483  | 
|
| 
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
484  | 
\subsection*{The Matching Algorithm in Scala}
 | 
| 
 
1e4da6d2490c
updated programs
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
251 
diff
changeset
 | 
485  | 
|
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
486  | 
Another attraction of the algorithm is that it can be easily  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
487  | 
implemented in a functional programming language, like Scala.  | 
| 
296
 
796b9b81ac8d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
291 
diff
changeset
 | 
488  | 
Given the implementation of regular expressions in Scala shown  | 
| 
 
796b9b81ac8d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
291 
diff
changeset
 | 
489  | 
in the first lecture and handout, the functions and subfunctions  | 
| 
 
796b9b81ac8d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
291 
diff
changeset
 | 
490  | 
for \pcode{matches} are shown in Figure~\ref{scala1}.
 | 
| 
126
 
7c7185cb4f2b
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
125 
diff
changeset
 | 
491  | 
|
| 
 
7c7185cb4f2b
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
125 
diff
changeset
 | 
492  | 
\begin{figure}[p]
 | 
| 477 | 493  | 
\lstinputlisting[numbers=left,linebackgroundcolor=  | 
494  | 
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
 | 
|
495  | 
                  {../progs/app5.scala}
 | 
|
| 412 | 496  | 
\caption{Scala implementation of the \textit{nullable} and 
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
497  | 
derivative functions. These functions are easy to  | 
| 412 | 498  | 
implement in functional languages, because their built-in pattern  | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
499  | 
matching and recursion allow us to mimic the mathematical  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
500  | 
  definitions very closely.\label{scala1}}
 | 
| 
126
 
7c7185cb4f2b
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
125 
diff
changeset
 | 
501  | 
\end{figure}
 | 
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
502  | 
|
| 414 | 503  | 
|
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
504  | 
%Remember our second example involving the regular expression  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
505  | 
%$(a^*)^* \cdot b$ which could not match strings of $n$ \texttt{a}s. 
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
506  | 
%Java needed around 30 seconds to find this out a string with $n=28$.  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
507  | 
%It seems our algorithm is doing rather well in comparison:  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
508  | 
%  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
509  | 
%\begin{center}
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
510  | 
%\begin{tikzpicture}
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
511  | 
%\begin{axis}[
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
512  | 
%    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
513  | 
%    xlabel={$n$},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
514  | 
%    x label style={at={(1.05,0.0)}},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
515  | 
%    ylabel={time in secs},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
516  | 
% enlargelimits=false,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
517  | 
%    xtick={0,1000,...,6500},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
518  | 
% xmax=6800,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
519  | 
%    ytick={0,5,...,30},
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
520  | 
% ymax=34,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
521  | 
% scaled ticks=false,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
522  | 
% axis lines=left,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
523  | 
% width=8cm,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
524  | 
% height=4.5cm,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
525  | 
%    legend entries={Java,Scala V1},  
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
526  | 
% legend pos=north east,  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
527  | 
% legend cell align=left]  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
528  | 
%\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
529  | 
%\addplot[red,mark=triangle*,mark options={fill=white}] table {re1a.data};
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
530  | 
%\end{axis}
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
531  | 
%\end{tikzpicture}
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
532  | 
%\end{center}
 | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
533  | 
%  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
534  | 
%\noindent  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
535  | 
%This is not an error: it hardly takes more than half a second for  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
536  | 
%strings up to the length of 6500. After that we receive a  | 
| 
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
537  | 
%StackOverflow exception, but still\ldots  | 
| 414 | 538  | 
|
539  | 
For running the algorithm with our first example, the evil  | 
|
| 
394
 
2f9fe225ecc8
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
343 
diff
changeset
 | 
540  | 
regular expression $a^?{}^{\{n\}}a^{\{n\}}$, we need to implement
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
541  | 
the optional regular expression and the exactly $n$-times  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
542  | 
regular expression. This can be done with the translations  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
543  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
544  | 
\lstinputlisting[numbers=none]{../progs/app51.scala}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
545  | 
|
| 414 | 546  | 
\noindent Running the matcher with this example, we find it is  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
547  | 
slightly worse then the matcher in Ruby and Python.  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
548  | 
Ooops\ldots  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
549  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
550  | 
\begin{center}
 | 
| 414 | 551  | 
\begin{tikzpicture}
 | 
552  | 
\begin{axis}[    
 | 
|
| 415 | 553  | 
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 554  | 
    xlabel={$n$},
 | 
555  | 
    x label style={at={(1.05,0.0)}},
 | 
|
556  | 
    ylabel={time in secs},
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
557  | 
enlargelimits=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
558  | 
    xtick={0,5,...,30},
 | 
| 415 | 559  | 
xmax=32,  | 
| 414 | 560  | 
    ytick={0,5,...,30},
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
561  | 
scaled ticks=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
562  | 
axis lines=left,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
563  | 
width=6cm,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
564  | 
height=5cm,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
565  | 
    legend entries={Python,Ruby,Scala V1},  
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
566  | 
legend pos=outer north east,  | 
| 415 | 567  | 
legend cell align=left]  | 
| 
434
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
568  | 
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
569  | 
\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
570  | 
\addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
 | 
| 414 | 571  | 
\end{axis}
 | 
572  | 
\end{tikzpicture}
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
573  | 
\end{center}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
574  | 
|
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
575  | 
\noindent Analysing this failure we notice that for  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
576  | 
$a^{\{n\}}$ we generate quite big regular expressions:
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
577  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
578  | 
\begin{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
579  | 
\begin{tabular}{rl}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
580  | 
1: & $a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
581  | 
2: & $a\cdot a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
582  | 
3: & $a\cdot a\cdot a$\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
583  | 
& \ldots\\  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
584  | 
13: & $a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a$\\  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
585  | 
& \ldots  | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
586  | 
\end{tabular}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
587  | 
\end{center}
 | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
588  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
589  | 
\noindent Our algorithm traverses such regular expressions at  | 
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
590  | 
least once every time a derivative is calculated. So having  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
591  | 
large regular expressions will cause problems. This problem  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
592  | 
is aggravated by $a^?$ being represented as $a + \ONE$.  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
593  | 
|
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
594  | 
We can however fix this by having an explicit constructor for  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
595  | 
$r^{\{n\}}$. In Scala we would introduce a constructor like
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
596  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
597  | 
\begin{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
598  | 
\code{case class NTIMES(r: Rexp, n: Int) extends Rexp}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
599  | 
\end{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
600  | 
|
| 478 | 601  | 
\noindent With this fix we have a constant ``size'' regular expression  | 
602  | 
for our running example no matter how large $n$ is (see the  | 
|
603  | 
\texttt{size} section in the implementations).  This means we have to
 | 
|
604  | 
also add cases for \pcode{NTIMES} in the functions $\textit{nullable}$
 | 
|
605  | 
and $\textit{der}$. Does the change have any effect?
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
606  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
607  | 
\begin{center}
 | 
| 414 | 608  | 
\begin{tikzpicture}
 | 
609  | 
\begin{axis}[
 | 
|
| 415 | 610  | 
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 611  | 
    xlabel={$n$},
 | 
612  | 
    x label style={at={(1.01,0.0)}},
 | 
|
613  | 
    ylabel={time in secs},
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
614  | 
enlargelimits=false,  | 
| 477 | 615  | 
    xtick={0,200,...,1100},
 | 
616  | 
xmax=1200,  | 
|
| 414 | 617  | 
    ytick={0,5,...,30},
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
618  | 
scaled ticks=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
619  | 
axis lines=left,  | 
| 414 | 620  | 
width=10cm,  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
621  | 
height=5cm,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
622  | 
    legend entries={Python,Ruby,Scala V1,Scala V2},  
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
623  | 
legend pos=outer north east,  | 
| 414 | 624  | 
legend cell align=left]  | 
| 
434
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
625  | 
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
626  | 
\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
627  | 
\addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
 | 
| 
 
8664ff87cd77
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
433 
diff
changeset
 | 
628  | 
\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | 
| 414 | 629  | 
\end{axis}
 | 
630  | 
\end{tikzpicture}
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
631  | 
\end{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
632  | 
|
| 478 | 633  | 
\noindent Now we are talking business! The modified matcher can within  | 
634  | 
25 seconds handle regular expressions up to $n = 1,100$ before a  | 
|
635  | 
StackOverflow is raised. Recall that Python and Ruby (and our first  | 
|
636  | 
version, Scala V1) could only handle $n = 27$ or so in 30  | 
|
637  | 
seconds. There is no change for our second example $(a^*)^* \cdot  | 
|
638  | 
b$---so this is still good.  | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
639  | 
|
| 412 | 640  | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
641  | 
The moral is that our algorithm is rather sensitive to the  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
642  | 
size of regular expressions it needs to handle. This is of  | 
| 414 | 643  | 
course obvious because both $\textit{nullable}$ and $\textit{der}$ frequently
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
644  | 
need to traverse the whole regular expression. There seems,  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
645  | 
however, one more issue for making the algorithm run faster.  | 
| 
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
646  | 
The derivative function often produces ``useless''  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
647  | 
$\ZERO$s and $\ONE$s. To see this, consider $r = ((a  | 
| 478 | 648  | 
\cdot b) + b)^*$ and the following three derivatives  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
649  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
650  | 
\begin{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
651  | 
\begin{tabular}{l}
 | 
| 414 | 652  | 
$\textit{der}\,a\,r = ((\ONE \cdot b) + \ZERO) \cdot r$\\
 | 
653  | 
$\textit{der}\,b\,r = ((\ZERO \cdot b) + \ONE)\cdot r$\\
 | 
|
654  | 
$\textit{der}\,c\,r = ((\ZERO \cdot b) + \ZERO)\cdot r$
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
655  | 
\end{tabular}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
656  | 
\end{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
657  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
658  | 
\noindent  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
659  | 
If we simplify them according to the simple rules from the  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
660  | 
beginning, we can replace the right-hand sides by the  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
661  | 
smaller equivalent regular expressions  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
662  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
663  | 
\begin{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
664  | 
\begin{tabular}{l}
 | 
| 414 | 665  | 
$\textit{der}\,a\,r \equiv b \cdot r$\\
 | 
666  | 
$\textit{der}\,b\,r \equiv r$\\
 | 
|
667  | 
$\textit{der}\,c\,r \equiv \ZERO$
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
668  | 
\end{tabular}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
669  | 
\end{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
670  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
671  | 
\noindent I leave it to you to contemplate whether such a  | 
| 478 | 672  | 
simplification can have any impact on the correctness of our algorithm  | 
673  | 
(will it change any answers?). Figure~\ref{scala2} gives a
 | 
|
674  | 
simplification function that recursively traverses a regular  | 
|
675  | 
expression and simplifies it according to the rules given at the  | 
|
676  | 
beginning. There are only rules for $+$, $\cdot$ and $n$-times (the  | 
|
677  | 
latter because we added it in the second version of our  | 
|
678  | 
matcher). There is no simplification rule for a star, because  | 
|
679  | 
empirical data and also a little thought showed that simplifying under  | 
|
680  | 
a star is a waste of computation time. The simplification function  | 
|
681  | 
will be called after every derivation. This additional step removes  | 
|
682  | 
all the ``junk'' the derivative function introduced. Does this improve  | 
|
683  | 
the speed? You bet!!  | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
684  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
685  | 
\begin{figure}[p]
 | 
| 477 | 686  | 
\lstinputlisting[numbers=left,linebackgroundcolor=  | 
687  | 
  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
 | 
|
688  | 
                {../progs/app6.scala}
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
689  | 
\caption{The simplification function and modified 
 | 
| 
325
 
794c599cee53
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
318 
diff
changeset
 | 
690  | 
\texttt{ders}-function; this function now
 | 
| 
333
 
8890852e18b7
updated coursework
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
332 
diff
changeset
 | 
691  | 
calls \texttt{der} first, but then simplifies
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
692  | 
the resulting derivative regular expressions before  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
693  | 
building the next derivative, see  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
694  | 
Line~\ref{simpline}.\label{scala2}}
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
695  | 
\end{figure}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
696  | 
|
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
697  | 
\begin{center}
 | 
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
698  | 
\begin{tikzpicture}
 | 
| 414 | 699  | 
\begin{axis}[
 | 
| 415 | 700  | 
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 701  | 
    xlabel={$n$},
 | 
702  | 
    x label style={at={(1.04,0.0)}},
 | 
|
703  | 
    ylabel={time in secs},
 | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
704  | 
enlargelimits=false,  | 
| 478 | 705  | 
    xtick={0,2500,...,10000},
 | 
706  | 
xmax=12000,  | 
|
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
707  | 
    ytick={0,5,...,30},
 | 
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
708  | 
ymax=32,  | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
709  | 
scaled ticks=false,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
710  | 
axis lines=left,  | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
711  | 
width=9cm,  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
712  | 
height=5cm,  | 
| 415 | 713  | 
    legend entries={Scala V2,Scala V3},
 | 
714  | 
legend pos=outer north east,  | 
|
715  | 
legend cell align=left]  | 
|
716  | 
\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | 
|
| 
268
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
717  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
| 
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
718  | 
\end{axis}
 | 
| 
 
18bef085a7ca
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
263 
diff
changeset
 | 
719  | 
\end{tikzpicture}
 | 
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
720  | 
\end{center}
 | 
| 
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
721  | 
|
| 415 | 722  | 
\noindent  | 
| 478 | 723  | 
To reacap, Python and Ruby needed approximately 30 seconds to match a  | 
724  | 
string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot
 | 
|
725  | 
a^{\{n\}}$.  We need a third of this time to do the same with strings
 | 
|
726  | 
up to 11,000 \texttt{a}s.  Similarly, Java and Python needed 30
 | 
|
727  | 
seconds to find out the regular expression $(a^*)^* \cdot b$ does not  | 
|
728  | 
match the string of 28 \texttt{a}s. We can do the same in
 | 
|
729  | 
for strings of 6,000,000 \texttt{a}s:
 | 
|
| 415 | 730  | 
|
731  | 
||
| 414 | 732  | 
\begin{center}
 | 
733  | 
\begin{tikzpicture}
 | 
|
734  | 
\begin{axis}[
 | 
|
| 415 | 735  | 
    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 736  | 
    xlabel={$n$},
 | 
737  | 
    ylabel={time in secs},
 | 
|
738  | 
enlargelimits=false,  | 
|
| 478 | 739  | 
ymax=35,  | 
| 414 | 740  | 
    ytick={0,5,...,30},
 | 
741  | 
axis lines=left,  | 
|
| 478 | 742  | 
scaled ticks=false,  | 
743  | 
    x label style={at={(1.09,0.0)}},
 | 
|
744  | 
%xmax=7700000,  | 
|
| 414 | 745  | 
width=9cm,  | 
746  | 
height=5cm,  | 
|
| 478 | 747  | 
    legend entries={Scala V3},
 | 
| 415 | 748  | 
legend pos=outer north east,  | 
749  | 
legend cell align=left]  | 
|
| 478 | 750  | 
%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
 | 
| 414 | 751  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
752  | 
\end{axis}
 | 
|
753  | 
\end{tikzpicture}
 | 
|
754  | 
\end{center}
 | 
|
755  | 
||
| 415 | 756  | 
\subsection*{Epilogue}
 | 
757  | 
||
758  | 
(23/Aug/2016) I recently found another place where this algorithm can be  | 
|
| 478 | 759  | 
sped up (this idea is not integrated with what is coming next,  | 
| 415 | 760  | 
but I present it nonetheless). The idea is to define \texttt{ders}
 | 
761  | 
not such that it iterates the derivative character-by-character, but  | 
|
762  | 
in bigger chunks. The resulting code for \texttt{ders2} looks as
 | 
|
763  | 
follows:  | 
|
764  | 
||
765  | 
\lstinputlisting[numbers=none]{../progs/app52.scala} 
 | 
|
766  | 
||
767  | 
\noindent  | 
|
768  | 
I have not fully understood why this version is much faster,  | 
|
769  | 
but it seems it is a combination of the clauses for \texttt{ALT}
 | 
|
770  | 
and \texttt{SEQ}. In the latter case we call \texttt{der} with 
 | 
|
771  | 
a single character and this potentially produces an alternative.  | 
|
772  | 
The derivative of such an alternative can then be more effeciently  | 
|
773  | 
calculated by \texttt{ders2} since it pushes a whole string
 | 
|
774  | 
under an \texttt{ALT}. The numbers are that in the second case  
 | 
|
775  | 
$(a^*)^* \cdot b$ both versions are pretty much the same, but in the  | 
|
776  | 
first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives 
 | 
|
777  | 
another factor of 100 speedup. Nice!  | 
|
| 414 | 778  | 
|
| 415 | 779  | 
\begin{center}
 | 
780  | 
\begin{tabular}{cc}
 | 
|
781  | 
\begin{tikzpicture}
 | 
|
782  | 
\begin{axis}[
 | 
|
783  | 
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
|
784  | 
    xlabel={$n$},
 | 
|
785  | 
    x label style={at={(1.04,0.0)}},
 | 
|
786  | 
    ylabel={time in secs},
 | 
|
787  | 
enlargelimits=false,  | 
|
788  | 
xmax=7100000,  | 
|
789  | 
    ytick={0,5,...,30},
 | 
|
790  | 
ymax=33,  | 
|
791  | 
%scaled ticks=false,  | 
|
792  | 
axis lines=left,  | 
|
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
793  | 
width=5.5cm,  | 
| 415 | 794  | 
height=5cm,  | 
795  | 
    legend entries={Scala V3, Scala V4},
 | 
|
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
796  | 
    legend style={at={(0.1,-0.2)},anchor=north}]
 | 
| 415 | 797  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
798  | 
\addplot[purple,mark=square*,mark options={fill=white}] table {re4.data};
 | 
|
799  | 
\end{axis}
 | 
|
800  | 
\end{tikzpicture}
 | 
|
801  | 
&  | 
|
802  | 
\begin{tikzpicture}
 | 
|
803  | 
\begin{axis}[
 | 
|
804  | 
    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
|
805  | 
    xlabel={$n$},
 | 
|
806  | 
    x label style={at={(1.09,0.0)}},
 | 
|
807  | 
    ylabel={time in secs},
 | 
|
808  | 
enlargelimits=false,  | 
|
809  | 
xmax=8100000,  | 
|
810  | 
    ytick={0,5,...,30},
 | 
|
811  | 
ymax=33,  | 
|
812  | 
%scaled ticks=false,  | 
|
813  | 
axis lines=left,  | 
|
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
814  | 
width=5.5cm,  | 
| 415 | 815  | 
height=5cm,  | 
816  | 
    legend entries={Scala V3, Scala V4},
 | 
|
| 
443
 
cd43d8c6eb84
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
434 
diff
changeset
 | 
817  | 
    legend style={at={(0.1,-0.2)},anchor=north}]
 | 
| 415 | 818  | 
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
819  | 
\addplot[purple,mark=square*,mark options={fill=white}] table {re4a.data};
 | 
|
820  | 
\end{axis}
 | 
|
821  | 
\end{tikzpicture}
 | 
|
822  | 
\end{tabular}
 | 
|
823  | 
\end{center}
 | 
|
| 414 | 824  | 
|
| 412 | 825  | 
|
| 
334
 
fd89a63e9db3
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
333 
diff
changeset
 | 
826  | 
\section*{Proofs}
 | 
| 
 
fd89a63e9db3
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
333 
diff
changeset
 | 
827  | 
|
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
828  | 
You might not like doing proofs. But they serve a very  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
829  | 
important purpose in Computer Science: How can we be sure that  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
830  | 
our algorithm matches its specification. We can try to test  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
831  | 
the algorithm, but that often overlooks corner cases and an  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
832  | 
exhaustive testing is impossible (since there are infinitely  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
833  | 
many inputs). Proofs allow us to ensure that an algorithm  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
834  | 
really meets its specification.  | 
| 
338
 
f16120cb4e19
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
334 
diff
changeset
 | 
835  | 
|
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
836  | 
For the programs we look at in this module, the proofs will  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
837  | 
mostly by some form of induction. Remember that regular  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
838  | 
expressions are defined as  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
839  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
840  | 
\begin{center}
 | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
841  | 
\begin{tabular}{r@{\hspace{1mm}}r@{\hspace{1mm}}l@{\hspace{13mm}}l}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
842  | 
$r$ & $::=$ & $\ZERO$ & null language\\  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
843  | 
        & $\mid$ & $\ONE$           & empty string / \texttt{""} / []\\
 | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
844  | 
& $\mid$ & $c$ & single character\\  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
845  | 
& $\mid$ & $r_1 + r_2$ & alternative / choice\\  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
846  | 
& $\mid$ & $r_1 \cdot r_2$ & sequence\\  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
847  | 
& $\mid$ & $r^*$ & star (zero or more)\\  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
848  | 
  \end{tabular}
 | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
849  | 
\end{center}
 | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
850  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
851  | 
\noindent If you want to show a property $P(r)$ for all  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
852  | 
regular expressions $r$, then you have to follow essentially  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
853  | 
the recipe:  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
854  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
855  | 
\begin{itemize}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
856  | 
\item $P$ has to hold for $\ZERO$, $\ONE$ and $c$  | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
857  | 
(these are the base cases).  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
858  | 
\item $P$ has to hold for $r_1 + r_2$ under the assumption  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
859  | 
that $P$ already holds for $r_1$ and $r_2$.  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
860  | 
\item $P$ has to hold for $r_1 \cdot r_2$ under the  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
861  | 
assumption that $P$ already holds for $r_1$ and $r_2$.  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
862  | 
\item $P$ has to hold for $r^*$ under the assumption  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
863  | 
that $P$ already holds for $r$.  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
864  | 
\end{itemize}
 | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
865  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
866  | 
\noindent  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
867  | 
A simple proof is for example showing the following  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
868  | 
property:  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
869  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
870  | 
\begin{equation}
 | 
| 412 | 871  | 
\textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
872  | 
\label{nullableprop}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
873  | 
\end{equation}
 | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
874  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
875  | 
\noindent  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
876  | 
Let us say that this property is $P(r)$, then the first case  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
877  | 
we need to check is whether $P(\ZERO)$ (see recipe  | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
878  | 
above). So we have to show that  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
879  | 
|
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
880  | 
\[  | 
| 412 | 881  | 
\textit{nullable}(\ZERO) \;\;\text{if and only if}\;\; 
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
882  | 
[]\in L(\ZERO)  | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
883  | 
\]  | 
| 
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
884  | 
|
| 412 | 885  | 
\noindent whereby $\textit{nullable}(\ZERO)$ is by definition of
 | 
886  | 
the function $\textit{nullable}$ always $\textit{false}$. We also have
 | 
|
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
887  | 
that $L(\ZERO)$ is by definition $\{\}$. It is
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
888  | 
impossible that the empty string $[]$ is in the empty set.  | 
| 
339
 
bc395ccfba7f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
338 
diff
changeset
 | 
889  | 
Therefore also the right-hand side is false. Consequently we  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
890  | 
verified this case: both sides are false. We would still need  | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
891  | 
to do this for $P(\ONE)$ and $P(c)$. I leave this to  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
892  | 
you to verify.  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
893  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
894  | 
Next we need to check the inductive cases, for example  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
895  | 
$P(r_1 + r_2)$, which is  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
896  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
897  | 
\begin{equation}
 | 
| 412 | 898  | 
\textit{nullable}(r_1 + r_2) \;\;\text{if and only if}\;\; 
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
899  | 
[]\in L(r_1 + r_2)  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
900  | 
\label{propalt}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
901  | 
\end{equation}
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
902  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
903  | 
\noindent The difference to the base cases is that in this  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
904  | 
case we can already assume we proved  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
905  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
906  | 
\begin{center}
 | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
907  | 
\begin{tabular}{l}
 | 
| 412 | 908  | 
$\textit{nullable}(r_1) \;\;\text{if and only if}\;\; []\in L(r_1)$ and\\
 | 
909  | 
$\textit{nullable}(r_2) \;\;\text{if and only if}\;\; []\in L(r_2)$\\
 | 
|
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
910  | 
\end{tabular}
 | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
911  | 
\end{center}
 | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
912  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
913  | 
\noindent These are the induction hypotheses. To check this  | 
| 412 | 914  | 
case, we can start from $\textit{nullable}(r_1 + r_2)$, which by 
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
915  | 
definition is  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
916  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
917  | 
\[  | 
| 412 | 918  | 
\textit{nullable}(r_1) \vee \textit{nullable}(r_2)
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
919  | 
\]  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
920  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
921  | 
\noindent Using the two induction hypotheses from above,  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
922  | 
we can transform this into  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
923  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
924  | 
\[  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
925  | 
[] \in L(r_1) \vee []\in(r_2)  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
926  | 
\]  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
927  | 
|
| 412 | 928  | 
\noindent We just replaced the $\textit{nullable}(\ldots)$ parts by
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
929  | 
the equivalent $[] \in L(\ldots)$ from the induction  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
930  | 
hypotheses. A bit of thinking convinces you that if  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
931  | 
$[] \in L(r_1) \vee []\in L(r_2)$ then the empty string  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
932  | 
must be in the union $L(r_1)\cup L(r_2)$, that is  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
933  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
934  | 
\[  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
935  | 
[] \in L(r_1)\cup L(r_2)  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
936  | 
\]  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
937  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
938  | 
\noindent but this is by definition of $L$ exactly $[] \in  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
939  | 
L(r_1 + r_2)$, which we needed to establish according to  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
940  | 
\eqref{propalt}. What we have shown is that starting from
 | 
| 412 | 941  | 
$\textit{nullable}(r_1 + r_2)$ we have done equivalent transformations
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
942  | 
to end up with $[] \in L(r_1 + r_2)$. Consequently we have  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
943  | 
established that $P(r_1 + r_2)$ holds.  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
944  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
945  | 
In order to complete the proof we would now need to look  | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
946  | 
at the cases \mbox{$P(r_1\cdot r_2)$} and $P(r^*)$. Again I let you
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
947  | 
check the details.  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
948  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
949  | 
You might have to do induction proofs over strings.  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
950  | 
That means you want to establish a property $P(s)$ for all  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
951  | 
strings $s$. For this remember strings are lists of  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
952  | 
characters. These lists can be either the empty list or a  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
953  | 
list of the form $c::s$. If you want to perform an induction  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
954  | 
proof for strings you need to consider the cases  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
955  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
956  | 
\begin{itemize}
 | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
957  | 
\item $P$ has to hold for $[]$ (this is the base case).  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
958  | 
\item $P$ has to hold for $c::s$ under the assumption  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
959  | 
that $P$ already holds for $s$.  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
960  | 
\end{itemize}
 | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
961  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
962  | 
\noindent  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
963  | 
Given this recipe, I let you show  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
964  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
965  | 
\begin{equation}
 | 
| 414 | 966  | 
\textit{Ders}\,s\,(L(r)) = L(\textit{ders}\,s\,r)
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
967  | 
\label{dersprop}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
968  | 
\end{equation}
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
969  | 
|
| 414 | 970  | 
\noindent by induction on $s$. Recall $\textit{Der}$ is defined for 
 | 
971  | 
character---see \eqref{Der}; $\textit{Ders}$ is similar, but for strings:
 | 
|
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
972  | 
|
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
973  | 
\[  | 
| 414 | 974  | 
\textit{Ders}\,s\,A\;\dn\;\{s'\,|\,s @ s' \in A\}
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
975  | 
\]  | 
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
976  | 
|
| 
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
977  | 
\noindent In this proof you can assume the following property  | 
| 414 | 978  | 
for $der$ and $\textit{Der}$ has already been proved, that is you can
 | 
| 
399
 
5c1fbb39c93e
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
394 
diff
changeset
 | 
979  | 
assume  | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
980  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
981  | 
\[  | 
| 414 | 982  | 
L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
 | 
| 
340
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
983  | 
\]  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
984  | 
|
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
985  | 
\noindent holds (this would be of course a property that  | 
| 
 
c49122dbcdd1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
339 
diff
changeset
 | 
986  | 
needs to be proved in a side-lemma by induction on $r$).  | 
| 
338
 
f16120cb4e19
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
334 
diff
changeset
 | 
987  | 
|
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
988  | 
To sum up, using reasoning like the one shown above allows us  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
989  | 
to show the correctness of our algorithm. To see this,  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
990  | 
start from the specification  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
991  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
992  | 
\[  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
993  | 
s \in L(r)  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
994  | 
\]  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
995  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
996  | 
\noindent That is the problem we want to solve. Thinking a  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
997  | 
little, you will see that this problem is equivalent to the  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
998  | 
following problem  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
999  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1000  | 
\begin{equation}
 | 
| 414 | 1001  | 
[] \in \textit{Ders}\,s\,(L(r))
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1002  | 
\label{dersstep}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1003  | 
\end{equation}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1004  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1005  | 
\noindent But we have shown above in \eqref{dersprop}, that
 | 
| 414 | 1006  | 
the $\textit{Ders}$ can be replaced by $L(\textit{ders}\ldots)$. That means 
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1007  | 
\eqref{dersstep} is equivalent to 
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1008  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1009  | 
\begin{equation}
 | 
| 414 | 1010  | 
[] \in L(\textit{ders}\,s\,r)
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1011  | 
\label{prefinalstep}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1012  | 
\end{equation}
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1013  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1014  | 
\noindent We have also shown that testing whether the empty  | 
| 412 | 1015  | 
string is in a language is equivalent to the $\textit{nullable}$
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1016  | 
function; see \eqref{nullableprop}. That means
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1017  | 
\eqref{prefinalstep} is equivalent with
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1018  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1019  | 
\[  | 
| 414 | 1020  | 
\textit{nullable}(\textit{ders}\,s\,r)
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1021  | 
\]  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1022  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1023  | 
\noindent But this is just the definition of $matches$  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1024  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1025  | 
\[  | 
| 414 | 1026  | 
matches\,s\,r \dn nullable(\textit{ders}\,s\,r)
 | 
| 
343
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1027  | 
\]  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1028  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1029  | 
\noindent In effect we have shown  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1030  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1031  | 
\[  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1032  | 
matches\,s\,r\;\;\text{if and only if}\;\;
 | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1033  | 
s\in L(r)  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1034  | 
\]  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1035  | 
|
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1036  | 
\noindent which is the property we set out to prove:  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1037  | 
our algorithm meets its specification. To have done  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1038  | 
so, requires a few induction proofs about strings and  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1039  | 
regular expressions. Following the recipes is already a big  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1040  | 
step in performing these proofs.  | 
| 
 
539b2e88f5b9
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
340 
diff
changeset
 | 
1041  | 
|
| 
262
 
ee4304bc6350
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
261 
diff
changeset
 | 
1042  | 
\end{document}
 | 
| 
261
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
1043  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
1044  | 
|
| 
 
24531cfaa36a
updated handouts
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
259 
diff
changeset
 | 
1045  | 
|
| 
123
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1046  | 
|
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1047  | 
%%% Local Variables:  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1048  | 
%%% mode: latex  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1049  | 
%%% TeX-master: t  | 
| 
 
a75f9c9d8f94
added
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1050  | 
%%% End:  |