| author | Christian Urban <christian dot urban at kcl dot ac dot uk> | 
| Tue, 18 Oct 2016 20:39:54 +0100 | |
| changeset 456 | 4abd90760ffe | 
| parent 443 | cd43d8c6eb84 | 
| child 471 | e5df48ff7033 | 
| permissions | -rw-r--r-- | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1 | \documentclass{article}
 | 
| 251 
5b5a68df6d16
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
217diff
changeset | 2 | \usepackage{../style}
 | 
| 217 
cd6066f1056a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
140diff
changeset | 3 | \usepackage{../langs}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 4 | \usepackage{../graphics}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 5 | \usepackage{../data}
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 6 | |
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 7 | |
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 8 | \begin{document}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 9 | \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016}
 | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 10 | |
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 11 | |
| 272 
1446bc47a294
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
268diff
changeset | 12 | \section*{Handout 2 (Regular Expression Matching)}
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 13 | |
| 412 | 14 | This lecture is about implementing a more efficient regular expression | 
| 15 | matcher (the plots on the right)---more efficient than the matchers | |
| 16 | from regular expression libraries in Ruby, Python and Java (the plots | |
| 17 | on the left). The first pair of plots show the running time for the | |
| 18 | regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings composed
 | |
| 19 | of $n$ \pcode{a}s. The second pair of plots show the running time
 | |
| 20 | for the regular expression $(a^*)^*\cdot b$ and also strings composed | |
| 21 | of $n$ \pcode{a}s (meaning this regular expression actually does not
 | |
| 22 | match the strings). To see the substantial differences in the left | |
| 23 | and right plots below, note the different scales of the $x$-axes. | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 24 | |
| 263 
92e6985018ae
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
262diff
changeset | 25 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 26 | \begin{center}
 | 
| 415 | 27 | Graphs: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$\\
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 28 | \begin{tabular}{@{}cc@{}}
 | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 29 | \begin{tikzpicture}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 30 | \begin{axis}[
 | 
| 414 | 31 |     xlabel={$n$},
 | 
| 32 |     x label style={at={(1.05,0.0)}},
 | |
| 412 | 33 |     ylabel={\small time in secs},
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 34 | enlargelimits=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 35 |     xtick={0,5,...,30},
 | 
| 291 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 36 | xmax=33, | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 37 | ymax=35, | 
| 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 38 |     ytick={0,5,...,30},
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 39 | scaled ticks=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 40 | axis lines=left, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 41 | width=5cm, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 42 | height=5cm, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 43 |     legend entries={Python,Ruby},  
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 44 | legend pos=north west, | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 45 | legend cell align=left] | 
| 434 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 46 | \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 47 | \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 48 | \end{axis}
 | 
| 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 49 | \end{tikzpicture}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 50 | & | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 51 | \begin{tikzpicture}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 52 |   \begin{axis}[
 | 
| 414 | 53 |     xlabel={$n$},
 | 
| 54 |     x label style={at={(1.1,0.05)}},
 | |
| 412 | 55 |     ylabel={\small time in secs},
 | 
| 56 | enlargelimits=false, | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 57 |     xtick={0,3000,...,9000},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 58 | xmax=10000, | 
| 412 | 59 | ymax=35, | 
| 60 |     ytick={0,5,...,30},
 | |
| 61 | scaled ticks=false, | |
| 62 | axis lines=left, | |
| 63 | width=6.5cm, | |
| 64 | height=5cm] | |
| 434 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 65 | \addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | 
| 412 | 66 | \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
| 67 | \end{axis}
 | |
| 68 | \end{tikzpicture}
 | |
| 69 | \end{tabular}
 | |
| 70 | \end{center}
 | |
| 71 | ||
| 72 | \begin{center}
 | |
| 415 | 73 | Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
 | 
| 412 | 74 | \begin{tabular}{@{}cc@{}}
 | 
| 75 | \begin{tikzpicture}
 | |
| 76 | \begin{axis}[
 | |
| 414 | 77 |     xlabel={$n$},
 | 
| 78 |     x label style={at={(1.05,0.0)}},
 | |
| 412 | 79 |     ylabel={time in secs},
 | 
| 80 | enlargelimits=false, | |
| 81 |     xtick={0,5,...,30},
 | |
| 82 | xmax=33, | |
| 83 | ymax=35, | |
| 84 |     ytick={0,5,...,30},
 | |
| 85 | scaled ticks=false, | |
| 86 | axis lines=left, | |
| 87 | width=5cm, | |
| 88 | height=5cm, | |
| 89 |     legend entries={Java},  
 | |
| 90 | legend pos=north west, | |
| 91 | legend cell align=left] | |
| 415 | 92 | \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
 | 
| 412 | 93 | \end{axis}
 | 
| 94 | \end{tikzpicture}
 | |
| 95 | & | |
| 96 | \begin{tikzpicture}
 | |
| 97 |   \begin{axis}[
 | |
| 414 | 98 |     xlabel={$n$},
 | 
| 416 | 99 |     x label style={at={(1.05,0.0)}},
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 100 |     ylabel={time in secs},
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 101 | enlargelimits=false, | 
| 291 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 102 | ymax=35, | 
| 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 103 |     ytick={0,5,...,30},
 | 
| 416 | 104 | axis lines=left, | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 105 | scaled ticks=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 106 | width=6.5cm, | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 107 | height=5cm] | 
| 434 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 108 | \addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};    
 | 
| 416 | 109 | \addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 110 | \end{axis}
 | 
| 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 111 | \end{tikzpicture}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 112 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 113 | \end{center}\medskip
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 114 | |
| 412 | 115 | \noindent | 
| 116 | We will use these regular expressions and strings | |
| 117 | as running examples. | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 118 | |
| 412 | 119 | Having specified in the previous lecture what | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 120 | problem our regular expression matcher is supposed to solve, | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 121 | namely for any given regular expression $r$ and string $s$ | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 122 | answer \textit{true} if and only if
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 123 | |
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 124 | \[ | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 125 | s \in L(r) | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 126 | \] | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 127 | |
| 412 | 128 | \noindent we can look at an algorithm to solve this problem. Clearly | 
| 129 | we cannot use the function $L$ directly for this, because in general | |
| 130 | the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). | |
| 131 | In such cases there is no way we can implement an exhaustive test for | |
| 132 | whether a string is member of this set or not. In contrast our | |
| 133 | matching algorithm will operate on the regular expression $r$ and | |
| 414 | 134 | string $s$, only, which are both finite objects. Before we explain | 
| 412 | 135 | the matching algorithm, however, let us have a closer look at what it | 
| 136 | means when two regular expressions are equivalent. | |
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 137 | |
| 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 138 | \subsection*{Regular Expression Equivalences}
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 139 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 140 | We already defined in Handout 1 what it means for two regular | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 141 | expressions to be equivalent, namely if their meaning is the | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 142 | same language: | 
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 143 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 144 | \[ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 145 | r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2) | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 146 | \] | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 147 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 148 | \noindent | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 149 | It is relatively easy to verify that some concrete equivalences | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 150 | hold, for example | 
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 151 | |
| 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 152 | \begin{center}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 153 | \begin{tabular}{rcl}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 154 | $(a + b) + c$ & $\equiv$ & $a + (b + c)$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 155 | $a + a$ & $\equiv$ & $a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 156 | $a + b$ & $\equiv$ & $b + a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 157 | $(a \cdot b) \cdot c$ & $\equiv$ & $a \cdot (b \cdot c)$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 158 | $c \cdot (a + b)$ & $\equiv$ & $(c \cdot a) + (c \cdot b)$\\ | 
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 159 | \end{tabular}
 | 
| 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 160 | \end{center}
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 161 | |
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 162 | \noindent | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 163 | but also easy to verify that the following regular expressions | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 164 | are \emph{not} equivalent
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 165 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 166 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 167 | \begin{tabular}{rcl}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 168 | $a \cdot a$ & $\not\equiv$ & $a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 169 | $a + (b \cdot c)$ & $\not\equiv$ & $(a + b) \cdot (a + c)$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 170 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 171 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 172 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 173 | \noindent I leave it to you to verify these equivalences and | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 174 | non-equivalences. It is also interesting to look at some | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 175 | corner cases involving $\ONE$ and $\ZERO$: | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 176 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 177 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 178 | \begin{tabular}{rcl}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 179 | $a \cdot \ZERO$ & $\not\equiv$ & $a$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 180 | $a + \ONE$ & $\not\equiv$ & $a$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 181 | $\ONE$ & $\equiv$ & $\ZERO^*$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 182 | $\ONE^*$ & $\equiv$ & $\ONE$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 183 | $\ZERO^*$ & $\not\equiv$ & $\ZERO$\\ | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 184 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 185 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 186 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 187 | \noindent Again I leave it to you to make sure you agree | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 188 | with these equivalences and non-equivalences. | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 189 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 190 | |
| 318 
7975e4f0d4de
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
296diff
changeset | 191 | For our matching algorithm however the following seven | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 192 | equivalences will play an important role: | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 193 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 194 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 195 | \begin{tabular}{rcl}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 196 | $r + \ZERO$ & $\equiv$ & $r$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 197 | $\ZERO + r$ & $\equiv$ & $r$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 198 | $r \cdot \ONE$ & $\equiv$ & $r$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 199 | $\ONE \cdot r$ & $\equiv$ & $r$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 200 | $r \cdot \ZERO$ & $\equiv$ & $\ZERO$\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 201 | $\ZERO \cdot r$ & $\equiv$ & $\ZERO$\\ | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 202 | $r + r$ & $\equiv$ & $r$ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 203 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 204 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 205 | |
| 412 | 206 | \noindent which always hold no matter what the regular expression $r$ | 
| 207 | looks like. The first two are easy to verify since $L(\ZERO)$ is the | |
| 208 | empty set. The next two are also easy to verify since $L(\ONE) = | |
| 209 | \{[]\}$ and appending the empty string to every string of another set,
 | |
| 210 | leaves the set unchanged. Be careful to fully comprehend the fifth and | |
| 211 | sixth equivalence: if you concatenate two sets of strings and one is | |
| 212 | the empty set, then the concatenation will also be the empty set. To | |
| 213 | see this, check the definition of $\_ @ \_$ for sets. The last | |
| 214 | equivalence is again trivial. | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 215 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 216 | What will be important later on is that we can orient these | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 217 | equivalences and read them from left to right. In this way we | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 218 | can view them as \emph{simplification rules}. Consider for 
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 219 | example the regular expression | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 220 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 221 | \begin{equation}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 222 | (r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO) | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 223 | \label{big}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 224 | \end{equation}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 225 | |
| 412 | 226 | \noindent If we can find an equivalent regular expression that is | 
| 227 | simpler (smaller for example), then this might potentially make our | |
| 228 | matching algorithm run faster. We can look for such a simpler regular | |
| 229 | expression $r'$ because whether a string $s$ is in $L(r)$ or in | |
| 230 | $L(r')$ with $r\equiv r'$ will always give the same answer. In the | |
| 231 | example above you will see that the regular expression is equivalent | |
| 232 | to just $r_1$. You can verify this by iteratively applying the | |
| 233 | simplification rules from above: | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 234 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 235 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 236 | \begin{tabular}{ll}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 237 | & $(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 238 | (\underline{r_4 \cdot \ZERO})$\smallskip\\
 | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 239 | $\equiv$ & $(r_1 + \ZERO) \cdot \ONE + \underline{((\ONE + r_2) + r_3) \cdot 
 | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 240 | \ZERO}$\smallskip\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 241 | $\equiv$ & $\underline{(r_1 + \ZERO) \cdot \ONE} + \ZERO$\smallskip\\
 | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 242 | $\equiv$ & $(\underline{r_1 + \ZERO}) + \ZERO$\smallskip\\
 | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 243 | $\equiv$ & $\underline{r_1 + \ZERO}$\smallskip\\
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 244 | $\equiv$ & $r_1$\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 245 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 246 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 247 | |
| 296 
796b9b81ac8d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
291diff
changeset | 248 | \noindent In each step, I underlined where a simplification | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 249 | rule is applied. Our matching algorithm in the next section | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 250 | will often generate such ``useless'' $\ONE$s and | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 251 | $\ZERO$s, therefore simplifying them away will make the | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 252 | algorithm quite a bit faster. | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 253 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 254 | \subsection*{The Matching Algorithm}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 255 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 256 | The algorithm we will define below consists of two parts. One | 
| 412 | 257 | is the function $\textit{nullable}$ which takes a regular expression as
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 258 | argument and decides whether it can match the empty string | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 259 | (this means it returns a boolean in Scala). This can be easily | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 260 | defined recursively as follows: | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 261 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 262 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 263 | \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
 | 
| 412 | 264 | $\textit{nullable}(\ZERO)$      & $\dn$ & $\textit{false}$\\
 | 
| 265 | $\textit{nullable}(\ONE)$         & $\dn$ & $\textit{true}$\\
 | |
| 266 | $\textit{nullable}(c)$                & $\dn$ & $\textit{false}$\\
 | |
| 267 | $\textit{nullable}(r_1 + r_2)$     & $\dn$ &  $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ 
 | |
| 268 | $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ &  $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
 | |
| 269 | $\textit{nullable}(r^*)$              & $\dn$ & $\textit{true}$ \\
 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 270 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 271 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 272 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 273 | \noindent The idea behind this function is that the following | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 274 | property holds: | 
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 275 | |
| 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 276 | \[ | 
| 412 | 277 | \textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
 | 
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 278 | \] | 
| 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 279 | |
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 280 | \noindent Note on the left-hand side of the if-and-only-if we | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 281 | have a function we can implement; on the right we have its | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 282 | specification (which we cannot implement in a programming | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 283 | language). | 
| 124 
dd8b5a3dac0a
adde
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
123diff
changeset | 284 | |
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 285 | The other function of our matching algorithm calculates a | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 286 | \emph{derivative} of a regular expression. This is a function
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 287 | which will take a regular expression, say $r$, and a | 
| 412 | 288 | character, say $c$, as arguments and returns a new regular | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 289 | expression. Be careful that the intuition behind this function | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 290 | is not so easy to grasp on first reading. Essentially this | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 291 | function solves the following problem: if $r$ can match a | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 292 | string of the form $c\!::\!s$, what does the regular | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 293 | expression look like that can match just $s$? The definition | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 294 | of this function is as follows: | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 295 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 296 | \begin{center}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 297 | \begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
 | 
| 414 | 298 |   $\textit{der}\, c\, (\ZERO)$      & $\dn$ & $\ZERO$\\
 | 
| 299 |   $\textit{der}\, c\, (\ONE)$         & $\dn$ & $\ZERO$ \\
 | |
| 300 |   $\textit{der}\, c\, (d)$                & $\dn$ & if $c = d$ then $\ONE$ else $\ZERO$\\
 | |
| 301 |   $\textit{der}\, c\, (r_1 + r_2)$        & $\dn$ & $\textit{der}\, c\, r_1 + \textit{der}\, c\, r_2$\\
 | |
| 302 |   $\textit{der}\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $\textit{nullable} (r_1)$\\
 | |
| 303 |   & & then $(\textit{der}\,c\,r_1) \cdot r_2 + \textit{der}\, c\, r_2$\\ 
 | |
| 304 |   & & else $(\textit{der}\, c\, r_1) \cdot r_2$\\
 | |
| 305 |   $\textit{der}\, c\, (r^*)$          & $\dn$ & $(\textit{der}\,c\,r) \cdot (r^*)$
 | |
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 306 |   \end{tabular}
 | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 307 | \end{center}
 | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 308 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 309 | \noindent The first two clauses can be rationalised as | 
| 414 | 310 | follows: recall that $\textit{der}$ should calculate a regular
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 311 | expression so that given the ``input'' regular expression can | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 312 | match a string of the form $c\!::\!s$, we want a regular | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 313 | expression for $s$. Since neither $\ZERO$ nor $\ONE$ | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 314 | can match a string of the form $c\!::\!s$, we return | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 315 | $\ZERO$. In the third case we have to make a | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 316 | case-distinction: In case the regular expression is $c$, then | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 317 | clearly it can recognise a string of the form $c\!::\!s$, just | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 318 | that $s$ is the empty string. Therefore we return the | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 319 | $\ONE$-regular expression. In the other case we again | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 320 | return $\ZERO$ since no string of the $c\!::\!s$ can be | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 321 | matched. Next come the recursive cases, which are a bit more | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 322 | involved. Fortunately, the $+$-case is still relatively | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 323 | straightforward: all strings of the form $c\!::\!s$ are either | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 324 | matched by the regular expression $r_1$ or $r_2$. So we just | 
| 414 | 325 | have to recursively call $\textit{der}$ with these two regular
 | 
| 332 
4755ad4b457b
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
325diff
changeset | 326 | expressions and compose the results again with $+$. Makes | 
| 412 | 327 | sense? | 
| 328 | ||
| 329 | The $\cdot$-case is more complicated: if $r_1\cdot r_2$ | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 330 | matches a string of the form $c\!::\!s$, then the first part | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 331 | must be matched by $r_1$. Consequently, it makes sense to | 
| 414 | 332 | construct the regular expression for $s$ by calling $\textit{der}$ with
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 333 | $r_1$ and ``appending'' $r_2$. There is however one exception | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 334 | to this simple rule: if $r_1$ can match the empty string, then | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 335 | all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 336 | nullable (that is can match the empty string) we have to allow | 
| 414 | 337 | the choice $\textit{der}\,c\,r_2$ for calculating the regular
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 338 | expression that can match $s$. Therefore we have to add the | 
| 414 | 339 | regular expression $\textit{der}\,c\,r_2$ in the result. The $*$-case
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 340 | is again simple: if $r^*$ matches a string of the form | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 341 | $c\!::\!s$, then the first part must be ``matched'' by a | 
| 414 | 342 | single copy of $r$. Therefore we call recursively $\textit{der}\,c\,r$
 | 
| 343 | and ``append'' $r^*$ in order to match the rest of $s$. Still | |
| 344 | makes sense? | |
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 345 | |
| 414 | 346 | If all this did not make sense yet, here is another way to rationalise | 
| 347 | the definition of $\textit{der}$ by considering the following operation
 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 348 | on sets: | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 349 | |
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 350 | \begin{equation}\label{Der}
 | 
| 414 | 351 | \textit{Der}\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 352 | \end{equation}
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 353 | |
| 291 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 354 | \noindent This operation essentially transforms a set of | 
| 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 355 | strings $A$ by filtering out all strings that do not start | 
| 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 356 | with $c$ and then strips off the $c$ from all the remaining | 
| 
201c2c6d8696
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
272diff
changeset | 357 | strings. For example suppose $A = \{f\!oo, bar, f\!rak\}$ then
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 358 | |
| 414 | 359 | \[ \textit{Der}\,f\,A = \{oo, rak\}\quad,\quad 
 | 
| 360 |    \textit{Der}\,b\,A = \{ar\} \quad \text{and} \quad 
 | |
| 361 |    \textit{Der}\,a\,A = \{\} 
 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 362 | \] | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 363 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 364 | \noindent | 
| 414 | 365 | Note that in the last case $\textit{Der}$ is empty, because no string in $A$
 | 
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 366 | starts with $a$. With this operation we can state the following | 
| 414 | 367 | property about $\textit{der}$:
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 368 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 369 | \[ | 
| 414 | 370 | L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 371 | \] | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 372 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 373 | \noindent | 
| 414 | 374 | This property clarifies what regular expression $\textit{der}$ calculates,
 | 
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 375 | namely take the set of strings that $r$ can match (that is $L(r)$), | 
| 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 376 | filter out all strings not starting with $c$ and strip off the $c$ | 
| 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 377 | from the remaining strings---this is exactly the language that | 
| 414 | 378 | $\textit{der}\,c\,r$ can match.
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 379 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 380 | If we want to find out whether the string $abc$ is matched by | 
| 414 | 381 | the regular expression $r_1$ then we can iteratively apply $\textit{der}$
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 382 | as follows | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 383 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 384 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 385 | \begin{tabular}{rll}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 386 | Input: $r_1$, $abc$\medskip\\ | 
| 414 | 387 | Step 1: & build derivative of $a$ and $r_1$ & $(r_2 = \textit{der}\,a\,r_1)$\smallskip\\
 | 
| 388 | Step 2: & build derivative of $b$ and $r_2$ & $(r_3 = \textit{der}\,b\,r_2)$\smallskip\\
 | |
| 433 
c08290ee4f1f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
416diff
changeset | 389 | Step 3: & build derivative of $c$ and $r_3$ & $(r_4 = \textit{der}\,c\,r_3)$\smallskip\\
 | 
| 
c08290ee4f1f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
416diff
changeset | 390 | Step 4: & the string is exhausted: & $(\textit{nullable}(r_4))$\\
 | 
| 
c08290ee4f1f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
416diff
changeset | 391 | & test whether $r_4$ can recognise the\\ | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 392 | & empty string\smallskip\\ | 
| 412 | 393 | Output: & result of this test $\Rightarrow \textit{true} \,\text{or}\, \textit{false}$\\        
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 394 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 395 | \end{center}
 | 
| 140 
1be892087df2
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
133diff
changeset | 396 | |
| 414 | 397 | \noindent Again the operation $\textit{Der}$ might help to rationalise
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 398 | this algorithm. We want to know whether $abc \in L(r_1)$. We | 
| 414 | 399 | do not know yet---but let us assume it is. Then $\textit{Der}\,a\,L(r_1)$
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 400 | builds the set where all the strings not starting with $a$ are | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 401 | filtered out. Of the remaining strings, the $a$ is stripped | 
| 412 | 402 | off. So we should still have $bc$ in the set. | 
| 403 | Then we continue with filtering out all strings not | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 404 | starting with $b$ and stripping off the $b$ from the remaining | 
| 414 | 405 | strings, that means we build $\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1)))$.
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 406 | Finally we filter out all strings not starting with $c$ and | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 407 | strip off $c$ from the remaining string. This is | 
| 414 | 408 | $\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$. Now if $abc$ was in the 
 | 
| 409 | original set ($L(r_1)$), then $\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$ 
 | |
| 412 | 410 | must contain the empty string. If not, then $abc$ was not in the | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 411 | language we started with. | 
| 140 
1be892087df2
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
133diff
changeset | 412 | |
| 414 | 413 | Our matching algorithm using $\textit{der}$ and $\textit{nullable}$ works
 | 
| 414 | similarly, just using regular expression instead of sets. In order to | |
| 415 | define our algorithm we need to extend the notion of derivatives from single | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 416 | characters to strings. This can be done using the following | 
| 414 | 417 | function, taking a string and a regular expression as input and | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 418 | a regular expression as output. | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 419 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 420 | \begin{center}
 | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 421 | \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 422 |   $\textit{ders}\, []\, r$     & $\dn$ & $r$ & \\
 | 
| 414 | 423 |   $\textit{ders}\, (c\!::\!s)\, r$ & $\dn$ & $\textit{ders}\,s\,(\textit{der}\,c\,r)$ & \\
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 424 |   \end{tabular}
 | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 425 | \end{center}
 | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 426 | |
| 414 | 427 | \noindent This function iterates $\textit{der}$ taking one character at
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 428 | the time from the original string until it is exhausted. | 
| 414 | 429 | Having $\textit{der}s$ in place, we can finally define our matching
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 430 | algorithm: | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 431 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 432 | \[ | 
| 414 | 433 | \textit{matches}\,s\,r \dn \textit{nullable}(\textit{ders}\,s\,r)
 | 
| 125 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 434 | \] | 
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 435 | |
| 
39c75cf4e079
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
124diff
changeset | 436 | \noindent | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 437 | and we can claim that | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 438 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 439 | \[ | 
| 414 | 440 | \textit{matches}\,s\,r\quad\text{if and only if}\quad s\in L(r)
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 441 | \] | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 442 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 443 | \noindent holds, which means our algorithm satisfies the | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 444 | specification. Of course we can claim many things\ldots | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 445 | whether the claim holds any water is a different question, | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 446 | which for example is the point of the Strand-2 Coursework. | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 447 | |
| 414 | 448 | This algorithm was introduced by Janus Brzozowski in 1964, but | 
| 449 | is more widely known only in the last 10 or so years. Its | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 450 | main attractions are simplicity and being fast, as well as | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 451 | being easily extendable for other regular expressions such as | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 452 | $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
 | 
| 414 | 453 | Strand-1 Coursework 1). | 
| 258 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 454 | |
| 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 455 | \subsection*{The Matching Algorithm in Scala}
 | 
| 
1e4da6d2490c
updated programs
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
251diff
changeset | 456 | |
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 457 | Another attraction of the algorithm is that it can be easily | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 458 | implemented in a functional programming language, like Scala. | 
| 296 
796b9b81ac8d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
291diff
changeset | 459 | Given the implementation of regular expressions in Scala shown | 
| 
796b9b81ac8d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
291diff
changeset | 460 | in the first lecture and handout, the functions and subfunctions | 
| 
796b9b81ac8d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
291diff
changeset | 461 | for \pcode{matches} are shown in Figure~\ref{scala1}.
 | 
| 126 
7c7185cb4f2b
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
125diff
changeset | 462 | |
| 
7c7185cb4f2b
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
125diff
changeset | 463 | \begin{figure}[p]
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 464 | \lstinputlisting{../progs/app5.scala}
 | 
| 412 | 465 | \caption{Scala implementation of the \textit{nullable} and 
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 466 | derivative functions. These functions are easy to | 
| 412 | 467 | implement in functional languages, because their built-in pattern | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 468 | matching and recursion allow us to mimic the mathematical | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 469 |   definitions very closely.\label{scala1}}
 | 
| 126 
7c7185cb4f2b
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
125diff
changeset | 470 | \end{figure}
 | 
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 471 | |
| 414 | 472 | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 473 | %Remember our second example involving the regular expression | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 474 | %$(a^*)^* \cdot b$ which could not match strings of $n$ \texttt{a}s. 
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 475 | %Java needed around 30 seconds to find this out a string with $n=28$. | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 476 | %It seems our algorithm is doing rather well in comparison: | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 477 | % | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 478 | %\begin{center}
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 479 | %\begin{tikzpicture}
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 480 | %\begin{axis}[
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 481 | %    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 482 | %    xlabel={$n$},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 483 | %    x label style={at={(1.05,0.0)}},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 484 | %    ylabel={time in secs},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 485 | % enlargelimits=false, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 486 | %    xtick={0,1000,...,6500},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 487 | % xmax=6800, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 488 | %    ytick={0,5,...,30},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 489 | % ymax=34, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 490 | % scaled ticks=false, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 491 | % axis lines=left, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 492 | % width=8cm, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 493 | % height=4.5cm, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 494 | %    legend entries={Java,Scala V1},  
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 495 | % legend pos=north east, | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 496 | % legend cell align=left] | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 497 | %\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 498 | %\addplot[red,mark=triangle*,mark options={fill=white}] table {re1a.data};
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 499 | %\end{axis}
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 500 | %\end{tikzpicture}
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 501 | %\end{center}
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 502 | % | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 503 | %\noindent | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 504 | %This is not an error: it hardly takes more than half a second for | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 505 | %strings up to the length of 6500. After that we receive a | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 506 | %StackOverflow exception, but still\ldots | 
| 414 | 507 | |
| 508 | For running the algorithm with our first example, the evil | |
| 394 
2f9fe225ecc8
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
343diff
changeset | 509 | regular expression $a^?{}^{\{n\}}a^{\{n\}}$, we need to implement
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 510 | the optional regular expression and the exactly $n$-times | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 511 | regular expression. This can be done with the translations | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 512 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 513 | \lstinputlisting[numbers=none]{../progs/app51.scala}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 514 | |
| 414 | 515 | \noindent Running the matcher with this example, we find it is | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 516 | slightly worse then the matcher in Ruby and Python. | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 517 | Ooops\ldots | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 518 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 519 | \begin{center}
 | 
| 414 | 520 | \begin{tikzpicture}
 | 
| 521 | \begin{axis}[    
 | |
| 415 | 522 |     title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 523 |     xlabel={$n$},
 | 
| 524 |     x label style={at={(1.05,0.0)}},
 | |
| 525 |     ylabel={time in secs},
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 526 | enlargelimits=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 527 |     xtick={0,5,...,30},
 | 
| 415 | 528 | xmax=32, | 
| 414 | 529 |     ytick={0,5,...,30},
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 530 | scaled ticks=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 531 | axis lines=left, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 532 | width=6cm, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 533 | height=5cm, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 534 |     legend entries={Python,Ruby,Scala V1},  
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 535 | legend pos=outer north east, | 
| 415 | 536 | legend cell align=left] | 
| 434 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 537 | \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 538 | \addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 539 | \addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
 | 
| 414 | 540 | \end{axis}
 | 
| 541 | \end{tikzpicture}
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 542 | \end{center}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 543 | |
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 544 | \noindent Analysing this failure we notice that for | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 545 | $a^{\{n\}}$ we generate quite big regular expressions:
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 546 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 547 | \begin{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 548 | \begin{tabular}{rl}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 549 | 1: & $a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 550 | 2: & $a\cdot a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 551 | 3: & $a\cdot a\cdot a$\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 552 | & \ldots\\ | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 553 | 13: & $a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a$\\ | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 554 | & \ldots | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 555 | \end{tabular}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 556 | \end{center}
 | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 557 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 558 | \noindent Our algorithm traverses such regular expressions at | 
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 559 | least once every time a derivative is calculated. So having | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 560 | large regular expressions will cause problems. This problem | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 561 | is aggravated by $a^?$ being represented as $a + \ONE$. | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 562 | |
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 563 | We can however fix this by having an explicit constructor for | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 564 | $r^{\{n\}}$. In Scala we would introduce a constructor like
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 565 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 566 | \begin{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 567 | \code{case class NTIMES(r: Rexp, n: Int) extends Rexp}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 568 | \end{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 569 | |
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 570 | \noindent With this fix we have a constant ``size'' regular | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 571 | expression for our running example no matter how large $n$ is. | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 572 | This means we have to also add cases for \pcode{NTIMES} in the
 | 
| 414 | 573 | functions $\textit{nullable}$ and $\textit{der}$. Does the change have any
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 574 | effect? | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 575 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 576 | \begin{center}
 | 
| 414 | 577 | \begin{tikzpicture}
 | 
| 578 | \begin{axis}[
 | |
| 415 | 579 |     title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 580 |     xlabel={$n$},
 | 
| 581 |     x label style={at={(1.01,0.0)}},
 | |
| 582 |     ylabel={time in secs},
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 583 | enlargelimits=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 584 |     xtick={0,100,...,1000},
 | 
| 414 | 585 | xmax=1100, | 
| 586 |     ytick={0,5,...,30},
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 587 | scaled ticks=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 588 | axis lines=left, | 
| 414 | 589 | width=10cm, | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 590 | height=5cm, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 591 |     legend entries={Python,Ruby,Scala V1,Scala V2},  
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 592 | legend pos=outer north east, | 
| 414 | 593 | legend cell align=left] | 
| 434 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 594 | \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 595 | \addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 596 | \addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
 | 
| 
8664ff87cd77
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
433diff
changeset | 597 | \addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | 
| 414 | 598 | \end{axis}
 | 
| 599 | \end{tikzpicture}
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 600 | \end{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 601 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 602 | \noindent Now we are talking business! The modified matcher | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 603 | can within 30 seconds handle regular expressions up to | 
| 414 | 604 | $n = 950$ before a StackOverflow is raised. Recall that Python and Ruby | 
| 605 | (and our first version, Scala V1) could only handle $n = 27$ or so in 30 | |
| 606 | seconds. There is no change for our second example | |
| 607 | $(a^*)^* \cdot b$---so this is still good. | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 608 | |
| 412 | 609 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 610 | The moral is that our algorithm is rather sensitive to the | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 611 | size of regular expressions it needs to handle. This is of | 
| 414 | 612 | course obvious because both $\textit{nullable}$ and $\textit{der}$ frequently
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 613 | need to traverse the whole regular expression. There seems, | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 614 | however, one more issue for making the algorithm run faster. | 
| 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 615 | The derivative function often produces ``useless'' | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 616 | $\ZERO$s and $\ONE$s. To see this, consider $r = ((a | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 617 | \cdot b) + b)^*$ and the following two derivatives | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 618 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 619 | \begin{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 620 | \begin{tabular}{l}
 | 
| 414 | 621 | $\textit{der}\,a\,r = ((\ONE \cdot b) + \ZERO) \cdot r$\\
 | 
| 622 | $\textit{der}\,b\,r = ((\ZERO \cdot b) + \ONE)\cdot r$\\
 | |
| 623 | $\textit{der}\,c\,r = ((\ZERO \cdot b) + \ZERO)\cdot r$
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 624 | \end{tabular}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 625 | \end{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 626 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 627 | \noindent | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 628 | If we simplify them according to the simple rules from the | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 629 | beginning, we can replace the right-hand sides by the | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 630 | smaller equivalent regular expressions | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 631 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 632 | \begin{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 633 | \begin{tabular}{l}
 | 
| 414 | 634 | $\textit{der}\,a\,r \equiv b \cdot r$\\
 | 
| 635 | $\textit{der}\,b\,r \equiv r$\\
 | |
| 636 | $\textit{der}\,c\,r \equiv \ZERO$
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 637 | \end{tabular}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 638 | \end{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 639 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 640 | \noindent I leave it to you to contemplate whether such a | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 641 | simplification can have any impact on the correctness of our | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 642 | algorithm (will it change any answers?). Figure~\ref{scala2}
 | 
| 296 
796b9b81ac8d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
291diff
changeset | 643 | gives a simplification function that recursively traverses a | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 644 | regular expression and simplifies it according to the rules | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 645 | given at the beginning. There are only rules for $+$, $\cdot$ | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 646 | and $n$-times (the latter because we added it in the second | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 647 | version of our matcher). There is no rule for a star, because | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 648 | empirical data and also a little thought showed that | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 649 | simplifying under a star is a waste of computation time. The | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 650 | simplification function will be called after every derivation. | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 651 | This additional step removes all the ``junk'' the derivative | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 652 | function introduced. Does this improve the speed? You bet!! | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 653 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 654 | \begin{figure}[p]
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 655 | \lstinputlisting{../progs/app6.scala}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 656 | \caption{The simplification function and modified 
 | 
| 325 
794c599cee53
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
318diff
changeset | 657 | \texttt{ders}-function; this function now
 | 
| 333 
8890852e18b7
updated coursework
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
332diff
changeset | 658 | calls \texttt{der} first, but then simplifies
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 659 | the resulting derivative regular expressions before | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 660 | building the next derivative, see | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 661 | Line~\ref{simpline}.\label{scala2}}
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 662 | \end{figure}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 663 | |
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 664 | \begin{center}
 | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 665 | \begin{tikzpicture}
 | 
| 414 | 666 | \begin{axis}[
 | 
| 415 | 667 |     title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 668 |     xlabel={$n$},
 | 
| 669 |     x label style={at={(1.04,0.0)}},
 | |
| 670 |     ylabel={time in secs},
 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 671 | enlargelimits=false, | 
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 672 |     xtick={0,3000,...,9000},
 | 
| 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 673 | xmax=10000, | 
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 674 |     ytick={0,5,...,30},
 | 
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 675 | ymax=32, | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 676 | scaled ticks=false, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 677 | axis lines=left, | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 678 | width=9cm, | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 679 | height=5cm, | 
| 415 | 680 |     legend entries={Scala V2,Scala V3},
 | 
| 681 | legend pos=outer north east, | |
| 682 | legend cell align=left] | |
| 683 | \addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 | |
| 268 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 684 | \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
| 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 685 | \end{axis}
 | 
| 
18bef085a7ca
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
263diff
changeset | 686 | \end{tikzpicture}
 | 
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 687 | \end{center}
 | 
| 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 688 | |
| 415 | 689 | \noindent | 
| 690 | To reacap, Python and Ruby needed approximately 30 seconds to match | |
| 691 | a string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot a^{\{n\}}$. 
 | |
| 692 | We need a third of this time to do the same with strings up to 12,000 \texttt{a}s. 
 | |
| 693 | Similarly, Java needed 30 seconds to find out the regular expression | |
| 694 | $(a^*)^* \cdot b$ does not match the string of 28 \texttt{a}s. We can do
 | |
| 695 | the same in approximately 5 seconds for strings of 6000000 \texttt{a}s:
 | |
| 696 | ||
| 697 | ||
| 414 | 698 | \begin{center}
 | 
| 699 | \begin{tikzpicture}
 | |
| 700 | \begin{axis}[
 | |
| 415 | 701 |     title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | 
| 414 | 702 |     xlabel={$n$},
 | 
| 703 |     x label style={at={(1.09,0.0)}},
 | |
| 704 |     ylabel={time in secs},
 | |
| 705 | enlargelimits=false, | |
| 415 | 706 | xmax=7700000, | 
| 414 | 707 |     ytick={0,5,...,30},
 | 
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 708 | ymax=32, | 
| 415 | 709 | %scaled ticks=false, | 
| 414 | 710 | axis lines=left, | 
| 711 | width=9cm, | |
| 712 | height=5cm, | |
| 415 | 713 |     legend entries={Scala V2, Scala V3},
 | 
| 714 | legend pos=outer north east, | |
| 715 | legend cell align=left] | |
| 716 | \addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
 | |
| 414 | 717 | \addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
| 718 | \end{axis}
 | |
| 719 | \end{tikzpicture}
 | |
| 720 | \end{center}
 | |
| 721 | ||
| 415 | 722 | \subsection*{Epilogue}
 | 
| 723 | ||
| 724 | (23/Aug/2016) I recently found another place where this algorithm can be | |
| 725 | sped (this idea is not integrated with what is coming next, | |
| 726 | but I present it nonetheless). The idea is to define \texttt{ders}
 | |
| 727 | not such that it iterates the derivative character-by-character, but | |
| 728 | in bigger chunks. The resulting code for \texttt{ders2} looks as
 | |
| 729 | follows: | |
| 730 | ||
| 731 | \lstinputlisting[numbers=none]{../progs/app52.scala} 
 | |
| 732 | ||
| 733 | \noindent | |
| 734 | I have not fully understood why this version is much faster, | |
| 735 | but it seems it is a combination of the clauses for \texttt{ALT}
 | |
| 736 | and \texttt{SEQ}. In the latter case we call \texttt{der} with 
 | |
| 737 | a single character and this potentially produces an alternative. | |
| 738 | The derivative of such an alternative can then be more effeciently | |
| 739 | calculated by \texttt{ders2} since it pushes a whole string
 | |
| 740 | under an \texttt{ALT}. The numbers are that in the second case  
 | |
| 741 | $(a^*)^* \cdot b$ both versions are pretty much the same, but in the | |
| 742 | first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives 
 | |
| 743 | another factor of 100 speedup. Nice! | |
| 414 | 744 | |
| 415 | 745 | \begin{center}
 | 
| 746 | \begin{tabular}{cc}
 | |
| 747 | \begin{tikzpicture}
 | |
| 748 | \begin{axis}[
 | |
| 749 |     title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 | |
| 750 |     xlabel={$n$},
 | |
| 751 |     x label style={at={(1.04,0.0)}},
 | |
| 752 |     ylabel={time in secs},
 | |
| 753 | enlargelimits=false, | |
| 754 | xmax=7100000, | |
| 755 |     ytick={0,5,...,30},
 | |
| 756 | ymax=33, | |
| 757 | %scaled ticks=false, | |
| 758 | axis lines=left, | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 759 | width=5.5cm, | 
| 415 | 760 | height=5cm, | 
| 761 |     legend entries={Scala V3, Scala V4},
 | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 762 |     legend style={at={(0.1,-0.2)},anchor=north}]
 | 
| 415 | 763 | \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 | 
| 764 | \addplot[purple,mark=square*,mark options={fill=white}] table {re4.data};
 | |
| 765 | \end{axis}
 | |
| 766 | \end{tikzpicture}
 | |
| 767 | & | |
| 768 | \begin{tikzpicture}
 | |
| 769 | \begin{axis}[
 | |
| 770 |     title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 | |
| 771 |     xlabel={$n$},
 | |
| 772 |     x label style={at={(1.09,0.0)}},
 | |
| 773 |     ylabel={time in secs},
 | |
| 774 | enlargelimits=false, | |
| 775 | xmax=8100000, | |
| 776 |     ytick={0,5,...,30},
 | |
| 777 | ymax=33, | |
| 778 | %scaled ticks=false, | |
| 779 | axis lines=left, | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 780 | width=5.5cm, | 
| 415 | 781 | height=5cm, | 
| 782 |     legend entries={Scala V3, Scala V4},
 | |
| 443 
cd43d8c6eb84
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
434diff
changeset | 783 |     legend style={at={(0.1,-0.2)},anchor=north}]
 | 
| 415 | 784 | \addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 | 
| 785 | \addplot[purple,mark=square*,mark options={fill=white}] table {re4a.data};
 | |
| 786 | \end{axis}
 | |
| 787 | \end{tikzpicture}
 | |
| 788 | \end{tabular}
 | |
| 789 | \end{center}
 | |
| 414 | 790 | |
| 412 | 791 | |
| 334 
fd89a63e9db3
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
333diff
changeset | 792 | \section*{Proofs}
 | 
| 
fd89a63e9db3
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
333diff
changeset | 793 | |
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 794 | You might not like doing proofs. But they serve a very | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 795 | important purpose in Computer Science: How can we be sure that | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 796 | our algorithm matches its specification. We can try to test | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 797 | the algorithm, but that often overlooks corner cases and an | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 798 | exhaustive testing is impossible (since there are infinitely | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 799 | many inputs). Proofs allow us to ensure that an algorithm | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 800 | really meets its specification. | 
| 338 
f16120cb4e19
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
334diff
changeset | 801 | |
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 802 | For the programs we look at in this module, the proofs will | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 803 | mostly by some form of induction. Remember that regular | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 804 | expressions are defined as | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 805 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 806 | \begin{center}
 | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 807 | \begin{tabular}{r@{\hspace{1mm}}r@{\hspace{1mm}}l@{\hspace{13mm}}l}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 808 | $r$ & $::=$ & $\ZERO$ & null language\\ | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 809 |         & $\mid$ & $\ONE$           & empty string / \texttt{""} / []\\
 | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 810 | & $\mid$ & $c$ & single character\\ | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 811 | & $\mid$ & $r_1 + r_2$ & alternative / choice\\ | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 812 | & $\mid$ & $r_1 \cdot r_2$ & sequence\\ | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 813 | & $\mid$ & $r^*$ & star (zero or more)\\ | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 814 |   \end{tabular}
 | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 815 | \end{center}
 | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 816 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 817 | \noindent If you want to show a property $P(r)$ for all | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 818 | regular expressions $r$, then you have to follow essentially | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 819 | the recipe: | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 820 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 821 | \begin{itemize}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 822 | \item $P$ has to hold for $\ZERO$, $\ONE$ and $c$ | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 823 | (these are the base cases). | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 824 | \item $P$ has to hold for $r_1 + r_2$ under the assumption | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 825 | that $P$ already holds for $r_1$ and $r_2$. | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 826 | \item $P$ has to hold for $r_1 \cdot r_2$ under the | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 827 | assumption that $P$ already holds for $r_1$ and $r_2$. | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 828 | \item $P$ has to hold for $r^*$ under the assumption | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 829 | that $P$ already holds for $r$. | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 830 | \end{itemize}
 | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 831 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 832 | \noindent | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 833 | A simple proof is for example showing the following | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 834 | property: | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 835 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 836 | \begin{equation}
 | 
| 412 | 837 | \textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 838 | \label{nullableprop}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 839 | \end{equation}
 | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 840 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 841 | \noindent | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 842 | Let us say that this property is $P(r)$, then the first case | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 843 | we need to check is whether $P(\ZERO)$ (see recipe | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 844 | above). So we have to show that | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 845 | |
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 846 | \[ | 
| 412 | 847 | \textit{nullable}(\ZERO) \;\;\text{if and only if}\;\; 
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 848 | []\in L(\ZERO) | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 849 | \] | 
| 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 850 | |
| 412 | 851 | \noindent whereby $\textit{nullable}(\ZERO)$ is by definition of
 | 
| 852 | the function $\textit{nullable}$ always $\textit{false}$. We also have
 | |
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 853 | that $L(\ZERO)$ is by definition $\{\}$. It is
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 854 | impossible that the empty string $[]$ is in the empty set. | 
| 339 
bc395ccfba7f
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
338diff
changeset | 855 | Therefore also the right-hand side is false. Consequently we | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 856 | verified this case: both sides are false. We would still need | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 857 | to do this for $P(\ONE)$ and $P(c)$. I leave this to | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 858 | you to verify. | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 859 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 860 | Next we need to check the inductive cases, for example | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 861 | $P(r_1 + r_2)$, which is | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 862 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 863 | \begin{equation}
 | 
| 412 | 864 | \textit{nullable}(r_1 + r_2) \;\;\text{if and only if}\;\; 
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 865 | []\in L(r_1 + r_2) | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 866 | \label{propalt}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 867 | \end{equation}
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 868 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 869 | \noindent The difference to the base cases is that in this | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 870 | case we can already assume we proved | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 871 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 872 | \begin{center}
 | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 873 | \begin{tabular}{l}
 | 
| 412 | 874 | $\textit{nullable}(r_1) \;\;\text{if and only if}\;\; []\in L(r_1)$ and\\
 | 
| 875 | $\textit{nullable}(r_2) \;\;\text{if and only if}\;\; []\in L(r_2)$\\
 | |
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 876 | \end{tabular}
 | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 877 | \end{center}
 | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 878 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 879 | \noindent These are the induction hypotheses. To check this | 
| 412 | 880 | case, we can start from $\textit{nullable}(r_1 + r_2)$, which by 
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 881 | definition is | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 882 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 883 | \[ | 
| 412 | 884 | \textit{nullable}(r_1) \vee \textit{nullable}(r_2)
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 885 | \] | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 886 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 887 | \noindent Using the two induction hypotheses from above, | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 888 | we can transform this into | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 889 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 890 | \[ | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 891 | [] \in L(r_1) \vee []\in(r_2) | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 892 | \] | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 893 | |
| 412 | 894 | \noindent We just replaced the $\textit{nullable}(\ldots)$ parts by
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 895 | the equivalent $[] \in L(\ldots)$ from the induction | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 896 | hypotheses. A bit of thinking convinces you that if | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 897 | $[] \in L(r_1) \vee []\in L(r_2)$ then the empty string | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 898 | must be in the union $L(r_1)\cup L(r_2)$, that is | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 899 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 900 | \[ | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 901 | [] \in L(r_1)\cup L(r_2) | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 902 | \] | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 903 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 904 | \noindent but this is by definition of $L$ exactly $[] \in | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 905 | L(r_1 + r_2)$, which we needed to establish according to | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 906 | \eqref{propalt}. What we have shown is that starting from
 | 
| 412 | 907 | $\textit{nullable}(r_1 + r_2)$ we have done equivalent transformations
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 908 | to end up with $[] \in L(r_1 + r_2)$. Consequently we have | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 909 | established that $P(r_1 + r_2)$ holds. | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 910 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 911 | In order to complete the proof we would now need to look | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 912 | at the cases \mbox{$P(r_1\cdot r_2)$} and $P(r^*)$. Again I let you
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 913 | check the details. | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 914 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 915 | You might have to do induction proofs over strings. | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 916 | That means you want to establish a property $P(s)$ for all | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 917 | strings $s$. For this remember strings are lists of | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 918 | characters. These lists can be either the empty list or a | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 919 | list of the form $c::s$. If you want to perform an induction | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 920 | proof for strings you need to consider the cases | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 921 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 922 | \begin{itemize}
 | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 923 | \item $P$ has to hold for $[]$ (this is the base case). | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 924 | \item $P$ has to hold for $c::s$ under the assumption | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 925 | that $P$ already holds for $s$. | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 926 | \end{itemize}
 | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 927 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 928 | \noindent | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 929 | Given this recipe, I let you show | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 930 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 931 | \begin{equation}
 | 
| 414 | 932 | \textit{Ders}\,s\,(L(r)) = L(\textit{ders}\,s\,r)
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 933 | \label{dersprop}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 934 | \end{equation}
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 935 | |
| 414 | 936 | \noindent by induction on $s$. Recall $\textit{Der}$ is defined for 
 | 
| 937 | character---see \eqref{Der}; $\textit{Ders}$ is similar, but for strings:
 | |
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 938 | |
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 939 | \[ | 
| 414 | 940 | \textit{Ders}\,s\,A\;\dn\;\{s'\,|\,s @ s' \in A\}
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 941 | \] | 
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 942 | |
| 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 943 | \noindent In this proof you can assume the following property | 
| 414 | 944 | for $der$ and $\textit{Der}$ has already been proved, that is you can
 | 
| 399 
5c1fbb39c93e
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
394diff
changeset | 945 | assume | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 946 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 947 | \[ | 
| 414 | 948 | L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
 | 
| 340 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 949 | \] | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 950 | |
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 951 | \noindent holds (this would be of course a property that | 
| 
c49122dbcdd1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
339diff
changeset | 952 | needs to be proved in a side-lemma by induction on $r$). | 
| 338 
f16120cb4e19
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
334diff
changeset | 953 | |
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 954 | To sum up, using reasoning like the one shown above allows us | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 955 | to show the correctness of our algorithm. To see this, | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 956 | start from the specification | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 957 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 958 | \[ | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 959 | s \in L(r) | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 960 | \] | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 961 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 962 | \noindent That is the problem we want to solve. Thinking a | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 963 | little, you will see that this problem is equivalent to the | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 964 | following problem | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 965 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 966 | \begin{equation}
 | 
| 414 | 967 | [] \in \textit{Ders}\,s\,(L(r))
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 968 | \label{dersstep}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 969 | \end{equation}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 970 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 971 | \noindent But we have shown above in \eqref{dersprop}, that
 | 
| 414 | 972 | the $\textit{Ders}$ can be replaced by $L(\textit{ders}\ldots)$. That means 
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 973 | \eqref{dersstep} is equivalent to 
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 974 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 975 | \begin{equation}
 | 
| 414 | 976 | [] \in L(\textit{ders}\,s\,r)
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 977 | \label{prefinalstep}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 978 | \end{equation}
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 979 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 980 | \noindent We have also shown that testing whether the empty | 
| 412 | 981 | string is in a language is equivalent to the $\textit{nullable}$
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 982 | function; see \eqref{nullableprop}. That means
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 983 | \eqref{prefinalstep} is equivalent with
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 984 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 985 | \[ | 
| 414 | 986 | \textit{nullable}(\textit{ders}\,s\,r)
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 987 | \] | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 988 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 989 | \noindent But this is just the definition of $matches$ | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 990 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 991 | \[ | 
| 414 | 992 | matches\,s\,r \dn nullable(\textit{ders}\,s\,r)
 | 
| 343 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 993 | \] | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 994 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 995 | \noindent In effect we have shown | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 996 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 997 | \[ | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 998 | matches\,s\,r\;\;\text{if and only if}\;\;
 | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 999 | s\in L(r) | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1000 | \] | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1001 | |
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1002 | \noindent which is the property we set out to prove: | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1003 | our algorithm meets its specification. To have done | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1004 | so, requires a few induction proofs about strings and | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1005 | regular expressions. Following the recipes is already a big | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1006 | step in performing these proofs. | 
| 
539b2e88f5b9
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
340diff
changeset | 1007 | |
| 262 
ee4304bc6350
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
261diff
changeset | 1008 | \end{document}
 | 
| 261 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 1009 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 1010 | |
| 
24531cfaa36a
updated handouts
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
259diff
changeset | 1011 | |
| 123 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1012 | |
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1013 | %%% Local Variables: | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1014 | %%% mode: latex | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1015 | %%% TeX-master: t | 
| 
a75f9c9d8f94
added
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff
changeset | 1016 | %%% End: |