5 |
5 |
6 \begin{document} |
6 \begin{document} |
7 |
7 |
8 \section*{Handout 2} |
8 \section*{Handout 2} |
9 |
9 |
10 Having specified what problem our matching algorithm, |
10 Having specified what problem our matching algorithm, \pcode{match}, |
11 \pcode{match}, is supposed to solve, namely for a given |
11 is supposed to solve, namely for a given regular expression $r$ and |
12 regular expression $r$ and string $s$ answer \textit{true} if |
12 string $s$ answer \textit{true} if and only if |
13 and only if |
|
14 |
13 |
15 \[ |
14 \[ |
16 s \in L(r) |
15 s \in L(r) |
17 \] |
16 \] |
18 |
17 |
19 \noindent we can look at an algorithm to solve this problem. |
18 \noindent we can look at an algorithm to solve this problem. |
20 Clearly we cannot use the function $L$ directly for this, |
19 Clearly we cannot use the function $L$ directly for this, |
21 because in general the set of strings $L$ returns is infinite |
20 because in general the set of strings $L$ returns is infinite |
22 (recall what $L(a^*)$ is). In such cases there is no way we |
21 (recall what $L(a^*)$ is). In such cases there is no way we |
23 can implement an exhaustive test for whether a string is |
22 can implement an exhaustive test for whether a string is |
24 member of this set or not. |
23 member of this set or not. Before we come to the matching |
25 |
24 algorithm, lets have a closer look at what it means when |
26 The algorithm we will define below consists of two parts. One is the function $nullable$ which takes a |
25 two regular expressions are equivalent. |
27 regular expression as argument and decides whether it can match the empty string (this means it returns a |
26 |
|
27 \subsection*{Regular Expression Equivalences} |
|
28 |
|
29 |
|
30 \subsection*{Matching Algorithm} |
|
31 |
|
32 The algorithm we will define below consists of two parts. One is the |
|
33 function $nullable$ which takes a regular expression as argument and |
|
34 decides whether it can match the empty string (this means it returns a |
28 boolean). This can be easily defined recursively as follows: |
35 boolean). This can be easily defined recursively as follows: |
29 |
36 |
30 \begin{center} |
37 \begin{center} |
31 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
38 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
32 $nullable(\varnothing)$ & $\dn$ & $f\!\/alse$\\ |
39 $nullable(\varnothing)$ & $\dn$ & $f\!\/alse$\\ |
44 \[ |
51 \[ |
45 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r) |
52 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r) |
46 \] |
53 \] |
47 |
54 |
48 \noindent |
55 \noindent |
49 Note on the left-hand side we have a function we can implement; on the right we have its specification. |
56 Note on the left-hand side we have a function we can implement; on the |
50 |
57 right we have its specification. |
51 The other function of our matching algorithm calculates a \emph{derivative} of a regular expression. This is a function |
58 |
52 which will take a regular expression, say $r$, and a character, say $c$, as argument and return |
59 The other function of our matching algorithm calculates a |
53 a new regular expression. Be careful that the intuition behind this function is not so easy to grasp on first |
60 \emph{derivative} of a regular expression. This is a function which |
54 reading. Essentially this function solves the following problem: if $r$ can match a string of the form |
61 will take a regular expression, say $r$, and a character, say $c$, as |
55 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this |
62 argument and return a new regular expression. Be careful that the |
|
63 intuition behind this function is not so easy to grasp on first |
|
64 reading. Essentially this function solves the following problem: if |
|
65 $r$ can match a string of the form $c\!::\!s$, what does the regular |
|
66 expression look like that can match just $s$. The definition of this |
56 function is as follows: |
67 function is as follows: |
57 |
68 |
58 \begin{center} |
69 \begin{center} |
59 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |
70 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |
60 $der\, c\, (\varnothing)$ & $\dn$ & $\varnothing$ & \\ |
71 $der\, c\, (\varnothing)$ & $\dn$ & $\varnothing$ & \\ |
67 $der\, c\, (r^*)$ & $\dn$ & $(der\,c\,r) \cdot (r^*)$ & |
78 $der\, c\, (r^*)$ & $\dn$ & $(der\,c\,r) \cdot (r^*)$ & |
68 \end{tabular} |
79 \end{tabular} |
69 \end{center} |
80 \end{center} |
70 |
81 |
71 \noindent |
82 \noindent |
72 The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular |
83 The first two clauses can be rationalised as follows: recall that |
73 expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither |
84 $der$ should calculate a regular expression, if the ``input'' regular |
74 $\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case |
85 expression can match a string of the form $c\!::\!s$. Since neither |
75 we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise |
86 $\varnothing$ nor $\epsilon$ can match such a string we return |
76 a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular |
87 $\varnothing$. In the third case we have to make a case-distinction: |
77 expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched. |
88 In case the regular expression is $c$, then clearly it can recognise a |
78 The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the |
89 string of the form $c\!::\!s$, just that $s$ is the empty |
79 regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular |
90 string. Therefore we return the $\epsilon$-regular expression. In the |
80 expressions and compose the results again with $+$. The $\cdot$-case is more complicated: |
91 other case we again return $\varnothing$ since no string of the |
81 if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$. |
92 $c\!::\!s$ can be matched. The $+$-case is relatively |
82 Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and |
93 straightforward: all strings of the form $c\!::\!s$ are either matched |
83 ``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty |
94 by the regular expression $r_1$ or $r_2$. So we just have to |
84 string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the |
95 recursively call $der$ with these two regular expressions and compose |
85 empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match |
96 the results again with $+$. The $\cdot$-case is more complicated: if |
86 $s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be |
97 $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first |
87 ``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to |
98 part must be matched by $r_1$. Consequently, it makes sense to |
88 match the rest of $s$. |
99 construct the regular expression for $s$ by calling $der$ with $r_1$ |
89 |
100 and ``appending'' $r_2$. There is however one exception to this simple |
90 Another way to rationalise the definition of $der$ is to consider the following operation on sets: |
101 rule: if $r_1$ can match the empty string, then all of $c\!::\!s$ is |
|
102 matched by $r_2$. So in case $r_1$ is nullable (that is can match the |
|
103 empty string) we have to allow the choice $der\,c\,r_2$ for |
|
104 calculating the regular expression that can match $s$. The $*$-case is |
|
105 again simple: if $r^*$ matches a string of the form $c\!::\!s$, then |
|
106 the first part must be ``matched'' by a single copy of $r$. Therefore |
|
107 we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to match |
|
108 the rest of $s$. |
|
109 |
|
110 Another way to rationalise the definition of $der$ is to consider the |
|
111 following operation on sets: |
91 |
112 |
92 \[ |
113 \[ |
93 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\} |
114 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\} |
94 \] |
115 \] |
95 |
116 |
96 \noindent |
117 \noindent |
97 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then |
118 which essentially transforms a set of strings $A$ by filtering out all |
98 strips off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then |
119 strings that do not start with $c$ and then strips off the $c$ from |
|
120 all the remaining strings. For example suppose $A = \{"f\!oo", "bar", |
|
121 "f\!rak"\}$ then |
99 \[ |
122 \[ |
100 Der\,f\,A = \{"oo", "rak"\}\quad,\quad |
123 Der\,f\,A = \{"oo", "rak"\}\quad,\quad |
101 Der\,b\,A = \{"ar"\} \quad \text{and} \quad |
124 Der\,b\,A = \{"ar"\} \quad \text{and} \quad |
102 Der\,a\,A = \varnothing |
125 Der\,a\,A = \varnothing |
103 \] |
126 \] |
104 |
127 |
105 \noindent |
128 \noindent |
106 Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can |
129 Note that in the last case $Der$ is empty, because no string in $A$ |
107 state the following property about $der$: |
130 starts with $a$. With this operation we can state the following |
|
131 property about $der$: |
108 |
132 |
109 \[ |
133 \[ |
110 L(der\,c\,r) = Der\,c\,(L(r)) |
134 L(der\,c\,r) = Der\,c\,(L(r)) |
111 \] |
135 \] |
112 |
136 |
113 \noindent |
137 \noindent |
114 This property clarifies what regular expression $der$ calculates, namely take the set of strings |
138 This property clarifies what regular expression $der$ calculates, |
115 that $r$ can match (that is $L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the |
139 namely take the set of strings that $r$ can match (that is $L(r)$), |
116 remaining strings---this is exactly the language that $der\,c\,r$ can match. |
140 filter out all strings not starting with $c$ and strip off the $c$ |
117 |
141 from the remaining strings---this is exactly the language that |
118 If we want to find out whether the string $"abc"$ is matched by the regular expression $r$ |
142 $der\,c\,r$ can match. |
119 then we can iteratively apply $Der$ as follows |
143 |
|
144 If we want to find out whether the string $"abc"$ is matched by the |
|
145 regular expression $r$ then we can iteratively apply $Der$ as follows |
120 |
146 |
121 \begin{enumerate} |
147 \begin{enumerate} |
122 \item $Der\,a\,(L(r))$ |
148 \item $Der\,a\,(L(r))$ |
123 \item $Der\,b\,(Der\,a\,(L(r)))$ |
149 \item $Der\,b\,(Der\,a\,(L(r)))$ |
124 \item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$ |
150 \item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$ |
125 \end{enumerate} |
151 \end{enumerate} |
126 |
152 |
127 \noindent |
153 \noindent |
128 In the last step we need to test whether the empty string is in the set. Our matching algorithm will work similarly, |
154 In the last step we need to test whether the empty string is in the |
129 just using regular expression instead of sets. For this we need to lift the notion of derivatives from characters to strings. This can be |
155 set. Our matching algorithm will work similarly, just using regular |
130 done using the following function, taking a string and regular expression as input and a regular expression |
156 expression instead of sets. For this we need to lift the notion of |
131 as output. |
157 derivatives from characters to strings. This can be done using the |
|
158 following function, taking a string and regular expression as input |
|
159 and a regular expression as output. |
132 |
160 |
133 \begin{center} |
161 \begin{center} |
134 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |
162 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |
135 $der\!s\, []\, r$ & $\dn$ & $r$ & \\ |
163 $der\!s\, []\, r$ & $\dn$ & $r$ & \\ |
136 $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\ |
164 $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\ |