55 \[ |
55 \[ |
56 s \in L(r) |
56 s \in L(r) |
57 \] |
57 \] |
58 |
58 |
59 \noindent |
59 \noindent |
60 Clearly we cannot use the function $L$ directly in order to solve this problem, because in general |
60 we can look at an algorithm to solve this problem. |
61 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no algorithm |
61 Clearly we cannot use the function $L$ directly for this, because in general |
62 then can test exhaustively, whether a string is member of this set. |
62 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no way we can implement |
63 |
63 an exhaustive test for whether a string is member of this set or not. |
64 The algorithm we define below consists of two parts. One is the function $nullable$ which takes a |
64 |
|
65 The algorithm we will define below consists of two parts. One is the function $nullable$ which takes a |
65 regular expression as argument and decides whether it can match the empty string (this means it returns a |
66 regular expression as argument and decides whether it can match the empty string (this means it returns a |
66 boolean). This can be easily defined recursively as follows: |
67 boolean). This can be easily defined recursively as follows: |
67 |
68 |
68 \begin{center} |
69 \begin{center} |
69 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
70 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
82 \[ |
83 \[ |
83 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r) |
84 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r) |
84 \] |
85 \] |
85 |
86 |
86 \noindent |
87 \noindent |
87 On the left-hand side we have a function we can implement; on the right we have its specification. |
88 Note on the left-hand side we have a function we can implement; on the right we have its specification. |
88 |
89 |
89 The other function is calculating a \emph{derivative} of a regular expression. This is a function |
90 The other function of our matching algorithm calculates a \emph{derivative} of a regular expression. This is a function |
90 which will take a regular expression, say $r$, and a character, say $c$, as argument and return |
91 which will take a regular expression, say $r$, and a character, say $c$, as argument and return |
91 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first |
92 a new regular expression. Be careful that the intuition behind this function is not so easy to grasp on first |
92 reading. Essentially this function solves the following problem: if $r$ can match a string of the form |
93 reading. Essentially this function solves the following problem: if $r$ can match a string of the form |
93 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this |
94 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this |
94 function is as follows: |
95 function is as follows: |
95 |
96 |
96 \begin{center} |
97 \begin{center} |
131 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\} |
132 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\} |
132 \] |
133 \] |
133 |
134 |
134 \noindent |
135 \noindent |
135 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then |
136 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then |
136 strip off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then |
137 strips off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then |
137 \[ |
138 \[ |
138 Der\,f\,A = \{"oo", "rak"\}\quad,\quad |
139 Der\,f\,A = \{"oo", "rak"\}\quad,\quad |
139 Der\,b\,A = \{"ar"\} \quad \text{and} \quad |
140 Der\,b\,A = \{"ar"\} \quad \text{and} \quad |
140 Der\,a\,A = \varnothing |
141 Der\,a\,A = \varnothing |
141 \] |
142 \] |
148 L(der\,c\,r) = Der\,c\,(L(r)) |
149 L(der\,c\,r) = Der\,c\,(L(r)) |
149 \] |
150 \] |
150 |
151 |
151 \noindent |
152 \noindent |
152 This property clarifies what regular expression $der$ calculates, namely take the set of strings |
153 This property clarifies what regular expression $der$ calculates, namely take the set of strings |
153 that $r$ can match ($L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the |
154 that $r$ can match (that is $L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the |
154 remaining strings---this is exactly the language that $der\,c\,r$ can match. |
155 remaining strings---this is exactly the language that $der\,c\,r$ can match. |
155 |
156 |
156 For our matching algorithm we need to lift the notion of derivatives from characters to strings. This can be |
157 If we want to find out whether the string $"abc"$ is matched by the regular expression $r$ |
|
158 then we can iteratively apply $Der$ as follows |
|
159 |
|
160 \begin{enumerate} |
|
161 \item $Der\,a\,(L(r))$ |
|
162 \item $Der\,b\,(Der\,a\,(L(r)))$ |
|
163 \item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$ |
|
164 \end{enumerate} |
|
165 |
|
166 \noindent |
|
167 In the last step we need to test whether the empty string is in the set. Our matching algorithm will work similarly, |
|
168 just using regular expression instead of sets. For this we need to lift the notion of derivatives from characters to strings. This can be |
157 done using the following function, taking a string and regular expression as input and a regular expression |
169 done using the following function, taking a string and regular expression as input and a regular expression |
158 as output. |
170 as output. |
159 |
171 |
160 \begin{center} |
172 \begin{center} |
161 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |
173 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} |