202 |
202 |
203 |
203 |
204 \subsection{Overview of the Proof} |
204 \subsection{Overview of the Proof} |
205 \marginpar{trying to make it more intuitive |
205 \marginpar{trying to make it more intuitive |
206 and provide more insights into proof} |
206 and provide more insights into proof} |
207 The most important intuition is what we call the "closed forms" of |
207 The most important idea in this chapter %intuition |
|
208 is what we call the "closed forms" of |
208 regular expression derivatives with respect to strings. |
209 regular expression derivatives with respect to strings. |
|
210 In short it allows us to express $r \backslash_{rsimps} s$ |
|
211 as a different recursive function so induction on the size bound can go through. |
|
212 A simple induction on $s$ or $r$ fails for $r\backslash_{rsimps} s$, but |
|
213 works for $\textit{ClosedForm}(r,s)$. |
|
214 |
|
215 |
|
216 |
209 Assume we have a regular expression $r$, be it an alternative, |
217 Assume we have a regular expression $r$, be it an alternative, |
210 a sequence or a star, the idea is if we try to take several derivatives |
218 a sequence or a star, the idea is if we try to take several derivatives |
211 of it on paper, we end up getting a list of subexpressions, |
219 of it on paper, we end up getting a list of subexpressions, |
212 something like |
220 something like |
213 %omitting certain |
221 %omitting certain |
216 r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n, |
224 r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n, |
217 \] |
225 \] |
218 if we omit the way these regular expressions need to be nested. |
226 if we omit the way these regular expressions need to be nested. |
219 where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments |
227 where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments |
220 of $r$ and $s$. |
228 of $r$ and $s$. |
221 We call the precise formalisation for the shape of |
|
222 \[ |
|
223 r_1 + r_2 + r_3 + \ldots + r_n |
|
224 \] |
|
225 "closed form". |
|
226 The second important observation is that the list %of regular expressions |
229 The second important observation is that the list %of regular expressions |
227 $[r_1, \ldots, r_n]$ %is not |
230 $[r_1, \ldots, r_n]$ %is not |
228 cannot grow indefinitely because they all come from $r$, and derivatives |
231 cannot grow indefinitely because they all come from $r$, and derivatives |
229 of the same regular expression are finite up to some isomorphisms. |
232 of the same regular expression are finite up to some isomorphisms. |
230 We prove that the simplifications of $\blexersimp$ %make use of |
233 We prove that the simplifications of $\blexersimp$ %make use of |
231 is powerful enough to counteract the effect of nested structure of alternatives |
234 is powerful enough to counteract the effect of nested structure of alternatives |
232 and eliminate duplicates |
235 and eliminate duplicates |
233 such that indeed the list in $a\backslash s$ does not grow unbounded. |
236 such that indeed the list in $a\backslash s$ does not grow unbounded. |
|
237 We call the precise formalisation for the shape of |
|
238 \[ |
|
239 r_1 + r_2 + r_3 + \ldots + r_n |
|
240 \] |
|
241 "closed form". |
|
242 The name was chosen because turning the recursive relation |
|
243 \[ |
|
244 a \backslash_{bsimps} (c\!::\!s) \dn (\textit{bsimp} \; (a\backslash c)) \backslash_{bsimps} s |
|
245 \] |
|
246 into some easier-to-estimate forms |
|
247 like |
|
248 \[ |
|
249 \sum (a_1\backslash s \cdot a_2) :: (\map \; (a_2\backslash\_) \; |
|
250 (\textit{Suffix} \; s \; a_1)) |
|
251 %\backslash_{bsimp |
|
252 \] |
|
253 was reminiscent of |
|
254 %similar to t |
|
255 solving recurrence relations like |
|
256 $T \; n = 2 (T \frac{1}{2} n) + n$ to obtain |
|
257 their closed forms. |
|
258 %$T \; n = n \ln n + (s \; n)$ ($s \; n$ is |
|
259 %some higher-order terms). |
|
260 %(for example we know $T$ is $\Theta (n \ln n)$). |
|
261 Just like a closed form of a recursive definition makes estimating |
|
262 their growth possible, the closed |
|
263 form of $a \backslash_{bsimps} s$ |
|
264 allows us to prove the existence of a size bound. |
|
265 Note that \ref{eq:approx} is only an approximate |
|
266 term to show our point. |
|
267 The precise formalised formula (\ref{seqClosedForm}) |
|
268 needs to wait until all $\textit{rrexp}$-related |
|
269 definitions are given, |
|
270 %but for now we can think of the above as "the sequence |
|
271 %regular expression $a_1 \cdot a_2$ after derivatives and simplifications |
|
272 %w.r.t string $s$ looks like |
|
273 %an alternative of giant list of sub-expressions, where each |
|
274 |
|
275 |
234 |
276 |
235 A high-level overview of the main components of the finiteness proof |
277 A high-level overview of the main components of the finiteness proof |
236 is as follows: |
278 is as follows: |
237 \begin{figure}[H] |
279 \begin{figure}[H] |
238 \begin{tikzpicture}[scale=1,font=\bf, |
280 \begin{tikzpicture}[scale=1,font=\bf, |