12 the calculated derivatives: |
12 the calculated derivatives: |
13 given an annotated regular expression $a$, for any string $s$ |
13 given an annotated regular expression $a$, for any string $s$ |
14 our algorithm $\blexersimp$'s derivatives |
14 our algorithm $\blexersimp$'s derivatives |
15 are finitely bounded |
15 are finitely bounded |
16 by a constant that only depends on $a$. |
16 by a constant that only depends on $a$. |
17 Formally we show that there exists an integer $N_a$ such that |
17 Formally we show that there exists a constant integer $N_a$ such that |
18 \begin{center} |
18 \begin{center} |
19 $\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$ |
19 $\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$ |
20 \end{center} |
20 \end{center} |
21 \noindent |
21 \noindent |
22 where the size ($\llbracket \_ \rrbracket$) of |
22 where the size ($\llbracket \_ \rrbracket$) of |
49 $r,n$ indicate that $S$ depends on them), then the runtime of |
49 $r,n$ indicate that $S$ depends on them), then the runtime of |
50 derivative-based approaches would be $O(S_{r,n} * n)$. |
50 derivative-based approaches would be $O(S_{r,n} * n)$. |
51 We observe that if $S_{r,n}$ continously grows with $n$ (for example |
51 We observe that if $S_{r,n}$ continously grows with $n$ (for example |
52 growing exponentially fast), then this |
52 growing exponentially fast), then this |
53 is equally bad as catastrophic backtracking. |
53 is equally bad as catastrophic backtracking. |
54 Our finiteness bound seeks to find a constant upper bound $C$ of $\S_{r,n}$, |
54 Our finiteness bound seeks to find a constant integer |
|
55 upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $\S_{r,n}$, |
55 so that the complexity of the algorithm can be seen as linear ($O(C * n)$). |
56 so that the complexity of the algorithm can be seen as linear ($O(C * n)$). |
56 Even if $C$ is still large in our current work, it is still a constant |
57 Even if $C$ is still large in our current work, it is still a constant |
57 rather than ever-increasing number with respect to input length $n$. |
58 rather than ever-increasing number with respect to input length $n$. |
58 More importantly this $C$ constant can potentially |
59 More importantly this $C$ constant can potentially |
59 be shrunken as we optimize our simplification procedure. |
60 be shrunken as we optimize our simplification procedure. |
199 Before delving into the details of the formalisation, |
200 Before delving into the details of the formalisation, |
200 we are going to provide an overview of it in the following subsection. |
201 we are going to provide an overview of it in the following subsection. |
201 |
202 |
202 |
203 |
203 \subsection{Overview of the Proof} |
204 \subsection{Overview of the Proof} |
|
205 \marginpar{trying to make it more intuitive |
|
206 and provide more insights into proof} |
|
207 The most important intuition is what we call the "closed forms" of |
|
208 regular expression derivatives with respect to strings. |
|
209 Assume we have a regular expression $r$, be it an alternative, |
|
210 a sequence or a star, the idea is if we try to take several derivatives |
|
211 of it on paper, we end up getting a list of subexpressions, |
|
212 something like |
|
213 %omitting certain |
|
214 %nested structures of those expressions: |
|
215 \[ |
|
216 r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n, |
|
217 \] |
|
218 if we omit the way these regular expressions need to be nested. |
|
219 where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments |
|
220 of $r$ and $s$. |
|
221 The second important observation is that the list %of regular expressions |
|
222 $[r_1, \ldots, r_n]$ %is not |
|
223 cannot grow indefinitely because they all come from $r$, and derivatives |
|
224 of the same regular expression are finite up to some isomorphisms. |
|
225 We prove that the simplifications of $\blexersimp$ %make use of |
|
226 is powerful enough to counteract the effect of nested structure of alternatives |
|
227 and eliminate duplicates |
|
228 such that indeed the list in $a\backslash s$ does not grow unbounded. |
|
229 |
204 A high-level overview of the main components of the finiteness proof |
230 A high-level overview of the main components of the finiteness proof |
205 is as follows: |
231 is as follows: |
206 \begin{figure}[H] |
232 \begin{figure}[H] |
207 \begin{tikzpicture}[scale=1,font=\bf, |
233 \begin{tikzpicture}[scale=1,font=\bf, |
208 node/.style={ |
234 node/.style={ |