
\label{Finite}
% In Chapter 4 \ref{Chapter4} we give the second guarantee
%of our bitcoded algorithm, that is a finite bound on the size of any
%regex's derivatives.

%(this is chapter 5 now)

In this chapter we give a bound in terms of the size of
the calculated derivatives:
given an annotated regular expression $a$, for any string $s$
the derivatives computed by our algorithm $\blexersimp$
are finitely bounded
by a constant that depends only on $a$.
Formally we show that there exists an integer $N_a$ such that
\begin{center}
$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
\end{center}
\noindent
where the size ($\llbracket \_ \rrbracket$) of
an annotated regular expression is defined
in terms of the number of nodes in its
tree structure (its recursive definition is given on the next page).
We believe this size bound
is important in the context of POSIX lexing because
\marginpar{Addressing Gerog comment: ``how does this relate to backtracking?''}
\begin{itemize}
\item
It is a stepping stone towards the goal
of eliminating ``catastrophic backtracking''.
The derivative-based lexing algorithm avoids backtracking
by a trade-off between space and time.
Backtracking algorithms
save other possibilities on a stack when exploring one possible
matching path. Catastrophic backtracking typically occurs
when the number of steps increases exponentially with respect
to the input. In other words, the runtime is $O((c_r)^n)$ in the
length $n$ of the input string, where the base $c_r$ of the exponent
is determined by the regular expression $r$.
%so that they
%can be traversed in the future in a DFS manner,
%different matchings are stored as sub-expressions
%in a regular expression derivative.
Derivatives save these possibilities as sub-expressions
and traverse them during future derivative steps. If we denote the size
of the intermediate derivatives by $S_{r,n}$ (where the subscripts
$r,n$ indicate that $S$ depends on both), then the runtime of
derivative-based approaches is $O(S_{r,n} \cdot n)$.
We observe that if $S_{r,n}$ keeps growing with $n$ (for example
exponentially fast), then this
is just as bad as catastrophic backtracking.
Our finiteness bound seeks a constant upper bound $C$ of $S_{r,n}$,
so that the complexity of the algorithm can be regarded as linear
($O(C \cdot n)$).
Even though $C$ is still large in our current work, it is a constant
rather than an ever-increasing number with respect to the input length $n$.
More importantly, this constant $C$ can potentially
be shrunk as we optimise our simplification procedure.
%and showing the potential
%improvements can be by the notion of partial derivatives.

%If the internal data structures used by our algorithm
%grows beyond a finite bound, then clearly
%the algorithm (which traverses these structures) will
%be slow.
%The next step is to refine the bound $N_a$ so that it
%is not just finite but polynomial in $\llbracket a\rrbracket$.
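To make the contrast concrete, consider the classic example
$(a^*)^* \cdot b$ matched against the string $a^n$ (this example is
only an illustration, not part of the formal development): a
backtracking matcher must try all the ways of distributing the $a$'s
between the two stars before it can report failure, giving a runtime
on the order of $2^n$, whereas the derivative approach performs one
derivative step per input character:
\begin{center}
$\underbrace{O(2^n)}_{\textit{backtracking on } (a^*)^* \cdot\, b}
\qquad \textit{versus} \qquad
\underbrace{O(S_{r,n} \cdot n)}_{\textit{derivatives}}$
\end{center}
\noindent
The point of the finiteness bound is precisely to ensure that the
factor $S_{r,n}$ on the right cannot silently reintroduce the blow-up
on the left.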
\item
Having the finite bound formalised
gives us higher confidence that
our simplification algorithm $\simp$ does not ``misbehave''
like $\textit{simpSL}$ does.