1 % Chapter Template |
1 % Chapter Template |
2 |
2 |
3 \chapter{Finiteness Bound} % Main chapter title |
3 \chapter{A Formal Proof That $\textit{Blexer}\_\textit{simp}$ will not Grow Unbounded} % Main chapter title |
4 |
4 |
5 \label{Finite} |
5 \label{Finite} |
6 % In Chapter 4 \ref{Chapter4} we give the second guarantee |
6 % In Chapter 4 \ref{Chapter4} we give the second guarantee |
7 %of our bitcoded algorithm, that is a finite bound on the size of any |
7 %of our bitcoded algorithm, that is a finite bound on the size of any |
8 %regex's derivatives. |
8 %regex's derivatives. |
9 %(this is cahpter 5 now) |
9 %(this is cahpter 5 now) |
10 |
10 |
11 In this chapter we give a bound in terms of the size of |
11 In this chapter we prove a bound in terms of the size of |
12 the calculated derivatives: |
12 the calculated derivatives: |
13 given an annotated regular expression $a$, for any string $s$ |
13 given an annotated regular expression $a$, there exists |
|
14 a constant integer $N$, such that for any string $s$ |
14 our algorithm $\blexersimp$'s derivatives |
15 our algorithm $\blexersimp$'s derivatives |
15 are finitely bounded |
16 are bounded |
16 by a constant that only depends on $a$. |
17 by $N$. %a constant that only depends on $a$. |
17 Formally we show that there exists a constant integer $N_a$ such that |
18 Formally this can be expresssed |
18 \begin{center} |
19 as |
19 $\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$ |
20 %we show that there exists a constant integer $N_a$ such that |
|
21 \begin{center} |
|
22 $\llbracket \bderssimp{a}{s} \rrbracket \leq N$ |
20 \end{center} |
23 \end{center} |
21 \noindent |
24 \noindent |
22 where the size ($\llbracket \_ \rrbracket$) of |
25 where the size ($\llbracket \_ \rrbracket$) of |
23 an annotated regular expression is defined |
26 an annotated regular expression is defined |
24 in terms of the number of nodes in its |
27 in terms of the number of nodes in its |
34 by a trade-off between space and time. |
37 by a trade-off between space and time. |
35 Backtracking algorithms |
38 Backtracking algorithms |
36 save other possibilities on a stack when exploring one possible |
39 save other possibilities on a stack when exploring one possible |
37 path of matching. Catastrophic backtracking typically occurs |
40 path of matching. Catastrophic backtracking typically occurs |
38 when the number of steps increase exponentially with respect |
41 when the number of steps increase exponentially with respect |
39 to input. In other words, the runtime is $O((c_r)^n)$ of the input |
42 to input. In other words, the complexity is $O((c_r)^n)$ of the input |
40 string length $n$, where the base of the exponent is determined by the |
43 string length $n$, where the base of the exponent is determined by the |
41 regular expression $r$. |
44 regular expression $r$. |
42 %so that they |
45 %so that they |
43 %can be traversed in the future in a DFS manner, |
46 %can be traversed in the future in a DFS manner, |
44 %different matchings are stored as sub-expressions |
47 %different matchings are stored as sub-expressions |
150 change between the black and blue nodes: |
153 change between the black and blue nodes: |
151 After $\simp$ the node shrinks. |
154 After $\simp$ the node shrinks. |
152 Our proof states that all the blue nodes |
155 Our proof states that all the blue nodes |
153 stay below a size bound $N_a$ determined by the input $a$. |
156 stay below a size bound $N_a$ determined by the input $a$. |
154 |
157 |
155 \noindent |
|
156 Sulzmann and Lu's assumed a similar picture of their algorithm, |
158 Sulzmann and Lu's assumed a similar picture of their algorithm, |
157 though in fact their algorithm's size might be better depicted by the following graph: |
159 even though it did not work as they expected. |
158 \begin{figure}[H] |
160 %though in fact their algorithm's size might be better depicted by the following graph: |
159 \begin{tikzpicture}[scale=2, |
161 %\begin{figure}[H] |
160 every node/.style={minimum size=11mm}, |
162 % \begin{tikzpicture}[scale=2, |
161 ->,>=stealth',shorten >=1pt,auto,thick |
163 % every node/.style={minimum size=11mm}, |
162 ] |
164 % ->,>=stealth',shorten >=1pt,auto,thick |
163 \node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$}; |
165 % ] |
164 \node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$}; |
166 % \node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$}; |
165 \draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$}; |
167 % \node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$}; |
166 |
168 % \draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$}; |
167 \node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$}; |
169 % |
168 \draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$}; |
170 % \node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$}; |
169 |
171 % \draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$}; |
170 \node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 17mm]{$a_2$}; |
172 % |
171 \draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$}; |
173 % \node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 17mm]{$a_2$}; |
172 |
174 % \draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$}; |
173 \node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$}; |
175 % |
174 \draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$}; |
176 % \node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$}; |
175 |
177 % \draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$}; |
176 \node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$}; |
178 % |
177 \draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$}; |
179 % \node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$}; |
178 |
180 % \draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$}; |
179 \node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$}; |
181 % |
180 \draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$}; |
182 % \node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$}; |
181 |
183 % \draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$}; |
182 \node (rnn) [right = of rns, minimum size = 1mm]{}; |
184 % |
183 \draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$}; |
185 % \node (rnn) [right = of rns, minimum size = 1mm]{}; |
184 |
186 % \draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$}; |
185 \end{tikzpicture} |
187 % |
186 \caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks} |
188 % \end{tikzpicture} |
187 \end{figure} |
189 % \caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks} |
188 \noindent |
190 %\end{figure} |
|
191 %\noindent |
189 The picture means that in some cases their lexer (where they use $\simpsulz$ |
192 The picture means that in some cases their lexer (where they use $\simpsulz$ |
190 as the simplification function) |
193 as the simplification function) |
191 will have a size explosion, causing the running time |
194 will have a size explosion, causing the running time |
192 of each derivative step to grow continuously (for example |
195 of each derivative step to grow continuously (for example |
193 in \ref{SulzmannLuLexerTime}). |
196 in \ref{SulzmannLuLexerTime}). |
763 If we are interested in the size of a derivative like |
766 If we are interested in the size of a derivative like |
764 $(r_1 \cdot r_2)\backslash s$, |
767 $(r_1 \cdot r_2)\backslash s$, |
765 we have to somehow get a more concrete form to begin. |
768 we have to somehow get a more concrete form to begin. |
766 We call such more concrete representations the ``closed forms'' of |
769 We call such more concrete representations the ``closed forms'' of |
767 string derivatives as opposed to their original definitions. |
770 string derivatives as opposed to their original definitions. |
768 The terminology ``closed form'' is borrowed from mathematics, |
771 The name ``closed from'' was inspired by closed forms in math, |
769 which usually describe expressions that are solely comprised of finitely many |
772 and the similarity with closed forms here is that they make |
770 well-known and easy-to-compute operations such as |
773 estimating the same term easier. |
771 additions, multiplications, and exponential functions. |
774 %The terminology ``closed form'' is borrowed from mathematics, |
|
775 %which usually describe expressions that are solely comprised of finitely many |
|
776 %well-known and easy-to-compute operations such as |
|
777 %additions, multiplications, and exponential functions. |
772 |
778 |
773 We start by proving some basic identities |
779 We start by proving some basic identities |
774 involving the simplification functions for r-regular expressions. |
780 involving the simplification functions for r-regular expressions. |
775 After that we introduce the rewrite relations |
781 After that we introduce the rewrite relations |
776 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$ |
782 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$ |