ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 668 3831621d7b14
parent 663 0d1e68268d0f
equal deleted inserted replaced
667:660cf698eb26 668:3831621d7b14
     1 % Chapter Template
     1 % Chapter Template
     2 
     2 
     3 \chapter{Finiteness Bound} % Main chapter title
     3 \chapter{A Formal Proof That $\textit{Blexer}\_\textit{simp}$ will not Grow Unbounded} % Main chapter title
     4 
     4 
     5 \label{Finite} 
     5 \label{Finite} 
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     8 %regex's derivatives. 
     8 %regex's derivatives. 
     9 %(this is cahpter 5 now)
     9 %(this is cahpter 5 now)
    10 
    10 
    11 In this chapter we give a bound in terms of the size of 
    11 In this chapter we prove a bound in terms of the size of 
    12 the calculated derivatives: 
    12 the calculated derivatives: 
    13 given an annotated regular expression $a$, for any string $s$
    13 given an annotated regular expression $a$, there exists
       
    14 a constant integer $N$, such that for any string $s$
    14 our algorithm $\blexersimp$'s derivatives
    15 our algorithm $\blexersimp$'s derivatives
    15 are finitely bounded
    16 are bounded
    16 by a constant that only depends on $a$.
    17 by $N$. %a constant that only depends on $a$.
    17 Formally we show that there exists a constant integer $N_a$ such that
    18 Formally this can be expresssed
    18 \begin{center}
    19 as 
    19 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
    20 %we show that there exists a constant integer $N_a$ such that
       
    21 \begin{center}
       
    22 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N$
    20 \end{center}
    23 \end{center}
    21 \noindent
    24 \noindent
    22 where the size ($\llbracket \_ \rrbracket$) of 
    25 where the size ($\llbracket \_ \rrbracket$) of 
    23 an annotated regular expression is defined
    26 an annotated regular expression is defined
    24 in terms of the number of nodes in its 
    27 in terms of the number of nodes in its 
    34 		by a trade-off between space and time.
    37 		by a trade-off between space and time.
    35 		Backtracking algorithms
    38 		Backtracking algorithms
    36 		save other possibilities on a stack when exploring one possible
    39 		save other possibilities on a stack when exploring one possible
    37 		path of matching. Catastrophic backtracking typically occurs
    40 		path of matching. Catastrophic backtracking typically occurs
    38 		when the number of steps increase exponentially with respect
    41 		when the number of steps increase exponentially with respect
    39 		to input. In other words, the runtime is $O((c_r)^n)$ of the input
    42 		to input. In other words, the complexity is $O((c_r)^n)$ of the input
    40 		string length $n$, where the base of the exponent is determined by the
    43 		string length $n$, where the base of the exponent is determined by the
    41 		regular expression $r$.
    44 		regular expression $r$.
    42 		%so that they
    45 		%so that they
    43 		%can be traversed in the future in a DFS manner,
    46 		%can be traversed in the future in a DFS manner,
    44 		%different matchings are stored as sub-expressions 
    47 		%different matchings are stored as sub-expressions 
   105 		\begin{tabular}{lcl}
   108 		\begin{tabular}{lcl}
   106 			$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
   109 			$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
   107 			$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
   110 			$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
   108 			$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
   111 			$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
   109 			$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
   112 			$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
   110 			$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as   + 1$\\
   113 			$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $(\textit{sum}\; (\map \; (\llbracket \_ \rrbracket)\; as)  ) + 1$\\
   111 			$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
   114 			$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
   112 		\end{tabular}
   115 		\end{tabular}
   113 	\end{center}
   116 	\end{center}
   114 	\caption{The size function of bitcoded regular expressions}\label{brexpSize}
   117 	\caption{The size function of bitcoded regular expressions}\label{brexpSize}
   115 \end{figure}
   118 \end{figure}
   150 change between the black and blue nodes:
   153 change between the black and blue nodes:
   151 After $\simp$ the node shrinks.
   154 After $\simp$ the node shrinks.
   152 Our proof states that all the blue nodes
   155 Our proof states that all the blue nodes
   153 stay below a size bound $N_a$ determined by the input $a$.
   156 stay below a size bound $N_a$ determined by the input $a$.
   154 
   157 
   155 \noindent
       
   156 Sulzmann and Lu's assumed a similar picture of their algorithm,
   158 Sulzmann and Lu's assumed a similar picture of their algorithm,
   157 though in fact their algorithm's size might be better depicted by the following graph:
   159 even though it did not work as they expected.
   158 \begin{figure}[H]
   160 %though in fact their algorithm's size might be better depicted by the following graph:
   159 	\begin{tikzpicture}[scale=2,
   161 %\begin{figure}[H]
   160 		every node/.style={minimum size=11mm},
   162 %	\begin{tikzpicture}[scale=2,
   161 		->,>=stealth',shorten >=1pt,auto,thick
   163 %		every node/.style={minimum size=11mm},
   162 		]
   164 %		->,>=stealth',shorten >=1pt,auto,thick
   163 		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
   165 %		]
   164 		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
   166 %		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
   165 		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
   167 %		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
   166 
   168 %		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
   167 		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
   169 %
   168 		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
   170 %		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
   169 
   171 %		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
   170 		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 17mm]{$a_2$};
   172 %
   171 		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
   173 %		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 17mm]{$a_2$};
   172 
   174 %		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
   173 		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
   175 %
   174 		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
   176 %		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
   175 
   177 %		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
   176 		\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
   178 %
   177 		\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
   179 %		\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
   178 
   180 %		\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
   179 		\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
   181 %
   180 		\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
   182 %		\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
   181 
   183 %		\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
   182 		\node (rnn) [right = of rns, minimum size = 1mm]{};
   184 %
   183 		\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
   185 %		\node (rnn) [right = of rns, minimum size = 1mm]{};
   184 
   186 %		\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
   185 	\end{tikzpicture}
   187 %
   186 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
   188 %	\end{tikzpicture}
   187 \end{figure}
   189 %	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
   188 \noindent
   190 %\end{figure}
       
   191 %\noindent
   189 The picture means that in some cases their lexer (where they use $\simpsulz$ 
   192 The picture means that in some cases their lexer (where they use $\simpsulz$ 
   190 as the simplification function)
   193 as the simplification function)
   191 will have a size explosion, causing the running time 
   194 will have a size explosion, causing the running time 
   192 of each derivative step to grow continuously (for example 
   195 of each derivative step to grow continuously (for example 
   193 in \ref{SulzmannLuLexerTime}).
   196 in \ref{SulzmannLuLexerTime}).
   763 If we are interested in the size of a derivative like 
   766 If we are interested in the size of a derivative like 
   764 $(r_1 \cdot r_2)\backslash s$,
   767 $(r_1 \cdot r_2)\backslash s$,
   765 we have to somehow get a more concrete form to begin.
   768 we have to somehow get a more concrete form to begin.
   766 We call such more concrete representations the ``closed forms'' of
   769 We call such more concrete representations the ``closed forms'' of
   767 string derivatives as opposed to their original definitions.
   770 string derivatives as opposed to their original definitions.
   768 The terminology ``closed form'' is borrowed from mathematics,
   771 The name ``closed from'' was inspired by closed forms in math,
   769 which usually describe expressions that are solely comprised of finitely many
   772 and the similarity with closed forms here is that they make
   770 well-known and easy-to-compute operations such as 
   773 estimating the same term easier.
   771 additions, multiplications, and exponential functions.
   774 %The terminology ``closed form'' is borrowed from mathematics,
       
   775 %which usually describe expressions that are solely comprised of finitely many
       
   776 %well-known and easy-to-compute operations such as 
       
   777 %additions, multiplications, and exponential functions.
   772 
   778 
   773 We start by proving some basic identities
   779 We start by proving some basic identities
   774 involving the simplification functions for r-regular expressions.
   780 involving the simplification functions for r-regular expressions.
   775 After that we introduce the rewrite relations
   781 After that we introduce the rewrite relations
   776 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
   782 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$