ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 661 71502e4d8691
parent 660 eddc4eaba7c4
child 662 fa7552396c71
equal deleted inserted replaced
660:eddc4eaba7c4 661:71502e4d8691
    12 the calculated derivatives: 
    12 the calculated derivatives: 
    13 given an annotated regular expression $a$, for any string $s$
    13 given an annotated regular expression $a$, for any string $s$
    14 our algorithm $\blexersimp$'s derivatives
    14 our algorithm $\blexersimp$'s derivatives
    15 are finitely bounded
    15 are finitely bounded
    16 by a constant that only depends on $a$.
    16 by a constant that only depends on $a$.
    17 Formally we show that there exists an integer $N_a$ such that
    17 Formally we show that there exists a constant integer $N_a$ such that
    18 \begin{center}
    18 \begin{center}
    19 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
    19 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
    20 \end{center}
    20 \end{center}
    21 \noindent
    21 \noindent
    22 where the size ($\llbracket \_ \rrbracket$) of 
    22 where the size ($\llbracket \_ \rrbracket$) of 
    49 		$r,n$ indicate that $S$ depends on them), then the runtime of 
    49 		$r,n$ indicate that $S$ depends on them), then the runtime of 
    50 		derivative-based approaches would be $O(S_{r,n} * n)$.
    50 		derivative-based approaches would be $O(S_{r,n} * n)$.
    51 		We observe that if $S_{r,n}$ continously grows with $n$ (for example
    51 		We observe that if $S_{r,n}$ continously grows with $n$ (for example
    52 		growing exponentially fast), then this
    52 		growing exponentially fast), then this
    53 		is equally bad as catastrophic backtracking.
    53 		is equally bad as catastrophic backtracking.
    54 		Our finiteness bound seeks to find a constant upper bound $C$ of $\S_{r,n}$,
    54 		Our finiteness bound seeks to find a constant integer
       
    55 		upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $\S_{r,n}$,
    55 		so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
    56 		so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
    56 		Even if $C$ is still large in our current work, it is still a constant
    57 		Even if $C$ is still large in our current work, it is still a constant
    57 		rather than ever-increasing number with respect to input length $n$.
    58 		rather than ever-increasing number with respect to input length $n$.
    58 		More importantly this $C$ constant can potentially
    59 		More importantly this $C$ constant can potentially
    59 		be shrunken as we optimize our simplification procedure. 
    60 		be shrunken as we optimize our simplification procedure. 
   199 Before delving into the details of the formalisation,
   200 Before delving into the details of the formalisation,
   200 we are going to provide an overview of it in the following subsection.
   201 we are going to provide an overview of it in the following subsection.
   201 
   202 
   202 
   203 
   203 \subsection{Overview of the Proof}
   204 \subsection{Overview of the Proof}
       
   205 \marginpar{trying to make it more intuitive
       
   206 and provide more insights into proof}
       
   207 The most important intuition is what we call the "closed forms" of
       
   208 regular expression derivatives with respect to strings.
       
   209 Assume we have a regular expression $r$, be it an alternative,
       
   210 a sequence or a star, the idea is if we try to take several derivatives
       
   211 of it on paper, we end up getting a list of subexpressions,
       
   212 something like
       
   213 %omitting certain
       
   214 %nested structures of those expressions:
       
   215 \[
       
   216 	r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n,
       
   217 \]
       
   218 if we omit the way these regular expressions need to be nested.
       
   219 where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments
       
   220 of $r$ and $s$.
       
   221 The second important observation is that the list %of regular expressions
       
   222 $[r_1, \ldots, r_n]$ %is not
       
   223 cannot grow indefinitely because they all come from $r$, and derivatives
       
   224 of the same regular expression are finite up to some isomorphisms.
       
   225 We prove that the simplifications of $\blexersimp$ %make use of 
       
   226 is powerful enough to counteract the effect of nested structure of alternatives
       
   227 and eliminate duplicates
       
   228 such that indeed the list in $a\backslash s$ does not grow unbounded.
       
   229 
   204 A high-level overview of the main components of the finiteness proof
   230 A high-level overview of the main components of the finiteness proof
   205 is as follows:
   231 is as follows:
   206 \begin{figure}[H]
   232 \begin{figure}[H]
   207 	\begin{tikzpicture}[scale=1,font=\bf,
   233 	\begin{tikzpicture}[scale=1,font=\bf,
   208 		node/.style={
   234 		node/.style={