lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

equal deleted inserted replaced

-:eddc4eaba7c4
+:71502e4d8691
 the calculated derivatives:
 given an annotated regular expression $a$, for any string $s$
 our algorithm $\blexersimp$'s derivatives
 are finitely bounded
 by a constant that only depends on $a$.
-Formally we show that there exists an integer $N_a$ such that
+Formally we show that there exists a constant integer $N_a$ such that
 \begin{center}
 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
 \end{center}
 \noindent
 where the size ($\llbracket \_ \rrbracket$) of
 		$r,n$ indicate that $S$ depends on them), then the runtime of
 		derivative-based approaches would be $O(S_{r,n} * n)$.
 		We observe that if $S_{r,n}$ continously grows with $n$ (for example
 		growing exponentially fast), then this
 		is equally bad as catastrophic backtracking.
-		Our finiteness bound seeks to find a constant upper bound $C$ of $\S_{r,n}$,
+		Our finiteness bound seeks to find a constant integer
+		upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $\S_{r,n}$,
 		so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
 		Even if $C$ is still large in our current work, it is still a constant
 		rather than ever-increasing number with respect to input length $n$.
 		More importantly this $C$ constant can potentially
 		be shrunken as we optimize our simplification procedure.
 Before delving into the details of the formalisation,
 we are going to provide an overview of it in the following subsection.
 \subsection{Overview of the Proof}
+\marginpar{trying to make it more intuitive
+and provide more insights into proof}
+The most important intuition is what we call the "closed forms" of
+regular expression derivatives with respect to strings.
+Assume we have a regular expression $r$, be it an alternative,
+a sequence or a star, the idea is if we try to take several derivatives
+of it on paper, we end up getting a list of subexpressions,
+something like
+%omitting certain
+%nested structures of those expressions:
+\[
+	r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n,
+\]
+if we omit the way these regular expressions need to be nested.
+where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments
+of $r$ and $s$.
+The second important observation is that the list %of regular expressions
+$[r_1, \ldots, r_n]$ %is not
+cannot grow indefinitely because they all come from $r$, and derivatives
+of the same regular expression are finite up to some isomorphisms.
+We prove that the simplifications of $\blexersimp$ %make use of
+is powerful enough to counteract the effect of nested structure of alternatives
+and eliminate duplicates
+such that indeed the list in $a\backslash s$ does not grow unbounded.
 A high-level overview of the main components of the finiteness proof
 is as follows:
 \begin{figure}[H]
 	\begin{tikzpicture}[scale=1,font=\bf,
 		node/.style={

changeset 661	71502e4d8691
parent 660	eddc4eaba7c4
child 662	fa7552396c71