overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
--- a/ChengsongTanPhdThesis/Chapters/Finite.tex Mon Jul 10 01:33:45 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Mon Jul 10 01:51:46 2023 +0100
@@ -14,7 +14,7 @@
our algorithm $\blexersimp$'s derivatives
are finitely bounded
by a constant that only depends on $a$.
-Formally we show that there exists an integer $N_a$ such that
+Formally we show that there exists a constant integer $N_a$ such that
\begin{center}
$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
\end{center}
@@ -51,7 +51,8 @@
We observe that if $S_{r,n}$ continously grows with $n$ (for example
growing exponentially fast), then this
is equally bad as catastrophic backtracking.
- Our finiteness bound seeks to find a constant upper bound $C$ of $\S_{r,n}$,
+ Our finiteness bound seeks to find a constant integer
+ upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $\S_{r,n}$,
so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
Even if $C$ is still large in our current work, it is still a constant
rather than ever-increasing number with respect to input length $n$.
@@ -201,6 +202,31 @@
\subsection{Overview of the Proof}
+\marginpar{trying to make it more intuitive
+and provide more insights into proof}
+The most important intuition is what we call the "closed forms" of
+regular expression derivatives with respect to strings.
+Assume we have a regular expression $r$, be it an alternative,
+a sequence or a star, the idea is if we try to take several derivatives
+of it on paper, we end up getting a list of subexpressions,
+something like
+%omitting certain
+%nested structures of those expressions:
+\[
+ r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n,
+\]
+if we omit the way these regular expressions need to be nested.
+where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments
+of $r$ and $s$.
+The second important observation is that the list %of regular expressions
+$[r_1, \ldots, r_n]$ %is not
+cannot grow indefinitely because they all come from $r$, and derivatives
+of the same regular expression are finite up to some isomorphisms.
+We prove that the simplifications of $\blexersimp$ %make use of
+is powerful enough to counteract the effect of nested structure of alternatives
+and eliminate duplicates
+such that indeed the list in $a\backslash s$ does not grow unbounded.
+
A high-level overview of the main components of the finiteness proof
is as follows:
\begin{figure}[H]