diff -r fa7552396c71 -r 0d1e68268d0f ChengsongTanPhdThesis/Chapters/Finite.tex --- a/ChengsongTanPhdThesis/Chapters/Finite.tex Mon Jul 10 01:53:32 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Mon Jul 10 14:32:48 2023 +0100 @@ -204,8 +204,16 @@ \subsection{Overview of the Proof} \marginpar{trying to make it more intuitive and provide more insights into proof} -The most important intuition is what we call the "closed forms" of +The most important idea in this chapter %intuition +is what we call the "closed forms" of regular expression derivatives with respect to strings. +In short it allows us to express $r \backslash_{rsimps} s$ +as a different recursive function so induction on the size bound can go through. +A simple induction on $s$ or $r$ fails for $r\backslash_{rsimps} s$, but +works for $\textit{ClosedForm}(r,s)$. + + + Assume we have a regular expression $r$, be it an alternative, a sequence or a star, the idea is if we try to take several derivatives of it on paper, we end up getting a list of subexpressions, @@ -218,11 +226,6 @@ if we omit the way these regular expressions need to be nested. where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments of $r$ and $s$. -We call the precise formalisation for the shape of -\[ - r_1 + r_2 + r_3 + \ldots + r_n -\] -"closed form". The second important observation is that the list %of regular expressions $[r_1, \ldots, r_n]$ %is not cannot grow indefinitely because they all come from $r$, and derivatives @@ -231,6 +234,45 @@ is powerful enough to counteract the effect of nested structure of alternatives and eliminate duplicates such that indeed the list in $a\backslash s$ does not grow unbounded. +We call the precise formalisation for the shape of +\[ + r_1 + r_2 + r_3 + \ldots + r_n +\] +"closed form". +The name was chosen because turning the recursive relation +\[ + a \backslash_{bsimps} (c\!::\!s) \dn (\textit{bsimp} \; (a\backslash c)) \backslash_{bsimps} s +\] +into some easier-to-estimate forms +like +\[ + \sum (a_1\backslash s \cdot a_2) :: (\map \; (a_2\backslash\_) \; + (\textit{Suffix} \; s \; a_1)) + %\backslash_{bsimp +\] +was reminiscent of +%similar to t +solving recurrence relations like +$T \; n = 2 (T \frac{1}{2} n) + n$ to obtain +their closed forms. +%$T \; n = n \ln n + (s \; n)$ ($s \; n$ is +%some higher-order terms). +%(for example we know $T$ is $\Theta (n \ln n)$). +Just like a closed form of a recursive definition makes estimating +their growth possible, the closed +form of $a \backslash_{bsimps} s$ +allows us to prove the existence of a size bound. +Note that \ref{eq:approx} is only an approximate +term to show our point. +The precise formalised formula (\ref{seqClosedForm}) +needs to wait until all $\textit{rrexp}$-related +definitions are given, +%but for now we can think of the above as "the sequence +%regular expression $a_1 \cdot a_2$ after derivatives and simplifications +%w.r.t string $s$ looks like +%an alternative of giant list of sub-expressions, where each + + A high-level overview of the main components of the finiteness proof is as follows: