--- a/ChengsongTanPhdThesis/Chapters/Finite.tex Tue Jul 05 00:42:06 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Jul 13 08:27:28 2022 +0100
@@ -209,7 +209,7 @@
With $\rrexp$ the size caclulation of annotated regular expressions'
simplification and derivatives can be done by the size of their unlifted
counterpart with the unlifted version of simplification and derivatives applied.
-\begin{lemma}
+\begin{lemma}\label{sizeRelations}
The following equalities hold:
\begin{itemize}
\item
@@ -1244,8 +1244,35 @@
\end{corollary}
\noindent
\subsection{Closed Forms for Star Regular Expressions}
-We use a similar technique as $r_1 \cdot r_2$ case,
-generating
+We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
+the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then
+the property of the $\distinct$ function.
+Now we try to get a bound on $r^* \backslash s$ as well.
+Again, we first look at how a star's derivatives evolve, if they grow maximally:
+\begin{center}
+
+$r^* \quad \longrightarrow_{\backslash c} \quad (r\backslash c) \cdot r^* \quad \longrightarrow_{\backslash c'} \quad
+r \backslash cc' \cdot r^* + r \backslash c' \cdot r^* \quad \longrightarrow_{\backslash c''} \quad
+(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) \quad \longrightarrow_{\backslash c'''}
+\quad \ldots$
+
+\end{center}
+When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
+$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
+the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
+of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not
+count the possible size explosions of $r \backslash c$ themselves.
+
+Thanks to $\rflts$ and $\rDistinct$, we are able to open up regexes like
+$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') +
+(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
+into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'',
+r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
+and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
+This allows us to use a similar technique as $r_1 \cdot r_2$ case,
+where the crux is to get an equivalent form of
+$\rderssimp{r^*}{s}$ with shape $\rsimp{\sum rs}$.
+This requires generating
all possible sub-strings $s'$ of $s$
such that $r\backslash s' \cdot r^*$ will appear
as a term in $(r^*) \backslash s$.
@@ -1304,6 +1331,68 @@
\end{center}
\noindent
%MAYBE TODO: introduce createdByStar
+Again these definitions are tailor-made for dealing with alternatives that have
+originated from a star's derivatives, so we do not attempt to open up all possible
+regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
+elements.
+We give a predicate for such "star-created" regular expressions:
+\begin{center}
+\begin{tabular}{lcr}
+ & & $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
+ $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
+ \end{tabular}
+ \end{center}
+
+ These definitions allows us the flexibility to talk about
+ regular expressions in their most convenient format,
+ for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
+ instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
+ These definitions help express that certain classes of syntatically
+ distinct regular expressions are actually the same under simplification.
+ This is not entirely true for annotated regular expressions:
+ %TODO: bsimp bders \neq bderssimp
+ \begin{center}
+ $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
+ \end{center}
+ For bit-codes, the order in which simplification is applied
+ might cause a difference in the location they are placed.
+ If we want something like
+ \begin{center}
+ $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
+ \end{center}
+ Some "canonicalization" procedure is required,
+ which either pushes all the common bitcodes to nodes
+ as senior as possible:
+ \begin{center}
+ $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
+ \end{center}
+ or does the reverse. However bitcodes are not of interest if we are talking about
+ the $\llbracket r \rrbracket$ size of a regex.
+ Therefore for the ease and simplicity of producing a
+ proof for a size bound, we are happy to restrict ourselves to
+ unannotated regular expressions, and obtain such equalities as
+ \begin{lemma}
+ $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
+ \end{lemma}
+
+ \begin{proof}
+ By using the rewriting relation $\rightsquigarrow$
+ \end{proof}
+ %TODO: rsimp sflat
+And from this we obtain a proof that a star's derivative will be the same
+as if it had all its nested alternatives created during deriving being flattened out:
+ For example,
+ \begin{lemma}
+ $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
+ \end{lemma}
+ \begin{proof}
+ By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
+ \end{proof}
+% The simplification of a flattened out regular expression, provided it comes
+%from the derivative of a star, is the same as the one nested.
+
+
+
We first introduce an inductive property
for $\starupdate$ and $\hflataux{\_}$,
it says if we do derivatives of $r^*$
@@ -1516,7 +1605,7 @@
We reason similarly for $\STAR$.
The inductive hypothesis is
$\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N$.
-Let $\n_r = \llbracket r^* \rrbracket_r$.
+Let $n_r = \llbracket r^* \rrbracket_r$.
When $s = c :: cs$ is not empty,
\begin{center}
\begin{tabular}{lcll}
@@ -1572,6 +1661,13 @@
(4), (8), and (12) are all the inductive cases proven.
\end{proof}
+
+\begin{corollary}
+For any regex $a$, $\exists N_r. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_r$
+\end{corollary}
+\begin{proof}
+ By \ref{sizeRelations}.
+\end{proof}
\noindent
%-----------------------------------
@@ -1891,105 +1987,6 @@
%----------------------------------------------------------------------------------------
% SECTION 4
%----------------------------------------------------------------------------------------
-\section{A Bound for the Star Regular Expression}
-We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
-the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then
-the property of the $\distinct$ function.
-Now we try to get a bound on $r^* \backslash s$ as well.
-Again, we first look at how a star's derivatives evolve, if they grow maximally:
-\begin{center}
-
-$r^* \quad \longrightarrow_{\backslash c} \quad (r\backslash c) \cdot r^* \quad \longrightarrow_{\backslash c'} \quad
-r \backslash cc' \cdot r^* + r \backslash c' \cdot r^* \quad \longrightarrow_{\backslash c''} \quad
-(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) \quad \longrightarrow_{\backslash c'''}
-\quad \ldots$
-
-\end{center}
-When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
-$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
-the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
-of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not
-count the possible size explosions of $r \backslash c$ themselves.
-
-Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like
-$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
-into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
-and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
-For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$:
-%TODO: definitions of and \hflataux \hflat
- \begin{center}
- \begin{tabular}{ccc}
- $\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
-$\hflataux r$ & $=$ & $ [r]$
-\end{tabular}
-\end{center}
-
- \begin{center}
- \begin{tabular}{ccc}
- $\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\
-$\hflat r$ & $=$ & $ r$
-\end{tabular}
-\end{center}s
-Again these definitions are tailor-made for dealing with alternatives that have
-originated from a star's derivatives, so we don't attempt to open up all possible
-regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
-elements.
-We give a predicate for such "star-created" regular expressions:
-\begin{center}
-\begin{tabular}{lcr}
- & & $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
- $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
- \end{tabular}
- \end{center}
-
- These definitions allows us the flexibility to talk about
- regular expressions in their most convenient format,
- for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
- instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
- These definitions help express that certain classes of syntatically
- distinct regular expressions are actually the same under simplification.
- This is not entirely true for annotated regular expressions:
- %TODO: bsimp bders \neq bderssimp
- \begin{center}
- $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
- \end{center}
- For bit-codes, the order in which simplification is applied
- might cause a difference in the location they are placed.
- If we want something like
- \begin{center}
- $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
- \end{center}
- Some "canonicalization" procedure is required,
- which either pushes all the common bitcodes to nodes
- as senior as possible:
- \begin{center}
- $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
- \end{center}
- or does the reverse. However bitcodes are not of interest if we are talking about
- the $\llbracket r \rrbracket$ size of a regex.
- Therefore for the ease and simplicity of producing a
- proof for a size bound, we are happy to restrict ourselves to
- unannotated regular expressions, and obtain such equalities as
- \begin{lemma}
- $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
- \end{lemma}
-
- \begin{proof}
- By using the rewriting relation $\rightsquigarrow$
- \end{proof}
- %TODO: rsimp sflat
-And from this we obtain a proof that a star's derivative will be the same
-as if it had all its nested alternatives created during deriving being flattened out:
- For example,
- \begin{lemma}
- $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
- \end{lemma}
- \begin{proof}
- By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
- \end{proof}
-% The simplification of a flattened out regular expression, provided it comes
-%from the derivative of a star, is the same as the one nested.
-