diff -r 57e33978e55d -r 3cbcd7cda0a9 ChengsongTanPhdThesis/Chapters/Finite.tex --- a/ChengsongTanPhdThesis/Chapters/Finite.tex Tue Jul 05 00:42:06 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Jul 13 08:27:28 2022 +0100 @@ -209,7 +209,7 @@ With $\rrexp$ the size caclulation of annotated regular expressions' simplification and derivatives can be done by the size of their unlifted counterpart with the unlifted version of simplification and derivatives applied. -\begin{lemma} +\begin{lemma}\label{sizeRelations} The following equalities hold: \begin{itemize} \item @@ -1244,8 +1244,35 @@ \end{corollary} \noindent \subsection{Closed Forms for Star Regular Expressions} -We use a similar technique as $r_1 \cdot r_2$ case, -generating +We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using +the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then +the property of the $\distinct$ function. +Now we try to get a bound on $r^* \backslash s$ as well. +Again, we first look at how a star's derivatives evolve, if they grow maximally: +\begin{center} + +$r^* \quad \longrightarrow_{\backslash c} \quad (r\backslash c) \cdot r^* \quad \longrightarrow_{\backslash c'} \quad +r \backslash cc' \cdot r^* + r \backslash c' \cdot r^* \quad \longrightarrow_{\backslash c''} \quad +(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) \quad \longrightarrow_{\backslash c'''} +\quad \ldots$ + +\end{center} +When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, +$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable, +the number of terms in $r^* \backslash s$ will grow exponentially, causing the size +of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not +count the possible size explosions of $r \backslash c$ themselves. + +Thanks to $\rflts$ and $\rDistinct$, we are able to open up regexes like +$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + +(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ +into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', +r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$ +and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$). +This allows us to use a similar technique as $r_1 \cdot r_2$ case, +where the crux is to get an equivalent form of +$\rderssimp{r^*}{s}$ with shape $\rsimp{\sum rs}$. +This requires generating all possible sub-strings $s'$ of $s$ such that $r\backslash s' \cdot r^*$ will appear as a term in $(r^*) \backslash s$. @@ -1304,6 +1331,68 @@ \end{center} \noindent %MAYBE TODO: introduce createdByStar +Again these definitions are tailor-made for dealing with alternatives that have +originated from a star's derivatives, so we do not attempt to open up all possible +regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2 +elements. +We give a predicate for such "star-created" regular expressions: +\begin{center} +\begin{tabular}{lcr} + & & $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\ + $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$ + \end{tabular} + \end{center} + + These definitions allows us the flexibility to talk about + regular expressions in their most convenient format, + for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $ + instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$. + These definitions help express that certain classes of syntatically + distinct regular expressions are actually the same under simplification. + This is not entirely true for annotated regular expressions: + %TODO: bsimp bders \neq bderssimp + \begin{center} + $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ + \end{center} + For bit-codes, the order in which simplification is applied + might cause a difference in the location they are placed. + If we want something like + \begin{center} + $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$ + \end{center} + Some "canonicalization" procedure is required, + which either pushes all the common bitcodes to nodes + as senior as possible: + \begin{center} + $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $ + \end{center} + or does the reverse. However bitcodes are not of interest if we are talking about + the $\llbracket r \rrbracket$ size of a regex. + Therefore for the ease and simplicity of producing a + proof for a size bound, we are happy to restrict ourselves to + unannotated regular expressions, and obtain such equalities as + \begin{lemma} + $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$ + \end{lemma} + + \begin{proof} + By using the rewriting relation $\rightsquigarrow$ + \end{proof} + %TODO: rsimp sflat +And from this we obtain a proof that a star's derivative will be the same +as if it had all its nested alternatives created during deriving being flattened out: + For example, + \begin{lemma} + $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ + \end{lemma} + \begin{proof} + By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$. + \end{proof} +% The simplification of a flattened out regular expression, provided it comes +%from the derivative of a star, is the same as the one nested. + + + We first introduce an inductive property for $\starupdate$ and $\hflataux{\_}$, it says if we do derivatives of $r^*$ @@ -1516,7 +1605,7 @@ We reason similarly for $\STAR$. The inductive hypothesis is $\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N$. -Let $\n_r = \llbracket r^* \rrbracket_r$. +Let $n_r = \llbracket r^* \rrbracket_r$. When $s = c :: cs$ is not empty, \begin{center} \begin{tabular}{lcll} @@ -1572,6 +1661,13 @@ (4), (8), and (12) are all the inductive cases proven. \end{proof} + +\begin{corollary} +For any regex $a$, $\exists N_r. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_r$ +\end{corollary} +\begin{proof} + By \ref{sizeRelations}. +\end{proof} \noindent %----------------------------------- @@ -1891,105 +1987,6 @@ %---------------------------------------------------------------------------------------- % SECTION 4 %---------------------------------------------------------------------------------------- -\section{A Bound for the Star Regular Expression} -We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using -the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then -the property of the $\distinct$ function. -Now we try to get a bound on $r^* \backslash s$ as well. -Again, we first look at how a star's derivatives evolve, if they grow maximally: -\begin{center} - -$r^* \quad \longrightarrow_{\backslash c} \quad (r\backslash c) \cdot r^* \quad \longrightarrow_{\backslash c'} \quad -r \backslash cc' \cdot r^* + r \backslash c' \cdot r^* \quad \longrightarrow_{\backslash c''} \quad -(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) \quad \longrightarrow_{\backslash c'''} -\quad \ldots$ - -\end{center} -When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, -$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable, -the number of terms in $r^* \backslash s$ will grow exponentially, causing the size -of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not -count the possible size explosions of $r \backslash c$ themselves. - -Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like -$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ -into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$ -and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$). -For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$: -%TODO: definitions of and \hflataux \hflat - \begin{center} - \begin{tabular}{ccc} - $\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\ -$\hflataux r$ & $=$ & $ [r]$ -\end{tabular} -\end{center} - - \begin{center} - \begin{tabular}{ccc} - $\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\ -$\hflat r$ & $=$ & $ r$ -\end{tabular} -\end{center}s -Again these definitions are tailor-made for dealing with alternatives that have -originated from a star's derivatives, so we don't attempt to open up all possible -regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2 -elements. -We give a predicate for such "star-created" regular expressions: -\begin{center} -\begin{tabular}{lcr} - & & $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\ - $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$ - \end{tabular} - \end{center} - - These definitions allows us the flexibility to talk about - regular expressions in their most convenient format, - for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $ - instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$. - These definitions help express that certain classes of syntatically - distinct regular expressions are actually the same under simplification. - This is not entirely true for annotated regular expressions: - %TODO: bsimp bders \neq bderssimp - \begin{center} - $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ - \end{center} - For bit-codes, the order in which simplification is applied - might cause a difference in the location they are placed. - If we want something like - \begin{center} - $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$ - \end{center} - Some "canonicalization" procedure is required, - which either pushes all the common bitcodes to nodes - as senior as possible: - \begin{center} - $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $ - \end{center} - or does the reverse. However bitcodes are not of interest if we are talking about - the $\llbracket r \rrbracket$ size of a regex. - Therefore for the ease and simplicity of producing a - proof for a size bound, we are happy to restrict ourselves to - unannotated regular expressions, and obtain such equalities as - \begin{lemma} - $\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$ - \end{lemma} - - \begin{proof} - By using the rewriting relation $\rightsquigarrow$ - \end{proof} - %TODO: rsimp sflat -And from this we obtain a proof that a star's derivative will be the same -as if it had all its nested alternatives created during deriving being flattened out: - For example, - \begin{lemma} - $\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ - \end{lemma} - \begin{proof} - By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$. - \end{proof} -% The simplification of a flattened out regular expression, provided it comes -%from the derivative of a star, is the same as the one nested. -