lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

equal deleted inserted replaced

-:57e33978e55d
+:3cbcd7cda0a9
 We omit these functions, as they are routine. Please refer to the formalisation
 (in file BasicIdentities.thy) for the exact definition.
 With $\rrexp$ the size caclulation of annotated regular expressions'
 simplification and derivatives can be done by the size of their unlifted
 counterpart with the unlifted version of simplification and derivatives applied.
-\begin{lemma}
+\begin{lemma}\label{sizeRelations}
 	The following equalities hold:
 	\begin{itemize}
 		\item
 			$\asize{\bsimp{a}} = \rsize{\rsimp{\rerase{a}}}$
 		\item
 	\end{center}
 \end{corollary}
 \noindent
 \subsection{Closed Forms for Star Regular Expressions}
-We use a similar technique as $r_1 \cdot r_2$ case,
+We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
-generating
+the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then
+the property of the $\distinct$ function.
+Now we try to get a bound on $r^* \backslash s$ as well.
+Again, we first look at how a star's derivatives evolve, if they grow maximally:
+\begin{center}
+$r^* \quad \longrightarrow_{\backslash c}  \quad   (r\backslash c)  \cdot  r^* \quad \longrightarrow_{\backslash c'}  \quad
+r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*  \quad \longrightarrow_{\backslash c''} \quad
+(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)   \quad \longrightarrow_{\backslash c'''}
+\quad \ldots$
+\end{center}
+When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
+$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
+the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
+of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not
+count the possible size explosions of $r \backslash c$ themselves.
+Thanks to $\rflts$ and $\rDistinct$, we are able to open up regexes like
+$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') +
+(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
+into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'',
+r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
+and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
+This allows us to use a similar technique as $r_1 \cdot r_2$ case,
+where the crux is to get an equivalent form of
+$\rderssimp{r^*}{s}$ with shape $\rsimp{\sum rs}$.
+This requires generating
 all possible sub-strings $s'$ of $s$
 such that $r\backslash s' \cdot r^*$ will appear
 as a term in $(r^*) \backslash s$.
 The first function we define is a single-step
 updating function $\starupdate$, which takes three arguments as input:
 		$\hflat{r}$ & $\dn$ & $r$
 	\end{tabular}
 \end{center}
 \noindent
 %MAYBE TODO: introduce createdByStar
+Again these definitions are tailor-made for dealing with alternatives that have
+originated from a star's derivatives, so we do not attempt to open up all possible
+regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
+elements.
+We give a predicate for such "star-created" regular expressions:
+\begin{center}
+\begin{tabular}{lcr}
+&    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
+$\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
+\end{tabular}
+\end{center}
+These definitions allows us the flexibility to talk about
+regular expressions in their most convenient format,
+for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
+instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
+These definitions help express that certain classes of syntatically
+distinct regular expressions are actually the same under simplification.
+This is not entirely true for annotated regular expressions:
+%TODO: bsimp bders \neq bderssimp
+\begin{center}
+$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
+\end{center}
+For bit-codes, the order in which simplification is applied
+might cause a difference in the location they are placed.
+If we want something like
+\begin{center}
+$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
+\end{center}
+Some "canonicalization" procedure is required,
+which either pushes all the common bitcodes to nodes
+as senior as possible:
+\begin{center}
+$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
+\end{center}
+or does the reverse. However bitcodes are not of interest if we are talking about
+the $\llbracket r \rrbracket$ size of a regex.
+Therefore for the ease and simplicity of producing a
+proof for a size bound, we are happy to restrict ourselves to
+unannotated regular expressions, and obtain such equalities as
+\begin{lemma}
+$\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
+\end{lemma}
+\begin{proof}
+By using the rewriting relation $\rightsquigarrow$
+\end{proof}
+%TODO: rsimp sflat
+And from this we obtain a proof that a star's derivative will be the same
+as if it had all its nested alternatives created during deriving being flattened out:
+For example,
+\begin{lemma}
+$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
+\end{lemma}
+\begin{proof}
+By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
+\end{proof}
+% The simplification of a flattened out regular expression, provided it comes
+%from the derivative of a star, is the same as the one nested.
 We first introduce an inductive property
 for $\starupdate$ and $\hflataux{\_}$,
 it says if we do derivatives of $r^*$
 with a string that starts with $c$,
 then flatten it out,
 \end{center}
 We reason similarly for  $\STAR$.
 The inductive hypothesis is
 $\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N$.
-Let $\n_r = \llbracket r^* \rrbracket_r$.
+Let $n_r = \llbracket r^* \rrbracket_r$.
 When $s = c :: cs$ is not empty,
 \begin{center}
 \begin{tabular}{lcll}
 & & $ \llbracket   \rderssimp{r^* }{c::cs} \rrbracket_r $\\
 & $ = $ & $\llbracket \rsimp{(\sum (\map \; (\lambda s. (r \backslash_{rsimp} s) \cdot r^*) \; (\starupdates\;
 \end{tabular}
 \end{center}
 (4), (8), and (12) are all the inductive cases proven.
 \end{proof}
+\begin{corollary}
+For any regex $a$, $\exists N_r. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_r$
+\end{corollary}
+\begin{proof}
+	By \ref{sizeRelations}.
+\end{proof}
 \noindent
 %-----------------------------------
 %	SECTION 2
 %-----------------------------------
 \end{center}
 %----------------------------------------------------------------------------------------
 %	SECTION 4
 %----------------------------------------------------------------------------------------
-\section{A Bound for the Star Regular Expression}
-We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
-the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then
-the property of the $\distinct$ function.
-Now we try to get a bound on $r^* \backslash s$ as well.
-Again, we first look at how a star's derivatives evolve, if they grow maximally:
-\begin{center}
-$r^* \quad \longrightarrow_{\backslash c}  \quad   (r\backslash c)  \cdot  r^* \quad \longrightarrow_{\backslash c'}  \quad
-r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*  \quad \longrightarrow_{\backslash c''} \quad
-(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)   \quad \longrightarrow_{\backslash c'''}
-\quad \ldots$
-\end{center}
-When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
-$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
-the number of terms in $r^* \backslash s$ will grow exponentially, causing the size
-of the derivatives $r^* \backslash s$ to grow exponentially, even if we do not
-count the possible size explosions of $r \backslash c$ themselves.
-Thanks to $\flts$ and $\distinctWith$, we are able to open up regexes like
-$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
-into $\RALTS{[r_1 \backslash cc'c'' \cdot r^*, r \backslash c'', r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
-and then de-duplicate terms of the form $r\backslash s' \cdot r^*$ ($s'$ being a substring of $s$).
-For this we define $\hflataux{\_}$ and $\hflat{\_}$, similar to $\sflataux{\_}$ and $\sflat{\_}$:
-%TODO: definitions of  and \hflataux \hflat
-\begin{center}
-\begin{tabular}{ccc}
-$\hflataux{r_1 + r_2}$ & $=$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
-$\hflataux r$ & $=$ & $ [r]$
-\end{tabular}
-\end{center}
-\begin{center}
-\begin{tabular}{ccc}
-$\hflat{r_1 + r_2}$ & $=$ & $\RALTS{\hflataux{r_1} @ \hflataux{r_2}}$\\
-$\hflat r$ & $=$ & $ r$
-\end{tabular}
-\end{center}s
-Again these definitions are tailor-made for dealing with alternatives that have
-originated from a star's derivatives, so we don't attempt to open up all possible
-regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
-elements.
-We give a predicate for such "star-created" regular expressions:
-\begin{center}
-\begin{tabular}{lcr}
-&    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
-$\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
-\end{tabular}
-\end{center}
-These definitions allows us the flexibility to talk about
-regular expressions in their most convenient format,
-for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
-instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
-These definitions help express that certain classes of syntatically
-distinct regular expressions are actually the same under simplification.
-This is not entirely true for annotated regular expressions:
-%TODO: bsimp bders \neq bderssimp
-\begin{center}
-$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
-\end{center}
-For bit-codes, the order in which simplification is applied
-might cause a difference in the location they are placed.
-If we want something like
-\begin{center}
-$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
-\end{center}
-Some "canonicalization" procedure is required,
-which either pushes all the common bitcodes to nodes
-as senior as possible:
-\begin{center}
-$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
-\end{center}
-or does the reverse. However bitcodes are not of interest if we are talking about
-the $\llbracket r \rrbracket$ size of a regex.
-Therefore for the ease and simplicity of producing a
-proof for a size bound, we are happy to restrict ourselves to
-unannotated regular expressions, and obtain such equalities as
-\begin{lemma}
-$\rsimp{r_1 + r_2} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
-\end{lemma}
-\begin{proof}
-By using the rewriting relation $\rightsquigarrow$
-\end{proof}
-%TODO: rsimp sflat
-And from this we obtain a proof that a star's derivative will be the same
-as if it had all its nested alternatives created during deriving being flattened out:
-For example,
-\begin{lemma}
-$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
-\end{lemma}
-\begin{proof}
-By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
-\end{proof}
-% The simplification of a flattened out regular expression, provided it comes
-%from the derivative of a star, is the same as the one nested.

changeset 564	3cbcd7cda0a9
parent 562	57e33978e55d
child 576	3e1b699696b6