lexing: comparison ChengsongTanPhdThesis/Chapters/Chapter2.tex

equal deleted inserted replaced

-:5ce3bd8e5696
+:69ad05398894
 regexes of the form $\RALTS{rs}$, where $\textit{rs}$ might not contain precisely 2
 elements.
 We give a predicate for such "star-created" regular expressions:
 \begin{center}
 \begin{tabular}{lcr}
-&    &       $\createdByStar{\RSEQ{ra}{\RSTAR{rb}}}$\\
+&    &       $\createdByStar{(\RSEQ{ra}{\RSTAR{rb}}) }$\\
 $\createdByStar{r_1} \land \createdByStar{r_2} $ & $ \Longrightarrow$ & $\createdByStar{(r_1 + r_2)}$
 \end{tabular}
 \end{center}
+These definitions allows us the flexibility to talk about
+regular expressions in their most convenient format,
+for example, flattened out $\RALTS{[r_1, r_2, \ldots, r_n]} $
+instead of binary-nested: $((r_1 + r_2) + (r_3 + r_4)) + \ldots$.
+These definitions help express that certain classes of syntatically
+distinct regular expressions are actually the same under simplification.
+This is not entirely true for annotated regular expressions:
+%TODO: bsimp bders \neq bderssimp
+\begin{center}
+$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
+\end{center}
+For bit-codes, the order in which simplification is applied
+might cause a difference in the location they are placed.
+If we want something like
+\begin{center}
+$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
+\end{center}
+Some "canonicalization" procedure is required,
+which either pushes all the common bitcodes to nodes
+as senior as possible:
+\begin{center}
+$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
+\end{center}
+or does the reverse. However bitcodes are not of interest if we are talking about
+the $\llbracket r \rrbracket$ size of a regex.
+Therefore for the ease and simplicity of producing a
+proof for a size bound, we are happy to restrict ourselves to
+unannotated regular expressions, and obtain such equalities as
+\begin{lemma}
+$\rsimp{(r_1 + r_2)} = \rsimp{\RALTS{\hflataux{r_1} @ \hflataux{r_2}}}$
+\end{lemma}
+\begin{proof}
+By using the rewriting relation $\rightsquigarrow$
+\end{proof}
+%TODO: rsimp sflat
+And from this we obtain a proof that a star's derivative will be the same
+as if it had all its nested alternatives created during deriving being flattened out:
+For example,
+\begin{lemma}
+$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
+\end{lemma}
+\begin{proof}
+By structural induction on $r$, where the induction rules are these of $\createdByStar{_}$.
+\end{proof}
+% The simplification of a flattened out regular expression, provided it comes
+%from the derivative of a star, is the same as the one nested.
 One might wonder the actual bound rather than the loose bound we gave
 for the convenience of a easier proof.
 How much can the regex $r^* \backslash s$ grow? As hinted earlier,
 they can grow at a maximum speed
-of exponential $w.r.t$ the number of characters.
+exponential $w.r.t$ the number of characters,
-But they will eventually level off when the string $s$ is long enough,
+but will eventually level off when the string $s$ is long enough,
 as suggested by the finiteness bound proof.
+And unfortunately we have concrete examples
+where such regexes grew exponentially large before levelling off:
+$(a ^ * + (a + aa) ^ * + (a + aa + aaa) ^ * + \ldots +
+(a+ aa + aaa+\ldots \underbrace{a \ldots a}_{\text{n a's}})^*$ will already have a maximum
+size that is  exponential on the number $n$.
 %TODO: give out a graph showing how the size of the regex grows and levels off at a size exponential to the original regex
+While the tight upper bound will definitely be a lot lower than
+the one we gave for the finiteness proof,
+it will still be at least expoential, under our current simplification rules.
+This suggests stronger simplifications that keep the tight bound polynomial.
+%----------------------------------------------------------------------------------------
+%	SECTION 5
+%----------------------------------------------------------------------------------------
+\section{a stronger version of simplification}
+%TODO: search for isabelle proofs of algorithms that check equivalence

changeset 506	69ad05398894
parent 505	5ce3bd8e5696
child 507	213220f54a6e