--- a/ChengsongTanPhdThesis/Chapters/Finite.tex Tue Oct 11 13:09:47 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Oct 12 14:01:33 2022 +0100
@@ -7,6 +7,19 @@
%of our bitcoded algorithm, that is a finite bound on the size of any
%regex's derivatives.
+\begin{figure}
+\begin{center}
+ \begin{tabular}{ccc}
+ $\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
+ $\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
+ $\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
+ $\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
+ $\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as + 1$\\
+ $\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
+ \end{tabular}
+\end{center}
+\caption{The size function of bitcoded regular expressions}\label{brexpSize}
+\end{figure}
In this chapter we give a guarantee in terms of size:
given an annotated regular expression $a$, for any string $s$
our algorithm $\blexersimp$'s internal annotated regular expression
@@ -17,26 +30,19 @@
\end{center}
\noindent
where the size of an annotated regular expression is defined
-in terms of the number of nodes in its tree structure:
-\begin{center}
- \begin{tabular}{ccc}
- $\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
- $\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
- $\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
- $\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
- $\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as + 1$\\
- $\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
- \end{tabular}
-\end{center}
-We believe this size formalisation
-of the algorithm is important in our context, because
+in terms of the number of nodes in its tree structure (see figure \ref{brexpSize}).
+We believe this size bound
+is important in the context of POSIX lexing, because
\begin{itemize}
\item
It is a stepping stone towards an ``absence of catastrophic-backtracking''
- guarantee. The next step would be to refine the bound $N_a$ so that it
+ guarantee.
+ If the internal data structures used by our algorithm cannot grow very large,
+ then our algorithm (which traverses those structures) cannot be too slow.
+ The next step would be to refine the bound $N_a$ so that it
is polynomial on $\llbracket a\rrbracket$.
\item
- The size bound proof gives us a higher confidence that
+ Having it formalised gives us a higher confidence that
our simplification algorithm $\simp$ does not ``mis-behave''
like $\simpsulz$ does.
The bound is universal, which is an advantage over work which
@@ -44,9 +50,9 @@
\end{itemize}
\section{Formalising About Size}
\noindent
-In our lexer $\blexersimp$,
-The regular expression is repeatedly being taken derivative of
-and then simplified.
+In our lexer ($\blexersimp$),
+we take an annotated regular expression as input,
+and repeately take derivative of and simplify it:
\begin{figure}[H]
\begin{tikzpicture}[scale=2,
every node/.style={minimum size=11mm},
@@ -75,17 +81,17 @@
\end{figure}
\noindent
Each time
-a derivative is taken, a regular expression might grow a bit,
-but simplification always takes care that
+a derivative is taken, the regular expression might grow.
+However, the simplification that is immediately afterwards will always shrink it so that
it stays small.
This intuition is depicted by the relative size
change between the black and blue nodes:
After $\simp$ the node always shrinks.
Our proof says that all the blue nodes
-stay below a size bound $N_a$ determined by $a$.
+stay below a size bound $N_a$ determined by the input $a$.
\noindent
-Sulzmann and Lu's assumed something similar about their algorithm,
+Sulzmann and Lu's assumed a similar picture about their algorithm,
though in fact their algorithm's size might be better depicted by the following graph:
\begin{figure}[H]
\begin{tikzpicture}[scale=2,
@@ -118,19 +124,25 @@
\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
\end{figure}
\noindent
-That is, on certain cases their lexer
+The picture means that on certain cases their lexer (where they use $\simpsulz$
+as the simplification function)
will have an indefinite size explosion, causing the running time
-of each derivative step to grow arbitrarily large (for example
+of each derivative step to grow continuously (for example
in \ref{SulzmannLuLexerTime}).
-The reason they made this mistake was that
-they tested out the run time of their
+They tested out the run time of their
lexer on particular examples such as $(a+b+ab)^*$
-and generalised to all cases, which
-cannot happen with our mecahnised proof.\\
-We give details of the proof in the next sections.
+and claimed that their algorithm is linear w.r.t to the input.
+With our mecahnised proof, we avoid this type of unintentional
+generalisation.\\
+
+Before delving in to the details of the formalisation,
+we are going to provide an overview of it.
+In the next subsection, we draw a picture of the bird's eye view
+of the proof.
+
\subsection{Overview of the Proof}
-Here is a bird's eye view of how the proof of finiteness works,
+Here is a bird's eye view of the main components of the finiteness proof,
which involves three steps:
\begin{figure}[H]
\begin{tikzpicture}[scale=1,font=\bf,
@@ -187,7 +199,7 @@
\end{itemize}
We will expand on these steps in the next sections.\\
-\section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions}
+\section{The $\textit{Rrexp}$ Datatype}
The first step is to define
$\textit{rrexp}$s.
They are without bitcodes,
@@ -240,21 +252,8 @@
$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
\end{tabular}
\end{center}
-\noindent
-$\textit{Rerase}$ throws away the bitcodes
-on the annotated regular expressions,
-but keeps everything else intact.
-Therefore it does not change the size
-of an annotated regular expression:
-\begin{lemma}\label{rsizeAsize}
- $\rsize{\rerase a} = \asize a$
-\end{lemma}
-\begin{proof}
- By routine structural induction on $a$.
-\end{proof}
-\noindent
-\subsection{Motivation Behind a New Datatype}
+\subsection{Why a New Datatype?}
The reason we take all the trouble
defining a new datatype is that $\erase$ makes things harder.
We initially started by using
@@ -282,7 +281,7 @@
All these operations change the size and structure of
an annotated regular expressions, adding unnecessary
complexities to the size bound proof.\\
-For example, if we define the size of a basic regular expression
+For example, if we define the size of a basic plain regular expression
in the usual way,
\begin{center}
\begin{tabular}{ccc}
@@ -297,19 +296,38 @@
\noindent
Then the property
\begin{center}
- $\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$
+ $\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
\end{center}
does not hold.
+With $\textit{rerase}$, however,
+only the bitcodes are thrown away.
+Everything about the structure remains intact.
+Therefore it does not change the size
+of an annotated regular expression:
+\begin{lemma}\label{rsizeAsize}
+ $\rsize{\rerase a} = \asize a$
+\end{lemma}
+\begin{proof}
+ By routine structural induction on $a$.
+\end{proof}
+\noindent
One might be able to prove an inequality such as
$\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $
and then estimate $\llbracket a_\downarrow \rrbracket_p$,
but we found our approach more straightforward.\\
-\subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$}
-The operations on r-regular expressions are
-almost identical to those of the annotated regular expressions,
-except that no bitcodes are used. For example,
-the derivative operation becomes simpler:\\
+\subsection{Functions for R-regular Expressions}
+We shall define the r-regular expression version
+of $\blexer$ and $\blexersimp$ related functions.
+We use $r$ as the prefix or subscript to differentiate
+with the bitcoded version.
+For example,$\backslash_r$, $\rdistincts$, and $\rsimp$
+as opposed to $\backslash$, $\distinctBy$, and $\bsimp$.
+As promised, they are much simpler than their bitcoded counterparts.
+%The operations on r-regular expressions are
+%almost identical to those of the annotated regular expressions,
+%except that no bitcodes are used. For example,
+The derivative operation becomes simpler:\\
\begin{center}
\begin{tabular}{@{}lcl@{}}
$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
@@ -404,8 +422,14 @@
We do not define an r-regular expression version of $\blexersimp$,
as our proof does not involve its use
(and there is no bitcode to decode into a lexical value).
-Everything about the size of annotated regular expressions
-can be calculated via the size of r-regular expressions:
+Now we are ready to introduce how r-regular expressions allow
+us to prove the size bound on bitcoded regular expressions.
+
+\subsection{Using R-regular Expressions to Bound Bit-coded Regular Expressions}
+Everything about the size of annotated regular expressions after the application
+of function $\bsimp$ and $\backslash_{simps}$
+can be calculated via the size of r-regular expressions after the application
+of $\rsimp$ and $\backslash_{rsimps}$:
\begin{lemma}\label{sizeRelations}
The following equalities hold:
\begin{itemize}
@@ -432,6 +456,7 @@
implies
$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
\end{center}
+From now on we
Unless stated otherwise in the rest of this
chapter all regular expressions without
bitcodes are seen as r-regular expressions ($\rrexp$s).
@@ -470,14 +495,9 @@
%
%\begin{quote}\it
% Claim: For regular expressions $r_1 \cdot r_2$, we claim that
-% \begin{center}
-% $ \rderssimp{r_1 \cdot r_2}{s} =
-% \rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
-% \end{center}
%\end{quote}
%\noindent
%We explain in detail how we reached those claims.
-\subsection{The Idea Behind Closed Forms}
If we attempt to prove
\begin{center}
$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
@@ -503,6 +523,7 @@
inductive hypotheses cannot be directly used.
We have already seen that $(r_1 \cdot r_2)\backslash s$
and $(r^*)\backslash s$ can grow in a wild way.
+
The point is that they will be equivalent to a list of
terms $\sum rs$, where each term in $rs$ will
be made of $r_1 \backslash s' $, $r_2\backslash s'$,
@@ -510,48 +531,60 @@
The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
in the simplification which saves $rs$ from growing indefinitely.
-Based on this idea, we sketch a proof by first showing the equality (where
+Based on this idea, we develop a proof in two steps.
+First, we show the equality (where
$f$ and $g$ are functions that do not increase the size of the input)
\begin{center}
-$(r_1 \cdot r_2)\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
+$r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
\end{center}
-and then show the right-hand-side can be finitely bounded.
+where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
+For example, for $r_1 \cdot r_2$ we have the equality as
+ \begin{center}
+ $ \rderssimp{r_1 \cdot r_2}{s} =
+ \rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
+ \end{center}
We call the right-hand-side the
\emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
-We will flesh out the proof sketch in the next section.
+Second, we will bound the closed form of r-regular expressions
+using some estimation techniques
+and then piece it together
+with lemma \ref{sizeRelations} to show the bitcoded regular expressions
+in our $\blexersimp$ are finitely bounded.
-\section{Details of Closed Forms and Bounds}
+We will flesh out the first step of the proof we
+sketched just now in the next section.
+
+\section{Closed Forms}
In this section we introduce in detail
how the closed forms are obtained for regular expressions'
-derivatives and how they are bounded.
-We start by proving some basic identities and inequalities
+derivatives.
+We start by proving some basic identities
involving the simplification functions for r-regular expressions.
-After that we use these identities to establish the
-closed forms we need.
-Finally, we prove the functions such as $\flts$
-will keep the size non-increasing.
-Putting this together with a general bound
-on the finiteness of distinct regular expressions
-smaller than a certain size, we obtain a bound on
-the closed forms.
+After that we introduce the rewrite relations
+$\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
+$\rightsquigarrow_f$ and $\rightsquigarrow_g$.
+These relations involves similar techniques in chapter \ref{Bitcoded2}.
+Finally, we use these identities to establish the
+closed forms of the alternative regular expression,
+the sequence regular expression, and the star regular expression.
%$r_1\cdot r_2$, $r^*$ and $\sum rs$.
-\subsection{Some Basic Identities and Inequalities}
+\subsection{Some Basic Identities}
-In this subsection, we introduce lemmas
+We now introduce lemmas
that are repeatedly used in later proofs.
Note that for the $\textit{rdistinct}$ function there
will be a lot of conversion from lists to sets.
-We use the name $set$ to refere to the
+We use $set$ to refere to the
function that converts a list $rs$ to the set
containing all the elements in $rs$.
\subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
The $\textit{rdistinct}$ function, as its name suggests, will
-remove duplicates in an r-regular expression list.
-It will also correctly exclude any elements that
-is intially in the accumulator set.
+de-duplicate an r-regular expression list.
+It will also remove any elements that
+is already in the accumulator set.
\begin{lemma}\label{rdistinctDoesTheJob}
%The function $\textit{rdistinct}$ satisfies the following
%properties:
@@ -559,7 +592,7 @@
recursive definition here, its Isabelle counterpart would be $\textit{distinct}$.}
readily defined
for testing
- whether a list's elements are all unique. Then the following
+ whether a list's elements are unique. Then the following
properties about $\textit{rdistinct}$ hold:
\begin{itemize}
\item
@@ -581,10 +614,11 @@
\end{proof}
\noindent
-$\textit{rdistinct}$ will cancel out all regular expression terms
-that are in the accumulator, therefore prepending a list $rs_a$ with an arbitrary
-list $rs$ whose elements are all from the accumulator, and then call $\textit{rdistinct}$
-on the resulting list, the output will be as if we had called $\textit{rdistinct}$
+%$\textit{rdistinct}$ will out all regular expression terms
+%that are in the accumulator, therefore
+Concatenating a list $rs_a$ at the front of another
+list $rs$ whose elements are all from the accumulator, and then calling $\textit{rdistinct}$
+on the merged list, the output will be as if we had called $\textit{rdistinct}$
without the prepending of $rs$:
\begin{lemma}\label{rdistinctConcat}
The elements appearing in the accumulator will always be removed.
@@ -672,6 +706,8 @@
By \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
\end{proof}
\noindent
+The next lemma is a more general form of \ref{rdistinctConcat},
+it says that
$\textit{rdistinct}$ is composable w.r.t list concatenation:
\begin{lemma}\label{distinctRdistinctAppend}
If $\;\; \textit{isDistinct} \; rs_1$,
@@ -749,388 +785,16 @@
$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
\end{lemma}
\noindent
-\subsubsection{$\textit{rsimp}$ Does Not Increment the Size}
-Although it seems evident, we need a series
-of non-trivial lemmas to establish this property.
-\begin{lemma}\label{rsimpMonoLemmas}
- \mbox{}
- \begin{itemize}
- \item
- \[
- \llbracket \rsimpalts \; rs \rrbracket_r \leq
- \llbracket \sum \; rs \rrbracket_r
- \]
- \item
- \[
- \llbracket \rsimpseq \; r_1 \; r_2 \rrbracket_r \leq
- \llbracket r_1 \cdot r_2 \rrbracket_r
- \]
- \item
- \[
- \llbracket \rflts \; rs \rrbracket_r \leq
- \llbracket rs \rrbracket_r
- \]
- \item
- \[
- \llbracket \rDistinct \; rs \; ss \rrbracket_r \leq
- \llbracket rs \rrbracket_r
- \]
- \item
- If all elements $a$ in the set $as$ satisfy the property
- that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
- \llbracket a \rrbracket_r$, then we have
- \[
- \llbracket \; \rsimpalts \; (\textit{rdistinct} \;
- (\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
- \rrbracket \leq
- \llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
- \textit{rsimp} \; x))\; \{ \} ) \rrbracket_r
- \]
- \end{itemize}
-\end{lemma}
-\begin{proof}
- Point 1, 3, 4 can be proven by an induction on $rs$.
- Point 2 is by case analysis on $r_1$ and $r_2$.
- The last part is a corollary of the previous ones.
-\end{proof}
-\noindent
-With the lemmas for each inductive case in place, we are ready to get
-the non-increasing property as a corollary:
-\begin{corollary}\label{rsimpMono}
- $\llbracket \textit{rsimp} \; r \rrbracket_r$
-\end{corollary}
-\begin{proof}
- By \ref{rsimpMonoLemmas}.
-\end{proof}
-
-\subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
-Much like the definition of $L$ on plain regular expressions, one could also
-define the language interpretation of $\rrexp$s.
-\begin{center}
- \begin{tabular}{lcl}
- $RL \; (\ZERO)$ & $\dn$ & $\phi$\\
- $RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
- $RL \; (c)$ & $\dn$ & $\{[c]\}$\\
- $RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
- $RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
- $RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
- \end{tabular}
-\end{center}
-\noindent
-The main use of $RL$ is to establish some connections between $\rsimp{}$
-and $\rnullable{}$:
-\begin{lemma}
- The following properties hold:
- \begin{itemize}
- \item
- If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
- \item
- $\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
- \end{itemize}
-\end{lemma}
-\begin{proof}
- The first part is by induction on $r$.
- The second part is true because property
- \[ RL \; r = RL \; (\rsimp{r})\] holds.
-\end{proof}
-
-\subsubsection{$\rsimp{}$ is Non-Increasing}
-In this subsection, we prove that the function $\rsimp{}$ does not
-make the $\llbracket \rrbracket_r$ size increase.
-
-
-\begin{lemma}\label{rsimpSize}
- $\llbracket \rsimp{r} \rrbracket_r \leq \llbracket r \rrbracket_r$
-\end{lemma}
-\subsubsection{Simplified $\textit{Rrexp}$s are Good}
-We formalise the notion of ``good" regular expressions,
-which means regular expressions that
-are not fully simplified. For alternative regular expressions that means they
-do not contain any nested alternatives like
-\[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
-or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
-\begin{center}
- \begin{tabular}{@{}lcl@{}}
- $\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
- $\good\; \RONE$ & $\dn$ & $\textit{true}$\\
- $\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
- $\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
- $\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
- $\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ &
- $\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
- & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \, \textit{and}\; \, \textit{nonAlt}\; r')$\\
- $\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
- $\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
- $\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
- $\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
- $\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
- \end{tabular}
-\end{center}
-\noindent
-The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
-alternative, and false otherwise.
-The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
-its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$,
-and unique:
-\begin{lemma}\label{rsimpaltsGood}
- If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
- then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
-\end{lemma}
-\noindent
-We also note that
-if a regular expression $r$ is good, then $\rflts$ on the singleton
-list $[r]$ will not break goodness:
-\begin{lemma}\label{flts2}
- If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
-\end{lemma}
-\begin{proof}
- By an induction on $r$.
-\end{proof}
-\noindent
-The other observation we make about $\rsimp{r}$ is that it never
-comes with nested alternatives, which we describe as the $\nonnested$
-property:
-\begin{center}
- \begin{tabular}{lcl}
- $\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
- $\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
- $\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
- $\nonnested \; \, r $ & $\dn$ & $\btrue$
- \end{tabular}
-\end{center}
-\noindent
-The $\rflts$ function
-always opens up nested alternatives,
-which enables $\rsimp$ to be non-nested:
+We need more equalities like the above to enable a closed form,
+for which we need to introduce a few rewrite relations
+to help
+us obtain them.
-\begin{lemma}\label{nonnestedRsimp}
- $\nonnested \; (\rsimp{r})$
-\end{lemma}
-\begin{proof}
- By an induction on $r$.
-\end{proof}
-\noindent
-With this we could prove that a regular expressions
-after simplification and flattening and de-duplication,
-will not contain any alternative regular expression directly:
-\begin{lemma}\label{nonaltFltsRd}
- If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
- then $\textit{nonAlt} \; x$.
-\end{lemma}
-\begin{proof}
- By \ref{nonnestedRsimp}.
-\end{proof}
-\noindent
-The other thing we know is that once $\rsimp{}$ had finished
-processing an alternative regular expression, it will not
-contain any $\RZERO$s, this is because all the recursive
-calls to the simplification on the children regular expressions
-make the children good, and $\rflts$ will not take out
-any $\RZERO$s out of a good regular expression list,
-and $\rdistinct{}$ will not mess with the result.
-\begin{lemma}\label{flts3Obv}
- The following are true:
- \begin{itemize}
- \item
- If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
- then for all $r \in \rflts\; rs. \, \good \; r$.
- \item
- If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
- and for all $y$ such that $\llbracket y \rrbracket_r$ less than
- $\llbracket rs \rrbracket_r + 1$, either
- $\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
- then $\good \; x$.
- \end{itemize}
-\end{lemma}
-\begin{proof}
- The first part is by induction on $rs$, where the induction
- rule is the inductive cases for $\rflts$.
- The second part is a corollary from the first part.
-\end{proof}
-
-And this leads to good structural property of $\rsimp{}$,
-that after simplification, a regular expression is
-either good or $\RZERO$:
-\begin{lemma}\label{good1}
- For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
-\end{lemma}
-\begin{proof}
- By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
- Lemma \ref{rsimpSize} says that
- $\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
- $\llbracket r \rrbracket_r$.
- Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
- Inductive hypothesis applies to the children regular expressions
- $r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
- by that as well.
- The lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used
- to ensure that goodness is preserved at the topmost level.
-\end{proof}
-We shall prove that any good regular expression is
-a fixed-point for $\rsimp{}$.
-First we prove an auxiliary lemma:
-\begin{lemma}\label{goodaltsNonalt}
- If $\good \; \sum rs$, then $\rflts\; rs = rs$.
-\end{lemma}
-\begin{proof}
- By an induction on $\sum rs$. The inductive rules are the cases
- for $\good$.
-\end{proof}
-\noindent
-Now we are ready to prove that good regular expressions are invariant
-of $\rsimp{}$ application:
-\begin{lemma}\label{test}
- If $\good \;r$ then $\rsimp{r} = r$.
-\end{lemma}
-\begin{proof}
- By an induction on the inductive cases of $\good$, using lemmas
- \ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
- The lemma \ref{goodaltsNonalt} is used in the alternative
- case where 2 or more elements are present in the list.
-\end{proof}
-\noindent
-Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
-which requires $\ref{good1}$ to go through smoothly.
-It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
-if it its output is concatenated with a list and then applied to $\rflts$.
-\begin{lemma}\label{flattenRsimpalts}
- $\rflts \; ( (\rsimp_{ALTS} \;
- (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) ::
- \map \; \rsimp{} \; rs' ) =
- \rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
- \map \; \rsimp{rs'}))$
-
-
-\end{lemma}
-\begin{proof}
- By \ref{good1}.
-\end{proof}
-\noindent
-
-
-
-
-
-We are also ready to prove that $\textit{rsimp}$ is idempotent.
-\subsubsection{$\rsimp$ is Idempotent}
-The idempotency of $\rsimp$ is very useful in
-manipulating regular expression terms into desired
-forms so that key steps allowing further rewriting to closed forms
-are possible.
-\begin{lemma}\label{rsimpIdem}
- $\rsimp{r} = \rsimp{\rsimp{r}}$
-\end{lemma}
-
-\begin{proof}
- By \ref{test} and \ref{good1}.
-\end{proof}
-\noindent
-This property means we do not have to repeatedly
-apply simplification in each step, which justifies
-our definition of $\blexersimp$.
-
-
-On the other hand, we could repeat the same $\rsimp{}$ applications
-on regular expressions as many times as we want, if we have at least
-one simplification applied to it, and apply it wherever we would like to:
-\begin{corollary}\label{headOneMoreSimp}
- The following properties hold, directly from \ref{rsimpIdem}:
-
- \begin{itemize}
- \item
- $\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
- \item
- $\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
- \end{itemize}
-\end{corollary}
-\noindent
-This will be useful in later closed form proof's rewriting steps.
-Similarly, we point out the following useful facts below:
-\begin{lemma}
- The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
- \begin{itemize}
- \item
- If $r = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
- \item
- If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
- \item
- $\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
- \end{itemize}
-\end{lemma}
-\begin{proof}
- By application of lemmas \ref{rsimpIdem} and \ref{good1}.
-\end{proof}
-
-\noindent
-With the idempotency of $\rsimp{}$ and its corollaries,
-we can start proving some key equalities leading to the
-closed forms.
-Now presented are a few equivalent terms under $\rsimp{}$.
-We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
-\begin{lemma}
- \begin{itemize}
- The following equivalence hold:
- \item
- $\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
- \item
- $\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
- \item
- $\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
- \item
- $\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
- \item
- $\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
-\end{itemize}
-\end{lemma}
-\begin{proof}
- By induction on the lists involved.
-\end{proof}
-\noindent
-The above allows us to prove
-two similar equalities (which are a bit more involved).
-It says that we could flatten out the elements
-before simplification and still get the same result.
-\begin{lemma}\label{simpFlatten3}
- One can flatten the inside $\sum$ of a $\sum$ if it is being
- simplified. Concretely,
- \begin{itemize}
- \item
- If for all $r \in rs, rs', rs''$, we have $\good \; r $
- or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal
- \sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
- \item
- $\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
- \end{itemize}
-\end{lemma}
-\begin{proof}
- By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
- The second sub-lemma is a corollary of the previous.
-\end{proof}
-%Rewriting steps not put in--too long and complicated-------------------------------
-\begin{comment}
- \begin{center}
- $\rsimp{\sum (rs' @ rs @ rs'')} \stackrel{def of bsimp}{=}$ \\
- $\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
- $\stackrel{by \ref{test}}{=}
- \rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
- \varnothing})$\\
- $\stackrel{by \ref{rdistinctConcatGeneral}}{=}
- \rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
- \rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
-
- \end{center}
-\end{comment}
-%Rewriting steps not put in--too long and complicated-------------------------------
-\noindent
-We need more equalities like the above to enable a closed form,
-but to proceed we need to introduce two rewrite relations,
-to make things smoother.
\subsection{The rewrite relation $\hrewrite$ , $\scfrewrites$ , $\frewrite$ and $\grewrite$}
-Insired by the success we had in the correctness proof
-in \ref{Bitcoded2}, where we invented
-a term rewriting system to capture the similarity between terms,
-we follow suit here defining simplification
-steps as rewriting steps. This allows capturing
+Inspired by the success we had in the correctness proof
+in \ref{Bitcoded2},
+we follow suit here, defining atomic simplification
+steps as ``small-step'' rewriting steps. This allows capturing
similarities between terms that would be otherwise
hard to express.
@@ -1138,10 +802,12 @@
regular expression simplification,
$\frewrite$ for rewrite of list of regular expressions that
include all operations carried out in $\rflts$, and $\grewrite$ for
-rewriting a list of regular expressions possible in both $\rflts$ and $\rdistinct{}{}$.
+rewriting a list of regular expressions possible in both $\rflts$ and $\textit{rdistinct}$.
Their reflexive transitive closures are used to denote zero or many steps,
as was the case in the previous chapter.
-The presentation will be more concise than that in \ref{Bitcoded2}.
+As we have already
+done something similar, the presentation about
+these rewriting rules will be more concise than that in \ref{Bitcoded2}.
To differentiate between the rewriting steps for annotated regular expressions
and $\rrexp$s, we add characters $h$ and $g$ below the squig arrow symbol
to mean atomic simplification transitions
@@ -1149,9 +815,8 @@
-List of one-step rewrite rules for $\rrexp$ ($\hrewrite$):
-
+\begin{figure}[H]
\begin{center}
\begin{mathpar}
\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}
@@ -1178,22 +843,30 @@
\end{mathpar}
\end{center}
+\caption{List of one-step rewrite rules for r-regular expressions ($\hrewrite$)}\label{hRewrite}
+\end{figure}
-List of rewrite rules for a list of regular expressions,
+Like $\rightsquigarrow_s$, it is
+convenient to define rewrite rules for a list of regular expressions,
where each element can rewrite in many steps to the other (scf stands for
li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the
$\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.
+\begin{figure}[H]
\begin{center}
\begin{mathpar}
\inferrule{}{[] \scfrewrites [] }
+
\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
\end{mathpar}
\end{center}
+\caption{List of one-step rewrite rules for a list of r-regular expressions}\label{scfRewrite}
+\end{figure}
%frewrite
List of one-step rewrite rules for flattening
a list of regular expressions($\frewrite$):
+\begin{figure}[H]
\begin{center}
\begin{mathpar}
\inferrule{}{\RZERO :: rs \frewrite rs \\}
@@ -1203,9 +876,12 @@
\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
\end{mathpar}
\end{center}
+\caption{List of one-step rewrite rules characterising the $\rflts$ operation on a list}\label{fRewrites}
+\end{figure}
Lists of one-step rewrite rules for flattening and de-duplicating
a list of regular expressions ($\grewrite$):
+\begin{figure}[H]
\begin{center}
\begin{mathpar}
\inferrule{}{\RZERO :: rs \grewrite rs \\}
@@ -1217,23 +893,27 @@
\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
\end{mathpar}
\end{center}
-
+\caption{List of one-step rewrite rules characterising the $\rflts$ and $\textit{rdistinct}$
+operations}\label{gRewrite}
+\end{figure}
\noindent
We defined
-two separate list rewriting definitions $\frewrite$ and $\grewrite$.
+two separate list rewriting relations $\frewrite$ and $\grewrite$.
The rewriting steps that take place during
flattening is characterised by $\frewrite$.
$\grewrite$ characterises both flattening and de-duplicating.
Sometimes $\grewrites$ is slightly too powerful
-so we would rather use $\frewrites$ because we only
-need to prove about certain equivalence under the rewriting steps of $\frewrites$.
+so we would rather use $\frewrites$ to prove
+%because we only
+equalities related to $\rflts$.
+%certain equivalence under the rewriting steps of $\frewrites$.
For example, when proving the closed-form for the alternative regular expression,
-one of the rewriting steps would be:
-\begin{lemma}
+one of the equalities needed is:
+\begin{center}
$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
\sum (\rDistinct \;\; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
$
-\end{lemma}
+\end{center}
\noindent
Proving this is by first showing
\begin{lemma}\label{earlyLaterDerFrewrites}
@@ -1241,13 +921,20 @@
\rflts \; (\map \; (\_ \backslash x) \; rs)$
\end{lemma}
\noindent
-and then using lemma
+and then the equivalence between two terms
+that can reduce in many steps to each other.
\begin{lemma}\label{frewritesSimpeq}
If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal
\sum (\rDistinct \; rs_2 \; \varnothing)$.
\end{lemma}
\noindent
-is a piece of cake.
+
+\begin{corollary}
+ $\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
+ \sum (\rDistinct \;\; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
+ $
+\end{corollary}
+
But this trick will not work for $\grewrites$.
For example, a rewriting step in proving
closed forms is:
@@ -1268,14 +955,21 @@
For this rewriting step we will introduce some slightly more cumbersome
proof technique in later sections.
The point is that $\frewrite$
-allows us to prove equivalence in a straightforward two-step method that is
-not possible for $\grewrite$, thereby reducing the complexity of the entire proof.
+allows us to prove equivalence in a straightforward way that is
+not possible for $\grewrite$.
\subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
+In this part, we present lemmas stating
+pairs of r-regular expressions or r-regular expression lists
+where one could rewrite from one in many steps to the other.
+Most of the proofs to these lemmas are straightforward, using
+an induction on the inductive cases of the corresponding rewriting relations.
+These proofs will therefore be omitted when this is the case.
We present in the below lemma a few pairs of terms that are rewritable via
$\grewrites$:
\begin{lemma}\label{gstarRdistinctGeneral}
+ \mbox{}
\begin{itemize}
\item
$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
@@ -1295,7 +989,7 @@
then they are equivalent under $\rsimp{}$:
\begin{lemma}\label{grewritesSimpalts}
If $rs_1 \grewrites rs_2$, then
- we have the following equivalence hold:
+ we have the following equivalence:
\begin{itemize}
\item
$\sum rs_1 \sequal \sum rs_2$
@@ -1360,10 +1054,13 @@
\end{lemma}
\begin{proof}
Part 1 is by lemma \ref{insideSimpRemoval},
- part 2 is by lemmas \ref{insideSimpRemoval} and \ref{distinctDer}.
+ part 2 is by lemma \ref{insideSimpRemoval} .%and \ref{distinctDer}.
\end{proof}
\noindent
-This leads to the first closed form--
+
+\subsection{Closed Forms for $\sum rs$, $r_1\cdot r_2$ and $r^*$}
+\subsubsection{Closed Form for Alternative Regular Expression}
+Lemma \ref{Simpders} leads to the first closed form--
\begin{lemma}\label{altsClosedForm}
\begin{center}
$\rderssimp{(\sum rs)}{s} \sequal
@@ -1398,18 +1095,11 @@
\end{corollary}
\noindent
The harder closed forms are the sequence and star ones.
-Before we go on to obtain them, some preliminary definitions
+Before we obtain them, some preliminary definitions
are needed to make proof statements concise.
-
-
-\subsection{Closed Forms}
\subsubsection{Closed Form for Sequence Regular Expressions}
-The problem of obataining a closed-form for sequence regular expression
-is constructing $(r_1 \cdot r_2) \backslash_r s$
-if we are only allowed to use a combination of $r_1 \backslash s''$
-and $r_2 \backslash s''$ , where $s''$ is from $s$.
First let's look at a series of derivatives steps on a sequence
regular expression, assuming that each time the first
component of the sequence is always nullable):
@@ -1815,6 +1505,441 @@
+%----------------------------------------------------------------------------------------
+% SECTION ??
+%----------------------------------------------------------------------------------------
+
+%-----------------------------------
+% SECTION syntactic equivalence under simp
+%-----------------------------------
+
+
+%----------------------------------------------------------------------------------------
+% SECTION ALTS CLOSED FORM
+%----------------------------------------------------------------------------------------
+%\section{A Closed Form for \textit{ALTS}}
+%Now we prove that $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
+%
+%
+%There are a few key steps, one of these steps is
+%
+%
+%
+%One might want to prove this by something a simple statement like:
+%
+%For this to hold we want the $\textit{distinct}$ function to pick up
+%the elements before and after derivatives correctly:
+%$r \in rset \equiv (rder x r) \in (rder x rset)$.
+%which essentially requires that the function $\backslash$ is an injective mapping.
+%
+%Unfortunately the function $\backslash c$ is not an injective mapping.
+%
+%\subsection{function $\backslash c$ is not injective (1-to-1)}
+%\begin{center}
+% The derivative $w.r.t$ character $c$ is not one-to-one.
+% Formally,
+% $\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
+%\end{center}
+%This property is trivially true for the
+%character regex example:
+%\begin{center}
+% $r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
+%\end{center}
+%But apart from the cases where the derivative
+%output is $\ZERO$, are there non-trivial results
+%of derivatives which contain strings?
+%The answer is yes.
+%For example,
+%\begin{center}
+% Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
+% where $a$ is not nullable.\\
+% $r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
+% $r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
+%\end{center}
+%We start with two syntactically different regular expressions,
+%and end up with the same derivative result.
+%This is not surprising as we have such
+%equality as below in the style of Arden's lemma:\\
+%\begin{center}
+% $L(A^*B) = L(A\cdot A^* \cdot B + B)$
+%\end{center}
+\section{Bounding Closed Forms}
+
+In this section, we introduce how we formalised the bound
+on closed forms.
+We first prove that functions such as $\rflts$
+will not cause the size of r-regular expressions to grow.
+Putting this together with a general bound
+on the finiteness of distinct regular expressions
+smaller than a certain size, we obtain a bound on
+the closed forms.
+
+\subsection{$\textit{rsimp}$ Does Not Increment the Size}
+Although it seems evident, we need a series
+of non-trivial lemmas to establish that functions such as $\rflts$
+do not cause the regular expressions to grow.
+\begin{lemma}\label{rsimpMonoLemmas}
+ \mbox{}
+ \begin{itemize}
+ \item
+ \[
+ \llbracket \rsimpalts \; rs \rrbracket_r \leq
+ \llbracket \sum \; rs \rrbracket_r
+ \]
+ \item
+ \[
+ \llbracket \rsimpseq \; r_1 \; r_2 \rrbracket_r \leq
+ \llbracket r_1 \cdot r_2 \rrbracket_r
+ \]
+ \item
+ \[
+ \llbracket \rflts \; rs \rrbracket_r \leq
+ \llbracket rs \rrbracket_r
+ \]
+ \item
+ \[
+ \llbracket \rDistinct \; rs \; ss \rrbracket_r \leq
+ \llbracket rs \rrbracket_r
+ \]
+ \item
+ If all elements $a$ in the set $as$ satisfy the property
+ that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
+ \llbracket a \rrbracket_r$, then we have
+ \[
+ \llbracket \; \rsimpalts \; (\textit{rdistinct} \;
+ (\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
+ \rrbracket \leq
+ \llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
+ \textit{rsimp} \; x))\; \{ \} ) \rrbracket_r
+ \]
+ \end{itemize}
+\end{lemma}
+\begin{proof}
+ Point 1, 3, 4 can be proven by an induction on $rs$.
+ Point 2 is by case analysis on $r_1$ and $r_2$.
+ The last part is a corollary of the previous ones.
+\end{proof}
+\noindent
+With the lemmas for each inductive case in place, we are ready to get
+the non-increasing property as a corollary:
+\begin{corollary}\label{rsimpMono}
+ $\llbracket \textit{rsimp} \; r \rrbracket_r \leq \llbracket r \rrbracket_r$
+\end{corollary}
+\begin{proof}
+ By \ref{rsimpMonoLemmas}.
+\end{proof}
+
+\subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
+Much like the definition of $L$ on plain regular expressions, one could also
+define the language interpretation of $\rrexp$s.
+\begin{center}
+ \begin{tabular}{lcl}
+ $RL \; (\ZERO)$ & $\dn$ & $\phi$\\
+ $RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
+ $RL \; (c)$ & $\dn$ & $\{[c]\}$\\
+ $RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
+ $RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
+ $RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
+ \end{tabular}
+\end{center}
+\noindent
+The main use of $RL$ is to establish some connections between $\rsimp{}$
+and $\rnullable{}$:
+\begin{lemma}
+ The following properties hold:
+ \begin{itemize}
+ \item
+ If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
+ \item
+ $\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
+ \end{itemize}
+\end{lemma}
+\begin{proof}
+ The first part is by induction on $r$.
+ The second part is true because property
+ \[ RL \; r = RL \; (\rsimp{r})\] holds.
+\end{proof}
+
+\subsubsection{Simplified $\textit{Rrexp}$s are Good}
+We formalise the notion of ``good" regular expressions,
+which means regular expressions that
+are not fully simplified. For alternative regular expressions that means they
+do not contain any nested alternatives like
+\[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
+or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
+\begin{center}
+ \begin{tabular}{@{}lcl@{}}
+ $\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
+ $\good\; \RONE$ & $\dn$ & $\textit{true}$\\
+ $\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
+ $\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
+ $\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
+ $\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ &
+ $\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
+ & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \, \textit{and}\; \, \textit{nonAlt}\; r')$\\
+ $\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
+ $\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
+ $\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
+ $\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
+ $\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
+ \end{tabular}
+\end{center}
+\noindent
+The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
+alternative, and false otherwise.
+The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
+its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$,
+and unique:
+\begin{lemma}\label{rsimpaltsGood}
+ If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
+ then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
+\end{lemma}
+\noindent
+We also note that
+if a regular expression $r$ is good, then $\rflts$ on the singleton
+list $[r]$ will not break goodness:
+\begin{lemma}\label{flts2}
+ If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
+\end{lemma}
+\begin{proof}
+ By an induction on $r$.
+\end{proof}
+\noindent
+The other observation we make about $\rsimp{r}$ is that it never
+comes with nested alternatives, which we describe as the $\nonnested$
+property:
+\begin{center}
+ \begin{tabular}{lcl}
+ $\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
+ $\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
+ $\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
+ $\nonnested \; \, r $ & $\dn$ & $\btrue$
+ \end{tabular}
+\end{center}
+\noindent
+The $\rflts$ function
+always opens up nested alternatives,
+which enables $\rsimp$ to be non-nested:
+
+\begin{lemma}\label{nonnestedRsimp}
+ $\nonnested \; (\rsimp{r})$
+\end{lemma}
+\begin{proof}
+ By an induction on $r$.
+\end{proof}
+\noindent
+With this we could prove that a regular expressions
+after simplification and flattening and de-duplication,
+will not contain any alternative regular expression directly:
+\begin{lemma}\label{nonaltFltsRd}
+ If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
+ then $\textit{nonAlt} \; x$.
+\end{lemma}
+\begin{proof}
+ By \ref{nonnestedRsimp}.
+\end{proof}
+\noindent
+The other thing we know is that once $\rsimp{}$ had finished
+processing an alternative regular expression, it will not
+contain any $\RZERO$s, this is because all the recursive
+calls to the simplification on the children regular expressions
+make the children good, and $\rflts$ will not take out
+any $\RZERO$s out of a good regular expression list,
+and $\rdistinct{}$ will not mess with the result.
+\begin{lemma}\label{flts3Obv}
+ The following are true:
+ \begin{itemize}
+ \item
+ If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
+ then for all $r \in \rflts\; rs. \, \good \; r$.
+ \item
+ If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
+ and for all $y$ such that $\llbracket y \rrbracket_r$ less than
+ $\llbracket rs \rrbracket_r + 1$, either
+ $\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
+ then $\good \; x$.
+ \end{itemize}
+\end{lemma}
+\begin{proof}
+ The first part is by induction on $rs$, where the induction
+ rule is the inductive cases for $\rflts$.
+ The second part is a corollary from the first part.
+\end{proof}
+
+And this leads to good structural property of $\rsimp{}$,
+that after simplification, a regular expression is
+either good or $\RZERO$:
+\begin{lemma}\label{good1}
+ For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
+\end{lemma}
+\begin{proof}
+ By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
+ Lemma \ref{rsimpSize} says that
+ $\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
+ $\llbracket r \rrbracket_r$.
+ Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
+ Inductive hypothesis applies to the children regular expressions
+ $r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
+ by that as well.
+ The lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used
+ to ensure that goodness is preserved at the topmost level.
+\end{proof}
+We shall prove that any good regular expression is
+a fixed-point for $\rsimp{}$.
+First we prove an auxiliary lemma:
+\begin{lemma}\label{goodaltsNonalt}
+ If $\good \; \sum rs$, then $\rflts\; rs = rs$.
+\end{lemma}
+\begin{proof}
+ By an induction on $\sum rs$. The inductive rules are the cases
+ for $\good$.
+\end{proof}
+\noindent
+Now we are ready to prove that good regular expressions are invariant
+of $\rsimp{}$ application:
+\begin{lemma}\label{test}
+ If $\good \;r$ then $\rsimp{r} = r$.
+\end{lemma}
+\begin{proof}
+ By an induction on the inductive cases of $\good$, using lemmas
+ \ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
+ The lemma \ref{goodaltsNonalt} is used in the alternative
+ case where 2 or more elements are present in the list.
+\end{proof}
+\noindent
+Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
+which requires $\ref{good1}$ to go through smoothly.
+It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
+if it its output is concatenated with a list and then applied to $\rflts$.
+\begin{lemma}\label{flattenRsimpalts}
+ $\rflts \; ( (\rsimp_{ALTS} \;
+ (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) ::
+ \map \; \rsimp{} \; rs' ) =
+ \rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
+ \map \; \rsimp{rs'}))$
+
+
+\end{lemma}
+\begin{proof}
+ By \ref{good1}.
+\end{proof}
+\noindent
+
+
+
+
+
+We are also ready to prove that $\textit{rsimp}$ is idempotent.
+\subsubsection{$\rsimp$ is Idempotent}
+The idempotency of $\rsimp$ is very useful in
+manipulating regular expression terms into desired
+forms so that key steps allowing further rewriting to closed forms
+are possible.
+\begin{lemma}\label{rsimpIdem}
+ $\rsimp{r} = \rsimp{\rsimp{r}}$
+\end{lemma}
+
+\begin{proof}
+ By \ref{test} and \ref{good1}.
+\end{proof}
+\noindent
+This property means we do not have to repeatedly
+apply simplification in each step, which justifies
+our definition of $\blexersimp$.
+
+
+On the other hand, we could repeat the same $\rsimp{}$ applications
+on regular expressions as many times as we want, if we have at least
+one simplification applied to it, and apply it wherever we would like to:
+\begin{corollary}\label{headOneMoreSimp}
+ The following properties hold, directly from \ref{rsimpIdem}:
+
+ \begin{itemize}
+ \item
+ $\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
+ \item
+ $\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
+ \end{itemize}
+\end{corollary}
+\noindent
+This will be useful in later closed form proof's rewriting steps.
+Similarly, we point out the following useful facts below:
+\begin{lemma}
+ The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
+ \begin{itemize}
+ \item
+ If $r = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
+ \item
+ If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
+ \item
+ $\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
+ \end{itemize}
+\end{lemma}
+\begin{proof}
+ By application of lemmas \ref{rsimpIdem} and \ref{good1}.
+\end{proof}
+
+\noindent
+With the idempotency of $\rsimp{}$ and its corollaries,
+we can start proving some key equalities leading to the
+closed forms.
+Now presented are a few equivalent terms under $\rsimp{}$.
+We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
+\begin{lemma}
+ \begin{itemize}
+ The following equivalence hold:
+ \item
+ $\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
+ \item
+ $\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
+ \item
+ $\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
+ \item
+ $\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
+ \item
+ $\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
+\end{itemize}
+\end{lemma}
+\begin{proof}
+ By induction on the lists involved.
+\end{proof}
+\noindent
+The above allows us to prove
+two similar equalities (which are a bit more involved).
+It says that we could flatten out the elements
+before simplification and still get the same result.
+\begin{lemma}\label{simpFlatten3}
+ One can flatten the inside $\sum$ of a $\sum$ if it is being
+ simplified. Concretely,
+ \begin{itemize}
+ \item
+ If for all $r \in rs, rs', rs''$, we have $\good \; r $
+ or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal
+ \sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
+ \item
+ $\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
+ \end{itemize}
+\end{lemma}
+\begin{proof}
+ By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
+ The second sub-lemma is a corollary of the previous.
+\end{proof}
+%Rewriting steps not put in--too long and complicated-------------------------------
+\begin{comment}
+ \begin{center}
+ $\rsimp{\sum (rs' @ rs @ rs'')} \stackrel{def of bsimp}{=}$ \\
+ $\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
+ $\stackrel{by \ref{test}}{=}
+ \rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
+ \varnothing})$\\
+ $\stackrel{by \ref{rdistinctConcatGeneral}}{=}
+ \rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
+ \rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
+
+ \end{center}
+\end{comment}
+%Rewriting steps not put in--too long and complicated-------------------------------
+\noindent
\subsection{Estimating the Closed Forms' sizes}
We now summarize the closed forms below:
\begin{itemize}
@@ -2254,66 +2379,7 @@
-%----------------------------------------------------------------------------------------
-% SECTION ??
-%----------------------------------------------------------------------------------------
-
-%-----------------------------------
-% SECTION syntactic equivalence under simp
-%-----------------------------------
-
-
-%----------------------------------------------------------------------------------------
-% SECTION ALTS CLOSED FORM
-%----------------------------------------------------------------------------------------
-%\section{A Closed Form for \textit{ALTS}}
-%Now we prove that $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
-%
-%
-%There are a few key steps, one of these steps is
-%
-%
-%
-%One might want to prove this by something a simple statement like:
-%
-%For this to hold we want the $\textit{distinct}$ function to pick up
-%the elements before and after derivatives correctly:
-%$r \in rset \equiv (rder x r) \in (rder x rset)$.
-%which essentially requires that the function $\backslash$ is an injective mapping.
-%
-%Unfortunately the function $\backslash c$ is not an injective mapping.
-%
-%\subsection{function $\backslash c$ is not injective (1-to-1)}
-%\begin{center}
-% The derivative $w.r.t$ character $c$ is not one-to-one.
-% Formally,
-% $\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
-%\end{center}
-%This property is trivially true for the
-%character regex example:
-%\begin{center}
-% $r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
-%\end{center}
-%But apart from the cases where the derivative
-%output is $\ZERO$, are there non-trivial results
-%of derivatives which contain strings?
-%The answer is yes.
-%For example,
-%\begin{center}
-% Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
-% where $a$ is not nullable.\\
-% $r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
-% $r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
-%\end{center}
-%We start with two syntactically different regular expressions,
-%and end up with the same derivative result.
-%This is not surprising as we have such
-%equality as below in the style of Arden's lemma:\\
-%\begin{center}
-% $L(A^*B) = L(A\cdot A^* \cdot B + B)$
-%\end{center}
-
-\section{Further Improvements to the Bound}
+\section{Possible Further Improvements}
There are two problems with this finiteness result, though.
\begin{itemize}
\item