lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

equal deleted inserted replaced

-:8c234a1bc7e0
+:b0f0d884a547
 \label{Finite}
 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
 %of our bitcoded algorithm, that is a finite bound on the size of any
 %regex's derivatives.
-In this chapter we give a guarantee in terms of size:
+\begin{figure}
-given an annotated regular expression $a$, for any string $s$
-our algorithm $\blexersimp$'s internal annotated regular expression
-size  is finitely bounded
-by a constant $N_a$ that only depends on $a$:
-\begin{center}
-	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
-\end{center}
-\noindent
-where the size of an annotated regular expression is defined
-in terms of the number of nodes in its tree structure:
 \begin{center}
 	\begin{tabular}{ccc}
 		$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
 		$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
 		$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
 		$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
 		$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as   + 1$\\
 		$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
 	\end{tabular}
 \end{center}
-We believe this size formalisation
+\caption{The size function of bitcoded regular expressions}\label{brexpSize}
-of the algorithm is important in our context, because
+\end{figure}
+In this chapter we give a guarantee in terms of size:
+given an annotated regular expression $a$, for any string $s$
+our algorithm $\blexersimp$'s internal annotated regular expression
+size  is finitely bounded
+by a constant $N_a$ that only depends on $a$:
+\begin{center}
+	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
+\end{center}
+\noindent
+where the size of an annotated regular expression is defined
+in terms of the number of nodes in its tree structure (see figure \ref{brexpSize}).
+We believe this size bound
+is important in the context of POSIX lexing, because
 \begin{itemize}
 	\item
 		It is a stepping stone towards an ``absence of catastrophic-backtracking''
-		guarantee. The next step would be to refine the bound $N_a$ so that it
+		guarantee.
+		If the internal data structures used by our algorithm cannot grow very large,
+		then our algorithm (which traverses those structures) cannot be too slow.
+		The next step would be to refine the bound $N_a$ so that it
 		is polynomial on $\llbracket a\rrbracket$.
 	\item
-		The size bound proof gives us a higher confidence that
+		Having it formalised gives us a higher confidence that
 		our simplification algorithm $\simp$ does not ``mis-behave''
 		like $\simpsulz$ does.
 		The bound is universal, which is an advantage over work which
 		only gives empirical evidence on some test cases.
 \end{itemize}
 \section{Formalising About Size}
 \noindent
-In our lexer $\blexersimp$,
+In our lexer ($\blexersimp$),
-The regular expression is repeatedly being taken derivative of
+we take an annotated regular expression as input,
-and then simplified.
+and repeately take derivative of and simplify it:
 \begin{figure}[H]
 	\begin{tikzpicture}[scale=2,
 		every node/.style={minimum size=11mm},
 		->,>=stealth',shorten >=1pt,auto,thick
 		]
 	\end{tikzpicture}
 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
 \end{figure}
 \noindent
 Each time
-a derivative is taken, a regular expression might grow a bit,
+a derivative is taken, the regular expression might grow.
-but simplification always takes care that
+However, the simplification that is immediately afterwards will always shrink it so that
 it stays small.
 This intuition is depicted by the relative size
 change between the black and blue nodes:
 After $\simp$ the node always shrinks.
 Our proof says that all the blue nodes
-stay below a size bound $N_a$ determined by $a$.
+stay below a size bound $N_a$ determined by the input $a$.
 \noindent
-Sulzmann and Lu's assumed something similar about their algorithm,
+Sulzmann and Lu's assumed a similar picture about their algorithm,
 though in fact their algorithm's size might be better depicted by the following graph:
 \begin{figure}[H]
 	\begin{tikzpicture}[scale=2,
 		every node/.style={minimum size=11mm},
 		->,>=stealth',shorten >=1pt,auto,thick
 	\end{tikzpicture}
 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
 \end{figure}
 \noindent
-That is, on certain cases their lexer
+The picture means that on certain cases their lexer (where they use $\simpsulz$
+as the simplification function)
 will have an indefinite size explosion, causing the running time
-of each derivative step to grow arbitrarily large (for example
+of each derivative step to grow continuously (for example
 in \ref{SulzmannLuLexerTime}).
-The reason they made this mistake was that
+They tested out the run time of their
-they tested out the run time of their
 lexer on particular examples such as $(a+b+ab)^*$
-and generalised to all cases, which
+and claimed that their algorithm is linear w.r.t to the input.
-cannot happen with our mecahnised proof.\\
+With our mecahnised proof, we avoid this type of unintentional
-We give details of the proof in the next sections.
+generalisation.\\
+Before delving in to the details of the formalisation,
+we are going to provide an overview of it.
+In the next subsection, we draw a picture of the bird's eye view
+of the proof.
 \subsection{Overview of the Proof}
-Here is a bird's eye view of how the proof of finiteness works,
+Here is a bird's eye view of the main components of the finiteness proof,
 which involves three steps:
 \begin{figure}[H]
 	\begin{tikzpicture}[scale=1,font=\bf,
 		node/.style={
 			rectangle,rounded corners=3mm,
 		The key observation is that $\distinctBy$'s output is
 		a list with a constant length bound.
 \end{itemize}
 We will expand on these steps in the next sections.\\
-\section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions}
+\section{The $\textit{Rrexp}$ Datatype}
 The first step is to define
 $\textit{rrexp}$s.
 They are without bitcodes,
 allowing a much simpler size bound proof.
 Of course, the bits which encode the lexing information
 		$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
 	\end{tabular}
 \end{center}
-\noindent
-$\textit{Rerase}$ throws away the bitcodes
+\subsection{Why a New Datatype?}
-on the annotated regular expressions,
-but keeps everything else intact.
-Therefore it does not change the size
-of an annotated regular expression:
-\begin{lemma}\label{rsizeAsize}
-	$\rsize{\rerase a} = \asize a$
-\end{lemma}
-\begin{proof}
-	By routine structural induction on $a$.
-\end{proof}
-\noindent
-\subsection{Motivation Behind a New Datatype}
 The reason we take all the trouble
 defining a new datatype is that $\erase$ makes things harder.
 We initially started by using
 plain regular expressions and tried to prove
 the lemma \ref{rsizeAsize},
 The  annotated regular expression $\sum[a, b, c]$ would turn into
 $(a+(b+c))$.
 All these operations change the size and structure of
 an annotated regular expressions, adding unnecessary
 complexities to the size bound proof.\\
-For example, if we define the size of a basic regular expression
+For example, if we define the size of a basic plain regular expression
 in the usual way,
 \begin{center}
 	\begin{tabular}{ccc}
 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
 	\end{tabular}
 \end{center}
 \noindent
 Then the property
 \begin{center}
-	$\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$
+	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
 \end{center}
 does not hold.
+With $\textit{rerase}$, however,
+only the bitcodes are thrown away.
+Everything about the structure remains intact.
+Therefore it does not change the size
+of an annotated regular expression:
+\begin{lemma}\label{rsizeAsize}
+	$\rsize{\rerase a} = \asize a$
+\end{lemma}
+\begin{proof}
+	By routine structural induction on $a$.
+\end{proof}
+\noindent
 One might be able to prove an inequality such as
 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
 but we found our approach more straightforward.\\
-\subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$}
+\subsection{Functions for R-regular Expressions}
-The operations on r-regular expressions are
+We shall define the r-regular expression version
-almost identical to those of the annotated regular expressions,
+of $\blexer$ and $\blexersimp$ related functions.
-except that no bitcodes are used. For example,
+We use $r$ as the prefix or subscript to differentiate
-the derivative operation becomes simpler:\\
+with the bitcoded version.
+For example,$\backslash_r$, $\rdistincts$, and $\rsimp$
+as opposed to $\backslash$, $\distinctBy$, and $\bsimp$.
+As promised, they are much simpler than their bitcoded counterparts.
+%The operations on r-regular expressions are
+%almost identical to those of the annotated regular expressions,
+%except that no bitcodes are used. For example,
+The derivative operation becomes simpler:\\
 \begin{center}
 	\begin{tabular}{@{}lcl@{}}
 		$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
 		$(\ONE)\,\backslash_r c$ & $\dn$ &
 		$\textit{if}\;c=d\; \;\textit{then}\;
 \end{center}
 \noindent
 We do not define an r-regular expression version of $\blexersimp$,
 as our proof does not involve its use
 (and there is no bitcode to decode into a lexical value).
-Everything about the size of annotated regular expressions
+Now we are ready to introduce how r-regular expressions allow
-can be calculated via the size of r-regular expressions:
+us to prove the size bound on bitcoded regular expressions.
+\subsection{Using R-regular Expressions to Bound Bit-coded Regular Expressions}
+Everything about the size of annotated regular expressions after the application
+of function $\bsimp$ and $\backslash_{simps}$
+can be calculated via the size of r-regular expressions after the application
+of $\rsimp$ and $\backslash_{rsimps}$:
 \begin{lemma}\label{sizeRelations}
 	The following equalities hold:
 	\begin{itemize}
 		\item
 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
 \begin{center}
 	$\rsize{\rderssimp{\rerase{a}}{s}} \leq N_r \quad$
 	implies
 	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
 \end{center}
+From now on we
 Unless stated otherwise in the rest of this
 chapter all regular expressions without
 bitcodes are seen as r-regular expressions ($\rrexp$s).
 For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
 we use the notation $r_1 + r_2$
 %This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
 %right hand side the "closed form" of $r\backslash s$.
 %
 %\begin{quote}\it
 %	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
-%	\begin{center}
-%		$ \rderssimp{r_1 \cdot r_2}{s} =
-%		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
-%	\end{center}
 %\end{quote}
 %\noindent
 %We explain in detail how we reached those claims.
-\subsection{The Idea Behind Closed Forms}
 If we attempt to prove
 \begin{center}
 	$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
 \end{center}
 using a naive induction on the structure of $r$,
 $(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
 and therefore $N_{r_1}$ and $N_{r_2}$ in the
 inductive hypotheses cannot be directly used.
 We have already seen that $(r_1 \cdot r_2)\backslash s$
 and $(r^*)\backslash s$ can grow in a wild way.
 The point is that they will be equivalent to a list of
 terms $\sum rs$, where each term in $rs$ will
 be made of $r_1 \backslash s' $, $r_2\backslash s'$,
 and $r \backslash s'$ with $s' \in \textit{SubString} \; s$.
 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
 in the simplification which saves $rs$ from growing indefinitely.
-Based on this idea, we sketch a proof by first showing the equality (where
+Based on this idea, we develop a proof in two steps.
+First, we show the equality (where
 $f$ and $g$ are functions that do not increase the size of the input)
 \begin{center}
-$(r_1 \cdot r_2)\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
+$r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
 \end{center}
-and then show the right-hand-side can be finitely bounded.
+where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
+For example, for $r_1 \cdot r_2$ we have the equality as
+	\begin{center}
+		$ \rderssimp{r_1 \cdot r_2}{s} =
+		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
+	\end{center}
 We call the right-hand-side the
 \emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
-We will flesh out the proof sketch in the next section.
+Second, we will bound the closed form of r-regular expressions
+using some estimation techniques
-\section{Details of Closed Forms and Bounds}
+and then piece it together
+with lemma \ref{sizeRelations} to show the bitcoded regular expressions
+in our $\blexersimp$ are finitely bounded.
+We will flesh out the first step of the proof we
+sketched just now in the next section.
+\section{Closed Forms}
 In this section we introduce in detail
 how the closed forms are obtained for regular expressions'
-derivatives and how they are bounded.
+derivatives.
-We start by proving some basic identities and inequalities
+We start by proving some basic identities
 involving the simplification functions for r-regular expressions.
-After that we use these identities to establish the
+After that we introduce the rewrite relations
-closed forms we need.
+$\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
-Finally, we prove the functions such as $\flts$
+$\rightsquigarrow_f$ and $\rightsquigarrow_g$.
-will keep the size non-increasing.
+These relations involves similar techniques in chapter \ref{Bitcoded2}.
-Putting this together with a general bound
+Finally, we use these identities to establish the
-on the finiteness of distinct regular expressions
+closed forms of the alternative regular expression,
-smaller than a certain size, we obtain a bound on
+the sequence regular expression, and the star regular expression.
-the closed forms.
 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
-\subsection{Some Basic Identities and Inequalities}
+\subsection{Some Basic Identities}
-In this subsection, we introduce lemmas
+We now introduce lemmas
 that are repeatedly used in later proofs.
 Note that for the $\textit{rdistinct}$ function there
 will be a lot of conversion from lists to sets.
-We use the name $set$ to refere to the
+We use $set$ to refere to the
 function that converts a list $rs$ to the set
 containing all the elements in $rs$.
 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
 The $\textit{rdistinct}$ function, as its name suggests, will
-remove duplicates in an r-regular expression list.
+de-duplicate an r-regular expression list.
-It will also correctly exclude any elements that
+It will also remove any elements that
-is intially in the accumulator set.
+is already in the accumulator set.
 \begin{lemma}\label{rdistinctDoesTheJob}
 	%The function $\textit{rdistinct}$ satisfies the following
 	%properties:
 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
 	recursive definition here, its Isabelle counterpart would be $\textit{distinct}$.}
 	readily defined
 	for testing
-	whether a list's elements are all unique. Then the following
+	whether a list's elements are unique. Then the following
 	properties about $\textit{rdistinct}$ hold:
 	\begin{itemize}
 		\item
 			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
 		\item
 	inductive cases of $\textit{rdistinct}$.
 \end{proof}
 \noindent
-$\textit{rdistinct}$ will cancel out all regular expression terms
+%$\textit{rdistinct}$ will out all regular expression terms
-that are in the accumulator, therefore prepending a list $rs_a$ with an arbitrary
+%that are in the accumulator, therefore
-list $rs$ whose elements are all from the accumulator, and then call $\textit{rdistinct}$
+Concatenating a list $rs_a$ at the front of another
-on the resulting list, the output will be as if we had called $\textit{rdistinct}$
+list $rs$ whose elements are all from the accumulator, and then calling $\textit{rdistinct}$
+on the merged list, the output will be as if we had called $\textit{rdistinct}$
 without the prepending of $rs$:
 \begin{lemma}\label{rdistinctConcat}
 	The elements appearing in the accumulator will always be removed.
 	More precisely,
 	\begin{itemize}
 \end{lemma}
 \begin{proof}
 	By \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
 \end{proof}
 \noindent
+The next lemma is a more general form of \ref{rdistinctConcat},
+it says that
 $\textit{rdistinct}$ is composable w.r.t list concatenation:
 \begin{lemma}\label{distinctRdistinctAppend}
 			If $\;\; \textit{isDistinct} \; rs_1$,
 			and $(set \; rs_1) \cap acc = \varnothing$,
 			then applying $\textit{rdistinct}$ on $rs_1 @ rs_a$ does not
 the alternative regular expression:
 \begin{lemma}\label{rderRsimpAltsCommute}
 	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
 \end{lemma}
 \noindent
-\subsubsection{$\textit{rsimp}$ Does Not Increment the Size}
-Although it seems evident, we need a series
-of non-trivial lemmas to establish this property.
-\begin{lemma}\label{rsimpMonoLemmas}
-	\mbox{}
-	\begin{itemize}
-		\item
-			\[
-				\llbracket \rsimpalts \; rs \rrbracket_r \leq
-				\llbracket \sum \; rs \rrbracket_r
-			\]
-		\item
-			\[
-				\llbracket \rsimpseq \; r_1 \;  r_2 \rrbracket_r \leq
-				\llbracket r_1 \cdot r_2 \rrbracket_r
-			\]
-		\item
-			\[
-				\llbracket \rflts \; rs \rrbracket_r  \leq
-				\llbracket rs \rrbracket_r
-			\]
-		\item
-			\[
-				\llbracket \rDistinct \; rs \; ss \rrbracket_r  \leq
-				\llbracket rs \rrbracket_r
-			\]
-		\item
-			If all elements $a$ in the set $as$ satisfy the property
-			that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
-			\llbracket a \rrbracket_r$, then we have
-			\[
-				\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
-				(\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
-				\rrbracket \leq
-				\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
-				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r
-			\]
-	\end{itemize}
-\end{lemma}
-\begin{proof}
-	Point 1, 3, 4 can be proven by an induction on $rs$.
-	Point 2 is by case analysis on $r_1$ and $r_2$.
-	The last part is a corollary of the previous ones.
-\end{proof}
-\noindent
-With the lemmas for each inductive case in place, we are ready to get
-the non-increasing property as a corollary:
-\begin{corollary}\label{rsimpMono}
-	$\llbracket \textit{rsimp} \; r \rrbracket_r$
-\end{corollary}
-\begin{proof}
-	By \ref{rsimpMonoLemmas}.
-\end{proof}
-\subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
-Much like the definition of $L$ on plain regular expressions, one could also
-define the language interpretation of $\rrexp$s.
-\begin{center}
-	\begin{tabular}{lcl}
-		$RL \; (\ZERO)$ & $\dn$ & $\phi$\\
-		$RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
-		$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
-		$RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
-		$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
-		$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
-	\end{tabular}
-\end{center}
-\noindent
-The main use of $RL$ is to establish some connections between $\rsimp{}$
-and $\rnullable{}$:
-\begin{lemma}
-	The following properties hold:
-	\begin{itemize}
-		\item
-			If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
-		\item
-			$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
-	\end{itemize}
-\end{lemma}
-\begin{proof}
-	The first part is by induction on $r$.
-	The second part is true because property
-	\[ RL \; r = RL \; (\rsimp{r})\] holds.
-\end{proof}
-\subsubsection{$\rsimp{}$ is Non-Increasing}
-In this subsection, we prove that the function $\rsimp{}$ does not
-make the $\llbracket \rrbracket_r$ size increase.
-\begin{lemma}\label{rsimpSize}
-	$\llbracket \rsimp{r} \rrbracket_r \leq \llbracket r \rrbracket_r$
-\end{lemma}
-\subsubsection{Simplified $\textit{Rrexp}$s are Good}
-We formalise the notion of ``good" regular expressions,
-which means regular expressions that
-are not fully simplified. For alternative regular expressions that means they
-do not contain any nested alternatives like
-\[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
-or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
-\begin{center}
-	\begin{tabular}{@{}lcl@{}}
-		$\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
-		$\good\; \RONE$ & $\dn$ & $\textit{true}$\\
-		$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
-		$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
-		$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
-		$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ &
-		$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
-						   & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \,  \textit{and}\; \, \textit{nonAlt}\; r')$\\
-		$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
-		$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
-		$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
-		$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
-		$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
-	\end{tabular}
-\end{center}
-\noindent
-The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
-alternative, and false otherwise.
-The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
-its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$,
-and unique:
-\begin{lemma}\label{rsimpaltsGood}
-	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
-	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
-\end{lemma}
-\noindent
-We also note that
-if a regular expression $r$ is good, then $\rflts$ on the singleton
-list $[r]$ will not break goodness:
-\begin{lemma}\label{flts2}
-	If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
-\end{lemma}
-\begin{proof}
-	By an induction on $r$.
-\end{proof}
-\noindent
-The other observation we make about $\rsimp{r}$ is that it never
-comes with nested alternatives, which we describe as the $\nonnested$
-property:
-\begin{center}
-	\begin{tabular}{lcl}
-		$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
-		$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
-		$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
-		$\nonnested \; \, r $ & $\dn$ & $\btrue$
-	\end{tabular}
-\end{center}
-\noindent
-The $\rflts$ function
-always opens up nested alternatives,
-which enables $\rsimp$ to be non-nested:
-\begin{lemma}\label{nonnestedRsimp}
-	$\nonnested \; (\rsimp{r})$
-\end{lemma}
-\begin{proof}
-	By an induction on $r$.
-\end{proof}
-\noindent
-With this we could prove that a regular expressions
-after simplification and flattening and de-duplication,
-will not contain any alternative regular expression directly:
-\begin{lemma}\label{nonaltFltsRd}
-	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
-	then $\textit{nonAlt} \; x$.
-\end{lemma}
-\begin{proof}
-	By \ref{nonnestedRsimp}.
-\end{proof}
-\noindent
-The other thing we know is that once $\rsimp{}$ had finished
-processing an alternative regular expression, it will not
-contain any $\RZERO$s, this is because all the recursive
-calls to the simplification on the children regular expressions
-make the children good, and $\rflts$ will not take out
-any $\RZERO$s out of a good regular expression list,
-and $\rdistinct{}$ will not mess with the result.
-\begin{lemma}\label{flts3Obv}
-	The following are true:
-	\begin{itemize}
-		\item
-			If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
-			then for all $r \in \rflts\; rs. \, \good \; r$.
-		\item
-			If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
-			and for all $y$ such that $\llbracket y \rrbracket_r$ less than
-			$\llbracket rs \rrbracket_r + 1$, either
-			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
-			then $\good \; x$.
-	\end{itemize}
-\end{lemma}
-\begin{proof}
-	The first part is by induction on $rs$, where the induction
-	rule is the inductive cases for $\rflts$.
-	The second part is a corollary from the first part.
-\end{proof}
-And this leads to good structural property of $\rsimp{}$,
-that after simplification, a regular expression is
-either good or $\RZERO$:
-\begin{lemma}\label{good1}
-	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
-\end{lemma}
-\begin{proof}
-	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
-	Lemma \ref{rsimpSize} says that
-	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
-	$\llbracket r \rrbracket_r$.
-	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
-	Inductive hypothesis applies to the children regular expressions
-	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
-	by that as well.
-	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
-	to ensure that goodness is preserved at the topmost level.
-\end{proof}
-We shall prove that any good regular expression is
-a fixed-point for $\rsimp{}$.
-First we prove an auxiliary lemma:
-\begin{lemma}\label{goodaltsNonalt}
-	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
-\end{lemma}
-\begin{proof}
-	By an induction on $\sum rs$. The inductive rules are the cases
-	for $\good$.
-\end{proof}
-\noindent
-Now we are ready to prove that good regular expressions are invariant
-of $\rsimp{}$ application:
-\begin{lemma}\label{test}
-	If $\good \;r$ then $\rsimp{r} = r$.
-\end{lemma}
-\begin{proof}
-	By an induction on the inductive cases of $\good$, using lemmas
-	\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
-	The lemma \ref{goodaltsNonalt} is used in the alternative
-	case where 2 or more elements are present in the list.
-\end{proof}
-\noindent
-Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
-which requires $\ref{good1}$ to go through smoothly.
-It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
-if it its output is concatenated with a list and then applied to $\rflts$.
-\begin{lemma}\label{flattenRsimpalts}
-	$\rflts \; ( (\rsimp_{ALTS} \;
-	(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) ::
-	\map \; \rsimp{} \; rs' ) =
-	\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
-	\map \; \rsimp{rs'}))$
-\end{lemma}
-\begin{proof}
-	By \ref{good1}.
-\end{proof}
-\noindent
-We are also ready to prove that $\textit{rsimp}$ is idempotent.
-\subsubsection{$\rsimp$ is Idempotent}
-The idempotency of $\rsimp$ is very useful in
-manipulating regular expression terms into desired
-forms so that key steps allowing further rewriting to closed forms
-are possible.
-\begin{lemma}\label{rsimpIdem}
-	$\rsimp{r} = \rsimp{\rsimp{r}}$
-\end{lemma}
-\begin{proof}
-	By \ref{test} and \ref{good1}.
-\end{proof}
-\noindent
-This property means we do not have to repeatedly
-apply simplification in each step, which justifies
-our definition of $\blexersimp$.
-On the other hand, we could repeat the same $\rsimp{}$ applications
-on regular expressions as many times as we want, if we have at least
-one simplification applied to it, and apply it wherever we would like to:
-\begin{corollary}\label{headOneMoreSimp}
-	The following properties hold, directly from \ref{rsimpIdem}:
-	\begin{itemize}
-		\item
-			$\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
-		\item
-			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
-	\end{itemize}
-\end{corollary}
-\noindent
-This will be useful in later closed form proof's rewriting steps.
-Similarly, we point out the following useful facts below:
-\begin{lemma}
-	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
-	\begin{itemize}
-		\item
-			If $r  = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
-		\item
-			If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
-		\item
-			$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
-	\end{itemize}
-\end{lemma}
-\begin{proof}
-	By application of lemmas \ref{rsimpIdem} and \ref{good1}.
-\end{proof}
-\noindent
-With the idempotency of $\rsimp{}$ and its corollaries,
-we can start proving some key equalities leading to the
-closed forms.
-Now presented are a few equivalent terms under $\rsimp{}$.
-We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
-\begin{lemma}
-	\begin{itemize}
-		The following equivalence hold:
-	\item
-		$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
-	\item
-		$\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
-	\item
-		$\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
-	\item
-		$\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
-	\item
-		$\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
-\end{itemize}
-\end{lemma}
-\begin{proof}
-	By induction on the lists involved.
-\end{proof}
-\noindent
-The above allows us to prove
-two similar equalities (which are a bit more involved).
-It says that we could flatten out the elements
-before simplification and still get the same result.
-\begin{lemma}\label{simpFlatten3}
-	One can flatten the inside $\sum$ of a $\sum$ if it is being
-	simplified. Concretely,
-	\begin{itemize}
-		\item
-			If for all $r \in rs, rs', rs''$, we have $\good \; r $
-			or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal
-			\sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
-		\item
-			$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
-	\end{itemize}
-\end{lemma}
-\begin{proof}
-	By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
-	The second sub-lemma is a corollary of the previous.
-\end{proof}
-%Rewriting steps not put in--too long and complicated-------------------------------
-\begin{comment}
-	\begin{center}
-		$\rsimp{\sum (rs' @ rs @ rs'')}  \stackrel{def of bsimp}{=}$  \\
-		$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
-		$\stackrel{by \ref{test}}{=}
-		\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
-		\varnothing})$\\
-		$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
-		\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
-		\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
-	\end{center}
-\end{comment}
-%Rewriting steps not put in--too long and complicated-------------------------------
-\noindent
 We need more equalities like the above to enable a closed form,
-but to proceed we need to introduce two rewrite relations,
+for which we need to introduce a few rewrite relations
-to make things smoother.
+to help
+us obtain them.
 \subsection{The rewrite relation $\hrewrite$ , $\scfrewrites$ , $\frewrite$ and $\grewrite$}
-Insired by the success we had in the correctness proof
+Inspired by the success we had in the correctness proof
-in \ref{Bitcoded2}, where we invented
+in \ref{Bitcoded2},
-a term rewriting system to capture the similarity between terms,
+we follow suit here, defining atomic simplification
-we follow suit here defining simplification
+steps as ``small-step'' rewriting steps. This allows capturing
-steps as rewriting steps. This allows capturing
 similarities between terms that would be otherwise
 hard to express.
 We use $\hrewrite$ for one-step atomic rewrite of
 regular expression simplification,
 $\frewrite$ for rewrite of list of regular expressions that
 include all operations carried out in $\rflts$, and $\grewrite$ for
-rewriting a list of regular expressions possible in both $\rflts$ and $\rdistinct{}{}$.
+rewriting a list of regular expressions possible in both $\rflts$ and $\textit{rdistinct}$.
 Their reflexive transitive closures are used to denote zero or many steps,
 as was the case in the previous chapter.
-The presentation will be more concise than that in \ref{Bitcoded2}.
+As we have already
+done something similar, the presentation about
+these rewriting rules will be more concise than that in \ref{Bitcoded2}.
 To differentiate between the rewriting steps for annotated regular expressions
 and $\rrexp$s, we add characters $h$ and $g$ below the squig arrow symbol
 to mean atomic simplification transitions
 of $\rrexp$s and $\rrexp$ lists, respectively.
-List of one-step rewrite rules for $\rrexp$ ($\hrewrite$):
+\begin{figure}[H]
 \begin{center}
 	\begin{mathpar}
 		\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}
 		\inferrule[RSEQ0R]{}{r_1 \cdot \RZERO \hrewrite \RZERO\\}
 		\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum rs_a @ [r_1] @ rs_b @ [r_2] @ rsc \hrewrite \sum rs_a @ [r_1] @ rs_b @ rs_c}
 	\end{mathpar}
 \end{center}
+\caption{List of one-step rewrite rules for r-regular expressions ($\hrewrite$)}\label{hRewrite}
+\end{figure}
-List of rewrite rules for a list of regular expressions,
+Like $\rightsquigarrow_s$, it is
+convenient to define rewrite rules for a list of regular expressions,
 where each element can rewrite in many steps to the other (scf stands for
 li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the
 $\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.
+\begin{figure}[H]
 \begin{center}
 	\begin{mathpar}
 		\inferrule{}{[] \scfrewrites [] }
 		\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
 	\end{mathpar}
 \end{center}
+\caption{List of one-step rewrite rules for a list of r-regular expressions}\label{scfRewrite}
+\end{figure}
 %frewrite
 List of one-step rewrite rules for flattening
 a list of  regular expressions($\frewrite$):
+\begin{figure}[H]
 \begin{center}
 	\begin{mathpar}
 		\inferrule{}{\RZERO :: rs \frewrite rs \\}
 		\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}
 		\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
 	\end{mathpar}
 \end{center}
+\caption{List of one-step rewrite rules characterising the $\rflts$ operation on a list}\label{fRewrites}
+\end{figure}
 Lists of one-step rewrite rules for flattening and de-duplicating
 a list of regular expressions ($\grewrite$):
+\begin{figure}[H]
 \begin{center}
 	\begin{mathpar}
 		\inferrule{}{\RZERO :: rs \grewrite rs \\}
 		\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}
 		\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}
 		\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
 	\end{mathpar}
 \end{center}
+\caption{List of one-step rewrite rules characterising the $\rflts$ and $\textit{rdistinct}$
+operations}\label{gRewrite}
+\end{figure}
 \noindent
 We defined
-two separate list rewriting definitions $\frewrite$ and $\grewrite$.
+two separate list rewriting relations $\frewrite$ and $\grewrite$.
 The rewriting steps that take place during
 flattening is characterised by $\frewrite$.
 $\grewrite$ characterises both flattening and de-duplicating.
 Sometimes $\grewrites$ is slightly too powerful
-so we would rather use $\frewrites$ because we only
+so we would rather use $\frewrites$ to prove
-need to prove about certain equivalence under the rewriting steps of $\frewrites$.
+%because we only
+equalities related to $\rflts$.
+%certain equivalence under the rewriting steps of $\frewrites$.
 For example, when proving the closed-form for the alternative regular expression,
-one of the rewriting steps would be:
+one of the equalities needed is:
-\begin{lemma}
+\begin{center}
 	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
 	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
 	$
-\end{lemma}
+\end{center}
 \noindent
 Proving this is by first showing
 \begin{lemma}\label{earlyLaterDerFrewrites}
 	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
 	\rflts \; (\map \; (\_ \backslash x) \; rs)$
 \end{lemma}
 \noindent
-and then using lemma
+and then the equivalence between two terms
+that can reduce in many steps to each other.
 \begin{lemma}\label{frewritesSimpeq}
 	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal
 	\sum (\rDistinct \;  rs_2 \;  \varnothing)$.
 \end{lemma}
 \noindent
-is a piece of cake.
+\begin{corollary}
+	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
+	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
+	$
+\end{corollary}
 But this trick will not work for $\grewrites$.
 For example, a rewriting step in proving
 closed forms is:
 \begin{center}
 	$\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\
 \noindent
 does $\mathbf{not}$ hold in general.
 For this rewriting step we will introduce some slightly more cumbersome
 proof technique in later sections.
 The point is that $\frewrite$
-allows us to prove equivalence in a straightforward two-step method that is
+allows us to prove equivalence in a straightforward way that is
-not possible for $\grewrite$, thereby reducing the complexity of the entire proof.
+not possible for $\grewrite$.
 \subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
+In this part, we present lemmas stating
+pairs of r-regular expressions or r-regular expression lists
+where one could rewrite from one in many steps to the other.
+Most of the proofs to these lemmas are straightforward, using
+an induction on the inductive cases of the corresponding rewriting relations.
+These proofs will therefore be omitted when this is the case.
 We present in the below lemma a few pairs of terms that are rewritable via
 $\grewrites$:
 \begin{lemma}\label{gstarRdistinctGeneral}
+	\mbox{}
 	\begin{itemize}
 		\item
 			$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
 		\item
 			$rs \grewrites \rDistinct \; rs \; \varnothing$
 \noindent
 If a pair of terms $rs_1, rs_2$ are rewritable via $\grewrites$ to each other,
 then they are equivalent under $\rsimp{}$:
 \begin{lemma}\label{grewritesSimpalts}
 	If $rs_1 \grewrites rs_2$, then
-	we have the following equivalence hold:
+	we have the following equivalence:
 	\begin{itemize}
 		\item
 			$\sum rs_1 \sequal \sum rs_2$
 		\item
 			$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
 			(\map \; (\_ \backslash_r x) rs) \;\varnothing  )$
 	\end{itemize}
 \end{lemma}
 \begin{proof}
 	Part 1 is by lemma \ref{insideSimpRemoval},
-	part 2 is by lemmas \ref{insideSimpRemoval} and \ref{distinctDer}.
+	part 2 is by lemma \ref{insideSimpRemoval} .%and \ref{distinctDer}.
 \end{proof}
 \noindent
-This leads to the first closed form--
+\subsection{Closed Forms for $\sum rs$, $r_1\cdot r_2$ and $r^*$}
+\subsubsection{Closed Form for Alternative Regular Expression}
+Lemma \ref{Simpders} leads to the first closed form--
 \begin{lemma}\label{altsClosedForm}
 	\begin{center}
 		$\rderssimp{(\sum rs)}{s} \sequal
 		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
 	\end{center}
 	$\rderssimp \; (\sum \; rs) \; s =
 	\rsimp{(\sum \; (\map \; \rderssimp{\_}{s} \; rs))}$.
 \end{corollary}
 \noindent
 The harder closed forms are the sequence and star ones.
-Before we go on to obtain them, some preliminary definitions
+Before we obtain them, some preliminary definitions
 are needed to make proof statements concise.
-\subsection{Closed Forms}
 \subsubsection{Closed Form for Sequence Regular Expressions}
-The problem of obataining a closed-form for sequence regular expression
-is constructing $(r_1 \cdot r_2) \backslash_r s$
-if we are only allowed to use a combination of $r_1 \backslash s''$
-and  $r_2 \backslash s''$ , where $s''$ is from $s$.
 First let's look at a series of derivatives steps on a sequence
 regular expression, assuming that each time the first
 component of the sequence is always nullable):
 \begin{center}
+%----------------------------------------------------------------------------------------
+%	SECTION ??
+%----------------------------------------------------------------------------------------
+%-----------------------------------
+%	SECTION syntactic equivalence under simp
+%-----------------------------------
+%----------------------------------------------------------------------------------------
+%	SECTION ALTS CLOSED FORM
+%----------------------------------------------------------------------------------------
+%\section{A Closed Form for \textit{ALTS}}
+%Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
+%
+%
+%There are a few key steps, one of these steps is
+%
+%
+%
+%One might want to prove this by something a simple statement like:
+%
+%For this to hold we want the $\textit{distinct}$ function to pick up
+%the elements before and after derivatives correctly:
+%$r \in rset \equiv (rder x r) \in (rder x rset)$.
+%which essentially requires that the function $\backslash$ is an injective mapping.
+%
+%Unfortunately the function $\backslash c$ is not an injective mapping.
+%
+%\subsection{function $\backslash c$ is not injective (1-to-1)}
+%\begin{center}
+%	The derivative $w.r.t$ character $c$ is not one-to-one.
+%	Formally,
+%	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
+%\end{center}
+%This property is trivially true for the
+%character regex example:
+%\begin{center}
+%	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
+%\end{center}
+%But apart from the cases where the derivative
+%output is $\ZERO$, are there non-trivial results
+%of derivatives which contain strings?
+%The answer is yes.
+%For example,
+%\begin{center}
+%	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
+%	where $a$ is not nullable.\\
+%	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
+%	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
+%\end{center}
+%We start with two syntactically different regular expressions,
+%and end up with the same derivative result.
+%This is not surprising as we have such
+%equality as below in the style of Arden's lemma:\\
+%\begin{center}
+%	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
+%\end{center}
+\section{Bounding Closed Forms}
+In this section, we introduce how we formalised the bound
+on closed forms.
+We first prove that functions such as $\rflts$
+will not cause the size of r-regular expressions to grow.
+Putting this together with a general bound
+on the finiteness of distinct regular expressions
+smaller than a certain size, we obtain a bound on
+the closed forms.
+\subsection{$\textit{rsimp}$ Does Not Increment the Size}
+Although it seems evident, we need a series
+of non-trivial lemmas to establish that functions such as $\rflts$
+do not cause the regular expressions to grow.
+\begin{lemma}\label{rsimpMonoLemmas}
+	\mbox{}
+	\begin{itemize}
+		\item
+			\[
+				\llbracket \rsimpalts \; rs \rrbracket_r \leq
+				\llbracket \sum \; rs \rrbracket_r
+			\]
+		\item
+			\[
+				\llbracket \rsimpseq \; r_1 \;  r_2 \rrbracket_r \leq
+				\llbracket r_1 \cdot r_2 \rrbracket_r
+			\]
+		\item
+			\[
+				\llbracket \rflts \; rs \rrbracket_r  \leq
+				\llbracket rs \rrbracket_r
+			\]
+		\item
+			\[
+				\llbracket \rDistinct \; rs \; ss \rrbracket_r  \leq
+				\llbracket rs \rrbracket_r
+			\]
+		\item
+			If all elements $a$ in the set $as$ satisfy the property
+			that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
+			\llbracket a \rrbracket_r$, then we have
+			\[
+				\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
+				(\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
+				\rrbracket \leq
+				\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
+				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r
+			\]
+	\end{itemize}
+\end{lemma}
+\begin{proof}
+	Point 1, 3, 4 can be proven by an induction on $rs$.
+	Point 2 is by case analysis on $r_1$ and $r_2$.
+	The last part is a corollary of the previous ones.
+\end{proof}
+\noindent
+With the lemmas for each inductive case in place, we are ready to get
+the non-increasing property as a corollary:
+\begin{corollary}\label{rsimpMono}
+	$\llbracket \textit{rsimp} \; r \rrbracket_r \leq \llbracket r \rrbracket_r$
+\end{corollary}
+\begin{proof}
+	By \ref{rsimpMonoLemmas}.
+\end{proof}
+\subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
+Much like the definition of $L$ on plain regular expressions, one could also
+define the language interpretation of $\rrexp$s.
+\begin{center}
+	\begin{tabular}{lcl}
+		$RL \; (\ZERO)$ & $\dn$ & $\phi$\\
+		$RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
+		$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
+		$RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
+		$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
+		$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
+	\end{tabular}
+\end{center}
+\noindent
+The main use of $RL$ is to establish some connections between $\rsimp{}$
+and $\rnullable{}$:
+\begin{lemma}
+	The following properties hold:
+	\begin{itemize}
+		\item
+			If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
+		\item
+			$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
+	\end{itemize}
+\end{lemma}
+\begin{proof}
+	The first part is by induction on $r$.
+	The second part is true because property
+	\[ RL \; r = RL \; (\rsimp{r})\] holds.
+\end{proof}
+\subsubsection{Simplified $\textit{Rrexp}$s are Good}
+We formalise the notion of ``good" regular expressions,
+which means regular expressions that
+are not fully simplified. For alternative regular expressions that means they
+do not contain any nested alternatives like
+\[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
+or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
+\begin{center}
+	\begin{tabular}{@{}lcl@{}}
+		$\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
+		$\good\; \RONE$ & $\dn$ & $\textit{true}$\\
+		$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
+		$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
+		$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
+		$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ &
+		$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
+						   & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \,  \textit{and}\; \, \textit{nonAlt}\; r')$\\
+		$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
+		$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
+		$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
+		$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
+		$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
+	\end{tabular}
+\end{center}
+\noindent
+The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
+alternative, and false otherwise.
+The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
+its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$,
+and unique:
+\begin{lemma}\label{rsimpaltsGood}
+	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
+	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
+\end{lemma}
+\noindent
+We also note that
+if a regular expression $r$ is good, then $\rflts$ on the singleton
+list $[r]$ will not break goodness:
+\begin{lemma}\label{flts2}
+	If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
+\end{lemma}
+\begin{proof}
+	By an induction on $r$.
+\end{proof}
+\noindent
+The other observation we make about $\rsimp{r}$ is that it never
+comes with nested alternatives, which we describe as the $\nonnested$
+property:
+\begin{center}
+	\begin{tabular}{lcl}
+		$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
+		$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
+		$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
+		$\nonnested \; \, r $ & $\dn$ & $\btrue$
+	\end{tabular}
+\end{center}
+\noindent
+The $\rflts$ function
+always opens up nested alternatives,
+which enables $\rsimp$ to be non-nested:
+\begin{lemma}\label{nonnestedRsimp}
+	$\nonnested \; (\rsimp{r})$
+\end{lemma}
+\begin{proof}
+	By an induction on $r$.
+\end{proof}
+\noindent
+With this we could prove that a regular expressions
+after simplification and flattening and de-duplication,
+will not contain any alternative regular expression directly:
+\begin{lemma}\label{nonaltFltsRd}
+	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
+	then $\textit{nonAlt} \; x$.
+\end{lemma}
+\begin{proof}
+	By \ref{nonnestedRsimp}.
+\end{proof}
+\noindent
+The other thing we know is that once $\rsimp{}$ had finished
+processing an alternative regular expression, it will not
+contain any $\RZERO$s, this is because all the recursive
+calls to the simplification on the children regular expressions
+make the children good, and $\rflts$ will not take out
+any $\RZERO$s out of a good regular expression list,
+and $\rdistinct{}$ will not mess with the result.
+\begin{lemma}\label{flts3Obv}
+	The following are true:
+	\begin{itemize}
+		\item
+			If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
+			then for all $r \in \rflts\; rs. \, \good \; r$.
+		\item
+			If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
+			and for all $y$ such that $\llbracket y \rrbracket_r$ less than
+			$\llbracket rs \rrbracket_r + 1$, either
+			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
+			then $\good \; x$.
+	\end{itemize}
+\end{lemma}
+\begin{proof}
+	The first part is by induction on $rs$, where the induction
+	rule is the inductive cases for $\rflts$.
+	The second part is a corollary from the first part.
+\end{proof}
+And this leads to good structural property of $\rsimp{}$,
+that after simplification, a regular expression is
+either good or $\RZERO$:
+\begin{lemma}\label{good1}
+	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
+\end{lemma}
+\begin{proof}
+	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
+	Lemma \ref{rsimpSize} says that
+	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
+	$\llbracket r \rrbracket_r$.
+	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
+	Inductive hypothesis applies to the children regular expressions
+	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
+	by that as well.
+	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
+	to ensure that goodness is preserved at the topmost level.
+\end{proof}
+We shall prove that any good regular expression is
+a fixed-point for $\rsimp{}$.
+First we prove an auxiliary lemma:
+\begin{lemma}\label{goodaltsNonalt}
+	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
+\end{lemma}
+\begin{proof}
+	By an induction on $\sum rs$. The inductive rules are the cases
+	for $\good$.
+\end{proof}
+\noindent
+Now we are ready to prove that good regular expressions are invariant
+of $\rsimp{}$ application:
+\begin{lemma}\label{test}
+	If $\good \;r$ then $\rsimp{r} = r$.
+\end{lemma}
+\begin{proof}
+	By an induction on the inductive cases of $\good$, using lemmas
+	\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
+	The lemma \ref{goodaltsNonalt} is used in the alternative
+	case where 2 or more elements are present in the list.
+\end{proof}
+\noindent
+Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
+which requires $\ref{good1}$ to go through smoothly.
+It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
+if it its output is concatenated with a list and then applied to $\rflts$.
+\begin{lemma}\label{flattenRsimpalts}
+	$\rflts \; ( (\rsimp_{ALTS} \;
+	(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) ::
+	\map \; \rsimp{} \; rs' ) =
+	\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
+	\map \; \rsimp{rs'}))$
+\end{lemma}
+\begin{proof}
+	By \ref{good1}.
+\end{proof}
+\noindent
+We are also ready to prove that $\textit{rsimp}$ is idempotent.
+\subsubsection{$\rsimp$ is Idempotent}
+The idempotency of $\rsimp$ is very useful in
+manipulating regular expression terms into desired
+forms so that key steps allowing further rewriting to closed forms
+are possible.
+\begin{lemma}\label{rsimpIdem}
+	$\rsimp{r} = \rsimp{\rsimp{r}}$
+\end{lemma}
+\begin{proof}
+	By \ref{test} and \ref{good1}.
+\end{proof}
+\noindent
+This property means we do not have to repeatedly
+apply simplification in each step, which justifies
+our definition of $\blexersimp$.
+On the other hand, we could repeat the same $\rsimp{}$ applications
+on regular expressions as many times as we want, if we have at least
+one simplification applied to it, and apply it wherever we would like to:
+\begin{corollary}\label{headOneMoreSimp}
+	The following properties hold, directly from \ref{rsimpIdem}:
+	\begin{itemize}
+		\item
+			$\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
+		\item
+			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
+	\end{itemize}
+\end{corollary}
+\noindent
+This will be useful in later closed form proof's rewriting steps.
+Similarly, we point out the following useful facts below:
+\begin{lemma}
+	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
+	\begin{itemize}
+		\item
+			If $r  = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
+		\item
+			If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
+		\item
+			$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
+	\end{itemize}
+\end{lemma}
+\begin{proof}
+	By application of lemmas \ref{rsimpIdem} and \ref{good1}.
+\end{proof}
+\noindent
+With the idempotency of $\rsimp{}$ and its corollaries,
+we can start proving some key equalities leading to the
+closed forms.
+Now presented are a few equivalent terms under $\rsimp{}$.
+We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
+\begin{lemma}
+	\begin{itemize}
+		The following equivalence hold:
+	\item
+		$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
+	\item
+		$\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
+	\item
+		$\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
+	\item
+		$\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
+	\item
+		$\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
+\end{itemize}
+\end{lemma}
+\begin{proof}
+	By induction on the lists involved.
+\end{proof}
+\noindent
+The above allows us to prove
+two similar equalities (which are a bit more involved).
+It says that we could flatten out the elements
+before simplification and still get the same result.
+\begin{lemma}\label{simpFlatten3}
+	One can flatten the inside $\sum$ of a $\sum$ if it is being
+	simplified. Concretely,
+	\begin{itemize}
+		\item
+			If for all $r \in rs, rs', rs''$, we have $\good \; r $
+			or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal
+			\sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
+		\item
+			$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
+	\end{itemize}
+\end{lemma}
+\begin{proof}
+	By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
+	The second sub-lemma is a corollary of the previous.
+\end{proof}
+%Rewriting steps not put in--too long and complicated-------------------------------
+\begin{comment}
+	\begin{center}
+		$\rsimp{\sum (rs' @ rs @ rs'')}  \stackrel{def of bsimp}{=}$  \\
+		$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
+		$\stackrel{by \ref{test}}{=}
+		\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
+		\varnothing})$\\
+		$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
+		\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
+		\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
+	\end{center}
+\end{comment}
+%Rewriting steps not put in--too long and complicated-------------------------------
+\noindent
 \subsection{Estimating the Closed Forms' sizes}
 We now summarize the closed forms below:
 \begin{itemize}
 	\item
 		$\rderssimp{(\sum rs)}{s} \sequal
 original size.
 We will discuss improvements to this bound in the next chapter.
-%----------------------------------------------------------------------------------------
+\section{Possible Further Improvements}
-%	SECTION ??
-%----------------------------------------------------------------------------------------
-%-----------------------------------
-%	SECTION syntactic equivalence under simp
-%-----------------------------------
-%----------------------------------------------------------------------------------------
-%	SECTION ALTS CLOSED FORM
-%----------------------------------------------------------------------------------------
-%\section{A Closed Form for \textit{ALTS}}
-%Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
-%
-%
-%There are a few key steps, one of these steps is
-%
-%
-%
-%One might want to prove this by something a simple statement like:
-%
-%For this to hold we want the $\textit{distinct}$ function to pick up
-%the elements before and after derivatives correctly:
-%$r \in rset \equiv (rder x r) \in (rder x rset)$.
-%which essentially requires that the function $\backslash$ is an injective mapping.
-%
-%Unfortunately the function $\backslash c$ is not an injective mapping.
-%
-%\subsection{function $\backslash c$ is not injective (1-to-1)}
-%\begin{center}
-%	The derivative $w.r.t$ character $c$ is not one-to-one.
-%	Formally,
-%	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
-%\end{center}
-%This property is trivially true for the
-%character regex example:
-%\begin{center}
-%	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
-%\end{center}
-%But apart from the cases where the derivative
-%output is $\ZERO$, are there non-trivial results
-%of derivatives which contain strings?
-%The answer is yes.
-%For example,
-%\begin{center}
-%	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
-%	where $a$ is not nullable.\\
-%	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
-%	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
-%\end{center}
-%We start with two syntactically different regular expressions,
-%and end up with the same derivative result.
-%This is not surprising as we have such
-%equality as below in the style of Arden's lemma:\\
-%\begin{center}
-%	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
-%\end{center}
-\section{Further Improvements to the Bound}
 There are two problems with this finiteness result, though.
 \begin{itemize}
 	\item
 		First, It is not yet a direct formalisation of our lexer's complexity,
 		as a complexity proof would require looking into

changeset 613	b0f0d884a547
parent 611	bc1df466150a
child 614	d5e9bcb384ec