ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 558 671a83abccf3
parent 557 812e5d112f49
child 559 9d18f3eac484
--- a/ChengsongTanPhdThesis/Chapters/Finite.tex	Fri Jul 01 13:02:15 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex	Mon Jul 04 12:27:13 2022 +0100
@@ -265,10 +265,21 @@
 
 \section{Step One: Closed Forms}
 		We transform the function application $\rderssimp{r}{s}$
-		into an equivalent form $f\; (g \; (\sum rs))$.
+		into an equivalent 
+		form $f\; (g \; (\sum rs))$.
 		The functions $f$ and $g$ can be anything from $\flts$, $\distinctBy$ and other helper functions from $\bsimp{\_}$.
 		This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
 		right hand side the "closed form" of $r\backslash s$.
+
+\begin{quote}\it
+	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
+	\begin{center}
+		$ \rderssimp{r_1 \cdot r_2}{s} = 
+	\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
+	\end{center}
+\end{quote}
+\noindent
+We explain in detail how we reached those claims.
 \subsection{Basic Properties needed for Closed Forms}
 
 \subsubsection{$\textit{rdistinct}$'s Deduplicates Successfully}
@@ -1045,37 +1056,100 @@
 Before we go on to obtain them, some preliminary definitions
 are needed to make proof statements concise.
 
-\section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
-We note that the different ways in which regular expressions are 
-nested do not matter under $\rsimp{}$:
+\section{"Closed Forms" of Sequence Regular Expressions}
+The problem of obataining a closed-form for sequence regular expression 
+is constructing $(r_1 \cdot r_2) \backslash_r s$
+if we are only allowed to use a combination of $r_1 \backslash s''$ 
+and  $r_2 \backslash s''$ , where $s''$ is from $s$.
+First let's look at a series of derivatives steps on a sequence 
+regular expression, assuming that each time the first
+component of the sequence is always nullable):
 \begin{center}
-	To be proven:\\
-	$\rsimp{r} \stackrel{?}{=} \rsimp{r'}$ if $r = \sum [r_1, r_2, r_3, \ldots]$ 
-and $r' =(\ldots ((r_1 + r_2) + r_3) + \ldots)$
+
+$r_1 \cdot r_2 \quad \longrightarrow_{\backslash c}  \quad   r_1  \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\
+$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c''   \longrightarrow_{\backslash c''} \quad
+ \ldots$
+
 \end{center}
-\noindent
+Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expresssed as 
+a giant alternative taking a list of terms 
+$[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
+where the head of the list is always the term
+representing a match involving only $r_1$, and the tail of the list consisting of
+terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
 This intuition is also echoed by IndianPaper, where they gave
 a pencil-and-paper derivation of $(r_1 \cdot r_2)\backslash s$:
 \begin{center}
-	$((r_1 \cdot r_2) \backslash_r c_1) \stackrel{\backslash_r c_2}{=}
-	((r_1 \backslash_r c_1) \cdot r_2 + (\delta\; (\rnullable \; r_1),\; r_2 \backslash_r c_1)) 
-	\stackrel{\backslash_r c_2}{=}
-	((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2)
-	+ (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)
+	\begin{tabular}{c}
+	$(r_1 \cdot r_2) \backslash_r (c_1 :: c_2 :: \ldots c_n) \myequiv$\\ 
+	\rule{0pt}{3ex} $((r_1 \backslash_r c_1) \cdot r_2 + (\delta\; (\rnullable \; r_1) \; r_2 \backslash_r c_1)) \backslash_r (c_2 :: \ldots c_n) 
+	\myequiv$\\
+	\rule{0pt}{3ex} $((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable \; r_1) \; r_2 \backslash_r c_1c_2))
+	+ (\delta \ (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)) \backslash_r (c_3 \ldots c_n)
 	$
+	\end{tabular}
 \end{center}
 \noindent
-The $\delta \; b \; r$ function above turns the entire term into $\RZERO$
-if the boolean condition $b$ evaluates to false, and outputs $r$ otherwise.
-The equality in their derivation steps should be interpretated
-as language equivalence. To pin down the intuition into rigorous terms,
-$\sflat{}$ is used to enable the transformation from 
+The equality in above should be interpretated
+as language equivalence. 
+The $\delta$ function works similarly to that of
+a Kronecker delta function:
+\[ \delta \; b\; r\]
+will produce $r$
+if $b$ evaluates to true, 
+and $\RZERO$ otherwise.
+Note that their formulation  
+\[
+	((r_1 \backslash_r \, c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2)
+	+ (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)
+\]
+does not faithfully
+represent what the intermediate derivatives would actually look like
+when one or more intermediate results $r_1 \backslash s' \cdot r_2$ are not 
+nullable in the head of the sequence.
+For example, when $r_1$ and $r_1 \backslash_r c_1$ are not nullable,
+the regular expression would not look like 
+\[
+	(r_1 \backslash_r c_1c_2 + \RZERO ) + \RZERO,
+\]
+but actually $r_1 \backslash_r c_1c_2$, the redundant $\RZERO$s will not be created in the
+first place.
+In a closed-form one would want to take into account this 
+and generate the list of
+regular expressions $r_2 \backslash_r s''$ with
+string pairs $(s', s'')$ where $s'@s'' = s$ and
+$r_1 \backslash s'$ nullable.
+We denote the list consisting of such 
+strings $s''$ as $\vsuf{s}{r_1}$.
+
+The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:
+\begin{center}
+\begin{tabular}{lcl}
+$\vsuf{[]}{\_} $ & $=$ &  $[]$\\
+$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
+                                     && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
+\end{tabular}
+\end{center}
+\noindent
+The list is sorted in the order $r_2\backslash s''$ 
+appears in $(r_1\cdot r_2)\backslash s$.
+In essence, $\vsuf{\_}{\_}$ is doing a 
+"virtual derivative" of $r_1 \cdot r_2$, but instead of producing 
+the entire result $(r_1 \cdot r_2) \backslash s$, 
+it only stores all the strings $s''$ such that $r_2 \backslash s''$
+are occurring terms in $(r_1\cdot r_2)\backslash s$.
+
+To make the closed form representation 
+more straightforward,
+the flattetning function $\sflat{\_}$ is used to enable the transformation from 
 a left-associative nested sequence of alternatives into 
 a flattened list:
-$r = \sum [r_1, r_2, r_3, \ldots] \stackrel{\sflat{}}{\rightarrow} 
-(\ldots ((r_1 + r_2) + r_3) + \ldots)$
-The definitions $\sflat{}$, $\sflataux{}$ are given below.
-
+\[
+  \sum [r_1, r_2, r_3, \ldots] \stackrel{\sflat{\_}}{\rightarrow} 
+(\ldots ((r_1 + r_2) + r_3) + \ldots)
+\]
+\noindent
+The definitions $\sflat{\_}$, $\sflataux{\_}$ are given below.
  \begin{center}  
  \begin{tabular}{ccc}
  $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
@@ -1091,118 +1165,363 @@
 $\sflat r$ & $=$ & $ r$
 \end{tabular}
 \end{center}
-The intuition behind $\sflataux{\_}$ is to break up nested regexes 
+\noindent
+$\sflataux{\_}$ breaks up nested alternative regexes 
 of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape
-into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$.
+into a "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$.
 It will return the singleton list $[r]$ otherwise.
 $\sflat{\_}$ works the same  as $\sflataux{\_}$, except that it keeps
 the output type a regular expression, not a list.
-$\sflataux{\_}$  and $\sflat{\_}$ is only recursive in terms of the
- first element of the list of children of
-an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for  dealing with the regular expression
-$r_1 \cdot r_2 \backslash s$.
-\subsection{The $\textit{vsuf}$ Function}
-Note that $((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2)
-	+ (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)$ does not faithfully
-represent what would the intermediate derivatives would actually look like.
-For example, when $r_1$ and $r_1 \backslash_r c_1$ are nullable,
-the regular expression would not look like $(r_1 \backslash_r c_1c_2 + \RZERO ) + \RZERO$,
-but actually $r_1\backslash_r c_1c_2$, the redundant $\RZERO$s will not be created in the
-first place.
-In a closed-form one would want to take into account this 
-and generate the list of string pairs $(s', s'')$ where $s'@s'' = s$ such that
-only $r_1 \backslash s'$ nullable.
-With $\sflat{\_}$ and $\sflataux{\_}$ we make 
-precise what  "closed forms" we have for the sequence derivatives and their simplifications,
-in other words, how can we construct $(r_1 \cdot r_2) \backslash s$
-and $\bderssimp{(r_1\cdot r_2)}{s}$,
-if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$)
-and  $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$  ranges over 
-the substring of $s$?
-First let's look at a series of derivatives steps on a sequence 
-regular expression,  (assuming) that each time the first
-component of the sequence is always nullable):
-\begin{center}
+$\sflataux{\_}$  and $\sflat{\_}$ are only recursive on the  
+first element of the list.
+
+With $\sflataux{}$ a preliminary to the closed form can be stated,
+where the derivative of $r_1 \cdot r_2 \backslash s$ can be
+flattened into a list whose head and tail meet the description
+we gave earlier.
+\begin{lemma}\label{seqSfau0}
+	$\sflataux{\rders{(r_1 \cdot r_2) \backslash s }} = (r_1 \backslash_r s) \cdot r_2 
+	:: (\map \; (r_2 \backslash_r \_) \; (\textit{Suffix} \; s \; r1))$ 
+\end{lemma}
+\begin{proof}
+	By an induction on the string $s$, where the inductive cases 
+	are split as $[]$ and $xs @ [x]$.
+	Note the key identify holds:
+	\[
+		\map \; (r_2 \backslash_r \_) \; (\vsuf{[x]}{(r_1 \backslash_r xs)}) \;\; @ \;\;
+		\map \; (\_ \backslash_r x) \; (\map \; (r_2 \backslash \_) \; (\vsuf{xs}{r_1}))
+	\]
+		=
+	\[
+		\map \; (r_2 \backslash_r \_) \; (\vsuf{xs @ [x]}{r_1})
+	\]
+	This enables the inductive case to go through.
+\end{proof}
+\noindent 
+Note that this lemma does $\mathbf{not}$ depend on any
+specific definitions we used,
+allowing people investigating derivatives to get an alternative
+view of what $r_1 \cdot r_2$ is.
 
-$r_1 \cdot r_2 \quad \longrightarrow_{\backslash c}  \quad   r_1  \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\
-$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c''   \longrightarrow_{\backslash c''} \quad
- \ldots$
-
-\end{center}
-%TODO: cite indian paper
-Indianpaper have  come up with a slightly more formal way of putting the above process:
+Now we are able to use this for the intuition that 
+the different ways in which regular expressions are 
+nested do not matter under $\rsimp{}$:
 \begin{center}
-$r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) +
-\delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots
-+ \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$
+	$\rsimp{r} \stackrel{?}{\sequal} \rsimp{r'}$ if $r = \sum [r_1, r_2, r_3, \ldots]$ 
+and $r' =(\ldots ((r_1 + r_2) + r_3) + \ldots)$
 \end{center}
-where  $\delta(b, r)$ will produce $r$
-if $b$ evaluates to true, 
-and $\ZERO$ otherwise.
+Simply wrap with $\sum$ constructor and add 
+simplifications to both sides of \ref{seqSfau0}
+and one gets
+\begin{corollary}\label{seqClosedFormGeneral}
+	$\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }
+	=\rsimp{(\sum (  (r_1 \backslash s) \cdot r_2 :: 
+		\map\; (r_2 \backslash \_) \; (\vsuf{s}{r_1})))}$
+\end{corollary}
+Together with the idempotency property of $\rsimp{}$ (lemma \ref{rsimpIdem}),
+it is possible to convert the above lemma to obtain a "closed form"
+for  derivatives nested with simplification:
+\begin{lemma}\label{seqClosedForm}
+	$\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{(\sum ((r_1 \backslash s) \cdot r_2 ) 
+	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))}$
+\end{lemma}
+\begin{proof}
+	By a case analysis of string $s$.
+	When $s$ is empty list, the rewrite is straightforward.
+	When $s$ is a list, one could use the corollary \ref{seqSfau0},
+	and lemma \ref{Simpders} to rewrite the left-hand-side.
+\end{proof}
+As a corollary for this closed form, one can estimate the size 
+of the sequence derivative $r_1 \cdot r_2 \backslash_r s$ using 
+an easier-to-handle expression:
+\begin{corollary}\label{seqEstimate1}
+	\begin{center}
 
- But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical
- equivalence. To make this intuition useful 
- for a formal proof, we need something
-stronger than language equivalence.
-With the help of $\sflat{\_}$ we can state the equation in Indianpaper
-more rigorously:
-\begin{lemma}
-$\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
-\end{lemma}
-
-The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:
+	$\llbracket \rderssimp{(r_1 \cdot r_2)}{s} \rrbracket_r = \llbracket \rsimp{(\sum ((r_1 \backslash s) \cdot r_2 ) 
+	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))} \rrbracket_r$
+    	
+	\end{center}
+\end{corollary}
+\noindent
+\subsection{Closed Forms for Star Regular Expressions}
+We use a similar technique as $r_1 \cdot r_2$ case,
+generating 
+all possible sub-strings $s'$ of $s$
+such that $r\backslash s' \cdot r^*$ will appear 
+as a term in $(r^*) \backslash s$.
+The first function we define is a single-step
+updating function $\starupdate$, which takes three arguments as input:
+the new character $c$ to take derivative with, 
+the regular expression
+$r$ directly under the star $r^*$, and the
+list of strings $sSet$ for the derivative $r^* \backslash s$ 
+up til this point  
+such that $(r^*) \backslash s = \sum_{s' \in sSet} (r\backslash s') \cdot r^*$ 
+(the equality is not exact, more on this later).
+\begin{center}
+	\begin{tabular}{lcl}
+		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
+		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
+						     & & $\textit{if} \; 
+						     (\rnullable \; (\rders \; r \; s))$ \\
+						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
+						     \starupdate \; c \; r \; Ss)$ \\
+						     & & $\textit{else} \;\; (s @ [c]) :: (
+						     \starupdate \; c \; r \; Ss)$
+	\end{tabular}
+\end{center}
+\noindent
+As a generalisation from characters to strings,
+$\starupdates$ takes a string instead of a character
+as the first input argument, and is otherwise the same
+as $\starupdate$.
+\begin{center}
+	\begin{tabular}{lcl}
+		$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
+		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
+		\starupdate \; c \; r \; Ss)$
+	\end{tabular}
+\end{center}
+\noindent
+For the star regular expression,
+its derivatives can be seen as  a nested gigantic
+alternative similar to that of sequence regular expression's derivatives, 
+and therefore need
+to be ``straightened out" as well.
+The function for this would be $\hflat{}$ and $\hflataux{}$.
+\begin{center}
+	\begin{tabular}{lcl}
+		$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
+		$\hflataux{r}$ & $\dn$ & $[r]$
+	\end{tabular}
+\end{center}
 
 \begin{center}
-\begin{tabular}{lcl}
-$\vsuf{[]}{\_} $ & $=$ &  $[]$\\
-$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
-                                     && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
-\end{tabular}
-\end{center}                   
-It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable,
-and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in
-the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$.
-In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing 
-the entire result of  $(r_1 \cdot r_2) \backslash s$, 
-it only stores all the $s''$ such that $r_2 \backslash s''$  are occurring terms in $(r_1\cdot r_2)\backslash s$.
+	\begin{tabular}{lcl}
+		$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\
+		$\hflat{r}$ & $\dn$ & $r$
+	\end{tabular}
+\end{center}
+\noindent
+%MAYBE TODO: introduce createdByStar
+We first introduce an inductive property
+for $\starupdate$ and $\hflataux{\_}$, 
+it says if we do derivatives of $r^*$
+with a string that starts with $c$,
+then flatten it out,
+we obtain a list
+of the shape $\sum_{s' \in sSet} (r\backslash_r s') \cdot r^*$,
+where $sSet = \starupdates \; s \; r \; [[c]]$.
+\begin{lemma}\label{starHfauInduct}
+	$\hflataux{(\rders{( (\rder{c}{r_0})\cdot(r_0^*))}{s})} = 
+	\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; 
+	(\starupdates \; s \; r_0 \; [[c]])$
+\end{lemma}
+\begin{proof}
+	By an induction on $s$, the inductive cases
+	being $[]$ and $s@[c]$.
+\end{proof}
+\noindent
+Here is a corollary that states the lemma in
+a more intuitive way:
+\begin{corollary}
+	$\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot
+	(r^*))\; (\starupdates \; c\; r\; [[c]])$
+\end{corollary}
+\noindent
+Note that this is also agnostic of the simplification
+function we defined, and is therefore of more general interest.
+
+Now adding the $\rsimp{}$ bit for closed forms,
+we have
+\begin{lemma}
+	$a :: rs \grewrites \hflataux{a} @ rs$
+\end{lemma}
+\noindent
+giving us
+\begin{lemma}\label{cbsHfauRsimpeq1}
+	$\rsimp{a+b} = \rsimp{(\sum \hflataux{a} @ \hflataux{b})}$.
+\end{lemma}
+\noindent
+This yields
+\begin{lemma}\label{hfauRsimpeq2}
+$\rsimp{r} = \rsimp{(\sum \hflataux{r})}$
+\end{lemma}
+\noindent
+Together with the rewriting relation
+\begin{lemma}\label{starClosedForm6Hrewrites}
+	$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss
+	\scfrewrites
+	 \map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$
+\end{lemma}
+\noindent
+We obtain the closed form for star regular expression:
+\begin{lemma}\label{starClosedForm}
+	$\rderssimp{r^*}{c::s} = 
+	\rsimp{
+		(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
+			(\starupdates \; s\; r \; [[c]])
+		      )
+		)
+	      }
+	$
+\end{lemma}
+\begin{proof}
+	By an induction on $s$.
+	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, and \ref{hfauRsimpeq2}
+	are used.	
+\end{proof}
+\section{Estimating the Closed Forms' sizes}
+We now summarize the closed forms below:
+\begin{itemize}
+	\item
+	$\rderssimp{(\sum rs)}{s} \sequal
+	 \sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
+	\item
+	$\rderssimp{(r_1 \cdot r_2)}{s} \sequal \sum ((r_1 \backslash s) \cdot r_2 ) 
+	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1}))$
+	\item
+
+	$\rderssimp{r^*}{c::s} = 
+	\rsimp{
+		(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
+			(\starupdates \; s\; r \; [[c]])
+		      )
+		)
+	      }
+	$
+\end{itemize}	
+\noindent	
+The closed forms on the left-hand-side
+are all of the same shape: $\rsimp{ (\sum rs)} $.
+Such regular expression will be bounded by the size of $\sum rs'$, 
+where every element in $rs'$ is distinct, and each element 
+can be described by some inductive sub-structures 
+(for example when $r = r_1 \cdot r_2$ then $rs'$ 
+will be solely comprised of $r_1 \backslash s'$ 
+and $r_2 \backslash s''$, $s'$ and $s''$ being 
+sub-strings of $s$).
+which will each have a size uppder bound 
+according to inductive hypothesis, which controls $r \backslash s$.
 
-With this we can also add simplifications to both sides and get
+We elaborate the above reasoning by a series of lemmas
+below, where straightforward proofs are omitted.
 \begin{lemma}
-$\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
+	If $\forall r \in rs. \rsize{r} $ is less than or equal to $N$,
+	and $\textit{length} \; rs$ is less than or equal to $l$,
+	then $\rsize{\sum rs}$ is less than or equal to $l*N + 1$.
+\end{lemma}
+\noindent
+If we define all regular expressions with size no
+more than $N$ as $\sizeNregex \; N$:
+\[
+	\sizeNregex \; N \dn  \{r \mid \rsize{r} \leq N \}
+\]
+Then such set is finite:
+\begin{lemma}\label{finiteSizeN}
+	$\textit{isFinite}\; (\sizeNregex \; N)$
+\end{lemma}
+\begin{proof}
+	By overestimating the set $\sizeNregex \; N + 1$
+	using union of sets like
+	$\{r_1 \cdot r_2 \mid r_1 \in A
+		\text{and}
+		r_2 \in A\}
+	$ where $A = \sizeNregex \; N$.
+\end{proof}
+\noindent
+From this we get a corollary that
+if forall $r \in rs$, $\rsize{r} \leq N$, then the output of 
+$\rdistinct{rs}{\varnothing}$ is a list of regular
+expressions of finite size depending on $N$ only. 
+\begin{corollary}
+	Assumes that for all $r \in rs. \rsize{r} \leq N$,
+	and the cardinality of $\sizeNregex \; N$ is $c_N$
+	then$\rsize{\rdistinct{rs}{\varnothing}} \leq c*N$.
+\end{corollary}
+\noindent
+We have proven that the output of $\rdistinct{rs'}{\varnothing}$
+is bounded by a constant $c_N$ depending only on $N$,
+provided that each of $rs'$'s element
+is bounded by $N$.
+We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
+
+We show how $\rdistinct$ and $\rflts$
+in the simplification function together is at least as 
+good as $\rdistinct{}{}$ alone.
+\begin{lemma}\label{interactionFltsDB}
+	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
+	\leq 
+	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
 \end{lemma}
-Together with the idempotency property of $\rsimp{}$\ref{rsimpIdem},
-%TODO: state the idempotency property of rsimp
-it is possible to convert the above lemma to obtain a "closed form"
-for  derivatives nested with simplification:
-\begin{lemma}
-$\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
+\noindent
+The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
+duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
+
+Now this $\rsimp{\sum rs}$ can be estimated using $\rdistinct{rs}{\varnothing}$:
+\begin{lemma}\label{altsSimpControl}
+	$\rsize{\rsimp{\sum rs}} \leq \rsize{\rdistinct{rs}{\varnothing}}+ 1$
 \end{lemma}
-And now the reason we have (1) in section 1 is clear:
-$\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, 
-is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$
-    as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$.
+\begin{proof}
+	By using \ref{interactionFltsDB}.
+\end{proof}
+\noindent
+which says that the size of regular expression
+is always smaller if we apply the full simplification
+rather than just one component ($\rdistinct{}{}$).
+
+
+Now we are ready to control the sizes of
+$r_1 \cdot r_2 \backslash s$, $r^* \backslash s$.
+\begin{theorem}
+For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
+\end{theorem}
+\noindent
+\begin{proof}
+We prove this by induction on $r$. The base cases for $\RZERO$,
+$\RONE $ and $\RCHAR{c}$ are straightforward. 
+In the sequence $r_1 \cdot r_2$ case,
+the inductive hypotheses state $\exists N_1. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket \leq N_1$ and
+$\exists N_2. \forall s. \; \llbracket \rderssimp{r_2}{s} \rrbracket \leq N_2$. We can reason as follows
+%
+\begin{center}
+\begin{tabular}{lcll}
+& & $ \llbracket   \rderssimp{r_1\cdot r_2 }{s} \rrbracket_r $\\
+& $ = $ & $\llbracket \rsimp{(\sum(r_1 \backslash s \cdot r_2 \; \;  :: \; \; 
+\map \; (\lambda s'. r_2\backslash_{rsimp} s') (\vsuf{s}{r})))} \rrbracket_r $ & (1) \\
+& $\leq$ & $\llbracket \rsimp{(\sum(r_1 \backslash s \cdot r_2 ::
+\map \; (\lambda s'. r_2\backslash_{rsimp} s') (\vsuf{s}{r})))} \rrbracket_r $ & (2) \\
+									 
+\end{tabular}
+\end{center}
+
+\end{proof}
+
+\noindent
+where in (1) the $\textit{Suffix}( r_1, s)$ are the all the suffixes of $s$ where $\rderssimp{ r_1}{s'}$ is nullable ($s'$ being a suffix of $s$).
+The reason why we could write the derivatives of a sequence this way is described in section 2.
+The term (2) is used to control (1). 
+That is because one can obtain an overall
+smaller regex list
+by flattening it and removing $\ZERO$s first before applying $\distinctWith$ on it.
+Section 3 is dedicated to its proof.
+In (3) we know that  $\llbracket \ASEQ{bs}{(\rderssimp{ r_1}{s}}{r_2}\rrbracket$ is 
+bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
+than $N_2$. The list length after  $\distinctWith$ is bounded by a number, which we call $l_{N_2}$. It stands
+for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
+We reason similarly for  $\STAR$.\medskip
+
+%-----------------------------------
+%	SECTION 2
+%-----------------------------------
+
 
 %----------------------------------------------------------------------------------------
 %	SECTION 3
 %----------------------------------------------------------------------------------------
 
-\section{interaction between $\distinctWith$ and $\flts$}
-Note that it is not immediately obvious that 
-\begin{center}
-$\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $.
-\end{center}
-The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
-duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
-
 
 \subsection{A Closed Form for the Sequence Regular Expression}
-\begin{quote}\it
-	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
-	\begin{center}
-		$ \rderssimp{r_1 \cdot r_2}{s} = 
-	\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
-	\end{center}
-\end{quote}
 \noindent
 
 Before we get to the proof that says the intermediate result of our lexer will
@@ -1215,57 +1534,6 @@
 but this time we will have stronger equalities established.
 
 
-\section{Estimating the Closed Forms' sizes}
-
-		The closed form $f\; (g\; (\sum rs)) $ is controlled
-		by the size of $rs'$, where every element in $rs'$ is distinct, and each element can be described by som e inductive sub-structures (for example when $r = r_1 \cdot r_2$ then $rs'$ will be solely comprised of $r_1 \backslash s'$ and $r_2 \backslash s''$, $s'$ and $s''$ being sub-strings of $s$), which will have a numeric uppder bound according to inductive hypothesis, which controls $r \backslash s$.
-%-----------------------------------
-%	SECTION 2
-%-----------------------------------
-
-\begin{theorem}
-For any regex $r$, $\exists N_r. \forall s. \; \llbracket{\bderssimp{r}{s}}\rrbracket \leq N_r$
-\end{theorem}
-
-\noindent
-\begin{proof}
-We prove this by induction on $r$. The base cases for $\AZERO$,
-$\AONE \textit{bs}$ and $\ACHAR \textit{bs} c$ are straightforward. The interesting case is
-for sequences of the form $\ASEQ{bs}{r_1}{r_2}$. In this case our induction
-hypotheses state $\exists N_1. \forall s. \; \llbracket \bderssimp{r}{s} \rrbracket \leq N_1$ and
-$\exists N_2. \forall s. \; \llbracket \bderssimp{r_2}{s} \rrbracket \leq N_2$. We can reason as follows
-%
-\begin{center}
-\begin{tabular}{lcll}
-& & $ \llbracket   \bderssimp{\ASEQ{bs}{r_1}{r_2} }{s} \rrbracket$\\
-& $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;(\ASEQ{nil}{\bderssimp{ r_1}{s}}{ r_2} ::
-    [\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)]))\rrbracket $ & (1) \\
-& $\leq$ &
-    $\llbracket\textit{\distinctWith}\,((\ASEQ{nil}{\bderssimp{r_1}{s}}{r_2}$) ::
-    [$\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ & (2) \\
-& $\leq$ & $\llbracket\ASEQ{bs}{\bderssimp{ r_1}{s}}{r_2}\rrbracket +
-             \llbracket\textit{distinctWith}\,[\bderssimp{r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)]\,\approx\;{}\rrbracket + 1 $ & (3) \\
-& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
-      \llbracket \distinctWith\,[ \bderssimp{r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)] \,\approx\;\rrbracket$ & (4)\\
-& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
-\end{tabular}
-\end{center}
-
-
-\noindent
-where in (1) the $\textit{Suffix}( r_1, s)$ are the all the suffixes of $s$ where $\bderssimp{ r_1}{s'}$ is nullable ($s'$ being a suffix of $s$).
-The reason why we could write the derivatives of a sequence this way is described in section 2.
-The term (2) is used to control (1). 
-That is because one can obtain an overall
-smaller regex list
-by flattening it and removing $\ZERO$s first before applying $\distinctWith$ on it.
-Section 3 is dedicated to its proof.
-In (3) we know that  $\llbracket \ASEQ{bs}{(\bderssimp{ r_1}{s}}{r_2}\rrbracket$ is 
-bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
-than $N_2$. The list length after  $\distinctWith$ is bounded by a number, which we call $l_{N_2}$. It stands
-for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
-We reason similarly for  $\STAR$.\medskip
-\end{proof}
 
 What guarantee does this bound give us?