ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 557 812e5d112f49
parent 556 c27f04bb2262
child 558 671a83abccf3
equal deleted inserted replaced
556:c27f04bb2262 557:812e5d112f49
   766 %Rewriting steps not put in--too long and complicated-------------------------------
   766 %Rewriting steps not put in--too long and complicated-------------------------------
   767 \noindent
   767 \noindent
   768 We need more equalities like the above to enable a closed form,
   768 We need more equalities like the above to enable a closed form,
   769 but to proceed we need to introduce two rewrite relations,
   769 but to proceed we need to introduce two rewrite relations,
   770 to make things smoother.
   770 to make things smoother.
   771 \subsubsection{The rewrite relation $\hrewrite$, $\frewrite$ and $\grewrite$}
   771 \subsubsection{The rewrite relation $\hrewrite$ , $\scfrewrites$ , $\frewrite$ and $\grewrite$}
   772 Insired by the success we had in the correctness proof 
   772 Insired by the success we had in the correctness proof 
   773 in \ref{Bitcoded2}, where we invented
   773 in \ref{Bitcoded2}, where we invented
   774 a term rewriting system to capture the similarity between terms,
   774 a term rewriting system to capture the similarity between terms,
   775 we follow suit here defining simplification
   775 we follow suit here defining simplification
   776 steps as rewriting steps. This allows capturing 
   776 steps as rewriting steps. This allows capturing 
   777 similarities between terms that would be otherwise
   777 similarities between terms that would be otherwise
   778 hard to express.
   778 hard to express.
   779 
   779 
   780 We use $\hrewrite$ for one-step atomic rewrite of regular expression simplification, 
   780 We use $\hrewrite$ for one-step atomic rewrite of 
       
   781 regular expression simplification, 
   781 $\frewrite$ for rewrite of list of regular expressions that 
   782 $\frewrite$ for rewrite of list of regular expressions that 
   782 include all operations carried out in $\rflts$, and $\grewrite$ for
   783 include all operations carried out in $\rflts$, and $\grewrite$ for
   783 rewriting a list of regular expressions possible in both $\rflts$ and $\rdistinct{}{}$.
   784 rewriting a list of regular expressions possible in both $\rflts$ and $\rdistinct{}{}$.
   784 Their reflexive transitive closures are used to denote zero or many steps,
   785 Their reflexive transitive closures are used to denote zero or many steps,
   785 as was the case in the previous chapter.
   786 as was the case in the previous chapter.
   819 	\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum rs_a @ [r_1] @ rs_b @ [r_2] @ rsc \hrewrite \sum rs_a @ [r_1] @ rs_b @ rs_c}
   820 	\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum rs_a @ [r_1] @ rs_b @ [r_2] @ rsc \hrewrite \sum rs_a @ [r_1] @ rs_b @ rs_c}
   820 
   821 
   821 \end{mathpar}
   822 \end{mathpar}
   822 \end{center}
   823 \end{center}
   823 
   824 
       
   825 
       
   826 	List of rewrite rules for a list of regular expressions,
       
   827 	where each element can rewrite in many steps to the other (scf stands for
       
   828 	li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the 
       
   829 	$\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.
       
   830 
       
   831 \begin{center}
       
   832 \begin{mathpar}
       
   833 	\inferrule{}{[] \scfrewrites [] }
       
   834 	\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
       
   835 \end{mathpar}
       
   836 \end{center}
   824 %frewrite
   837 %frewrite
   825 	List of one-step rewrite rules for flattening 
   838 	List of one-step rewrite rules for flattening 
   826 	a list of  regular expressions($\frewrite$):
   839 	a list of  regular expressions($\frewrite$):
   827 \begin{center}
   840 \begin{center}
   828 \begin{mathpar}
   841 \begin{mathpar}
   836 
   849 
   837 	Lists of one-step rewrite rules for flattening and de-duplicating
   850 	Lists of one-step rewrite rules for flattening and de-duplicating
   838 	a list of regular expressions ($\grewrite$):
   851 	a list of regular expressions ($\grewrite$):
   839 \begin{center}
   852 \begin{center}
   840 \begin{mathpar}
   853 \begin{mathpar}
   841 	\inferrule{}{\RZERO :: rs \frewrite rs \\}
   854 	\inferrule{}{\RZERO :: rs \grewrite rs \\}
   842 
   855 
   843 	\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}
   856 	\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}
   844 
   857 
   845 	\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
   858 	\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}
   846 
   859 
   847 	\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
   860 	\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
   848 \end{mathpar}
   861 \end{mathpar}
   849 \end{center}
   862 \end{center}
   850 
   863 
   851 \noindent
   864 \noindent
   852 The reason why we take the trouble of defining 
   865 The reason why we take the trouble of defining 
   853 two separate list rewriting definitions $\frewrite$ and $\grewrite$
   866 two separate list rewriting definitions $\frewrite$ and $\grewrite$
   854 is that sometimes $\grewrites$ is slightly too powerful
   867 is to separate the two stages of simplification: flattening and de-duplicating.
   855 for proving equivalence.
   868 Sometimes $\grewrites$ is slightly too powerful
       
   869 so we would rather use $\frewrites$ which makes certain rewriting steps 
       
   870 more straightforward to prove.
   856 For example, when proving the closed-form for the alternative regular expression,
   871 For example, when proving the closed-form for the alternative regular expression,
   857 one of the rewriting steps would be:
   872 one of the rewriting steps would be:
   858 \begin{lemma}
   873 \begin{lemma}
   859 	$\sum (\rDistinct \; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \; \varnothing) \sequal
   874 	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
   860 	 \sum (\rDistinct \; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \; \varnothing)
   875 	 \sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
   861 	 $
   876 	 $
   862 \end{lemma}
   877 \end{lemma}
   863 \noindent
   878 \noindent
   864 Proving this is by first showing 
   879 Proving this is by first showing 
   865 \begin{lemma}
   880 \begin{lemma}\label{earlyLaterDerFrewrites}
   866 	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
   881 	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
   867 \rflts \; (\map \; (\_ \backslash x) \; rs)$
   882 	\rflts \; (\map \; (\_ \backslash x) \; rs)$
   868 \end{lemma}
   883 \end{lemma}
   869 \noindent
   884 \noindent
   870 and then using lemma
   885 and then using lemma
   871 \begin{lemma}\label{frewritesSimpeq}
   886 \begin{lemma}\label{frewritesSimpeq}
   872 	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal 
   887 	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal 
   873 	\sum (\rDistinct \;  rs_2 \;  {})$.
   888 	\sum (\rDistinct \;  rs_2 \;  \varnothing)$.
   874 \end{lemma}
   889 \end{lemma}
   875 . But this trick will not work for $\grewrites$:
   890 \noindent
       
   891 is a piece of cake.
       
   892 But this trick will not work for $\grewrites$.
       
   893 For example, a rewriting step in proving
       
   894 closed forms is:
       
   895 \begin{center}
       
   896 $\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\
       
   897 $=$ \\
       
   898 $\rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}))} $
       
   899 \noindent
       
   900 \end{center}
       
   901 For this one would hope to have a rewriting relation between the two lists involved,
       
   902 similar to \ref{earlyLaterDerFrewrites}. However, it turns out that 
   876 \begin{center}
   903 \begin{center}
   877 $\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \;
   904 $\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \;
   878 (\_ \backslash x) \; rs) \; (\map \; (\_ \backslash x) \; rset)$
   905 (\_ \backslash x) \; rs) \; ( rset \backslash x)$
   879 \end{center}
   906 \end{center}
   880 \noindent
   907 \noindent
   881 does \emph{not} hold.
   908 does $\mathbf{not}$ hold in general.
   882 
   909 For this rewriting step we will introduce some slightly more cumbersome
   883 
   910 proof technique in later sections.
   884 
   911 The point is that $\frewrite$
   885 
   912 allows us to prove equivalence in a straightforward two-step method that is 
   886 %this is for closed form for alts section, talk about that later------------------------------
   913 not possible for $\grewrite$, thereby reducing the complexity of the entire proof.
   887 \begin{comment}
   914 
   888 \begin{center}
   915 
   889 $\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))} = 
   916 \subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
   890 \rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}))} $
   917 We present in the below lemma a few pairs of terms that are rewritable via 
   891 \end{center}
   918 $\grewrites$:
       
   919 \begin{lemma}\label{gstarRdistinctGeneral}
       
   920 	\begin{itemize}
       
   921 		\item
       
   922 			$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
       
   923 		\item
       
   924 			$rs \grewrites \rDistinct \; rs \; \varnothing$
       
   925 		\item
       
   926 			$rs_a @ (\rDistinct \; rs \; rs_a) \grewrites rs_a @ (\rDistinct \; 
       
   927 			rs \; (\{\RZERO\} \cup rs_a))$
       
   928 		\item
       
   929 			$rs \;\; @ \;\; \rDistinct \; rs_a \; rset \grewrites rs @  \rDistinct \; rs_a \;
       
   930 			(rest \cup rs)$
       
   931 
       
   932 	\end{itemize}
       
   933 \end{lemma}
       
   934 \noindent
       
   935 If a pair of terms $rs_1, rs_2$ are rewritable via $\grewrites$ to each other,
       
   936 then they are equivalent under $\rsimp{}$:
       
   937 \begin{lemma}\label{grewritesSimpalts}
       
   938 	If $rs_1 \grewrites rs_2$, then
       
   939 	we have the following equivalence hold:
       
   940 	\begin{itemize}
       
   941 		\item
       
   942 			$\sum rs_1 \sequal \sum rs_2$
       
   943 		\item
       
   944 			$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
       
   945 	\end{itemize}
       
   946 \end{lemma}
       
   947 \noindent
       
   948 Here are a few connecting lemmas showing that
       
   949 if a list of regular expressions can be rewritten using $\grewrites$ or $\frewrites $ or
       
   950 $\scfrewrites$,
       
   951 then an alternative constructor taking the list can also be rewritten using $\hrewrites$:
       
   952 \begin{lemma}
       
   953 	\begin{itemize}
       
   954 		\item
       
   955 			If $rs \grewrites rs'$ then $\sum rs \hrewrites \sum rs'$.
       
   956 		\item
       
   957 			If $rs \grewrites rs'$ then $\sum rs \hrewrites \rsimpalts \; rs'$
       
   958 		\item
       
   959 			If $rs_1 \scfrewrites rs_2$ then $\sum (rs @ rs_1) \hrewrites \sum (rs @ rs_2)$
       
   960 		\item
       
   961 			If $rs_1 \scfrewrites rs_2$ then $\sum rs_1 \hrewrites \sum rs_2$
       
   962 
       
   963 	\end{itemize}
       
   964 \end{lemma}
       
   965 \noindent
       
   966 Here comes the meat of the proof, 
       
   967 which says that once two lists are rewritable to each other,
       
   968 then they are equivalent under $\rsimp{}$:
       
   969 \begin{lemma}
       
   970 	If $r_1 \hrewrites r_2$ then $r_1 \sequal r_2$.
       
   971 \end{lemma}
       
   972 
       
   973 \noindent
       
   974 And similar to \ref{Bitcoded2} one can preserve rewritability after taking derivative
       
   975 of two regular expressions on both sides:
       
   976 \begin{lemma}\label{interleave}
       
   977 	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$
       
   978 \end{lemma}
       
   979 \noindent
       
   980 This allows proving more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$ now.
       
   981 \begin{lemma}\label{insideSimpRemoval}
       
   982 	$\rsimp{\rder{c}{\rsimp{r}}} = \rsimp{\rder{c}{r}}  $
       
   983 \end{lemma}
       
   984 \noindent
       
   985 \begin{proof}
       
   986 	By \ref{interleave} and \ref{rsimpIdem}.
       
   987 \end{proof}
       
   988 \noindent
       
   989 And this unlocks more equivalent terms:
       
   990 \begin{lemma}\label{Simpders}
       
   991 	As corollaries of \ref{insideSimpRemoval}, we have
       
   992 	\begin{itemize}
       
   993 		\item
       
   994 			If $s \neq []$ then $\rderssimp{r}{s} = \rsimp{(\rders \; r \; s)}$.
       
   995 		\item
       
   996 			$\rsimpalts \; (\map \; (\_ \backslash_r x) \;
       
   997 				(\rdistinct{rs}{\varnothing})) \sequal
       
   998 			 \rsimpalts \; (\rDistinct \; 
       
   999 			 (\map \; (\_ \backslash_r x) rs) \;\varnothing  )$
       
  1000 	 \end{itemize}
       
  1001  \end{lemma}
       
  1002 \noindent
       
  1003 
       
  1004 Finally,
       
  1005 together with 
       
  1006 \begin{lemma}\label{rderRsimpAltsCommute}
       
  1007 	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
       
  1008 \end{lemma}
       
  1009 \noindent
       
  1010 this leads to the first closed form--
       
  1011 \begin{lemma}\label{altsClosedForm}
       
  1012 \begin{center}
       
  1013 	$\rderssimp{(\sum rs)}{s} \sequal
       
  1014 	 \sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
       
  1015 \end{center}
       
  1016 \end{lemma}
   892 	
  1017 	
   893 \noindent
  1018 \noindent
   894 This is not possible to prove 
  1019 \begin{proof}
   895 
  1020 	By a reverse induction on the string $s$.
   896 \end{comment}
  1021 	One rewriting step, as we mentioned earlier,
   897 %this is for closed form for alts section, talk about that later------------------------------
  1022 	involves
   898 And we define an "grewrite" relation that works on lists:
  1023 	\begin{center}
       
  1024 		$\rsimpalts \; (\map \; (\_ \backslash x) \; 
       
  1025 		(\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; 
       
  1026 		(\lambda r. \rderssimp{r}{xs}))))}{\varnothing}))
       
  1027 		\sequal
       
  1028 		\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; 
       
  1029 		(\rflts \; (\map \; (\rsimp{} \; \circ \; 
       
  1030 		(\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}) $.
       
  1031 	\end{center}
       
  1032 	This can be proven by a combination of 
       
  1033 	\ref{grewritesSimpalts}, \ref{gstarRdistinctGeneral}, \ref{rderRsimpAltsCommute}, and
       
  1034 	\ref{insideSimpRemoval}.
       
  1035 \end{proof}
       
  1036 \noindent
       
  1037 This closed form has a variant which can be more convenient in later proofs:
       
  1038 \begin{corollary}
       
  1039 	If $s \neq []$ then 
       
  1040 	$\rderssimp \; (\sum \; rs) \; s = 
       
  1041 	\rsimp{(\sum \; (\map \; \rderssimp{\_}{s} \; rs))}$.
       
  1042 \end{corollary}
       
  1043 \noindent
       
  1044 The harder closed forms are the sequence and star ones.
       
  1045 Before we go on to obtain them, some preliminary definitions
       
  1046 are needed to make proof statements concise.
       
  1047 
       
  1048 \section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
       
  1049 We note that the different ways in which regular expressions are 
       
  1050 nested do not matter under $\rsimp{}$:
       
  1051 \begin{center}
       
  1052 	To be proven:\\
       
  1053 	$\rsimp{r} \stackrel{?}{=} \rsimp{r'}$ if $r = \sum [r_1, r_2, r_3, \ldots]$ 
       
  1054 and $r' =(\ldots ((r_1 + r_2) + r_3) + \ldots)$
       
  1055 \end{center}
       
  1056 \noindent
       
  1057 This intuition is also echoed by IndianPaper, where they gave
       
  1058 a pencil-and-paper derivation of $(r_1 \cdot r_2)\backslash s$:
       
  1059 \begin{center}
       
  1060 	$((r_1 \cdot r_2) \backslash_r c_1) \stackrel{\backslash_r c_2}{=}
       
  1061 	((r_1 \backslash_r c_1) \cdot r_2 + (\delta\; (\rnullable \; r_1),\; r_2 \backslash_r c_1)) 
       
  1062 	\stackrel{\backslash_r c_2}{=}
       
  1063 	((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2)
       
  1064 	+ (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)
       
  1065 	$
       
  1066 \end{center}
       
  1067 \noindent
       
  1068 The $\delta \; b \; r$ function above turns the entire term into $\RZERO$
       
  1069 if the boolean condition $b$ evaluates to false, and outputs $r$ otherwise.
       
  1070 The equality in their derivation steps should be interpretated
       
  1071 as language equivalence. To pin down the intuition into rigorous terms,
       
  1072 $\sflat{}$ is used to enable the transformation from 
       
  1073 a left-associative nested sequence of alternatives into 
       
  1074 a flattened list:
       
  1075 $r = \sum [r_1, r_2, r_3, \ldots] \stackrel{\sflat{}}{\rightarrow} 
       
  1076 (\ldots ((r_1 + r_2) + r_3) + \ldots)$
       
  1077 The definitions $\sflat{}$, $\sflataux{}$ are given below.
       
  1078 
       
  1079  \begin{center}  
       
  1080  \begin{tabular}{ccc}
       
  1081  $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
       
  1082 $\sflataux{\AALTS{ }{[]}}$ & $ = $ & $ []$\\
       
  1083 $\sflataux r$ & $=$ & $ [r]$
       
  1084 \end{tabular}
       
  1085 \end{center}
       
  1086 
       
  1087  \begin{center} 
       
  1088  \begin{tabular}{ccc}
       
  1089 	 $\sflat{(\sum r :: rs)}$ & $=$ & $\sum (\sflataux{r} @ rs)$\\
       
  1090 	 $\sflat{\sum []}$ & $ = $ & $ \sum []$\\
       
  1091 $\sflat r$ & $=$ & $ r$
       
  1092 \end{tabular}
       
  1093 \end{center}
       
  1094 The intuition behind $\sflataux{\_}$ is to break up nested regexes 
       
  1095 of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape
       
  1096 into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$.
       
  1097 It will return the singleton list $[r]$ otherwise.
       
  1098 $\sflat{\_}$ works the same  as $\sflataux{\_}$, except that it keeps
       
  1099 the output type a regular expression, not a list.
       
  1100 $\sflataux{\_}$  and $\sflat{\_}$ is only recursive in terms of the
       
  1101  first element of the list of children of
       
  1102 an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for  dealing with the regular expression
       
  1103 $r_1 \cdot r_2 \backslash s$.
       
  1104 \subsection{The $\textit{vsuf}$ Function}
       
  1105 Note that $((r_1 \backslash_r c_1c_2 \cdot r_2 + (\delta \; (\rnullable) \; r_1, r_2 \backslash_r c_1c_2)
       
  1106 	+ (\delta \; (\rnullable \; r_1 \backslash_r c)\; r_2 \backslash_r c_2)$ does not faithfully
       
  1107 represent what would the intermediate derivatives would actually look like.
       
  1108 For example, when $r_1$ and $r_1 \backslash_r c_1$ are nullable,
       
  1109 the regular expression would not look like $(r_1 \backslash_r c_1c_2 + \RZERO ) + \RZERO$,
       
  1110 but actually $r_1\backslash_r c_1c_2$, the redundant $\RZERO$s will not be created in the
       
  1111 first place.
       
  1112 In a closed-form one would want to take into account this 
       
  1113 and generate the list of string pairs $(s', s'')$ where $s'@s'' = s$ such that
       
  1114 only $r_1 \backslash s'$ nullable.
       
  1115 With $\sflat{\_}$ and $\sflataux{\_}$ we make 
       
  1116 precise what  "closed forms" we have for the sequence derivatives and their simplifications,
       
  1117 in other words, how can we construct $(r_1 \cdot r_2) \backslash s$
       
  1118 and $\bderssimp{(r_1\cdot r_2)}{s}$,
       
  1119 if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$)
       
  1120 and  $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$  ranges over 
       
  1121 the substring of $s$?
       
  1122 First let's look at a series of derivatives steps on a sequence 
       
  1123 regular expression,  (assuming) that each time the first
       
  1124 component of the sequence is always nullable):
       
  1125 \begin{center}
       
  1126 
       
  1127 $r_1 \cdot r_2 \quad \longrightarrow_{\backslash c}  \quad   r_1  \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\
       
  1128 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c''   \longrightarrow_{\backslash c''} \quad
       
  1129  \ldots$
       
  1130 
       
  1131 \end{center}
       
  1132 %TODO: cite indian paper
       
  1133 Indianpaper have  come up with a slightly more formal way of putting the above process:
       
  1134 \begin{center}
       
  1135 $r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) +
       
  1136 \delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots
       
  1137 + \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$
       
  1138 \end{center}
       
  1139 where  $\delta(b, r)$ will produce $r$
       
  1140 if $b$ evaluates to true, 
       
  1141 and $\ZERO$ otherwise.
       
  1142 
       
  1143  But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical
       
  1144  equivalence. To make this intuition useful 
       
  1145  for a formal proof, we need something
       
  1146 stronger than language equivalence.
       
  1147 With the help of $\sflat{\_}$ we can state the equation in Indianpaper
       
  1148 more rigorously:
       
  1149 \begin{lemma}
       
  1150 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
       
  1151 \end{lemma}
       
  1152 
       
  1153 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:
       
  1154 
   899 \begin{center}
  1155 \begin{center}
   900 \begin{tabular}{lcl}
  1156 \begin{tabular}{lcl}
   901 $ \ZERO :: rs$ & $\rightarrow_g$ & $rs$
  1157 $\vsuf{[]}{\_} $ & $=$ &  $[]$\\
   902 \end{tabular}
  1158 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
   903 \end{center}
  1159                                      && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
   904 
  1160 \end{tabular}
   905 
  1161 \end{center}                   
   906 
  1162 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable,
   907 With these relations we prove
  1163 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in
       
  1164 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$.
       
  1165 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing 
       
  1166 the entire result of  $(r_1 \cdot r_2) \backslash s$, 
       
  1167 it only stores all the $s''$ such that $r_2 \backslash s''$  are occurring terms in $(r_1\cdot r_2)\backslash s$.
       
  1168 
       
  1169 With this we can also add simplifications to both sides and get
   908 \begin{lemma}
  1170 \begin{lemma}
   909 $rs \rightarrow rs'  \implies \RALTS{rs} \rightarrow \RALTS{rs'}$
  1171 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
   910 \end{lemma}
  1172 \end{lemma}
   911 which enables us to prove "closed forms" equalities such as
  1173 Together with the idempotency property of $\rsimp{}$\ref{rsimpIdem},
       
  1174 %TODO: state the idempotency property of rsimp
       
  1175 it is possible to convert the above lemma to obtain a "closed form"
       
  1176 for  derivatives nested with simplification:
   912 \begin{lemma}
  1177 \begin{lemma}
   913 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\suffix \; s \; r_1 ))}$
  1178 $\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
   914 \end{lemma}
  1179 \end{lemma}
   915 
  1180 And now the reason we have (1) in section 1 is clear:
   916 But the most involved part of the above lemma is proving the following:
  1181 $\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, 
   917 \begin{lemma}
  1182 is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$
   918 $\bsimp{\RALTS{rs @ \RALTS{rs_1} @ rs'}} = \bsimp{\RALTS{rs @rs_1 @ rs'}} $ 
  1183     as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$.
   919 \end{lemma}
  1184 
   920 which is needed in proving things like 
  1185 %----------------------------------------------------------------------------------------
   921 \begin{lemma}
  1186 %	SECTION 3
   922 $r \rightarrow_f r'  \implies \rsimp{r} \rightarrow \rsimp{r'}$
  1187 %----------------------------------------------------------------------------------------
   923 \end{lemma}
  1188 
   924 
  1189 \section{interaction between $\distinctWith$ and $\flts$}
   925 Fortunately this is provable by induction on the list $rs$
  1190 Note that it is not immediately obvious that 
       
  1191 \begin{center}
       
  1192 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $.
       
  1193 \end{center}
       
  1194 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
       
  1195 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
       
  1196 
   926 
  1197 
   927 \subsection{A Closed Form for the Sequence Regular Expression}
  1198 \subsection{A Closed Form for the Sequence Regular Expression}
   928 \begin{quote}\it
  1199 \begin{quote}\it
   929 	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
  1200 	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
   930 	\begin{center}
  1201 	\begin{center}
  1220 
  1491 
  1221 
  1492 
  1222 %----------------------------------------------------------------------------------------
  1493 %----------------------------------------------------------------------------------------
  1223 %	SECTION ??
  1494 %	SECTION ??
  1224 %----------------------------------------------------------------------------------------
  1495 %----------------------------------------------------------------------------------------
  1225 
       
  1226 \section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
       
  1227 To embark on getting the "closed forms" of regexes, we first
       
  1228 define a few auxiliary definitions to make expressing them smoothly.
       
  1229 
       
  1230  \begin{center}  
       
  1231  \begin{tabular}{ccc}
       
  1232  $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
       
  1233 $\sflataux{\AALTS{ }{[]}}$ & $ = $ & $ []$\\
       
  1234 $\sflataux r$ & $=$ & $ [r]$
       
  1235 \end{tabular}
       
  1236 \end{center}
       
  1237 The intuition behind $\sflataux{\_}$ is to break up nested regexes 
       
  1238 of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape
       
  1239 into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$.
       
  1240 It will return the singleton list $[r]$ otherwise.
       
  1241 
       
  1242 $\sflat{\_}$ works the same  as $\sflataux{\_}$, except that it keeps
       
  1243 the output type a regular expression, not a list:
       
  1244  \begin{center} 
       
  1245  \begin{tabular}{ccc}
       
  1246  $\sflat{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
       
  1247 $\sflat{\AALTS{ }{[]}}$ & $ = $ & $ \AALTS{ }{[]}$\\
       
  1248 $\sflat r$ & $=$ & $ [r]$
       
  1249 \end{tabular}
       
  1250 \end{center}
       
  1251 $\sflataux{\_}$  and $\sflat{\_}$ is only recursive in terms of the
       
  1252  first element of the list of children of
       
  1253 an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for  dealing with the regular expression
       
  1254 $r_1 \cdot r_2 \backslash s$.
       
  1255 
       
  1256 With $\sflat{\_}$ and $\sflataux{\_}$ we make 
       
  1257 precise what  "closed forms" we have for the sequence derivatives and their simplifications,
       
  1258 in other words, how can we construct $(r_1 \cdot r_2) \backslash s$
       
  1259 and $\bderssimp{(r_1\cdot r_2)}{s}$,
       
  1260 if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$)
       
  1261 and  $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$  ranges over 
       
  1262 the substring of $s$?
       
  1263 First let's look at a series of derivatives steps on a sequence 
       
  1264 regular expression,  (assuming) that each time the first
       
  1265 component of the sequence is always nullable):
       
  1266 \begin{center}
       
  1267 
       
  1268 $r_1 \cdot r_2 \quad \longrightarrow_{\backslash c}  \quad   r_1  \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\
       
  1269 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c''   \longrightarrow_{\backslash c''} \quad
       
  1270  \ldots$
       
  1271 
       
  1272 \end{center}
       
  1273 %TODO: cite indian paper
       
  1274 Indianpaper have  come up with a slightly more formal way of putting the above process:
       
  1275 \begin{center}
       
  1276 $r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) +
       
  1277 \delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots
       
  1278 + \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$
       
  1279 \end{center}
       
  1280 where  $\delta(b, r)$ will produce $r$
       
  1281 if $b$ evaluates to true, 
       
  1282 and $\ZERO$ otherwise.
       
  1283 
       
  1284  But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical
       
  1285  equivalence. To make this intuition useful 
       
  1286  for a formal proof, we need something
       
  1287 stronger than language equivalence.
       
  1288 With the help of $\sflat{\_}$ we can state the equation in Indianpaper
       
  1289 more rigorously:
       
  1290 \begin{lemma}
       
  1291 $\sflat{(r_1 \cdot r_2) \backslash s} = \RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
       
  1292 \end{lemma}
       
  1293 
       
  1294 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:
       
  1295 
       
  1296 \begin{center}
       
  1297 \begin{tabular}{lcl}
       
  1298 $\vsuf{[]}{\_} $ & $=$ &  $[]$\\
       
  1299 $\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
       
  1300                                      && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
       
  1301 \end{tabular}
       
  1302 \end{center}                   
       
  1303 It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable,
       
  1304 and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in
       
  1305 the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$.
       
  1306 In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r_1 \cdot r_2$, but instead of producing 
       
  1307 the entire result of  $(r_1 \cdot r_2) \backslash s$, 
       
  1308 it only stores all the $s''$ such that $r_2 \backslash s''$  are occurring terms in $(r_1\cdot r_2)\backslash s$.
       
  1309 
       
  1310 With this we can also add simplifications to both sides and get
       
  1311 \begin{lemma}
       
  1312 $\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
       
  1313 \end{lemma}
       
  1314 Together with the idempotency property of $\rsimp{}$\ref{rsimpIdem},
       
  1315 %TODO: state the idempotency property of rsimp
       
  1316 it is possible to convert the above lemma to obtain a "closed form"
       
  1317 for  derivatives nested with simplification:
       
  1318 \begin{lemma}
       
  1319 $\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
       
  1320 \end{lemma}
       
  1321 And now the reason we have (1) in section 1 is clear:
       
  1322 $\rsize{\rsimp{\RALTS{ (r_1 \backslash s) \cdot r_2 :: (\map \;(r_2 \backslash \_) \; (\vsuf{s}{r_1}))}}}$, 
       
  1323 is equal to $\rsize{\rsimp{\RALTS{ ((r_1 \backslash s) \cdot r_2 :: (\map \; (r_2 \backslash \_) \; (\textit{Suffix}(r1, s))))}}}$
       
  1324     as $\vsuf{s}{r_1}$ is one way of expressing the list $\textit{Suffix}( r_1, s)$.
       
  1325 
       
  1326 %----------------------------------------------------------------------------------------
       
  1327 %	SECTION 3
       
  1328 %----------------------------------------------------------------------------------------
       
  1329 
       
  1330 \section{interaction between $\distinctWith$ and $\flts$}
       
  1331 Note that it is not immediately obvious that 
       
  1332 \begin{center}
       
  1333 $\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $.
       
  1334 \end{center}
       
  1335 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
       
  1336 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
       
  1337 
       
  1338 
  1496 
  1339 %-----------------------------------
  1497 %-----------------------------------
  1340 %	SECTION syntactic equivalence under simp
  1498 %	SECTION syntactic equivalence under simp
  1341 %-----------------------------------
  1499 %-----------------------------------
  1342 \section{Syntactic Equivalence Under $\simp$}
  1500 \section{Syntactic Equivalence Under $\simp$}