ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 613 b0f0d884a547
parent 611 bc1df466150a
child 614 d5e9bcb384ec
equal deleted inserted replaced
612:8c234a1bc7e0 613:b0f0d884a547
     5 \label{Finite} 
     5 \label{Finite} 
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     8 %regex's derivatives. 
     8 %regex's derivatives. 
     9 
     9 
    10 In this chapter we give a guarantee in terms of size: 
    10 \begin{figure}
    11 given an annotated regular expression $a$, for any string $s$
       
    12 our algorithm $\blexersimp$'s internal annotated regular expression 
       
    13 size  is finitely bounded
       
    14 by a constant $N_a$ that only depends on $a$:
       
    15 \begin{center}
       
    16 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
       
    17 \end{center}
       
    18 \noindent
       
    19 where the size of an annotated regular expression is defined
       
    20 in terms of the number of nodes in its tree structure:
       
    21 \begin{center}
    11 \begin{center}
    22 	\begin{tabular}{ccc}
    12 	\begin{tabular}{ccc}
    23 		$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
    13 		$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
    24 		$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
    14 		$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
    25 		$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
    15 		$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
    26 		$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
    16 		$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
    27 		$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as   + 1$\\
    17 		$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as   + 1$\\
    28 		$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
    18 		$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
    29 	\end{tabular}
    19 	\end{tabular}
    30 \end{center}
    20 \end{center}
    31 We believe this size formalisation 
    21 \caption{The size function of bitcoded regular expressions}\label{brexpSize}
    32 of the algorithm is important in our context, because 
    22 \end{figure}
       
    23 In this chapter we give a guarantee in terms of size: 
       
    24 given an annotated regular expression $a$, for any string $s$
       
    25 our algorithm $\blexersimp$'s internal annotated regular expression 
       
    26 size  is finitely bounded
       
    27 by a constant $N_a$ that only depends on $a$:
       
    28 \begin{center}
       
    29 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
       
    30 \end{center}
       
    31 \noindent
       
    32 where the size of an annotated regular expression is defined
       
    33 in terms of the number of nodes in its tree structure (see figure \ref{brexpSize}).
       
    34 We believe this size bound
       
    35 is important in the context of POSIX lexing, because 
    33 \begin{itemize}
    36 \begin{itemize}
    34 	\item
    37 	\item
    35 		It is a stepping stone towards an ``absence of catastrophic-backtracking''
    38 		It is a stepping stone towards an ``absence of catastrophic-backtracking''
    36 		guarantee. The next step would be to refine the bound $N_a$ so that it
    39 		guarantee. 
       
    40 		If the internal data structures used by our algorithm cannot grow very large,
       
    41 		then our algorithm (which traverses those structures) cannot be too slow.
       
    42 		The next step would be to refine the bound $N_a$ so that it
    37 		is polynomial on $\llbracket a\rrbracket$.
    43 		is polynomial on $\llbracket a\rrbracket$.
    38 	\item
    44 	\item
    39 		The size bound proof gives us a higher confidence that
    45 		Having it formalised gives us a higher confidence that
    40 		our simplification algorithm $\simp$ does not ``mis-behave''
    46 		our simplification algorithm $\simp$ does not ``mis-behave''
    41 		like $\simpsulz$ does.
    47 		like $\simpsulz$ does.
    42 		The bound is universal, which is an advantage over work which 
    48 		The bound is universal, which is an advantage over work which 
    43 		only gives empirical evidence on some test cases.
    49 		only gives empirical evidence on some test cases.
    44 \end{itemize}
    50 \end{itemize}
    45 \section{Formalising About Size}
    51 \section{Formalising About Size}
    46 \noindent
    52 \noindent
    47 In our lexer $\blexersimp$,
    53 In our lexer ($\blexersimp$),
    48 The regular expression is repeatedly being taken derivative of
    54 we take an annotated regular expression as input,
    49 and then simplified.
    55 and repeately take derivative of and simplify it:
    50 \begin{figure}[H]
    56 \begin{figure}[H]
    51 	\begin{tikzpicture}[scale=2,
    57 	\begin{tikzpicture}[scale=2,
    52 		every node/.style={minimum size=11mm},
    58 		every node/.style={minimum size=11mm},
    53 		->,>=stealth',shorten >=1pt,auto,thick
    59 		->,>=stealth',shorten >=1pt,auto,thick
    54 		]
    60 		]
    73 	\end{tikzpicture}
    79 	\end{tikzpicture}
    74 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
    80 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
    75 \end{figure}
    81 \end{figure}
    76 \noindent
    82 \noindent
    77 Each time
    83 Each time
    78 a derivative is taken, a regular expression might grow a bit,
    84 a derivative is taken, the regular expression might grow.
    79 but simplification always takes care that 
    85 However, the simplification that is immediately afterwards will always shrink it so that 
    80 it stays small.
    86 it stays small.
    81 This intuition is depicted by the relative size
    87 This intuition is depicted by the relative size
    82 change between the black and blue nodes:
    88 change between the black and blue nodes:
    83 After $\simp$ the node always shrinks.
    89 After $\simp$ the node always shrinks.
    84 Our proof says that all the blue nodes
    90 Our proof says that all the blue nodes
    85 stay below a size bound $N_a$ determined by $a$.
    91 stay below a size bound $N_a$ determined by the input $a$.
    86 
    92 
    87 \noindent
    93 \noindent
    88 Sulzmann and Lu's assumed something similar about their algorithm,
    94 Sulzmann and Lu's assumed a similar picture about their algorithm,
    89 though in fact their algorithm's size might be better depicted by the following graph:
    95 though in fact their algorithm's size might be better depicted by the following graph:
    90 \begin{figure}[H]
    96 \begin{figure}[H]
    91 	\begin{tikzpicture}[scale=2,
    97 	\begin{tikzpicture}[scale=2,
    92 		every node/.style={minimum size=11mm},
    98 		every node/.style={minimum size=11mm},
    93 		->,>=stealth',shorten >=1pt,auto,thick
    99 		->,>=stealth',shorten >=1pt,auto,thick
   116 
   122 
   117 	\end{tikzpicture}
   123 	\end{tikzpicture}
   118 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
   124 	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
   119 \end{figure}
   125 \end{figure}
   120 \noindent
   126 \noindent
   121 That is, on certain cases their lexer 
   127 The picture means that on certain cases their lexer (where they use $\simpsulz$ 
       
   128 as the simplification function)
   122 will have an indefinite size explosion, causing the running time 
   129 will have an indefinite size explosion, causing the running time 
   123 of each derivative step to grow arbitrarily large (for example 
   130 of each derivative step to grow continuously (for example 
   124 in \ref{SulzmannLuLexerTime}).
   131 in \ref{SulzmannLuLexerTime}).
   125 The reason they made this mistake was that
   132 They tested out the run time of their
   126 they tested out the run time of their
       
   127 lexer on particular examples such as $(a+b+ab)^*$
   133 lexer on particular examples such as $(a+b+ab)^*$
   128 and generalised to all cases, which
   134 and claimed that their algorithm is linear w.r.t to the input.
   129 cannot happen with our mecahnised proof.\\
   135 With our mecahnised proof, we avoid this type of unintentional
   130 We give details of the proof in the next sections.
   136 generalisation.\\
       
   137 
       
   138 Before delving in to the details of the formalisation,
       
   139 we are going to provide an overview of it.
       
   140 In the next subsection, we draw a picture of the bird's eye view
       
   141 of the proof.
       
   142 
   131 
   143 
   132 \subsection{Overview of the Proof}
   144 \subsection{Overview of the Proof}
   133 Here is a bird's eye view of how the proof of finiteness works,
   145 Here is a bird's eye view of the main components of the finiteness proof,
   134 which involves three steps:
   146 which involves three steps:
   135 \begin{figure}[H]
   147 \begin{figure}[H]
   136 	\begin{tikzpicture}[scale=1,font=\bf,
   148 	\begin{tikzpicture}[scale=1,font=\bf,
   137 		node/.style={
   149 		node/.style={
   138 			rectangle,rounded corners=3mm,
   150 			rectangle,rounded corners=3mm,
   185 		The key observation is that $\distinctBy$'s output is
   197 		The key observation is that $\distinctBy$'s output is
   186 		a list with a constant length bound.
   198 		a list with a constant length bound.
   187 \end{itemize}
   199 \end{itemize}
   188 We will expand on these steps in the next sections.\\
   200 We will expand on these steps in the next sections.\\
   189 
   201 
   190 \section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions}
   202 \section{The $\textit{Rrexp}$ Datatype}
   191 The first step is to define 
   203 The first step is to define 
   192 $\textit{rrexp}$s.
   204 $\textit{rrexp}$s.
   193 They are without bitcodes,
   205 They are without bitcodes,
   194 allowing a much simpler size bound proof.
   206 allowing a much simpler size bound proof.
   195 Of course, the bits which encode the lexing information 
   207 Of course, the bits which encode the lexing information 
   238 		$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
   250 		$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
   239 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
   251 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
   240 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
   252 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
   241 	\end{tabular}
   253 	\end{tabular}
   242 \end{center}
   254 \end{center}
   243 \noindent
   255 
   244 $\textit{Rerase}$ throws away the bitcodes 
   256 \subsection{Why a New Datatype?}
   245 on the annotated regular expressions, 
       
   246 but keeps everything else intact.
       
   247 Therefore it does not change the size
       
   248 of an annotated regular expression:
       
   249 \begin{lemma}\label{rsizeAsize}
       
   250 	$\rsize{\rerase a} = \asize a$
       
   251 \end{lemma}
       
   252 \begin{proof}
       
   253 	By routine structural induction on $a$.
       
   254 \end{proof}
       
   255 \noindent
       
   256 
       
   257 \subsection{Motivation Behind a New Datatype}
       
   258 The reason we take all the trouble 
   257 The reason we take all the trouble 
   259 defining a new datatype is that $\erase$ makes things harder.
   258 defining a new datatype is that $\erase$ makes things harder.
   260 We initially started by using 
   259 We initially started by using 
   261 plain regular expressions and tried to prove
   260 plain regular expressions and tried to prove
   262 the lemma \ref{rsizeAsize},
   261 the lemma \ref{rsizeAsize},
   280 The  annotated regular expression $\sum[a, b, c]$ would turn into
   279 The  annotated regular expression $\sum[a, b, c]$ would turn into
   281 $(a+(b+c))$.
   280 $(a+(b+c))$.
   282 All these operations change the size and structure of
   281 All these operations change the size and structure of
   283 an annotated regular expressions, adding unnecessary 
   282 an annotated regular expressions, adding unnecessary 
   284 complexities to the size bound proof.\\
   283 complexities to the size bound proof.\\
   285 For example, if we define the size of a basic regular expression 
   284 For example, if we define the size of a basic plain regular expression 
   286 in the usual way,
   285 in the usual way,
   287 \begin{center}
   286 \begin{center}
   288 	\begin{tabular}{ccc}
   287 	\begin{tabular}{ccc}
   289 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
   288 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
   290 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
   289 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
   295 	\end{tabular}
   294 	\end{tabular}
   296 \end{center}
   295 \end{center}
   297 \noindent
   296 \noindent
   298 Then the property
   297 Then the property
   299 \begin{center}
   298 \begin{center}
   300 	$\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$
   299 	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
   301 \end{center}
   300 \end{center}
   302 does not hold.
   301 does not hold.
       
   302 With $\textit{rerase}$, however, 
       
   303 only the bitcodes are thrown away.
       
   304 Everything about the structure remains intact.
       
   305 Therefore it does not change the size
       
   306 of an annotated regular expression:
       
   307 \begin{lemma}\label{rsizeAsize}
       
   308 	$\rsize{\rerase a} = \asize a$
       
   309 \end{lemma}
       
   310 \begin{proof}
       
   311 	By routine structural induction on $a$.
       
   312 \end{proof}
       
   313 \noindent
   303 One might be able to prove an inequality such as
   314 One might be able to prove an inequality such as
   304 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
   315 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
   305 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
   316 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
   306 but we found our approach more straightforward.\\
   317 but we found our approach more straightforward.\\
   307 
   318 
   308 \subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$}
   319 \subsection{Functions for R-regular Expressions}
   309 The operations on r-regular expressions are 
   320 We shall define the r-regular expression version
   310 almost identical to those of the annotated regular expressions,
   321 of $\blexer$ and $\blexersimp$ related functions.
   311 except that no bitcodes are used. For example,
   322 We use $r$ as the prefix or subscript to differentiate
   312 the derivative operation becomes simpler:\\
   323 with the bitcoded version.
       
   324 For example,$\backslash_r$, $\rdistincts$, and $\rsimp$
       
   325 as opposed to $\backslash$, $\distinctBy$, and $\bsimp$.
       
   326 As promised, they are much simpler than their bitcoded counterparts.
       
   327 %The operations on r-regular expressions are 
       
   328 %almost identical to those of the annotated regular expressions,
       
   329 %except that no bitcodes are used. For example,
       
   330 The derivative operation becomes simpler:\\
   313 \begin{center}
   331 \begin{center}
   314 	\begin{tabular}{@{}lcl@{}}
   332 	\begin{tabular}{@{}lcl@{}}
   315 		$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\  
   333 		$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\  
   316 		$(\ONE)\,\backslash_r c$ & $\dn$ &
   334 		$(\ONE)\,\backslash_r c$ & $\dn$ &
   317 		$\textit{if}\;c=d\; \;\textit{then}\;
   335 		$\textit{if}\;c=d\; \;\textit{then}\;
   402 \end{center}
   420 \end{center}
   403 \noindent
   421 \noindent
   404 We do not define an r-regular expression version of $\blexersimp$,
   422 We do not define an r-regular expression version of $\blexersimp$,
   405 as our proof does not involve its use 
   423 as our proof does not involve its use 
   406 (and there is no bitcode to decode into a lexical value). 
   424 (and there is no bitcode to decode into a lexical value). 
   407 Everything about the size of annotated regular expressions
   425 Now we are ready to introduce how r-regular expressions allow
   408 can be calculated via the size of r-regular expressions:
   426 us to prove the size bound on bitcoded regular expressions.
       
   427 
       
   428 \subsection{Using R-regular Expressions to Bound Bit-coded Regular Expressions}
       
   429 Everything about the size of annotated regular expressions after the application
       
   430 of function $\bsimp$ and $\backslash_{simps}$
       
   431 can be calculated via the size of r-regular expressions after the application
       
   432 of $\rsimp$ and $\backslash_{rsimps}$:
   409 \begin{lemma}\label{sizeRelations}
   433 \begin{lemma}\label{sizeRelations}
   410 	The following equalities hold:
   434 	The following equalities hold:
   411 	\begin{itemize}
   435 	\begin{itemize}
   412 		\item
   436 		\item
   413 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
   437 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
   430 \begin{center}
   454 \begin{center}
   431 	$\rsize{\rderssimp{\rerase{a}}{s}} \leq N_r \quad$
   455 	$\rsize{\rderssimp{\rerase{a}}{s}} \leq N_r \quad$
   432 	implies
   456 	implies
   433 	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
   457 	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
   434 \end{center}
   458 \end{center}
       
   459 From now on we 
   435 Unless stated otherwise in the rest of this 
   460 Unless stated otherwise in the rest of this 
   436 chapter all regular expressions without
   461 chapter all regular expressions without
   437 bitcodes are seen as r-regular expressions ($\rrexp$s).
   462 bitcodes are seen as r-regular expressions ($\rrexp$s).
   438 For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
   463 For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
   439 we use the notation $r_1 + r_2$
   464 we use the notation $r_1 + r_2$
   468 %This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
   493 %This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
   469 %right hand side the "closed form" of $r\backslash s$.
   494 %right hand side the "closed form" of $r\backslash s$.
   470 %
   495 %
   471 %\begin{quote}\it
   496 %\begin{quote}\it
   472 %	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
   497 %	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
   473 %	\begin{center}
       
   474 %		$ \rderssimp{r_1 \cdot r_2}{s} = 
       
   475 %		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
       
   476 %	\end{center}
       
   477 %\end{quote}
   498 %\end{quote}
   478 %\noindent
   499 %\noindent
   479 %We explain in detail how we reached those claims.
   500 %We explain in detail how we reached those claims.
   480 \subsection{The Idea Behind Closed Forms}
       
   481 If we attempt to prove 
   501 If we attempt to prove 
   482 \begin{center}
   502 \begin{center}
   483 	$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
   503 	$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
   484 \end{center}
   504 \end{center}
   485 using a naive induction on the structure of $r$,
   505 using a naive induction on the structure of $r$,
   501 $(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
   521 $(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
   502 and therefore $N_{r_1}$ and $N_{r_2}$ in the
   522 and therefore $N_{r_1}$ and $N_{r_2}$ in the
   503 inductive hypotheses cannot be directly used.
   523 inductive hypotheses cannot be directly used.
   504 We have already seen that $(r_1 \cdot r_2)\backslash s$ 
   524 We have already seen that $(r_1 \cdot r_2)\backslash s$ 
   505 and $(r^*)\backslash s$ can grow in a wild way.
   525 and $(r^*)\backslash s$ can grow in a wild way.
       
   526 
   506 The point is that they will be equivalent to a list of
   527 The point is that they will be equivalent to a list of
   507 terms $\sum rs$, where each term in $rs$ will
   528 terms $\sum rs$, where each term in $rs$ will
   508 be made of $r_1 \backslash s' $, $r_2\backslash s'$,
   529 be made of $r_1 \backslash s' $, $r_2\backslash s'$,
   509 and $r \backslash s'$ with $s' \in \textit{SubString} \; s$.
   530 and $r \backslash s'$ with $s' \in \textit{SubString} \; s$.
   510 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
   531 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
   511 in the simplification which saves $rs$ from growing indefinitely.
   532 in the simplification which saves $rs$ from growing indefinitely.
   512 
   533 
   513 Based on this idea, we sketch a proof by first showing the equality (where
   534 Based on this idea, we develop a proof in two steps.
       
   535 First, we show the equality (where
   514 $f$ and $g$ are functions that do not increase the size of the input)
   536 $f$ and $g$ are functions that do not increase the size of the input)
   515 \begin{center}
   537 \begin{center}
   516 $(r_1 \cdot r_2)\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
   538 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
   517 \end{center}
   539 \end{center}
   518 and then show the right-hand-side can be finitely bounded.
   540 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
       
   541 For example, for $r_1 \cdot r_2$ we have the equality as
       
   542 	\begin{center}
       
   543 		$ \rderssimp{r_1 \cdot r_2}{s} = 
       
   544 		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
       
   545 	\end{center}
   519 We call the right-hand-side the 
   546 We call the right-hand-side the 
   520 \emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
   547 \emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
   521 We will flesh out the proof sketch in the next section.
   548 Second, we will bound the closed form of r-regular expressions
   522 
   549 using some estimation techniques
   523 \section{Details of Closed Forms and Bounds}
   550 and then piece it together
       
   551 with lemma \ref{sizeRelations} to show the bitcoded regular expressions
       
   552 in our $\blexersimp$ are finitely bounded.
       
   553 
       
   554 We will flesh out the first step of the proof we 
       
   555 sketched just now in the next section.
       
   556 
       
   557 \section{Closed Forms}
   524 In this section we introduce in detail
   558 In this section we introduce in detail
   525 how the closed forms are obtained for regular expressions'
   559 how the closed forms are obtained for regular expressions'
   526 derivatives and how they are bounded.
   560 derivatives.
   527 We start by proving some basic identities and inequalities
   561 We start by proving some basic identities
   528 involving the simplification functions for r-regular expressions.
   562 involving the simplification functions for r-regular expressions.
   529 After that we use these identities to establish the
   563 After that we introduce the rewrite relations
   530 closed forms we need.
   564 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
   531 Finally, we prove the functions such as $\flts$
   565 $\rightsquigarrow_f$ and $\rightsquigarrow_g$.
   532 will keep the size non-increasing.
   566 These relations involves similar techniques in chapter \ref{Bitcoded2}.
   533 Putting this together with a general bound 
   567 Finally, we use these identities to establish the
   534 on the finiteness of distinct regular expressions
   568 closed forms of the alternative regular expression,
   535 smaller than a certain size, we obtain a bound on 
   569 the sequence regular expression, and the star regular expression.
   536 the closed forms.
       
   537 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
   570 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
   538 
   571 
   539 
   572 
   540 
   573 
   541 \subsection{Some Basic Identities and Inequalities}
   574 \subsection{Some Basic Identities}
   542 
   575 
   543 In this subsection, we introduce lemmas 
   576 We now introduce lemmas 
   544 that are repeatedly used in later proofs.
   577 that are repeatedly used in later proofs.
   545 Note that for the $\textit{rdistinct}$ function there
   578 Note that for the $\textit{rdistinct}$ function there
   546 will be a lot of conversion from lists to sets.
   579 will be a lot of conversion from lists to sets.
   547 We use the name $set$ to refere to the 
   580 We use $set$ to refere to the 
   548 function that converts a list $rs$ to the set
   581 function that converts a list $rs$ to the set
   549 containing all the elements in $rs$.
   582 containing all the elements in $rs$.
   550 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
   583 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
   551 The $\textit{rdistinct}$ function, as its name suggests, will
   584 The $\textit{rdistinct}$ function, as its name suggests, will
   552 remove duplicates in an r-regular expression list.
   585 de-duplicate an r-regular expression list.
   553 It will also correctly exclude any elements that 
   586 It will also remove any elements that 
   554 is intially in the accumulator set.
   587 is already in the accumulator set.
   555 \begin{lemma}\label{rdistinctDoesTheJob}
   588 \begin{lemma}\label{rdistinctDoesTheJob}
   556 	%The function $\textit{rdistinct}$ satisfies the following
   589 	%The function $\textit{rdistinct}$ satisfies the following
   557 	%properties:
   590 	%properties:
   558 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
   591 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
   559 	recursive definition here, its Isabelle counterpart would be $\textit{distinct}$.} 
   592 	recursive definition here, its Isabelle counterpart would be $\textit{distinct}$.} 
   560 	readily defined
   593 	readily defined
   561 	for testing
   594 	for testing
   562 	whether a list's elements are all unique. Then the following
   595 	whether a list's elements are unique. Then the following
   563 	properties about $\textit{rdistinct}$ hold:
   596 	properties about $\textit{rdistinct}$ hold:
   564 	\begin{itemize}
   597 	\begin{itemize}
   565 		\item
   598 		\item
   566 			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
   599 			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
   567 		\item
   600 		\item
   579 	inductive cases of $\textit{rdistinct}$.
   612 	inductive cases of $\textit{rdistinct}$.
   580 
   613 
   581 \end{proof}
   614 \end{proof}
   582 
   615 
   583 \noindent
   616 \noindent
   584 $\textit{rdistinct}$ will cancel out all regular expression terms
   617 %$\textit{rdistinct}$ will out all regular expression terms
   585 that are in the accumulator, therefore prepending a list $rs_a$ with an arbitrary
   618 %that are in the accumulator, therefore 
   586 list $rs$ whose elements are all from the accumulator, and then call $\textit{rdistinct}$
   619 Concatenating a list $rs_a$ at the front of another
   587 on the resulting list, the output will be as if we had called $\textit{rdistinct}$
   620 list $rs$ whose elements are all from the accumulator, and then calling $\textit{rdistinct}$
       
   621 on the merged list, the output will be as if we had called $\textit{rdistinct}$
   588 without the prepending of $rs$:
   622 without the prepending of $rs$:
   589 \begin{lemma}\label{rdistinctConcat}
   623 \begin{lemma}\label{rdistinctConcat}
   590 	The elements appearing in the accumulator will always be removed.
   624 	The elements appearing in the accumulator will always be removed.
   591 	More precisely,
   625 	More precisely,
   592 	\begin{itemize}
   626 	\begin{itemize}
   670 \end{lemma}
   704 \end{lemma}
   671 \begin{proof}
   705 \begin{proof}
   672 	By \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
   706 	By \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
   673 \end{proof}
   707 \end{proof}
   674 \noindent
   708 \noindent
       
   709 The next lemma is a more general form of \ref{rdistinctConcat},
       
   710 it says that
   675 $\textit{rdistinct}$ is composable w.r.t list concatenation:
   711 $\textit{rdistinct}$ is composable w.r.t list concatenation:
   676 \begin{lemma}\label{distinctRdistinctAppend}
   712 \begin{lemma}\label{distinctRdistinctAppend}
   677 			If $\;\; \textit{isDistinct} \; rs_1$, 
   713 			If $\;\; \textit{isDistinct} \; rs_1$, 
   678 			and $(set \; rs_1) \cap acc = \varnothing$,
   714 			and $(set \; rs_1) \cap acc = \varnothing$,
   679 			then applying $\textit{rdistinct}$ on $rs_1 @ rs_a$ does not 
   715 			then applying $\textit{rdistinct}$ on $rs_1 @ rs_a$ does not 
   747 the alternative regular expression:
   783 the alternative regular expression:
   748 \begin{lemma}\label{rderRsimpAltsCommute}
   784 \begin{lemma}\label{rderRsimpAltsCommute}
   749 	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
   785 	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
   750 \end{lemma}
   786 \end{lemma}
   751 \noindent
   787 \noindent
   752 \subsubsection{$\textit{rsimp}$ Does Not Increment the Size}
       
   753 Although it seems evident, we need a series
       
   754 of non-trivial lemmas to establish this property.
       
   755 \begin{lemma}\label{rsimpMonoLemmas}
       
   756 	\mbox{}
       
   757 	\begin{itemize}
       
   758 		\item
       
   759 			\[
       
   760 				\llbracket \rsimpalts \; rs \rrbracket_r \leq
       
   761 				\llbracket \sum \; rs \rrbracket_r
       
   762 			\]
       
   763 		\item
       
   764 			\[
       
   765 				\llbracket \rsimpseq \; r_1 \;  r_2 \rrbracket_r \leq
       
   766 				\llbracket r_1 \cdot r_2 \rrbracket_r
       
   767 			\]
       
   768 		\item
       
   769 			\[
       
   770 				\llbracket \rflts \; rs \rrbracket_r  \leq
       
   771 				\llbracket rs \rrbracket_r 
       
   772 			\]
       
   773 		\item
       
   774 			\[
       
   775 				\llbracket \rDistinct \; rs \; ss \rrbracket_r  \leq
       
   776 				\llbracket rs \rrbracket_r 
       
   777 			\]
       
   778 		\item
       
   779 			If all elements $a$ in the set $as$ satisfy the property
       
   780 			that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
       
   781 			\llbracket a \rrbracket_r$, then we have 
       
   782 			\[
       
   783 				\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
       
   784 				(\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
       
   785 				\rrbracket \leq
       
   786 				\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
       
   787 				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r 
       
   788 			\]
       
   789 	\end{itemize}
       
   790 \end{lemma}
       
   791 \begin{proof}
       
   792 	Point 1, 3, 4 can be proven by an induction on $rs$.
       
   793 	Point 2 is by case analysis on $r_1$ and $r_2$.
       
   794 	The last part is a corollary of the previous ones.
       
   795 \end{proof}
       
   796 \noindent
       
   797 With the lemmas for each inductive case in place, we are ready to get 
       
   798 the non-increasing property as a corollary:
       
   799 \begin{corollary}\label{rsimpMono}
       
   800 	$\llbracket \textit{rsimp} \; r \rrbracket_r$
       
   801 \end{corollary}
       
   802 \begin{proof}
       
   803 	By \ref{rsimpMonoLemmas}.
       
   804 \end{proof}
       
   805 
       
   806 \subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
       
   807 Much like the definition of $L$ on plain regular expressions, one could also 
       
   808 define the language interpretation of $\rrexp$s.
       
   809 \begin{center}
       
   810 	\begin{tabular}{lcl}
       
   811 		$RL \; (\ZERO)$ & $\dn$ & $\phi$\\
       
   812 		$RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
       
   813 		$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
       
   814 		$RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
       
   815 		$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
       
   816 		$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
       
   817 	\end{tabular}
       
   818 \end{center}
       
   819 \noindent
       
   820 The main use of $RL$ is to establish some connections between $\rsimp{}$ 
       
   821 and $\rnullable{}$:
       
   822 \begin{lemma}
       
   823 	The following properties hold:
       
   824 	\begin{itemize}
       
   825 		\item
       
   826 			If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
       
   827 		\item
       
   828 			$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
       
   829 	\end{itemize}
       
   830 \end{lemma}
       
   831 \begin{proof}
       
   832 	The first part is by induction on $r$. 
       
   833 	The second part is true because property 
       
   834 	\[ RL \; r = RL \; (\rsimp{r})\] holds.
       
   835 \end{proof}
       
   836 
       
   837 \subsubsection{$\rsimp{}$ is Non-Increasing}
       
   838 In this subsection, we prove that the function $\rsimp{}$ does not 
       
   839 make the $\llbracket \rrbracket_r$ size increase.
       
   840 
       
   841 
       
   842 \begin{lemma}\label{rsimpSize}
       
   843 	$\llbracket \rsimp{r} \rrbracket_r \leq \llbracket r \rrbracket_r$
       
   844 \end{lemma}
       
   845 \subsubsection{Simplified $\textit{Rrexp}$s are Good}
       
   846 We formalise the notion of ``good" regular expressions,
       
   847 which means regular expressions that
       
   848 are not fully simplified. For alternative regular expressions that means they
       
   849 do not contain any nested alternatives like 
       
   850 \[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
       
   851 or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
       
   852 \begin{center}
       
   853 	\begin{tabular}{@{}lcl@{}}
       
   854 		$\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
       
   855 		$\good\; \RONE$ & $\dn$ & $\textit{true}$\\
       
   856 		$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
       
   857 		$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
       
   858 		$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
       
   859 		$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ & 
       
   860 		$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
       
   861 						   & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \,  \textit{and}\; \, \textit{nonAlt}\; r')$\\
       
   862 		$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
       
   863 		$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
       
   864 		$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
       
   865 		$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
       
   866 		$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
       
   867 	\end{tabular}
       
   868 \end{center}
       
   869 \noindent
       
   870 The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
       
   871 alternative, and false otherwise.
       
   872 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
       
   873 its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$, 
       
   874 and unique:
       
   875 \begin{lemma}\label{rsimpaltsGood}
       
   876 	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
       
   877 	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
       
   878 \end{lemma}
       
   879 \noindent
       
   880 We also note that
       
   881 if a regular expression $r$ is good, then $\rflts$ on the singleton
       
   882 list $[r]$ will not break goodness:
       
   883 \begin{lemma}\label{flts2}
       
   884 	If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
       
   885 \end{lemma}
       
   886 \begin{proof}
       
   887 	By an induction on $r$.
       
   888 \end{proof}
       
   889 \noindent
       
   890 The other observation we make about $\rsimp{r}$ is that it never
       
   891 comes with nested alternatives, which we describe as the $\nonnested$
       
   892 property:
       
   893 \begin{center}
       
   894 	\begin{tabular}{lcl}
       
   895 		$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
       
   896 		$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
       
   897 		$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
       
   898 		$\nonnested \; \, r $ & $\dn$ & $\btrue$
       
   899 	\end{tabular}
       
   900 \end{center}
       
   901 \noindent
       
   902 The $\rflts$ function
       
   903 always opens up nested alternatives,
       
   904 which enables $\rsimp$ to be non-nested:
       
   905 
       
   906 \begin{lemma}\label{nonnestedRsimp}
       
   907 	$\nonnested \; (\rsimp{r})$
       
   908 \end{lemma}
       
   909 \begin{proof}
       
   910 	By an induction on $r$.
       
   911 \end{proof}
       
   912 \noindent
       
   913 With this we could prove that a regular expressions
       
   914 after simplification and flattening and de-duplication,
       
   915 will not contain any alternative regular expression directly:
       
   916 \begin{lemma}\label{nonaltFltsRd}
       
   917 	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ 
       
   918 	then $\textit{nonAlt} \; x$.
       
   919 \end{lemma}
       
   920 \begin{proof}
       
   921 	By \ref{nonnestedRsimp}.
       
   922 \end{proof}
       
   923 \noindent
       
   924 The other thing we know is that once $\rsimp{}$ had finished
       
   925 processing an alternative regular expression, it will not
       
   926 contain any $\RZERO$s, this is because all the recursive 
       
   927 calls to the simplification on the children regular expressions
       
   928 make the children good, and $\rflts$ will not take out
       
   929 any $\RZERO$s out of a good regular expression list,
       
   930 and $\rdistinct{}$ will not mess with the result.
       
   931 \begin{lemma}\label{flts3Obv}
       
   932 	The following are true:
       
   933 	\begin{itemize}
       
   934 		\item
       
   935 			If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
       
   936 			then for all $r \in \rflts\; rs. \, \good \; r$.
       
   937 		\item
       
   938 			If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
       
   939 			and for all $y$ such that $\llbracket y \rrbracket_r$ less than
       
   940 			$\llbracket rs \rrbracket_r + 1$, either
       
   941 			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
       
   942 			then $\good \; x$.
       
   943 	\end{itemize}
       
   944 \end{lemma}
       
   945 \begin{proof}
       
   946 	The first part is by induction on $rs$, where the induction
       
   947 	rule is the inductive cases for $\rflts$.
       
   948 	The second part is a corollary from the first part.
       
   949 \end{proof}
       
   950 
       
   951 And this leads to good structural property of $\rsimp{}$,
       
   952 that after simplification, a regular expression is
       
   953 either good or $\RZERO$:
       
   954 \begin{lemma}\label{good1}
       
   955 	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
       
   956 \end{lemma}
       
   957 \begin{proof}
       
   958 	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
       
   959 	Lemma \ref{rsimpSize} says that 
       
   960 	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
       
   961 	$\llbracket r \rrbracket_r$.
       
   962 	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
       
   963 	Inductive hypothesis applies to the children regular expressions
       
   964 	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
       
   965 	by that as well.
       
   966 	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
       
   967 	to ensure that goodness is preserved at the topmost level.
       
   968 \end{proof}
       
   969 We shall prove that any good regular expression is 
       
   970 a fixed-point for $\rsimp{}$.
       
   971 First we prove an auxiliary lemma:
       
   972 \begin{lemma}\label{goodaltsNonalt}
       
   973 	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
       
   974 \end{lemma}
       
   975 \begin{proof}
       
   976 	By an induction on $\sum rs$. The inductive rules are the cases
       
   977 	for $\good$.
       
   978 \end{proof}
       
   979 \noindent
       
   980 Now we are ready to prove that good regular expressions are invariant
       
   981 of $\rsimp{}$ application:
       
   982 \begin{lemma}\label{test}
       
   983 	If $\good \;r$ then $\rsimp{r} = r$.
       
   984 \end{lemma}
       
   985 \begin{proof}
       
   986 	By an induction on the inductive cases of $\good$, using lemmas
       
   987 	\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
       
   988 	The lemma \ref{goodaltsNonalt} is used in the alternative
       
   989 	case where 2 or more elements are present in the list.
       
   990 \end{proof}
       
   991 \noindent
       
   992 Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
       
   993 which requires $\ref{good1}$ to go through smoothly.
       
   994 It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
       
   995 if it its output is concatenated with a list and then applied to $\rflts$.
       
   996 \begin{lemma}\label{flattenRsimpalts}
       
   997 	$\rflts \; ( (\rsimp_{ALTS} \; 
       
   998 	(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) :: 
       
   999 	\map \; \rsimp{} \; rs' ) = 
       
  1000 	\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
       
  1001 	\map \; \rsimp{rs'}))$
       
  1002 
       
  1003 
       
  1004 \end{lemma}
       
  1005 \begin{proof}
       
  1006 	By \ref{good1}.
       
  1007 \end{proof}
       
  1008 \noindent
       
  1009 
       
  1010 
       
  1011 
       
  1012 
       
  1013 
       
  1014 We are also ready to prove that $\textit{rsimp}$ is idempotent.
       
  1015 \subsubsection{$\rsimp$ is Idempotent}
       
  1016 The idempotency of $\rsimp$ is very useful in 
       
  1017 manipulating regular expression terms into desired
       
  1018 forms so that key steps allowing further rewriting to closed forms
       
  1019 are possible.
       
  1020 \begin{lemma}\label{rsimpIdem}
       
  1021 	$\rsimp{r} = \rsimp{\rsimp{r}}$
       
  1022 \end{lemma}
       
  1023 
       
  1024 \begin{proof}
       
  1025 	By \ref{test} and \ref{good1}.
       
  1026 \end{proof}
       
  1027 \noindent
       
  1028 This property means we do not have to repeatedly
       
  1029 apply simplification in each step, which justifies
       
  1030 our definition of $\blexersimp$.
       
  1031 
       
  1032 
       
  1033 On the other hand, we could repeat the same $\rsimp{}$ applications
       
  1034 on regular expressions as many times as we want, if we have at least
       
  1035 one simplification applied to it, and apply it wherever we would like to:
       
  1036 \begin{corollary}\label{headOneMoreSimp}
       
  1037 	The following properties hold, directly from \ref{rsimpIdem}:
       
  1038 
       
  1039 	\begin{itemize}
       
  1040 		\item
       
  1041 			$\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
       
  1042 		\item
       
  1043 			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
       
  1044 	\end{itemize}
       
  1045 \end{corollary}
       
  1046 \noindent
       
  1047 This will be useful in later closed form proof's rewriting steps.
       
  1048 Similarly, we point out the following useful facts below:
       
  1049 \begin{lemma}
       
  1050 	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
       
  1051 	\begin{itemize}
       
  1052 		\item
       
  1053 			If $r  = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
       
  1054 		\item
       
  1055 			If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
       
  1056 		\item
       
  1057 			$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
       
  1058 	\end{itemize}
       
  1059 \end{lemma}
       
  1060 \begin{proof}
       
  1061 	By application of lemmas \ref{rsimpIdem} and \ref{good1}.
       
  1062 \end{proof}
       
  1063 
       
  1064 \noindent
       
  1065 With the idempotency of $\rsimp{}$ and its corollaries, 
       
  1066 we can start proving some key equalities leading to the 
       
  1067 closed forms.
       
  1068 Now presented are a few equivalent terms under $\rsimp{}$.
       
  1069 We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
       
  1070 \begin{lemma}
       
  1071 	\begin{itemize}
       
  1072 		The following equivalence hold:
       
  1073 	\item
       
  1074 		$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
       
  1075 	\item
       
  1076 		$\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
       
  1077 	\item
       
  1078 		$\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
       
  1079 	\item
       
  1080 		$\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
       
  1081 	\item
       
  1082 		$\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
       
  1083 \end{itemize}
       
  1084 \end{lemma}
       
  1085 \begin{proof}
       
  1086 	By induction on the lists involved.
       
  1087 \end{proof}
       
  1088 \noindent
       
  1089 The above allows us to prove
       
  1090 two similar equalities (which are a bit more involved).
       
  1091 It says that we could flatten out the elements
       
  1092 before simplification and still get the same result.
       
  1093 \begin{lemma}\label{simpFlatten3}
       
  1094 	One can flatten the inside $\sum$ of a $\sum$ if it is being 
       
  1095 	simplified. Concretely,
       
  1096 	\begin{itemize}
       
  1097 		\item
       
  1098 			If for all $r \in rs, rs', rs''$, we have $\good \; r $
       
  1099 			or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal 
       
  1100 			\sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
       
  1101 		\item
       
  1102 			$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
       
  1103 	\end{itemize}
       
  1104 \end{lemma}
       
  1105 \begin{proof}
       
  1106 	By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
       
  1107 	The second sub-lemma is a corollary of the previous.
       
  1108 \end{proof}
       
  1109 %Rewriting steps not put in--too long and complicated-------------------------------
       
  1110 \begin{comment}
       
  1111 	\begin{center}
       
  1112 		$\rsimp{\sum (rs' @ rs @ rs'')}  \stackrel{def of bsimp}{=}$  \\
       
  1113 		$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
       
  1114 		$\stackrel{by \ref{test}}{=} 
       
  1115 		\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
       
  1116 		\varnothing})$\\
       
  1117 		$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
       
  1118 		\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
       
  1119 		\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
       
  1120 
       
  1121 	\end{center}
       
  1122 \end{comment}
       
  1123 %Rewriting steps not put in--too long and complicated-------------------------------
       
  1124 \noindent
       
  1125 We need more equalities like the above to enable a closed form,
   788 We need more equalities like the above to enable a closed form,
  1126 but to proceed we need to introduce two rewrite relations,
   789 for which we need to introduce a few rewrite relations
  1127 to make things smoother.
   790 to help
       
   791 us obtain them.
       
   792 
  1128 \subsection{The rewrite relation $\hrewrite$ , $\scfrewrites$ , $\frewrite$ and $\grewrite$}
   793 \subsection{The rewrite relation $\hrewrite$ , $\scfrewrites$ , $\frewrite$ and $\grewrite$}
  1129 Insired by the success we had in the correctness proof 
   794 Inspired by the success we had in the correctness proof 
  1130 in \ref{Bitcoded2}, where we invented
   795 in \ref{Bitcoded2},
  1131 a term rewriting system to capture the similarity between terms,
   796 we follow suit here, defining atomic simplification
  1132 we follow suit here defining simplification
   797 steps as ``small-step'' rewriting steps. This allows capturing 
  1133 steps as rewriting steps. This allows capturing 
       
  1134 similarities between terms that would be otherwise
   798 similarities between terms that would be otherwise
  1135 hard to express.
   799 hard to express.
  1136 
   800 
  1137 We use $\hrewrite$ for one-step atomic rewrite of 
   801 We use $\hrewrite$ for one-step atomic rewrite of 
  1138 regular expression simplification, 
   802 regular expression simplification, 
  1139 $\frewrite$ for rewrite of list of regular expressions that 
   803 $\frewrite$ for rewrite of list of regular expressions that 
  1140 include all operations carried out in $\rflts$, and $\grewrite$ for
   804 include all operations carried out in $\rflts$, and $\grewrite$ for
  1141 rewriting a list of regular expressions possible in both $\rflts$ and $\rdistinct{}{}$.
   805 rewriting a list of regular expressions possible in both $\rflts$ and $\textit{rdistinct}$.
  1142 Their reflexive transitive closures are used to denote zero or many steps,
   806 Their reflexive transitive closures are used to denote zero or many steps,
  1143 as was the case in the previous chapter.
   807 as was the case in the previous chapter.
  1144 The presentation will be more concise than that in \ref{Bitcoded2}.
   808 As we have already
       
   809 done something similar, the presentation about
       
   810 these rewriting rules will be more concise than that in \ref{Bitcoded2}.
  1145 To differentiate between the rewriting steps for annotated regular expressions
   811 To differentiate between the rewriting steps for annotated regular expressions
  1146 and $\rrexp$s, we add characters $h$ and $g$ below the squig arrow symbol
   812 and $\rrexp$s, we add characters $h$ and $g$ below the squig arrow symbol
  1147 to mean atomic simplification transitions 
   813 to mean atomic simplification transitions 
  1148 of $\rrexp$s and $\rrexp$ lists, respectively.
   814 of $\rrexp$s and $\rrexp$ lists, respectively.
  1149 
   815 
  1150 
   816 
  1151 
   817 
  1152 List of one-step rewrite rules for $\rrexp$ ($\hrewrite$):
   818 
  1153 
   819 \begin{figure}[H]
  1154 
       
  1155 \begin{center}
   820 \begin{center}
  1156 	\begin{mathpar}
   821 	\begin{mathpar}
  1157 		\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}
   822 		\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}
  1158 
   823 
  1159 		\inferrule[RSEQ0R]{}{r_1 \cdot \RZERO \hrewrite \RZERO\\}
   824 		\inferrule[RSEQ0R]{}{r_1 \cdot \RZERO \hrewrite \RZERO\\}
  1176 
   841 
  1177 		\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum rs_a @ [r_1] @ rs_b @ [r_2] @ rsc \hrewrite \sum rs_a @ [r_1] @ rs_b @ rs_c}
   842 		\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum rs_a @ [r_1] @ rs_b @ [r_2] @ rsc \hrewrite \sum rs_a @ [r_1] @ rs_b @ rs_c}
  1178 
   843 
  1179 	\end{mathpar}
   844 	\end{mathpar}
  1180 \end{center}
   845 \end{center}
  1181 
   846 \caption{List of one-step rewrite rules for r-regular expressions ($\hrewrite$)}\label{hRewrite}
  1182 
   847 \end{figure}
  1183 List of rewrite rules for a list of regular expressions,
   848 
       
   849 
       
   850 Like $\rightsquigarrow_s$, it is
       
   851 convenient to define rewrite rules for a list of regular expressions,
  1184 where each element can rewrite in many steps to the other (scf stands for
   852 where each element can rewrite in many steps to the other (scf stands for
  1185 li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the 
   853 li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the 
  1186 $\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.
   854 $\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.
  1187 
   855 
       
   856 \begin{figure}[H]
  1188 \begin{center}
   857 \begin{center}
  1189 	\begin{mathpar}
   858 	\begin{mathpar}
  1190 		\inferrule{}{[] \scfrewrites [] }
   859 		\inferrule{}{[] \scfrewrites [] }
       
   860 
  1191 		\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
   861 		\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
  1192 	\end{mathpar}
   862 	\end{mathpar}
  1193 \end{center}
   863 \end{center}
       
   864 \caption{List of one-step rewrite rules for a list of r-regular expressions}\label{scfRewrite}
       
   865 \end{figure}
  1194 %frewrite
   866 %frewrite
  1195 List of one-step rewrite rules for flattening 
   867 List of one-step rewrite rules for flattening 
  1196 a list of  regular expressions($\frewrite$):
   868 a list of  regular expressions($\frewrite$):
       
   869 \begin{figure}[H]
  1197 \begin{center}
   870 \begin{center}
  1198 	\begin{mathpar}
   871 	\begin{mathpar}
  1199 		\inferrule{}{\RZERO :: rs \frewrite rs \\}
   872 		\inferrule{}{\RZERO :: rs \frewrite rs \\}
  1200 
   873 
  1201 		\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}
   874 		\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}
  1202 
   875 
  1203 		\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
   876 		\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
  1204 	\end{mathpar}
   877 	\end{mathpar}
  1205 \end{center}
   878 \end{center}
       
   879 \caption{List of one-step rewrite rules characterising the $\rflts$ operation on a list}\label{fRewrites}
       
   880 \end{figure}
  1206 
   881 
  1207 Lists of one-step rewrite rules for flattening and de-duplicating
   882 Lists of one-step rewrite rules for flattening and de-duplicating
  1208 a list of regular expressions ($\grewrite$):
   883 a list of regular expressions ($\grewrite$):
       
   884 \begin{figure}[H]
  1209 \begin{center}
   885 \begin{center}
  1210 	\begin{mathpar}
   886 	\begin{mathpar}
  1211 		\inferrule{}{\RZERO :: rs \grewrite rs \\}
   887 		\inferrule{}{\RZERO :: rs \grewrite rs \\}
  1212 
   888 
  1213 		\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}
   889 		\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}
  1215 		\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}
   891 		\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}
  1216 
   892 
  1217 		\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
   893 		\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rsb @ rsc}
  1218 	\end{mathpar}
   894 	\end{mathpar}
  1219 \end{center}
   895 \end{center}
  1220 
   896 \caption{List of one-step rewrite rules characterising the $\rflts$ and $\textit{rdistinct}$
       
   897 operations}\label{gRewrite}
       
   898 \end{figure}
  1221 \noindent
   899 \noindent
  1222 We defined
   900 We defined
  1223 two separate list rewriting definitions $\frewrite$ and $\grewrite$.
   901 two separate list rewriting relations $\frewrite$ and $\grewrite$.
  1224 The rewriting steps that take place during
   902 The rewriting steps that take place during
  1225 flattening is characterised by $\frewrite$.
   903 flattening is characterised by $\frewrite$.
  1226 $\grewrite$ characterises both flattening and de-duplicating.
   904 $\grewrite$ characterises both flattening and de-duplicating.
  1227 Sometimes $\grewrites$ is slightly too powerful
   905 Sometimes $\grewrites$ is slightly too powerful
  1228 so we would rather use $\frewrites$ because we only
   906 so we would rather use $\frewrites$ to prove
  1229 need to prove about certain equivalence under the rewriting steps of $\frewrites$.
   907 %because we only
       
   908 equalities related to $\rflts$.
       
   909 %certain equivalence under the rewriting steps of $\frewrites$.
  1230 For example, when proving the closed-form for the alternative regular expression,
   910 For example, when proving the closed-form for the alternative regular expression,
  1231 one of the rewriting steps would be:
   911 one of the equalities needed is:
  1232 \begin{lemma}
   912 \begin{center}
  1233 	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
   913 	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
  1234 	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
   914 	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
  1235 	$
   915 	$
  1236 \end{lemma}
   916 \end{center}
  1237 \noindent
   917 \noindent
  1238 Proving this is by first showing 
   918 Proving this is by first showing 
  1239 \begin{lemma}\label{earlyLaterDerFrewrites}
   919 \begin{lemma}\label{earlyLaterDerFrewrites}
  1240 	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
   920 	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
  1241 	\rflts \; (\map \; (\_ \backslash x) \; rs)$
   921 	\rflts \; (\map \; (\_ \backslash x) \; rs)$
  1242 \end{lemma}
   922 \end{lemma}
  1243 \noindent
   923 \noindent
  1244 and then using lemma
   924 and then the equivalence between two terms
       
   925 that can reduce in many steps to each other.
  1245 \begin{lemma}\label{frewritesSimpeq}
   926 \begin{lemma}\label{frewritesSimpeq}
  1246 	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal 
   927 	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal 
  1247 	\sum (\rDistinct \;  rs_2 \;  \varnothing)$.
   928 	\sum (\rDistinct \;  rs_2 \;  \varnothing)$.
  1248 \end{lemma}
   929 \end{lemma}
  1249 \noindent
   930 \noindent
  1250 is a piece of cake.
   931 
       
   932 \begin{corollary}
       
   933 	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
       
   934 	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
       
   935 	$
       
   936 \end{corollary}
       
   937 
  1251 But this trick will not work for $\grewrites$.
   938 But this trick will not work for $\grewrites$.
  1252 For example, a rewriting step in proving
   939 For example, a rewriting step in proving
  1253 closed forms is:
   940 closed forms is:
  1254 \begin{center}
   941 \begin{center}
  1255 	$\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\
   942 	$\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\
  1266 \noindent
   953 \noindent
  1267 does $\mathbf{not}$ hold in general.
   954 does $\mathbf{not}$ hold in general.
  1268 For this rewriting step we will introduce some slightly more cumbersome
   955 For this rewriting step we will introduce some slightly more cumbersome
  1269 proof technique in later sections.
   956 proof technique in later sections.
  1270 The point is that $\frewrite$
   957 The point is that $\frewrite$
  1271 allows us to prove equivalence in a straightforward two-step method that is 
   958 allows us to prove equivalence in a straightforward way that is 
  1272 not possible for $\grewrite$, thereby reducing the complexity of the entire proof.
   959 not possible for $\grewrite$. 
  1273 
   960 
  1274 
   961 
  1275 \subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
   962 \subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
       
   963 In this part, we present lemmas stating
       
   964 pairs of r-regular expressions or r-regular expression lists
       
   965 where one could rewrite from one in many steps to the other.
       
   966 Most of the proofs to these lemmas are straightforward, using
       
   967 an induction on the inductive cases of the corresponding rewriting relations.
       
   968 These proofs will therefore be omitted when this is the case.
  1276 We present in the below lemma a few pairs of terms that are rewritable via 
   969 We present in the below lemma a few pairs of terms that are rewritable via 
  1277 $\grewrites$:
   970 $\grewrites$:
  1278 \begin{lemma}\label{gstarRdistinctGeneral}
   971 \begin{lemma}\label{gstarRdistinctGeneral}
       
   972 	\mbox{}
  1279 	\begin{itemize}
   973 	\begin{itemize}
  1280 		\item
   974 		\item
  1281 			$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
   975 			$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
  1282 		\item
   976 		\item
  1283 			$rs \grewrites \rDistinct \; rs \; \varnothing$
   977 			$rs \grewrites \rDistinct \; rs \; \varnothing$
  1293 \noindent
   987 \noindent
  1294 If a pair of terms $rs_1, rs_2$ are rewritable via $\grewrites$ to each other,
   988 If a pair of terms $rs_1, rs_2$ are rewritable via $\grewrites$ to each other,
  1295 then they are equivalent under $\rsimp{}$:
   989 then they are equivalent under $\rsimp{}$:
  1296 \begin{lemma}\label{grewritesSimpalts}
   990 \begin{lemma}\label{grewritesSimpalts}
  1297 	If $rs_1 \grewrites rs_2$, then
   991 	If $rs_1 \grewrites rs_2$, then
  1298 	we have the following equivalence hold:
   992 	we have the following equivalence:
  1299 	\begin{itemize}
   993 	\begin{itemize}
  1300 		\item
   994 		\item
  1301 			$\sum rs_1 \sequal \sum rs_2$
   995 			$\sum rs_1 \sequal \sum rs_2$
  1302 		\item
   996 		\item
  1303 			$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
   997 			$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
  1358 			(\map \; (\_ \backslash_r x) rs) \;\varnothing  )$
  1052 			(\map \; (\_ \backslash_r x) rs) \;\varnothing  )$
  1359 	\end{itemize}
  1053 	\end{itemize}
  1360 \end{lemma}
  1054 \end{lemma}
  1361 \begin{proof}
  1055 \begin{proof}
  1362 	Part 1 is by lemma \ref{insideSimpRemoval},
  1056 	Part 1 is by lemma \ref{insideSimpRemoval},
  1363 	part 2 is by lemmas \ref{insideSimpRemoval} and \ref{distinctDer}.
  1057 	part 2 is by lemma \ref{insideSimpRemoval} .%and \ref{distinctDer}.
  1364 \end{proof}
  1058 \end{proof}
  1365 \noindent
  1059 \noindent
  1366 This leads to the first closed form--
  1060 
       
  1061 \subsection{Closed Forms for $\sum rs$, $r_1\cdot r_2$ and $r^*$}
       
  1062 \subsubsection{Closed Form for Alternative Regular Expression}
       
  1063 Lemma \ref{Simpders} leads to the first closed form--
  1367 \begin{lemma}\label{altsClosedForm}
  1064 \begin{lemma}\label{altsClosedForm}
  1368 	\begin{center}
  1065 	\begin{center}
  1369 		$\rderssimp{(\sum rs)}{s} \sequal
  1066 		$\rderssimp{(\sum rs)}{s} \sequal
  1370 		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
  1067 		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
  1371 	\end{center}
  1068 	\end{center}
  1396 	$\rderssimp \; (\sum \; rs) \; s = 
  1093 	$\rderssimp \; (\sum \; rs) \; s = 
  1397 	\rsimp{(\sum \; (\map \; \rderssimp{\_}{s} \; rs))}$.
  1094 	\rsimp{(\sum \; (\map \; \rderssimp{\_}{s} \; rs))}$.
  1398 \end{corollary}
  1095 \end{corollary}
  1399 \noindent
  1096 \noindent
  1400 The harder closed forms are the sequence and star ones.
  1097 The harder closed forms are the sequence and star ones.
  1401 Before we go on to obtain them, some preliminary definitions
  1098 Before we obtain them, some preliminary definitions
  1402 are needed to make proof statements concise.
  1099 are needed to make proof statements concise.
  1403 
  1100 
  1404 
  1101 
  1405 
       
  1406 
       
  1407 \subsection{Closed Forms}
       
  1408 \subsubsection{Closed Form for Sequence Regular Expressions}
  1102 \subsubsection{Closed Form for Sequence Regular Expressions}
  1409 The problem of obataining a closed-form for sequence regular expression 
       
  1410 is constructing $(r_1 \cdot r_2) \backslash_r s$
       
  1411 if we are only allowed to use a combination of $r_1 \backslash s''$ 
       
  1412 and  $r_2 \backslash s''$ , where $s''$ is from $s$.
       
  1413 First let's look at a series of derivatives steps on a sequence 
  1103 First let's look at a series of derivatives steps on a sequence 
  1414 regular expression, assuming that each time the first
  1104 regular expression, assuming that each time the first
  1415 component of the sequence is always nullable):
  1105 component of the sequence is always nullable):
  1416 \begin{center}
  1106 \begin{center}
  1417 
  1107 
  1813 
  1503 
  1814 
  1504 
  1815 
  1505 
  1816 
  1506 
  1817 
  1507 
       
  1508 %----------------------------------------------------------------------------------------
       
  1509 %	SECTION ??
       
  1510 %----------------------------------------------------------------------------------------
       
  1511 
       
  1512 %-----------------------------------
       
  1513 %	SECTION syntactic equivalence under simp
       
  1514 %-----------------------------------
       
  1515 
       
  1516 
       
  1517 %----------------------------------------------------------------------------------------
       
  1518 %	SECTION ALTS CLOSED FORM
       
  1519 %----------------------------------------------------------------------------------------
       
  1520 %\section{A Closed Form for \textit{ALTS}}
       
  1521 %Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
       
  1522 %
       
  1523 %
       
  1524 %There are a few key steps, one of these steps is
       
  1525 %
       
  1526 %
       
  1527 %
       
  1528 %One might want to prove this by something a simple statement like: 
       
  1529 %
       
  1530 %For this to hold we want the $\textit{distinct}$ function to pick up
       
  1531 %the elements before and after derivatives correctly:
       
  1532 %$r \in rset \equiv (rder x r) \in (rder x rset)$.
       
  1533 %which essentially requires that the function $\backslash$ is an injective mapping.
       
  1534 %
       
  1535 %Unfortunately the function $\backslash c$ is not an injective mapping.
       
  1536 %
       
  1537 %\subsection{function $\backslash c$ is not injective (1-to-1)}
       
  1538 %\begin{center}
       
  1539 %	The derivative $w.r.t$ character $c$ is not one-to-one.
       
  1540 %	Formally,
       
  1541 %	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
       
  1542 %\end{center}
       
  1543 %This property is trivially true for the
       
  1544 %character regex example:
       
  1545 %\begin{center}
       
  1546 %	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
       
  1547 %\end{center}
       
  1548 %But apart from the cases where the derivative
       
  1549 %output is $\ZERO$, are there non-trivial results
       
  1550 %of derivatives which contain strings?
       
  1551 %The answer is yes.
       
  1552 %For example,
       
  1553 %\begin{center}
       
  1554 %	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
       
  1555 %	where $a$ is not nullable.\\
       
  1556 %	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
       
  1557 %	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
       
  1558 %\end{center}
       
  1559 %We start with two syntactically different regular expressions,
       
  1560 %and end up with the same derivative result.
       
  1561 %This is not surprising as we have such 
       
  1562 %equality as below in the style of Arden's lemma:\\
       
  1563 %\begin{center}
       
  1564 %	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
       
  1565 %\end{center}
       
  1566 \section{Bounding Closed Forms}
       
  1567 
       
  1568 In this section, we introduce how we formalised the bound
       
  1569 on closed forms.
       
  1570 We first prove that functions such as $\rflts$
       
  1571 will not cause the size of r-regular expressions to grow.
       
  1572 Putting this together with a general bound 
       
  1573 on the finiteness of distinct regular expressions
       
  1574 smaller than a certain size, we obtain a bound on 
       
  1575 the closed forms.
       
  1576 
       
  1577 \subsection{$\textit{rsimp}$ Does Not Increment the Size}
       
  1578 Although it seems evident, we need a series
       
  1579 of non-trivial lemmas to establish that functions such as $\rflts$
       
  1580 do not cause the regular expressions to grow.
       
  1581 \begin{lemma}\label{rsimpMonoLemmas}
       
  1582 	\mbox{}
       
  1583 	\begin{itemize}
       
  1584 		\item
       
  1585 			\[
       
  1586 				\llbracket \rsimpalts \; rs \rrbracket_r \leq
       
  1587 				\llbracket \sum \; rs \rrbracket_r
       
  1588 			\]
       
  1589 		\item
       
  1590 			\[
       
  1591 				\llbracket \rsimpseq \; r_1 \;  r_2 \rrbracket_r \leq
       
  1592 				\llbracket r_1 \cdot r_2 \rrbracket_r
       
  1593 			\]
       
  1594 		\item
       
  1595 			\[
       
  1596 				\llbracket \rflts \; rs \rrbracket_r  \leq
       
  1597 				\llbracket rs \rrbracket_r 
       
  1598 			\]
       
  1599 		\item
       
  1600 			\[
       
  1601 				\llbracket \rDistinct \; rs \; ss \rrbracket_r  \leq
       
  1602 				\llbracket rs \rrbracket_r 
       
  1603 			\]
       
  1604 		\item
       
  1605 			If all elements $a$ in the set $as$ satisfy the property
       
  1606 			that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
       
  1607 			\llbracket a \rrbracket_r$, then we have 
       
  1608 			\[
       
  1609 				\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
       
  1610 				(\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
       
  1611 				\rrbracket \leq
       
  1612 				\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
       
  1613 				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r 
       
  1614 			\]
       
  1615 	\end{itemize}
       
  1616 \end{lemma}
       
  1617 \begin{proof}
       
  1618 	Point 1, 3, 4 can be proven by an induction on $rs$.
       
  1619 	Point 2 is by case analysis on $r_1$ and $r_2$.
       
  1620 	The last part is a corollary of the previous ones.
       
  1621 \end{proof}
       
  1622 \noindent
       
  1623 With the lemmas for each inductive case in place, we are ready to get 
       
  1624 the non-increasing property as a corollary:
       
  1625 \begin{corollary}\label{rsimpMono}
       
  1626 	$\llbracket \textit{rsimp} \; r \rrbracket_r \leq \llbracket r \rrbracket_r$
       
  1627 \end{corollary}
       
  1628 \begin{proof}
       
  1629 	By \ref{rsimpMonoLemmas}.
       
  1630 \end{proof}
       
  1631 
       
  1632 \subsubsection{The $RL$ Function: Language Interpretation of $\textit{Rrexp}$s}
       
  1633 Much like the definition of $L$ on plain regular expressions, one could also 
       
  1634 define the language interpretation of $\rrexp$s.
       
  1635 \begin{center}
       
  1636 	\begin{tabular}{lcl}
       
  1637 		$RL \; (\ZERO)$ & $\dn$ & $\phi$\\
       
  1638 		$RL \; (\ONE)$ & $\dn$ & $\{[]\}$\\
       
  1639 		$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
       
  1640 		$RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
       
  1641 		$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
       
  1642 		$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
       
  1643 	\end{tabular}
       
  1644 \end{center}
       
  1645 \noindent
       
  1646 The main use of $RL$ is to establish some connections between $\rsimp{}$ 
       
  1647 and $\rnullable{}$:
       
  1648 \begin{lemma}
       
  1649 	The following properties hold:
       
  1650 	\begin{itemize}
       
  1651 		\item
       
  1652 			If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
       
  1653 		\item
       
  1654 			$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
       
  1655 	\end{itemize}
       
  1656 \end{lemma}
       
  1657 \begin{proof}
       
  1658 	The first part is by induction on $r$. 
       
  1659 	The second part is true because property 
       
  1660 	\[ RL \; r = RL \; (\rsimp{r})\] holds.
       
  1661 \end{proof}
       
  1662 
       
  1663 \subsubsection{Simplified $\textit{Rrexp}$s are Good}
       
  1664 We formalise the notion of ``good" regular expressions,
       
  1665 which means regular expressions that
       
  1666 are not fully simplified. For alternative regular expressions that means they
       
  1667 do not contain any nested alternatives like 
       
  1668 \[ r_1 + (r_2 + r_3) \], un-removed $\RZERO$s like \[\RZERO + r\]
       
  1669 or duplicate elements in a children regular expression list like \[ \sum [r, r, \ldots]\]:
       
  1670 \begin{center}
       
  1671 	\begin{tabular}{@{}lcl@{}}
       
  1672 		$\good\; \RZERO$ & $\dn$ & $\textit{false}$\\
       
  1673 		$\good\; \RONE$ & $\dn$ & $\textit{true}$\\
       
  1674 		$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
       
  1675 		$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
       
  1676 		$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
       
  1677 		$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ & 
       
  1678 		$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
       
  1679 						   & & $\textit{and}\; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \,  \textit{and}\; \, \textit{nonAlt}\; r')$\\
       
  1680 		$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
       
  1681 		$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
       
  1682 		$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
       
  1683 		$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
       
  1684 		$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
       
  1685 	\end{tabular}
       
  1686 \end{center}
       
  1687 \noindent
       
  1688 The predicate $\textit{nonAlt}$ evaluates to true when the regular expression is not an
       
  1689 alternative, and false otherwise.
       
  1690 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
       
  1691 its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$, 
       
  1692 and unique:
       
  1693 \begin{lemma}\label{rsimpaltsGood}
       
  1694 	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
       
  1695 	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
       
  1696 \end{lemma}
       
  1697 \noindent
       
  1698 We also note that
       
  1699 if a regular expression $r$ is good, then $\rflts$ on the singleton
       
  1700 list $[r]$ will not break goodness:
       
  1701 \begin{lemma}\label{flts2}
       
  1702 	If $\good \; r$, then forall $r' \in \rflts \; [r]. \; \good \; r'$ and $\textit{nonAlt} \; r'$.
       
  1703 \end{lemma}
       
  1704 \begin{proof}
       
  1705 	By an induction on $r$.
       
  1706 \end{proof}
       
  1707 \noindent
       
  1708 The other observation we make about $\rsimp{r}$ is that it never
       
  1709 comes with nested alternatives, which we describe as the $\nonnested$
       
  1710 property:
       
  1711 \begin{center}
       
  1712 	\begin{tabular}{lcl}
       
  1713 		$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
       
  1714 		$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
       
  1715 		$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
       
  1716 		$\nonnested \; \, r $ & $\dn$ & $\btrue$
       
  1717 	\end{tabular}
       
  1718 \end{center}
       
  1719 \noindent
       
  1720 The $\rflts$ function
       
  1721 always opens up nested alternatives,
       
  1722 which enables $\rsimp$ to be non-nested:
       
  1723 
       
  1724 \begin{lemma}\label{nonnestedRsimp}
       
  1725 	$\nonnested \; (\rsimp{r})$
       
  1726 \end{lemma}
       
  1727 \begin{proof}
       
  1728 	By an induction on $r$.
       
  1729 \end{proof}
       
  1730 \noindent
       
  1731 With this we could prove that a regular expressions
       
  1732 after simplification and flattening and de-duplication,
       
  1733 will not contain any alternative regular expression directly:
       
  1734 \begin{lemma}\label{nonaltFltsRd}
       
  1735 	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ 
       
  1736 	then $\textit{nonAlt} \; x$.
       
  1737 \end{lemma}
       
  1738 \begin{proof}
       
  1739 	By \ref{nonnestedRsimp}.
       
  1740 \end{proof}
       
  1741 \noindent
       
  1742 The other thing we know is that once $\rsimp{}$ had finished
       
  1743 processing an alternative regular expression, it will not
       
  1744 contain any $\RZERO$s, this is because all the recursive 
       
  1745 calls to the simplification on the children regular expressions
       
  1746 make the children good, and $\rflts$ will not take out
       
  1747 any $\RZERO$s out of a good regular expression list,
       
  1748 and $\rdistinct{}$ will not mess with the result.
       
  1749 \begin{lemma}\label{flts3Obv}
       
  1750 	The following are true:
       
  1751 	\begin{itemize}
       
  1752 		\item
       
  1753 			If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
       
  1754 			then for all $r \in \rflts\; rs. \, \good \; r$.
       
  1755 		\item
       
  1756 			If $x \in \rdistinct{\rflts\; (\map \; rsimp{}\; rs)}{\varnothing}$
       
  1757 			and for all $y$ such that $\llbracket y \rrbracket_r$ less than
       
  1758 			$\llbracket rs \rrbracket_r + 1$, either
       
  1759 			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
       
  1760 			then $\good \; x$.
       
  1761 	\end{itemize}
       
  1762 \end{lemma}
       
  1763 \begin{proof}
       
  1764 	The first part is by induction on $rs$, where the induction
       
  1765 	rule is the inductive cases for $\rflts$.
       
  1766 	The second part is a corollary from the first part.
       
  1767 \end{proof}
       
  1768 
       
  1769 And this leads to good structural property of $\rsimp{}$,
       
  1770 that after simplification, a regular expression is
       
  1771 either good or $\RZERO$:
       
  1772 \begin{lemma}\label{good1}
       
  1773 	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
       
  1774 \end{lemma}
       
  1775 \begin{proof}
       
  1776 	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
       
  1777 	Lemma \ref{rsimpSize} says that 
       
  1778 	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
       
  1779 	$\llbracket r \rrbracket_r$.
       
  1780 	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
       
  1781 	Inductive hypothesis applies to the children regular expressions
       
  1782 	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
       
  1783 	by that as well.
       
  1784 	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
       
  1785 	to ensure that goodness is preserved at the topmost level.
       
  1786 \end{proof}
       
  1787 We shall prove that any good regular expression is 
       
  1788 a fixed-point for $\rsimp{}$.
       
  1789 First we prove an auxiliary lemma:
       
  1790 \begin{lemma}\label{goodaltsNonalt}
       
  1791 	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
       
  1792 \end{lemma}
       
  1793 \begin{proof}
       
  1794 	By an induction on $\sum rs$. The inductive rules are the cases
       
  1795 	for $\good$.
       
  1796 \end{proof}
       
  1797 \noindent
       
  1798 Now we are ready to prove that good regular expressions are invariant
       
  1799 of $\rsimp{}$ application:
       
  1800 \begin{lemma}\label{test}
       
  1801 	If $\good \;r$ then $\rsimp{r} = r$.
       
  1802 \end{lemma}
       
  1803 \begin{proof}
       
  1804 	By an induction on the inductive cases of $\good$, using lemmas
       
  1805 	\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
       
  1806 	The lemma \ref{goodaltsNonalt} is used in the alternative
       
  1807 	case where 2 or more elements are present in the list.
       
  1808 \end{proof}
       
  1809 \noindent
       
  1810 Given below is a property involving $\rflts$, $\rdistinct{}{}$, $\rsimp{}$ and $\rsimp_{ALTS}$,
       
  1811 which requires $\ref{good1}$ to go through smoothly.
       
  1812 It says that an application of $\rsimp_{ALTS}$ can be "absorbed",
       
  1813 if it its output is concatenated with a list and then applied to $\rflts$.
       
  1814 \begin{lemma}\label{flattenRsimpalts}
       
  1815 	$\rflts \; ( (\rsimp_{ALTS} \; 
       
  1816 	(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) :: 
       
  1817 	\map \; \rsimp{} \; rs' ) = 
       
  1818 	\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
       
  1819 	\map \; \rsimp{rs'}))$
       
  1820 
       
  1821 
       
  1822 \end{lemma}
       
  1823 \begin{proof}
       
  1824 	By \ref{good1}.
       
  1825 \end{proof}
       
  1826 \noindent
       
  1827 
       
  1828 
       
  1829 
       
  1830 
       
  1831 
       
  1832 We are also ready to prove that $\textit{rsimp}$ is idempotent.
       
  1833 \subsubsection{$\rsimp$ is Idempotent}
       
  1834 The idempotency of $\rsimp$ is very useful in 
       
  1835 manipulating regular expression terms into desired
       
  1836 forms so that key steps allowing further rewriting to closed forms
       
  1837 are possible.
       
  1838 \begin{lemma}\label{rsimpIdem}
       
  1839 	$\rsimp{r} = \rsimp{\rsimp{r}}$
       
  1840 \end{lemma}
       
  1841 
       
  1842 \begin{proof}
       
  1843 	By \ref{test} and \ref{good1}.
       
  1844 \end{proof}
       
  1845 \noindent
       
  1846 This property means we do not have to repeatedly
       
  1847 apply simplification in each step, which justifies
       
  1848 our definition of $\blexersimp$.
       
  1849 
       
  1850 
       
  1851 On the other hand, we could repeat the same $\rsimp{}$ applications
       
  1852 on regular expressions as many times as we want, if we have at least
       
  1853 one simplification applied to it, and apply it wherever we would like to:
       
  1854 \begin{corollary}\label{headOneMoreSimp}
       
  1855 	The following properties hold, directly from \ref{rsimpIdem}:
       
  1856 
       
  1857 	\begin{itemize}
       
  1858 		\item
       
  1859 			$\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
       
  1860 		\item
       
  1861 			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
       
  1862 	\end{itemize}
       
  1863 \end{corollary}
       
  1864 \noindent
       
  1865 This will be useful in later closed form proof's rewriting steps.
       
  1866 Similarly, we point out the following useful facts below:
       
  1867 \begin{lemma}
       
  1868 	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
       
  1869 	\begin{itemize}
       
  1870 		\item
       
  1871 			If $r  = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
       
  1872 		\item
       
  1873 			If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
       
  1874 		\item
       
  1875 			$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
       
  1876 	\end{itemize}
       
  1877 \end{lemma}
       
  1878 \begin{proof}
       
  1879 	By application of lemmas \ref{rsimpIdem} and \ref{good1}.
       
  1880 \end{proof}
       
  1881 
       
  1882 \noindent
       
  1883 With the idempotency of $\rsimp{}$ and its corollaries, 
       
  1884 we can start proving some key equalities leading to the 
       
  1885 closed forms.
       
  1886 Now presented are a few equivalent terms under $\rsimp{}$.
       
  1887 We use $r_1 \sequal r_2 $ here to denote $\rsimp{r_1} = \rsimp{r_2}$.
       
  1888 \begin{lemma}
       
  1889 	\begin{itemize}
       
  1890 		The following equivalence hold:
       
  1891 	\item
       
  1892 		$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
       
  1893 	\item
       
  1894 		$\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
       
  1895 	\item
       
  1896 		$\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
       
  1897 	\item
       
  1898 		$\sum ((\sum rs_a) :: rs_b) \sequal \sum rs_a @ rs_b$
       
  1899 	\item
       
  1900 		$\RALTS{rs} = \RALTS{\map \; \rsimp{} \; rs}$
       
  1901 \end{itemize}
       
  1902 \end{lemma}
       
  1903 \begin{proof}
       
  1904 	By induction on the lists involved.
       
  1905 \end{proof}
       
  1906 \noindent
       
  1907 The above allows us to prove
       
  1908 two similar equalities (which are a bit more involved).
       
  1909 It says that we could flatten out the elements
       
  1910 before simplification and still get the same result.
       
  1911 \begin{lemma}\label{simpFlatten3}
       
  1912 	One can flatten the inside $\sum$ of a $\sum$ if it is being 
       
  1913 	simplified. Concretely,
       
  1914 	\begin{itemize}
       
  1915 		\item
       
  1916 			If for all $r \in rs, rs', rs''$, we have $\good \; r $
       
  1917 			or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal 
       
  1918 			\sum (rs' @ [\sum rs] @ rs'')$ holds. As a corollary,
       
  1919 		\item
       
  1920 			$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$
       
  1921 	\end{itemize}
       
  1922 \end{lemma}
       
  1923 \begin{proof}
       
  1924 	By rewriting steps involving the use of \ref{test} and \ref{rdistinctConcatGeneral}.
       
  1925 	The second sub-lemma is a corollary of the previous.
       
  1926 \end{proof}
       
  1927 %Rewriting steps not put in--too long and complicated-------------------------------
       
  1928 \begin{comment}
       
  1929 	\begin{center}
       
  1930 		$\rsimp{\sum (rs' @ rs @ rs'')}  \stackrel{def of bsimp}{=}$  \\
       
  1931 		$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
       
  1932 		$\stackrel{by \ref{test}}{=} 
       
  1933 		\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
       
  1934 		\varnothing})$\\
       
  1935 		$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
       
  1936 		\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
       
  1937 		\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
       
  1938 
       
  1939 	\end{center}
       
  1940 \end{comment}
       
  1941 %Rewriting steps not put in--too long and complicated-------------------------------
       
  1942 \noindent
  1818 \subsection{Estimating the Closed Forms' sizes}
  1943 \subsection{Estimating the Closed Forms' sizes}
  1819 We now summarize the closed forms below:
  1944 We now summarize the closed forms below:
  1820 \begin{itemize}
  1945 \begin{itemize}
  1821 	\item
  1946 	\item
  1822 		$\rderssimp{(\sum rs)}{s} \sequal
  1947 		$\rderssimp{(\sum rs)}{s} \sequal
  2252 original size.
  2377 original size.
  2253 We will discuss improvements to this bound in the next chapter.
  2378 We will discuss improvements to this bound in the next chapter.
  2254 
  2379 
  2255 
  2380 
  2256 
  2381 
  2257 %----------------------------------------------------------------------------------------
  2382 \section{Possible Further Improvements}
  2258 %	SECTION ??
       
  2259 %----------------------------------------------------------------------------------------
       
  2260 
       
  2261 %-----------------------------------
       
  2262 %	SECTION syntactic equivalence under simp
       
  2263 %-----------------------------------
       
  2264 
       
  2265 
       
  2266 %----------------------------------------------------------------------------------------
       
  2267 %	SECTION ALTS CLOSED FORM
       
  2268 %----------------------------------------------------------------------------------------
       
  2269 %\section{A Closed Form for \textit{ALTS}}
       
  2270 %Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
       
  2271 %
       
  2272 %
       
  2273 %There are a few key steps, one of these steps is
       
  2274 %
       
  2275 %
       
  2276 %
       
  2277 %One might want to prove this by something a simple statement like: 
       
  2278 %
       
  2279 %For this to hold we want the $\textit{distinct}$ function to pick up
       
  2280 %the elements before and after derivatives correctly:
       
  2281 %$r \in rset \equiv (rder x r) \in (rder x rset)$.
       
  2282 %which essentially requires that the function $\backslash$ is an injective mapping.
       
  2283 %
       
  2284 %Unfortunately the function $\backslash c$ is not an injective mapping.
       
  2285 %
       
  2286 %\subsection{function $\backslash c$ is not injective (1-to-1)}
       
  2287 %\begin{center}
       
  2288 %	The derivative $w.r.t$ character $c$ is not one-to-one.
       
  2289 %	Formally,
       
  2290 %	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
       
  2291 %\end{center}
       
  2292 %This property is trivially true for the
       
  2293 %character regex example:
       
  2294 %\begin{center}
       
  2295 %	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
       
  2296 %\end{center}
       
  2297 %But apart from the cases where the derivative
       
  2298 %output is $\ZERO$, are there non-trivial results
       
  2299 %of derivatives which contain strings?
       
  2300 %The answer is yes.
       
  2301 %For example,
       
  2302 %\begin{center}
       
  2303 %	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
       
  2304 %	where $a$ is not nullable.\\
       
  2305 %	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
       
  2306 %	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
       
  2307 %\end{center}
       
  2308 %We start with two syntactically different regular expressions,
       
  2309 %and end up with the same derivative result.
       
  2310 %This is not surprising as we have such 
       
  2311 %equality as below in the style of Arden's lemma:\\
       
  2312 %\begin{center}
       
  2313 %	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
       
  2314 %\end{center}
       
  2315 
       
  2316 \section{Further Improvements to the Bound}
       
  2317 There are two problems with this finiteness result, though.
  2383 There are two problems with this finiteness result, though.
  2318 \begin{itemize}
  2384 \begin{itemize}
  2319 	\item
  2385 	\item
  2320 		First, It is not yet a direct formalisation of our lexer's complexity,
  2386 		First, It is not yet a direct formalisation of our lexer's complexity,
  2321 		as a complexity proof would require looking into 
  2387 		as a complexity proof would require looking into