ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 609 61139fdddae0
parent 601 ce4e5151a836
child 610 d028c662a3df
equal deleted inserted replaced
608:37b6fd310a16 609:61139fdddae0
   400 $r \backslash_{rsimps} [\,] $ & $\dn$ & $r$
   400 $r \backslash_{rsimps} [\,] $ & $\dn$ & $r$
   401 	\end{tabular}
   401 	\end{tabular}
   402 \end{center}
   402 \end{center}
   403 \noindent
   403 \noindent
   404 We do not define an r-regular expression version of $\blexersimp$,
   404 We do not define an r-regular expression version of $\blexersimp$,
   405 as our proof does not involve its use. 
   405 as our proof does not involve its use 
       
   406 (and there is no bitcode to decode into a lexical value). 
   406 Everything about the size of annotated regular expressions
   407 Everything about the size of annotated regular expressions
   407 can be calculated via the size of r-regular expressions:
   408 can be calculated via the size of r-regular expressions:
   408 \begin{lemma}\label{sizeRelations}
   409 \begin{lemma}\label{sizeRelations}
   409 	The following equalities hold:
   410 	The following equalities hold:
   410 	\begin{itemize}
   411 	\begin{itemize}
   431 	implies
   432 	implies
   432 	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
   433 	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
   433 \end{center}
   434 \end{center}
   434 Unless stated otherwise in the rest of this 
   435 Unless stated otherwise in the rest of this 
   435 chapter all regular expressions without
   436 chapter all regular expressions without
   436 bitcodes are seen as $\rrexp$s.
   437 bitcodes are seen as r-regular expressions ($\rrexp$s).
   437 For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
   438 For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
   438 we use the notation $r_1 + r_2$
   439 we use the notation $r_1 + r_2$
   439 for brevity.
   440 for brevity.
   440 
   441 
   441 
   442 
   474 %		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
   475 %		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r2}{\_} \;(\vsuf{s}{r_1})))}$
   475 %	\end{center}
   476 %	\end{center}
   476 %\end{quote}
   477 %\end{quote}
   477 %\noindent
   478 %\noindent
   478 %We explain in detail how we reached those claims.
   479 %We explain in detail how we reached those claims.
   479 \subsection{Basic Properties needed for Closed Forms}
   480 \subsection{The Idea Behind Closed Forms}
   480 If we attempt to prove 
   481 If we attempt to prove 
   481 \begin{center}
   482 \begin{center}
   482 	$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{bsimps} s \rrbracket_r \leq N_r$
   483 	$\forall r. \; \exists N_r.\;\; s.t. \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
   483 \end{center}
   484 \end{center}
   484 using a naive induction on the structure of $r$,
   485 using a naive induction on the structure of $r$,
   485 then we are stuck at the inductive cases such as
   486 then we are stuck at the inductive cases such as
   486 $r_1\cdot r_2$.
   487 $r_1\cdot r_2$.
   487 The inductive hypotheses are:
   488 The inductive hypotheses are:
   488 \begin{center}
   489 \begin{center}
   489 	1: $\text{for } r_1, \text{there exists } N_{r_1}.\;\; s.t. 
   490 	1: $\text{for } r_1, \text{there exists } N_{r_1}.\;\; s.t. 
   490 	\;\;\forall s.  \llbracket r_1 \backslash_{bsimps} s \rrbracket_r \leq N_{r_1}. $\\
   491 	\;\;\forall s.  \llbracket r_1 \backslash_{rsimps} s \rrbracket_r \leq N_{r_1}. $\\
   491 	2: $\text{for } r_2, \text{there exists } N_{r_2}.\;\; s.t. 
   492 	2: $\text{for } r_2, \text{there exists } N_{r_2}.\;\; s.t. 
   492 	\;\; \forall s. \llbracket r_2 \backslash_{bsimps} s \rrbracket_r \leq N_{r_2}. $
   493 	\;\; \forall s. \llbracket r_2 \backslash_{rsimps} s \rrbracket_r \leq N_{r_2}. $
   493 \end{center}
   494 \end{center}
   494 The inductive step to prove would be 
   495 The inductive step to prove would be 
   495 \begin{center}
   496 \begin{center}
   496 	$\text{there exists } N_{r_1\cdot r_2}. \;\; s.t. \forall s. 
   497 	$\text{there exists } N_{r_1\cdot r_2}. \;\; s.t. \forall s. 
   497 	\llbracket (r_1 \cdot r_2) \backslash_{bsimps} s \rrbracket_r \leq N_{r_1\cdot r_2}.$
   498 	\llbracket (r_1 \cdot r_2) \backslash_{rsimps} s \rrbracket_r \leq N_{r_1\cdot r_2}.$
   498 \end{center}
   499 \end{center}
   499 The problem is that it is not clear what 
   500 The problem is that it is not clear what 
   500 $(r_1\cdot r_2) \backslash_{bsimps} s$ looks like,
   501 $(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
   501 and therefore $N_{r_1}$ and $N_{r_2}$ in the
   502 and therefore $N_{r_1}$ and $N_{r_2}$ in the
   502 inductive hypotheses cannot be directly used.
   503 inductive hypotheses cannot be directly used.
   503 
   504 We have already seen that $(r_1 \cdot r_2)\backslash s$ 
   504 
   505 and $(r^*)\backslash s$ can grow in a wild way.
   505 The next steps involves getting a closed form for
   506 The point is that they will be equivalent to a list of
   506 $\rderssimp{r}{s}$ and then obtaining
   507 terms $\sum rs$, where each term in $rs$ will
   507 an estimate for the closed form.
   508 be made of $r_1 \backslash s' $, $r_2\backslash s'$,
       
   509 and $r \backslash s'$ with $s' \in \textit{SubString} \; s$.
       
   510 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
       
   511 in the simplification which saves $rs$ from growing indefinitely.
       
   512 
       
   513 Based on this idea, we sketch a proof by first showing the equality (where
       
   514 $f$ and $g$ are functions that do not increase the size of the input)
       
   515 \begin{center}
       
   516 $(r_1 \cdot r_2)\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
       
   517 \end{center}
       
   518 and then show the right-hand-side can be finitely bounded.
       
   519 We call the right-hand-side the 
       
   520 \emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
       
   521 We will flesh out the proof sketch in the next section.
       
   522 
       
   523 \section{Details of Closed Forms and Bounds}
       
   524 In this section we introduce in detail
       
   525 how the closed forms are obtained for regular expressions'
       
   526 derivatives and how they are bounded.
       
   527 We start by proving some basic identities
       
   528 involving the simplification functions for r-regular expressions.
       
   529 After that we use these identities to establish the
       
   530 closed forms we need.
       
   531 Finally, we prove the functions such as $\flts$
       
   532 will keep the size non-increasing.
       
   533 Putting this together with a general bound 
       
   534 on the finiteness of distinct regular expressions
       
   535 smaller than a certain size, we obtain a bound on 
       
   536 the closed forms.
       
   537 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
       
   538 
       
   539 
       
   540 
       
   541 \subsection{Some Basic Identities}
       
   542 
       
   543 
   508 \subsubsection{$\textit{rdistinct}$'s Deduplicates Successfully}
   544 \subsubsection{$\textit{rdistinct}$'s Deduplicates Successfully}
   509 The $\textit{rdistinct}$ function, as its name suggests, will
   545 The $\textit{rdistinct}$ function, as its name suggests, will
   510 remove duplicates in an \emph{r}$\textit{rexp}$ list, 
   546 remove duplicates in an r-regular expression list.
   511 according to the accumulator
   547 It will also correctly exclude any elements that 
   512 and leave only one of each different element in a list:
   548 is intially in the accumulator set.
   513 \begin{lemma}\label{rdistinctDoesTheJob}
   549 \begin{lemma}\label{rdistinctDoesTheJob}
   514 	The function $\textit{rdistinct}$ satisfies the following
   550 	%The function $\textit{rdistinct}$ satisfies the following
   515 	properties:
   551 	%properties:
       
   552 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
       
   553 	recursive definition here, its Isabelle counterpart would be $\textit{distinct}$.} 
       
   554 	readily defined
       
   555 	for testing
       
   556 	whether a list's elements are all unique. Then the following
       
   557 	properties about $\textit{rdistinct}$ hold:
   516 	\begin{itemize}
   558 	\begin{itemize}
   517 		\item
   559 		\item
   518 			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
   560 			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
   519 		\item
   561 		\item
   520 			If list $rs'$ is the result of $\rdistinct{rs}{acc}$,
   562 			%If list $rs'$ is the result of $\rdistinct{rs}{acc}$,
   521 			then $\textit{isDistinct} \; rs'$.
   563 			$\textit{isDistinct} \;\;\; (\rdistinct{rs}{acc})$.
   522 		\item
   564 		\item
   523 			$\rdistinct{rs}{acc} = rs - acc$
   565 			$\textit{set} \; (\rdistinct{rs}{acc}) 
       
   566 			= (textit{set} \; rs) - acc$
   524 	\end{itemize}
   567 	\end{itemize}
   525 \end{lemma}
   568 \end{lemma}
   526 \noindent
   569 \noindent
   527 The predicate $\textit{isDistinct}$ is for testing
       
   528 whether a list's elements are all unique. It is defined
       
   529 recursively on the structure of a regular expression,
       
   530 and we omit the precise definition here.
       
   531 \begin{proof}
   570 \begin{proof}
   532 	The first part is by an induction on $rs$.
   571 	The first part is by an induction on $rs$.
   533 	The second and third part can be proven by using the 
   572 	The second and third part can be proven by using the 
   534 	induction rules of $\rdistinct{\_}{\_}$.
   573 	inductive cases of $\textit{rdistinct}$.
   535 
   574 
   536 \end{proof}
   575 \end{proof}
   537 
   576 
   538 \noindent
   577 \noindent
   539 $\rdistinct{\_}{\_}$ will cancel out all regular expression terms
   578 $\rdistinct{\_}{\_}$ will cancel out all regular expression terms
   540 that are in the accumulator, therefore prepending a list $rs_a$ with an arbitrary
   579 that are in the accumulator, therefore prepending a list $rs_a$ with an arbitrary
   541 list $rs$ whose elements are all from the accumulator, and then call $\rdistinct{\_}{\_}$
   580 list $rs$ whose elements are all from the accumulator, and then call $\rdistinct{\_}{\_}$
   542 on the resulting list, the output will be as if we had called $\rdistinct{\_}{\_}$
   581 on the resulting list, the output will be as if we had called $\rdistinct{\_}{\_}$
   543 without the prepending of $rs$:
   582 without the prepending of $rs$:
   544 \begin{lemma}
   583 \begin{lemma}\label{rdistinctConcat}
   545 	The elements appearing in the accumulator will always be removed.
   584 	The elements appearing in the accumulator will always be removed.
   546 	More precisely,
   585 	More precisely,
   547 	\begin{itemize}
   586 	\begin{itemize}
   548 		\item
   587 		\item
   549 			If $rs \subseteq rset$, then 
   588 			If $rs \subseteq rset$, then 
   550 			$\rdistinct{rs@rsa }{acc} = \rdistinct{rsa }{acc}$.
   589 			$\rdistinct{rs@rsa }{acc} = \rdistinct{rsa }{acc}$.
   551 		\item
   590 		\item
   552 			Furthermore, if $a \in rset$ and $\rdistinct{rs}{\{a\}} = []$,
   591 			More generally, if $a \in rset$ and $\rdistinct{rs}{\{a\}} = []$,
   553 			then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{rs'}{rset}$
   592 			then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{rs'}{rset}$
   554 	\end{itemize}
   593 	\end{itemize}
   555 \end{lemma}
   594 \end{lemma}
   556 
   595 
   557 \begin{proof}
   596 \begin{proof}
   558 	By induction on $rs$.
   597 	By induction on $rs$ and using \ref{rdistinctDoesTheJob}.
   559 \end{proof}
   598 \end{proof}
   560 \noindent
   599 \noindent
   561 On the other hand, if an element $r$ does not appear in the input list waiting to be deduplicated,
   600 On the other hand, if an element $r$ does not appear in the input list waiting to be deduplicated,
   562 then expanding the accumulator to include that element will not cause the output list to change:
   601 then expanding the accumulator to include that element will not cause the output list to change:
   563 \begin{lemma}
   602 \begin{lemma}
   962 \end{lemma}
  1001 \end{lemma}
   963 \begin{proof}
  1002 \begin{proof}
   964 	By induction on the lists involved.
  1003 	By induction on the lists involved.
   965 \end{proof}
  1004 \end{proof}
   966 \noindent
  1005 \noindent
   967 Similarly,
  1006 The above allows us to prove
   968 we introduce the equality for $\sum$ when certain child regular expressions
  1007 two similar equalities (which are a bit more involved).
   969 are $\sum$ themselves:
  1008 It says that we could flatten out the elements
       
  1009 before simplification and still get the same result.
   970 \begin{lemma}\label{simpFlatten3}
  1010 \begin{lemma}\label{simpFlatten3}
   971 	One can flatten the inside $\sum$ of a $\sum$ if it is being 
  1011 	One can flatten the inside $\sum$ of a $\sum$ if it is being 
   972 	simplified. Concretely,
  1012 	simplified. Concretely,
   973 	\begin{itemize}
  1013 	\begin{itemize}
   974 		\item
  1014 		\item
  1277 \noindent
  1317 \noindent
  1278 The harder closed forms are the sequence and star ones.
  1318 The harder closed forms are the sequence and star ones.
  1279 Before we go on to obtain them, some preliminary definitions
  1319 Before we go on to obtain them, some preliminary definitions
  1280 are needed to make proof statements concise.
  1320 are needed to make proof statements concise.
  1281 
  1321 
  1282 \section{"Closed Forms" of Sequence Regular Expressions}
  1322 
       
  1323 
       
  1324 
       
  1325 \subsection{Closed Forms}
       
  1326 \subsubsection{Closed Form for Sequence Regular Expressions}
  1283 The problem of obataining a closed-form for sequence regular expression 
  1327 The problem of obataining a closed-form for sequence regular expression 
  1284 is constructing $(r_1 \cdot r_2) \backslash_r s$
  1328 is constructing $(r_1 \cdot r_2) \backslash_r s$
  1285 if we are only allowed to use a combination of $r_1 \backslash s''$ 
  1329 if we are only allowed to use a combination of $r_1 \backslash s''$ 
  1286 and  $r_2 \backslash s''$ , where $s''$ is from $s$.
  1330 and  $r_2 \backslash s''$ , where $s''$ is from $s$.
  1287 First let's look at a series of derivatives steps on a sequence 
  1331 First let's look at a series of derivatives steps on a sequence 
  1464 		:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))} \rrbracket_r$
  1508 		:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))} \rrbracket_r$
  1465 
  1509 
  1466 	\end{center}
  1510 	\end{center}
  1467 \end{corollary}
  1511 \end{corollary}
  1468 \noindent
  1512 \noindent
  1469 \subsection{Closed Forms for Star Regular Expressions}
  1513 \subsubsection{Closed Forms for Star Regular Expressions}
  1470 We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
  1514 We have shown how to control the size of the sequence regular expression $r_1\cdot r_2$ using
  1471 the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then 
  1515 the "closed form" of $(r_1 \cdot r_2) \backslash s$ and then 
  1472 the property of the $\distinct$ function.
  1516 the property of the $\distinct$ function.
  1473 Now we try to get a bound on $r^* \backslash s$ as well.
  1517 Now we try to get a bound on $r^* \backslash s$ as well.
  1474 Again, we first look at how a star's derivatives evolve, if they grow maximally: 
  1518 Again, we first look at how a star's derivatives evolve, if they grow maximally: 
  1681 \begin{proof}
  1725 \begin{proof}
  1682 	By an induction on $s$.
  1726 	By an induction on $s$.
  1683 	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, and \ref{hfauRsimpeq2}
  1727 	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, and \ref{hfauRsimpeq2}
  1684 	are used.	
  1728 	are used.	
  1685 \end{proof}
  1729 \end{proof}
  1686 \section{Estimating the Closed Forms' sizes}
  1730 
       
  1731 
       
  1732 
       
  1733 
       
  1734 
       
  1735 
       
  1736 \subsection{Estimating the Closed Forms' sizes}
  1687 We now summarize the closed forms below:
  1737 We now summarize the closed forms below:
  1688 \begin{itemize}
  1738 \begin{itemize}
  1689 	\item
  1739 	\item
  1690 		$\rderssimp{(\sum rs)}{s} \sequal
  1740 		$\rderssimp{(\sum rs)}{s} \sequal
  1691 		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
  1741 		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
  1756 is bounded by a constant $c_N$ depending only on $N$,
  1806 is bounded by a constant $c_N$ depending only on $N$,
  1757 provided that each of $rs'$'s element
  1807 provided that each of $rs'$'s element
  1758 is bounded by $N$.
  1808 is bounded by $N$.
  1759 We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
  1809 We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
  1760 
  1810 
  1761 We show how $\rdistinct$ and $\rflts$
  1811 We show that $\rdistinct$ and $\rflts$
  1762 in the simplification function together is at least as 
  1812 working together is at least as 
  1763 good as $\rdistinct{}{}$ alone.
  1813 good as $\rdistinct{}{}$ alone, which can be written as
  1764 \begin{lemma}\label{interactionFltsDB}
  1814 \begin{center}
  1765 	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
  1815 	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
  1766 	\leq 
  1816 	\leq 
  1767 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
  1817 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
  1768 \end{lemma}
  1818 \end{center}
       
  1819 We need this so that we know the outcome of our real 
       
  1820 simplification is better than or equal to a rough estimate,
       
  1821 and therefore can be bounded by that estimate.
       
  1822 This is a bit harder to establish compared with proving
       
  1823 $\textit{flts}$ does not make a list larger (which can
       
  1824 be proven using routine induction):
       
  1825 \begin{center}
       
  1826 	$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
       
  1827 	\llbracket  \textit{rs} \rrbracket_r$
       
  1828 \end{center}
       
  1829 We cannot simply prove how each helper function
       
  1830 reduces the size and then put them together:
       
  1831 From
       
  1832 \begin{center}
       
  1833 $\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
       
  1834 	\llbracket \; \textit{rs} \rrbracket_r$
       
  1835 \end{center}
       
  1836 and
       
  1837 \begin{center}
       
  1838      $\llbracket  \textit{rdistinct} \; rs \; \varnothing \leq
       
  1839      \llbracket rs \rrbracket_r$
       
  1840 \end{center}
       
  1841 one cannot imply
       
  1842 \begin{center}
       
  1843 	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
       
  1844 	\leq 
       
  1845 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
       
  1846 \end{center}
       
  1847 What we can imply is that 
       
  1848 \begin{center}
       
  1849 	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
       
  1850 	\leq
       
  1851 	\llbracket rs \rrbracket_r$
       
  1852 \end{center}
       
  1853 but this estimate is too rough and $\llbracket rs \rrbracket_r$	is unbounded.
       
  1854 The way we 
       
  1855 get through this is by first proving a more general lemma 
       
  1856 (so that the inductive case goes through):
       
  1857 \begin{lemma}\label{fltsSizeReductionAlts}
       
  1858 	If we have three accumulator sets:
       
  1859 	$noalts\_set$, $alts\_set$ and $corr\_set$,
       
  1860 	satisfying:
       
  1861 	\begin{itemize}
       
  1862 		\item
       
  1863 			$\forall r \in noalts\_set. \; \nexists xs.\; r = \sum  xs$
       
  1864 		\item
       
  1865 			$\forall r \in alts\_set. \; \exists xs. \; r = \sum xs
       
  1866 			\; \textit{and} \; set \; xs \subseteq corr\_set$
       
  1867 	\end{itemize}
       
  1868 	then we have that
       
  1869 	\begin{center}
       
  1870 	\begin{tabular}{lcl}
       
  1871 	$\llbracket  (\textit{rdistinct} \; (\textit{rflts} \; as) \;
       
  1872 	(noalts\_set \cup corr\_set)) \rrbracket_r$ & $\leq$ &\\
       
  1873 						    $\llbracket  (\textit{rdistinct} \; as \; (noalts\_set \cup alts\_set \cup
       
  1874 	\{ \ZERO \} )) \rrbracket_r$ & & \\ 
       
  1875 	\end{tabular}
       
  1876 	\end{center}
       
  1877 		holds.
       
  1878 \end{lemma}
       
  1879 \noindent
       
  1880 We need to split the accumulator into two parts: the part
       
  1881 which contains alternative regular expressions ($alts\_set$), and 
       
  1882 the part without any of them($noalts\_set$).
       
  1883 The set $corr\_set$ is the corresponding set
       
  1884 of $alts\_set$ with all elements under the $\sum$ constructor
       
  1885 spilled out.
       
  1886 \begin{proof}
       
  1887 	By induction on the list $as$. We make use of lemma \ref{rdistinctConcat}.
       
  1888 \end{proof}
       
  1889 By setting all three sets to the empty set, one gets the desired size estimate:
       
  1890 \begin{corollary}\label{interactionFltsDB}
       
  1891 	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
       
  1892 	\leq 
       
  1893 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
       
  1894 \end{corollary}
       
  1895 \begin{proof}
       
  1896 	By using the lemma \ref{fltsSizeReductionAlts}.
       
  1897 \end{proof}
  1769 \noindent
  1898 \noindent
  1770 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
  1899 The intuition is that if we remove duplicates from the $\textit{LHS}$, at least the same amount of 
  1771 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
  1900 duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 
  1772 
  1901 
  1773 Now this $\rsimp{\sum rs}$ can be estimated using $\rdistinct{rs}{\varnothing}$:
  1902 Now this $\rsimp{\sum rs}$ can be estimated using $\rdistinct{rs}{\varnothing}$:
  1776 \end{lemma}
  1905 \end{lemma}
  1777 \begin{proof}
  1906 \begin{proof}
  1778 	By using \ref{interactionFltsDB}.
  1907 	By using \ref{interactionFltsDB}.
  1779 \end{proof}
  1908 \end{proof}
  1780 \noindent
  1909 \noindent
  1781 which says that the size of regular expression
  1910 This is a key lemma in establishing the bounds on all the 
  1782 is always smaller if we apply the full simplification
  1911 closed forms.
  1783 rather than just one component ($\rdistinct{}{}$).
  1912 With this we are now ready to control the sizes of
  1784 
  1913 $(r_1 \cdot r_2 )\backslash s$, $r^* \backslash s$.
  1785 
       
  1786 Now we are ready to control the sizes of
       
  1787 $r_1 \cdot r_2 \backslash s$, $r^* \backslash s$.
       
  1788 \begin{theorem}
  1914 \begin{theorem}
  1789 	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
  1915 	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
  1790 \end{theorem}
  1916 \end{theorem}
  1791 \noindent
  1917 \noindent
  1792 \begin{proof}
  1918 \begin{proof}
  1890 \end{corollary}
  2016 \end{corollary}
  1891 \begin{proof}
  2017 \begin{proof}
  1892 	By \ref{sizeRelations}.
  2018 	By \ref{sizeRelations}.
  1893 \end{proof}
  2019 \end{proof}
  1894 \noindent
  2020 \noindent
       
  2021 
       
  2022 
       
  2023 
       
  2024 
  1895 
  2025 
  1896 %-----------------------------------
  2026 %-----------------------------------
  1897 %	SECTION 2
  2027 %	SECTION 2
  1898 %-----------------------------------
  2028 %-----------------------------------
  1899 
  2029 
  2060 
  2190 
  2061 
  2191 
  2062 %----------------------------------------------------------------------------------------
  2192 %----------------------------------------------------------------------------------------
  2063 %	SECTION ALTS CLOSED FORM
  2193 %	SECTION ALTS CLOSED FORM
  2064 %----------------------------------------------------------------------------------------
  2194 %----------------------------------------------------------------------------------------
  2065 \section{A Closed Form for \textit{ALTS}}
  2195 %\section{A Closed Form for \textit{ALTS}}
  2066 Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
  2196 %Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
  2067 
  2197 %
  2068 
  2198 %
  2069 There are a few key steps, one of these steps is
  2199 %There are a few key steps, one of these steps is
  2070 
  2200 %
  2071 
  2201 %
  2072 
  2202 %
  2073 One might want to prove this by something a simple statement like: 
  2203 %One might want to prove this by something a simple statement like: 
  2074 
  2204 %
  2075 For this to hold we want the $\textit{distinct}$ function to pick up
  2205 %For this to hold we want the $\textit{distinct}$ function to pick up
  2076 the elements before and after derivatives correctly:
  2206 %the elements before and after derivatives correctly:
  2077 $r \in rset \equiv (rder x r) \in (rder x rset)$.
  2207 %$r \in rset \equiv (rder x r) \in (rder x rset)$.
  2078 which essentially requires that the function $\backslash$ is an injective mapping.
  2208 %which essentially requires that the function $\backslash$ is an injective mapping.
  2079 
  2209 %
  2080 Unfortunately the function $\backslash c$ is not an injective mapping.
  2210 %Unfortunately the function $\backslash c$ is not an injective mapping.
  2081 
  2211 %
  2082 \subsection{function $\backslash c$ is not injective (1-to-1)}
  2212 %\subsection{function $\backslash c$ is not injective (1-to-1)}
  2083 \begin{center}
  2213 %\begin{center}
  2084 	The derivative $w.r.t$ character $c$ is not one-to-one.
  2214 %	The derivative $w.r.t$ character $c$ is not one-to-one.
  2085 	Formally,
  2215 %	Formally,
  2086 	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
  2216 %	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
  2087 \end{center}
  2217 %\end{center}
  2088 This property is trivially true for the
  2218 %This property is trivially true for the
  2089 character regex example:
  2219 %character regex example:
  2090 \begin{center}
  2220 %\begin{center}
  2091 	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
  2221 %	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
  2092 \end{center}
  2222 %\end{center}
  2093 But apart from the cases where the derivative
  2223 %But apart from the cases where the derivative
  2094 output is $\ZERO$, are there non-trivial results
  2224 %output is $\ZERO$, are there non-trivial results
  2095 of derivatives which contain strings?
  2225 %of derivatives which contain strings?
  2096 The answer is yes.
  2226 %The answer is yes.
  2097 For example,
  2227 %For example,
  2098 \begin{center}
  2228 %\begin{center}
  2099 	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
  2229 %	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
  2100 	where $a$ is not nullable.\\
  2230 %	where $a$ is not nullable.\\
  2101 	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
  2231 %	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
  2102 	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
  2232 %	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
  2103 \end{center}
  2233 %\end{center}
  2104 We start with two syntactically different regular expressions,
  2234 %We start with two syntactically different regular expressions,
  2105 and end up with the same derivative result.
  2235 %and end up with the same derivative result.
  2106 This is not surprising as we have such 
  2236 %This is not surprising as we have such 
  2107 equality as below in the style of Arden's lemma:\\
  2237 %equality as below in the style of Arden's lemma:\\
  2108 \begin{center}
  2238 %\begin{center}
  2109 	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
  2239 %	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
  2110 \end{center}
  2240 %\end{center}
  2111 
  2241 
       
  2242 \section{Further Improvements to the Bound}
  2112 There are two problems with this finiteness result, though.
  2243 There are two problems with this finiteness result, though.
  2113 \begin{itemize}
  2244 \begin{itemize}
  2114 	\item
  2245 	\item
  2115 		First, It is not yet a direct formalisation of our lexer's complexity,
  2246 		First, It is not yet a direct formalisation of our lexer's complexity,
  2116 		as a complexity proof would require looking into 
  2247 		as a complexity proof would require looking into