ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 640 bd1354127574
parent 639 80cc6dc4c98b
child 659 2e05f04ed6b3
equal deleted inserted replaced
639:80cc6dc4c98b 640:bd1354127574
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     6 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     7 %of our bitcoded algorithm, that is a finite bound on the size of any 
     8 %regex's derivatives. 
     8 %regex's derivatives. 
     9 
     9 
    10 
    10 
    11 In this chapter we give a bound in terms of size of 
    11 In this chapter we give a bound in terms of the size of 
    12 the calculated derivatives: 
    12 the calculated derivatives: 
    13 given an annotated regular expression $a$, for any string $s$
    13 given an annotated regular expression $a$, for any string $s$
    14 our algorithm $\blexersimp$'s derivatives
    14 our algorithm $\blexersimp$'s derivatives
    15 are finitely bounded
    15 are finitely bounded
    16 by a constant  that only depends on $a$.
    16 by a constant that only depends on $a$.
    17 Formally we show that there exists an $N_a$ such that
    17 Formally we show that there exists an $N_a$ such that
    18 \begin{center}
    18 \begin{center}
    19 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
    19 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
    20 \end{center}
    20 \end{center}
    21 \noindent
    21 \noindent
    22 where the size ($\llbracket \_ \rrbracket$) of 
    22 where the size ($\llbracket \_ \rrbracket$) of 
    23 an annotated regular expression is defined
    23 an annotated regular expression is defined
    24 in terms of the number of nodes in its 
    24 in terms of the number of nodes in its 
    25 tree structure (its recursive definition given in the next page).
    25 tree structure (its recursive definition is given in the next page).
    26 We believe this size bound
    26 We believe this size bound
    27 is important in the context of POSIX lexing, because 
    27 is important in the context of POSIX lexing because 
    28 \begin{itemize}
    28 \begin{itemize}
    29 	\item
    29 	\item
    30 		It is a stepping stone towards the goal 
    30 		It is a stepping stone towards the goal 
    31 		of eliminating ``catastrophic backtracking''. 
    31 		of eliminating ``catastrophic backtracking''. 
    32 		If the internal data structures used by our algorithm
    32 		If the internal data structures used by our algorithm
    35 		be slow.
    35 		be slow.
    36 		The next step is to refine the bound $N_a$ so that it
    36 		The next step is to refine the bound $N_a$ so that it
    37 		is not just finite but polynomial in $\llbracket a\rrbracket$.
    37 		is not just finite but polynomial in $\llbracket a\rrbracket$.
    38 	\item
    38 	\item
    39 		Having the finite bound formalised 
    39 		Having the finite bound formalised 
    40 		gives us a higher confidence that
    40 		gives us higher confidence that
    41 		our simplification algorithm $\simp$ does not ``mis-behave''
    41 		our simplification algorithm $\simp$ does not ``misbehave''
    42 		like $\textit{simpSL}$ does.
    42 		like $\textit{simpSL}$ does.
    43 		The bound is universal for a given regular expression, 
    43 		The bound is universal for a given regular expression, 
    44 		which is an advantage over work which 
    44 		which is an advantage over work which 
    45 		only gives empirical evidence on 
    45 		only gives empirical evidence on 
    46 		some test cases (see for example Verbatim work \cite{Verbatimpp}).
    46 		some test cases (see for example Verbatim work \cite{Verbatimpp}).
   119 After $\simp$ the node shrinks.
   119 After $\simp$ the node shrinks.
   120 Our proof states that all the blue nodes
   120 Our proof states that all the blue nodes
   121 stay below a size bound $N_a$ determined by the input $a$.
   121 stay below a size bound $N_a$ determined by the input $a$.
   122 
   122 
   123 \noindent
   123 \noindent
   124 Sulzmann and Lu's assumed a similar picture about their algorithm,
   124 Sulzmann and Lu's assumed a similar picture of their algorithm,
   125 though in fact their algorithm's size might be better depicted by the following graph:
   125 though in fact their algorithm's size might be better depicted by the following graph:
   126 \begin{figure}[H]
   126 \begin{figure}[H]
   127 	\begin{tikzpicture}[scale=2,
   127 	\begin{tikzpicture}[scale=2,
   128 		every node/.style={minimum size=11mm},
   128 		every node/.style={minimum size=11mm},
   129 		->,>=stealth',shorten >=1pt,auto,thick
   129 		->,>=stealth',shorten >=1pt,auto,thick
   160 of each derivative step to grow continuously (for example 
   160 of each derivative step to grow continuously (for example 
   161 in \ref{SulzmannLuLexerTime}).
   161 in \ref{SulzmannLuLexerTime}).
   162 They tested out the run time of their
   162 They tested out the run time of their
   163 lexer on particular examples such as $(a+b+ab)^*$
   163 lexer on particular examples such as $(a+b+ab)^*$
   164 and claimed that their algorithm is linear w.r.t to the input.
   164 and claimed that their algorithm is linear w.r.t to the input.
   165 With our mecahnised proof, we avoid this type of unintentional
   165 With our mechanised proof, we avoid this type of unintentional
   166 generalisation.
   166 generalisation.
   167 
   167 
   168 Before delving in to the details of the formalisation,
   168 Before delving into the details of the formalisation,
   169 we are going to provide an overview of it in the next subsection.
   169 we are going to provide an overview of it in the following subsection.
   170 
   170 
   171 
   171 
   172 \subsection{Overview of the Proof}
   172 \subsection{Overview of the Proof}
   173 A high-level overview of the main components of the finiteness proof
   173 A high-level overview of the main components of the finiteness proof
   174 is as follows:
   174 is as follows:
   234 $\textit{rrexp}$s.
   234 $\textit{rrexp}$s.
   235 They are annotated regular expressions without bitcodes,
   235 They are annotated regular expressions without bitcodes,
   236 allowing a more convenient size bound proof.
   236 allowing a more convenient size bound proof.
   237 %Of course, the bits which encode the lexing information 
   237 %Of course, the bits which encode the lexing information 
   238 %would grow linearly with respect 
   238 %would grow linearly with respect 
   239 %to the input, which should be taken into account when we wish to tackle the runtime comlexity.
   239 %to the input, which should be taken into accounte when we wish to tackle the runtime complexity.
   240 %But for the sake of the structural size 
   240 %But for the sake of the structural size 
   241 %we can safely ignore them.\\
   241 %we can safely ignore them.\\
   242 The datatype 
   242 The datatype 
   243 definition of the $\rrexp$, called
   243 definition of the $\rrexp$, called
   244 \emph{r-regular expressions},
   244 \emph{r-regular expressions},
   267 	\end{tabular}
   267 	\end{tabular}
   268 \end{center}
   268 \end{center}
   269 \noindent
   269 \noindent
   270 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to 
   270 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to 
   271 differentiate with the same operation for annotated regular expressions.
   271 differentiate with the same operation for annotated regular expressions.
   272 Adding $r$ as subscript will be used in 
   272 Similar subscripts will be added for operations like $\rerase{}$:
   273 other operations as well.\\
       
   274 The transformation from an annotated regular expression
       
   275 to an r-regular expression is straightforward.
       
   276 \begin{center}
   273 \begin{center}
   277 	\begin{tabular}{lcl}
   274 	\begin{tabular}{lcl}
   278 		$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
   275 		$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
   279 		$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
   276 		$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
   280 		$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
   277 		$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
   312 The singleton alternative $\sum [r]$ becomes $r$ during the
   309 The singleton alternative $\sum [r]$ becomes $r$ during the
   313 $\erase$ function.
   310 $\erase$ function.
   314 The  annotated regular expression $\sum[a, b, c]$ would turn into
   311 The  annotated regular expression $\sum[a, b, c]$ would turn into
   315 $(a+(b+c))$.
   312 $(a+(b+c))$.
   316 All these operations change the size and structure of
   313 All these operations change the size and structure of
   317 an annotated regular expressions, adding unnecessary 
   314 an annotated regular expression, adding unnecessary 
   318 complexities to the size bound proof.
   315 complexities to the size bound proof.
   319 
   316 
   320 For example, if we define the size of a basic plain regular expression 
   317 For example, if we define the size of a basic plain regular expression 
   321 in the usual way,
   318 in the usual way,
   322 \begin{center}
   319 \begin{center}
   520 
   517 
   521 %The way we obtain the bound for $\rrexp$s is by two steps:
   518 %The way we obtain the bound for $\rrexp$s is by two steps:
   522 %\begin{itemize}
   519 %\begin{itemize}
   523 %	\item
   520 %	\item
   524 %		First, we rewrite $r\backslash s$ into something else that is easier
   521 %		First, we rewrite $r\backslash s$ into something else that is easier
   525 %		to bound. This step is especially important for the inductive case 
   522 %		to bound. This step is crucial for the inductive case 
   526 %		$r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way,
   523 %		$r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way,
   527 %		but after simplification they will always be equal or smaller to a form consisting of an alternative
   524 %		but after simplification, they will always be equal or smaller to a form consisting of an alternative
   528 %		list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application.
   525 %		list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application.
   529 %	\item
   526 %	\item
   530 %		Then, for such a sum  list of regular expressions $f\; (g\; (\sum rs))$, we can control its size
   527 %		Then, for such a sum  list of regular expressions $f\; (g\; (\sum rs))$, we can control its size
   531 %		by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only 
   528 %		by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only 
   532 %		reduce the size of a regular expression, not adding to it.
   529 %		reduce the size of a regular expression, not adding to it.
   578 for the set of substrings of $s$).
   575 for the set of substrings of $s$).
   579 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
   576 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
   580 in the simplification, which prevents the $rs$ from growing indefinitely.
   577 in the simplification, which prevents the $rs$ from growing indefinitely.
   581 
   578 
   582 Based on this idea, we develop a proof in two steps.
   579 Based on this idea, we develop a proof in two steps.
   583 First, we show the equality (where
   580 First, we show the below equality (where
   584 $f$ and $g$ are functions that do not increase the size of the input)
   581 $f$ and $g$ are functions that do not increase the size of the input)
   585 \begin{center}
   582 \begin{center}
   586 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
   583 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
   587 \end{center}
   584 \end{center}
   588 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
   585 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
   602 We will describe in detail the first step of the proof
   599 We will describe in detail the first step of the proof
   603 in the next section.
   600 in the next section.
   604 
   601 
   605 \section{Closed Forms}
   602 \section{Closed Forms}
   606 In this section we introduce in detail
   603 In this section we introduce in detail
   607 how we express the string derivatives
   604 how to express the string derivatives
   608 of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string
   605 of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string
   609 rather than a single character) in a different way than 
   606 rather than a single character) in a different way than 
   610 our previous definition.
   607 our previous definition.
   611 In previous chapters, the derivative of a 
   608 In previous chapters, the derivative of a 
   612 regular expression $r$ w.r.t a string $s$
   609 regular expression $r$ w.r.t a string $s$
   621 $(r_1 \cdot r_2)\backslash s$,
   618 $(r_1 \cdot r_2)\backslash s$,
   622 we have to somehow get a more concrete form to begin.
   619 we have to somehow get a more concrete form to begin.
   623 We call such more concrete representations the ``closed forms'' of
   620 We call such more concrete representations the ``closed forms'' of
   624 string derivatives as opposed to their original definitions.
   621 string derivatives as opposed to their original definitions.
   625 The terminology ``closed form'' is borrowed from mathematics,
   622 The terminology ``closed form'' is borrowed from mathematics,
   626 which usually describe expressions that are solely comprised of
   623 which usually describe expressions that are solely comprised of finitely many
   627 well-known and easy-to-compute operations such as 
   624 well-known and easy-to-compute operations such as 
   628 additions, multiplications, exponential functions.
   625 additions, multiplications, and exponential functions.
   629 
   626 
   630 We start by proving some basic identities
   627 We start by proving some basic identities
   631 involving the simplification functions for r-regular expressions.
   628 involving the simplification functions for r-regular expressions.
   632 After that we introduce the rewrite relations
   629 After that we introduce the rewrite relations
   633 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
   630 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
   634 $\rightsquigarrow_f$ and $\rightsquigarrow_g$.
   631 $\rightsquigarrow_f$ and $\rightsquigarrow_g$.
   635 These relations involves similar techniques as in chapter \ref{Bitcoded2}
   632 These relations involve similar techniques as in chapter \ref{Bitcoded2}
   636 for annotated regular expressions.
   633 for annotated regular expressions.
   637 Finally, we use these identities to establish the
   634 Finally, we use these identities to establish the
   638 closed forms of the alternative regular expression,
   635 closed forms of the alternative regular expression,
   639 the sequence regular expression, and the star regular expression.
   636 the sequence regular expression, and the star regular expression.
   640 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
   637 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
   643 
   640 
   644 \subsection{Some Basic Identities}
   641 \subsection{Some Basic Identities}
   645 
   642 
   646 In what follows we will often convert between lists
   643 In what follows we will often convert between lists
   647 and sets.
   644 and sets.
   648 We use Isabelle's $set$ to refere to the 
   645 We use Isabelle's $set$ to refer to the 
   649 function that converts a list $rs$ to the set
   646 function that converts a list $rs$ to the set
   650 containing all the elements in $rs$.
   647 containing all the elements in $rs$.
   651 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
   648 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
   652 The $\textit{rdistinct}$ function, as its name suggests, will
   649 The $\textit{rdistinct}$ function, as its name suggests, will
   653 de-duplicate an r-regular expression list.
   650 de-duplicate an r-regular expression list.
   654 It will also remove any elements that 
   651 It will also remove any elements that 
   655 is already in the accumulator set.
   652 are already in the accumulator set.
   656 \begin{lemma}\label{rdistinctDoesTheJob}
   653 \begin{lemma}\label{rdistinctDoesTheJob}
   657 	%The function $\textit{rdistinct}$ satisfies the following
   654 	%The function $\textit{rdistinct}$ satisfies the following
   658 	%properties:
   655 	%properties:
   659 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
   656 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
   660 	recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.} 
   657 	recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.} 
   673 	\end{itemize}
   670 	\end{itemize}
   674 \end{lemma}
   671 \end{lemma}
   675 \noindent
   672 \noindent
   676 \begin{proof}
   673 \begin{proof}
   677 	The first part is by an induction on $rs$.
   674 	The first part is by an induction on $rs$.
   678 	The second and third part can be proven by using the 
   675 	The second and third parts can be proven by using the 
   679 	inductive cases of $\textit{rdistinct}$.
   676 	inductive cases of $\textit{rdistinct}$.
   680 
   677 
   681 \end{proof}
   678 \end{proof}
   682 
   679 
   683 \noindent
   680 \noindent
   802 
   799 
   803 \subsubsection{The Properties of $\textit{Rflts}$} 
   800 \subsubsection{The Properties of $\textit{Rflts}$} 
   804 We give in this subsection some properties
   801 We give in this subsection some properties
   805 involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and 
   802 involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and 
   806 $\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them.
   803 $\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them.
   807 These will be helpful in later closed form proofs, when
   804 These will be helpful in later closed-form proofs, when
   808 we want to transform derivative terms which have
   805 we want to transform derivative terms which have
   809 %the ways in which multiple functions involving
   806 %the ways in which multiple functions involving
   810 %those are composed together
   807 %those are composed together
   811 interleaving derivatives and  simplifications applied to them.
   808 interleaving derivatives and simplifications applied to them.
   812 
   809 
   813 \noindent
   810 \noindent
   814 %When the function $\textit{Rflts}$ 
   811 %When the function $\textit{Rflts}$ 
   815 %is applied to the concatenation of two lists, the output can be calculated by first applying the
   812 %is applied to the concatenation of two lists; the output can be calculated by first applying the
   816 %functions on two lists separately, and then concatenating them together.
   813 %functions on two lists separately and then concatenating them together.
   817 $\textit{Rflts}$ is composable in terms of concatenation:
   814 $\textit{Rflts}$ is composable in terms of concatenation:
   818 \begin{lemma}\label{rfltsProps}
   815 \begin{lemma}\label{rfltsProps}
   819 	The function $\rflts$ has the properties below:\\
   816 	The function $\rflts$ has the properties below:\\
   820 	\begin{itemize}
   817 	\begin{itemize}
   821 		\item
   818 		\item
   917 \noindent
   914 \noindent
   918 We omit the recursive definition of the predicate $\textit{nonAlt}$,
   915 We omit the recursive definition of the predicate $\textit{nonAlt}$,
   919 which evaluates to true when the regular expression is not an
   916 which evaluates to true when the regular expression is not an
   920 alternative, and false otherwise.
   917 alternative, and false otherwise.
   921 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
   918 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
   922 its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$, 
   919 its non-empty argument list of expressions are all good themselves, and $\textit{nonAlt}$, 
   923 and unique:
   920 and unique:
   924 \begin{lemma}\label{rsimpaltsGood}
   921 \begin{lemma}\label{rsimpaltsGood}
   925 	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
   922 	If $rs \neq []$ and for all $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
   926 	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
   923 	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
   927 \end{lemma}
   924 \end{lemma}
   928 \noindent
   925 \noindent
   929 We also note that
   926 We also note that
   930 if a regular expression $r$ is good, then $\rflts$ on the singleton
   927 if a regular expression $r$ is good, then $\rflts$ on the singleton
   960 \end{lemma}
   957 \end{lemma}
   961 \begin{proof}
   958 \begin{proof}
   962 	By induction on $r$.
   959 	By induction on $r$.
   963 \end{proof}
   960 \end{proof}
   964 \noindent
   961 \noindent
   965 With this we can prove that a regular expressions
   962 With this we can prove that a regular expression
   966 after simplification and flattening and de-duplication,
   963 after simplification and flattening and de-duplication,
   967 will not contain any alternative regular expression directly:
   964 will not contain any alternative regular expression directly:
   968 \begin{lemma}\label{nonaltFltsRd}
   965 \begin{lemma}\label{nonaltFltsRd}
   969 	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ 
   966 	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ 
   970 	then $\textit{nonAlt} \; x$.
   967 	then $\textit{nonAlt} \; x$.
   993 			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
   990 			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
   994 			then $\good \; x$.
   991 			then $\good \; x$.
   995 	\end{itemize}
   992 	\end{itemize}
   996 \end{lemma}
   993 \end{lemma}
   997 \begin{proof}
   994 \begin{proof}
   998 	The first part is by induction on $rs$, where the induction
   995 	The first part is by induction, where the inductive cases
   999 	rule is the inductive cases for $\rflts$.
   996 	are the inductive cases of $\rflts$.
  1000 	The second part is a corollary from the first part.
   997 	The second part is a corollary from the first part.
  1001 \end{proof}
   998 \end{proof}
  1002 
   999 
  1003 This leads to good structural property of $\rsimp{}$,
  1000 This leads to good structural property of $\rsimp{}$,
  1004 that after simplification, a regular expression is
  1001 that after simplification, a regular expression is
  1010 	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
  1007 	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
  1011 	Lemma \ref{rsimpMono} says that 
  1008 	Lemma \ref{rsimpMono} says that 
  1012 	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
  1009 	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
  1013 	$\llbracket r \rrbracket_r$.
  1010 	$\llbracket r \rrbracket_r$.
  1014 	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
  1011 	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
  1015 	Inductive hypothesis applies to the children regular expressions
  1012 	The inductive hypothesis applies to the children regular expressions
  1016 	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
  1013 	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
  1017 	by that as well.
  1014 	by that as well.
  1018 	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
  1015 	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
  1019 	to ensure that goodness is preserved at the topmost level.
  1016 	to ensure that goodness is preserved at the topmost level.
  1020 \end{proof}
  1017 \end{proof}
  1099 		\item
  1096 		\item
  1100 			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
  1097 			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
  1101 	\end{itemize}
  1098 	\end{itemize}
  1102 \end{corollary}
  1099 \end{corollary}
  1103 \noindent
  1100 \noindent
  1104 This will be useful in the later closed form proof's rewriting steps.
  1101 This will be useful in the later closed-form proof's rewriting steps.
  1105 Similarly, we state the following useful facts below:
  1102 Similarly, we state the following useful facts below:
  1106 \begin{lemma}
  1103 \begin{lemma}
  1107 	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
  1104 	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
  1108 	\begin{itemize}
  1105 	\begin{itemize}
  1109 		\item
  1106 		\item
  1440 the first ($r\backslash c$) to the other ($r'\backslash c$).
  1437 the first ($r\backslash c$) to the other ($r'\backslash c$).
  1441 \begin{lemma}\label{interleave}
  1438 \begin{lemma}\label{interleave}
  1442 	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$
  1439 	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$
  1443 \end{lemma}
  1440 \end{lemma}
  1444 \noindent
  1441 \noindent
  1445 This allows us proving more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$.
  1442 This allows us to prove more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$.
  1446 \begin{lemma}\label{insideSimpRemoval}
  1443 \begin{lemma}\label{insideSimpRemoval}
  1447 	$\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})}  $
  1444 	$\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})}  $
  1448 \end{lemma}
  1445 \end{lemma}
  1449 \noindent
  1446 \noindent
  1450 \begin{proof}
  1447 \begin{proof}
  1528 		$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') 
  1525 		$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') 
  1529 		+ r_2 \backslash cc'c''$ & 
  1526 		+ r_2 \backslash cc'c''$ & 
  1530 		$   \longrightarrow_{\backslash c''} \quad \ldots$\\
  1527 		$   \longrightarrow_{\backslash c''} \quad \ldots$\\
  1531 	\end{tabular}
  1528 	\end{tabular}
  1532 \end{center}
  1529 \end{center}
  1533 Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expresssed as 
  1530 Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expressed as 
  1534 a giant alternative taking a list of terms 
  1531 a giant alternative taking a list of terms 
  1535 $[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
  1532 $[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
  1536 where the head of the list is always the term
  1533 where the head of the list is always the term
  1537 representing a match involving only $r_1$, and the tail of the list consisting of
  1534 representing a match involving only $r_1$, and the tail of the list consisting of
  1538 terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
  1535 terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
  1598 In a closed-form one needs to take into account this (because
  1595 In a closed-form one needs to take into account this (because
  1599 closed forms require exact equality rather than language equivalence)
  1596 closed forms require exact equality rather than language equivalence)
  1600 and only generate the 
  1597 and only generate the 
  1601 $r_2 \backslash_r s''$ terms satisfying the property
  1598 $r_2 \backslash_r s''$ terms satisfying the property
  1602 \begin{center}
  1599 \begin{center}
  1603 $\exists  s'.  such \; that \; s'@s'' = s \;\; \land \;\;
  1600 $\exists s'.  such \; that \; s'@s'' = s \;\; \land \;\;
  1604 r_1 \backslash s' \; is \; nullable$.
  1601 r_1 \backslash s' \; is \; nullable$.
  1605 \end{center}
  1602 \end{center}
  1606 Given the arguments $s$ and $r_1$, we denote the list of strings
  1603 Given the arguments $s$ and $r_1$, we denote the list of strings
  1607 $s''$ satisfying the above property as $\vsuf{s}{r_1}$.
  1604 $s''$ satisfying the above property as $\vsuf{s}{r_1}$.
  1608 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{
  1605 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{
  1609 	Perhaps a better name of it would be ``NullablePrefixSuffix''
  1606 	Perhaps a better name for it would be ``NullablePrefixSuffix''
  1610 	to differentiate with the list of \emph{all} prefixes of $s$, but
  1607 	to differentiate with the list of \emph{all} prefixes of $s$, but
  1611 	that is a bit too long for a function name and we are yet to find
  1608 	that is a bit too long for a function name and we are yet to find
  1612 a more concise and easy-to-understand name.}
  1609 a more concise and easy-to-understand name.}
  1613 \begin{center}
  1610 \begin{center}
  1614 	\begin{tabular}{lcl}
  1611 	\begin{tabular}{lcl}
  1833 When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
  1830 When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
  1834 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
  1831 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
  1835 the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly
  1832 the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly
  1836 in the sequence case.
  1833 in the sequence case.
  1837 The good news is that the function $\textit{rsimp}$ will again
  1834 The good news is that the function $\textit{rsimp}$ will again
  1838 ignore the difference between differently nesting patterns of alternatives,
  1835 ignore the difference between different nesting patterns of alternatives,
  1839 and the exponentially growing star derivative like
  1836 and the exponentially growing star derivative like
  1840 \begin{center}
  1837 \begin{center}
  1841 	$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + 
  1838 	$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + 
  1842 	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
  1839 	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
  1843 \end{center}
  1840 \end{center}
  2018 		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
  2015 		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
  2019 	\end{center}
  2016 	\end{center}
  2020 	holds.
  2017 	holds.
  2021 \end{lemma}
  2018 \end{lemma}
  2022 \begin{proof}
  2019 \begin{proof}
  2023 	By an induction on the inductivev cases of $\textit{createdByStar}$.
  2020 	By an induction on the inductive cases of $\textit{createdByStar}$.
  2024 \end{proof}
  2021 \end{proof}
  2025 %This is not entirely true for annotated regular expressions: 
  2022 %This is not entirely true for annotated regular expressions: 
  2026 %%TODO: bsimp bders \neq bderssimp
  2023 %%TODO: bsimp bders \neq bderssimp
  2027 %\begin{center}
  2024 %\begin{center}
  2028 %	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
  2025 %	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
  2088 \end{lemma}
  2085 \end{lemma}
  2089 
  2086 
  2090 \begin{proof}
  2087 \begin{proof}
  2091 	By using the rewriting relation $\rightsquigarrow$
  2088 	By using the rewriting relation $\rightsquigarrow$
  2092 \end{proof}
  2089 \end{proof}
  2093 And from this we obtain that a 
  2090 And from this we obtain the following fact: a 
  2094 regular expression created by star 
  2091 regular expression created by star 
  2095 is the same as its flattened version, up to equivalence under $\textit{bsimp}$.
  2092 is the same as its flattened version, up to equivalence under $\textit{bsimp}$.
  2096 For example,
  2093 For example,
  2097 \begin{lemma}\label{hfauRsimpeq2}
  2094 \begin{lemma}\label{hfauRsimpeq2}
  2098 	$\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
  2095 	$\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
  2236 %\end{center}
  2233 %\end{center}
  2237 \section{Bounding Closed Forms}
  2234 \section{Bounding Closed Forms}
  2238 
  2235 
  2239 In this section, we introduce how we formalised the bound
  2236 In this section, we introduce how we formalised the bound
  2240 on closed forms.
  2237 on closed forms.
  2241 We first show that in general regular expressions up to a certain 
  2238 We first show that in general the number of regular expressions up to a certain 
  2242 size are finite.
  2239 size is finite.
  2243 Then we prove that functions such as $\rflts$
  2240 Then we prove that functions such as $\rflts$
  2244 will not cause the size of r-regular expressions to grow.
  2241 will not cause the size of r-regular expressions to grow.
  2245 Putting this together with a general bound 
  2242 Putting this together with a general bound 
  2246 on the finiteness of distinct regular expressions
  2243 on the finiteness of distinct regular expressions
  2247 up to a certain size, we obtain a bound on 
  2244 up to a specific size, we obtain a bound on 
  2248 the closed forms.
  2245 the closed forms.
  2249 
  2246 
  2250 \subsection{Finiteness of Distinct Regular Expressions}
  2247 \subsection{Finiteness of Distinct Regular Expressions}
  2251 We define the set of regular expressions whose size are no more than
  2248 We define the set of regular expressions whose size is no more than
  2252 a certain size $N$ as $\textit{sizeNregex} \; N$:
  2249 a certain size $N$ as $\textit{sizeNregex} \; N$:
  2253 \[
  2250 \[
  2254 	\textit{sizeNregex} \; N \dn \{r\; \mid \;  \llbracket r \rrbracket_r \leq N \}
  2251 	\textit{sizeNregex} \; N \dn \{r\; \mid \;  \llbracket r \rrbracket_r \leq N \}
  2255 \]
  2252 \]
  2256 We have that $\textit{sizeNregex} \; N$ is always a finite set:
  2253 We have that $\textit{sizeNregex} \; N$ is always a finite set:
  2259 \end{lemma}
  2256 \end{lemma}
  2260 \begin{proof}
  2257 \begin{proof}
  2261 	By splitting the set $\textit{sizeNregex} \; (N + 1)$ into
  2258 	By splitting the set $\textit{sizeNregex} \; (N + 1)$ into
  2262 	subsets by their categories:
  2259 	subsets by their categories:
  2263 	$\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$,
  2260 	$\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$,
  2264 	and so on. Each of these subsets are finitely bounded.
  2261 	and so on. Each of these subsets is finitely bounded.
  2265 \end{proof}
  2262 \end{proof}
  2266 \noindent
  2263 \noindent
  2267 From this we get a corollary that
  2264 From this we get a corollary that
  2268 if forall $r \in rs$, $\rsize{r} \leq N$, then the output of 
  2265 if forall $r \in rs$, $\rsize{r} \leq N$, then the output of 
  2269 $\rdistinct{rs}{\varnothing}$ is a list of regular
  2266 $\rdistinct{rs}{\varnothing}$ is a list of regular
  2327 				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r 
  2324 				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r 
  2328 			\]
  2325 			\]
  2329 	\end{itemize}
  2326 	\end{itemize}
  2330 \end{lemma}
  2327 \end{lemma}
  2331 \begin{proof}
  2328 \begin{proof}
  2332 	Point 1, 3, 4 can be proven by an induction on $rs$.
  2329 	Points 1, 3, and 4 can be proven by an induction on $rs$.
  2333 	Point 2 is by case analysis on $r_1$ and $r_2$.
  2330 	Point 2 is by case analysis on $r_1$ and $r_2$.
  2334 	The last part is a corollary of the previous ones.
  2331 	The last part is a corollary of the previous ones.
  2335 \end{proof}
  2332 \end{proof}
  2336 \noindent
  2333 \noindent
  2337 With the lemmas for each inductive case in place, we are ready to get 
  2334 With the lemmas for each inductive case in place, we are ready to get 
  2372 can be described by some inductive sub-structures 
  2369 can be described by some inductive sub-structures 
  2373 (for example when $r = r_1 \cdot r_2$ then $rs'$ 
  2370 (for example when $r = r_1 \cdot r_2$ then $rs'$ 
  2374 will be solely comprised of $r_1 \backslash s'$ 
  2371 will be solely comprised of $r_1 \backslash s'$ 
  2375 and $r_2 \backslash s''$, $s'$ and $s''$ being 
  2372 and $r_2 \backslash s''$, $s'$ and $s''$ being 
  2376 sub-strings of $s$).
  2373 sub-strings of $s$).
  2377 which will each have a size uppder bound 
  2374 which will each have a size upper bound 
  2378 according to inductive hypothesis, which controls $r \backslash s$.
  2375 according to the inductive hypothesis, which controls $r \backslash s$.
  2379 
  2376 
  2380 We elaborate the above reasoning by a series of lemmas
  2377 We elaborate the above reasoning by a series of lemmas
  2381 below, where straightforward proofs are omitted.
  2378 below, where straightforward proofs are omitted.
  2382 %We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
  2379 %We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
  2383 We show that $\textit{rdistinct}$ and $\rflts$
  2380 We show that $\textit{rdistinct}$ and $\rflts$
  2389 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
  2386 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
  2390 \end{center}
  2387 \end{center}
  2391 We need this so that we know the outcome of our real 
  2388 We need this so that we know the outcome of our real 
  2392 simplification is better than or equal to a rough estimate,
  2389 simplification is better than or equal to a rough estimate,
  2393 and therefore can be bounded by that estimate.
  2390 and therefore can be bounded by that estimate.
  2394 This is a bit harder to establish compared with proving
  2391 This is a bit harder to establish compared to proving
  2395 $\textit{flts}$ does not make a list larger (which can
  2392 $\textit{flts}$ does not make a list larger (which can
  2396 be proven using routine induction):
  2393 be proven using routine induction):
  2397 \begin{center}
  2394 \begin{center}
  2398 	$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
  2395 	$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
  2399 	\llbracket  \textit{rs} \rrbracket_r$
  2396 	\llbracket  \textit{rs} \rrbracket_r$
  2484 \end{lemma}
  2481 \end{lemma}
  2485 \begin{proof}
  2482 \begin{proof}
  2486 	By using corollary \ref{interactionFltsDB}.
  2483 	By using corollary \ref{interactionFltsDB}.
  2487 \end{proof}
  2484 \end{proof}
  2488 \noindent
  2485 \noindent
  2489 This is a key lemma in establishing the bounds on all the 
  2486 This is a key lemma in establishing the bounds of all the 
  2490 closed forms.
  2487 closed forms.
  2491 With this we are now ready to control the sizes of
  2488 With this we are now ready to control the sizes of
  2492 $(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$.
  2489 $(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$.
  2493 \begin{theorem}\label{rBound}
  2490 \begin{theorem}\label{rBound}
  2494 	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
  2491 	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
  2639 		$\llbracket r \rrbracket_r + n$\\
  2636 		$\llbracket r \rrbracket_r + n$\\
  2640 	\end{tabular}
  2637 	\end{tabular}
  2641 \end{center}
  2638 \end{center}
  2642 \noindent
  2639 \noindent
  2643 Arguably we should use $\log \; n$ for the size because
  2640 Arguably we should use $\log \; n$ for the size because
  2644 the number of digits increase logarithmically w.r.t $n$.
  2641 the number of digits increases logarithmically w.r.t $n$.
  2645 For simplicity we choose to add the counter directly to the size.
  2642 For simplicity we choose to add the counter directly to the size.
  2646 
  2643 
  2647 The derivative w.r.t a bounded regular expression
  2644 The derivative w.r.t a bounded regular expression
  2648 is given as 
  2645 is given as 
  2649 \begin{center}
  2646 \begin{center}
  2691 We confirm this point because the correctness theorem would also
  2688 We confirm this point because the correctness theorem would also
  2692 extend without surprise to $\blexersimp$.
  2689 extend without surprise to $\blexersimp$.
  2693 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
  2690 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
  2694 do not need to be changed,
  2691 do not need to be changed,
  2695 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to 
  2692 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to 
  2696 add one more line which can be solved by sledgehammer 
  2693 add one more line which can be solved by the Sledgehammer tool
  2697 to solve the $r^{\{n\}}$ inductive case.
  2694 to solve the $r^{\{n\}}$ inductive case.
  2698 
  2695 
  2699 
  2696 
  2700 \subsubsection{Finiteness Proof Augmentation}
  2697 \subsubsection{Finiteness Proof Augmentation}
  2701 The bounded repetitions are
  2698 The bounded repetitions are
  2916 	By a reverse induction on $s$.
  2913 	By a reverse induction on $s$.
  2917 	For the inductive case, note that if $\cbn \; r$ holds,
  2914 	For the inductive case, note that if $\cbn \; r$ holds,
  2918 	then $\cbn \; (r\backslash_r c)$ holds.
  2915 	then $\cbn \; (r\backslash_r c)$ holds.
  2919 \end{proof}
  2916 \end{proof}
  2920 \noindent
  2917 \noindent
  2921 In addition, for $\cbn$-shaped regular expressioins, one can flatten
  2918 In addition, for $\cbn$-shaped regular expressions, one can flatten
  2922 them:
  2919 them:
  2923 \begin{lemma}\label{ntimesHfauPushin}
  2920 \begin{lemma}\label{ntimesHfauPushin}
  2924 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
  2921 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
  2925 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
  2922 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
  2926 	(\hflataux{r})})$
  2923 	(\hflataux{r})})$
  2979 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
  2976 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
  2980 	\end{tabular}
  2977 	\end{tabular}
  2981 \end{center}
  2978 \end{center}
  2982 \begin{lemma}\label{nupdatesNonempty}
  2979 \begin{lemma}\label{nupdatesNonempty}
  2983 	If for all elements $o \in \textit{set} \; Ss$,
  2980 	If for all elements $o \in \textit{set} \; Ss$,
  2984 	$\nString \; o$ holds, the we have that
  2981 	$\nString \; o$ holds, then we have that
  2985 	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
  2982 	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
  2986 	$\nString \; o'$ holds.
  2983 	$\nString \; o'$ holds.
  2987 \end{lemma}
  2984 \end{lemma}
  2988 \begin{proof}
  2985 \begin{proof}
  2989 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
  2986 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
  3107 
  3104 
  3108 We aim to formalise the correctness and size bound
  3105 We aim to formalise the correctness and size bound
  3109 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
  3106 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
  3110 and so on, which is still work in progress.
  3107 and so on, which is still work in progress.
  3111 They should more or less follow the same recipe described in this section.
  3108 They should more or less follow the same recipe described in this section.
  3112 Once we know about how to deal with them recursively using suitable auxiliary
  3109 Once we know how to deal with them recursively using suitable auxiliary
  3113 definitions, we are able to routinely establish the proofs.
  3110 definitions, we can routinely establish the proofs.
  3114 
  3111 
  3115 
  3112 
  3116 %----------------------------------------------------------------------------------------
  3113 %----------------------------------------------------------------------------------------
  3117 %	SECTION 3
  3114 %	SECTION 3
  3118 %----------------------------------------------------------------------------------------
  3115 %----------------------------------------------------------------------------------------
  3158 inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function
  3155 inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function
  3159 $f(x) = x * 2^x$.
  3156 $f(x) = x * 2^x$.
  3160 This means the bound we have will surge up at least
  3157 This means the bound we have will surge up at least
  3161 tower-exponentially with a linear increase of the depth.
  3158 tower-exponentially with a linear increase of the depth.
  3162 
  3159 
  3163 One might be quite skeptical about what this non-elementary
  3160 One might be pretty skepticafl about what this non-elementary
  3164 bound can bring us.
  3161 bound can bring us.
  3165 It turns out that the giant bounds are far from being hit.
  3162 It turns out that the giant bounds are far from being hit.
  3166 Here we have some test data from randomly generated regular expressions:
  3163 Here we have some test data from randomly generated regular expressions:
  3167 \begin{figure}[H]
  3164 \begin{figure}[H]
  3168 	\begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}}
  3165 	\begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}}
  3230   \end{tikzpicture}\\
  3227   \end{tikzpicture}\\
  3231   \multicolumn{3}{c}{}
  3228   \multicolumn{3}{c}{}
  3232 	\end{tabular}    
  3229 	\end{tabular}    
  3233   \caption{Graphs: size change of 3 randomly generated 
  3230   \caption{Graphs: size change of 3 randomly generated 
  3234   regular expressions $w.r.t.$ input string length. 
  3231   regular expressions $w.r.t.$ input string length. 
  3235   The x axis represents the length of input.}
  3232   The x-axis represents the length of the input.}
  3236 \end{figure}  
  3233 \end{figure}  
  3237 \noindent
  3234 \noindent
  3238 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the 
  3235 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the 
  3239 original size.
  3236 original size.
  3240 We will discuss improvements to this bound in the next chapter.
  3237 We will discuss improvements to this bound in the next chapter.
  3257 	\item
  3254 	\item
  3258 
  3255 
  3259 		The size proof can serve as a starting point for a complexity
  3256 		The size proof can serve as a starting point for a complexity
  3260 		formalisation.
  3257 		formalisation.
  3261 		Derivatives are the most important phases of our lexer algorithm.
  3258 		Derivatives are the most important phases of our lexer algorithm.
  3262 		Size properties about derivatives covers the majority of the algorithm
  3259 		Size properties about derivatives cover the majority of the algorithm
  3263 		and is therefore a good indication of complexity of the entire program.
  3260 		and is therefore a good indication of the complexity of the entire program.
  3264 	\item
  3261 	\item
  3265 		The bound is already a strong indication that catastrophic
  3262 		The bound is already a strong indication that catastrophic
  3266 		backtracking is much less likely to occur in our $\blexersimp$
  3263 		backtracking is much less likely to occur in our $\blexersimp$
  3267 		algorithm.
  3264 		algorithm.
  3268 		We refine $\blexersimp$ with $\blexerStrong$ in the next chapter
  3265 		We refine $\blexersimp$ with $\blexerStrong$ in the next chapter
  3278 
  3275 
  3279 
  3276 
  3280 
  3277 
  3281 
  3278 
  3282 
  3279 
  3283 One might wonder the actual bound rather than the loose bound we gave
  3280 One might wonder about the actual bound rather than the loose bound we gave
  3284 for the convenience of an easier proof.
  3281 for the convenience of a more straightforward proof.
  3285 How much can the regex $r^* \backslash s$ grow? 
  3282 How much can the regex $r^* \backslash s$ grow? 
  3286 As  earlier graphs have shown,
  3283 As  earlier graphs have shown,
  3287 %TODO: reference that graph where size grows quickly
  3284 %TODO: reference that graph where size grows quickly
  3288 they can grow at a maximum speed
  3285 they can grow at a maximum speed
  3289 exponential $w.r.t$ the number of characters, 
  3286 exponential $w.r.t$ the number of characters, 
  3320 to express $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + 
  3317 to express $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + 
  3321 (\underbrace{a \ldots a}_{\text{n a's}})^*$ in the below discussion.
  3318 (\underbrace{a \ldots a}_{\text{n a's}})^*$ in the below discussion.
  3322 The exponential size is triggered by that the regex
  3319 The exponential size is triggered by that the regex
  3323 $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$
  3320 $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$
  3324 inside the $(\ldots) ^*$ having exponentially many
  3321 inside the $(\ldots) ^*$ having exponentially many
  3325 different derivatives, despite those difference being minor.
  3322 different derivatives, despite those differences being minor.
  3326 $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$
  3323 $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$
  3327 will therefore contain the following terms (after flattening out all nested 
  3324 will therefore contain the following terms (after flattening out all nested 
  3328 alternatives):
  3325 alternatives):
  3329 \begin{center}
  3326 \begin{center}
  3330 $(\sum_{i = 1}^{n}  (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot  (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)$\\
  3327 $(\sum_{i = 1}^{n}  (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot  (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)$\\
  3406 
  3403 
  3407 %-----------------------------------
  3404 %-----------------------------------
  3408 %	SUBSECTION 1
  3405 %	SUBSECTION 1
  3409 %-----------------------------------
  3406 %-----------------------------------
  3410 %\subsection{Syntactic Equivalence Under $\simp$}
  3407 %\subsection{Syntactic Equivalence Under $\simp$}
  3411 %We prove that minor differences can be annhilated
  3408 %We prove that minor differences can be annihilated
  3412 %by $\simp$.
  3409 %by $\simp$.
  3413 %For example,
  3410 %For example,
  3414 %\begin{center}
  3411 %\begin{center}
  3415 %	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = 
  3412 %	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = 
  3416 %	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
  3413 %	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$