6 % In Chapter 4 \ref{Chapter4} we give the second guarantee |
6 % In Chapter 4 \ref{Chapter4} we give the second guarantee |
7 %of our bitcoded algorithm, that is a finite bound on the size of any |
7 %of our bitcoded algorithm, that is a finite bound on the size of any |
8 %regex's derivatives. |
8 %regex's derivatives. |
9 |
9 |
10 |
10 |
11 In this chapter we give a bound in terms of size of |
11 In this chapter we give a bound in terms of the size of |
12 the calculated derivatives: |
12 the calculated derivatives: |
13 given an annotated regular expression $a$, for any string $s$ |
13 given an annotated regular expression $a$, for any string $s$ |
14 our algorithm $\blexersimp$'s derivatives |
14 our algorithm $\blexersimp$'s derivatives |
15 are finitely bounded |
15 are finitely bounded |
16 by a constant that only depends on $a$. |
16 by a constant that only depends on $a$. |
17 Formally we show that there exists an $N_a$ such that |
17 Formally we show that there exists an $N_a$ such that |
18 \begin{center} |
18 \begin{center} |
19 $\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$ |
19 $\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$ |
20 \end{center} |
20 \end{center} |
21 \noindent |
21 \noindent |
22 where the size ($\llbracket \_ \rrbracket$) of |
22 where the size ($\llbracket \_ \rrbracket$) of |
23 an annotated regular expression is defined |
23 an annotated regular expression is defined |
24 in terms of the number of nodes in its |
24 in terms of the number of nodes in its |
25 tree structure (its recursive definition given in the next page). |
25 tree structure (its recursive definition is given in the next page). |
26 We believe this size bound |
26 We believe this size bound |
27 is important in the context of POSIX lexing, because |
27 is important in the context of POSIX lexing because |
28 \begin{itemize} |
28 \begin{itemize} |
29 \item |
29 \item |
30 It is a stepping stone towards the goal |
30 It is a stepping stone towards the goal |
31 of eliminating ``catastrophic backtracking''. |
31 of eliminating ``catastrophic backtracking''. |
32 If the internal data structures used by our algorithm |
32 If the internal data structures used by our algorithm |
35 be slow. |
35 be slow. |
36 The next step is to refine the bound $N_a$ so that it |
36 The next step is to refine the bound $N_a$ so that it |
37 is not just finite but polynomial in $\llbracket a\rrbracket$. |
37 is not just finite but polynomial in $\llbracket a\rrbracket$. |
38 \item |
38 \item |
39 Having the finite bound formalised |
39 Having the finite bound formalised |
40 gives us a higher confidence that |
40 gives us higher confidence that |
41 our simplification algorithm $\simp$ does not ``mis-behave'' |
41 our simplification algorithm $\simp$ does not ``misbehave'' |
42 like $\textit{simpSL}$ does. |
42 like $\textit{simpSL}$ does. |
43 The bound is universal for a given regular expression, |
43 The bound is universal for a given regular expression, |
44 which is an advantage over work which |
44 which is an advantage over work which |
45 only gives empirical evidence on |
45 only gives empirical evidence on |
46 some test cases (see for example Verbatim work \cite{Verbatimpp}). |
46 some test cases (see for example Verbatim work \cite{Verbatimpp}). |
119 After $\simp$ the node shrinks. |
119 After $\simp$ the node shrinks. |
120 Our proof states that all the blue nodes |
120 Our proof states that all the blue nodes |
121 stay below a size bound $N_a$ determined by the input $a$. |
121 stay below a size bound $N_a$ determined by the input $a$. |
122 |
122 |
123 \noindent |
123 \noindent |
124 Sulzmann and Lu's assumed a similar picture about their algorithm, |
124 Sulzmann and Lu's assumed a similar picture of their algorithm, |
125 though in fact their algorithm's size might be better depicted by the following graph: |
125 though in fact their algorithm's size might be better depicted by the following graph: |
126 \begin{figure}[H] |
126 \begin{figure}[H] |
127 \begin{tikzpicture}[scale=2, |
127 \begin{tikzpicture}[scale=2, |
128 every node/.style={minimum size=11mm}, |
128 every node/.style={minimum size=11mm}, |
129 ->,>=stealth',shorten >=1pt,auto,thick |
129 ->,>=stealth',shorten >=1pt,auto,thick |
160 of each derivative step to grow continuously (for example |
160 of each derivative step to grow continuously (for example |
161 in \ref{SulzmannLuLexerTime}). |
161 in \ref{SulzmannLuLexerTime}). |
162 They tested out the run time of their |
162 They tested out the run time of their |
163 lexer on particular examples such as $(a+b+ab)^*$ |
163 lexer on particular examples such as $(a+b+ab)^*$ |
164 and claimed that their algorithm is linear w.r.t to the input. |
164 and claimed that their algorithm is linear w.r.t to the input. |
165 With our mecahnised proof, we avoid this type of unintentional |
165 With our mechanised proof, we avoid this type of unintentional |
166 generalisation. |
166 generalisation. |
167 |
167 |
168 Before delving in to the details of the formalisation, |
168 Before delving into the details of the formalisation, |
169 we are going to provide an overview of it in the next subsection. |
169 we are going to provide an overview of it in the following subsection. |
170 |
170 |
171 |
171 |
172 \subsection{Overview of the Proof} |
172 \subsection{Overview of the Proof} |
173 A high-level overview of the main components of the finiteness proof |
173 A high-level overview of the main components of the finiteness proof |
174 is as follows: |
174 is as follows: |
234 $\textit{rrexp}$s. |
234 $\textit{rrexp}$s. |
235 They are annotated regular expressions without bitcodes, |
235 They are annotated regular expressions without bitcodes, |
236 allowing a more convenient size bound proof. |
236 allowing a more convenient size bound proof. |
237 %Of course, the bits which encode the lexing information |
237 %Of course, the bits which encode the lexing information |
238 %would grow linearly with respect |
238 %would grow linearly with respect |
239 %to the input, which should be taken into account when we wish to tackle the runtime comlexity. |
239 %to the input, which should be taken into accounte when we wish to tackle the runtime complexity. |
240 %But for the sake of the structural size |
240 %But for the sake of the structural size |
241 %we can safely ignore them.\\ |
241 %we can safely ignore them.\\ |
242 The datatype |
242 The datatype |
243 definition of the $\rrexp$, called |
243 definition of the $\rrexp$, called |
244 \emph{r-regular expressions}, |
244 \emph{r-regular expressions}, |
267 \end{tabular} |
267 \end{tabular} |
268 \end{center} |
268 \end{center} |
269 \noindent |
269 \noindent |
270 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to |
270 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to |
271 differentiate with the same operation for annotated regular expressions. |
271 differentiate with the same operation for annotated regular expressions. |
272 Adding $r$ as subscript will be used in |
272 Similar subscripts will be added for operations like $\rerase{}$: |
273 other operations as well.\\ |
|
274 The transformation from an annotated regular expression |
|
275 to an r-regular expression is straightforward. |
|
276 \begin{center} |
273 \begin{center} |
277 \begin{tabular}{lcl} |
274 \begin{tabular}{lcl} |
278 $\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\ |
275 $\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\ |
279 $\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\ |
276 $\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\ |
280 $\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\ |
277 $\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\ |
312 The singleton alternative $\sum [r]$ becomes $r$ during the |
309 The singleton alternative $\sum [r]$ becomes $r$ during the |
313 $\erase$ function. |
310 $\erase$ function. |
314 The annotated regular expression $\sum[a, b, c]$ would turn into |
311 The annotated regular expression $\sum[a, b, c]$ would turn into |
315 $(a+(b+c))$. |
312 $(a+(b+c))$. |
316 All these operations change the size and structure of |
313 All these operations change the size and structure of |
317 an annotated regular expressions, adding unnecessary |
314 an annotated regular expression, adding unnecessary |
318 complexities to the size bound proof. |
315 complexities to the size bound proof. |
319 |
316 |
320 For example, if we define the size of a basic plain regular expression |
317 For example, if we define the size of a basic plain regular expression |
321 in the usual way, |
318 in the usual way, |
322 \begin{center} |
319 \begin{center} |
520 |
517 |
521 %The way we obtain the bound for $\rrexp$s is by two steps: |
518 %The way we obtain the bound for $\rrexp$s is by two steps: |
522 %\begin{itemize} |
519 %\begin{itemize} |
523 % \item |
520 % \item |
524 % First, we rewrite $r\backslash s$ into something else that is easier |
521 % First, we rewrite $r\backslash s$ into something else that is easier |
525 % to bound. This step is especially important for the inductive case |
522 % to bound. This step is crucial for the inductive case |
526 % $r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way, |
523 % $r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way, |
527 % but after simplification they will always be equal or smaller to a form consisting of an alternative |
524 % but after simplification, they will always be equal or smaller to a form consisting of an alternative |
528 % list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application. |
525 % list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application. |
529 % \item |
526 % \item |
530 % Then, for such a sum list of regular expressions $f\; (g\; (\sum rs))$, we can control its size |
527 % Then, for such a sum list of regular expressions $f\; (g\; (\sum rs))$, we can control its size |
531 % by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only |
528 % by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only |
532 % reduce the size of a regular expression, not adding to it. |
529 % reduce the size of a regular expression, not adding to it. |
578 for the set of substrings of $s$). |
575 for the set of substrings of $s$). |
579 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$ |
576 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$ |
580 in the simplification, which prevents the $rs$ from growing indefinitely. |
577 in the simplification, which prevents the $rs$ from growing indefinitely. |
581 |
578 |
582 Based on this idea, we develop a proof in two steps. |
579 Based on this idea, we develop a proof in two steps. |
583 First, we show the equality (where |
580 First, we show the below equality (where |
584 $f$ and $g$ are functions that do not increase the size of the input) |
581 $f$ and $g$ are functions that do not increase the size of the input) |
585 \begin{center} |
582 \begin{center} |
586 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$, |
583 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$, |
587 \end{center} |
584 \end{center} |
588 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on. |
585 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on. |
602 We will describe in detail the first step of the proof |
599 We will describe in detail the first step of the proof |
603 in the next section. |
600 in the next section. |
604 |
601 |
605 \section{Closed Forms} |
602 \section{Closed Forms} |
606 In this section we introduce in detail |
603 In this section we introduce in detail |
607 how we express the string derivatives |
604 how to express the string derivatives |
608 of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string |
605 of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string |
609 rather than a single character) in a different way than |
606 rather than a single character) in a different way than |
610 our previous definition. |
607 our previous definition. |
611 In previous chapters, the derivative of a |
608 In previous chapters, the derivative of a |
612 regular expression $r$ w.r.t a string $s$ |
609 regular expression $r$ w.r.t a string $s$ |
621 $(r_1 \cdot r_2)\backslash s$, |
618 $(r_1 \cdot r_2)\backslash s$, |
622 we have to somehow get a more concrete form to begin. |
619 we have to somehow get a more concrete form to begin. |
623 We call such more concrete representations the ``closed forms'' of |
620 We call such more concrete representations the ``closed forms'' of |
624 string derivatives as opposed to their original definitions. |
621 string derivatives as opposed to their original definitions. |
625 The terminology ``closed form'' is borrowed from mathematics, |
622 The terminology ``closed form'' is borrowed from mathematics, |
626 which usually describe expressions that are solely comprised of |
623 which usually describe expressions that are solely comprised of finitely many |
627 well-known and easy-to-compute operations such as |
624 well-known and easy-to-compute operations such as |
628 additions, multiplications, exponential functions. |
625 additions, multiplications, and exponential functions. |
629 |
626 |
630 We start by proving some basic identities |
627 We start by proving some basic identities |
631 involving the simplification functions for r-regular expressions. |
628 involving the simplification functions for r-regular expressions. |
632 After that we introduce the rewrite relations |
629 After that we introduce the rewrite relations |
633 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$ |
630 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$ |
634 $\rightsquigarrow_f$ and $\rightsquigarrow_g$. |
631 $\rightsquigarrow_f$ and $\rightsquigarrow_g$. |
635 These relations involves similar techniques as in chapter \ref{Bitcoded2} |
632 These relations involve similar techniques as in chapter \ref{Bitcoded2} |
636 for annotated regular expressions. |
633 for annotated regular expressions. |
637 Finally, we use these identities to establish the |
634 Finally, we use these identities to establish the |
638 closed forms of the alternative regular expression, |
635 closed forms of the alternative regular expression, |
639 the sequence regular expression, and the star regular expression. |
636 the sequence regular expression, and the star regular expression. |
640 %$r_1\cdot r_2$, $r^*$ and $\sum rs$. |
637 %$r_1\cdot r_2$, $r^*$ and $\sum rs$. |
643 |
640 |
644 \subsection{Some Basic Identities} |
641 \subsection{Some Basic Identities} |
645 |
642 |
646 In what follows we will often convert between lists |
643 In what follows we will often convert between lists |
647 and sets. |
644 and sets. |
648 We use Isabelle's $set$ to refere to the |
645 We use Isabelle's $set$ to refer to the |
649 function that converts a list $rs$ to the set |
646 function that converts a list $rs$ to the set |
650 containing all the elements in $rs$. |
647 containing all the elements in $rs$. |
651 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication} |
648 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication} |
652 The $\textit{rdistinct}$ function, as its name suggests, will |
649 The $\textit{rdistinct}$ function, as its name suggests, will |
653 de-duplicate an r-regular expression list. |
650 de-duplicate an r-regular expression list. |
654 It will also remove any elements that |
651 It will also remove any elements that |
655 is already in the accumulator set. |
652 are already in the accumulator set. |
656 \begin{lemma}\label{rdistinctDoesTheJob} |
653 \begin{lemma}\label{rdistinctDoesTheJob} |
657 %The function $\textit{rdistinct}$ satisfies the following |
654 %The function $\textit{rdistinct}$ satisfies the following |
658 %properties: |
655 %properties: |
659 Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its |
656 Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its |
660 recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.} |
657 recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.} |
802 |
799 |
803 \subsubsection{The Properties of $\textit{Rflts}$} |
800 \subsubsection{The Properties of $\textit{Rflts}$} |
804 We give in this subsection some properties |
801 We give in this subsection some properties |
805 involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and |
802 involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and |
806 $\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them. |
803 $\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them. |
807 These will be helpful in later closed form proofs, when |
804 These will be helpful in later closed-form proofs, when |
808 we want to transform derivative terms which have |
805 we want to transform derivative terms which have |
809 %the ways in which multiple functions involving |
806 %the ways in which multiple functions involving |
810 %those are composed together |
807 %those are composed together |
811 interleaving derivatives and simplifications applied to them. |
808 interleaving derivatives and simplifications applied to them. |
812 |
809 |
813 \noindent |
810 \noindent |
814 %When the function $\textit{Rflts}$ |
811 %When the function $\textit{Rflts}$ |
815 %is applied to the concatenation of two lists, the output can be calculated by first applying the |
812 %is applied to the concatenation of two lists; the output can be calculated by first applying the |
816 %functions on two lists separately, and then concatenating them together. |
813 %functions on two lists separately and then concatenating them together. |
817 $\textit{Rflts}$ is composable in terms of concatenation: |
814 $\textit{Rflts}$ is composable in terms of concatenation: |
818 \begin{lemma}\label{rfltsProps} |
815 \begin{lemma}\label{rfltsProps} |
819 The function $\rflts$ has the properties below:\\ |
816 The function $\rflts$ has the properties below:\\ |
820 \begin{itemize} |
817 \begin{itemize} |
821 \item |
818 \item |
917 \noindent |
914 \noindent |
918 We omit the recursive definition of the predicate $\textit{nonAlt}$, |
915 We omit the recursive definition of the predicate $\textit{nonAlt}$, |
919 which evaluates to true when the regular expression is not an |
916 which evaluates to true when the regular expression is not an |
920 alternative, and false otherwise. |
917 alternative, and false otherwise. |
921 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that |
918 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that |
922 its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$, |
919 its non-empty argument list of expressions are all good themselves, and $\textit{nonAlt}$, |
923 and unique: |
920 and unique: |
924 \begin{lemma}\label{rsimpaltsGood} |
921 \begin{lemma}\label{rsimpaltsGood} |
925 If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$, |
922 If $rs \neq []$ and for all $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$, |
926 then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$. |
923 then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$. |
927 \end{lemma} |
924 \end{lemma} |
928 \noindent |
925 \noindent |
929 We also note that |
926 We also note that |
930 if a regular expression $r$ is good, then $\rflts$ on the singleton |
927 if a regular expression $r$ is good, then $\rflts$ on the singleton |
960 \end{lemma} |
957 \end{lemma} |
961 \begin{proof} |
958 \begin{proof} |
962 By induction on $r$. |
959 By induction on $r$. |
963 \end{proof} |
960 \end{proof} |
964 \noindent |
961 \noindent |
965 With this we can prove that a regular expressions |
962 With this we can prove that a regular expression |
966 after simplification and flattening and de-duplication, |
963 after simplification and flattening and de-duplication, |
967 will not contain any alternative regular expression directly: |
964 will not contain any alternative regular expression directly: |
968 \begin{lemma}\label{nonaltFltsRd} |
965 \begin{lemma}\label{nonaltFltsRd} |
969 If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ |
966 If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ |
970 then $\textit{nonAlt} \; x$. |
967 then $\textit{nonAlt} \; x$. |
993 $\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$, |
990 $\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$, |
994 then $\good \; x$. |
991 then $\good \; x$. |
995 \end{itemize} |
992 \end{itemize} |
996 \end{lemma} |
993 \end{lemma} |
997 \begin{proof} |
994 \begin{proof} |
998 The first part is by induction on $rs$, where the induction |
995 The first part is by induction, where the inductive cases |
999 rule is the inductive cases for $\rflts$. |
996 are the inductive cases of $\rflts$. |
1000 The second part is a corollary from the first part. |
997 The second part is a corollary from the first part. |
1001 \end{proof} |
998 \end{proof} |
1002 |
999 |
1003 This leads to good structural property of $\rsimp{}$, |
1000 This leads to good structural property of $\rsimp{}$, |
1004 that after simplification, a regular expression is |
1001 that after simplification, a regular expression is |
1010 By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$. |
1007 By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$. |
1011 Lemma \ref{rsimpMono} says that |
1008 Lemma \ref{rsimpMono} says that |
1012 $\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to |
1009 $\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to |
1013 $\llbracket r \rrbracket_r$. |
1010 $\llbracket r \rrbracket_r$. |
1014 Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case, |
1011 Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case, |
1015 Inductive hypothesis applies to the children regular expressions |
1012 The inductive hypothesis applies to the children regular expressions |
1016 $r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied |
1013 $r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied |
1017 by that as well. |
1014 by that as well. |
1018 The lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used |
1015 The lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used |
1019 to ensure that goodness is preserved at the topmost level. |
1016 to ensure that goodness is preserved at the topmost level. |
1020 \end{proof} |
1017 \end{proof} |
1440 the first ($r\backslash c$) to the other ($r'\backslash c$). |
1437 the first ($r\backslash c$) to the other ($r'\backslash c$). |
1441 \begin{lemma}\label{interleave} |
1438 \begin{lemma}\label{interleave} |
1442 If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$ |
1439 If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$ |
1443 \end{lemma} |
1440 \end{lemma} |
1444 \noindent |
1441 \noindent |
1445 This allows us proving more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$. |
1442 This allows us to prove more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$. |
1446 \begin{lemma}\label{insideSimpRemoval} |
1443 \begin{lemma}\label{insideSimpRemoval} |
1447 $\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})} $ |
1444 $\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})} $ |
1448 \end{lemma} |
1445 \end{lemma} |
1449 \noindent |
1446 \noindent |
1450 \begin{proof} |
1447 \begin{proof} |
1528 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') |
1525 $((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') |
1529 + r_2 \backslash cc'c''$ & |
1526 + r_2 \backslash cc'c''$ & |
1530 $ \longrightarrow_{\backslash c''} \quad \ldots$\\ |
1527 $ \longrightarrow_{\backslash c''} \quad \ldots$\\ |
1531 \end{tabular} |
1528 \end{tabular} |
1532 \end{center} |
1529 \end{center} |
1533 Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expresssed as |
1530 Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expressed as |
1534 a giant alternative taking a list of terms |
1531 a giant alternative taking a list of terms |
1535 $[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$, |
1532 $[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$, |
1536 where the head of the list is always the term |
1533 where the head of the list is always the term |
1537 representing a match involving only $r_1$, and the tail of the list consisting of |
1534 representing a match involving only $r_1$, and the tail of the list consisting of |
1538 terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$. |
1535 terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$. |
1598 In a closed-form one needs to take into account this (because |
1595 In a closed-form one needs to take into account this (because |
1599 closed forms require exact equality rather than language equivalence) |
1596 closed forms require exact equality rather than language equivalence) |
1600 and only generate the |
1597 and only generate the |
1601 $r_2 \backslash_r s''$ terms satisfying the property |
1598 $r_2 \backslash_r s''$ terms satisfying the property |
1602 \begin{center} |
1599 \begin{center} |
1603 $\exists s'. such \; that \; s'@s'' = s \;\; \land \;\; |
1600 $\exists s'. such \; that \; s'@s'' = s \;\; \land \;\; |
1604 r_1 \backslash s' \; is \; nullable$. |
1601 r_1 \backslash s' \; is \; nullable$. |
1605 \end{center} |
1602 \end{center} |
1606 Given the arguments $s$ and $r_1$, we denote the list of strings |
1603 Given the arguments $s$ and $r_1$, we denote the list of strings |
1607 $s''$ satisfying the above property as $\vsuf{s}{r_1}$. |
1604 $s''$ satisfying the above property as $\vsuf{s}{r_1}$. |
1608 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{ |
1605 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{ |
1609 Perhaps a better name of it would be ``NullablePrefixSuffix'' |
1606 Perhaps a better name for it would be ``NullablePrefixSuffix'' |
1610 to differentiate with the list of \emph{all} prefixes of $s$, but |
1607 to differentiate with the list of \emph{all} prefixes of $s$, but |
1611 that is a bit too long for a function name and we are yet to find |
1608 that is a bit too long for a function name and we are yet to find |
1612 a more concise and easy-to-understand name.} |
1609 a more concise and easy-to-understand name.} |
1613 \begin{center} |
1610 \begin{center} |
1614 \begin{tabular}{lcl} |
1611 \begin{tabular}{lcl} |
1833 When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, |
1830 When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, |
1834 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable, |
1831 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable, |
1835 the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly |
1832 the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly |
1836 in the sequence case. |
1833 in the sequence case. |
1837 The good news is that the function $\textit{rsimp}$ will again |
1834 The good news is that the function $\textit{rsimp}$ will again |
1838 ignore the difference between differently nesting patterns of alternatives, |
1835 ignore the difference between different nesting patterns of alternatives, |
1839 and the exponentially growing star derivative like |
1836 and the exponentially growing star derivative like |
1840 \begin{center} |
1837 \begin{center} |
1841 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + |
1838 $(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + |
1842 (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ |
1839 (r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ |
1843 \end{center} |
1840 \end{center} |
2018 \map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$ |
2015 \map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$ |
2019 \end{center} |
2016 \end{center} |
2020 holds. |
2017 holds. |
2021 \end{lemma} |
2018 \end{lemma} |
2022 \begin{proof} |
2019 \begin{proof} |
2023 By an induction on the inductivev cases of $\textit{createdByStar}$. |
2020 By an induction on the inductive cases of $\textit{createdByStar}$. |
2024 \end{proof} |
2021 \end{proof} |
2025 %This is not entirely true for annotated regular expressions: |
2022 %This is not entirely true for annotated regular expressions: |
2026 %%TODO: bsimp bders \neq bderssimp |
2023 %%TODO: bsimp bders \neq bderssimp |
2027 %\begin{center} |
2024 %\begin{center} |
2028 % $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ |
2025 % $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$ |
2088 \end{lemma} |
2085 \end{lemma} |
2089 |
2086 |
2090 \begin{proof} |
2087 \begin{proof} |
2091 By using the rewriting relation $\rightsquigarrow$ |
2088 By using the rewriting relation $\rightsquigarrow$ |
2092 \end{proof} |
2089 \end{proof} |
2093 And from this we obtain that a |
2090 And from this we obtain the following fact: a |
2094 regular expression created by star |
2091 regular expression created by star |
2095 is the same as its flattened version, up to equivalence under $\textit{bsimp}$. |
2092 is the same as its flattened version, up to equivalence under $\textit{bsimp}$. |
2096 For example, |
2093 For example, |
2097 \begin{lemma}\label{hfauRsimpeq2} |
2094 \begin{lemma}\label{hfauRsimpeq2} |
2098 $\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ |
2095 $\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$ |
2236 %\end{center} |
2233 %\end{center} |
2237 \section{Bounding Closed Forms} |
2234 \section{Bounding Closed Forms} |
2238 |
2235 |
2239 In this section, we introduce how we formalised the bound |
2236 In this section, we introduce how we formalised the bound |
2240 on closed forms. |
2237 on closed forms. |
2241 We first show that in general regular expressions up to a certain |
2238 We first show that in general the number of regular expressions up to a certain |
2242 size are finite. |
2239 size is finite. |
2243 Then we prove that functions such as $\rflts$ |
2240 Then we prove that functions such as $\rflts$ |
2244 will not cause the size of r-regular expressions to grow. |
2241 will not cause the size of r-regular expressions to grow. |
2245 Putting this together with a general bound |
2242 Putting this together with a general bound |
2246 on the finiteness of distinct regular expressions |
2243 on the finiteness of distinct regular expressions |
2247 up to a certain size, we obtain a bound on |
2244 up to a specific size, we obtain a bound on |
2248 the closed forms. |
2245 the closed forms. |
2249 |
2246 |
2250 \subsection{Finiteness of Distinct Regular Expressions} |
2247 \subsection{Finiteness of Distinct Regular Expressions} |
2251 We define the set of regular expressions whose size are no more than |
2248 We define the set of regular expressions whose size is no more than |
2252 a certain size $N$ as $\textit{sizeNregex} \; N$: |
2249 a certain size $N$ as $\textit{sizeNregex} \; N$: |
2253 \[ |
2250 \[ |
2254 \textit{sizeNregex} \; N \dn \{r\; \mid \; \llbracket r \rrbracket_r \leq N \} |
2251 \textit{sizeNregex} \; N \dn \{r\; \mid \; \llbracket r \rrbracket_r \leq N \} |
2255 \] |
2252 \] |
2256 We have that $\textit{sizeNregex} \; N$ is always a finite set: |
2253 We have that $\textit{sizeNregex} \; N$ is always a finite set: |
2259 \end{lemma} |
2256 \end{lemma} |
2260 \begin{proof} |
2257 \begin{proof} |
2261 By splitting the set $\textit{sizeNregex} \; (N + 1)$ into |
2258 By splitting the set $\textit{sizeNregex} \; (N + 1)$ into |
2262 subsets by their categories: |
2259 subsets by their categories: |
2263 $\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$, |
2260 $\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$, |
2264 and so on. Each of these subsets are finitely bounded. |
2261 and so on. Each of these subsets is finitely bounded. |
2265 \end{proof} |
2262 \end{proof} |
2266 \noindent |
2263 \noindent |
2267 From this we get a corollary that |
2264 From this we get a corollary that |
2268 if forall $r \in rs$, $\rsize{r} \leq N$, then the output of |
2265 if forall $r \in rs$, $\rsize{r} \leq N$, then the output of |
2269 $\rdistinct{rs}{\varnothing}$ is a list of regular |
2266 $\rdistinct{rs}{\varnothing}$ is a list of regular |
2327 \textit{rsimp} \; x))\; \{ \} ) \rrbracket_r |
2324 \textit{rsimp} \; x))\; \{ \} ) \rrbracket_r |
2328 \] |
2325 \] |
2329 \end{itemize} |
2326 \end{itemize} |
2330 \end{lemma} |
2327 \end{lemma} |
2331 \begin{proof} |
2328 \begin{proof} |
2332 Point 1, 3, 4 can be proven by an induction on $rs$. |
2329 Points 1, 3, and 4 can be proven by an induction on $rs$. |
2333 Point 2 is by case analysis on $r_1$ and $r_2$. |
2330 Point 2 is by case analysis on $r_1$ and $r_2$. |
2334 The last part is a corollary of the previous ones. |
2331 The last part is a corollary of the previous ones. |
2335 \end{proof} |
2332 \end{proof} |
2336 \noindent |
2333 \noindent |
2337 With the lemmas for each inductive case in place, we are ready to get |
2334 With the lemmas for each inductive case in place, we are ready to get |
2372 can be described by some inductive sub-structures |
2369 can be described by some inductive sub-structures |
2373 (for example when $r = r_1 \cdot r_2$ then $rs'$ |
2370 (for example when $r = r_1 \cdot r_2$ then $rs'$ |
2374 will be solely comprised of $r_1 \backslash s'$ |
2371 will be solely comprised of $r_1 \backslash s'$ |
2375 and $r_2 \backslash s''$, $s'$ and $s''$ being |
2372 and $r_2 \backslash s''$, $s'$ and $s''$ being |
2376 sub-strings of $s$). |
2373 sub-strings of $s$). |
2377 which will each have a size uppder bound |
2374 which will each have a size upper bound |
2378 according to inductive hypothesis, which controls $r \backslash s$. |
2375 according to the inductive hypothesis, which controls $r \backslash s$. |
2379 |
2376 |
2380 We elaborate the above reasoning by a series of lemmas |
2377 We elaborate the above reasoning by a series of lemmas |
2381 below, where straightforward proofs are omitted. |
2378 below, where straightforward proofs are omitted. |
2382 %We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$. |
2379 %We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$. |
2383 We show that $\textit{rdistinct}$ and $\rflts$ |
2380 We show that $\textit{rdistinct}$ and $\rflts$ |
2389 \llbracket \rdistinct{rs}{\varnothing} \rrbracket_r $. |
2386 \llbracket \rdistinct{rs}{\varnothing} \rrbracket_r $. |
2390 \end{center} |
2387 \end{center} |
2391 We need this so that we know the outcome of our real |
2388 We need this so that we know the outcome of our real |
2392 simplification is better than or equal to a rough estimate, |
2389 simplification is better than or equal to a rough estimate, |
2393 and therefore can be bounded by that estimate. |
2390 and therefore can be bounded by that estimate. |
2394 This is a bit harder to establish compared with proving |
2391 This is a bit harder to establish compared to proving |
2395 $\textit{flts}$ does not make a list larger (which can |
2392 $\textit{flts}$ does not make a list larger (which can |
2396 be proven using routine induction): |
2393 be proven using routine induction): |
2397 \begin{center} |
2394 \begin{center} |
2398 $\llbracket \textit{rflts}\; rs \rrbracket_r \leq |
2395 $\llbracket \textit{rflts}\; rs \rrbracket_r \leq |
2399 \llbracket \textit{rs} \rrbracket_r$ |
2396 \llbracket \textit{rs} \rrbracket_r$ |
2484 \end{lemma} |
2481 \end{lemma} |
2485 \begin{proof} |
2482 \begin{proof} |
2486 By using corollary \ref{interactionFltsDB}. |
2483 By using corollary \ref{interactionFltsDB}. |
2487 \end{proof} |
2484 \end{proof} |
2488 \noindent |
2485 \noindent |
2489 This is a key lemma in establishing the bounds on all the |
2486 This is a key lemma in establishing the bounds of all the |
2490 closed forms. |
2487 closed forms. |
2491 With this we are now ready to control the sizes of |
2488 With this we are now ready to control the sizes of |
2492 $(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$. |
2489 $(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$. |
2493 \begin{theorem}\label{rBound} |
2490 \begin{theorem}\label{rBound} |
2494 For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$ |
2491 For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$ |
2639 $\llbracket r \rrbracket_r + n$\\ |
2636 $\llbracket r \rrbracket_r + n$\\ |
2640 \end{tabular} |
2637 \end{tabular} |
2641 \end{center} |
2638 \end{center} |
2642 \noindent |
2639 \noindent |
2643 Arguably we should use $\log \; n$ for the size because |
2640 Arguably we should use $\log \; n$ for the size because |
2644 the number of digits increase logarithmically w.r.t $n$. |
2641 the number of digits increases logarithmically w.r.t $n$. |
2645 For simplicity we choose to add the counter directly to the size. |
2642 For simplicity we choose to add the counter directly to the size. |
2646 |
2643 |
2647 The derivative w.r.t a bounded regular expression |
2644 The derivative w.r.t a bounded regular expression |
2648 is given as |
2645 is given as |
2649 \begin{center} |
2646 \begin{center} |
2691 We confirm this point because the correctness theorem would also |
2688 We confirm this point because the correctness theorem would also |
2692 extend without surprise to $\blexersimp$. |
2689 extend without surprise to $\blexersimp$. |
2693 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on |
2690 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on |
2694 do not need to be changed, |
2691 do not need to be changed, |
2695 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to |
2692 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to |
2696 add one more line which can be solved by sledgehammer |
2693 add one more line which can be solved by the Sledgehammer tool |
2697 to solve the $r^{\{n\}}$ inductive case. |
2694 to solve the $r^{\{n\}}$ inductive case. |
2698 |
2695 |
2699 |
2696 |
2700 \subsubsection{Finiteness Proof Augmentation} |
2697 \subsubsection{Finiteness Proof Augmentation} |
2701 The bounded repetitions are |
2698 The bounded repetitions are |
2916 By a reverse induction on $s$. |
2913 By a reverse induction on $s$. |
2917 For the inductive case, note that if $\cbn \; r$ holds, |
2914 For the inductive case, note that if $\cbn \; r$ holds, |
2918 then $\cbn \; (r\backslash_r c)$ holds. |
2915 then $\cbn \; (r\backslash_r c)$ holds. |
2919 \end{proof} |
2916 \end{proof} |
2920 \noindent |
2917 \noindent |
2921 In addition, for $\cbn$-shaped regular expressioins, one can flatten |
2918 In addition, for $\cbn$-shaped regular expressions, one can flatten |
2922 them: |
2919 them: |
2923 \begin{lemma}\label{ntimesHfauPushin} |
2920 \begin{lemma}\label{ntimesHfauPushin} |
2924 If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = |
2921 If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = |
2925 \textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \; |
2922 \textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \; |
2926 (\hflataux{r})})$ |
2923 (\hflataux{r})})$ |
2979 $\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\ |
2976 $\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\ |
2980 \end{tabular} |
2977 \end{tabular} |
2981 \end{center} |
2978 \end{center} |
2982 \begin{lemma}\label{nupdatesNonempty} |
2979 \begin{lemma}\label{nupdatesNonempty} |
2983 If for all elements $o \in \textit{set} \; Ss$, |
2980 If for all elements $o \in \textit{set} \; Ss$, |
2984 $\nString \; o$ holds, the we have that |
2981 $\nString \; o$ holds, then we have that |
2985 for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$, |
2982 for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$, |
2986 $\nString \; o'$ holds. |
2983 $\nString \; o'$ holds. |
2987 \end{lemma} |
2984 \end{lemma} |
2988 \begin{proof} |
2985 \begin{proof} |
2989 By an induction on $s$, where $Ss$ is set to vary over all possible values. |
2986 By an induction on $s$, where $Ss$ is set to vary over all possible values. |
3107 |
3104 |
3108 We aim to formalise the correctness and size bound |
3105 We aim to formalise the correctness and size bound |
3109 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$ |
3106 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$ |
3110 and so on, which is still work in progress. |
3107 and so on, which is still work in progress. |
3111 They should more or less follow the same recipe described in this section. |
3108 They should more or less follow the same recipe described in this section. |
3112 Once we know about how to deal with them recursively using suitable auxiliary |
3109 Once we know how to deal with them recursively using suitable auxiliary |
3113 definitions, we are able to routinely establish the proofs. |
3110 definitions, we can routinely establish the proofs. |
3114 |
3111 |
3115 |
3112 |
3116 %---------------------------------------------------------------------------------------- |
3113 %---------------------------------------------------------------------------------------- |
3117 % SECTION 3 |
3114 % SECTION 3 |
3118 %---------------------------------------------------------------------------------------- |
3115 %---------------------------------------------------------------------------------------- |
3158 inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function |
3155 inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function |
3159 $f(x) = x * 2^x$. |
3156 $f(x) = x * 2^x$. |
3160 This means the bound we have will surge up at least |
3157 This means the bound we have will surge up at least |
3161 tower-exponentially with a linear increase of the depth. |
3158 tower-exponentially with a linear increase of the depth. |
3162 |
3159 |
3163 One might be quite skeptical about what this non-elementary |
3160 One might be pretty skepticafl about what this non-elementary |
3164 bound can bring us. |
3161 bound can bring us. |
3165 It turns out that the giant bounds are far from being hit. |
3162 It turns out that the giant bounds are far from being hit. |
3166 Here we have some test data from randomly generated regular expressions: |
3163 Here we have some test data from randomly generated regular expressions: |
3167 \begin{figure}[H] |
3164 \begin{figure}[H] |
3168 \begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}} |
3165 \begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}} |
3230 \end{tikzpicture}\\ |
3227 \end{tikzpicture}\\ |
3231 \multicolumn{3}{c}{} |
3228 \multicolumn{3}{c}{} |
3232 \end{tabular} |
3229 \end{tabular} |
3233 \caption{Graphs: size change of 3 randomly generated |
3230 \caption{Graphs: size change of 3 randomly generated |
3234 regular expressions $w.r.t.$ input string length. |
3231 regular expressions $w.r.t.$ input string length. |
3235 The x axis represents the length of input.} |
3232 The x-axis represents the length of the input.} |
3236 \end{figure} |
3233 \end{figure} |
3237 \noindent |
3234 \noindent |
3238 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the |
3235 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the |
3239 original size. |
3236 original size. |
3240 We will discuss improvements to this bound in the next chapter. |
3237 We will discuss improvements to this bound in the next chapter. |
3257 \item |
3254 \item |
3258 |
3255 |
3259 The size proof can serve as a starting point for a complexity |
3256 The size proof can serve as a starting point for a complexity |
3260 formalisation. |
3257 formalisation. |
3261 Derivatives are the most important phases of our lexer algorithm. |
3258 Derivatives are the most important phases of our lexer algorithm. |
3262 Size properties about derivatives covers the majority of the algorithm |
3259 Size properties about derivatives cover the majority of the algorithm |
3263 and is therefore a good indication of complexity of the entire program. |
3260 and is therefore a good indication of the complexity of the entire program. |
3264 \item |
3261 \item |
3265 The bound is already a strong indication that catastrophic |
3262 The bound is already a strong indication that catastrophic |
3266 backtracking is much less likely to occur in our $\blexersimp$ |
3263 backtracking is much less likely to occur in our $\blexersimp$ |
3267 algorithm. |
3264 algorithm. |
3268 We refine $\blexersimp$ with $\blexerStrong$ in the next chapter |
3265 We refine $\blexersimp$ with $\blexerStrong$ in the next chapter |
3278 |
3275 |
3279 |
3276 |
3280 |
3277 |
3281 |
3278 |
3282 |
3279 |
3283 One might wonder the actual bound rather than the loose bound we gave |
3280 One might wonder about the actual bound rather than the loose bound we gave |
3284 for the convenience of an easier proof. |
3281 for the convenience of a more straightforward proof. |
3285 How much can the regex $r^* \backslash s$ grow? |
3282 How much can the regex $r^* \backslash s$ grow? |
3286 As earlier graphs have shown, |
3283 As earlier graphs have shown, |
3287 %TODO: reference that graph where size grows quickly |
3284 %TODO: reference that graph where size grows quickly |
3288 they can grow at a maximum speed |
3285 they can grow at a maximum speed |
3289 exponential $w.r.t$ the number of characters, |
3286 exponential $w.r.t$ the number of characters, |
3320 to express $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + |
3317 to express $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + |
3321 (\underbrace{a \ldots a}_{\text{n a's}})^*$ in the below discussion. |
3318 (\underbrace{a \ldots a}_{\text{n a's}})^*$ in the below discussion. |
3322 The exponential size is triggered by that the regex |
3319 The exponential size is triggered by that the regex |
3323 $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$ |
3320 $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$ |
3324 inside the $(\ldots) ^*$ having exponentially many |
3321 inside the $(\ldots) ^*$ having exponentially many |
3325 different derivatives, despite those difference being minor. |
3322 different derivatives, despite those differences being minor. |
3326 $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$ |
3323 $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$ |
3327 will therefore contain the following terms (after flattening out all nested |
3324 will therefore contain the following terms (after flattening out all nested |
3328 alternatives): |
3325 alternatives): |
3329 \begin{center} |
3326 \begin{center} |
3330 $(\sum_{i = 1}^{n} (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)$\\ |
3327 $(\sum_{i = 1}^{n} (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)$\\ |
3406 |
3403 |
3407 %----------------------------------- |
3404 %----------------------------------- |
3408 % SUBSECTION 1 |
3405 % SUBSECTION 1 |
3409 %----------------------------------- |
3406 %----------------------------------- |
3410 %\subsection{Syntactic Equivalence Under $\simp$} |
3407 %\subsection{Syntactic Equivalence Under $\simp$} |
3411 %We prove that minor differences can be annhilated |
3408 %We prove that minor differences can be annihilated |
3412 %by $\simp$. |
3409 %by $\simp$. |
3413 %For example, |
3410 %For example, |
3414 %\begin{center} |
3411 %\begin{center} |
3415 % $\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = |
3412 % $\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = |
3416 % \simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$ |
3413 % \simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$ |