ChengsongTanPhdThesis/Chapters/Finite.tex
author Chengsong
Wed, 23 Aug 2023 03:02:31 +0100
changeset 668 3831621d7b14
parent 663 0d1e68268d0f
permissions -rwxr-xr-x
added technical Overview section, almost done introduction
% Chapter Template
\chapter{A Formal Proof That $\textit{Blexer}\_\textit{simp}$ will not Grow Unbounded} % Main chapter title
\label{Finite} 
%  In Chapter 4 \ref{Chapter4} we give the second guarantee
%of our bitcoded algorithm, that is a finite bound on the size of any 
%regex's derivatives. 
%(this is chapter 5 now)

In this chapter we prove a bound on the size of
the calculated derivatives:
given an annotated regular expression $a$, there exists
a constant integer $N$ such that for any string $s$,
the sizes of the derivatives computed by our algorithm $\blexersimp$
are bounded
by $N$. %a constant that only depends on $a$.
Formally, this can be expressed
as
%we show that there exists a constant integer $N_a$ such that
\begin{center}
	$\llbracket \bderssimp{a}{s} \rrbracket \leq N$
\end{center}
\noindent
where the size ($\llbracket \_ \rrbracket$) of
an annotated regular expression is defined
in terms of the number of nodes in its
tree structure (its recursive definition is given on the next page).
We believe this size bound
is important in the context of POSIX lexing because
\marginpar{Addressing Gerog comment: "how does this relate to backtracking?"}
\begin{itemize}
	\item
		It is a stepping stone towards the goal
		of eliminating ``catastrophic backtracking''.
		The derivative-based lexing algorithm avoids backtracking
		by a trade-off between space and time.
		Backtracking algorithms
		save other possibilities on a stack when exploring one possible
		path of matching. Catastrophic backtracking typically occurs
		when the number of steps increases exponentially with respect
		to the input. In other words, the complexity is $O((c_r)^n)$ in the input
		string length $n$, where the base $c_r$ of the exponential is determined by the
		regular expression $r$.
		%so that they
		%can be traversed in the future in a DFS manner,
		%different matchings are stored as sub-expressions 
		%in a regular expression derivative.
		Derivatives save these possibilities as sub-expressions
		and traverse them when later derivatives are taken. If we denote the size
		of intermediate derivatives as $S_{r,n}$ (where the subscripts
		$r,n$ indicate that $S$ depends on them), then the runtime of
		derivative-based approaches would be $O(S_{r,n} * n)$.
		We observe that if $S_{r,n}$ continuously grows with $n$ (for example
		growing exponentially fast), then this
		is just as bad as catastrophic backtracking.
		Our finiteness bound seeks to find a constant integer
		upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $S_{r,n}$,
		so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
		Even if $C$ is still large in our current work, it is a constant
		rather than a number that keeps increasing with the input length $n$.
		More importantly, this constant $C$ can potentially
		be reduced as we optimise our simplification procedure.
		%and showing the potential
		%improvements can be by the notion of partial derivatives.

		%If the internal data structures used by our algorithm
		%grows beyond a finite bound, then clearly 
		%the algorithm (which traverses these structures) will
		%be slow.
		%The next step is to refine the bound $N_a$ so that it
		%is not just finite but polynomial in $\llbracket a\rrbracket$.
	\item
		Having the finite bound formalised
		gives us higher confidence that
		our simplification algorithm $\simp$ does not ``misbehave''
		like $\textit{simpSL}$ does.
		The bound is universal for a given regular expression,
		which is an advantage over work which
		only gives empirical evidence on
		some test cases (see for example the Verbatim++ work \cite{Verbatimpp}).
\end{itemize}
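\noindent
To make the comparison in the first point concrete, the following back-of-the-envelope calculation may help.
It assumes, as a simplification that is not established in this chapter, that the $i$-th derivative step
costs time proportional to the size $S_{r,i}$ of the derivative it traverses.
The growth rate $2^i$ in the second line is purely hypothetical and only serves as an example of unbounded growth.
\[
	\begin{array}{lcll}
		\sum_{i=1}^{n} S_{r,i} & \leq & \sum_{i=1}^{n} C \;=\; C * n & \mbox{if } S_{r,i} \leq C \mbox{ for all } i,\\
		\sum_{i=1}^{n} S_{r,i} & = & \sum_{i=1}^{n} 2^i \;=\; 2^{n+1} - 2 & \mbox{if } S_{r,i} = 2^i.
	\end{array}
\]
\noindent
Under this assumption a constant bound yields a total cost linear in $n$, whereas exponentially growing
derivatives yield a total cost exponential in $n$, even though no backtracking takes place.
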
\noindent
We then extend our $\blexersimp$
to support bounded repetitions ($r^{\{n\}}$).
We update our formalisation of
the correctness and finiteness properties to
include this new construct.
We show that we can out-compete other verified lexers such as
Verbatim++ on bounded regular expressions.

In the next section we describe in more detail
what the finite bound means in our algorithm
and why the size of the internal data structures of
a typical derivative-based lexer such as
Sulzmann and Lu's needs formal treatment.



\section{Formalising Size Bound of Derivatives}
\noindent
In our lexer ($\blexersimp$),
we take an annotated regular expression as input,
and repeatedly take its derivative and simplify the result.
\begin{figure}
	\begin{center}
		\begin{tabular}{lcl}
			$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
			$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
			$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
			$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
			$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $(\textit{sum}\; (\map \; (\llbracket \_ \rrbracket)\; as)  ) + 1$\\
			$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
		\end{tabular}
	\end{center}
	\caption{The size function of bitcoded regular expressions}\label{brexpSize}
\end{figure}
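
\noindent
As a small worked instance of the definition in Figure~\ref{brexpSize}, consider the size of
a sequence whose first component is a star.
The expression and the bit-sequence names $bs_1, \ldots, bs_4$ are chosen here purely for illustration;
the point is only that bit-sequences do not contribute to the size:
\[
	\begin{array}{lcl}
		\llbracket {}_{bs_1}\bigl({}_{bs_2}({}_{bs_3}\mathbf{c})^* \cdot {}_{bs_4}\mathbf{d}\bigr) \rrbracket
		& = & \llbracket {}_{bs_2}({}_{bs_3}\mathbf{c})^* \rrbracket + \llbracket {}_{bs_4}\mathbf{d} \rrbracket + 1\\
		& = & (\llbracket {}_{bs_3}\mathbf{c} \rrbracket + 1) + 1 + 1\\
		& = & (1 + 1) + 1 + 1 \;=\; 4
	\end{array}
\]
\noindent
The result agrees with the four nodes (one sequence, one star and two characters) in the tree of this expression.
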
\begin{figure}
	\begin{tikzpicture}[scale=2,
		every node/.style={minimum size=11mm},
		->,>=stealth',shorten >=1pt,auto,thick
		]
		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};

		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=6mm]{$a_{1s}$};
		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp$};

		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 12mm]{$a_2$};
		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};

		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=6mm]{$a_{2s}$};
		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp$};

		\node (rns) [rectangle, draw = blue, thick, right=of r2s,minimum size=6mm]{$a_{ns}$};
		\draw[->,line width=0.2mm, dashed](r2s)--(rns) node[above,midway] {$\backslash \ldots$};

		\node (v) [circle, thick, draw, right=of rns, minimum size=6mm, right=1.7cm]{$v$};
		\draw[->, line width=0.2mm](rns)--(v) node[above, midway] {\bmkeps} node [below, midway] {\decode};
	\end{tikzpicture}
	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
\end{figure}

\noindent
Each time
a derivative is taken, the regular expression might grow.
However, the simplification applied immediately afterwards will often shrink it so that
the overall size of the derivatives stays relatively small.
This intuition is depicted by the relative size
change between the black and blue nodes in Figure~\ref{simpShrinks}:
after $\simp$ the node shrinks.
Our proof states that all the blue nodes
stay below a size bound $N_a$ determined by the input $a$.
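Reading the figure as a recurrence makes this statement more precise.
The following is only a sketch in the notation of the figure: it assumes the input string is
$s = c_1 c_2 \ldots c_n$ and writes $a_{0s}$ for the input $a$
(the name $a_{0s}$ is introduced here for illustration and is not used elsewhere).
\[
	\begin{array}{lcl}
		a_{i} & = & a_{(i-1)s} \backslash c_i\\
		a_{is} & = & \simp(a_i)\\
		\llbracket a_{is} \rrbracket & \leq & N_a \qquad \mbox{for all } i \in \{1, \ldots, n\}
	\end{array}
\]
\noindent
The bound is stated for the blue nodes $a_{is}$ only; the intermediate black nodes $a_i$ may be larger.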

Sulzmann and Lu assumed a similar picture for their algorithm,
even though it did not work as they expected.
%though in fact their algorithm's size might be better depicted by the following graph:
%\begin{figure}[H]
%	\begin{tikzpicture}[scale=2,
%		every node/.style={minimum size=11mm},
%		->,>=stealth',shorten >=1pt,auto,thick
%		]
%		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
%		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
%		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
%
%		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
%		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
%
%		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 17mm]{$a_2$};
%		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
%
%		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
%		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
%
%		\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
%		\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
%
%		\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
%		\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
%
%		\node (rnn) [right = of rns, minimum size = 1mm]{};
%		\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
%
%	\end{tikzpicture}
%	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
%\end{figure}
%\noindent
In fact, in some cases their lexer (where they use $\simpsulz$
as the simplification function)
will have a size explosion, causing the running time
of each derivative step to grow continuously (for example
in \ref{SulzmannLuLexerTime}).
They tested the run time of their
lexer on particular examples such as $(a+b+ab)^*$
and claimed that their algorithm is linear w.r.t.\ the input.
With our mechanised proof, we avoid this type of unintentional
generalisation.

Before delving into the details of the formalisation,
we provide an overview of it in the following subsection.


\subsection{Overview of the Proof}
\marginpar{trying to make it more intuitive
and provide more insights into proof}
The most important idea in this chapter %intuition 
is what we call the ``closed forms'' of
regular expression derivatives with respect to strings.
In short, a closed form allows us to express $r \backslash_{rsimps} s$
as a different recursive function, so that induction on the size bound can go through.
A simple induction on $s$ or $r$ fails for $r\backslash_{rsimps} s$, but
works for $\textit{ClosedForm}(r,s)$.



Assume we have a regular expression $r$, be it an alternative,
a sequence or a star. The idea is that if we take several derivatives
of it on paper, we end up with a list of subexpressions,
something like
%omitting certain
%nested structures of those expressions:
\[
	r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n,
\]
if we omit the way these regular expressions need to be nested,
where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments
   231
of $r$ and $s$.
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   232
The second important observation is that the list %of regular expressions
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   233
$[r_1, \ldots, r_n]$ %is not
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   234
cannot grow indefinitely because they all come from $r$, and derivatives
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   235
of the same regular expression are finite up to some isomorphisms.
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   236
We prove that the simplifications of $\blexersimp$ %make use of 
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   237
is powerful enough to counteract the effect of nested structure of alternatives
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   238
and eliminate duplicates
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms"
Chengsong
parents: 660
diff changeset
   239
such that indeed the list in $a\backslash s$ does not grow unbounded.
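To illustrate both observations on a small example, consider $r = a^* \cdot a^*$.
This is only a sketch: it uses the familiar identity $\ONE \cdot r' = r'$, flattening of nested alternatives and removal of duplicates, rather than the precise simplification function defined later.
Since the first factor $a^*$ is nullable and $a^* \backslash a = (a \backslash a) \cdot a^* = \ONE \cdot a^*$, we have
\[
	r \backslash a = (a^* \backslash a) \cdot a^* + a^* \backslash a
	= (\ONE \cdot a^*) \cdot a^* + \ONE \cdot a^*,
\]
which simplifies to $a^* \cdot a^* + a^*$.
Taking the derivative of this simplified expression with respect to $a$ once more, and simplifying each summand in the same way, gives
\[
	(a^* \cdot a^* + a^*) + a^*.
\]
Flattening the nested alternative and removing the duplicate $a^*$ yields the list $[a^* \cdot a^*, \; a^*]$ again, so every further derivative with respect to $a$ produces the same two-element list.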
We call the precise formalisation of the shape
\[
 r_1 + r_2 + r_3 + \ldots + r_n
\]
the ``closed form''.
The name was chosen because turning the recursive relation
\[
	a \backslash_{bsimps} (c\!::\!s) \dn (\textit{bsimp} \; (a\backslash c)) \backslash_{bsimps} s
\]
into some easier-to-estimate form 
like
\[
	\sum (a_1\backslash s \cdot a_2) :: (\map \; (a_2\backslash\_) \;
	(\textit{Suffix} \; s \; a_1))
	%\backslash_{bsimp
\]
is reminiscent of 
%similar to t
solving recurrence relations like 
$T \; n = 2 \, (T \; \frac{n}{2}) + n$ to obtain
their closed forms.
%$T \; n = n \ln n + (s \; n)$ ($s \; n$ is 
%some higher-order terms).
%(for example we know $T$ is $\Theta (n \ln n)$).
Just like the closed form of a recursive definition makes estimating 
its growth possible, the closed 
form of $a \backslash_{bsimps} s$
allows us to prove the existence of a size bound.
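To spell out the analogy, the recurrence above can be solved as follows.
This is only a sketch, under the illustrative assumptions (not made elsewhere in this chapter) that $n = 2^k$ and that $T \; 1$ is a constant.
Dividing both sides of the recurrence by $2^k$ gives
\[
	\frac{T \; 2^k}{2^k} = \frac{T \; 2^{k-1}}{2^{k-1}} + 1 = \ldots = T \; 1 + k,
\]
and hence $T \; n = n \cdot (T \; 1) + n \log_2 n$.
The right-hand side no longer refers to $T$ at smaller arguments other than the base case, so the growth of $T$ can be read off directly.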
Note that \ref{eq:approx} is only an approximate
term to illustrate our point.
The precise formalised formula (\ref{seqClosedForm})
needs to wait until all $\textit{rrexp}$-related
definitions are given.
%but for now we can think of the above as "the sequence
%regular expression $a_1 \cdot a_2$ after derivatives and simplifications
%w.r.t string $s$ looks like
%an alternative of giant list of sub-expressions, where each 

A high-level overview of the main components of the finiteness proof
is as follows:
\begin{figure}[H]
	\begin{tikzpicture}[scale=1,font=\bf,
		node/.style={
			rectangle,rounded corners=3mm,
			ultra thick,draw=black!50,minimum height=18mm, 
			minimum width=20mm,
		top color=white,bottom color=black!20}]


		\node (0) at (-5,0) 
			[node, text width=1.8cm, text centered] 
			{$\llbracket \bderssimp{a}{s} \rrbracket$};
		\node (A) at (0,0) 
			[node,text width=1.6cm,  text centered] 
			{$\llbracket \rderssimp{r}{s} \rrbracket_r$};
		\node (B) at (3,0) 
			[node,text width=3.0cm, anchor=west, minimum width = 40mm] 
			{$\llbracket \textit{ClosedForm}(r, s)\rrbracket_r$};
		\node (C) at (9.5,0) [node, minimum width=10mm] {$N_r$};

		\draw [->,line width=0.5mm] (0) -- 
			node [above,pos=0.45] {=} (A) node [below, pos = 0.45] {$(r = a \downarrow_r)$} (A); 
		\draw [->,line width=0.5mm] (A) -- 
			node [above,pos=0.35] {$\quad =\ldots=$} (B); 
		\draw [->,line width=0.5mm] (B) -- 
			node [above,pos=0.35] {$\quad \leq \ldots \leq$} (C); 
	\end{tikzpicture}
	%\caption{
\end{figure}
\noindent
We explain the steps one by one:
\begin{itemize}
	\item
		We first introduce the operations such as 
		derivatives, simplification, size calculation, etc.
		associated with $\rrexp$s, which we have introduced
		in Chapter \ref{Bitcoded2}. As promised, we will discuss
		why they are needed in Section \ref{whyRerase}.
		The operations on $\rrexp$s are identical to those on
		annotated regular expressions except that they dispense with
		bitcodes. This means that all proofs about the size of $\rrexp$s will apply to
		annotated regular expressions, because the size of a regular
		expression is independent of the bitcodes.
	\item
		We prove that $\rderssimp{r}{s} = \textit{ClosedForm}(r, s)$,
		where $\textit{ClosedForm}(r, s)$ is given entirely 
		in terms of the derivatives of the children of $r$.
		We call the right-hand side the \emph{Closed Form}
		of the derivative $\rderssimp{r}{s}$.
	\item
		Formally, we give an estimate of 
		$\llbracket \textit{ClosedForm}(r, s) \rrbracket_r$.
		The key observation is that $\distinctBy$'s output is
		a list with a constant length bound.
\end{itemize}
We will expand on these steps in the next sections.\\

\section{The $\textit{Rrexp}$ Datatype}
The first step is to define 
$\textit{rrexp}$s.
They are annotated regular expressions without bitcodes,
allowing a more convenient size bound proof.
%Of course, the bits which encode the lexing information 
%would grow linearly with respect 
%to the input, which should be taken into account when we wish to tackle the runtime complexity.
%But for the sake of the structural size 
%we can safely ignore them.\\
The datatype $\rrexp$ of
\emph{r-regular expressions}
was initially defined in \ref{rrexpDef}.
The prefix $r$ serves
to distinguish them
from basic regular expressions.
We repeat the definition of $\rrexp$ here:
\[			\rrexp ::=   \RZERO \mid  \RONE
	\mid  \RCHAR{c}  
	\mid  \RSEQ{r_1}{r_2}
	\mid  \RALTS{rs}
	\mid \RSTAR{r}        
\]
The size of an r-regular expression is
written $\llbracket r\rrbracket_r$; 
its definition mirrors that of an annotated regular expression:
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket \RONE \rrbracket_r$ & $\dn$ & $1$\\
		$\llbracket \RZERO \rrbracket_r$ & $\dn$ & $1$ \\
		$\llbracket \RSEQ{r_1}{r_2} \rrbracket_r$ & $\dn$ & $\llbracket r_1 \rrbracket_r + \llbracket r_2 \rrbracket_r + 1$\\
		$\llbracket \RCHAR{c} \rrbracket_r $ & $\dn$ & $1$\\
		$\llbracket \RALTS{rs} \rrbracket_r $ & $\dn$ & $\textit{sum} \; (\map \; (\llbracket \_ \rrbracket_r)\; rs)   + 1$\\
		$\llbracket \RSTAR{r} \rrbracket_r $ & $\dn$ & $\llbracket r \rrbracket_r + 1$.
	\end{tabular}
\end{center}
\noindent
The subscript $r$ in $\llbracket \_ \rrbracket_r$ 
distinguishes this function from the same operation for annotated regular expressions.
Similar subscripts will be added for operations like $\rerase{}$:
\begin{center}
	\begin{tabular}{lcl}
		$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
		$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
		$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
		$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
		$\rerase{_{bs} a ^*}$ & $\dn$ & $\RSTAR{\rerase{a}}$
	\end{tabular}
\end{center}
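As a small worked instance of the two definitions above, take the annotated regular expression
$a = {}_{bs}\sum [{}_{bs_1}\mathbf{c}, \; {}_{bs_2}\ONE]$,
where the bitcodes $bs$, $bs_1$, $bs_2$ and the character $c$ are arbitrary and chosen only for illustration.
Erasing the bitcodes and computing the size gives
\[
	\rerase{a} = \RALTS{[\RCHAR{c}, \RONE]}
	\quad \mbox{and} \quad
	\llbracket \rerase{a} \rrbracket_r
	= \llbracket \RCHAR{c} \rrbracket_r + \llbracket \RONE \rrbracket_r + 1 = 3.
\]
The alternative is still a single node with a two-element list; only the bitcodes have disappeared.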

\subsection{Why a New Datatype?}\label{whyRerase}
\marginpar{\em added label so this section can be referenced by other parts of the thesis
so that interested readers can jump to/be reassured that there will be explanations.}
Originally, the erase operation $(\_)_\downarrow$ was
used by Ausaf et al. in their proofs related to $\blexer$.
This function was not part of the lexing algorithm; its sole purpose was to
bridge the gap between the $r$
%$\textit{rexp}$ 
(un-annotated) and $\textit{arexp}$ (annotated)
regular expression datatypes so as to leverage the correctness
theorem of $\lexer$.%to establish the correctness of $\blexer$.
For example, Lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve} 
uses $\erase$ to convert an annotated regular expression $a$ into
a plain one so that it can be used by $\inj$ to create the desired value
$\inj\; (a)_\downarrow \; c \; v$.

Ideally, $\erase$ should only remove the auxiliary information not related to the
structure, namely the 
bitcodes. However, there is a complication:
the alternative constructors have different arities for $\textit{arexp}$
and $\textit{r}$:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\
		$\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$
	\end{tabular}
\end{center}
\noindent
To convert between the two,
$\erase$ has to recursively disassemble a list into nested binary applications of the 
$(\_ + \_)$ operator,
handling corner cases like empty or
singleton alternative lists:
%becomes $r$ during the
%$\erase$ function.
%The  annotated regular expression $\sum[a, b, c]$ would turn into
%$(a+(b+c))$.
\begin{center}
	\begin{tabular}{lcl}
		$ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\
		$ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a_\downarrow$\\
		$ (_{bs}\sum [a_1, a_2])_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow$\\
		$ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (_{[]} \sum as)_\downarrow$
	\end{tabular}
\end{center}
\noindent
These operations inevitably change the structure and size of
an annotated regular expression. For example,
$a_1 = {}_{Z}\sum [x]$ has size 2, but $(a_1)_\downarrow = x_\downarrow$ 
only has size 1.
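The restructuring is easiest to see on a three-element list.
With the clauses above, and for arbitrary annotated regular expressions $a$, $b$ and $c$ (names chosen only for illustration), we obtain
\[
	(_{bs}\sum [a, b, c])_\downarrow
	= a_\downarrow + (_{[]}\sum [b, c])_\downarrow
	= a_\downarrow + (b_\downarrow + c_\downarrow),
\]
so a single list node with three children becomes two nested binary nodes.
In contrast, $\rerase{_{bs}\sum [a, b, c]} = \RALTS{[\rerase{a}, \rerase{b}, \rerase{c}]}$ keeps the one list node, and the number of constructors is unchanged.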
%adding unnecessary 
%complexities to the size bound proof.
%The reason we
%define a new datatype is that 
%the $\erase$ function 
%does not preserve the structure of annotated
%regular expressions.
%We initially started by using 
%plain regular expressions and tried to prove
%lemma \ref{rsizeAsize},
%however the $\erase$ function messes with the structure of the 
%annotated regular expression.
%The $+$ constructor
%of basic regular expressions is only binary, whereas $\sum$ 
%takes a list. Therefore we need to convert between
%annotated and normal regular expressions as follows:
For example, if we define the size of a basic plain regular expression 
in the usual way,
Chengsong
parents: 532
diff changeset
   460
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
		$\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
		$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
		$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
		$\llbracket r^* \rrbracket_p $ & $\dn$ & $\llbracket r \rrbracket_p + 1$
	\end{tabular}
\end{center}
\noindent
then the property
\begin{center}
	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
\end{center}
does not hold.
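\noindent
The sizes can differ in either direction. As a sketch (assuming that $(\_)_\downarrow$ turns a
list alternative into right-nested binary alternatives, and that the size of $_{bs}\sum as$ is
one plus the sizes of the elements of $as$, as the example $a_1$ suggests), for characters
$x$, $y$ and $z$ we would get
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket {}_{bs}\sum [x, y, z] \rrbracket$ & $=$ & $1 + (1 + 1 + 1) = 4$\\
		$\llbracket x + (y + z) \rrbracket_p$ & $=$ & $1 + (1 + 1 + 1) + 1 = 5$
	\end{tabular}
\end{center}
\noindent
so here erasing increases the size, whereas for $a_1$ it decreases it.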
%With $\textit{rerase}$, however, 
%only the bitcodes are thrown away.
This leads us to define a new regular expression datatype without
bitcodes but with a list-based alternative constructor, and a new erase function
that strictly preserves the structure:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\
		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
	\end{tabular}
\end{center}
\noindent
%But
%Everything about the structure remains intact.
%Therefore it does not change the size
%of an annotated regular expression and we have:
One might be able to prove an inequality such as
$\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
but we found our approach more straightforward.

\subsection{Functions for R-regular Expressions}
The downside of our approach is that we need to redefine
several functions for $\rrexp$.
In this section we define the r-regular expression versions
of $\bder$ and the $\textit{bsimp}$-related functions.
We use $r$ as a prefix or subscript to distinguish them
from the bitcoded versions.
%For example,$\backslash_r$, $\rdistincts$, and $\rsimp$
%as opposed to $\backslash$, $\distinctBy$, and $\bsimp$.
%As promised, they are much simpler than their bitcoded counterparts.
%The operations on r-regular expressions are 
%almost identical to those of the annotated regular expressions,
%except that no bitcodes are used. For example,
The derivative operation for an r-regular expression is
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\  
		$(\ONE)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
		$(d)\,\backslash_r c$ & $\dn$ &
		$\textit{if}\;c=d\; \;\textit{then}\;
		\ONE\;\textit{else}\;\ZERO$\\  
		$(\sum \;\textit{rs})\,\backslash_r c$ & $\dn$ &
		$\sum\;(\textit{map} \; (\_\backslash_r c) \; rs )$\\
		$(r_1\cdot r_2)\,\backslash_r c$ & $\dn$ &
		$\textit{if}\;(\textit{rnullable}\,r_1)$\\
						 & &$\textit{then}\;\sum\,[(r_1\,\backslash_r c)\cdot\,r_2,$\\
						 & &$\phantom{\textit{then}\;\sum\,[}(r_2\,\backslash_r c)]$\\
						 & &$\textit{else}\;\,(r_1\,\backslash_r c)\cdot r_2$\\
		$(r^*)\,\backslash_r c$ & $\dn$ &
		$( r\,\backslash_r c)\cdot r^*$
	\end{tabular}    
\end{center}  
\noindent
where we omit the definition of $\textit{rnullable}$.
The generalisation from derivatives w.r.t.\ a character to
derivatives w.r.t.\ strings is given as
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{rs} []$ & $\dn$ & $r$\\
		$r \backslash_{rs} (c::s)$ & $\dn$ & $(r\backslash_r c) \backslash_{rs} s$
	\end{tabular}
\end{center}
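\noindent
As a small worked example of the character derivative (a sketch, assuming the usual
definition of $\textit{rnullable}$, under which $a^*$ is nullable, the standard clauses
for $\ONE$ and for characters, and two distinct characters $a$ and $b$), we have
\begin{center}
	\begin{tabular}{lcl}
		$(a^* \cdot b)\,\backslash_r a$ & $=$ & $\sum\,[(a^*\,\backslash_r a)\cdot b,\; b\,\backslash_r a]$\\
		& $=$ & $\sum\,[((a\,\backslash_r a)\cdot a^*)\cdot b,\; \ZERO]$\\
		& $=$ & $\sum\,[(\ONE\cdot a^*)\cdot b,\; \ZERO]$
	\end{tabular}
\end{center}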

The function $\distinctBy$ for r-regular expressions does not need 
an additional function for checking equivalence, because
r-regular expressions carry no bit annotations.
Therefore we have
\begin{center}
	\begin{tabular}{lcl}
		$\rdistinct{[]}{rset} $ & $\dn$ & $[]$\\
		$\rdistinct{r :: rs}{rset}$ & $\dn$ & 
		$\textit{if}\;(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
					    &        & $\textit{else}\; \;
					    r::\rdistinct{rs}{(rset \cup \{r\})}$
	\end{tabular}
\end{center}
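\noindent
For instance, unfolding these two clauses on a three-element list with one duplicate
(with $a$ and $b$ standing for two different r-regular expressions, chosen here only for illustration) gives
\begin{center}
	\begin{tabular}{lcl}
		$\rdistinct{[a, b, a]}{\varnothing}$ & $=$ & $a :: \rdistinct{[b, a]}{\{a\}}$\\
		& $=$ & $a :: b :: \rdistinct{[a]}{\{a, b\}}$\\
		& $=$ & $a :: b :: \rdistinct{[]}{\{a, b\}}$\\
		& $=$ & $[a, b]$
	\end{tabular}
\end{center}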
%TODO: definition of rsimp (maybe only the alternative clause)
\noindent
%We would like to make clear
%a difference between our $\rdistincts$ and
%the Isabelle $\textit {distinct}$ predicate.
%In Isabelle $\textit{distinct}$ is a function that returns a boolean
%rather than a list.
%It tests if all the elements of a list are unique.\\
With $\textit{rdistinct}$ in place,
the flatten function for $\rrexp$ is as follows:
 \begin{center}
  \begin{tabular}{@{}lcl@{}}
  $\textit{rflts} \; (\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $as \; @ \; \textit{rflts} \; as' $ \\
  $\textit{rflts} \; \ZERO :: as'$ & $\dn$ & $ \textit{rflts} \;  \textit{as'} $ \\
    $\textit{rflts} \; a :: as'$ & $\dn$ & $a :: \textit{rflts} \; \textit{as'}$ \quad(otherwise) 
\end{tabular}    
\end{center}  
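\noindent
To see the three clauses at work (a sketch, assuming in addition the base case
$\textit{rflts} \; [] = []$, which is not displayed above, and with $a$, $b$ and $c$ being characters):
\begin{center}
	\begin{tabular}{lcl}
		$\textit{rflts} \; [\sum [a, b],\; \ZERO,\; c]$ & $=$ & $[a, b] \; @ \; \textit{rflts} \; [\ZERO,\; c]$\\
		& $=$ & $[a, b] \; @ \; \textit{rflts} \; [c]$\\
		& $=$ & $[a, b] \; @ \; (c :: \textit{rflts} \; [])$\\
		& $=$ & $[a, b, c]$
	\end{tabular}
\end{center}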
\noindent
The function 
$\rsimpalts$ corresponds to $\textit{bsimp}_{ALTS}$:
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\rsimpalts \;\; []$ & $\dn$ & $\RZERO$\\
	  $\rsimpalts \;\; [r]$ & $\dn$ & $r$\\
	  $\rsimpalts \;\; rs$ & $\dn$ & $\sum rs$ \quad(otherwise)\\
\end{tabular}    
\end{center}  
\noindent
Similarly, we have $\rsimpseq$ which corresponds to $\textit{bsimp}_{SEQ}$:
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\rsimpseq \;\; \RZERO \; \_ $ &   $\dn$ &   $\RZERO$\\
	  $\rsimpseq \;\; \_ \; \RZERO $ &   $\dn$ &   $\RZERO$\\
	  $\rsimpseq \;\; \RONE \; r_2$ & $\dn$ & $r_2$\\
	  $\rsimpseq \;\; r_1 \; r_2$ & $\dn$ & $r_1 \cdot r_2$ \quad(otherwise)\\
\end{tabular}    
\end{center}  
\noindent
With these we obtain $\textit{rsimp}$ and $\rderssimp{\_}{\_}$:
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\textit{rsimp} \; (r_1\cdot r_2)$ & $\dn$ & $ \textit{rsimp}_{SEQ} \;(\textit{rsimp} \; r_1) \; (\textit{rsimp}  \; r_2)  $ \\
	  $\textit{rsimp} \; (\sum \textit{rs})$ & $\dn$ & $\textit{rsimp}_{ALTS} \; (\textit{rdistinct} \; ( \textit{rflts} \; ( \textit{map} \; \textit{rsimp} \; rs)) \; \varnothing) $ \\
   $\textit{rsimp} \; r$ & $\dn$ & $r \qquad \textit{otherwise}$   
\end{tabular}    
\end{center} 
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$r\backslash_{rsimp} \, c$ & $\dn$ & $\rsimp \; (r\backslash_r \, c)$
	\end{tabular}
\end{center}

\begin{center}
	\begin{tabular}{@{}lcl@{}}
$r \backslash_{rsimps} \; \; c\!::\!s $ & $\dn$ & $(r \backslash_{rsimp}\, c) \backslash_{rsimps}\, s$ \\
$r \backslash_{rsimps} [\,] $ & $\dn$ & $r$
	\end{tabular}
\end{center}
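\noindent
A short worked example may help to see how simplification keeps derivatives small
(a sketch only, reading $\ZERO$ and $\ONE$ as the r-regular expression constants and
assuming the standard derivative clauses, under which
$(a^* \cdot b)\,\backslash_r a = \sum\,[(\ONE\cdot a^*)\cdot b,\; \ZERO]$
for distinct characters $a$ and $b$):
\begin{center}
	\begin{tabular}{lcl}
		$\textit{rsimp} \; ((\ONE\cdot a^*)\cdot b)$ & $=$ & $\textit{rsimp}_{SEQ} \; (\textit{rsimp}_{SEQ} \; \ONE \; a^*) \; b$\\
		& $=$ & $\textit{rsimp}_{SEQ} \; a^* \; b \;=\; a^* \cdot b$\\
		$(a^* \cdot b)\,\backslash_{rsimp}\, a$ & $=$ & $\textit{rsimp}_{ALTS} \; (\textit{rdistinct} \; (\textit{rflts} \; [a^* \cdot b,\; \ZERO]) \; \varnothing)$\\
		& $=$ & $\textit{rsimp}_{ALTS} \; [a^* \cdot b] \;=\; a^* \cdot b$
	\end{tabular}
\end{center}
\noindent
In this instance the simplified derivative is the regular expression we started with.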
\noindent
We do not define an r-regular expression version of $\blexersimp$,
as our proof does not depend on it.
We are now ready to show how r-regular expressions allow
us to prove the size bound on bitcoded regular expressions.

\subsection{Using R-regular Expressions to Bound Bit-coded Regular Expressions}
The size of an annotated regular expression after applying
$\bsimp$ and $\backslash_{bsimps}$
can be calculated from the size of the corresponding r-regular expression after applying
$\rsimp$ and $\backslash_{rsimps}$:
\begin{lemma}\label{sizeRelations}
	The following equalities hold:
	\begin{itemize}
		\item
			$\rsize{\rerase a} = \asize a$
		\item
			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
		\item
			$\asize{\bderssimp{a}{s}} =  \rsize{\rderssimp{\rerase{a}}{s}}$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part follows from the definition of $(\_)_{\downarrow_r}$.
	The second part is by induction on the inductive cases
	of $\textit{bsimp}$.
	The third part is by induction on the string $s$,
	where the inductive step follows from part one.
\end{proof}
\noindent
With Lemma~\ref{sizeRelations},
we can focus on 
estimating only
$\rsize{\rderssimp{\rerase{a}}{s}}$
in what follows, because
\begin{center}
	$\rsize{\rderssimp{\rerase{a}}{s}} \leq N_r \quad$
	implies
	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
\end{center}
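\noindent
Spelled out, this is a direct consequence of the third part of Lemma~\ref{sizeRelations}
(a sketch, reading $a \backslash_{bsimps} s$ as $\bderssimp{a}{s}$ and
$\llbracket \_ \rrbracket$ as $\asize{\_}$, which is how we understand the notation here):
\begin{center}
	$\llbracket a \backslash_{bsimps} s \rrbracket \;=\; \asize{\bderssimp{a}{s}}
	\;=\; \rsize{\rderssimp{\rerase{a}}{s}} \;\leq\; N_r$.
\end{center}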
%From now on we 
%Unless stated otherwise in the rest of this 
%chapter all regular expressions without
%bitcodes are seen as r-regular expressions ($\rrexp$s).
%For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
%we use the notation $r_1 + r_2$
%for brevity.


%-----------------------------------
%	SUB SECTION ROADMAP RREXP BOUND
%-----------------------------------

%\subsection{Roadmap to a Bound for $\textit{Rrexp}$}

%The way we obtain the bound for $\rrexp$s is by two steps:
%\begin{itemize}
%	\item
%		First, we rewrite $r\backslash s$ into something else that is easier
%		to bound. This step is crucial for the inductive case 
%		$r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way,
%		but after simplification, they will always be equal or smaller to a form consisting of an alternative
%		list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application.
%	\item
%		Then, for such a sum  list of regular expressions $f\; (g\; (\sum rs))$, we can control its size
%		by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only 
%		reduce the size of a regular expression, not adding to it.
%\end{itemize}
%
%\section{Step One: Closed Forms}
%We transform the function application $\rderssimp{r}{s}$
%into an equivalent 
%form $f\; (g \; (\sum rs))$.
%The functions $f$ and $g$ can be anything from $\flts$, $\distinctBy$ and other helper functions from $\bsimp{\_}$.
%This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
%right hand side the "closed form" of $r\backslash s$.
%
%\begin{quote}\it
%	Claim: For regular expressions $r_1 \cdot r_2$, we claim that
%\end{quote}
%\noindent
%We explain in detail how we reached those claims.
If we attempt to prove 
\begin{center}
	$\forall r. \; \exists N_r.\;\; \forall s. \; \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
\end{center}
by a naive induction on the structure of $r$,
we get stuck in inductive cases such as
$r_1\cdot r_2$.
The inductive hypotheses are:
\begin{center}
	1: $\text{for } r_1, \text{there exists } N_{r_1} \;\; \text{s.t.}
	\;\;\forall s.  \llbracket r_1 \backslash_{rsimps} s \rrbracket_r \leq N_{r_1}. $\\
	2: $\text{for } r_2, \text{there exists } N_{r_2} \;\; \text{s.t.}
	\;\; \forall s. \llbracket r_2 \backslash_{rsimps} s \rrbracket_r \leq N_{r_2}. $
\end{center}
The inductive step to prove would be 
\begin{center}
	$\text{there exists } N_{r_1\cdot r_2} \;\; \text{s.t.} \;\; \forall s. 
	\llbracket (r_1 \cdot r_2) \backslash_{rsimps} s \rrbracket_r \leq N_{r_1\cdot r_2}.$
\end{center}
The problem is that it is not clear what 
$(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
and therefore $N_{r_1}$ and $N_{r_2}$ in the
inductive hypotheses cannot be directly used.
%We have already seen that $(r_1 \cdot r_2)\backslash s$ 
%and $(r^*)\backslash s$ can grow in a wild way.

The point, however, is that such derivatives are equivalent to an
alternative $\sum rs$, where each term in $rs$ is
made of $r_1 \backslash s' $, $r_2\backslash s'$,
and $r \backslash s'$ with $s' \in \textit{SubString} \; s$ (which stands
for the set of substrings of $s$).
The list $rs$ is then de-duplicated by $\textit{rdistinct}$
during simplification, which prevents $rs$ from growing indefinitely.

Based on this idea, we develop a proof in two steps.
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
   729
First, we show the below equality (where
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
   730
$f$ and $g$ are functions that do not increase the size of the input)
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
   731
\begin{center}
613
Chengsong
parents: 611
diff changeset
   732
$r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
   733
\end{center}
where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
For example, for $r_1 \cdot r_2$ we have the equality
	\begin{center}
		$ \rderssimp{r_1 \cdot r_2}{s} = 
		\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r_2}{\_} \;(\vsuf{s}{r_1})))}$
	\end{center}
We call the right-hand side the 
\emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
Second, we will bound the closed form of r-regular expressions
using some estimation techniques
and then apply
lemma \ref{sizeRelations} to show that the bitcoded regular expressions
in our $\blexersimp$ are finitely bounded.

We will describe in detail the first step of the proof
in the next section.

\section{Closed Forms}
In this section we introduce in detail
how to express the string derivatives
of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string
rather than a single character) in a different way than 
our previous definition.
In previous chapters, the derivative of a 
regular expression $r$ w.r.t.\ a string $s$
was recursively defined on the string:
\begin{center}
	$r \backslash_s (c::s) \dn (r \backslash c) \backslash_s s$
\end{center}
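\noindent
As a small illustration (a sketch that assumes the usual base case
$r \backslash_s [] \dn r$, which is not repeated in this chapter),
unfolding this definition on a three-character string only says that
the characters are consumed one after another:
\begin{center}
	$r \backslash_s [c_1, c_2, c_3] \;=\; ((r \backslash c_1) \backslash c_2) \backslash c_3$
\end{center}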
The problem is that 
this definition does not provide much information
on what $r \backslash_s s$ looks like.
If we are interested in the size of a derivative like 
$(r_1 \cdot r_2)\backslash s$,
we have to somehow get a more concrete form to begin with.
We call such more concrete representations the ``closed forms'' of
string derivatives as opposed to their original definitions.
The name ``closed form'' was inspired by closed forms in mathematics,
and the similarity is that, as in mathematics, our closed forms make
the term in question easier to estimate.
%The terminology ``closed form'' is borrowed from mathematics,
%which usually describe expressions that are solely comprised of finitely many
%well-known and easy-to-compute operations such as 
%additions, multiplications, and exponential functions.

We start by proving some basic identities
involving the simplification functions for r-regular expressions.
After that we introduce the rewrite relations
$\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$,
$\rightsquigarrow_f$ and $\rightsquigarrow_g$.
These relations involve techniques similar to those in chapter \ref{Bitcoded2}
for annotated regular expressions.
Finally, we use these identities to establish the
closed forms of the alternative regular expression,
the sequence regular expression, and the star regular expression.
%$r_1\cdot r_2$, $r^*$ and $\sum rs$.

\subsection{Some Basic Identities}

In what follows we will often convert between lists
and sets.
We use Isabelle's $set$ to refer to the 
function that converts a list $rs$ to the set
containing all the elements in $rs$.
\subsubsection{$\textit{rdistinct}$ Does the Job of De-duplication}
The $\textit{rdistinct}$ function, as its name suggests, will
de-duplicate an r-regular expression list.
It will also remove any elements that 
are already in the accumulator set.
\begin{lemma}\label{rdistinctDoesTheJob}
	%The function $\textit{rdistinct}$ satisfies the following
	%properties:
	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
	recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.} 
	for testing
	whether a list's elements are unique. Then the following
	properties about $\textit{rdistinct}$ hold:
	\begin{itemize}
		\item
			If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
		\item
			%If list $rs'$ is the result of $\rdistinct{rs}{acc}$,
			$\textit{isDistinct} \;\;\; (\rdistinct{rs}{acc})$.
		\item
			$\textit{set} \; (\rdistinct{rs}{acc}) 
			= (\textit{set} \; rs) - acc$.
	\end{itemize}
\end{lemma}
\noindent
\begin{proof}
	The first part is by an induction on $rs$.
	The second and third parts can be proven by using the 
	inductive cases of $\textit{rdistinct}$.

\end{proof}
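\noindent
To illustrate the three properties, consider a small worked example. It is a
sketch that assumes the accumulator-style definition of $\textit{rdistinct}$
from the earlier chapter (an element already in the accumulator is skipped,
otherwise it is kept and added to the accumulator), and that $r_1$, $r_2$ and
$r_3$ are pairwise different r-regular expressions chosen only for illustration.
The result does not contain $r_2$, is distinct, and its set of elements is
$\{r_1, r_2, r_3\} - \{r_2\} = \{r_1, r_3\}$:
\begin{center}
	\begin{tabular}{lcl}
		$\rdistinct{[r_1, r_2, r_1, r_3]}{\{r_2\}}$ & $=$ & $r_1 :: \rdistinct{[r_2, r_1, r_3]}{\{r_1, r_2\}}$\\
		& $=$ & $r_1 :: \rdistinct{[r_1, r_3]}{\{r_1, r_2\}}$\\
		& $=$ & $r_1 :: \rdistinct{[r_3]}{\{r_1, r_2\}}$\\
		& $=$ & $r_1 :: r_3 :: \rdistinct{[]}{\{r_1, r_2, r_3\}}$\\
		& $=$ & $[r_1, r_3]$
	\end{tabular}
\end{center}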

\noindent
%$\textit{rdistinct}$ will out all regular expression terms
%that are in the accumulator, therefore 
If we prepend a list $rs$, all of whose elements are already in the
accumulator, to another list $rs_a$ and then call $\textit{rdistinct}$
on the concatenated list, the output will be the same as if we had called $\textit{rdistinct}$
on $rs_a$ alone:
\begin{lemma}\label{rdistinctConcat}
	The elements appearing in the accumulator will always be removed.
	More precisely,
	\begin{itemize}
		\item
			If $set \; rs \subseteq rset$, then 
			$\rdistinct{(rs @ rs_a)}{rset} = \rdistinct{rs_a}{rset}$.
		\item
			In particular, if $a \in rset$ and $\rdistinct{rs}{\{a\}} = []$,
			then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{rs'}{rset}$.
	\end{itemize}
\end{lemma}

\begin{proof}
	By induction on $rs$ and using lemma \ref{rdistinctDoesTheJob}.
\end{proof}
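\noindent
For instance (a sketch with illustrative, pairwise different $r_1$, $r_2$, $r_3$,
relying on the accumulator-style definition of $\textit{rdistinct}$ from the earlier chapter),
consider a two-element prefix $[r_1, r_2]$ followed by $[r_2, r_3]$, with accumulator $\{r_1, r_2\}$.
The prefix contributes nothing to the output, because all of its elements are skipped:
\begin{center}
	$\rdistinct{([r_1, r_2] @ [r_2, r_3])}{\{r_1, r_2\}} 
	\;=\; \rdistinct{[r_2, r_3]}{\{r_1, r_2\}} \;=\; [r_3]$
\end{center}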
\noindent
On the other hand, if an element $r$ does not appear in the input list waiting to be de-duplicated,
then expanding the accumulator to include that element will not cause the output list to change:
\begin{lemma}\label{rdistinctOnDistinct}
	The accumulator can be augmented to include elements not appearing in the input list,
	and the output will not change.	
	\begin{itemize}
		\item
			If $r \notin rs$, then $\rdistinct{rs}{acc} = \rdistinct{rs}{(\{r\} \cup acc)}$.
		\item
			In particular, if $\;\;\textit{isDistinct} \; rs$, then we have
			\[ \rdistinct{rs}{\varnothing} = rs \]
	\end{itemize}
\end{lemma}
\begin{proof}
	The first half is by induction on $rs$. The second half is a corollary of the first.
\end{proof}
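\noindent
As a small sanity check (illustrative values only, with pairwise different $r_1$, $r_2$, $r_3$
and the accumulator-style definition of $\textit{rdistinct}$ assumed):
since $r_3$ does not occur in $[r_1, r_2, r_1]$, adding it to the accumulator
leaves the result unchanged,
\begin{center}
	$\rdistinct{[r_1, r_2, r_1]}{\varnothing} \;=\; [r_1, r_2] \;=\; \rdistinct{[r_1, r_2, r_1]}{\{r_3\}}$
\end{center}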
\noindent
The function $\textit{rdistinct}$ removes duplicates from anywhere in a list.
Although this seems obvious, 
the induction needed to prove it is not entirely straightforward.
\begin{lemma}\label{distinctRemovesMiddle}
	The following two properties hold if $r \in rs$:
	\begin{itemize}
		\item
			$\rdistinct{rs}{rset} = \rdistinct{(rs @ [r])}{rset}$\\
			and\\
			$\rdistinct{(ab :: rs @ [ab])}{rset'} = \rdistinct{(ab :: rs)}{rset'}$
		\item
			$\rdistinct{ (rs @ rs') }{rset} = \rdistinct{(rs @ [r] @ rs')}{rset}$\\
			and\\
			$\rdistinct{(ab :: rs @ [ab] @ rs'')}{rset'} = 
			\rdistinct{(ab :: rs @ rs'')}{rset'}$
	\end{itemize}
\end{lemma}
\noindent
\begin{proof}
	By induction on $rs$. All other variables are allowed to be arbitrary.
	The second part of the lemma requires the first.
	Note that for each part, the two sub-propositions need to be proven 
	at the same time,
	so that the induction goes through.
\end{proof}
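\noindent
A concrete instance of the second property (illustrative only, with pairwise different
$r_1$, $r_2$, $r_3$ and the accumulator-style definition of $\textit{rdistinct}$ assumed):
inserting a second copy of $r_1$ in the middle of a list that already starts with $r_1$
has no effect on the output,
\begin{center}
	$\rdistinct{([r_1, r_2] @ [r_1] @ [r_3])}{\varnothing} \;=\; [r_1, r_2, r_3] \;=\; 
	\rdistinct{([r_1, r_2] @ [r_3])}{\varnothing}$
\end{center}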
\noindent
This allows us to prove a few more equalities involving 
$\textit{rdistinct}$ (they will be useful later):
\begin{lemma}\label{rdistinctConcatGeneral}
	\mbox{}
	\begin{itemize}
		\item
			$\rdistinct{(rs @ rs')}{\varnothing} = \rdistinct{((\rdistinct{rs}{\varnothing})@ rs')}{\varnothing}$
		\item
			$\rdistinct{(rs @ rs')}{\varnothing} = \rdistinct{(\rdistinct{rs}{\varnothing} @ rs')}{\varnothing}$
		\item
			If $rset' \subseteq rset$, then $\rdistinct{rs}{rset} = 
			\rdistinct{(\rdistinct{rs}{rset'})}{rset}$. As a corollary
			of this,
		\item
			$\rdistinct{(rs @ rs')}{rset} = \rdistinct{
			((\rdistinct{rs}{\varnothing}) @ rs')}{rset}$. This
			gives another corollary used later:
		\item
			If $a \in rset$, then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{
			(\rdistinct{(a :: rs)}{\varnothing} @ rs')}{rset} $.
	\end{itemize}
\end{lemma}
\begin{proof}
	By lemmas \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
\end{proof}
\noindent
The next lemma is a more general form of lemma \ref{rdistinctConcat};
it says that
$\textit{rdistinct}$ is composable w.r.t.\ list concatenation:
\begin{lemma}\label{distinctRdistinctAppend}
			If $\;\; \textit{isDistinct} \; rs_1$, 
			and $(set \; rs_1) \cap acc = \varnothing$,
			then applying $\textit{rdistinct}$ on $rs_1 @ rs_a$ does not 
			have an effect on $rs_1$:
			\[\textit{rdistinct}\;  (rs_1 @ rs_a)\;\, acc
			= rs_1@(\textit{rdistinct} \; rs_a \; (acc \cup set \; rs_1))\]
\end{lemma}
\begin{proof}
	By an induction on 
	$rs_1$, where $rs_a$ and $acc$ are allowed to be arbitrary.
\end{proof}
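\noindent
For example (a sketch with illustrative, pairwise different $r_1$, $r_2$, $r_3$, $r_4$,
assuming the accumulator-style definition of $\textit{rdistinct}$): the prefix $[r_1, r_2]$
is distinct and disjoint from the accumulator $\{r_4\}$, so it is passed through unchanged and only
the remaining list $[r_2, r_3, r_4]$ is filtered against the enlarged accumulator:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{rdistinct}\; ([r_1, r_2] @ [r_2, r_3, r_4])\; \{r_4\}$ & $=$ & 
		$[r_1, r_2] @ (\textit{rdistinct}\; [r_2, r_3, r_4]\; \{r_4, r_1, r_2\})$\\
		& $=$ & $[r_1, r_2] @ [r_3]$\\
		& $=$ & $[r_1, r_2, r_3]$
	\end{tabular}
\end{center}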
\noindent
$\textit{rdistinct}$ needs to be applied only once, and 
applying it multiple times does not make any difference:
\begin{corollary}\label{distinctOnceEnough}
	$\textit{rdistinct} \; (rs @ rs_a) \; \{ \} = \textit{rdistinct} \; ( (\textit{rdistinct} \; 
	rs \; \{ \}) @ (\textit{rdistinct} \; rs_a \; (set \; rs))) \; \{ \}$
\end{corollary}
\begin{proof}
	By lemma \ref{distinctRdistinctAppend}.
\end{proof}

\subsubsection{The Properties of $\textit{Rflts}$} 
We give in this subsection some properties
involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and 
$\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them.
These will be helpful in later closed-form proofs, when
we want to transform terms to which
%the ways in which multiple functions involving
%those are composed together
derivatives and simplifications have been applied in an interleaved fashion.

\noindent
%When the function $\textit{Rflts}$ 
%is applied to the concatenation of two lists; the output can be calculated by first applying the
%functions on two lists separately and then concatenating them together.
$\textit{Rflts}$ is composable in terms of concatenation:
\begin{lemma}\label{rfltsProps}
	The function $\rflts$ has the properties below:
	\begin{itemize}
		\item
			$\rflts \; (rs_1 @ rs_2) = \rflts \; rs_1 @ \rflts \; rs_2$
		\item
			If $r \neq \RZERO$ and $\nexists rs_1. r = \RALTS{rs_1}$, then $\rflts \; (r::rs) = r :: \rflts \; rs$
		\item
			$\rflts \; (rs @ [\RZERO]) = \rflts \; rs$
		\item
			$\rflts \; (rs' @ [\RALTS{rs}]) = \rflts \; rs'@rs$
		\item
			$\rflts \; (rs @ [\RONE]) = \rflts \; rs @ [\RONE]$
		\item
			If $r \neq \RZERO$ and $\nexists rs'. r = \RALTS{rs'}$ then $\rflts \; (rs @ [r])
			= (\rflts \; rs) @ [r]$
		\item
			If $r = \RALTS{rs}$ and $r \in rs'$ then for all $r_1 \in rs. 
			r_1 \in \rflts \; rs'$.
		\item
			$\rflts \; (rs_a @ \RZERO :: rs_b) = \rflts \; (rs_a @ rs_b)$
	\end{itemize}
\end{lemma}
\noindent
\begin{proof}
	The first property is by induction on $rs_1$, the second by induction on $r$,
	and the third, fourth, fifth, sixth and last properties by induction on
	$rs$, $rs'$, $rs$, $rs'$ and $rs_a$, respectively.
\end{proof}
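\noindent
As an illustration (a sketch assuming the definition of $\rflts$ from the earlier chapter,
which drops $\RZERO$, splices in the children of an alternative, and keeps every other
element; $r_1$ and $r_2$ are arbitrary r-regular expressions used only for illustration),
the first, fourth, fifth and last properties combine to give
\begin{center}
	\begin{tabular}{lcl}
		$\rflts \; ([\RZERO, \RALTS{[r_1, r_2]}] @ [\RONE])$ & $=$ & 
		$\rflts \; [\RZERO, \RALTS{[r_1, r_2]}] \; @ \; \rflts \; [\RONE]$\\
		& $=$ & $[r_1, r_2] @ [\RONE]$\\
		& $=$ & $[r_1, r_2, \RONE]$
	\end{tabular}
\end{center}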
\noindent
Now we introduce the property that the derivative operation
and $\rsimpalts$
commute. This will be used later on when deriving the closed form for
the alternative regular expression:
\begin{lemma}\label{rderRsimpAltsCommute}
	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
\end{lemma}
\begin{proof}
	By induction on $rs$.
\end{proof}
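\noindent
The two smallest cases can be checked by hand. This is a sketch that assumes the
definition of $\rsimpalts$ from the earlier chapter, under which $\rsimpalts \; [] = \RZERO$
and $\rsimpalts \; [r] = r$, and that the derivative of $\RZERO$ is again $\RZERO$:
\begin{center}
	\begin{tabular}{lcl}
		$\rder{x}{(\rsimpalts \; [])}$ & $=$ & $\rder{x}{\RZERO} \;=\; \RZERO \;=\; \rsimpalts \; [] 
		\;=\; \rsimpalts \; (\map \; (\rder{x}{\_}) \; [])$\\
		$\rder{x}{(\rsimpalts \; [r])}$ & $=$ & $\rder{x}{r} \;=\; \rsimpalts \; [\rder{x}{r}] 
		\;=\; \rsimpalts \; (\map \; (\rder{x}{\_}) \; [r])$
	\end{tabular}
\end{center}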
\noindent

\subsubsection{The $RL$ Function: Language Interpretation for $\textit{Rrexp}$s}
Much like the definition of $L$ on plain regular expressions, one can also 
define the language interpretation for $\rrexp$s.
\begin{center}
	\begin{tabular}{lcl}
		$RL \; (\ZERO_r)$ & $\dn$ & $\varnothing$\\
		$RL \; (\ONE_r)$ & $\dn$ & $\{[]\}$\\
		$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
		$RL \; (\sum rs)$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
		$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
		$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
	\end{tabular}
\end{center}
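\noindent
For example, unfolding these clauses on a small alternative (chosen only for illustration) gives
\begin{center}
	$RL \; (\sum [c, \ONE_r]) \;=\; RL \; (c) \cup RL \; (\ONE_r) \;=\; \{[c]\} \cup \{[]\} \;=\; \{[c], []\}$
\end{center}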
\noindent
The main use of $RL$ is to establish some connections between $\rsimp{}$ 
and $\rnullable{}$:
\begin{lemma}
	The following properties hold:
	\begin{itemize}
		\item
			If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
		\item
			$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part is by induction on $r$. 
	The second part is true because the property 
	\[ RL \; r = RL \; (\rsimp{r})\] holds.
\end{proof}

\subsubsection{Simplified $\textit{Rrexp}$s are Good}
We formalise the notion of ``good'' regular expressions,
which means regular expressions that
are fully simplified in terms of our $\textit{rsimp}$ function. 
For alternative regular expressions that means they
do not contain any nested alternatives, un-eliminated $\RZERO$s
or duplicate elements (so, for example, 
$r_1 + (r_2 + r_3)$, $\RZERO + r$ and $ \sum [r, r, \ldots]$ are not good).
The clauses for $\good$ are:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\good\; \RZERO$ & $\dn$ & $\bfalse$\\
		$\good\; \RONE$ & $\dn$ & $\btrue$\\
		$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
		$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
		$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
		$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ & 
		$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
						   & & $\land \; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \,  \land \; \, \textit{nonAlt}\; r')$\\
		$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
		$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
		$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
		$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \land \;\, \good \; r_2$\\
		$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
	\end{tabular}
\end{center}
\noindent
We omit the recursive definition of the predicate $\textit{nonAlt}$,
which evaluates to true when the regular expression is not an
alternative, and false otherwise.
The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
the elements of its non-empty argument list are all good, $\textit{nonAlt}$, 
and pairwise distinct:
\begin{lemma}\label{rsimpaltsGood}
	If $rs \neq []$, $\textit{isDistinct} \; rs$, and $\textit{nonAlt} \; r$ for all $r \in rs$,
	then $\good \; (\rsimpalts \; rs)$ if and only if $\good \; r$ for all $r \in rs$.
\end{lemma}
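\noindent
As an illustration of the clauses for $\good$ (the characters $a$, $b$ and $c$
are arbitrary and chosen only for this example), unfolding the definition gives
\begin{center}
	\begin{tabular}{lcl}
		$\good \; \RALTS{[\RCHAR{a}, \RCHAR{b}]}$ & $=$ & $\btrue$\\
		$\good \; \RALTS{[\RCHAR{a}, \RCHAR{a}]}$ & $=$ & $\bfalse$ \quad (the list is not distinct)\\
		$\good \; \RALTS{[\RCHAR{a}, \RALTS{[\RCHAR{b}, \RCHAR{c}]}]}$ & $=$ & $\bfalse$ \quad (the second element is not $\textit{nonAlt}$)\\
		$\good \; \RSEQ{\RONE}{\RCHAR{a}}$ & $=$ & $\bfalse$\\
		$\good \; \RSEQ{\RCHAR{a}}{\RSTAR{\RCHAR{b}}}$ & $=$ & $\btrue$
	\end{tabular}
\end{center}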
\noindent
We also note that
if a regular expression $r$ is good, then applying $\rflts$ to the singleton
list $[r]$ will not break goodness:
\begin{lemma}\label{flts2}
	If $\good \; r$, then for all $r' \in \rflts \; [r]$ we have $\good \; r'$ and $\textit{nonAlt} \; r'$.
\end{lemma}
\begin{proof}
	By induction on $r$.
\end{proof}
\noindent
Another observation about $\rsimp{r}$ is that it never
contains nested alternatives, which we describe as the $\nonnested$
property:
\begin{center}
	\begin{tabular}{lcl}
		$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
		$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
		$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested (\sum rs)$\\
		$\nonnested \; \, r $ & $\dn$ & $\btrue$
	\end{tabular}
\end{center}
\noindent
The $\rflts$ function
always opens up nested alternatives,
which ensures that the result of $\rsimp{}$ is non-nested:

\begin{lemma}\label{nonnestedRsimp}
	It is always the case that
	\begin{center}
		$\nonnested \; (\rsimp{r})$
	\end{center}
\end{lemma}
\begin{proof}
	By induction on $r$.
\end{proof}
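\noindent
For example, writing $a$, $b$ and $c$ for three distinct character regular
expressions (names chosen only for illustration), the clauses of $\nonnested$ give
\begin{center}
	\begin{tabular}{lcl}
		$\nonnested \; \, \sum [a, \sum [b, c]]$ & $=$ & $\nonnested \; \, \sum [\sum [b, c]]$\\
		& $=$ & $\bfalse$
	\end{tabular}
\end{center}
\noindent
On the other hand, assuming that $\rsimp{}$ leaves characters unchanged and that
$\rsimp{(\sum rs)}$ unfolds to
$\rsimpalts \; (\rdistinct{(\rflts \; (\map \; \rsimp{} \; rs))}{\varnothing})$,
the list $[a, \sum [b, c]]$ is flattened to $[a, b, c]$, so that
$\rsimp{(\sum [a, \sum [b, c]])} = \sum [a, b, c]$, for which $\nonnested$ holds.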
\noindent
With this we can prove that the list obtained
after simplification, flattening and de-duplication
does not directly contain any alternative regular expression:
\begin{lemma}\label{nonaltFltsRd}
	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$ 
	then $\textit{nonAlt} \; x$.
\end{lemma}
\begin{proof}
	By Lemma \ref{nonnestedRsimp}.
\end{proof}
\noindent
Another fact is that once $\rsimp{}$ has finished
processing an alternative regular expression, the result will not
contain any $\RZERO$s among its children. This is because the recursive 
calls to the simplification on the children regular expressions
make each child either good or $\RZERO$, $\rflts$ then removes
the $\RZERO$s while keeping the remaining elements good,
and $\textit{rdistinct}$ only removes duplicates from the result.
\begin{lemma}\label{flts3Obv}
	The following are true:
	\begin{itemize}
		\item
			If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
			then for all $r \in \rflts\; rs. \, \good \; r$.
		\item
			If $x \in \rdistinct{\rflts\; (\map \; \rsimp{}\; rs)}{\varnothing}$
			and for all $y$ such that $\llbracket y \rrbracket_r$ is less than
			$\llbracket rs \rrbracket_r + 1$, either
			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
			then $\good \; x$.
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part is by induction, where the inductive cases
	are the inductive cases of $\rflts$.
	The second part is a corollary of the first part.
\end{proof}

This leads to a good structural property of $\rsimp{}$,
namely that after simplification, a regular expression is
either good or $\RZERO$:
\begin{lemma}\label{good1}
	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
\end{lemma}
\begin{proof}
	By induction on $r$, using the size $\llbracket r \rrbracket_r$ as the inductive measure.
	Lemma \ref{rsimpMono} says that 
	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
	$\llbracket r \rrbracket_r$.
	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ cases,
	the inductive hypothesis applies to the children regular expressions
	$r_1$, $r_2$, etc. This also satisfies the precondition of
	Lemma \ref{flts3Obv}.
	Lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
	to ensure that goodness is preserved at the topmost level.
\end{proof}
We shall prove that any good regular expression is 
a fixed point of $\textit{rsimp}$.
First we prove an auxiliary lemma:
\begin{lemma}\label{goodaltsNonalt}
	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
\end{lemma}
\begin{proof}
	By induction on $\sum rs$. The inductive rules are the cases
	for $\good$.
\end{proof}
\noindent
Now we are ready to prove that good regular expressions are invariant
with respect to $\rsimp{}$:
\begin{lemma}\label{test}
	If $\good \;r$ then $\rsimp{r} = r$.
\end{lemma}
\begin{proof}
	By induction on the inductive cases of $\good$, using Lemmas
	\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
	Lemma \ref{goodaltsNonalt} is used in the alternative
	case where two or more elements are present in the list.
\end{proof}
\noindent
Below we show a property involving $\rflts$, $\textit{rdistinct}$, 
$\rsimp{}$ and $\rsimp_{ALTS}$,
which requires Lemma \ref{good1} to go through smoothly:
\begin{lemma}\label{flattenRsimpalts}
An application of $\rsimp_{ALTS}$ can be ``absorbed''
if its output is prepended to a list that is then passed to $\rflts$:
\begin{center}
	$\rflts \; ( (\rsimp_{ALTS} \; 
	(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) :: 
	\map \; \rsimp{} \; rs' ) = 
	\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
	\map \; \rsimp{} \; rs'))$
\end{center}
\end{lemma}
\begin{proof}
	By Lemma \ref{good1}.
\end{proof}

We are also ready to prove that $\textit{rsimp}$ is idempotent.
\subsubsection{$\rsimp$ is Idempotent}
The idempotency of $\rsimp$ is very useful for 
manipulating regular expression terms into the desired
forms, which enables the key rewriting steps towards the closed forms.
\begin{lemma}\label{rsimpIdem}
	$\rsimp{r} = \rsimp{(\rsimp{r})}$
\end{lemma}

\begin{proof}
	By Lemmas \ref{test} and \ref{good1}.
\end{proof}
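\noindent
As a small worked instance, take $r = \sum [\RZERO, a, a]$ for a character
regular expression $a$ (an illustrative choice). Assuming the usual clauses in
which $\rflts$ drops $\RZERO$, $\textit{rdistinct}$ removes the duplicate and
$\rsimpalts$ returns the sole element of a singleton list, we get
\begin{center}
	\begin{tabular}{lcl}
		$\rsimp{(\sum [\RZERO, a, a])}$ & $=$ & $\rsimpalts \; (\rdistinct{(\rflts \; [\RZERO, a, a])}{\varnothing})$\\
		& $=$ & $\rsimpalts \; (\rdistinct{[a, a]}{\varnothing})$\\
		& $=$ & $\rsimpalts \; [a]$\\
		& $=$ & $a$
	\end{tabular}
\end{center}
\noindent
Since $a$ is good, a second application leaves it unchanged, and therefore
$\rsimp{(\rsimp{r})} = \rsimp{a} = a = \rsimp{r}$.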
\noindent
This property means we do not have to repeatedly
apply simplification in each step, which justifies
our definition of $\blexersimp$.
This is in contrast to the work of Sulzmann and Lu where
the simplification is applied in a fixpoint manner.


On the other hand, once a regular expression has been simplified
at least once, we can apply $\rsimp{}$ to it as many additional times
as we want, wherever we need to:
\begin{corollary}\label{headOneMoreSimp}
	The following properties hold, directly from Lemma \ref{rsimpIdem}:

	\begin{itemize}
		\item
			$\map \; \rsimp{} \; (r :: rs) = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
		\item
			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
	\end{itemize}
\end{corollary}
\noindent
This will be useful in the rewriting steps of the closed-form proofs later on.
Similarly, we state the following useful facts:
\begin{lemma}
	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
	\begin{itemize}
		\item
			If $r  = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
		\item
			If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
		\item
			$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
	\end{itemize}
\end{lemma}
\begin{proof}
	By application of Lemmas \ref{rsimpIdem} and \ref{good1}.
\end{proof}

\noindent
With the idempotency of $\textit{rsimp}$ and its corollaries, 
we can start proving some key equalities leading to the 
closed forms.
Next we present a few terms that are equivalent under $\textit{rsimp}$.
To make the notation more concise,
we use $r_1 \sequal r_2 $ to denote that $\rsimp{r_1} = \rsimp{r_2}$.
%\begin{center}
%\begin{tabular}{lcl}
%	$a \sequal b$ & $ \dn$ & $ \textit{rsimp} \; a = \textit{rsimp} \; b$
%\end{tabular}
%\end{center}
%\noindent
%\vspace{0em}
\begin{lemma}
	The following equivalences hold:
	\begin{itemize}
	\item
		$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
	\item
		$\rsimpalts \; rs \sequal \rsimpalts \; (\map \; \rsimp{} \; rs)$
	\item
		$\RALTS{\RALTS{rs}} \sequal \RALTS{rs}$
	\item
		$\sum ((\sum rs_a) :: rs_b) \sequal \sum (rs_a @ rs_b)$
	\item
		$\RALTS{rs} \sequal \RALTS{\map \; \rsimp{} \; rs}$
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on the lists involved.
\end{proof}
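\noindent
To see that $\sequal$ is weaker than syntactic equality, consider the first
equivalence with $rs = [a]$ for a character regular expression $a$ (an
illustrative choice). Assuming the clauses of $\rsimpalts$ that return the sole
element of a singleton list and wrap longer lists in a $\sum$, the two sides are
the distinct terms $\sum [\RZERO, a]$ and $a$, yet both simplify to $a$:
\begin{center}
	\begin{tabular}{lcl}
		$\rsimp{(\sum [\RZERO, a])}$ & $=$ & $\rsimpalts \; (\rdistinct{(\rflts \; [\RZERO, a])}{\varnothing})$\\
		& $=$ & $\rsimpalts \; [a]$\\
		& $=$ & $a$\\
		$\rsimp{a}$ & $=$ & $a$
	\end{tabular}
\end{center}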
\noindent
The above allows us to prove
two similar equalities (which are a bit more involved).
They say that we can flatten the elements
before simplification and still get the same result.
\begin{lemma}\label{simpFlatten3}
	One can flatten the inner $\sum$ of a $\sum$ if it is being 
	simplified. Concretely,
	\begin{itemize}
		\item
			If for all $r \in rs, rs', rs''$, we have $\good \; r $
			or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal 
			\sum (rs' @ [\sum rs] @ rs'')$ holds.
		\item
			As a corollary,
			$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$.
	\end{itemize}
\end{lemma}
\begin{proof}
	By rewriting steps involving the use of Lemmas \ref{test} and \ref{rdistinctConcatGeneral}.
	The second sub-lemma is a corollary of the first.
\end{proof}
%Rewriting steps not put in--too long and complicated-------------------------------
\begin{comment}
	\begin{center}
		$\rsimp{\sum (rs' @ rs @ rs'')}  \stackrel{def of bsimp}{=}$  \\
		$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
		$\stackrel{by \ref{test}}{=} 
		\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
		\varnothing})$\\
		$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
		\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
		\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\

	\end{center}
\end{comment}
%Rewriting steps not put in--too long and complicated-------------------------------

We need more equalities like the above to establish the closed-form lemmas,
and to obtain them we introduce a few rewrite relations.
554
Chengsong
parents: 553
diff changeset
  1345
\subsection{The Rewrite Relations $\hrewrite$, $\scfrewrites$, $\frewrite$ and $\grewrite$}
Inspired by the success we had in the correctness proof 
in Chapter \ref{Bitcoded2},
we follow suit here, defining atomic simplification
steps as ``small-step'' rewriting steps. This allows us to capture 
similarities between terms that would otherwise be
hard to express.

We use $\hrewrite$ for a one-step atomic rewrite of 
a regular expression, 
$\frewrite$ for rewriting a list of regular expressions 
with the operations carried out in $\rflts$, and $\grewrite$ for
rewriting a list of regular expressions with the operations carried out in both $\rflts$ and $\textit{rdistinct}$.
Their reflexive transitive closures denote zero or more steps,
as was the case in the previous chapter.
As we have already
done something similar, the presentation of
these rewriting rules will be more concise than that in Chapter \ref{Bitcoded2}.
To differentiate between the rewriting steps for annotated regular expressions
and those for $\rrexp$s, we add the characters $h$ and $g$ below the squiggly arrow
to denote atomic simplification transitions 
of $\rrexp$s and $\rrexp$ lists, respectively.




\begin{figure}[H]
\begin{center}
	\begin{mathpar}
		\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}

		\inferrule[RSEQ0R]{}{r_1 \cdot \RZERO \hrewrite \RZERO\\}

		\inferrule[RSEQ1]{}{(\RONE \cdot r) \hrewrite  r\\}\\	

		\inferrule[RSEQL]{ r_1 \hrewrite r_2}{r_1 \cdot r_3 \hrewrite r_2 \cdot r_3\\}

		\inferrule[RSEQR]{ r_3 \hrewrite r_4}{r_1 \cdot r_3 \hrewrite r_1 \cdot r_4\\}\\

		\inferrule[RALTSChild]{r \hrewrite r'}{\sum (rs_1 @ [r] @ rs_2) \hrewrite \sum (rs_1 @ [r'] @ rs_2)\\}

		\inferrule[RALTS0]{}{\sum (rs_a @ [\RZERO] @ rs_b) \hrewrite \sum (rs_a @ rs_b)}

		\inferrule[RALTSNested]{}{\sum (rs_a @ [\sum rs_1] @ rs_b) \hrewrite \sum (rs_a @ rs_1 @ rs_b)}

		\inferrule[RALTSNil]{}{ \sum [] \hrewrite \RZERO\\}

		\inferrule[RALTSSingle]{}{ \sum [r] \hrewrite  r\\}	

		\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum (rs_a @ [r_1] @ rs_b @ [r_2] @ rs_c) \hrewrite \sum (rs_a @ [r_1] @ rs_b @ rs_c)}

	\end{mathpar}
\end{center}
\caption{List of one-step rewrite rules for r-regular expressions ($\hrewrite$)}\label{hRewrite}
\end{figure}
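
\noindent
As a small worked illustration of the rules in Figure \ref{hRewrite} (the characters $a$, $b$ and $c$ are chosen here only for the example), the alternative $\sum [\RZERO \cdot a, \; \sum [b, c], \; b]$ can be rewritten step by step as follows:
\begin{center}
	\begin{tabular}{lll}
		$\sum [\RZERO \cdot a, \; \sum [b, c], \; b]$ & $\hrewrite \; \sum [\RZERO, \; \sum [b, c], \; b]$ & (RALTSChild with RSEQ0L)\\
		& $\hrewrite \; \sum [\sum [b, c], \; b]$ & (RALTS0)\\
		& $\hrewrite \; \sum [b, c, b]$ & (RALTSNested)\\
		& $\hrewrite \; \sum [b, c]$ & (RALTSDelete)
	\end{tabular}
\end{center}
\noindent
Each line applies exactly one rule, so the whole chain is an instance of $\hrewrites$.
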


As with $\rightsquigarrow_s$, it is
convenient to define a rewrite relation $\scfrewrites$ on lists of regular expressions,
where each element of the first list can be rewritten in zero or more $\hrewrite$-steps
to the element at the same position in the second list (scf stands for
li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to  
$\stackrel{s*}{\rightsquigarrow}$ for annotated regular expressions.

\begin{figure}[H]
\begin{center}
	\begin{mathpar}
		\inferrule{}{[] \scfrewrites [] }

		\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
	\end{mathpar}
\end{center}
\caption{Rewrite rules for lists of r-regular expressions ($\scfrewrites$)}\label{scfRewrite}
\end{figure}
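
\noindent
For instance (a small example with characters $a$ and $b$ chosen only for illustration), we have $[\RONE \cdot a, \; \sum [b]] \scfrewrites [a, \; b]$, which can be derived element by element:
\begin{center}
	\begin{tabular}{ll}
		$\RONE \cdot a \hrewrites a$ & (one step of RSEQ1)\\
		$\sum [b] \hrewrites b$ & (one step of RALTSSingle)\\
		$[] \scfrewrites []$ & (first rule of Figure \ref{scfRewrite})
	\end{tabular}
\end{center}
\noindent
Applying the second rule of Figure \ref{scfRewrite} twice then relates the two lists. Note that the rules only ever relate lists of the same length.
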
%frewrite
The one-step rewrite rules for flattening 
a list of regular expressions ($\frewrite$) are as follows:
\begin{figure}[H]
\begin{center}
	\begin{mathpar}
		\inferrule{}{\RZERO :: rs \frewrite rs \\}

		\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}

		\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
	\end{mathpar}
\end{center}
\caption{List of one-step rewrite rules characterising the $\rflts$ operation on a list}\label{fRewrites}
\end{figure}
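
\noindent
As an illustration (with characters $a$, $b$, $c$ and $d$ chosen only for this example), the list $[a, \; \RZERO, \; \sum [b, c], \; d]$ is flattened in two steps. Since the first two rules of Figure \ref{fRewrites} only apply at the head of a list, the third rule is used to skip over the leading $a$:
\begin{center}
	\begin{tabular}{lll}
		$[a, \; \RZERO, \; \sum [b, c], \; d]$ & $\frewrite \; [a, \; \sum [b, c], \; d]$ & (third rule, then first rule)\\
		& $\frewrite \; [a, \; b, \; c, \; d]$ & (third rule, then second rule)
	\end{tabular}
\end{center}
\noindent
Assuming the usual definition of $\rflts$, which drops every $\RZERO$ and splices in the children of nested alternatives, the final list is exactly $\rflts \; [a, \; \RZERO, \; \sum [b, c], \; d]$.
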

The one-step rewrite rules for flattening and de-duplicating
a list of regular expressions ($\grewrite$) are as follows:
\begin{figure}[H]
\begin{center}
	\begin{mathpar}
		\inferrule{}{\RZERO :: rs \grewrite rs \\}

		\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}

		\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}

		\inferrule[dB]{}{rs_a @ [a] @ rs_b @[a] @ rs_c \grewrite rs_a @ [a] @ rs_b @ rs_c}
	\end{mathpar}
\end{center}
\caption{List of one-step rewrite rules characterising the $\rflts$ and $\textit{rdistinct}$
operations}\label{gRewrite}
\end{figure}
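
\noindent
For example (again with illustrative characters $a$ and $b$), $\grewrite$ can both flatten and de-duplicate the list $[a, \; \RZERO, \; \sum [b, a], \; b]$:
\begin{center}
	\begin{tabular}{lll}
		$[a, \; \RZERO, \; \sum [b, a], \; b]$ & $\grewrite \; [a, \; \sum [b, a], \; b]$ & (remove $\RZERO$)\\
		& $\grewrite \; [a, \; b, \; a, \; b]$ & (flatten the nested alternative)\\
		& $\grewrite \; [a, \; b, \; b]$ & (dB, removing the second $a$)\\
		& $\grewrite \; [a, \; b]$ & (dB, removing the second $b$)
	\end{tabular}
\end{center}
\noindent
The first two steps are also $\frewrite$ steps, whereas the last two are only available in $\grewrite$.
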
\noindent
We define
two separate list rewriting relations, $\frewrite$ and $\grewrite$.
The rewriting steps that take place during
flattening are characterised by $\frewrite$.
The rewrite relation $\grewrite$ characterises both flattening and de-duplicating.
Sometimes $\grewrites$ relates more lists than we need,
so we prefer the more restrictive $\frewrites$ when proving
%because we only
equalities that only involve $\rflts$.
%certain equivalence under the rewriting steps of $\frewrites$.
For example, when proving the closed form for the alternative regular expression,
one of the equalities needed is:
\begin{center}
	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
	$
\end{center}
\noindent
We prove this by first showing 
\begin{lemma}\label{earlyLaterDerFrewrites}
	$\map \; (\_ \backslash x) \;  (\rflts \; rs) \frewrites
	\rflts \; (\map \; (\_ \backslash x) \; rs)$
\end{lemma}
\noindent
and then that two lists, one of which can be rewritten
in many steps to the other, give rise to equivalent alternatives:
\begin{lemma}\label{frewritesSimpeq}
	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal 
	\sum (\rDistinct \;  rs_2 \;  \varnothing)$.
\end{lemma}
\noindent
Both lemmas can be proven by a straightforward induction
(their proofs are therefore omitted).

The equality above can now be derived with ease: 
\begin{corollary}
	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
	\sum (\rDistinct \;\;  (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
	$
\end{corollary}
\begin{proof}
	By Lemmas \ref{earlyLaterDerFrewrites} and \ref{frewritesSimpeq}.
\end{proof}
However, this trick does not work for $\grewrites$.
For example, one step in proving the
closed forms is the equality:
\begin{center}
	$\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})) \; rs))}{\varnothing})))}$\\
	$=$ \\
	$\rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})) \; rs)) ) }{\varnothing}))} $
\end{center}
For this, one would hope to have a rewriting relation between the two lists involved,
similar to Lemma \ref{earlyLaterDerFrewrites}. However, it turns out that 
\begin{center}
	$\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \;
	(\_ \backslash x) \; rs) \; ( rset \backslash x)$
\end{center}
\noindent
does \textbf{not} hold in general.
For this rewriting step we will introduce a slightly more cumbersome
proof technique later.
The point is that $\frewrite$
allows us to prove equivalences in a straightforward way that is 
not possible for $\grewrite$. 
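
\noindent
One way to see why the rewriting fails is the following small instance. It is only a sketch: it assumes the usual derivative clauses $a \backslash a = \RONE$, $(r^*) \backslash c = (r \backslash c) \cdot r^*$ and, for non-nullable $r_1$, $(r_1 \cdot r_2) \backslash c = (r_1 \backslash c) \cdot r_2$, and it reads $rset \backslash x$ as taking the derivative of every element of $rset$. Take $rs = [a^*]$, $rset = \{a \cdot a^*\}$ and $x = a$. Both $a^*$ and $a \cdot a^*$ then have the derivative $\RONE \cdot a^*$, so
\begin{center}
	\begin{tabular}{lcl}
		$\map \; (\_ \backslash a) \; (\rDistinct \; [a^*] \; \{a \cdot a^*\})$ & $=$ & $[\RONE \cdot a^*]$\\
		$\rDistinct \; (\map \; (\_ \backslash a) \; [a^*]) \; \{\RONE \cdot a^*\}$ & $=$ & $[]$
	\end{tabular}
\end{center}
\noindent
None of the rules in Figure \ref{gRewrite} applies to the singleton list $[\RONE \cdot a^*]$, since its only element is neither $\RZERO$, nor an alternative, nor a duplicate of an earlier element. The first list therefore cannot be rewritten to the second.
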


\subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
In this part, we present lemmas about
pairs of r-regular expressions, and pairs of lists of r-regular expressions,
where the first can be rewritten in zero or more steps to the second.
Most of the proofs of these lemmas are straightforward 
inductions on the corresponding rewriting relation;
we omit the proofs in these cases.
The following lemma lists a few pairs of lists that are related by 
$\grewrites$:
\begin{lemma}\label{gstarRdistinctGeneral}
	\mbox{}
	\begin{itemize}
		\item
			$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
		\item
			$rs \grewrites \rDistinct \; rs \; \varnothing$
		\item
			$rs_a @ (\rDistinct \; rs \; rs_a) \grewrites rs_a @ (\rDistinct \; 
			rs \; (\{\RZERO\} \cup rs_a))$
		\item
			$rs \;\; @ \;\; \rDistinct \; rs_a \; rset \grewrites rs @  \rDistinct \; rs_a \;
			(rset \cup rs)$

	\end{itemize}
\end{lemma}
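
\noindent
As a concrete instance of the first item (with illustrative characters $a$ and $b$, and reading the list $rs_1$ as the set of its elements), take $rs_1 = [a]$ and $rs = [a, b]$. Then $\rDistinct \; [a, b] \; \{a\} = [b]$, and indeed a single dB step gives
\begin{center}
	$[a] @ [a, b] = [a, a, b] \grewrite [a, b] = [a] @ (\rDistinct \; [a, b] \; \{a\})$.
\end{center}
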
\noindent
If a list $rs_1$ can be rewritten to a list $rs_2$ via $\grewrites$,
then the alternatives built from them are equivalent under $\rsimp{}$:
\begin{lemma}\label{grewritesSimpalts}
	\mbox{}
	If $rs_1 \grewrites rs_2$, then
	we have the following equivalences:
	\begin{itemize}
		\item
			$\sum rs_1 \sequal \sum rs_2$
		\item
			$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
	\end{itemize}
\end{lemma}
\noindent
Here are a few connecting lemmas showing that
if a list of regular expressions can be rewritten using $\grewrites$, $\frewrites$ or
$\scfrewrites$,
then the alternative built from that list can also be rewritten using $\hrewrites$:
\begin{lemma}
	\mbox{}
	\begin{itemize}
		\item
			If $rs \grewrites rs'$ then $\sum rs \hrewrites \sum rs'$.
		\item
			If $rs \grewrites rs'$ then $\sum rs \hrewrites \rsimpalts \; rs'$.
		\item
			If $rs_1 \scfrewrites rs_2$ then $\sum (rs @ rs_1) \hrewrites \sum (rs @ rs_2)$.
		\item
			If $rs_1 \scfrewrites rs_2$ then $\sum rs_1 \hrewrites \sum rs_2$.

	\end{itemize}
\end{lemma}
\noindent
Now comes the core of the proof, 
which says that once one regular expression can be rewritten to another,
the two are equivalent under $\textit{rsimp}$:
\begin{lemma}
	If $r_1 \hrewrites r_2$ then $r_1 \sequal r_2$.
\end{lemma}

\noindent
As in Chapter \ref{Bitcoded2}, 
we prove that rewriting is preserved by derivatives:
if an r-regular expression $r$ can be rewritten to $r'$,
then the derivative $r\backslash c$ can be rewritten to $r'\backslash c$.
\begin{lemma}\label{interleave}
	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$.
\end{lemma}
\noindent
This allows us to prove more $\textit{rsimp}$-equivalences between terms involving $\backslash_r$.
\begin{lemma}\label{insideSimpRemoval}
	$\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})}  $
\end{lemma}
\noindent
\begin{proof}
	By \ref{interleave} and \ref{rsimpIdem}.
\end{proof}
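
\noindent
One possible way to spell out such an argument is the following sketch. It assumes, in addition to the lemmas above, that simplification can itself be expressed by rewriting, that is $r \hrewrites \rsimp{r}$ (a fact that is not stated in this section), and that $r \sequal r'$ abbreviates equality after applying $\textit{rsimp}$ to both sides:
\begin{center}
	\begin{tabular}{ll}
		$r \hrewrites \rsimp{r}$ & (assumption)\\
		$\rder{c}{r} \hrewrites \rder{c}{(\rsimp{r})}$ & (by Lemma \ref{interleave})\\
		$\rder{c}{r} \sequal \rder{c}{(\rsimp{r})}$ & (rewritable terms are $\textit{rsimp}$-equivalent)
	\end{tabular}
\end{center}
\noindent
Unfolding $\sequal$ in the last line gives $\rsimp{(\rder{c}{r})} = \rsimp{(\rder{c}{(\rsimp{r})})}$, which is the statement of Lemma \ref{insideSimpRemoval}.
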
\noindent
This in turn gives us more equivalent terms:
\begin{lemma}\label{Simpders}
	As corollaries of Lemma \ref{insideSimpRemoval}, we have
	\begin{itemize}
		\item
			If $s \neq []$ then $\rderssimp{r}{s} = \rsimp{( r \backslash_{rs} s)}$.
		\item
			$\rsimpalts \; (\map \; (\_ \backslash_r x) \;
			(\rdistinct{rs}{\varnothing})) \sequal
			\rsimpalts \; (\rDistinct \; 
			(\map \; (\_ \backslash_r x) \; rs) \;\varnothing  )$
	\end{itemize}
\end{lemma}
\begin{proof}
	Both parts follow from Lemma \ref{insideSimpRemoval}.%and \ref{distinctDer}.
\end{proof}
\noindent

\subsection{Closed Forms for $\sum rs$, $r_1\cdot r_2$ and $r^*$}
Lemma \ref{Simpders} leads to our first closed form,
which is for the alternative regular expression:
\begin{theorem}\label{altsClosedForm}
	\mbox{}
	\begin{center}
		$\rderssimp{(\sum rs)}{s} \sequal
		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
	\end{center}
\end{theorem}
\noindent
\begin{proof}
	By a reverse induction on the string $s$.
	One step of the proof, as mentioned earlier,
	is the equivalence
	\begin{center}
		$\rsimpalts \; (\map \; (\_ \backslash x) \; 
		(\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; 
		(\lambda r. \rderssimp{r}{xs})) \; rs))}{\varnothing}))
		\sequal
		\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; 
			(\rflts \; (\map \; (\rsimp{} \; \circ \; 
		(\lambda r. \rderssimp{r}{xs})) \; rs)) ) }{\varnothing}) $.
	\end{center}
	This can be proven by a combination of 
	\ref{grewritesSimpalts}, \ref{gstarRdistinctGeneral}, \ref{rderRsimpAltsCommute}, and
	\ref{insideSimpRemoval}.
\end{proof}
\noindent
This closed form has a variant which can be more convenient in later proofs:
\begin{corollary}\label{altsClosedForm1}
	If $s \neq []$ then 
	$\rderssimp{(\sum \; rs)}{s} = 
	\rsimp{(\sum \; (\map \; (\rderssimp{\_}{s}) \; rs))}$.
\end{corollary}
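
\noindent
A sketch of how the corollary can be obtained from Theorem \ref{altsClosedForm}, assuming that $r \sequal r'$ abbreviates $\rsimp{r} = \rsimp{r'}$, that $\textit{rsimp}$ is idempotent, and that $\rderssimp{r}{s}$ simplifies after every derivative step: for $s \neq []$ the last operation performed by $\rderssimp{(\sum rs)}{s}$ is a simplification, so applying $\textit{rsimp}$ once more does not change it, and therefore
\begin{center}
	\begin{tabular}{lcl}
		$\rderssimp{(\sum rs)}{s}$ & $=$ & $\rsimp{(\rderssimp{(\sum rs)}{s})}$\\
		& $=$ & $\rsimp{(\sum \; (\map \; (\rderssimp{\_}{s}) \; rs))}$
	\end{tabular}
\end{center}
\noindent
where the second step is Theorem \ref{altsClosedForm} with $\sequal$ unfolded.
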
\noindent
The closed forms for sequences and stars are harder to obtain.
Before we derive them, we need some preliminary definitions
to keep the proof statements concise.


\subsubsection{Closed Form for Sequence Regular Expressions}
For the sequence regular expression,
let us first look at a series of derivative steps on it 
(assuming that each time a derivative is taken,
the head of the sequence is nullable):
\begin{center}
	\begin{tabular}{llll}
		$r_1 \cdot r_2$ &  
		$\longrightarrow_{\backslash c}$ &  
		$r_1\backslash c \cdot r_2 + r_2 \backslash c$ &
		$ \longrightarrow_{\backslash c'} $ \\ 
		\\
		$(r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc'$ & 
		$\longrightarrow_{\backslash c''} $ &
		$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') 
		+ r_2 \backslash cc'c''$ & 
		$   \longrightarrow_{\backslash c'''} \quad \ldots$\\
diff changeset
  1680
	\end{tabular}
557
812e5d112f49 more changes
Chengsong
parents: 556
diff changeset
  1681
\end{center}
Roughly speaking, $(r_1 \cdot r_2) \backslash s$ can be expressed as 
a giant alternative taking a list of terms 
$[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
where the head of the list is always the term
representing a match involving only $r_1$, and the tail of the list consists of
terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
This intuition is also echoed by Murugesan and Sundaram \cite{Murugesan2014}, 
who gave
a pencil-and-paper derivation of $(r_1 \cdot r_2)\backslash s$:
\begin{center}
	\begin{tabular}{lc}
		$L \; [ (r_1 \cdot r_2) \backslash_r (c_1 :: c_2 :: \ldots c_n) ]$ & $ =$\\
		\\
		\rule{0pt}{3ex} $L \; [ ((r_1 \backslash_r c_1) \cdot r_2 + 
		(\delta\; (\nullable \; r_1) \; (r_2 \backslash_r c_1) )) \backslash_r 
		(c_2 :: \ldots c_n) ]$ &
		$=$\\
		\\
		\rule{0pt}{3ex} $L \; [ ((r_1 \backslash_r c_1c_2 \cdot r_2 + 
		(\delta \; (\nullable \; r_1) \; (r_2 \backslash_r c_1c_2)))
		$ & \\
		\\
		$\quad + (\delta \; (\nullable \; (r_1 \backslash_r c_1))\; (r_2 \backslash_r c_2) )) 
		\backslash_r (c_3 \ldots c_n) ]$ & $\ldots$ \\
	\end{tabular}
\end{center}
\noindent
The $\delta$ function 
returns $r$ when the boolean condition
$b$ evaluates to true and
$\ZERO_r$ otherwise:
\begin{center}
	\begin{tabular}{lcl}
		$\delta \; b\; r$ & $\dn$ & $r \quad \textit{if} \; b \; \textit{is} \;\textit{true}$\\
				  & $\dn$ & $\ZERO_r \quad \textit{otherwise}$
	\end{tabular}
\end{center}
\noindent
Note that the term
\begin{center}
	\begin{tabular}{lc}
		\rule{0pt}{3ex} $((r_1 \backslash_r c_1c_2 \cdot r_2 + 
		(\delta \; (\nullable \; r_1) \; (r_2 \backslash_r c_1c_2)))
		$ & \\
		\\
		$\quad + (\delta \; (\nullable \; (r_1 \backslash_r c_1))\; (r_2 \backslash_r c_2) )) 
		\backslash_r (c_3 \ldots c_n)$ &\\
	\end{tabular}
\end{center}
\noindent
does not faithfully
represent what the intermediate derivatives actually look like
when the head $r_1 \backslash s'$ of one or more intermediate results
$(r_1 \backslash s') \cdot r_2$ is not nullable.
For example, when $r_1$ and $r_1 \backslash_r c_1$ are not nullable,
the regular expression is simply
\[
	(r_1 \backslash_r c_1c_2) \cdot r_2
\]
instead of
\[
	((r_1 \backslash_r c_1c_2) \cdot r_2 + \ZERO_r ) + \ZERO_r.
\]
The redundant $\ZERO_r$s will not be created in the
first place.
In a closed form one needs to take this into account (because
closed forms require exact equality rather than language equivalence)
and only generate the 
$r_2 \backslash_r s''$ terms satisfying the property
\begin{center}
	$\exists s'. \;\; s'@s'' = s \;\; \land \;\;
	r_1 \backslash s' \; \textit{is nullable}$.
\end{center}
Given the arguments $s$ and $r_1$, we denote the list of strings
$s''$ satisfying the above property as $\vsuf{s}{r_1}$.
The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string:\footnote{
	Perhaps a better name for it would be ``NullablePrefixSuffix''
	to distinguish it from the list of \emph{all} suffixes of $s$, but
	that is a bit too long for a function name and we are yet to find
a more concise and easy-to-understand name.}
\begin{center}
	\begin{tabular}{lcl}
		$\vsuf{[]}{\_} $ & $=$ &  $[]$\\
		$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} \; (\rnullable{r_1}) \; \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
				    && $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) })  $
	\end{tabular}
\end{center}
\noindent
The list starts with shorter suffixes
and ends with longer ones (in other words, the string elements $s''$
in the list $\vsuf{s}{r_1}$ are sorted
in the same order as that of the terms $r_2\backslash s''$ 
appearing in $(r_1\cdot r_2)\backslash s$).
In essence, $\vsuf{\_}{\_}$ is doing a 
``virtual derivative'' of $r_1 \cdot r_2$, but instead of producing 
the entire result $(r_1 \cdot r_2) \backslash s$, 
it only stores strings,
with each string $s''$ standing for a term $r_2 \backslash s''$
that occurs in $(r_1\cdot r_2)\backslash s$.
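As a small illustration (the regular expressions $a^*$ and $a$ and the string $[a, b]$ are chosen here only for the sake of the example), unfolding the definition of $\vsuf{\_}{\_}$ step by step gives
\begin{center}
	\begin{tabular}{lcl}
		$\vsuf{[a, b]}{a^*}$ & $=$ & $(\vsuf{[b]}{(a^* \backslash_r a)}) \; @ \; [[a, b]]$\\
		& $=$ & $((\vsuf{[]}{((a^* \backslash_r a) \backslash_r b)}) \; @ \; [[b]]) \; @ \; [[a, b]]$\\
		& $=$ & $[[b],\, [a, b]]$
	\end{tabular}
\end{center}
because both $a^*$ and $a^* \backslash_r a$ are nullable.
In contrast, $a$ is not nullable while $a \backslash_r a$ is, so the
$\textit{else}$ branch is taken first and only one suffix survives:
\begin{center}
	\begin{tabular}{lcl}
		$\vsuf{[a, b]}{a}$ & $=$ & $\vsuf{[b]}{(a \backslash_r a)}$\\
		& $=$ & $(\vsuf{[]}{((a \backslash_r a) \backslash_r b)}) \; @ \; [[b]]$\\
		& $=$ & $[[b]]$
	\end{tabular}
\end{center}
In both cases the shorter suffix comes first, and the empty suffix is never listed:
the base case returns $[]$ and the recursive clause only ever adds the non-empty string $c :: cs$.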

With $\textit{Suffix}$ in place, we can almost express the
closed form of sequence regular expressions;
a few more definitions are needed first.
The first is the flattening function $\sflat{\_}$,
which takes an alternative regular expression and produces a flattened version
of it.
It is needed to convert
a left-associative nested sequence of alternatives into 
a flattened list:
\[
	\sum(\ldots ((r_1 + r_2) + r_3) + \ldots)
	\stackrel{\sflat{\_}}{\rightarrow} 
	\sum[r_1, r_2, r_3, \ldots]
\]
\noindent
The definitions of $\sflat{\_}$ and helper functions
$\sflataux{\_}$ and $\llparenthesis \_ \rrparenthesis''$ are given below.
\begin{center}  
	\begin{tabular}{lcl}
		$\sflataux{\sum r :: rs}$ & $\dn$ & $\sflataux{r} @ rs$\\
		$\sflataux{\sum []}$ & $ \dn $ & $ []$\\
		$\sflataux r$ & $\dn$ & $ [r]$
	\end{tabular}
\end{center}

\begin{center} 
	\begin{tabular}{lcl}
		$\sflat{(\sum r :: rs)}$ & $\dn$ & $\sum (\sflataux{r} @ rs)$\\
		$\sflat{\sum []}$ & $ \dn $ & $ \sum []$\\
		$\sflat r$ & $\dn$ & $ r$
	\end{tabular}
\end{center}

\begin{center}  
	\begin{tabular}{lcl}
		$\sflataux{[]}'$ & $ \dn $ & $ []$\\
		$\sflataux{ (r_1 + r_2) :: rs }'$ & $\dn$ & $r_1 :: r_2 :: rs$\\
		$\sflataux{r :: rs}'$ & $\dn$ & $ r::rs$
	\end{tabular}
\end{center}
\noindent
$\sflataux{\_}$ breaks up nested alternative regular expressions 
of the left-associated shape $(\ldots((r_1 + r_2) + r_3) + \ldots )$
into a flat list $[r_1,\, r_2 ,\, r_3, \ldots]$.
It will return the singleton list $[r]$ otherwise.
$\sflat{\_}$ works the same as $\sflataux{\_}$, except that it keeps
the output type a regular expression, not a list.
$\sflataux{\_}$ and $\sflat{\_}$ are only recursive on the  
first element of the list.
$\sflataux{\_}'$ takes a list of regular expressions as input and outputs
a list of regular expressions: if the head of the input is a binary alternative
$r_1 + r_2$, it is split into the two elements $r_1$ and $r_2$;
otherwise the list is returned unchanged.
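For instance, assuming that $r_1$ is not itself an alternative and reading
$r + r'$ as $\sum [r, r']$ (an assumption about the notation made only for this example),
the first clause of $\sflataux{\_}$ unfolds a left-nested alternative as follows:
\begin{center}
	\begin{tabular}{lcl}
		$\sflataux{(r_1 + r_2) + r_3}$ & $=$ & $\sflataux{r_1 + r_2} \; @ \; [r_3]$\\
		& $=$ & $(\sflataux{r_1} \; @ \; [r_2]) \; @ \; [r_3]$\\
		& $=$ & $[r_1,\, r_2,\, r_3]$
	\end{tabular}
\end{center}
A right-nested alternative, in contrast, is only opened at the top level,
because the recursion never descends into the tail of the list:
\begin{center}
	$\sflataux{r_1 + (r_2 + r_3)} \; = \; \sflataux{r_1} \; @ \; [r_2 + r_3] \; = \; [r_1,\, r_2 + r_3]$
\end{center}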
The use of $\sflataux{\_}$ and $\sflataux{\_}'$ becomes clear once we have
$\textit{createdBySequence}$ defined:
\begin{center}
	\begin{mathpar}
		\inferrule{\mbox{}}{\textit{createdBySequence}\; (r_1 \cdot r_2)}

		\inferrule{\textit{createdBySequence} \; r_1}{\textit{createdBySequence} \;
		(r_1 + r_2)}
	\end{mathpar}
\end{center}
\noindent
The predicate $\textit{createdBySequence}$ is used to describe the shape of
the derivative regular expressions $(r_1\cdot r_2) \backslash s$:
\begin{lemma}\label{recursivelyDerseq}
	It is always the case that
	\begin{center}
		$\textit{createdBySequence} \; ( (r_1\cdot r_2) \backslash_r s) $
	\end{center}
	holds.
\end{lemma}
\begin{proof}
	By a reverse induction on the string $s$, where the inductive cases are $[]$
	and $xs  @ [x]$.
\end{proof}
\noindent
If we have a regular expression $r$ whose shape 
is one of those described by $\textit{createdBySequence}$,
then we can relate the flattening of
$r \backslash_r c$ to the element-wise derivatives of $\sflataux{r}$ with
$\sflataux{\_}'$:
\begin{lemma}\label{sfauIdemDer}
	If $\textit{createdBySequence} \; r$, then 
	\begin{center}
		$\sflataux{ r \backslash_r c} = 
		\llparenthesis (\map \; (\_ \backslash_r c) \; (\sflataux{r}) ) \rrparenthesis''$
	\end{center}
	holds.
\end{lemma}
\begin{proof}
	By a simple induction on the inductive cases of $\textit{createdBySequence}$.
\end{proof}
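\noindent
As a quick sanity check of this lemma (an illustration only, again reading
$r + r'$ as a two-element alternative), take the base case $r = r_1 \cdot r_2$
with $r_1$ nullable, so that $\sflataux{r} = [r_1 \cdot r_2]$ and
$r \backslash_r c = (r_1 \backslash_r c) \cdot r_2 + r_2 \backslash_r c$.
The two sides of the equation then evaluate to the same list:
\begin{center}
	\begin{tabular}{lcl}
		$\sflataux{r \backslash_r c}$ & $=$ & 
		$\sflataux{(r_1 \backslash_r c) \cdot r_2} \; @ \; [r_2 \backslash_r c]$\\
		& $=$ & $[(r_1 \backslash_r c) \cdot r_2,\; r_2 \backslash_r c]$\\
		\\
		$\llparenthesis \map \; (\_ \backslash_r c) \; [r_1 \cdot r_2] \rrparenthesis''$ & $=$ &
		$\llparenthesis [(r_1 \backslash_r c) \cdot r_2 + r_2 \backslash_r c] \rrparenthesis''$\\
		& $=$ & $[(r_1 \backslash_r c) \cdot r_2,\; r_2 \backslash_r c]$
	\end{tabular}
\end{center}
When $r_1$ is not nullable, the derivative is just $(r_1 \backslash_r c) \cdot r_2$,
and both sides are the singleton list $[(r_1 \backslash_r c) \cdot r_2]$.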

Now we are ready to express
the shape of $(r_1 \cdot r_2) \backslash_r s$:
\begin{lemma}\label{seqSfau0}
	$\sflataux{(r_1 \cdot r_2) \backslash_r s} = (r_1 \backslash_r s) \cdot r_2 
	:: (\map \; (r_2 \backslash_r \_) \; (\textit{Suffix} \; s \; r_1))$ 
\end{lemma}
\begin{proof}
	By a reverse induction on the string $s$, where the inductive cases 
	are $[]$ and $xs @ [x]$.
	For the inductive case, we know that $\textit{createdBySequence} \; ((r_1 \cdot r_2)
	\backslash_r xs)$ holds from lemma \ref{recursivelyDerseq},
	which can be used to prove
	\[
		\begin{array}{l}
			\map \; (r_2 \backslash_r \_) \; (\vsuf{[x]}{(r_1 \backslash_r xs)}) \;\; @ \;\;
			\map \; (\_ \backslash_r x) \; (\map \; (r_2 \backslash_r \_) \; (\vsuf{xs}{r_1}))\\
			= \;\; \map \; (r_2 \backslash_r \_) \; (\vsuf{xs @ [x]}{r_1})
		\end{array}
	\]
	using lemma \ref{sfauIdemDer}.
	This equality enables the inductive case to go through.
\end{proof}
\noindent 
This lemma says that $(r_1\cdot r_2)\backslash s$
can be flattened into a list whose head and tail meet the description
we gave earlier.
%Note that this lemma does $\mathbf{not}$ depend on any
%specific definitions we used,
%allowing people investigating derivatives to get an alternative
%view of what $r_1 \cdot r_2$ is.
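To illustrate with an instance chosen only for this example, take $r_1 = a^*$ and
$s = [a, b]$. Both $a^*$ and $a^* \backslash_r a$ are nullable, so each derivative step
takes the nullable branch, and computing the derivative directly gives
\begin{center}
	\begin{tabular}{lcl}
		$(a^* \cdot r_2) \backslash_r a$ & $=$ & 
		$(a^* \backslash_r a) \cdot r_2 + r_2 \backslash_r a$\\
		$(a^* \cdot r_2) \backslash_r [a, b]$ & $=$ & 
		$((a^* \backslash_r [a, b]) \cdot r_2 + r_2 \backslash_r [b]) + r_2 \backslash_r [a, b]$
	\end{tabular}
\end{center}
Flattening the last term with $\sflataux{\_}$ yields
$[(a^* \backslash_r [a, b]) \cdot r_2,\; r_2 \backslash_r [b],\; r_2 \backslash_r [a, b]]$,
which is exactly the right-hand side of the lemma, since unfolding the definition of
$\textit{Suffix}$ gives $\vsuf{[a, b]}{a^*} = [[b],\, [a, b]]$.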

We now use $\textit{createdBySequence}$ and
$\sflataux{\_}$ to describe an intuition
behind the sequence closed form.
If two regular expressions only differ in the way their
alternatives are nested, then we should be able to get the same result
once we apply simplification to both of them:
\begin{lemma}\label{sflatRsimpeq}
	If $r$ is created from a sequence through
	a series of derivatives 
	(i.e. if $\textit{createdBySequence} \; r$ holds), 
	and $\sflataux{r} = rs$,
	then
	\begin{center}
		$\textit{rsimp} \; r = \textit{rsimp} \; (\sum \; rs)$
	\end{center}
	holds.
\end{lemma}
\begin{proof}
	By an induction on the inductive cases of $\textit{createdBySequence}$. 
\end{proof}

Now we are ready for the closed form 
for the sequence regular expressions (without the inner applications
of simplifications):
\begin{lemma}\label{seqClosedFormGeneral}
	$\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }
	=\rsimp{(\sum (  (r_1 \backslash s) \cdot r_2 :: 
	\map\; (r_2 \backslash \_) \; (\vsuf{s}{r_1})))}$
\end{lemma}
\begin{proof}
	We know that 
	$\sflataux{(r_1 \cdot r_2) \backslash_r s} = (r_1 \backslash_r s) \cdot r_2 
	:: (\map \; (r_2 \backslash_r \_) \; (\textit{Suffix} \; s \; r_1))$
	holds
	by lemma \ref{seqSfau0}.
	Since $\textit{createdBySequence} \; ((r_1 \cdot r_2) \backslash_r s)$ holds by
	lemma \ref{recursivelyDerseq}, the claim follows from lemma \ref{sflatRsimpeq}.
\end{proof}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  1947
Together with the idempotency property of $\rsimp{}$ (lemma \ref{rsimpIdem}),
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  1948
it is possible to convert the above lemma to obtain the
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  1949
proper closed form for $\backslash_{rsimps}$ rather than $\backslash_r$:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  1950
for  derivatives nested with simplification:
\begin{theorem}\label{seqClosedForm}
	$\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{(\sum ((r_1 \backslash s) \cdot r_2 ) 
	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))}$
\end{theorem}
\begin{proof}
	By a case analysis of the string $s$.
	When $s$ is the empty string, the rewrite is straightforward.
	When $s$ is non-empty,
	lemmas \ref{seqClosedFormGeneral} and \ref{Simpders} apply,
	making the proof go through.
\end{proof}
\subsubsection{Closed Forms for Star Regular Expressions}
The closed form for the star regular expression involves tricks similar
to those for the sequence regular expression.
The $\textit{Suffix}$ function is now replaced by something
slightly more complex, because the growth pattern of star
regular expressions' derivatives is a bit different:
\begin{center}
	\begin{tabular}{lclc}
		$r^* $ & $\longrightarrow_{\backslash c}$ & 
		$(r\backslash c)  \cdot  r^*$ & $\longrightarrow_{\backslash c'}$\\
		\\
		$r \backslash cc'  \cdot r^* + r \backslash c' \cdot r^*$ &
		$\longrightarrow_{\backslash c''}$ & 
		$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) + 
		(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)$ & 
		$\longrightarrow_{\backslash c'''}$ \\
		\\
		$\ldots$\\
	\end{tabular}
\end{center}
When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$, 
$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly
as in the sequence case.
The good news is that the function $\textit{rsimp}$ will again
ignore the difference between different nesting patterns of alternatives,
and an exponentially growing star derivative such as
\begin{center}
	$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) + 
	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
\end{center}
can be treated as
\begin{center}
	$\RALTS{[r \backslash cc'c'' \cdot r^*, r \backslash c'' \cdot r^*, 
	r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
\end{center}
which can be de-duplicated by $\rDistinct$
and therefore bounded finitely.

%and then de-duplicate terms of the form  ($s'$ being a substring of $s$).
%This allows us to use a similar technique as $r_1 \cdot r_2$ case,

Now the crux of this section is finding a suitable description
for a list $rs$ such that
\begin{center}
	$\rderssimp{r^*}{s} = \rsimp{\sum rs}$
\end{center}
holds.
In addition, the list $rs$
shall be of the form 
$\map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss$.
Here $Ss$ is a list of strings; in the sequence
closed form, for example, it is specified as $\textit{Suffix} \; s \; r_1$.
To get $Ss$ for the star regular expression,
we need to introduce $\starupdate$ and $\starupdates$:
\begin{center}
	\begin{tabular}{lcl}
		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
						     & & $\textit{if} \; 
						     (\rnullable \; (r \backslash_{rs} s))$ \\
						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
						     \starupdate \; c \; r \; Ss)$ \\
						     & & $\textit{else} \;\; (s @ [c]) :: (
						     \starupdate \; c \; r \; Ss)$
	\end{tabular}
\end{center}
\begin{center}
	\begin{tabular}{lcl}
		$\starupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
		$\starupdates \; (c :: cs) \; r \; Ss$ &  $\dn$ &  $\starupdates \; cs \; r \; (
		\starupdate \; c \; r \; Ss)$
	\end{tabular}
\end{center}
\noindent
Assume that
\begin{center}
	$\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$
\end{center}
holds.
The idea of $\starupdate$ and $\starupdates$
is to update $Ss$ when another
derivative is taken on $\rderssimp{r^*}{s}$
w.r.t. a character $c$ or a string $cs$,
respectively.
Both $\starupdate$ and $\starupdates$ take three arguments as input:
the new character $c$ or string $cs$ to take the derivative with, 
the regular expression
$r$ under the star $r^*$, and the
list of strings $Ss$ for the derivative $r^* \backslash s$ 
up until this point,
such that 
\begin{center}
$(r^*) \backslash s = \sum_{s' \in Ss} (r\backslash s') \cdot r^*$ 
\end{center}
is satisfied.
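As a worked instance of these definitions (a sketch only, under the added assumption that
$r \backslash_{rs} [c]$, $r \backslash_{rs} [c, c']$ and $r \backslash_{rs} [c']$ are all nullable,
which the definitions themselves do not guarantee), start from $Ss = [[c]]$ and feed in the
string $[c', c'']$:
\begin{center}
	\begin{tabular}{lcl}
		$\starupdate \; c' \; r \; [[c]]$ & $=$ & $[[c, c'], [c']]$\\
		$\starupdate \; c'' \; r \; [[c, c'], [c']]$ & $=$ & $[[c, c', c''], [c''], [c', c''], [c'']]$\\
		$\starupdates \; [c', c''] \; r \; [[c]]$ & $=$ & $[[c, c', c''], [c''], [c', c''], [c'']]$
	\end{tabular}
\end{center}
Mapping $\lambda s'. (r \backslash s') \cdot r^*$ over the last list gives four summands,
one per string. The string $[c'']$ occurs twice, which is the kind of duplicate
that $\rDistinct$ can remove.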

Functions $\starupdate$ and $\starupdates$ characterise what the 
star derivatives will look like once ``straightened out'' into lists.
The helper functions for such operations will be similar to
$\sflat{\_}$ and $\sflataux{\_}$, which we defined for sequences.
We use similar symbols to denote them, with a $*$ subscript to mark the difference.
\begin{center}
	\begin{tabular}{lcl}
		$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
		$\hflataux{r}$ & $\dn$ & $[r]$
	\end{tabular}
\end{center}

\begin{center}
	\begin{tabular}{lcl}
		$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\
		$\hflat{r}$ & $\dn$ & $r$
	\end{tabular}
\end{center}
\noindent
These definitions are tailor-made for dealing with alternatives that have
originated from a star's derivatives.
A typical star derivative has the structure of a balanced binary tree:
\begin{center}
	$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) + 
	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ 
\end{center}
All of the nested structures of alternatives
generated from derivatives are binary, and therefore
$\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives.
$\hflat{\_}$ ``untangles'' such nested alternatives as follows:
\[
	((r_1 + r_2) + (r_3 + r_4))  + \ldots \;
	\stackrel{\hflat{\_}}{\longrightarrow} \;
	\RALTS{[r_1, r_2, \ldots, r_n]} 
\]
Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$,
with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists 
			and merges each of the element lists to form a flattened list.}:
\begin{lemma}\label{stupdateInduct1}
	\mbox{}
	For a list of strings $Ss$, the following hold.
	\begin{itemize}
		\item
			If we do a derivative on the terms 
			$r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$),
			the result will be the same as if we apply $\starupdate$ to $Ss$.
			\begin{center}
				\begin{tabular}{c}
			$\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x)
			\circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\;
			$\\
			\\
			$=$ \\
			\\
			$\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \; 
			(\starupdate \; x \; r \; Ss)$
				\end{tabular}
			\end{center}
		\item
			$\starupdates$ is ``composable'' w.r.t. a derivative.
			It piggybacks the character $x$ to the tail of the string
			$xs$.
			\begin{center}
				\begin{tabular}{c}
					$\textit{concat} \; (\map \; \hflataux{\_} \; 
					(\map \; (\_\backslash_r x) \; 
					(\map \; (\lambda s.\;\; (r \backslash_r s) \cdot 
					(r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\
					\\
					$=$\\
					\\
					$\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \;
					(\starupdates \; (xs @ [x]) \; r \; Ss)$
				\end{tabular}
			\end{center}
	\end{itemize}
\end{lemma}

\begin{proof}
	Part 1 is by induction on $Ss$.
	Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values.
\end{proof}
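To see part 1 at work on the smallest interesting input (an illustrative instance,
assuming that $r \backslash_r [c]$ is nullable), take $Ss = [[c]]$.
The only term is then $(r \backslash_r [c]) \cdot r^*$, and we have
\begin{center}
	\begin{tabular}{lcl}
		$((r \backslash_r [c]) \cdot r^*) \backslash_r x$ & $=$ & 
		$(r \backslash_r [c, x]) \cdot r^* + (r \backslash_r [x]) \cdot r^*$\\
		$\hflataux{(r \backslash_r [c, x]) \cdot r^* + (r \backslash_r [x]) \cdot r^*}$ & $=$ & 
		$[(r \backslash_r [c, x]) \cdot r^*, \; (r \backslash_r [x]) \cdot r^*]$\\
		$\starupdate \; x \; r \; [[c]]$ & $=$ & $[[c, x], [x]]$
	\end{tabular}
\end{center}
so both sides of the equation in part 1 are the two-element list in the second row.
If $r \backslash_r [c]$ were not nullable, the second summand and the string $[x]$
would both be absent, and the two sides would still agree.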

As with $\textit{createdBySequence}$, we need
a predicate for ``star-created'' regular expressions:
\begin{center}
	\begin{mathpar}
		\inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} }

		\inferrule{  \textit{createdByStar} \; r_1\; \land  \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) } 
	\end{mathpar}
\end{center}
\noindent
All regular expressions created by taking derivatives of
$r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate:
\begin{lemma}\label{starDersCbs}
	$\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds.
\end{lemma}
\begin{proof}
	By a reverse induction on $s$.
\end{proof}
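For instance (a one-step illustration, not part of the formal proof),
for a single character $c$ we have
\begin{center}
	$(r_1 \cdot r_2^*) \backslash_r c = (r_1 \backslash_r c) \cdot r_2^* + (r_2 \backslash_r c) \cdot r_2^*$
\end{center}
when $r_1$ is nullable, and $(r_1 \cdot r_2^*) \backslash_r c = (r_1 \backslash_r c) \cdot r_2^*$ otherwise.
In either case every summand is a sequence whose second component is a star,
so the first rule of $\textit{createdByStar}$ applies to each summand and the second rule
combines them.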
If a regular expression conforms to the shape of a star's derivative,
then we can push an application of $\hflataux{\_}$ inside a derivative of it:
\begin{lemma}\label{hfauPushin}
	If $\textit{createdByStar} \; r$ holds, then
	\begin{center}
		$\hflataux{r \backslash_r c} = \textit{concat} \; (
		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
	\end{center}
	holds.
\end{lemma}
\begin{proof}
	By induction on the rules of $\textit{createdByStar}$.
\end{proof}
%This is not entirely true for annotated regular expressions: 
%%TODO: bsimp bders \neq bderssimp
%\begin{center}
%	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
%\end{center}
%For bit-codes, the order in which simplification is applied
%might cause a difference in the location they are placed.
%If we want something like
%\begin{center}
%	$\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
%\end{center}
%Some "canonicalization" procedure is required,
%which either pushes all the common bitcodes to nodes
%as senior as possible:
%\begin{center}
%	$_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
%\end{center}
%or does the reverse. However bitcodes are not of interest if we are talking about
%the $\llbracket r \rrbracket$ size of a regex.
%Therefore for the ease and simplicity of producing a
%proof for a size bound, we are happy to restrict ourselves to 
%unannotated regular expressions, and obtain such equalities as
%TODO: rsimp sflat
% The simplification of a flattened out regular expression, provided it comes
%from the derivative of a star, is the same as the one nested.



Now we introduce an inductive property
for $\starupdate$ and $\hflataux{\_}$.
\begin{lemma}\label{starHfauInduct}
	If we take the derivative of $r_0^*$
	w.r.t. a string that starts with $c$, say $c :: s$,
	and then flatten it out,
	we obtain a list
	of the shape $\sum_{s' \in Ss} (r_0\backslash_r s') \cdot r_0^*$,
	where $Ss = \starupdates \; s \; r_0 \; [[c]]$. Namely,
	\begin{center}
	$\hflataux{(( (\rder{c}{r_0})\cdot(r_0^*))\backslash_{rs} s)} = 
		\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; 
		(\starupdates \; s \; r_0 \; [[c]])$
	\end{center}
	holds.
\end{lemma}
\begin{proof}
	By induction on $s$, the cases
	being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used.
\end{proof}
\noindent
The function $\hflataux{\_}$ has an effect similar to that of $\textit{flatten}$:
\begin{lemma}\label{hflatauxGrewrites}
	$a :: rs \grewrites \hflataux{a} @ rs$
\end{lemma}
\begin{proof}
	By induction on $a$, where $rs$ is allowed to take arbitrary values.
\end{proof}
638
dd9dde2d902b comments till chap4
Chengsong
parents: 625
diff changeset
  2231
It is also not surprising that 
dd9dde2d902b comments till chap4
Chengsong
parents: 625
diff changeset
  2232
two regular expressions differing only in terms
dd9dde2d902b comments till chap4
Chengsong
parents: 625
diff changeset
  2233
of the
dd9dde2d902b comments till chap4
Chengsong
parents: 625
diff changeset
  2234
nesting of parentheses are equivalent w.r.t. $\textit{rsimp}$:
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2235
\begin{lemma}\label{cbsHfauRsimpeq1}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2236
	$\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2237
\end{lemma}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2238
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2239
\begin{proof}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2240
	By using the rewriting relation $\rightsquigarrow$.
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2241
\end{proof}
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2242
And from this we obtain the following fact: a 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2243
regular expression created by star 
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2244
is the same as its flattened version, up to equivalence under $\textit{rsimp}$.
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2245
Formally,
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2246
\begin{lemma}\label{hfauRsimpeq2}
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2247
	$\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2248
\end{lemma}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2249
\begin{proof}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2250
	By structural induction on $r$, where the induction rules 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2251
	are those of $\createdByStar{\_}$.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2252
	Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case.
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2253
\end{proof}
564
Chengsong
parents: 562
diff changeset
  2254
Chengsong
parents: 562
diff changeset
  2255
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2256
%Here is a corollary that states the lemma in
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2257
%a more intuitive way:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2258
%\begin{corollary}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2259
%	$\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2260
%	(r^*))\; (\starupdates \; c\; r\; [[c]])$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2261
%\end{corollary}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2262
%\noindent
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2263
%Note that this is also agnostic of the simplification
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2264
%function we defined, and is therefore of more general interest.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2265
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2266
Together with the rewriting relation, we obtain the following auxiliary facts:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2267
\begin{lemma}\label{starClosedForm6Hrewrites}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2268
	We have the following set of rewriting relations or equalities:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2269
	\begin{itemize}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2270
		\item
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2271
			$\textit{rsimp} \; (r^* \backslash_r (c::s)) 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2272
			\sequal
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2273
			\sum \; ( ( \map \; (\lambda s. (r\backslash_r s) \cdot r^*) \; (
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2274
			\starupdates \; s \; r \; [ c::[]] ) ) )$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2275
		\item
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 618
diff changeset
  2276
			$r^* \backslash_{rsimps} (c::s) = \textit{rsimp} \; ( (
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2277
			\sum ( (\map \; (\lambda s_1. (r\backslash s_1) \cdot r^*) \;
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2278
			(\starupdates \;s \; r \; [ c::[] ])))))$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2279
		\item
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2280
			$\sum ( (\map \; (\lambda s. (r\backslash s) \cdot r^*) \; Ss))
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2281
			\sequal
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2282
			 \sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \cdot
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2283
			 r^*) \; Ss) )$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2284
		\item
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2285
			$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2286
			\scfrewrites
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2287
			\map \; (\lambda s. (r \backslash_{rsimps} s) \cdot (r^*)) \; Ss$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2288
		\item
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2289
			$( ( \sum ( ( \map \ (\lambda s. \;\;
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2290
			(\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \;
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2291
			s \; r \; [ c::[] ])))))$\\
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2292
			$\sequal$\\
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2293
			$( ( \sum ( ( \map \ (\lambda s. \;\;
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 618
diff changeset
  2294
			( r \backslash_{rsimps} s) \cdot r^*) \; (\starupdates \;
			s \; r \; [ c::[] ])))))$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2296
	\end{itemize}
558
Chengsong
parents: 557
diff changeset
  2297
\end{lemma}
Chengsong
parents: 557
diff changeset
  2298
\begin{proof}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2299
	Part 1 leads to part 2.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2300
	The rest of them are routine.
558
Chengsong
parents: 557
diff changeset
  2301
\end{proof}
Chengsong
parents: 557
diff changeset
  2302
\noindent
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2303
Next the closed form for star regular expressions can be derived:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2304
\begin{theorem}\label{starClosedForm}
558
Chengsong
parents: 557
diff changeset
  2305
	$\rderssimp{r^*}{c::s} = 
Chengsong
parents: 557
diff changeset
  2306
	\rsimp{
Chengsong
parents: 557
diff changeset
  2307
		(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2308
		(\starupdates \; s\; r \; [[c]])
558
Chengsong
parents: 557
diff changeset
  2309
		)
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2310
		)
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2311
	}
558
Chengsong
parents: 557
diff changeset
  2312
	$
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2313
\end{theorem}
558
Chengsong
parents: 557
diff changeset
  2314
\begin{proof}
Chengsong
parents: 557
diff changeset
  2315
	By an induction on $s$.
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2316
	The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites} 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2317
	and \ref{hfauRsimpeq2}
558
Chengsong
parents: 557
diff changeset
  2318
	are used.	
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2319
	In \ref{starClosedForm6Hrewrites}, the equalities are
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2320
	used to link the LHS and RHS.
558
Chengsong
parents: 557
diff changeset
  2321
\end{proof}
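\noindent
As a sanity check of the statement (a sketch, assuming the base clause
$\starupdates \; [] \; r \; Ss \dn Ss$ from the earlier definition),
consider the one-character string $[c]$, that is $s = []$ in the theorem.
The list $\starupdates \; [] \; r \; [[c]]$ is then just $[[c]]$, and the
closed form specialises to
\[
	\rderssimp{r^*}{[c]} =
	\rsimp{(\sum \; [(\rderssimp{r}{[c]}) \cdot r^*])}
\]
This agrees with the shape of the derivative of a star with respect to a
single character, where only one way of unfolding $r^*$ is possible.
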
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2322
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2323
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2324
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2325
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2326
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2327
613
Chengsong
parents: 611
diff changeset
  2328
%----------------------------------------------------------------------------------------
Chengsong
parents: 611
diff changeset
  2329
%	SECTION ??
Chengsong
parents: 611
diff changeset
  2330
%----------------------------------------------------------------------------------------
Chengsong
parents: 611
diff changeset
  2331
Chengsong
parents: 611
diff changeset
  2332
%-----------------------------------
Chengsong
parents: 611
diff changeset
  2333
%	SECTION syntactic equivalence under simp
Chengsong
parents: 611
diff changeset
  2334
%-----------------------------------
Chengsong
parents: 611
diff changeset
  2335
Chengsong
parents: 611
diff changeset
  2336
Chengsong
parents: 611
diff changeset
  2337
%----------------------------------------------------------------------------------------
Chengsong
parents: 611
diff changeset
  2338
%	SECTION ALTS CLOSED FORM
Chengsong
parents: 611
diff changeset
  2339
%----------------------------------------------------------------------------------------
Chengsong
parents: 611
diff changeset
  2340
%\section{A Closed Form for \textit{ALTS}}
Chengsong
parents: 611
diff changeset
  2341
%Now we prove that  $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
Chengsong
parents: 611
diff changeset
  2342
%
Chengsong
parents: 611
diff changeset
  2343
%
Chengsong
parents: 611
diff changeset
  2344
%There are a few key steps, one of these steps is
Chengsong
parents: 611
diff changeset
  2345
%
Chengsong
parents: 611
diff changeset
  2346
%
Chengsong
parents: 611
diff changeset
  2347
%
Chengsong
parents: 611
diff changeset
  2348
%One might want to prove this by something a simple statement like: 
Chengsong
parents: 611
diff changeset
  2349
%
Chengsong
parents: 611
diff changeset
  2350
%For this to hold we want the $\textit{distinct}$ function to pick up
Chengsong
parents: 611
diff changeset
  2351
%the elements before and after derivatives correctly:
Chengsong
parents: 611
diff changeset
  2352
%$r \in rset \equiv (rder x r) \in (rder x rset)$.
Chengsong
parents: 611
diff changeset
  2353
%which essentially requires that the function $\backslash$ is an injective mapping.
Chengsong
parents: 611
diff changeset
  2354
%
Chengsong
parents: 611
diff changeset
  2355
%Unfortunately the function $\backslash c$ is not an injective mapping.
Chengsong
parents: 611
diff changeset
  2356
%
Chengsong
parents: 611
diff changeset
  2357
%\subsection{function $\backslash c$ is not injective (1-to-1)}
Chengsong
parents: 611
diff changeset
  2358
%\begin{center}
Chengsong
parents: 611
diff changeset
  2359
%	The derivative $w.r.t$ character $c$ is not one-to-one.
Chengsong
parents: 611
diff changeset
  2360
%	Formally,
Chengsong
parents: 611
diff changeset
  2361
%	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
Chengsong
parents: 611
diff changeset
  2362
%\end{center}
Chengsong
parents: 611
diff changeset
  2363
%This property is trivially true for the
Chengsong
parents: 611
diff changeset
  2364
%character regex example:
Chengsong
parents: 611
diff changeset
  2365
%\begin{center}
Chengsong
parents: 611
diff changeset
  2366
%	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
Chengsong
parents: 611
diff changeset
  2367
%\end{center}
Chengsong
parents: 611
diff changeset
  2368
%But apart from the cases where the derivative
Chengsong
parents: 611
diff changeset
  2369
%output is $\ZERO$, are there non-trivial results
Chengsong
parents: 611
diff changeset
  2370
%of derivatives which contain strings?
Chengsong
parents: 611
diff changeset
  2371
%The answer is yes.
Chengsong
parents: 611
diff changeset
  2372
%For example,
Chengsong
parents: 611
diff changeset
  2373
%\begin{center}
Chengsong
parents: 611
diff changeset
  2374
%	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
Chengsong
parents: 611
diff changeset
  2375
%	where $a$ is not nullable.\\
Chengsong
parents: 611
diff changeset
  2376
%	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
Chengsong
parents: 611
diff changeset
  2377
%	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
Chengsong
parents: 611
diff changeset
  2378
%\end{center}
Chengsong
parents: 611
diff changeset
  2379
%We start with two syntactically different regular expressions,
Chengsong
parents: 611
diff changeset
  2380
%and end up with the same derivative result.
Chengsong
parents: 611
diff changeset
  2381
%This is not surprising as we have such 
Chengsong
parents: 611
diff changeset
  2382
%equality as below in the style of Arden's lemma:\\
Chengsong
parents: 611
diff changeset
  2383
%\begin{center}
Chengsong
parents: 611
diff changeset
  2384
%	$L(A^*B) = L(A\cdot A^* \cdot B + B)$
Chengsong
parents: 611
diff changeset
  2385
%\end{center}
Chengsong
parents: 611
diff changeset
  2386
\section{Bounding Closed Forms}
Chengsong
parents: 611
diff changeset
  2387
Chengsong
parents: 611
diff changeset
  2388
In this section, we describe how we formalised the bound
Chengsong
parents: 611
diff changeset
  2389
on closed forms.
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2390
We first show that in general the number of regular expressions up to a certain 
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2391
size is finite.
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2392
Then we prove that functions such as $\rflts$
613
Chengsong
parents: 611
diff changeset
  2393
will not cause the size of r-regular expressions to grow.
Chengsong
parents: 611
diff changeset
  2394
Putting this together with the general result 
Chengsong
parents: 611
diff changeset
  2395
on the finiteness of distinct regular expressions
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2396
up to a specific size, we obtain a bound on 
613
Chengsong
parents: 611
diff changeset
  2397
the closed forms.
Chengsong
parents: 611
diff changeset
  2398
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2399
\subsection{Finiteness of Distinct Regular Expressions}
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2400
We define the set of regular expressions whose size is no more than
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2401
a certain size $N$ as $\textit{sizeNregex} \; N$:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2402
\[
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2403
	\textit{sizeNregex} \; N \dn \{r\; \mid \;  \llbracket r \rrbracket_r \leq N \}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2404
\]
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2405
We have that $\textit{sizeNregex} \; N$ is always a finite set:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2406
\begin{lemma}\label{finiteSizeN}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2407
	$\textit{finite} \; (\textit{sizeNregex} \; N)$ holds.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2408
\end{lemma}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2409
\begin{proof}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2410
	By induction on $N$, splitting the set $\textit{sizeNregex} \; (N + 1)$ into
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2411
	subsets by their categories:
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2412
	$\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$,
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2413
	and so on. Each of these subsets is finite.
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2414
\end{proof}
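\noindent
The splitting used in the inductive step can be sketched as the following
inclusion. This is an illustration rather than the exact formal statement;
it assumes that every constructor contributes one to
$\llbracket \_ \rrbracket_r$, so that each sub-expression is strictly
smaller than its parent, and that the alphabet is finite:
\[
	\begin{array}{lcl}
		\textit{sizeNregex} \; (N + 1) & \subseteq &
		\{\ZERO_r, \ONE_r\} \cup \{c \mid c \; \mbox{is a character}\}\\
		& \cup & \{r^* \mid r \in \textit{sizeNregex} \; N\}\\
		& \cup & \{r_1 \cdot r_2 \mid r_1, r_2 \in \textit{sizeNregex} \; N\}\\
		& \cup & \{\sum rs \mid \textit{length} \; rs \leq N \; \mbox{and each} \;
		r \in rs \; \mbox{is in} \; \textit{sizeNregex} \; N\}
	\end{array}
\]
Every set on the right-hand side is finite by the induction hypothesis,
and a subset of a finite union of finite sets is again finite.
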
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2415
\noindent
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2416
From this we get a corollary that
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2417
if for all $r \in rs$ we have $\rsize{r} \leq N$, then the output of 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2418
$\rdistinct{rs}{\varnothing}$ is a list of regular
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2419
expressions of finite size depending on $N$ only. 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2420
\begin{corollary}\label{finiteSizeNCorollary}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2421
	If $\rsize{r} \leq N$ for all $r \in rs$, then $\rsize{\rdistinct{rs}{\varnothing}} \leq c_N * N$ holds,
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2422
	where the constant $c_N$ is equal to $\textit{card} \; (\textit{sizeNregex} \;
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2423
	N)$.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2424
\end{corollary}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2425
\begin{proof}
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2426
	For all $r$ in 
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2427
	$\textit{set} \; (\rdistinct{rs}{\varnothing})$,
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2428
	it is always the case that $\rsize{r} \leq N$.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2429
	In addition, the list length is bounded by
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2430
	$c_N$, yielding the desired bound.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2431
\end{proof}
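\noindent
Spelled out (a sketch, assuming that the size of a list is the sum of the
sizes of its elements), write
$\rdistinct{rs}{\varnothing} = [r_1, \ldots, r_n]$. The estimate is then
\[
	\rsize{\rdistinct{rs}{\varnothing}}
	= \rsize{r_1} + \ldots + \rsize{r_n}
	\leq n * N
	\leq c_N * N
\]
where the last step uses that the $r_i$ are pairwise distinct elements of
$\textit{sizeNregex} \; N$, so that $n \leq c_N$.
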
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2432
\noindent
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2433
This fact will be handy in estimating the closed form sizes.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2434
%We have proven that the size of the
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2435
%output of $\textit{rdistinct} \; rs' \; \varnothing$
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2436
%is bounded by a constant $N * c_N$ depending only on $N$,
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2437
%provided that each of $rs'$'s element
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2438
%is bounded by $N$.
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2439
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2440
\subsection{$\textit{rsimp}$ Does Not Increase the Size}
613
Chengsong
parents: 611
diff changeset
  2441
Although it seems evident, we need a series
Chengsong
parents: 611
diff changeset
  2442
of non-trivial lemmas to establish that functions such as $\rflts$
Chengsong
parents: 611
diff changeset
  2443
do not cause the regular expressions to grow.
Chengsong
parents: 611
diff changeset
  2444
\begin{lemma}\label{rsimpMonoLemmas}
Chengsong
parents: 611
diff changeset
  2445
	\mbox{}
Chengsong
parents: 611
diff changeset
  2446
	\begin{itemize}
Chengsong
parents: 611
diff changeset
  2447
		\item
Chengsong
parents: 611
diff changeset
  2448
			\[
Chengsong
parents: 611
diff changeset
  2449
				\llbracket \rsimpalts \; rs \rrbracket_r \leq
Chengsong
parents: 611
diff changeset
  2450
				\llbracket \sum \; rs \rrbracket_r
Chengsong
parents: 611
diff changeset
  2451
			\]
Chengsong
parents: 611
diff changeset
  2452
		\item
Chengsong
parents: 611
diff changeset
  2453
			\[
Chengsong
parents: 611
diff changeset
  2454
				\llbracket \rsimpseq \; r_1 \;  r_2 \rrbracket_r \leq
Chengsong
parents: 611
diff changeset
  2455
				\llbracket r_1 \cdot r_2 \rrbracket_r
Chengsong
parents: 611
diff changeset
  2456
			\]
Chengsong
parents: 611
diff changeset
  2457
		\item
Chengsong
parents: 611
diff changeset
  2458
			\[
Chengsong
parents: 611
diff changeset
  2459
				\llbracket \rflts \; rs \rrbracket_r  \leq
Chengsong
parents: 611
diff changeset
  2460
				\llbracket rs \rrbracket_r 
Chengsong
parents: 611
diff changeset
  2461
			\]
Chengsong
parents: 611
diff changeset
  2462
		\item
Chengsong
parents: 611
diff changeset
  2463
			\[
Chengsong
parents: 611
diff changeset
  2464
				\llbracket \rDistinct \; rs \; ss \rrbracket_r  \leq
Chengsong
parents: 611
diff changeset
  2465
				\llbracket rs \rrbracket_r 
Chengsong
parents: 611
diff changeset
  2466
			\]
Chengsong
parents: 611
diff changeset
  2467
		\item
Chengsong
parents: 611
diff changeset
  2468
			If all elements $a$ in the list $as$ satisfy the property
Chengsong
parents: 611
diff changeset
  2469
			that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
Chengsong
parents: 611
diff changeset
  2470
			\llbracket a \rrbracket_r$, then we have 
Chengsong
parents: 611
diff changeset
  2471
			\[
Chengsong
parents: 611
diff changeset
  2472
				\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
Chengsong
parents: 611
diff changeset
  2473
				(\textit{rflts} \; (\textit{map}\;\textit{rsimp} as)) \{\})
Chengsong
parents: 611
diff changeset
  2474
				\rrbracket_r \leq
Chengsong
parents: 611
diff changeset
  2475
				\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
Chengsong
parents: 611
diff changeset
  2476
				\textit{rsimp} \; as))\; \{ \} ) \rrbracket_r 
Chengsong
parents: 611
diff changeset
  2477
			\]
Chengsong
parents: 611
diff changeset
  2478
	\end{itemize}
Chengsong
parents: 611
diff changeset
  2479
\end{lemma}
Chengsong
parents: 611
diff changeset
  2480
\begin{proof}
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2481
	Points 1, 3, and 4 can be proven by an induction on $rs$.
613
Chengsong
parents: 611
diff changeset
  2482
	Point 2 is by case analysis on $r_1$ and $r_2$.
Chengsong
parents: 611
diff changeset
  2483
	The last part is a corollary of the previous ones.
Chengsong
parents: 611
diff changeset
  2484
\end{proof}
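\noindent
To see the third point at work on a concrete list (a sketch, assuming that
$\rflts$ drops $\ZERO_r$ and opens up nested alternatives, and that
$\llbracket \_ \rrbracket_r$ counts one for each constructor and sums over
list elements), consider
\[
	\rflts \; [\ZERO_r, \; a + b, \; c] = [a, \; b, \; c]
\]
for characters $a$, $b$ and $c$. The input list has size $1 + 3 + 1 = 5$,
whereas the output has size $1 + 1 + 1 = 3$: removing $\ZERO_r$ and the
alternative constructor can only make the list smaller.
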
Chengsong
parents: 611
diff changeset
  2485
\noindent
Chengsong
parents: 611
diff changeset
  2486
With the lemmas for each inductive case in place, we are ready to get 
Chengsong
parents: 611
diff changeset
  2487
the non-increasing property as a corollary:
Chengsong
parents: 611
diff changeset
  2488
\begin{corollary}\label{rsimpMono}
Chengsong
parents: 611
diff changeset
  2489
	$\llbracket \textit{rsimp} \; r \rrbracket_r \leq \llbracket r \rrbracket_r$
Chengsong
parents: 611
diff changeset
  2490
\end{corollary}
Chengsong
parents: 611
diff changeset
  2491
\begin{proof}
Chengsong
parents: 611
diff changeset
  2492
	By induction on $r$, using Lemma \ref{rsimpMonoLemmas} for the inductive cases.
Chengsong
parents: 611
diff changeset
  2493
\end{proof}
Chengsong
parents: 611
diff changeset
  2494
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2495
\subsection{Estimating the Closed Forms' Sizes}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2496
We recap the closed forms we obtained
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2497
earlier:
558
Chengsong
parents: 557
diff changeset
  2498
\begin{itemize}
Chengsong
parents: 557
diff changeset
  2499
	\item
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2500
		$\rderssimp{(\sum rs)}{s} \sequal
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2501
		\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
558
Chengsong
parents: 557
diff changeset
  2502
	\item
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2503
		$\rderssimp{(r_1 \cdot r_2)}{s} \sequal \sum ((r_1 \backslash s) \cdot r_2 ) 
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2504
		:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1}))$
558
Chengsong
parents: 557
diff changeset
  2505
	\item
Chengsong
parents: 557
diff changeset
  2506
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2507
		$\rderssimp{r^*}{c::s} = 
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2508
		\rsimp{
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2509
			(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; 
558
Chengsong
parents: 557
diff changeset
  2510
			(\starupdates \; s\; r \; [[c]])
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2511
			)
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2512
			)
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2513
		}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  2514
		$
558
Chengsong
parents: 557
diff changeset
  2515
\end{itemize}	
Chengsong
parents: 557
diff changeset
  2516
\noindent	
Chengsong
parents: 557
diff changeset
  2517
The closed forms on the right-hand side
Chengsong
parents: 557
diff changeset
  2518
are all of the same shape: $\rsimp{ (\sum rs)} $.
Chengsong
parents: 557
diff changeset
  2519
Such a regular expression is bounded by the size of $\sum rs'$, 
Chengsong
parents: 557
diff changeset
  2520
where every element in $rs'$ is distinct, and each element 
Chengsong
parents: 557
diff changeset
  2521
can be described by some inductive sub-structures 
Chengsong
parents: 557
diff changeset
  2522
(for example when $r = r_1 \cdot r_2$ then $rs'$ 
Chengsong
parents: 557
diff changeset
  2523
will be solely comprised of $r_1 \backslash s'$ 
Chengsong
parents: 557
diff changeset
  2524
and $r_2 \backslash s''$, $s'$ and $s''$ being 
Chengsong
parents: 557
diff changeset
  2525
sub-strings of $s$),
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2526
which will each have a size upper bound 
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2527
according to the inductive hypothesis, which controls $r \backslash s$.
557
812e5d112f49 more changes
Chengsong
parents: 556
diff changeset
  2528
558
Chengsong
parents: 557
diff changeset
  2529
We elaborate the above reasoning by a series of lemmas
Chengsong
parents: 557
diff changeset
  2530
below, where straightforward proofs are omitted.
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2531
%We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2532
We show that $\textit{rdistinct}$ and $\rflts$
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2533
working together are at least as 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
  2534
good as $\textit{rdistinct}$ alone, which can be written as
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2535
\begin{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2536
	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2537
	\leq 
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2538
	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2539
\end{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2540
We need this so that we know the outcome of our real 
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2541
simplification is better than or equal to a rough estimate,
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2542
and therefore can be bounded by that estimate.
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  2543
This is a bit harder to establish than proving that
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2544
$\textit{rflts}$ does not make a list larger (which can
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2545
be proven using routine induction):
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2546
\begin{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2547
	$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2548
	\llbracket  \textit{rs} \rrbracket_r$
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2549
\end{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2550
We cannot simply prove how each helper function
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2551
reduces the size and then put the results together.
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2552
From
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2553
\begin{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2554
$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2555
	\llbracket  \textit{rs} \rrbracket_r$
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2556
\end{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2557
and
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2558
\begin{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2559
     $\llbracket  \textit{rdistinct} \; rs \; \varnothing \rrbracket_r \leq
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2560
     \llbracket rs \rrbracket_r$
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2561
\end{center}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2562
one cannot infer 
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2563
\begin{center}
558
Chengsong
parents: 557
diff changeset
  2564
	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
Chengsong
parents: 557
diff changeset
  2565
	\leq 
Chengsong
parents: 557
diff changeset
  2566
	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2567
\end{center}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2568
What we can infer is that 
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2569
\begin{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2570
	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2571
	\leq
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2572
	\llbracket rs \rrbracket_r$
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2573
\end{center}
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2574
but this estimate is too rough and $\llbracket rs \rrbracket_r$	is unbounded.
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2575
The way we 
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  2576
get around this is by first proving a more general lemma 
609
61139fdddae0 chap1 totally done
Chengsong
parents: 601
diff changeset
  2577
(so that the inductive case goes through):
\begin{lemma}\label{fltsSizeReductionAlts}
	If we have three accumulator sets:
	$noalts\_set$, $alts\_set$ and $corr\_set$,
	satisfying:
	\begin{itemize}
		\item
			$\forall r \in noalts\_set. \; \nexists xs.\; r = \sum  xs$
		\item
			$\forall r \in alts\_set. \; \exists xs. \; r = \sum xs
			\; \textit{and} \; set \; xs \subseteq corr\_set$
	\end{itemize}
	then we have that
	\begin{center}
	\begin{tabular}{lcl}
	$\llbracket  (\textit{rdistinct} \; (\textit{rflts} \; as) \;
	(noalts\_set \cup corr\_set)) \rrbracket_r$ & $\leq$ &\\
						    $\llbracket  (\textit{rdistinct} \; as \; (noalts\_set \cup alts\_set \cup
	\{ \ZERO_r \} )) \rrbracket_r$ & & \\ 
	\end{tabular}
	\end{center}
		holds.
\end{lemma}
\noindent
We split the accumulator into two parts: the part
which contains alternative regular expressions ($alts\_set$), and 
the part without any of them ($noalts\_set$).
This is because $\rflts$ opens up the alternatives in $as$,
causing the accumulators on both sides of the inequality
to diverge slightly.
If we want to compare accumulators that are not
perfectly in sync, we need to consider the alternatives and non-alternatives
separately.
The set $corr\_set$ is the set corresponding to
$alts\_set$, with all elements under the alternative constructor
spilled out.
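
As a small illustration (an example of our own, not taken from the formalisation),
consider the list $as = [\sum [a, b],\; a,\; b]$ for two distinct characters $a$ and $b$.
We assume here that $\textit{rdistinct}$ traverses the list from left to right
and adds every element it keeps to the accumulator, and that the size function
counts one for each constructor and each character.
\begin{center}
	\begin{tabular}{lcl}
		$\rflts \; as$ & $=$ & $[a,\; b,\; a,\; b]$\\
		left accumulator after processing $a$ and $b$ & $=$ & $\{a,\; b\}$\\
		right accumulator after processing $\sum [a, b]$ & $=$ & $\{\sum [a, b]\}$\\
		$\rdistinct{(\rflts \; as)}{\varnothing}$ & $=$ & $[a,\; b]$\\
		$\rdistinct{as}{\varnothing}$ & $=$ & $[\sum [a, b],\; a,\; b]$\\
	\end{tabular}
\end{center}
\noindent
At this point the right accumulator roughly plays the role of $alts\_set$ and the
left one the role of $corr\_set$, while $noalts\_set$ is still empty.
Under the assumptions above the two results have sizes $2$ and $5$ respectively,
in line with the claimed inequality.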
\begin{proof}
	By induction on the list $as$. We make use of lemma \ref{rdistinctConcat}.
\end{proof}
By setting all three sets to the empty set, one gets the desired size estimate:
\begin{corollary}\label{interactionFltsDB}
	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r 
	\leq 
	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
\end{corollary}
\begin{proof}
	By lemma \ref{fltsSizeReductionAlts}.
\end{proof}
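
Spelled out, the instantiation behind corollary \ref{interactionFltsDB} might proceed as follows.
This is only a sketch: the last step relies on the assumption, not stated above,
that enlarging the accumulator of $\textit{rdistinct}$ can only shrink its result.
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket \rdistinct{(\rflts \; rs)}{\varnothing} \rrbracket_r$
		& $\leq$ & $\llbracket \rdistinct{rs}{(\varnothing \cup \varnothing \cup \{ \ZERO_r \})} \rrbracket_r$\\
		& $=$ & $\llbracket \rdistinct{rs}{\{ \ZERO_r \}} \rrbracket_r$\\
		& $\leq$ & $\llbracket \rdistinct{rs}{\varnothing} \rrbracket_r$\\
	\end{tabular}
\end{center}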
\noindent
The intuition for why this is true
is that if we remove duplicates from the $\textit{LHS}$, at least the same number of 
duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$. 

The size of $\rsimp{\sum rs}$ can now be estimated using $\rdistinct{rs}{\varnothing}$:
\begin{lemma}\label{altsSimpControl}
	$\rsize{\rsimp{\sum rs}} \leq \rsize{\rdistinct{rs}{\varnothing}}+ 1$
\end{lemma}
\begin{proof}
	By using corollary \ref{interactionFltsDB}.
\end{proof}
\noindent
This is a key lemma in establishing the bounds of all the 
closed forms.
With this we are now ready to control the sizes of
$(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$.
\begin{theorem}\label{rBound}
	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
\end{theorem}
\noindent
\begin{proof}
	We prove this by induction on $r$. The base cases for $\RZERO$,
	$\RONE $ and $\RCHAR{c}$ are straightforward. 
	In the sequence $r_1 \cdot r_2$ case,
	the inductive hypotheses state 
	$\exists N_1. \forall s. \; \llbracket \rderssimp{r_1}{s} \rrbracket_r \leq N_1$ and
	$\exists N_2. \forall s. \; \llbracket \rderssimp{r_2}{s} \rrbracket_r \leq N_2$. 

	When the string $s$ is not empty, we can reason as follows:
	%
	\begin{center}
		\begin{tabular}{lcll}
& & $ \llbracket   \rderssimp{r_1\cdot r_2 }{s} \rrbracket_r $\\
& $ = $ & $\llbracket \rsimp{(\sum(r_1 \backslash_{rsimps} s \cdot r_2 \; \;  :: \; \; 
		\map \; (r_2\backslash_{rsimps} \_)\; (\vsuf{s}{r_1})))} \rrbracket_r $ & (1) \\
										     & $\leq$ & $\llbracket \rdistinct{(r_1 \backslash_{rsimps} s \cdot r_2 \; \;  :: \; \; 
	\map \; (r_2\backslash_{rsimps} \_)\; (\vsuf{s}{r_1}))}{\varnothing} \rrbracket_r  + 1$ & (2) \\
											     & $\leq$ & $2 + N_1 + \rsize{r_2} + (N_2 * (card\;(\sizeNregex \; N_2)))$ & (3)\\
\end{tabular}
\end{center}
\noindent
(1) is by theorem \ref{seqClosedForm}.
(2) is by lemma \ref{altsSimpControl}.
(3) is by corollary \ref{finiteSizeNCorollary}.
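
One way to read step (3) is to bound the head and the tail of the list separately.
The following is a sketch under two assumptions of ours: that the size of a sequence is
one plus the sizes of its two components, and that $\sizeNregex \; N_2$ denotes the
(finite) set of regular expressions whose size is at most $N_2$, so that a duplicate-free
list of such expressions has at most $card \; (\sizeNregex \; N_2)$ elements.
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket (r_1 \backslash_{rsimps} s) \cdot r_2 \rrbracket_r$
		& $\leq$ & $1 + N_1 + \llbracket r_2 \rrbracket_r$\\
		$\llbracket \rdistinct{(\map \; (r_2 \backslash_{rsimps} \_) \; (\vsuf{s}{r_1}))}{\varnothing} \rrbracket_r$
		& $\leq$ & $N_2 * (card \; (\sizeNregex \; N_2))$\\
	\end{tabular}
\end{center}
\noindent
Adding these two bounds to the extra $1$ from step (2) gives the right-hand side of (3).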

Combining the cases when $s = []$ and $s \neq []$, we get (4):
\begin{center}
	\begin{tabular}{lcll}
		$\rsize{(r_1 \cdot r_2) \backslash_{rsimps} s}$ & $\leq$ & 
		$max \; (2 + N_1 + 
		\llbracket r_2 \rrbracket_r + 
		N_2 * (card\; (\sizeNregex \; N_2))) \; \rsize{r_1\cdot r_2}$ & (4)
	\end{tabular}
\end{center}

We reason similarly for  $\STAR$.
The inductive hypothesis is
$\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket_r \leq N$.
Let $n_r = \llbracket r^* \rrbracket_r$.
When $s = c :: cs$ is not empty,
\begin{center}
	\begin{tabular}{lcll}
& & $ \llbracket   \rderssimp{r^* }{c::cs} \rrbracket_r $\\
& $ = $ & $\llbracket \rsimp{(\sum (\map \; (\lambda s. (r \backslash_{rsimps} s) \cdot r^*) \; (\starupdates\; 
	cs \; r \; [[c]] )) )} \rrbracket_r $ & (5) \\
					      & $\leq$ & $\llbracket 
					      \rdistinct{
						      (\map \; 
						      (\lambda s. (r \backslash_{rsimps} s) \cdot r^*) \; 
						      (\starupdates\; cs \; r \; [[c]] )
					      )}
	{\varnothing} \rrbracket_r  + 1$ & (6) \\
					 & $\leq$ & $1 + (\textit{card} (\sizeNregex \; (N + n_r)))
	* (1 + (N + n_r)) $ & (7)\\
\end{tabular}
\end{center}
\noindent
(5) is by theorem \ref{starClosedForm}.
(6) is by lemma \ref{altsSimpControl}.
(7) is by corollary \ref{finiteSizeNCorollary}.
Combining with the case when $s = []$, one obtains
\begin{center}
	\begin{tabular}{lcll}
		$\rsize{r^* \backslash_{rsimps} s}$ & $\leq$ & $max \; n_r \; (1 + (\textit{card} (\sizeNregex \; (N + n_r)))
		* (1 + (N + n_r))) $ & (8)\\
	\end{tabular}
\end{center}
\noindent

The alternative case is slightly less involved.
The inductive hypothesis 
is equivalent to $\exists N. \forall s. \; \forall r \in (\map \; (\_ \backslash_{rsimps} s) \; rs). \; \rsize{r} \leq N$.
In the case when $s = c::cs$, we have 
\begin{center}
	\begin{tabular}{lcll}
& & $ \llbracket   \rderssimp{\sum rs }{c::cs} \rrbracket_r $\\
& $ = $ & $\llbracket \rsimp{(\sum (\map \; (\_ \backslash_{rsimps} s)  \; rs) )} \rrbracket_r $ & (9) \\
& $\leq$ & $\llbracket (\sum (\map \; (\_ \backslash_{rsimps} s)  \; rs) ) \rrbracket_r $  & (10) \\
& $\leq$ & $1 + N * (length \; rs) $ & (11)\\
	\end{tabular}
\end{center}
\noindent
(9) is by theorem \ref{altsClosedForm}, (10) by lemma \ref{rsimpMono} and (11) by the inductive hypothesis.
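
Step (11) can be spelled out as follows for $rs = [r_1, \ldots, r_k]$.
This sketch assumes that the size of an alternative is one plus
the sizes of its children, and uses $N$ as a bound on each simplified derivative:
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket \sum (\map \; (\_ \backslash_{rsimps} s) \; rs) \rrbracket_r$
		& $=$ & $1 + \llbracket r_1 \backslash_{rsimps} s \rrbracket_r + \ldots + \llbracket r_k \backslash_{rsimps} s \rrbracket_r$\\
		& $\leq$ & $1 + N * k$\\
	\end{tabular}
\end{center}
\noindent
where $k = length \; rs$.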

Combining with the case when $s = []$, we obtain 
\begin{center}
	\begin{tabular}{lcll}
		$\rsize{(\sum rs) \backslash_{rsimps} s}$ & $\leq$ & $max \; \rsize{\sum rs} \; (1+N*(length \; rs))$ 
						 & (12)\\
	\end{tabular}
\end{center}
This completes all the inductive cases.
\end{proof}

This leads to our main result on the size bound:
\begin{corollary}
	For any annotated regular expression $a$, $\exists N_r. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_r$
\end{corollary}
\begin{proof}
	By lemma \ref{sizeRelations} and theorem \ref{rBound}.
\end{proof}
\noindent



%-----------------------------------
%	SECTION 2
%-----------------------------------

\section{Bounded Repetitions}
We promised in chapter \ref{Introduction}
that our lexing algorithm can potentially be extended
to handle bounded repetitions
in natural and elegant ways.
We now fulfill this promise by adding support for 
the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
We add clauses to the derivative-based lexing algorithms (with simplifications)
introduced in chapter \ref{Bitcoded2}.

\subsection{Augmented Definitions}
There are a number of definitions that need to be augmented.
The most notable one is the POSIX rule for $r^{\{n\}}$:
\begin{center}
	\begin{mathpar}
		\inferrule{\forall v \in vs_1. \vdash v:r \land 
		|v| \neq []\\ \forall v \in vs_2. \vdash v:r \land |v| = []\\
		\textit{length} \; (vs_1 @ vs_2) = n}{\textit{Stars} \;
		(vs_1 @ vs_2) : r^{\{n\}} }
	\end{mathpar}
\end{center}
As Ausaf pointed out~\cite{Ausaf},
empty iterations sometimes have to be taken to get
a match with exactly $n$ repetitions,
hence the $vs_2$ part.
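
As a small illustration (an example of our own; the value constructors
$\textit{Left}$, $\textit{Right}$, $\textit{Empty}$ and $\textit{Char}$ are assumed
to be as in the earlier chapters), consider matching the string $a$
against $(a + \mathbf{1})^{\{2\}}$.
One iteration consumes the character and the other has to be empty:
\begin{center}
	\begin{tabular}{lcl}
		$vs_1$ & $=$ & $[\textit{Left} \; (\textit{Char} \; a)]$\\
		$vs_2$ & $=$ & $[\textit{Right} \; \textit{Empty}]$\\
		$\textit{Stars} \; (vs_1 @ vs_2)$ & $:$ & $(a + \mathbf{1})^{\{2\}}$\\
	\end{tabular}
\end{center}
\noindent
Here $|\textit{Left} \; (\textit{Char} \; a)| = [a] \neq []$,
$|\textit{Right} \; \textit{Empty}| = []$ and $\textit{length} \; (vs_1 @ vs_2) = 2$,
so all three premises of the rule are met.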

Another important definition is the size:
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket r^{\{n\}} \rrbracket_r$ & $\dn$ & 
		$\llbracket r \rrbracket_r + n$\\
	\end{tabular}
\end{center}
\noindent
Arguably we should use $\log \; n$ for the size because
the number of digits increases logarithmically w.r.t.\ $n$.
For simplicity we choose to add the counter directly to the size.

The derivative of a bounded regular expression w.r.t.\ a character $c$
is given as 
\begin{center}
	\begin{tabular}{lcl}
		$r^{\{n\}} \backslash_r c$ & $\dn$ & 
		$r\backslash_r c \cdot r^{\{n-1\}} \;\; \textit{if} \; n \geq 1$\\
					   & & $\RZERO \;\quad \quad\quad \quad
					   \textit{otherwise}$\\
	\end{tabular}
\end{center}
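
For example, unfolding this clause for the character $a$ and the counters $2$, $1$ and $0$
gives the following (a sketch of ours, using only the clause above and the fact that
$a \backslash_r a = \RONE$):
\begin{center}
	\begin{tabular}{lcl}
		$a^{\{2\}} \backslash_r a$ & $=$ & $(a \backslash_r a) \cdot a^{\{1\}} \;=\; \RONE \cdot a^{\{1\}}$\\
		$a^{\{1\}} \backslash_r a$ & $=$ & $(a \backslash_r a) \cdot a^{\{0\}} \;=\; \RONE \cdot a^{\{0\}}$\\
		$a^{\{0\}} \backslash_r a$ & $=$ & $\RZERO$\\
	\end{tabular}
\end{center}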
\noindent
For brevity, we sometimes use NTIMES to refer to bounded 
regular expressions.
The $\mkeps$ function clause for NTIMES is
\begin{center}
	\begin{tabular}{lcl}
		$\mkeps \; r^{\{n\}} $ & $\dn$ & $\Stars \;
		(\textit{replicate} \; n\; (\mkeps \; r))$
	\end{tabular}
\end{center}
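
For instance, taking $r = a^*$ and $n = 2$, and assuming the usual clause
$\mkeps \; a^* = \Stars \; []$ from the earlier chapters, we would obtain
\begin{center}
	\begin{tabular}{lcl}
		$\mkeps \; (a^*)^{\{2\}}$ & $=$ & $\Stars \; (\textit{replicate} \; 2 \; (\Stars \; []))$\\
		& $=$ & $\Stars \; [\Stars \; [],\; \Stars \; []]$\\
	\end{tabular}
\end{center}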
\noindent
The injection clause is
\begin{center}
	\begin{tabular}{lcl}
		$\inj \; r^{\{n\}} \; c\; (\Seq \;v \; (\Stars \; vs)) $ & 
		$\dn$ & $\Stars \;
		((\inj \; r \;c \;v ) :: vs)$
	\end{tabular}
\end{center}
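
As a sketch of how this clause is used (an example of our own; the value constructors
$\textit{Empty}$ and $\textit{Char}$ and the clause
$\inj \; a \; a \; \textit{Empty} = \textit{Char} \; a$ are assumed to be as in the earlier chapters),
consider lexing the string $aa$ with $a^{\{2\}}$.
After the first character the derivative is $\mathbf{1} \cdot a^{\{1\}}$, and the value for the
remaining $a$ is $\Seq \; \textit{Empty} \; (\Stars \; [\textit{Char} \; a])$.
Injecting the first character back yields
\begin{center}
	\begin{tabular}{lcl}
		$\inj \; a^{\{2\}} \; a \; (\Seq \; \textit{Empty} \; (\Stars \; [\textit{Char} \; a]))$
		& $=$ & $\Stars \; ((\inj \; a \; a \; \textit{Empty}) :: [\textit{Char} \; a])$\\
		& $=$ & $\Stars \; [\textit{Char} \; a,\; \textit{Char} \; a]$\\
	\end{tabular}
\end{center}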
\noindent


\subsection{Proofs for the Augmented Lexing Algorithm}
We need to maintain two proofs with the additional $r^{\{n\}}$
construct: the 
correctness proof in chapter \ref{Bitcoded2},
and the finiteness proof in chapter \ref{Finite}.

\subsubsection{Correctness Proof Augmentation}
The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions
has been proven by Ausaf and Urban~\cite{AusafDyckhoffUrban2016}.
As they commented, once the definitions are in place,
the proofs given for the basic regular expressions extend to
bounded regular expressions, and there are no ``surprises''.
We confirm this point: the correctness theorem also
extends without surprise to $\blexersimp$.
The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
do not need to be changed,
and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted by 
adding one more line for the $r^{\{n\}}$ inductive case,
which can be solved by the Sledgehammer tool.

\subsubsection{Finiteness Proof Augmentation}
Bounded repetitions are
very similar to stars, and therefore the treatment
is similar, with minor changes to handle the slight complications that arise
when the counter reaches 0.
The exponential growth is similar:
\begin{center}
	\begin{tabular}{ll}
		$r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\
		$(r\backslash c)  \cdot  
		r^{\{n - 1\}}$ & $\longrightarrow_{\backslash c'}$\\
		\\
		$r \backslash cc'  \cdot r^{\{n - 1\}} + 
		r \backslash c' \cdot r^{\{n - 2\}}$ &
		$\longrightarrow_{\backslash c''}$\\
		\\
		$(r \backslash cc'c'' \cdot r^{\{n-1\}} + 
		r \backslash c''\cdot r^{\{n-2\}}) + 
		(r \backslash c'c'' \cdot r^{\{n-2\}} + 
		r \backslash c'' \cdot r^{\{n-3\}})$ & 
		$\longrightarrow_{\backslash c'''}$ \\
		\\
		$\ldots$\\
	\end{tabular}
\end{center}
Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on
are all nullable.
The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$
\begin{center}
	$[r \backslash cc'c'' \cdot r^{\{n-1\}},\;
	r \backslash c''\cdot r^{\{n-2\}}, \; 
	r \backslash c'c'' \cdot r^{\{n-2\}}, \;
	r \backslash c'' \cdot r^{\{n-3\}},\; \ldots ]$  
\end{center}
that comes from 
\begin{center}
		$(r \backslash cc'c'' \cdot r^{\{n-1\}} + 
		r \backslash c''\cdot r^{\{n-2\}}) + 
		(r \backslash c'c'' \cdot r^{\{n-2\}} + 
		r \backslash c'' \cdot r^{\{n-3\}})+ \ldots$ 
\end{center}
is made of sequences with different tails, where the counters
might differ.
The observation for maintaining the bound is that
these counters never exceed $n$, the original
counter. Since the counters can take only finitely many values,
$\rDistinct$ will deduplicate and keep the size of the list bounded.
We introduce this idea as a lemma once we describe all
the necessary helper functions.

Similar to the star case, we want
\begin{center}
	$\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$,
\end{center}
where $rs$
shall be in the form of 
$\map \; f \; Ss$, where $f$ is a function and
$Ss$ a list of objects to act on.
For star, the object's datatype is string.
The list of strings $Ss$
is generated using the functions 
$\starupdate$ and $\starupdates$.
The function that takes a string and returns a regular expression
is the anonymous function $
(\lambda s'. \; r\backslash s' \cdot r^*)$.
In the NTIMES setting,
the $\starupdate$ and $\starupdates$ functions are replaced by 
$\textit{nupdate}$ and $\textit{nupdates}$:
\begin{center}
	\begin{tabular}{lcl}
		$\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
		$\nupdate \; c \; r \; 
		(\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ &
						     $\textit{if} \; 
						     (\rnullable \; (r \backslash_{rs} s))$ \\
						     & & $\;\;\textit{then} 
						     \;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: (
						     \nupdate \; c \; r \; Ss)$ \\
						     & & $\;\;\textit{else} \;\; \Some \; (s @ [c], n+1) :: (
						     \nupdate \; c \; r \; Ss)$\\
		$\nupdate \; c \; r \; (\None :: Ss)$ & $\dn$ & 
		$(\None :: \nupdate  \; c \; r \; Ss)$\\
							      & & \\
		$\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
		$\nupdates \; (c :: cs) \; r \; Ss$ &  $\dn$ &  $\nupdates \; cs \; r \; (
		\nupdate \; c \; r \; Ss)$
	\end{tabular}
\end{center}
\noindent
which take into account the case where the counter $n$ of a subterm
\begin{center}
	$r \backslash_{rs} s \cdot r^{\{n\}}$
\end{center}
is 0, so that the subterm expands to 
\begin{center}
$r \backslash_{rs} (s@[c]) \cdot r^{\{n\}} \;+
\; \ZERO$ 
\end{center}
after taking a derivative.
The object now has type 
\begin{center}
$\textit{option} \;(\textit{string}, \textit{nat})$
\end{center}
and therefore the function for converting such an option into
a regular expression term is called $\opterm$:

\begin{center}
	\begin{tabular}{lcl}
	$\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; \textit{of}$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (r\backslash_{rs} s)\cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
	\end{tabular}
\end{center}
\noindent
Put together, the list $\map \; f \; Ss$ is instantiated as
\begin{center}
	$\map \; (\opterm \; r) \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])$.
\end{center}
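\noindent
As a small worked instance of these definitions (an illustration only: the
counter $3$ and the nullability of $r\backslash_{rs}[c]$, $r\backslash_{rs}[c']$ and
$r\backslash_{rs}[c, c']$ are assumptions made for the example),
take the initial list $[\Some \; ([c], 3)]$ and the string $[c', c'']$.
Unfolding $\nupdates$ and $\nupdate$ gives
\begin{center}
	\begin{tabular}{lcl}
		$\nupdate \; c' \; r \; [\Some \; ([c], 3)]$ & $=$ &
		$[\Some \; ([c, c'], 3), \; \Some \; ([c'], 2)]$\\
		$\nupdates \; [c', c''] \; r \; [\Some \; ([c], 3)]$ & $=$ &
		$\nupdate \; c'' \; r \; [\Some \; ([c, c'], 3), \; \Some \; ([c'], 2)]$\\
		& $=$ & $[\Some \; ([c, c', c''], 3), \; \Some \; ([c''], 2),$\\
		& & $\;\Some \; ([c', c''], 2), \; \Some \; ([c''], 1)]$\\
	\end{tabular}
\end{center}
\noindent
The second and fourth entries share the string $[c'']$ but carry different
counters, which is why the counter is recorded alongside the string.
Mapping $\opterm \; r$ over the result yields
\begin{center}
	$[(r\backslash_{rs} [c, c', c'']) \cdot r^{\{3\}}, \;
	(r\backslash_{rs} [c'']) \cdot r^{\{2\}}, \;
	(r\backslash_{rs} [c', c'']) \cdot r^{\{2\}}, \;
	(r\backslash_{rs} [c'']) \cdot r^{\{1\}}]$.
\end{center}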
For the closed form to be bounded, we would like
simplification to be applied to each term in the list.
Therefore we introduce some variants of $\opterm$,
which help conveniently express the rewriting steps 
needed in the closed form proof.
The variants $\optermOsimp$, $\optermosimp$ and $\optermsimp$
differ only in where simplification is applied, and having all three helps the proof to go through:
\begin{center}
	\begin{tabular}{lcl}
	$\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; \textit{of}$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 \textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
				 \\
	$\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; \textit{of}$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (\textit{rsimp} \; (r\backslash_{rs} s)) 
				 \cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
				 \\
	$\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; \textit{of}$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
	\end{tabular}
\end{center}

For a list of 
$\textit{option} \;(\textit{string}, \textit{nat})$ elements,
we define its highest power recursively:
\begin{center}
	\begin{tabular}{lcl}
		$\hpa \; [] \; n $ & $\dn$ & $n$\\
		$\hpa \; (\None :: os) \; n $ &  $\dn$ &  $\hpa \; os \; n$\\
		$\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & 
		$\hpa \;os \; (\textit{max} \; n\; m)$\\
		\\
		$\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\
	\end{tabular}
\end{center}
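\noindent
For instance, on a three-element list chosen only for illustration
(the strings $s_1$ and $s_2$ are arbitrary), the $\None$ entry is skipped
and the accumulator keeps the largest counter seen so far:
\begin{center}
	\begin{tabular}{lcl}
		$\hpower \; [\Some \; (s_1, 3), \; \None, \; \Some \; (s_2, 2)]$
		& $=$ & $\hpa \; [\Some \; (s_1, 3), \; \None, \; \Some \; (s_2, 2)] \; 0$\\
		& $=$ & $\hpa \; [\None, \; \Some \; (s_2, 2)] \; (\textit{max} \; 3 \; 0)$\\
		& $=$ & $\hpa \; [\Some \; (s_2, 2)] \; 3$\\
		& $=$ & $\hpa \; [] \; (\textit{max} \; 2 \; 3)$\\
		& $=$ & $3$\\
	\end{tabular}
\end{center}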

Now the intuition that an NTIMES regular expression's power
does not increase can be easily expressed as
\begin{lemma}\label{nupdatesMono2}
	$\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$
\end{lemma}
\begin{proof}
	Note that the power is non-increasing after a $\nupdate$ application:
	\begin{center}
		$\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq 
		 \hpa\; \; Ss \; m$.
	\end{center}
	This is also the case for $\nupdates$:
	\begin{center}
		$\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq 
		 \hpa\; \; Ss \; m$.
	\end{center}
	Therefore we have that
	\begin{center}
		$\hpower \;\; (\nupdates \; s \; r \; Ss) \leq
		 \hpower \;\; Ss$
	\end{center}
	which, instantiated with $Ss = [\Some \; ([c], n)]$, whose highest power is $n$,
	proves the lemma.
\end{proof}

We also define the inductive rules for
the shape of derivatives of the NTIMES regular expressions:\\[-3em]
\begin{center}
	\begin{mathpar}
		\inferrule{\mbox{}}{\cbn \;\ZERO}

		\inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})}

		\inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2}

		\inferrule{\cbn \; r}{\cbn \; r + \ZERO}
	\end{mathpar}
\end{center}
\noindent
A derivative of NTIMES fits into the shape described by $\cbn$:
\begin{lemma}\label{ntimesDersCbn}
	$\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds.
\end{lemma}
\begin{proof}
	By a reverse induction on $s$.
	For the inductive case, note that if $\cbn \; r$ holds,
	then $\cbn \; (r\backslash_r c)$ holds.
\end{proof}
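\noindent
To illustrate the inductive step, consider one derivative of a term of the
second shape, $r_a \cdot r^{\{n+1\}}$ with $r_a$ nullable.
This sketch assumes the usual derivative clauses for sequences and bounded
repetitions, which are not restated in this section.
Both summands of the result are again instances of the second rule, so the
sum satisfies $\cbn$ by the third rule.
When $r_a$ is not nullable only the first summand remains, and when the
counter is $0$ the second summand is $\ZERO$, which the fourth rule covers:
\begin{center}
	$(r_a \cdot r^{\{n+1\}}) \backslash_r c \;=\;
	(r_a \backslash_r c) \cdot r^{\{n+1\}} \;+\; (r \backslash_r c) \cdot r^{\{n\}}$
\end{center}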
\noindent
In addition, for $\cbn$-shaped regular expressions, flattening
can be pushed inside the derivative:
\begin{lemma}\label{ntimesHfauPushin}
	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
	\textit{concat} \; (\map \; \hflataux{\_} \; (\map \; (\_\backslash_r c) \;
	(\hflataux{r})))$
\end{lemma}
\begin{proof}
	By an induction on the inductive cases of $\cbn$.
\end{proof}
\noindent
This time we do not need to define flattening functions specifically for NTIMES,
because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already.
\begin{lemma}\label{ntimesHfauInduct}
$\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rs} s} = 
 \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
\end{lemma}
\begin{proof}
	By a reverse induction on $s$.
	Lemmas \ref{ntimesHfauPushin} and \ref{ntimesDersCbn} are used.
\end{proof}
\noindent
We have a recursive property for NTIMES with $\nupdate$ 
similar to that for STAR,
and one for $\nupdates $ as well:
\begin{lemma}\label{nupdateInduct1}
	\mbox{}
	\begin{itemize}
		\item
			\begin{center}
	 $\textit{concat} \; (\map \; (\hflataux{\_} \circ (\_\backslash_r c) \circ (
	\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
	c \; r \; Ss)$
	\end{center}
	holds.
\item
	\begin{center}
	 $\textit{concat} \; (\map \; \hflataux{\_}\; 
	(\map \; (\_\backslash_r x) \;
		(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))))$\\
		$=$\\
	$\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ 
	\end{center}
	holds.
	\end{itemize}
\end{lemma}
\begin{proof}
	The first statement is by an induction on $Ss$.
	The second is by an induction on $xs$.
\end{proof}
\noindent
The $\nString$ predicate is a convenient way of
expressing that there are no empty strings in the
$\Some \;(s, n)$ elements generated by $\nupdate$:
\begin{center}
	\begin{tabular}{lcl}
		$\nString \; \None$  & $\dn$ & $ \textit{true}$\\
		$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
	\end{tabular}
\end{center}
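\noindent
For instance, with a hypothetical character $a$ and counter $2$, both chosen
purely for illustration, unfolding the three clauses gives
\begin{center}
	\begin{tabular}{lcl}
		$\nString \; (\Some \; ([a], 2))$ & $=$ & $\textit{true}$\\
		$\nString \; (\Some \; ([], 2))$ & $=$ & $\textit{false}$\\
		$\nString \; \None$ & $=$ & $\textit{true}$\\
	\end{tabular}
\end{center}
so only a $\Some$ element carrying the empty string violates the predicate;
the counter plays no role.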
\begin{lemma}\label{nupdatesNonempty}
	If for all elements $o \in \textit{set} \; Ss$,
	$\nString \; o$ holds, then we have that
	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
	$\nString \; o'$ holds.
\end{lemma}
\begin{proof}
	By an induction on $s$, generalising over $Ss$.
\end{proof}

\noindent

\begin{lemma}\label{ntimesClosedFormsSteps}
	The following equalities and rewriting relations hold:\\
	(i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = 
	\textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \;
	s \; r \; [\Some \; ([c], n)])))$\\
	(ii)
	\begin{center}
	$\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [
	\Some \; ([c], n)]))$ \\ $ \sequal$\\
	 $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \;
	 s\;r \; [\Some \; ([c], n)]))$\\
 	\end{center}
	(iii)
	\begin{center}
	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
	([c], n)]))$\\ 
	$\sequal$\\
	 $\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \;
	([c], n)])) $\\
	\end{center}
	(iv)
	\begin{center}
	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
	([c], n)])) $ \\ $\sequal$\\
	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
	([c], n)])) $\\
	\end{center}
	(v)
	\begin{center}
	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
	 ([c], n)])) $ \\ $\sequal$\\
	  $\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \;
	  (\nupdates \; s \; r \; [\Some \; ([c], n)]))$
  	\end{center}
\end{lemma}
\begin{proof}
	Routine.
	(iii) and (iv) make use of the fact that all the strings $s'$
	inside the elements $\Some \; (s', m)$ of the list
	$\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty,
	which follows from lemma \ref{nupdatesNonempty}.
	Once the string in $o = \Some \; (s', m)$ is 
	non-empty, $\optermsimp \; r \;o$,
	$\optermosimp \; r \; o$ and $\optermOsimp \; r \; o$ are guaranteed
	to be equal.
	(v) uses lemma \ref{nupdateInduct1}.
\end{proof}
\noindent
Now we are ready to present the closed form for NTIMES:
\begin{theorem}\label{ntimesClosedForm}
	The derivative of $r^{\{n+1\}}$ can be described as an alternative
	containing a list
	of terms:\\
	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
	\sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])))$
\end{theorem}
\begin{proof}
	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
\end{proof}
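\noindent
As a sanity check, consider the shortest possible input, the single-character
string $[c]$. The sketch below assumes that $\nupdates$ returns its list
argument unchanged on the empty string, and that $\optermsimp \; r$ maps
$\Some \; (s', m)$ to $(r \backslash_{rsimps} s') \cdot r^{\{m\}}$.
Neither definition is restated in this section, so both should be read
as assumptions:
\begin{center}
	\begin{tabular}{lcl}
		$r^{\{n+1\}} \backslash_{rsimps} [c]$ 
		& $=$ & $\textit{rsimp} \; (\sum (\map \; (\optermsimp \; r) \; (\nupdates \; [] \; r \; [\Some \; ([c], n)])))$\\
		& $=$ & $\textit{rsimp} \; (\sum (\map \; (\optermsimp \; r) \; [\Some \; ([c], n)]))$\\
		& $=$ & $\textit{rsimp} \; (\sum [(r \backslash_{rsimps} [c]) \cdot r^{\{n\}}])$\\
	\end{tabular}
\end{center}
Under these assumptions the alternative contains a single sequence, whose head
is a derivative of $r$ and whose counter has dropped from $n+1$ to $n$.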
\noindent
The key observation for bounding this closed form
is that the counter on $r^{\{n\}}$ never 
increases when derivatives are taken:
\begin{lemma}\label{nupdatesNLeqN}
	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
	\; (s', m)$ for some string $s'$ and number $m \leq n$.
\end{lemma}
\noindent
The proof is routine and therefore omitted.
This allows us to characterise the terms
in the set $\textit{set} \; (\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)]))$:
each of them is either $\ZERO_r$ or a sequence whose tail is $r^{\{m\}}$
for some $m \leq n$.
\begin{lemma}\label{ntimesClosedFormListElemShape}
	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
\end{lemma}
\begin{proof}
	Using lemma \ref{nupdatesNLeqN}.
\end{proof}

\begin{theorem}\label{ntimesClosedFormBounded}
	Assume that for any string $s$, $\llbracket r \backslash_{rsimps} s
	\rrbracket_r \leq N$ holds. Then we have that\\
	$\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq
	\textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket_r+1)$,
	where $c_N = \textit{card} \; (\textit{sizeNregex} \; (
	N + \llbracket r^{\{n\}} \rrbracket_r+1))$.
\end{theorem}
\begin{proof}
We have that for all regular expressions $r'$ in 
\begin{center}
$\textit{set} \; (\map \; (\optermsimp \; r) \; (
	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
\end{center}
	the size of $r'$ is less than or equal to $N + \llbracket r^{\{n\}} 
	\rrbracket_r + 1$
because $r'$ can only be either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
r^{\{m\}}$ for some string $s'$ and number 
$m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}).
In addition, we know that the list 
$\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)])$ contains at most
$c_N = \textit{card} \; 
(\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket_r) + 1))$
distinct elements.
Together with the closed form in theorem \ref{ntimesClosedForm}, this gives us
$\llbracket r^{\{n+1\}} \backslash_{rsimps} (c::s) \rrbracket_r
\leq (c_N + 1) * (N + \llbracket r^{\{n\}} \rrbracket_r + 1)$.
\end{proof}
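\noindent
The arithmetic behind this bound can be spelled out as follows. This is only
a sketch: it assumes, as for the other constructors, that the size of an
alternative is one plus the sum of the sizes of its children, and that
$\textit{rsimp}$ leaves at most $c_N$ distinct children in the alternative,
each no larger than before. Writing $M$ as a shorthand (introduced here for
illustration only) for $N + \llbracket r^{\{n\}} \rrbracket_r + 1$, we get
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket r^{\{n+1\}} \backslash_{rsimps} (c::s) \rrbracket_r$
		& $\leq$ & $1 + c_N * M$\\
		& $\leq$ & $M + c_N * M$\\
		& $=$ & $(c_N + 1) * M$\\
	\end{tabular}
\end{center}
where the second step uses $M \geq 1$.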

We aim to formalise the correctness and size bounds
for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
and so on; this is still work in progress.
We expect them to follow more or less the same recipe as described in this section:
once we know how to deal with them recursively using suitable auxiliary
definitions, the proofs can be established routinely.


%----------------------------------------------------------------------------------------
%	SECTION 3
%----------------------------------------------------------------------------------------


\section{Comments and Future Improvements}
\subsection{Some Experimental Results}
What guarantee does this bound give us?
It states that whatever the regex is, the size of its derivatives will not grow indefinitely.
Take our previous example $(a + aa)^*$:
\begin{center}
	\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
		\begin{tikzpicture}
			\begin{axis}[
				xlabel={number of $a$'s},
				x label style={at={(1.05,-0.05)}},
				ylabel={regex size},
				enlargelimits=false,
				xtick={0,5,...,30},
				xmax=33,
				ymax= 40,
				ytick={0,10,...,40},
				scaled ticks=false,
				axis lines=left,
				width=5cm,
				height=4cm, 
				legend entries={$(a + aa)^*$},  
				legend pos=south east,
				legend cell align=left]
				\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star.data};
			\end{axis}
		\end{tikzpicture}
	\end{tabular}
\end{center}
Our simplification rules limit the size of the derivatives of $(a + aa)^*$
very effectively.


In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound
is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$.
Given that $l_{N_2}$ is roughly of size $4^{N_2}$, the size bound for $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$
inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ at least as fast as the function
$f(x) = x * 2^x$.
This means the bound we have will surge up at least
tower-exponentially with a linear increase in the nesting depth of the regular expression.
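\noindent
To get a feeling for how quickly such a tower grows, one can iterate
$f(x) = x * 2^x$ by hand, starting from the arbitrarily chosen value $1$
(the starting value is for illustration only and does not correspond
to a particular regular expression):
\begin{center}
	\begin{tabular}{lcl}
		$f(1)$ & $=$ & $1 * 2^1 = 2$\\
		$f(2)$ & $=$ & $2 * 2^2 = 8$\\
		$f(8)$ & $=$ & $8 * 2^8 = 2048$\\
		$f(2048)$ & $=$ & $2048 * 2^{2048} = 2^{2059}$\\
	\end{tabular}
\end{center}
Four nested applications already produce a number with more than $600$ decimal digits.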

One might be rather sceptical about what this non-elementary
bound can bring us.
It turns out that in practice the sizes stay far below these giant bounds.
Here we have some test data from randomly generated regular expressions:
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3316
\begin{figure}[H]
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3317
	\begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}}
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3318
		\begin{tikzpicture}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3319
			\begin{axis}[
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3320
				xlabel={$n$},
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3321
				x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3322
				ylabel={regex size},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3323
				enlargelimits=false,
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3324
				xtick={0,5,...,30},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3325
				xmax=33,
611
Chengsong
parents: 610
diff changeset
  3326
				%ymax=1000,
Chengsong
parents: 610
diff changeset
  3327
				%ytick={0,100,...,1000},
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3328
				scaled ticks=false,
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3329
				axis lines=left,
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3330
				width=4.75cm,
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3331
				height=3.8cm, 
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3332
				legend entries={regex1},  
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3333
				legend pos=north east,
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3334
				legend cell align=left]
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3335
				\addplot[red,mark=*, mark options={fill=white}] table {regex1_size_change.data};
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3336
			\end{axis}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3337
		\end{tikzpicture}
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3338
 & 
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3339
  \begin{tikzpicture}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3340
	  \begin{axis}[
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3341
		  xlabel={$n$},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3342
		  x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3343
		  %ylabel={time in secs},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3344
		  enlargelimits=false,
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3345
		  xtick={0,5,...,30},
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3346
		  xmax=33,
611
Chengsong
parents: 610
diff changeset
  3347
		  %ymax=1000,
Chengsong
parents: 610
diff changeset
  3348
		  %ytick={0,100,...,1000},
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3349
		  scaled ticks=false,
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3350
		  axis lines=left,
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3351
		  width=4.75cm,
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3352
		  height=3.8cm, 
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3353
		  legend entries={regex2},  
618
233cf2b97d1a chapter 5 finished!!
Chengsong
parents: 614
diff changeset
  3354
		  legend pos=south east,
593
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3355
		  legend cell align=left]
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3356
		  \addplot[blue,mark=*, mark options={fill=white}] table {regex2_size_change.data};
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3357
	  \end{axis}
83fab852d72d more chap5
Chengsong
parents: 591
diff changeset
  3358
  \end{tikzpicture}
 & 
  \begin{tikzpicture}
	  \begin{axis}[
		  xlabel={$n$},
		  x label style={at={(1.05,-0.05)}},
		  %ylabel={time in secs},
		  enlargelimits=false,
		  xtick={0,5,...,30},
		  xmax=33,
		  %ymax=1000,
		  %ytick={0,100,...,1000},
		  scaled ticks=false,
		  axis lines=left,
		  width=4.75cm,
		  height=3.8cm, 
		  legend entries={regex3},  
		  legend pos=south east,
		  legend cell align=left]
		  \addplot[cyan,mark=*, mark options={fill=white}] table {regex3_size_change.data};
	  \end{axis}
  \end{tikzpicture}\\
  \multicolumn{3}{c}{}
	\end{tabular}    
  \caption{Size change of three randomly generated
  regular expressions w.r.t.\ the length of the input string.
  The x-axis represents the length of the input.}
\end{figure}  
\noindent
Most of the regular expressions' sizes seem to stay within a polynomial bound w.r.t.\ the
original size.
We will discuss improvements to this bound in the next chapter.



\subsection{Possible Further Improvements}
There are two problems with this finiteness result, though:\\
(i)	
		First, it is not yet a direct formalisation of our lexer's complexity,
		as a complexity proof would require looking into 
		the time it takes to execute {\bf all} the operations
		involved in the lexer (simp, collect, decode), not just the derivative.\\
(ii)
		Second, the bound is not yet tight, and we seek to improve $N_a$ so that
		it is polynomial in $\llbracket a \rrbracket$.\\
Still, we believe this contribution is useful,
because
\begin{itemize}
	\item
		The size proof can serve as a starting point for a complexity
		formalisation.
		Taking derivatives is the most important phase of our lexer algorithm.
		Size properties of derivatives cover the majority of the algorithm
		and are therefore a good indication of the complexity of the entire program.
	\item
		The bound is already a strong indication that catastrophic
		backtracking is much less likely to occur in our $\blexersimp$
		algorithm.
		We refine $\blexersimp$ to $\blexerStrong$ in the next chapter,
		for which we conjecture that the bound becomes polynomial.
\end{itemize}

%----------------------------------------------------------------------------------------
%	SECTION 4
%----------------------------------------------------------------------------------------

One might wonder about the actual bound rather than the loose bound we gave
for the convenience of a more straightforward proof.
How much can the regex $r^* \backslash s$ grow? 
As earlier graphs have shown,
%TODO: reference that graph where size grows quickly
its size can grow at most exponentially fast
w.r.t.\ the number of characters read,
but will eventually level off when the string $s$ is long enough.
If it grows to a size exponential w.r.t.\ the original regex, our algorithm
would still be slow.
Unfortunately, we have concrete examples
where such regular expressions grow exponentially large before levelling off.
For example, the regular expression
\begin{center}
$(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + 
(\underbrace{a \ldots a}_{\text{n a's}})^*)^*$ 
\end{center}
already has a maximum
size that is exponential in the number $n$ 
under our current simplification rules:
%TODO: graph of a regex whose size increases exponentially.
\begin{center}
	\begin{tikzpicture}
		\begin{axis}[
			height=0.5\textwidth,
			width=\textwidth,
			xlabel=number of a's,
			xtick={0,...,9},
			ylabel=maximum size,
			ymode=log,
			log basis y={2}
			]
			\addplot[mark=*,blue] table {re-chengsong.data};
		\end{axis}
	\end{tikzpicture}
\end{center}

For convenience we write $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$
for $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots + 
(\underbrace{a \ldots a}_{\text{n a's}})^*)^*$ in the discussion below.
The exponential size is caused by the regex
$\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$
inside the $(\ldots) ^*$ having exponentially many
different derivatives, despite those differences being minor.
The derivative
$(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$
will therefore contain the following terms (after flattening out all nested 
alternatives):
\begin{center}
$(\sum_{i = 1}^{n}  (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot  (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$\\[1mm]
	$(1 \leq m' \leq m )$
\end{center}
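\noindent
As a small illustration (a hand computation added here, not taken from the
formalisation), consider the case $n = 3$. Evaluating the exponent
$(i - (m' \;\%\; i)) \;\%\; i$ from the term above gives the following number of
leading $a$'s in front of each starred component:
\begin{center}
	\begin{tabular}{c|ccc}
		$m'$ & $i = 1$ & $i = 2$ & $i = 3$\\
		\hline
		$1$ & $0$ & $1$ & $2$\\
		$2$ & $0$ & $0$ & $1$\\
		$3$ & $0$ & $1$ & $0$\\
		$4$ & $0$ & $0$ & $2$\\
		$5$ & $0$ & $1$ & $1$\\
		$6$ & $0$ & $0$ & $0$\\
	\end{tabular}
\end{center}
\noindent
The six rows are pairwise different, and the row for $m' = 7$ is the same as
the one for $m' = 1$. The pattern therefore has period $6$, which is the least
common multiple of $1$, $2$ and $3$.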
There are at least exponentially
many such terms.\footnote{To be exact, these terms are 
distinct for $m' \leq \textit{lcm}(1, \ldots, n)$. The details are omitted,
but the point is that this number is exponential in $n$.
} 
With each new input character, the derivative of the intermediate result is taken, and more and more such distinct
terms accumulate.
The function $\textit{distinctBy}$ will not be able to de-duplicate any two of these terms 
\begin{center}
$(\sum_{i = 1}^{n}  
(\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot  
(\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot 
(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$\\
$(\sum_{i = 1}^{n}  (\underbrace{a \ldots a}_{\text{((i - (m'' \% i))\%i) a's}})\cdot  
(\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot 
(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$
\end{center}
\noindent
where $m' \neq m''$,
as they are syntactically different, even if only slightly.
This means that with our current simplification methods,
we will not be able to control the derivative so that
$\llbracket \bderssimp{r}{s} \rrbracket$ stays polynomial in $\llbracket r \rrbracket$. %\leq O((\llbracket r\rrbracket)^c)$
These terms are similar in the sense that the heads of those terms
all consist of sub-terms of the form 
$(\underbrace{a \ldots a}_{\text{j a's}})\cdot  (\underbrace{a \ldots a}_{\text{i a's}})^* $.
For  $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$, there will be at most
$n * (n + 1) / 2$ such sub-terms, since for each $i$ the number $j$ ranges over $0, \ldots, i - 1$. 
For example, the derivatives of $(a^* + (aa)^* + (aaa)^*) ^*$
can be described by 6 sub-terms:
$a^*$, $a\cdot (aa)^*$, $ (aa)^*$, 
$aa \cdot (aaa)^*$, $a \cdot (aaa)^*$, and $(aaa)^*$.
The total number of different ``head terms'',  $n * (n + 1) / 2$,
is proportional to the number of characters in the regex 
$(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$.
If we can improve our deduplication process so that it becomes smarter
and only keeps track of these $n * (n+1) /2$ terms, then we can keep
the size growth polynomial again.
This example also suggests a slightly different notion of size, which we call the 
alphabetic width:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{awidth} \; \ZERO$ & $\dn$ & $0$\\
		$\textit{awidth} \; \ONE$ & $\dn$ & $0$\\
		$\textit{awidth} \; c$ & $\dn$ & $1$\\
		$\textit{awidth} \; (r_1 + r_2)$ & $\dn$ & $\textit{awidth} \; 
		r_1 + \textit{awidth} \; r_2$\\
		$\textit{awidth} \; (r_1 \cdot r_2)$ & $\dn$ & $\textit{awidth} \;
		r_1 + \textit{awidth} \; r_2$\\
		$\textit{awidth} \; (r^*)$ & $\dn$ & $\textit{awidth} \; r$\\
	\end{tabular}
\end{center}
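\noindent
As a quick check of this definition (a hand computation added for
illustration), unfolding it on the $n = 3$ instance of the example above gives
\begin{center}
	\begin{tabular}{lcl}
		$\textit{awidth} \; ((a^* + (aa)^* + (aaa)^*)^*)$ & $=$ & $\textit{awidth} \; (a^*) + \textit{awidth} \; ((aa)^*) + \textit{awidth} \; ((aaa)^*)$\\
		& $=$ & $1 + 2 + 3$\\
		& $=$ & $6$\\
	\end{tabular}
\end{center}
\noindent
which coincides with the $n * (n + 1) / 2 = 6$ head terms counted for this
regular expression.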


Antimirov~\parencite{Antimirov95} has proven that 
$|\textit{PDER}_{UNIV}(r)| \leq \textit{awidth}(r)$,
where $\textit{PDER}_{UNIV}(r)$ is the set of all possible subterms
created by taking derivatives of $r$ against all possible strings.
If we can make sure that at any moment in our lexing algorithm our 
intermediate result holds at most one copy of each of these 
subterms, then we can get the same bound as Antimirov's.
This leads to the algorithm in the next chapter.

%----------------------------------------------------------------------------------------
%	SECTION 1
%----------------------------------------------------------------------------------------


%-----------------------------------
%	SUBSECTION 1
%-----------------------------------
%\subsection{Syntactic Equivalence Under $\simp$}
%We prove that minor differences can be annihilated
%by $\simp$.
%For example,
%\begin{center}
%	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = 
%	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
%\end{center}