lexing: ChengsongTanPhdThesis/Chapters/Finite.tex@ba44144875b1 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	\chapter{Finiteness Bound} % Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4
cc54ce075db5 restructured Chengsong parents: diff changeset	5	\label{Finite}
cc54ce075db5 restructured Chengsong parents: diff changeset	6	% In Chapter 4 \ref{Chapter4} we give the second guarantee
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%of our bitcoded algorithm, that is a finite bound on the size of any
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%regex's derivatives.
660 eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	9	%(this is cahpter 5 now)
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	10
In this chapter we give a bound on the size of
the calculated derivatives:
given an annotated regular expression $a$, for any string $s$
the derivatives computed by our algorithm $\blexersimp$
are bounded in size
by a constant that only depends on $a$.
Formally, we show that there exists a constant integer $N_a$ such that
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	18	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	19	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	20	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	21	\noindent
where the size ($\llbracket \_ \rrbracket$) of
an annotated regular expression is defined
in terms of the number of nodes in its
tree structure (its recursive definition is given in Figure~\ref{brexpSize}).
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	26	We believe this size bound
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	27	is important in the context of POSIX lexing because
660 eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	28	\marginpar{Addressing Gerog comment: "how does this relate to backtracking?"}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	29	\begin{itemize}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	30	\item
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	31	It is a stepping stone towards the goal
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	32	of eliminating ``catastrophic backtracking''.
The derivative-based lexing algorithm avoids backtracking
by a trade-off between space and time.
Backtracking algorithms
save the remaining possibilities on a stack while exploring one possible
path of matching. Catastrophic backtracking typically occurs
when the number of steps increases exponentially with respect
to the input length. In other words, the runtime is $O((c_r)^n)$ in the input
string length $n$, where the base $c_r$ of the exponent is determined by the
regular expression $r$.
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	42	%so that they
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	43	%can be traversed in the future in a DFS manner,
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	44	%different matchings are stored as sub-expressions
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	45	%in a regular expression derivative.
Derivatives save these possibilities as sub-expressions
and traverse them when later derivatives are taken. If we denote the size
of intermediate derivatives as $S_{r,n}$ (where the subscripts
$r,n$ indicate that $S$ depends on them), then the runtime of
derivative-based approaches would be $O(S_{r,n} * n)$.
We observe that if $S_{r,n}$ continuously grows with $n$ (for example
growing exponentially fast), then this
is as bad as catastrophic backtracking.
Our finiteness bound seeks to find a constant integer
upper bound $C$ (which in our case is $N_a$ where $a = r^\uparrow$) of $S_{r,n}$,
so that the complexity of the algorithm can be seen as linear ($O(C * n)$).
Even if $C$ is still large in our current work, it is a constant
rather than a number that keeps increasing with the input length $n$.
More importantly, this constant $C$ can potentially
be reduced as we optimise our simplification procedure.
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	61	%and showing the potential
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	62	%improvements can be by the notion of partial derivatives.
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	63
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	64	%If the internal data structures used by our algorithm
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	65	%grows beyond a finite bound, then clearly
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	66	%the algorithm (which traverses these structures) will
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	67	%be slow.
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	68	%The next step is to refine the bound $N_a$ so that it
eddc4eaba7c4 addresses Gerog "N_r meaning and relation with backtracking?" comment Chengsong parents: 659 diff changeset	69	%is not just finite but polynomial in $\llbracket a\rrbracket$.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	70	\item
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	71	Having the finite bound formalised
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	72	gives us higher confidence that
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	73	our simplification algorithm $\simp$ does not ``misbehave''
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	74	like $\textit{simpSL}$ does.
The bound holds for every input string of a given regular expression,
which is an advantage over work that
only gives empirical evidence on
some test cases (see for example the Verbatim++ work \cite{Verbatimpp}).
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	79	\end{itemize}
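\noindent
The linear-time reading in the first item can be spelled out as a short calculation.
As a simplifying assumption (it is not something proved in this chapter), suppose that
the $i$-th derivative-and-simplification step costs at most $k \cdot S_{r,i}$ time
for some fixed constant $k$, where $S_{r,i}$ is the size of the derivative after $i$ characters.
If $S_{r,i} \leq N_a$ for all $i$, then for an input of length $n$ the total cost is bounded by
\[
\sum_{i=1}^{n} k \cdot S_{r,i} \;\leq\; \sum_{i=1}^{n} k \cdot N_a \;=\; k \cdot N_a \cdot n,
\]
which is $O(C * n)$ with $C = N_a$.
If instead $S_{r,i}$ grew like $(c_r)^i$, the same sum would itself be exponential in $n$.
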
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	80	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	81	We then extend our $\blexersimp$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	82	to support bounded repetitions ($r^{\{n\}}$).
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	83	We update our formalisation of
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	84	the correctness and finiteness properties to
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	85	include this new construct.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	86	We show that we can out-compete other verified lexers such as
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	87	Verbatim++ on bounded regular expressions.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	88
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	89	In the next section we describe in more detail
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	90	what the finite bound means in our algorithm
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	91	and why the size of the internal data structures of
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	92	a typical derivative-based lexer such as
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	93	Sulzmann and Lu's needs formal treatment.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	94
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	95
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	96
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	97
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	98	\section{Formalising Size Bound of Derivatives}
577 f47fc4840579 thesis chap2 Chengsong parents: 576 diff changeset	99	\noindent
In our lexer ($\blexersimp$),
we take an annotated regular expression as input,
and repeatedly take its derivative and simplify the result.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	103	\begin{figure}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	104	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	105	\begin{tabular}{lcl}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	106	$\llbracket _{bs}\ONE \rrbracket$ & $\dn$ & $1$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	107	$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	108	$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	109	$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\textit{sum} \; (\map \; (\llbracket \_ \rrbracket)\; as) + 1$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	111	$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	112	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	113	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	114	\caption{The size function of bitcoded regular expressions}\label{brexpSize}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	115	\end{figure}
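
\noindent
As a small worked instance of the size function in Figure~\ref{brexpSize},
consider the regular expression $(a+b+ab)^*$.
The calculation below is only an illustration: bitcodes are omitted since they do not
contribute to the size, and writing the alternative as the list
$\sum [\mathbf{a}, \mathbf{b}, \mathbf{a}\cdot\mathbf{b}]$ is our assumption about how this
expression would be represented.
\begin{center}
\begin{tabular}{lcl}
$\llbracket \mathbf{a} \rrbracket = \llbracket \mathbf{b} \rrbracket$ & $=$ & $1$\\
$\llbracket \mathbf{a} \cdot \mathbf{b} \rrbracket$ & $=$ & $1 + 1 + 1 = 3$\\
$\llbracket \sum [\mathbf{a}, \mathbf{b}, \mathbf{a}\cdot\mathbf{b}] \rrbracket$ & $=$ & $(1 + 1 + 3) + 1 = 6$\\
$\llbracket (\sum [\mathbf{a}, \mathbf{b}, \mathbf{a}\cdot\mathbf{b}])^* \rrbracket$ & $=$ & $6 + 1 = 7$
\end{tabular}
\end{center}
\noindent
The result agrees with counting the seven nodes of the syntax tree
(one star, one alternative, one sequence and four characters).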
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	116
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	117	\begin{figure}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	118	\begin{tikzpicture}[scale=2,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	119	every node/.style={minimum size=11mm},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	120	->,>=stealth',shorten >=1pt,auto,thick
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	121	]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	122	\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	123	\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	124	\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	125
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	126	\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=6mm]{$a_{1s}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	127	\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	128
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	129	\node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 12mm]{$a_2$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	130	\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	131
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	132	\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=6mm]{$a_{2s}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	133	\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	134
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	135	\node (rns) [rectangle, draw = blue, thick, right=of r2s,minimum size=6mm]{$a_{ns}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	136	\draw[->,line width=0.2mm, dashed](r2s)--(rns) node[above,midway] {$\backslash \ldots$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	137
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	138	\node (v) [circle, thick, draw, right=of rns, minimum size=6mm, right=1.7cm]{$v$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	139	\draw[->, line width=0.2mm](rns)--(v) node[above, midway] {\bmkeps} node [below, midway] {\decode};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	140	\end{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	141	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	142	\end{figure}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	143
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	144	\noindent
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	145	Each time
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	146	a derivative is taken, the regular expression might grow.
However, the simplification applied immediately afterwards will often shrink it so that
the overall size of the derivatives stays relatively small.
This intuition is depicted by the relative size
change between the black and blue nodes in Figure~\ref{simpShrinks}:
after $\simp$ the node shrinks.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	152	Our proof states that all the blue nodes
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	153	stay below a size bound $N_a$ determined by the input $a$.
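
\noindent
To make this picture concrete, here is a small hand calculation.
It is only a sketch under stated assumptions: bitcodes are dropped,
$\backslash$ is the usual character derivative, and $\simp$ is assumed to
remove $\ONE \cdot \_$, flatten nested alternatives and delete duplicates
(the precise definitions are given elsewhere in the thesis).
Take $r = a^* \cdot a^*$, whose size is $2 + 2 + 1 = 5$, and the input string $aa$:
\begin{center}
\begin{tabular}{lcll}
$r \backslash a$ & $=$ & $\sum [(\ONE \cdot a^*) \cdot a^*,\; \ONE \cdot a^*]$ & size $7 + 4 + 1 = 12$\\
$r_{1s} = \simp \; (r \backslash a)$ & $=$ & $\sum [a^* \cdot a^*,\; a^*]$ & size $5 + 2 + 1 = 8$\\
$r_{1s} \backslash a$ & $=$ & $\sum [\sum [(\ONE \cdot a^*) \cdot a^*,\; \ONE \cdot a^*],\; \ONE \cdot a^*]$ & size $12 + 4 + 1 = 17$\\
$r_{2s} = \simp \; (r_{1s} \backslash a)$ & $=$ & $\sum [a^* \cdot a^*,\; a^*]$ & size $5 + 2 + 1 = 8$
\end{tabular}
\end{center}
\noindent
The unsimplified derivatives have sizes $12$ and $17$, whereas both simplified
derivatives have size $8$. Since $r_{2s}$ equals $r_{1s}$, further derivatives
with respect to $a$ produce nothing new in this example.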
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	154
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	155	\noindent
Sulzmann and Lu assumed a similar picture of their algorithm,
though in fact the size behaviour of their algorithm might be better depicted by the following graph:
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	158	\begin{figure}[H]
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	159	\begin{tikzpicture}[scale=2,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	160	every node/.style={minimum size=11mm},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	161	->,>=stealth',shorten >=1pt,auto,thick
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	162	]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	163	\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	164	\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	165	\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	166
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	167	\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	168	\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	169
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	170	\node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 17mm]{$a_2$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	171	\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	172
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	173	\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	174	\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	175
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	176	\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	177	\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	178
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	179	\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	180	\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	181
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	182	\node (rnn) [right = of rns, minimum size = 1mm]{};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	183	\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	184
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	185	\end{tikzpicture}
\caption{Regular expression size change during Sulzmann and Lu's lexing algorithm}\label{sulzShrinks}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	187	\end{figure}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	188	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	189	The picture means that in some cases their lexer (where they use $\simpsulz$
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	190	as the simplification function)
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	191	will have a size explosion, causing the running time
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	192	of each derivative step to grow continuously (for example
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	193	in \ref{SulzmannLuLexerTime}).
They tested the run time of their
lexer on particular examples such as $(a+b+ab)^*$
and claimed that their algorithm is linear w.r.t.\ the input.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	197	With our mechanised proof, we avoid this type of unintentional
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	198	generalisation.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	199
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	200	Before delving into the details of the formalisation,
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	201	we are going to provide an overview of it in the following subsection.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	202
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	203
577 f47fc4840579 thesis chap2 Chengsong parents: 576 diff changeset	204	\subsection{Overview of the Proof}
661 71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms" Chengsong parents: 660 diff changeset	205	\marginpar{trying to make it more intuitive
71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms" Chengsong parents: 660 diff changeset	206	and provide more insights into proof}
The most important idea in this chapter %intuition
is what we call the ``closed forms'' of
regular expression derivatives with respect to strings.
In short, a closed form allows us to express $r \backslash_{rsimps} s$
as a different recursive function for which induction on the size bound goes through.
A simple induction on $s$ or $r$ fails for $r\backslash_{rsimps} s$, but
works for $\textit{ClosedForm}(r,s)$.
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	214
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	215
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	216
Assume we have a regular expression $r$, be it an alternative,
a sequence or a star. The idea is that if we take several derivatives
of it on paper, we end up with a list of subexpressions,
something like
%omitting certain
%nested structures of those expressions:
\[
r\backslash s = r_1 + r_2 + r_3 + \ldots + r_n,
\]
if we omit the way these regular expressions need to be nested,
where each $r_i$ ($i \in \{1, \ldots, n\}$) is related to some fragments
of $r$ and $s$.
The second important observation is that the list %of regular expressions
$[r_1, \ldots, r_n]$ %is not
cannot grow indefinitely because its elements all come from $r$, and the derivatives
of the same regular expression are finite up to some isomorphisms.
We prove that the simplifications of $\blexersimp$ %make use of
are powerful enough to counteract the effect of the nested structure of alternatives
and to eliminate duplicates,
such that indeed the list in $a\backslash s$ does not grow without bound.
663 0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	237	We call the precise formalisation for the shape of
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	238	\[
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	239	r_1 + r_2 + r_3 + \ldots + r_n
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	240	\]
``closed form''.
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	242	The name was chosen because turning the recursive relation
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	243	\[
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	244	a \backslash_{bsimps} (c\!::\!s) \dn (\textit{bsimp} \; (a\backslash c)) \backslash_{bsimps} s
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	245	\]
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	246	into some easier-to-estimate forms
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	247	like
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	248	\[
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	249	\sum (a_1\backslash s \cdot a_2) :: (\map \; (a_2\backslash\_) \;
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	250	(\textit{Suffix} \; s \; a_1))
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	251	%\backslash_{bsimp
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	252	\]
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	253	was reminiscent of
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	254	%similar to t
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	255	solving recurrence relations like
$T \; n = 2 \, (T \; \frac{n}{2}) + n$ to obtain
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	257	their closed forms.
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	258	%$T \; n = n \ln n + (s \; n)$ ($s \; n$ is
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	259	%some higher-order terms).
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	260	%(for example we know $T$ is $\Theta (n \ln n)$).
Just as the closed form of a recurrence makes it possible to estimate
its growth, the closed
form of $a \backslash_{bsimps} s$
allows us to prove the existence of a size bound.
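As a brief illustration of the analogy (a standard textbook unfolding, not part of the formalisation), assume that $n = 2^k$ for some natural number $k$.
Unfolding the recurrence $k$ times gives
\[
T \; n \;=\; 2\,T\,(n/2) + n \;=\; 4\,T\,(n/4) + 2n \;=\; \ldots \;=\; 2^k\,T\,(1) + k \cdot n \;=\; n \cdot T\,(1) + n \log_2 n,
\]
in which $T$ no longer occurs recursively on the right-hand side, so its growth can be read off directly.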
Note that \ref{eq:approx} is only an approximation
to illustrate the point.
The precise formalised formula (\ref{seqClosedForm})
has to wait until all $\textit{rrexp}$-related
definitions are given.
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	270	%but for now we can think of the above as "the sequence
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	271	%regular expression $a_1 \cdot a_2$ after derivatives and simplifications
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	272	%w.r.t string $s$ looks like
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	273	%an alternative of giant list of sub-expressions, where each
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	274
0d1e68268d0f more explanation for the name "closed form" and their intuition Chengsong parents: 662 diff changeset	275
661 71502e4d8691 overview of finiteness proof Gerog comment "not helpful", adding more intuitions of "closed forms" Chengsong parents: 660 diff changeset	276
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	277	A high-level overview of the main components of the finiteness proof
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	278	is as follows:
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	279	\begin{figure}[H]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	280	\begin{tikzpicture}[scale=1,font=\bf,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	281	node/.style={
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	282	rectangle,rounded corners=3mm,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	283	ultra thick,draw=black!50,minimum height=18mm,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	284	minimum width=20mm,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	285	top color=white,bottom color=black!20}]
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	286
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	287
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	288	\node (0) at (-5,0)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	289	[node, text width=1.8cm, text centered]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	290	{$\llbracket \bderssimp{a}{s} \rrbracket$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	291	\node (A) at (0,0)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	292	[node,text width=1.6cm, text centered]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	293	{$\llbracket \rderssimp{r}{s} \rrbracket_r$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	294	\node (B) at (3,0)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	295	[node,text width=3.0cm, anchor=west, minimum width = 40mm]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	296	{$\llbracket \textit{ClosedForm}(r, s)\rrbracket_r$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	297	\node (C) at (9.5,0) [node, minimum width=10mm] {$N_r$};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	298
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	299	\draw [->,line width=0.5mm] (0) --
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	300	node [above,pos=0.45] {=} (A) node [below, pos = 0.45] {$(r = a \downarrow_r)$} (A);
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	301	\draw [->,line width=0.5mm] (A) --
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	302	node [above,pos=0.35] {$\quad =\ldots=$} (B);
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	303	\draw [->,line width=0.5mm] (B) --
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	304	node [above,pos=0.35] {$\quad \leq \ldots \leq$} (C);
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	305	\end{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	306	%\caption{
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	307	\end{figure}
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	308	\noindent
577 f47fc4840579 thesis chap2 Chengsong parents: 576 diff changeset	309	We explain the steps one by one:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	310	\begin{itemize}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	311	\item
We first introduce the operations such as
derivatives, simplification, size calculation, etc.
associated with $\rrexp$s, which we have introduced
in Chapter~\ref{Bitcoded2}. As promised, we discuss
why they are needed in Section~\ref{whyRerase}.
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	317	The operations on $\rrexp$s are identical to those on
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	318	annotated regular expressions except that they dispense with
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	319	bitcodes. This means that all proofs about size of $\rrexp$s will apply to
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	320	annotated regular expressions, because the size of a regular
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	321	expression is independent of the bitcodes.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	322	\item
We prove that $\rderssimp{r}{s} = \textit{ClosedForm}(r, s)$,
where $\textit{ClosedForm}(r, s)$ is given entirely
in terms of the derivatives of the children of $r$.
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	327	We call the right-hand-side the \emph{Closed Form}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	328	of the derivative $\rderssimp{r}{s}$.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	329	\item
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	330	Formally we give an estimate of
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	331	$\llbracket \textit{ClosedForm}(r, s) \rrbracket_r$.
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	332	The key observation is that $\distinctBy$'s output is
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	333	a list with a constant length bound.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	334	\end{itemize}
We will expand on these steps in the next sections.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	336
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	337	\section{The $\textit{Rrexp}$ Datatype}
594 62f8fa03863e more Chengsong parents: 593 diff changeset	338	The first step is to define
62f8fa03863e more Chengsong parents: 593 diff changeset	339	$\textit{rrexp}$s.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	340	They are annotated regular expressions without bitcodes,
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	341	allowing a more convenient size bound proof.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	342	%Of course, the bits which encode the lexing information
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	343	%would grow linearly with respect
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	344	%to the input, which should be taken into accounte when we wish to tackle the runtime complexity.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	345	%But for the sake of the structural size
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	346	%we can safely ignore them.\\
The datatype $\rrexp$, whose elements we call
\emph{r-regular expressions},
was initially defined in \ref{rrexpDef}.
The prefix $r$ distinguishes them
from basic regular expressions.
We repeat the definition of $\rrexp$ here:
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	355	\[ \rrexp ::= \RZERO \mid \RONE
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	356	\mid \RCHAR{c}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	357	\mid \RSEQ{r_1}{r_2}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	358	\mid \RALTS{rs}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	359	\mid \RSTAR{r}
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	360	\]
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	361	The size of an r-regular expression is
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	362	written $\llbracket r\rrbracket_r$,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	363	whose definition mirrors that of an annotated regular expression.
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	364	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	365	\begin{tabular}{lcl}
$\llbracket \RONE \rrbracket_r$ & $\dn$ & $1$\\
$\llbracket \RZERO \rrbracket_r$ & $\dn$ & $1$ \\
$\llbracket \RSEQ{r_1}{r_2} \rrbracket_r$ & $\dn$ & $\llbracket r_1 \rrbracket_r + \llbracket r_2 \rrbracket_r + 1$\\
$\llbracket \RCHAR{c} \rrbracket_r $ & $\dn$ & $1$\\
$\llbracket \RALTS{rs} \rrbracket_r $ & $\dn$ & $\textit{sum} \; (\map \; (\llbracket \_ \rrbracket_r)\; rs) + 1$\\
$\llbracket \RSTAR{r} \rrbracket_r $ & $\dn$ & $\llbracket r \rrbracket_r + 1$.
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	372	\end{tabular}
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	373	\end{center}
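\noindent
As a small worked example (the expression is chosen purely for illustration), the size of the alternative of the character $a$ and the sequence of $b$ and $\RONE$ is obtained by summing the sizes of the children and adding one for each constructor:
\[
\begin{array}{lcl}
\llbracket \RALTS{[\RCHAR{a}, \; \RSEQ{\RCHAR{b}}{\RONE}]} \rrbracket_r
& = & \llbracket \RCHAR{a} \rrbracket_r + \llbracket \RSEQ{\RCHAR{b}}{\RONE} \rrbracket_r + 1\\
& = & 1 + (1 + 1 + 1) + 1\\
& = & 5.
\end{array}
\]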
3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	374	\noindent
The subscript $r$ in $\llbracket \_ \rrbracket_r$
distinguishes it from the same operation on annotated regular expressions.
Similar subscripts are added to other operations such as $\rerase{}$:
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	378	\begin{center}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	379	\begin{tabular}{lcl}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	380	$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	381	$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	382	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	383	$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	384	$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	386	\end{tabular}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	387	\end{center}
594 62f8fa03863e more Chengsong parents: 593 diff changeset	388
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	389	\subsection{Why a New Datatype?}\label{whyRerase}
\marginpar{\em Added label so that this section can be referenced from other parts of the thesis,
so that interested readers can jump to it or be reassured that there will be explanations.}
Originally the erase operation $(\_)_\downarrow$ was
used by Ausaf et al. in their proofs related to $\blexer$.
This function was not part of the lexing algorithm; its sole purpose was to
bridge the gap between the $r$
(un-annotated) and $\textit{arexp}$ (annotated)
regular expression datatypes so as to leverage the correctness
theorem of $\lexer$.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	400	For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	401	uses $\erase$ to convert an annotated regular expression $a$ into
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	402	a plain one so that it can be used by $\inj$ to create the desired value
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	403	$\inj\; (a)_\downarrow \; c \; v$.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	404
Ideally $\erase$ should only remove the auxiliary information that is unrelated to the
structure, namely the
bitcodes. However, there is a complication in that
the alternative constructors of $\textit{arexp}$
and $\textit{r}$ have different arities:
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	410	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	411	\begin{tabular}{lcl}
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	412	$\textit{r}$ & $::=$ & $\ldots \;\|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" \| \ldots$\\
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	413	$\textit{arexp}$ & $::=$ & $\ldots\; \|\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" \| \ldots$
594 62f8fa03863e more Chengsong parents: 593 diff changeset	414	\end{tabular}
576 3e1b699696b6 thesis chap5 Chengsong parents: 564 diff changeset	415	\end{center}
594 62f8fa03863e more Chengsong parents: 593 diff changeset	416	\noindent
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	417	To convert between the two
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	418	$\erase$ has to recursively disassemble a list into nested binary applications of the
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	419	$(\_ + \_)$ operator,
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	420	handling corner cases like empty or
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	421	singleton alternative lists:
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	422	%becomes $r$ during the
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	423	%$\erase$ function.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	424	%The annotated regular expression $\sum[a, b, c]$ would turn into
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	425	%$(a+(b+c))$.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	426	\begin{center}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	427	\begin{tabular}{lcl}
$ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\
$ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a_\downarrow$\\
$ (_{bs}\sum [a_1, a_2])_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow$\\
$ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (_{[]}\sum as)_\downarrow$
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	432	\end{tabular}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	433	\end{center}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	434	\noindent
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	435	These operations inevitably change the structure and size of
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	436	an annotated regular expression. For example,
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	437	$a_1 = \sum _{Z}[x]$ has size 2, but $(a_1)_\downarrow = x$
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	438	only has size 1.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	439	%adding unnecessary
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	440	%complexities to the size bound proof.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	441	%The reason we
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	442	%define a new datatype is that
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	443	%the $\erase$ function
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	444	%does not preserve the structure of annotated
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	445	%regular expressions.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	446	%We initially started by using
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	447	%plain regular expressions and tried to prove
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	448	%lemma \ref{rsizeAsize},
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	449	%however the $\erase$ function messes with the structure of the
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	450	%annotated regular expression.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	451	%The $+$ constructor
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	452	%of basic regular expressions is only binary, whereas $\sum$
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	453	%takes a list. Therefore we need to convert between
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	454	%annotated and normal regular expressions as follows:
Concretely, if we define the size of a plain regular expression
in the usual way,
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	457	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	458	\begin{tabular}{lcl}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	459	$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	460	$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	461	$\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	462	$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	463	$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
$\llbracket r^* \rrbracket_p $ & $\dn$ & $\llbracket r \rrbracket_p + 1$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	465	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	466	\end{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	467	\noindent
then the property
532 cc54ce075db5 restructured Chengsong parents: diff changeset	469	\begin{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	470	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	471	\end{center}
594 62f8fa03863e more Chengsong parents: 593 diff changeset	472	does not hold.
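To see that the mismatch can go in either direction, consider (as an illustration only) three annotated characters $x$, $y$ and $z$, each of size 1.
We assume here that the size of an annotated alternative is the sum of the sizes of its children plus one, in line with the singleton example above.
The singleton alternative loses size under $\erase$, whereas a three-element alternative gains size, because the list is turned into nested binary alternatives:
\[
\begin{array}{lclcl}
\llbracket _{[]}\sum [x, y, z] \rrbracket & = & 1 + 1 + 1 + 1 & = & 4,\\
\llbracket (_{[]}\sum [x, y, z])_\downarrow \rrbracket_p & = & \llbracket x_\downarrow + (y_\downarrow + z_\downarrow) \rrbracket_p & = & 5.
\end{array}
\]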
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	473	%With $\textit{rerase}$, however,
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	474	%only the bitcodes are thrown away.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	475	The failure of this property leads us to define a new regular expression datatype without
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	476	bitcodes but with a list alternative constructor, and to define a new erase function
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	477	in a strictly structure-preserving manner:
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	478	\begin{center}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	479	\begin{tabular}{lcl}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	480	$\textit{rrexp}$ & $::=$ & $\ldots\; \|\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" \| \ldots$\\
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	481	$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	482	\end{tabular}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	483	\end{center}
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	484	\noindent
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	485	%But
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	486	%Everything about the structure remains intact.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	487	%Therefore it does not change the size
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	488	%of an annotated regular expression and we have:
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	489	\noindent
594 62f8fa03863e more Chengsong parents: 593 diff changeset	490	One might be able to prove an inequality such as
62f8fa03863e more Chengsong parents: 593 diff changeset	491	$\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $
62f8fa03863e more Chengsong parents: 593 diff changeset	492	and then estimate $\llbracket a_\downarrow \rrbracket_p$,
62f8fa03863e more Chengsong parents: 593 diff changeset	493	but we found our approach more straightforward.\\
532 cc54ce075db5 restructured Chengsong parents: diff changeset	494
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	495	\subsection{Functions for R-regular Expressions}
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	496	The downside of our approach is that we need to redefine
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	497	several functions for $\rrexp$.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	498	In this section we shall define the r-regular expression versions
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	499	of $\bder$ and of the $\textit{bsimp}$-related functions.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	500	We use $r$ as a prefix or subscript to distinguish them
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	501	from their bitcoded counterparts.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	502	%For example,$\backslash_r$, $\rdistincts$, and $\rsimp$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	503	%as opposed to $\backslash$, $\distinctBy$, and $\bsimp$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	504	%As promised, they are much simpler than their bitcoded counterparts.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	505	%The operations on r-regular expressions are
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	506	%almost identical to those of the annotated regular expressions,
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	507	%except that no bitcodes are used. For example,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	508	The derivative operation for an r-regular expression is
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	509	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	510	\begin{tabular}{@{}lcl@{}}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	511	$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	512	$(\ONE)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	513	$(\mathbf{d})\,\backslash_r c$ & $\dn$ &
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	514	$\textit{if}\;c=d\;\textit{then}\;\ONE\;\textit{else}\;\ZERO$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	515	$(\sum \;\textit{rs})\,\backslash_r c$ & $\dn$ &
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	516	$\sum\;(\textit{map} \; (\_\backslash_r c) \; rs )$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	517	$(r_1\cdot r_2)\,\backslash_r c$ & $\dn$ &
594 62f8fa03863e more Chengsong parents: 593 diff changeset	518	$\textit{if}\;(\textit{rnullable}\,r_1)$\\
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	519	& &$\textit{then}\;\sum\,[(r_1\,\backslash_r c)\cdot\,r_2,$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	520	& &$\phantom{\textit{then},\;\sum\,}((r_2\,\backslash_r c))]$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	521	& &$\textit{else}\;\,(r_1\,\backslash_r c)\cdot r_2$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	522	$(r^*)\,\backslash_r c$ & $\dn$ &
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	523	$( r\,\backslash_r c)\cdot
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	524	(r^*)$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	525	\end{tabular}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	526	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	527	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	528	where we omit the definition of $\textit{rnullable}$.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	529	The generalisation from the derivatives w.r.t a character to
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	530	derivatives w.r.t strings is given as
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	531	\begin{center}
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	532	\begin{tabular}{lcl}
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	533	$r \backslash_{rs} []$ & $\dn$ & $r$\\
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	534	$r \backslash_{rs} c::s$ & $\dn$ & $(r\backslash_r c) \backslash_{rs} s$
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	535	\end{tabular}
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	536	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	537
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	538	The function $\distinctBy$ for r-regular expressions does not need
594 62f8fa03863e more Chengsong parents: 593 diff changeset	539	a function checking equivalence because
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	540	there are no bit annotations.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	541	Therefore we have
532 cc54ce075db5 restructured Chengsong parents: diff changeset	542	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	543	\begin{tabular}{lcl}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	544	$\rdistinct{[]}{rset} $ & $\dn$ & $[]$\\
594 62f8fa03863e more Chengsong parents: 593 diff changeset	545	$\rdistinct{r :: rs}{rset}$ & $\dn$ &
62f8fa03863e more Chengsong parents: 593 diff changeset	546	$\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
62f8fa03863e more Chengsong parents: 593 diff changeset	547	& & $\textit{else}\; \;
62f8fa03863e more Chengsong parents: 593 diff changeset	548	r::\rdistinct{rs}{(rset \cup \{r\})}$
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	549	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	550	\end{center}
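\noindent
As a small worked example of this definition, take two distinct r-regular expressions
$r_1 \neq r_2$ (chosen purely for illustration) and start from the empty accumulator set:
\begin{center}
\begin{tabular}{lcl}
$\rdistinct{[r_1,\, r_2,\, r_1]}{\varnothing}$ & $=$ & $r_1 :: \rdistinct{[r_2,\, r_1]}{\{r_1\}}$\\
 & $=$ & $r_1 :: r_2 :: \rdistinct{[r_1]}{\{r_1, r_2\}}$\\
 & $=$ & $r_1 :: r_2 :: \rdistinct{[]}{\{r_1, r_2\}}$\\
 & $=$ & $[r_1,\, r_2]$
\end{tabular}
\end{center}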
cc54ce075db5 restructured Chengsong parents: diff changeset	551	%TODO: definition of rsimp (maybe only the alternative clause)
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	552	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	553	%We would like to make clear
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	554	%a difference between our $\rdistincts$ and
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	555	%the Isabelle $\textit {distinct}$ predicate.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	556	%In Isabelle $\textit{distinct}$ is a function that returns a boolean
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	557	%rather than a list.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	558	%It tests if all the elements of a list are unique.\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	559	With $\textit{rdistinct}$ in place,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	560	the flatten function for $\rrexp$ is as follows:
595 fa92124d1fb7 more Chengsong parents: 594 diff changeset	561	\begin{center}
fa92124d1fb7 more Chengsong parents: 594 diff changeset	562	\begin{tabular}{@{}lcl@{}}
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	563	$\textit{rflts} \; (\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $as \; @ \; \textit{rflts} \; as' $ \\
595 fa92124d1fb7 more Chengsong parents: 594 diff changeset	564	$\textit{rflts} \; \ZERO :: as'$ & $\dn$ & $ \textit{rflts} \; \textit{as'} $ \\
fa92124d1fb7 more Chengsong parents: 594 diff changeset	565	$\textit{rflts} \; a :: as'$ & $\dn$ & $a :: \textit{rflts} \; \textit{as'}$ \quad(otherwise)
fa92124d1fb7 more Chengsong parents: 594 diff changeset	566	\end{tabular}
fa92124d1fb7 more Chengsong parents: 594 diff changeset	567	\end{center}
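\noindent
The three clauses can be traced on a short list. The calculation below is a sketch that
assumes the base case $\textit{rflts} \; [] = []$, which is not listed in the table, and
that $r_3$ is neither $\ZERO$ nor an alternative:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{rflts} \; [\ZERO,\, \sum [r_1, r_2],\, r_3]$ & $=$ & $\textit{rflts} \; [\sum [r_1, r_2],\, r_3]$\\
 & $=$ & $[r_1, r_2] \; @ \; \textit{rflts} \; [r_3]$\\
 & $=$ & $[r_1, r_2] \; @ \; (r_3 :: \textit{rflts} \; [])$\\
 & $=$ & $[r_1,\, r_2,\, r_3]$
\end{tabular}
\end{center}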
fa92124d1fb7 more Chengsong parents: 594 diff changeset	568	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	569	The function
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	570	$\rsimpalts$ corresponds to $\textit{bsimp}_{ALTS}$:
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	571	\begin{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	572	\begin{tabular}{@{}lcl@{}}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	573	$\rsimpalts \;\; nil$ & $\dn$ & $\RZERO$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	574	$\rsimpalts \;\; r::nil$ & $\dn$ & $r$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	575	$\rsimpalts \;\; rs$ & $\dn$ & $\sum rs$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	576	\end{tabular}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	577	\end{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	578	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	579	Similarly, we have $\rsimpseq$ which corresponds to $\textit{bsimp}_{SEQ}$:
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	580	\begin{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	581	\begin{tabular}{@{}lcl@{}}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	582	$\rsimpseq \;\; \RZERO \; \_ $ & $\dn$ & $\RZERO$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	583	$\rsimpseq \;\; \_ \; \RZERO $ & $\dn$ & $\RZERO$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	584	$\rsimpseq \;\; \RONE \; r_2$ & $\dn$ & $r_2$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	585	$\rsimpseq \;\; r_1 \; r_2$ & $\dn$ & $r_1 \cdot r_2$\\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	586	\end{tabular}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	587	\end{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	588	\noindent With these functions in place, we obtain $\textit{rsimp}$ and $\rderssimp{\_}{\_}$:
595 fa92124d1fb7 more Chengsong parents: 594 diff changeset	589	\begin{center}
fa92124d1fb7 more Chengsong parents: 594 diff changeset	590	\begin{tabular}{@{}lcl@{}}
fa92124d1fb7 more Chengsong parents: 594 diff changeset	591
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	592	$\textit{rsimp} \; (r_1\cdot r_2)$ & $\dn$ & $ \textit{rsimp}_{SEQ} \;(\textit{rsimp} \; r_1) \; (\textit{rsimp} \; r_2) $ \\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	593	$\textit{rsimp} \; (\sum \textit{rs})$ & $\dn$ & $\textit{rsimp}_{ALTS} \; (\textit{rdistinct} \; ( \textit{rflts} \; ( \textit{map} \; \textit{rsimp} \; \textit{rs})) \; \varnothing) $ \\
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	594	$\textit{rsimp} \; r$ & $\dn$ & $\textit{r} \qquad \textit{otherwise}$
595 fa92124d1fb7 more Chengsong parents: 594 diff changeset	595	\end{tabular}
fa92124d1fb7 more Chengsong parents: 594 diff changeset	596	\end{center}
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	597	\begin{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	598	\begin{tabular}{@{}lcl@{}}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	599	$r\backslash_{rsimp} \, c$ & $\dn$ & $\rsimp \; (r\backslash_r \, c)$
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	600	\end{tabular}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	601	\end{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	602
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	603	\begin{center}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	604	\begin{tabular}{@{}lcl@{}}
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	605	$r \backslash_{rsimps} \; \; c\!::\!s $ & $\dn$ & $(r \backslash_{rsimp}\, c) \backslash_{rsimps}\, s$ \\
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	606	$r \backslash_{rsimps} [\,] $ & $\dn$ & $r$
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	607	\end{tabular}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	608	\end{center}
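\noindent
To illustrate how these definitions interact, consider the r-regular expression
$\mathbf{a} \cdot \mathbf{b}$ and the string $[a, b]$. The calculation is only a sketch:
it assumes the usual character clauses $\mathbf{a} \backslash_r a = \ONE$ and
$\mathbf{b} \backslash_r b = \ONE$, and that $\textit{rnullable} \; \mathbf{a}$ is false,
since $\textit{rnullable}$ is not spelled out here.
\begin{center}
\begin{tabular}{@{}lcl@{}}
$(\mathbf{a} \cdot \mathbf{b}) \backslash_{rsimp} \, a$ & $=$ & $\textit{rsimp} \; ((\mathbf{a} \backslash_r a) \cdot \mathbf{b})$\\
 & $=$ & $\textit{rsimp} \; (\ONE \cdot \mathbf{b})$\\
 & $=$ & $\textit{rsimp}_{SEQ} \; \ONE \; \mathbf{b}$\\
 & $=$ & $\mathbf{b}$\\
$\mathbf{b} \backslash_{rsimp} \, b$ & $=$ & $\textit{rsimp} \; (\mathbf{b} \backslash_r b) \;=\; \textit{rsimp} \; \ONE \;=\; \ONE$
\end{tabular}
\end{center}
\noindent
Hence $(\mathbf{a} \cdot \mathbf{b}) \backslash_{rsimps} [a, b] = \ONE$.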
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	609	\noindent
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	610	We do not define an r-regular expression version of $\blexersimp$,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	611	as our proof does not depend on it.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	612	Now we are ready to introduce how r-regular expressions allow
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	613	us to prove the size bound on bitcoded regular expressions.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	614
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	615	\subsection{Using R-regular Expressions to Bound Bit-coded Regular Expressions}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	616	The size of annotated regular expressions after the application
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	617	of the functions $\bsimp$ and $\backslash_{bsimps}$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	618	can be calculated from the size of r-regular expressions after the application
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	619	of $\rsimp$ and $\backslash_{rsimps}$:
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	620	\begin{lemma}\label{sizeRelations}
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	621	The following equalities hold:
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	622	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	623	\item
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	624	$\rsize{\rerase a} = \asize a$
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	625	\item
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	626	$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	627	\item
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	628	$\asize{\bderssimp{a}{s}} = \rsize{\rderssimp{\rerase{a}}{s}}$
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	629	\end{itemize}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	630	\end{lemma}
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	631	\begin{proof}
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	632	The first part follows from the definition of $(\_)_{\downarrow_r}$.
2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	633	The second part is by induction on the inductive cases
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	634	of $\textit{bsimp}$.
659 2e05f04ed6b3 Addressed Gerog "can't understand 'erase messes with structure'" comment Chengsong parents: 640 diff changeset	635	The third part is by induction on the string $s$,
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	636	where the inductive step follows from part one.
ce4e5151a836 more Chengsong parents: 596 diff changeset	637	\end{proof}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	638	\noindent
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	639	With lemma \ref{sizeRelations},
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	640	we will be able to focus on
ce4e5151a836 more Chengsong parents: 596 diff changeset	641	estimating only
ce4e5151a836 more Chengsong parents: 596 diff changeset	642	$\rsize{\rderssimp{\rerase{a}}{s}}$
ce4e5151a836 more Chengsong parents: 596 diff changeset	643	in later parts because
ce4e5151a836 more Chengsong parents: 596 diff changeset	644	\begin{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	645	$\rsize{\rderssimp{\rerase{a}}{s}} \leq N_r \quad$
ce4e5151a836 more Chengsong parents: 596 diff changeset	646	implies
ce4e5151a836 more Chengsong parents: 596 diff changeset	647	$\quad \llbracket a \backslash_{bsimps} s \rrbracket \leq N_r$.
ce4e5151a836 more Chengsong parents: 596 diff changeset	648	\end{center}
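\noindent
Spelled out, the implication is a direct consequence of the third equality of
lemma \ref{sizeRelations}:
\begin{center}
$\asize{\bderssimp{a}{s}} \;=\; \rsize{\rderssimp{\rerase{a}}{s}} \;\leq\; N_r$.
\end{center}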
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	649	%From now on we
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	650	%Unless stated otherwise in the rest of this
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	651	%chapter all regular expressions without
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	652	%bitcodes are seen as r-regular expressions ($\rrexp$s).
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	653	%For the binary alternative r-regular expression $\RALTS{[r_1, r_2]}$,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	654	%we use the notation $r_1 + r_2$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	655	%for brevity.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	656
cc54ce075db5 restructured Chengsong parents: diff changeset	657
cc54ce075db5 restructured Chengsong parents: diff changeset	658	%-----------------------------------
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	659	% SUB SECTION ROADMAP RREXP BOUND
532 cc54ce075db5 restructured Chengsong parents: diff changeset	660	%-----------------------------------
553 0f00d440f484 more changes Chengsong parents: 543 diff changeset	661
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	662	%\subsection{Roadmap to a Bound for $\textit{Rrexp}$}
553 0f00d440f484 more changes Chengsong parents: 543 diff changeset	663
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	664	%The way we obtain the bound for $\rrexp$s is by two steps:
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	665	%\begin{itemize}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	666	% \item
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	667	% First, we rewrite $r\backslash s$ into something else that is easier
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	668	% to bound. This step is crucial for the inductive case
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	669	% $r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way,
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	670	% but after simplification, they will always be equal or smaller to a form consisting of an alternative
596 b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	671	% list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application.
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	672	% \item
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	673	% Then, for such a sum list of regular expressions $f\; (g\; (\sum rs))$, we can control its size
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	674	% by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	675	% reduce the size of a regular expression, not adding to it.
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	676	%\end{itemize}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	677	%
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	678	%\section{Step One: Closed Forms}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	679	%We transform the function application $\rderssimp{r}{s}$
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	680	%into an equivalent
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	681	%form $f\; (g \; (\sum rs))$.
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	682	%The functions $f$ and $g$ can be anything from $\flts$, $\distinctBy$ and other helper functions from $\bsimp{\_}$.
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	683	%This way we get a different but equivalent way of expressing : $r\backslash s = f \; (g\; (\sum rs))$, we call the
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	684	%right hand side the "closed form" of $r\backslash s$.
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	685	%
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	686	%\begin{quote}\it
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	687	% Claim: For regular expressions $r_1 \cdot r_2$, we claim that
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	688	%\end{quote}
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	689	%\noindent
b306628a0eab more chap 56 Chengsong parents: 595 diff changeset	690	%We explain in detail how we reached those claims.
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	691	If we attempt to prove
ce4e5151a836 more Chengsong parents: 596 diff changeset	692	\begin{center}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	693	$\forall r. \; \exists N_r.\;\; s.t. \;\; \forall s. \; \llbracket r\backslash_{rsimps} s \rrbracket_r \leq N_r$
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	694	\end{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	695	using a naive induction on the structure of $r$,
ce4e5151a836 more Chengsong parents: 596 diff changeset	696	then we are stuck at the inductive cases such as
ce4e5151a836 more Chengsong parents: 596 diff changeset	697	$r_1\cdot r_2$.
ce4e5151a836 more Chengsong parents: 596 diff changeset	698	The inductive hypotheses are:
ce4e5151a836 more Chengsong parents: 596 diff changeset	699	\begin{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	700	1: $\text{for } r_1, \text{there exists } N_{r_1}.\;\; s.t.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	701	\;\;\forall s. \llbracket r_1 \backslash_{rsimps} s \rrbracket_r \leq N_{r_1}. $\\
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	702	2: $\text{for } r_2, \text{there exists } N_{r_2}.\;\; s.t.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	703	\;\; \forall s. \llbracket r_2 \backslash_{rsimps} s \rrbracket_r \leq N_{r_2}. $
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	704	\end{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	705	The inductive step to prove would be
ce4e5151a836 more Chengsong parents: 596 diff changeset	706	\begin{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	707	$\text{there exists } N_{r_1\cdot r_2}. \;\; s.t. \forall s.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	708	\llbracket (r_1 \cdot r_2) \backslash_{rsimps} s \rrbracket_r \leq N_{r_1\cdot r_2}.$
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	709	\end{center}
ce4e5151a836 more Chengsong parents: 596 diff changeset	710	The problem is that it is not clear what
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	711	$(r_1\cdot r_2) \backslash_{rsimps} s$ looks like,
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	712	and therefore $N_{r_1}$ and $N_{r_2}$ in the
ce4e5151a836 more Chengsong parents: 596 diff changeset	713	inductive hypotheses cannot be directly used.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	714	%We have already seen that $(r_1 \cdot r_2)\backslash s$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	715	%and $(r^*)\backslash s$ can grow in a wild way.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	716
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	717	The point, however, is that such derivatives are equivalent to an alternative
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	718	$\sum rs$ over a list of terms, where each term in $rs$ is
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	719	built from $r_1 \backslash s' $, $r_2\backslash s'$,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	720	and $r \backslash s'$ with $s' \in \textit{SubString} \; s$ (which stands
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	721	for the set of substrings of $s$).
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	722	The list $rs$ is then de-duplicated by $\textit{rdistinct}$
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	723	during simplification, which prevents $rs$ from growing indefinitely.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	724
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	725	Based on this idea, we develop a proof in two steps.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	726	First, we show the following equality (where
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	727	$f$ and $g$ are functions that do not increase the size of the input)
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	728	\begin{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	729	$r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	730	\end{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	731	where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	732	For example, for $r_1 \cdot r_2$ the equality is
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	733	\begin{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	734	$ \rderssimp{r_1 \cdot r_2}{s} =
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	735	\rsimp{(\sum (r_1 \backslash s \cdot r_2 ) \; :: \;(\map \; \rderssimp{r_2}{\_} \;(\vsuf{s}{r_1})))}$
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	736	\end{center}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	737	We call the right-hand-side the
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	738	\emph{Closed Form} of $(r_1 \cdot r_2)\backslash_{rsimps} s$.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	739	Second, we will bound the closed form of r-regular expressions
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	740	using some estimation techniques
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	741	and then apply
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	742	lemma \ref{sizeRelations} to show that the bitcoded regular expressions
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	743	in our $\blexersimp$ are finitely bounded.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	744
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	745	We will describe in detail the first step of the proof
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	746	in the next section.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	747
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	748	\section{Closed Forms}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	749	In this section we introduce in detail
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	750	how to express the string derivatives
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	751	of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	752	rather than a single character) in a different way than
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	753	our previous definition.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	754	In previous chapters, the derivative of a
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	755	regular expression $r$ w.r.t a string $s$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	756	was recursively defined on the string:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	757	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	758	$r \backslash_s (c::s) \dn (r \backslash c) \backslash_s s$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	759	\end{center}
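\noindent
For instance, assuming the usual base case $r \backslash_s [] \dn r$
(not repeated here), a two-character string $[c_1, c_2]$ unfolds as
\begin{center}
$r \backslash_s [c_1, c_2] = (r \backslash c_1) \backslash_s [c_2]
= ((r \backslash c_1) \backslash c_2) \backslash_s []
= (r \backslash c_1) \backslash c_2$
\end{center}
so the definition only prescribes a sequence of single-character steps.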
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	760	The problem is that
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	761	this definition does not provide much information
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	762	on what $r \backslash_s s$ looks like.
If we are interested in the size of a derivative like
$(r_1 \cdot r_2)\backslash s$,
we first need a more concrete description of it.
We call such more concrete representations the ``closed forms'' of
string derivatives, as opposed to their original recursive definitions.
The terminology ``closed form'' is borrowed from mathematics,
where it usually describes expressions that consist solely of finitely many
well-known and easy-to-compute operations such as
additions, multiplications, and exponential functions.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	772
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	773	We start by proving some basic identities
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	774	involving the simplification functions for r-regular expressions.
After that we introduce the rewrite relations
$\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$,
$\rightsquigarrow_f$ and $\rightsquigarrow_g$.
These relations involve techniques similar to those in Chapter \ref{Bitcoded2}
for annotated regular expressions.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	780	Finally, we use these identities to establish the
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	781	closed forms of the alternative regular expression,
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	782	the sequence regular expression, and the star regular expression.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	783	%$r_1\cdot r_2$, $r^*$ and $\sum rs$.
601 ce4e5151a836 more Chengsong parents: 596 diff changeset	784
ce4e5151a836 more Chengsong parents: 596 diff changeset	785
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	786
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	787	\subsection{Some Basic Identities}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	788
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	789	In what follows we will often convert between lists
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	790	and sets.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	791	We use Isabelle's $set$ to refer to the
611 bc1df466150a more Chengsong parents: 610 diff changeset	792	function that converts a list $rs$ to the set
bc1df466150a more Chengsong parents: 610 diff changeset	793	containing all the elements in $rs$.
\subsubsection{$\textit{rdistinct}$ Does the Job of De-duplication}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	795	The $\textit{rdistinct}$ function, as its name suggests, will
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	796	de-duplicate an r-regular expression list.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	797	It will also remove any elements that
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	798	are already in the accumulator set.
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	799	\begin{lemma}\label{rdistinctDoesTheJob}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	800	%The function $\textit{rdistinct}$ satisfies the following
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	801	%properties:
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	802	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	803	recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	804	for testing
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	805	whether a list's elements are unique. Then the following
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	806	properties about $\textit{rdistinct}$ hold:
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	807	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	808	\item
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	809	If $a \in acc$ then $a \notin (\rdistinct{rs}{acc})$.
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	810	\item
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	811	%If list $rs'$ is the result of $\rdistinct{rs}{acc}$,
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	812	$\textit{isDistinct} \;\;\; (\rdistinct{rs}{acc})$.
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	813	\item
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	814	$\textit{set} \; (\rdistinct{rs}{acc})
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	815	= (\textit{set} \; rs) - acc$
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	816	\end{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	817	\end{lemma}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	818	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	819	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	820	The first part is by an induction on $rs$.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	821	The second and third parts can be proven by using the
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	822	inductive cases of $\textit{rdistinct}$.
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	823
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	824	\end{proof}
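\noindent
As a small illustration of Lemma \ref{rdistinctDoesTheJob}, assume that
$\textit{rdistinct}$ is defined as before (an element already in the accumulator
is skipped, otherwise it is kept and added to the accumulator), and that
$r_1$, $r_2$ and $r_3$ are pairwise distinct r-regular expressions
(these names are only used for this example). Then
\begin{center}
\begin{tabular}{lcl}
$\rdistinct{[r_1, r_2, r_1, r_3]}{\{r_2\}}$ & $=$ & $r_1 :: \rdistinct{[r_2, r_1, r_3]}{\{r_1, r_2\}}$\\
& $=$ & $r_1 :: \rdistinct{[r_3]}{\{r_1, r_2\}}$\\
& $=$ & $r_1 :: r_3 :: \rdistinct{[]}{\{r_1, r_2, r_3\}}$\\
& $=$ & $[r_1, r_3]$
\end{tabular}
\end{center}
The result does not contain $r_2$, has no duplicates, and its set of elements
is $\{r_1, r_2, r_3\} - \{r_2\}$, in line with the three properties.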
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	825
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	826	\noindent
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	827	%$\textit{rdistinct}$ will out all regular expression terms
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	828	%that are in the accumulator, therefore
If we prepend a list $rs$, all of whose elements are already in the accumulator,
to another list $rs_a$, and then call $\textit{rdistinct}$
on the merged list, the output is the same as if we had called $\textit{rdistinct}$
on $rs_a$ alone:
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	833	\begin{lemma}\label{rdistinctConcat}
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	834	The elements appearing in the accumulator will always be removed.
15d182ffbc76 more Chengsong parents: 553 diff changeset	835	More precisely,
15d182ffbc76 more Chengsong parents: 553 diff changeset	836	\begin{itemize}
15d182ffbc76 more Chengsong parents: 553 diff changeset	837	\item
If $rs \subseteq rset$, then
$\rdistinct{(rs @ rs_a)}{rset} = \rdistinct{rs_a}{rset}$.
15d182ffbc76 more Chengsong parents: 553 diff changeset	840	\item
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	841	More generally, if $a \in rset$ and $\rdistinct{rs}{\{a\}} = []$,
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	842	then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{rs'}{rset}$
15d182ffbc76 more Chengsong parents: 553 diff changeset	843	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	844	\end{lemma}
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	845
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	846	\begin{proof}
By induction on $rs$ and using Lemma \ref{rdistinctDoesTheJob}.
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	848	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	849	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	850	On the other hand, if an element $r$ does not appear in the input list waiting to be deduplicated,
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	851	then expanding the accumulator to include that element will not cause the output list to change:
611 bc1df466150a more Chengsong parents: 610 diff changeset	852	\begin{lemma}\label{rdistinctOnDistinct}
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	853	The accumulator can be augmented to include elements not appearing in the input list,
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	854	and the output will not change.
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	855	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	856	\item
611 bc1df466150a more Chengsong parents: 610 diff changeset	857	If $r \notin rs$, then $\rdistinct{rs}{acc} = \rdistinct{rs}{(\{r\} \cup acc)}$.
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	858	\item
In particular, if $\textit{isDistinct} \; rs$, then we have
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	860	\[ \rdistinct{rs}{\varnothing} = rs \]
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	861	\end{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	862	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	863	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	864	The first half is by induction on $rs$. The second half is a corollary of the first.
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	865	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	866	\noindent
The function $\textit{rdistinct}$ removes duplicates from anywhere in a list.
Although this seems obvious,
the induction needed to prove it is not entirely straightforward.
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	870	\begin{lemma}\label{distinctRemovesMiddle}
The following two properties hold if $r \in rs$:
15d182ffbc76 more Chengsong parents: 553 diff changeset	872	\begin{itemize}
15d182ffbc76 more Chengsong parents: 553 diff changeset	873	\item
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	874	$\rdistinct{rs}{rset} = \rdistinct{(rs @ [r])}{rset}$\\
aecf1ddf3541 more Chengsong parents: 554 diff changeset	875	and\\
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	876	$\rdistinct{(ab :: rs @ [ab])}{rset'} = \rdistinct{(ab :: rs)}{rset'}$
15d182ffbc76 more Chengsong parents: 553 diff changeset	877	\item
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	878	$\rdistinct{ (rs @ rs') }{rset} = \rdistinct{rs @ [r] @ rs'}{rset}$\\
aecf1ddf3541 more Chengsong parents: 554 diff changeset	879	and\\
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	880	$\rdistinct{(ab :: rs @ [ab] @ rs'')}{rset'} =
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	881	\rdistinct{(ab :: rs @ rs'')}{rset'}$
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	882	\end{itemize}
15d182ffbc76 more Chengsong parents: 553 diff changeset	883	\end{lemma}
15d182ffbc76 more Chengsong parents: 553 diff changeset	884	\noindent
15d182ffbc76 more Chengsong parents: 553 diff changeset	885	\begin{proof}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	886	By induction on $rs$. All other variables are allowed to be arbitrary.
611 bc1df466150a more Chengsong parents: 610 diff changeset	887	The second part of the lemma requires the first.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	888	Note that for each part, the two sub-propositions need to be proven
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	889	at the same time,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	890	so that the induction goes through.
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	891	\end{proof}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	892	\noindent
This allows us to prove a few more identities involving
$\textit{rdistinct}$ (they will be useful later):
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	895	\begin{lemma}\label{rdistinctConcatGeneral}
611 bc1df466150a more Chengsong parents: 610 diff changeset	896	\mbox{}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	897	\begin{itemize}
\item
$\rdistinct{(rs @ rs')}{\varnothing} = \rdistinct{((\rdistinct{rs}{\varnothing})@ rs')}{\varnothing}$
\item
If $rset' \subseteq rset$, then $\rdistinct{rs}{rset} =
\rdistinct{(\rdistinct{rs}{rset'})}{rset}$.
\item
As a corollary of the previous item,
$\rdistinct{(rs @ rs')}{rset} = \rdistinct{
((\rdistinct{rs}{\varnothing}) @ rs')}{rset}$.
\item
As a further corollary, which will be used later:
if $a \in rset$, then $\rdistinct{(rs @ rs')}{rset} = \rdistinct{
((\rdistinct{(a :: rs)}{\varnothing}) @ rs')}{rset}$.
aecf1ddf3541 more Chengsong parents: 554 diff changeset	914	\end{itemize}
aecf1ddf3541 more Chengsong parents: 554 diff changeset	915	\end{lemma}
aecf1ddf3541 more Chengsong parents: 554 diff changeset	916	\begin{proof}
By Lemmas \ref{rdistinctDoesTheJob} and \ref{distinctRemovesMiddle}.
aecf1ddf3541 more Chengsong parents: 554 diff changeset	918	\end{proof}
611 bc1df466150a more Chengsong parents: 610 diff changeset	919	\noindent
The next lemma is a more general form of Lemma \ref{rdistinctConcat};
it says that
$\textit{rdistinct}$ is composable w.r.t.\ list concatenation:
bc1df466150a more Chengsong parents: 610 diff changeset	923	\begin{lemma}\label{distinctRdistinctAppend}
If $\textit{isDistinct} \; rs_1$,
and $(\textit{set} \; rs_1) \cap acc = \varnothing$,
then applying $\textit{rdistinct}$ on $rs_1 @ rs_a$ does not
have an effect on $rs_1$:
\[\textit{rdistinct}\; (rs_1 @ rs_a)\;\, acc
= rs_1@(\textit{rdistinct} \; rs_a \; (acc \cup \textit{set} \; rs_1))\]
bc1df466150a more Chengsong parents: 610 diff changeset	930	\end{lemma}
bc1df466150a more Chengsong parents: 610 diff changeset	931	\begin{proof}
By an induction on
$rs_1$, where $rs_a$ and $acc$ are allowed to be arbitrary.
bc1df466150a more Chengsong parents: 610 diff changeset	934	\end{proof}
bc1df466150a more Chengsong parents: 610 diff changeset	935	\noindent
bc1df466150a more Chengsong parents: 610 diff changeset	936	$\textit{rdistinct}$ needs to be applied only once, and
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	937	applying it multiple times does not make any difference:
611 bc1df466150a more Chengsong parents: 610 diff changeset	938	\begin{corollary}\label{distinctOnceEnough}
$\textit{rdistinct} \; (rs @ rs_a) \; \varnothing = \textit{rdistinct} \; ( (\textit{rdistinct} \;
rs \; \varnothing) @ (\textit{rdistinct} \; rs_a \; (\textit{set} \; rs))) \; \varnothing$
611 bc1df466150a more Chengsong parents: 610 diff changeset	941	\end{corollary}
bc1df466150a more Chengsong parents: 610 diff changeset	942	\begin{proof}
By Lemma \ref{distinctRdistinctAppend}.
bc1df466150a more Chengsong parents: 610 diff changeset	944	\end{proof}
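\noindent
For example, take $rs = [r_1, r_1]$ and $rs_a = [r_1, r_2]$ with $r_1 \neq r_2$
(hypothetical names, used only for illustration).
De-duplicating the concatenation directly gives
$\rdistinct{(rs @ rs_a)}{\varnothing} = [r_1, r_2]$.
De-duplicating the two parts separately gives
$\rdistinct{rs}{\varnothing} = [r_1]$ and
$\rdistinct{rs_a}{(\textit{set} \; rs)} = \rdistinct{[r_1, r_2]}{\{r_1\}} = [r_2]$;
their concatenation $[r_1, r_2]$ is already free of duplicates, so a further call to
$\textit{rdistinct}$ with the empty accumulator leaves it unchanged.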
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	945
611 bc1df466150a more Chengsong parents: 610 diff changeset	946	\subsubsection{The Properties of $\textit{Rflts}$}
bc1df466150a more Chengsong parents: 610 diff changeset	947	We give in this subsection some properties
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	948	involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and
611 bc1df466150a more Chengsong parents: 610 diff changeset	949	$\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them.
These will be helpful in later closed-form proofs, when
we want to transform terms in which
%the ways in which multiple functions involving
%those are composed together
derivatives and simplifications are interleaved.
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	955
611 bc1df466150a more Chengsong parents: 610 diff changeset	956	\noindent
bc1df466150a more Chengsong parents: 610 diff changeset	957	%When the function $\textit{Rflts}$
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	958	%is applied to the concatenation of two lists; the output can be calculated by first applying the
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	959	%functions on two lists separately and then concatenating them together.
611 bc1df466150a more Chengsong parents: 610 diff changeset	960	$\textit{Rflts}$ is composable in terms of concatenation:
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	961	\begin{lemma}\label{rfltsProps}
The function $\rflts$ has the properties below:
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	963	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	964	\item
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	965	$\rflts \; (rs_1 @ rs_2) = \rflts \; rs_1 @ \rflts \; rs_2$
15d182ffbc76 more Chengsong parents: 553 diff changeset	966	\item
If $r \neq \RZERO$ and $\nexists rs_1. r = \RALTS{rs_1}$, then $\rflts \; (r::rs) = r :: \rflts \; rs$
15d182ffbc76 more Chengsong parents: 553 diff changeset	968	\item
15d182ffbc76 more Chengsong parents: 553 diff changeset	969	$\rflts \; (rs @ [\RZERO]) = \rflts \; rs$
15d182ffbc76 more Chengsong parents: 553 diff changeset	970	\item
15d182ffbc76 more Chengsong parents: 553 diff changeset	971	$\rflts \; (rs' @ [\RALTS{rs}]) = \rflts \; rs'@rs$
15d182ffbc76 more Chengsong parents: 553 diff changeset	972	\item
15d182ffbc76 more Chengsong parents: 553 diff changeset	973	$\rflts \; (rs @ [\RONE]) = \rflts \; rs @ [\RONE]$
15d182ffbc76 more Chengsong parents: 553 diff changeset	974	\item
15d182ffbc76 more Chengsong parents: 553 diff changeset	975	If $r \neq \RZERO$ and $\nexists rs'. r = \RALTS{rs'}$ then $\rflts \; (rs @ [r])
15d182ffbc76 more Chengsong parents: 553 diff changeset	976	= (\rflts \; rs) @ [r]$
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	977	\item
aecf1ddf3541 more Chengsong parents: 554 diff changeset	978	If $r = \RALTS{rs}$ and $r \in rs'$ then for all $r_1 \in rs.
aecf1ddf3541 more Chengsong parents: 554 diff changeset	979	r_1 \in \rflts \; rs'$.
aecf1ddf3541 more Chengsong parents: 554 diff changeset	980	\item
aecf1ddf3541 more Chengsong parents: 554 diff changeset	981	$\rflts \; (rs_a @ \RZERO :: rs_b) = \rflts \; (rs_a @ rs_b)$
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	982	\end{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	983	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	984	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	985	\begin{proof}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	986	By induction on $rs_1$ in the first sub-lemma, and induction on $r$ in the second part,
aecf1ddf3541 more Chengsong parents: 554 diff changeset	987	and induction on $rs$, $rs'$, $rs$, $rs'$, $rs_a$ in the third, fourth, fifth, sixth and
aecf1ddf3541 more Chengsong parents: 554 diff changeset	988	last sub-lemma.
543 b2bea5968b89 thesis_thys Chengsong parents: 532 diff changeset	989	\end{proof}
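\noindent
As an illustration, let $r$ be an r-regular expression that is neither $\RZERO$
nor an alternative, and let $r_1$, $r_2$ be arbitrary (all three names are
only used for this example). Assuming $\rflts \; [] = []$, the properties above give
\begin{center}
\begin{tabular}{lcl}
$\rflts \; [\RZERO, \RALTS{[r_1, r_2]}, r]$ & $=$ &
$\rflts \; [\RZERO] \; @ \; \rflts \; [\RALTS{[r_1, r_2]}] \; @ \; \rflts \; [r]$\\
& $=$ & $[] \; @ \; [r_1, r_2] \; @ \; [r]$\\
& $=$ & $[r_1, r_2, r]$
\end{tabular}
\end{center}
that is, $\RZERO$ is dropped, the alternative is opened up by one level,
and every other element is kept as it is.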
611 bc1df466150a more Chengsong parents: 610 diff changeset	990	\noindent
Now we introduce the property that the derivative operation
and $\rsimpalts$
commute. This will be used later on when deriving the closed form for
the alternative regular expression:
bc1df466150a more Chengsong parents: 610 diff changeset	995	\begin{lemma}\label{rderRsimpAltsCommute}
bc1df466150a more Chengsong parents: 610 diff changeset	996	$\rder{x}{(\rsimpalts \; rs)} = \rsimpalts \; (\map \; (\rder{x}{\_}) \; rs)$
bc1df466150a more Chengsong parents: 610 diff changeset	997	\end{lemma}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	998	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	999	By induction on $rs$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1000	\end{proof}
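\noindent
To see what the lemma says in the small cases, assume the definition of
$\rsimpalts$ given earlier, which returns $\RZERO$ for the empty list,
$r$ for the singleton list $[r]$, and $\RALTS{rs}$ otherwise. Then
\begin{center}
\begin{tabular}{lcl}
$\rder{x}{(\rsimpalts \; [])}$ & $=$ & $\rder{x}{\RZERO} = \RZERO = \rsimpalts \; []$\\
$\rder{x}{(\rsimpalts \; [r])}$ & $=$ & $\rder{x}{r} = \rsimpalts \; [\rder{x}{r}]$
\end{tabular}
\end{center}
For lists with two or more elements, $\rsimpalts \; rs$ is the alternative $\RALTS{rs}$,
whose derivative is the alternative over $\map \; (\rder{x}{\_}) \; rs$; since
$\map$ preserves the length of the list, $\rsimpalts$ on the right-hand side
returns that same alternative.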
611 bc1df466150a more Chengsong parents: 610 diff changeset	1001	\noindent
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1002
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1003	\subsubsection{The $RL$ Function: Language Interpretation for $\textit{Rrexp}$s}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1004	Much like the definition of $L$ on plain regular expressions, one can also
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1005	define the language interpretation for $\rrexp$s.
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1006	\begin{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1007	\begin{tabular}{lcl}
$RL \; (\ZERO_r)$ & $\dn$ & $\varnothing$\\
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1009	$RL \; (\ONE_r)$ & $\dn$ & $\{[]\}$\\
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1010	$RL \; (c)$ & $\dn$ & $\{[c]\}$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1011	$RL \; \sum rs$ & $\dn$ & $ \bigcup_{r \in rs} (RL \; r)$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1012	$RL \; (r_1 \cdot r_2)$ & $\dn$ & $ RL \; (r_1) @ RL \; (r_2)$\\
$RL \; (r^*)$ & $\dn$ & $ (RL(r))^*$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1014	\end{tabular}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1015	\end{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1016	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1017	The main use of $RL$ is to establish some connections between $\rsimp{}$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1018	and $\rnullable{}$:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1019	\begin{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1020	The following properties hold:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1021	\begin{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1022	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1023	If $\rnullable{r}$, then $\rsimp{r} \neq \RZERO$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1024	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1025	$\rnullable{r \backslash s} \quad $ if and only if $\quad \rnullable{\rderssimp{r}{s}}$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1026	\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1027	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1028	\begin{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1029	The first part is by induction on $r$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1030	The second part is true because property
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1031	\[ RL \; r = RL \; (\rsimp{r})\] holds.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1032	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1033
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1034	\subsubsection{Simplified $\textit{Rrexp}$s are Good}
We formalise the notion of ``good'' regular expressions,
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1036	which means regular expressions that
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1037	are fully simplified in terms of our $\textit{rsimp}$ function.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1038	For alternative regular expressions that means they
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1039	do not contain any nested alternatives, un-eliminated $\RZERO$s
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1040	or duplicate elements (for example,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1041	$r_1 + (r_2 + r_3)$, $\RZERO + r$ and $ \sum [r, r, \ldots]$).
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1042	The clauses for $\good$ are:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1043	\begin{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1044	\begin{tabular}{@{}lcl@{}}
$\good\; \RZERO$ & $\dn$ & $\bfalse$\\
$\good\; \RONE$ & $\dn$ & $\btrue$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1047	$\good\; \RCHAR{c}$ & $\dn$ & $\btrue$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1048	$\good\; \RALTS{[]}$ & $\dn$ & $\bfalse$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1049	$\good\; \RALTS{[r]}$ & $\dn$ & $\bfalse$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1050	$\good\; \RALTS{r_1 :: r_2 :: rs}$ & $\dn$ &
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1051	$\textit{isDistinct} \; (r_1 :: r_2 :: rs) \;$\\
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1052	& & $\land \; (\forall r' \in (r_1 :: r_2 :: rs).\; \good \; r'\; \, \land \; \, \textit{nonAlt}\; r')$\\
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1053	$\good \; \RSEQ{\RZERO}{r}$ & $\dn$ & $\bfalse$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1054	$\good \; \RSEQ{\RONE}{r}$ & $\dn$ & $\bfalse$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1055	$\good \; \RSEQ{r}{\RZERO}$ & $\dn$ & $\bfalse$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1056	$\good \; \RSEQ{r_1}{r_2}$ & $\dn$ & $\good \; r_1 \;\, \textit{and} \;\, \good \; r_2$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1057	$\good \; \RSTAR{r}$ & $\dn$ & $\btrue$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1058	\end{tabular}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1059	\end{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1060	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1061	We omit the recursive definition of the predicate $\textit{nonAlt}$,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1062	which evaluates to true when the regular expression is not an
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1063	alternative, and false otherwise.
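For illustration, the clauses can be unfolded on a few small instances.
This is only a sketch: $c$ and $d$ are assumed to be two different
character regular expressions, chosen here purely as an example.
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\good\; \RALTS{[c, d]}$ & $=$ & $\btrue$\\
$\good\; \RALTS{[c, c]}$ & $=$ & $\bfalse$ \quad ($\textit{isDistinct}$ fails)\\
$\good\; \RALTS{[c, \RALTS{[c, d]}]}$ & $=$ & $\bfalse$ \quad ($\textit{nonAlt}$ fails for the second element)\\
$\good\; \RALTS{[c]}$ & $=$ & $\bfalse$\\
$\good\; \RSEQ{\RONE}{c}$ & $=$ & $\bfalse$\\
$\good\; \RSEQ{c}{d}$ & $=$ & $\btrue$\\
\end{tabular}
\end{center}
\noindent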
The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
its argument list is non-empty and its elements are all good, $\textit{nonAlt}$,
and pairwise distinct:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1067	\begin{lemma}\label{rsimpaltsGood}
If $rs \neq []$, $\textit{isDistinct} \; rs$, and $\textit{nonAlt} \; r$ holds for all $r \in rs$,
then $\good \; (\rsimpalts \; rs)$ if and only if $\good \; r$ holds for all $r \in rs$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1070	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1071	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1072	We also note that
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1073	if a regular expression $r$ is good, then $\rflts$ on the singleton
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1074	list $[r]$ will not break goodness:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1075	\begin{lemma}\label{flts2}
If $\good \; r$, then for all $r' \in \rflts \; [r]$, both $\good \; r'$ and $\textit{nonAlt} \; r'$ hold.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1077	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1078	\begin{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1079	By an induction on $r$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1080	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1081	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1082	The other observation we make about $\rsimp{r}$ is that it never
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1083	comes with nested alternatives, which we describe as the $\nonnested$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1084	property:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1085	\begin{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1086	\begin{tabular}{lcl}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1087	$\nonnested \; \, \sum []$ & $\dn$ & $\btrue$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1088	$\nonnested \; \, \sum ((\sum rs_1) :: rs_2)$ & $\dn$ & $\bfalse$\\
$\nonnested \; \, \sum (r :: rs)$ & $\dn$ & $\nonnested \; \, (\sum rs)$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1090	$\nonnested \; \, r $ & $\dn$ & $\btrue$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1091	\end{tabular}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1092	\end{center}
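\noindent
As a small illustration, the clauses can be unfolded on two instances.
This is a sketch in which $c$ and $d$ are assumed to be character regular expressions,
and in which the clauses are tried from top to bottom:
\begin{center}
\begin{tabular}{lcl}
$\nonnested \; \, \sum [c, \sum [d]]$ & $=$ & $\nonnested \; \, \sum [\sum [d]] \; = \; \bfalse$\\
$\nonnested \; \, \sum [c, d]$ & $=$ & $\nonnested \; \, \sum [d] \; = \; \nonnested \; \, \sum [] \; = \; \btrue$\\
\end{tabular}
\end{center}
\noindent
Note that the predicate only inspects the elements of the topmost list.
\par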
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1093	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1094	The $\rflts$ function
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1095	always opens up nested alternatives,
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1096	which enables $\rsimp$ to be non-nested:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1097
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1098	\begin{lemma}\label{nonnestedRsimp}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1099	It is always the case that
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1100	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1101	$\nonnested \; (\rsimp{r})$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1102	\end{center}
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1103	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1104	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1105	By induction on $r$.
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1106	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1107	\noindent
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1108	With this we can prove that a regular expression
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1109	after simplification and flattening and de-duplication,
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1110	will not contain any alternative regular expression directly:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1111	\begin{lemma}\label{nonaltFltsRd}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1112	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1113	then $\textit{nonAlt} \; x$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1114	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1115	\begin{proof}
By Lemma \ref{nonnestedRsimp}.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1117	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1118	\noindent
The other fact we know is that once $\rsimp{}$ has finished
processing an alternative regular expression, the resulting list of children
will not contain any $\RZERO$s. This is because the recursive
calls to the simplification make each child either good or $\RZERO$,
$\rflts$ then removes the $\RZERO$s while leaving the remaining
elements good,
and $\rdistinct{}$ only removes duplicates and therefore does not affect goodness.
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1126	\begin{lemma}\label{flts3Obv}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1127	The following are true:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1128	\begin{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1129	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1130	If for all $r \in rs. \, \good \; r $ or $r = \RZERO$,
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1131	then for all $r \in \rflts\; rs. \, \good \; r$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1132	\item
If $x \in \rdistinct{\rflts\; (\map \; \rsimp{}\; rs)}{\varnothing}$
and for all $y$ such that $\llbracket y \rrbracket_r$ is less than
$\llbracket rs \rrbracket_r + 1$, either
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1136	$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1137	then $\good \; x$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1138	\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1139	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1140	\begin{proof}
The first part is by induction, where the inductive cases
follow the clauses of $\rflts$.
The second part is a corollary of the first part.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1144	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1145
This leads to a useful structural property of $\rsimp{}$,
namely that after simplification a regular expression is
either good or $\RZERO$:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1149	\begin{lemma}\label{good1}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1150	For any r-regular expression $r$, $\good \; \rsimp{r}$ or $\rsimp{r} = \RZERO$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1151	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1152	\begin{proof}
By induction on $r$, using the size $\llbracket r \rrbracket_r$ as the inductive measure.
Lemma \ref{rsimpMono} says that
$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
$\llbracket r \rrbracket_r$.
Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ cases,
the inductive hypothesis applies to the children regular expressions
$r_1$, $r_2$, etc. The precondition of Lemma \ref{flts3Obv} is satisfied
by this as well.
Lemmas \ref{nonnestedRsimp} and \ref{nonaltFltsRd} are used
to ensure that goodness is preserved at the topmost level.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1163	\end{proof}
\noindent
Next we prove that any good regular expression is
a fixed point of $\textit{rsimp}$.
First we prove an auxiliary lemma:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1167	\begin{lemma}\label{goodaltsNonalt}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1168	If $\good \; \sum rs$, then $\rflts\; rs = rs$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1169	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1170	\begin{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1171	By an induction on $\sum rs$. The inductive rules are the cases
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1172	for $\good$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1173	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1174	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1175	Now we are ready to prove that good regular expressions are invariant
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1176	with respect to $\rsimp{}$:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1177	\begin{lemma}\label{test}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1178	If $\good \;r$ then $\rsimp{r} = r$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1179	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1180	\begin{proof}
By induction on the inductive cases of $\good$, using Lemmas
\ref{goodaltsNonalt} and \ref{rdistinctOnDistinct}.
Lemma \ref{goodaltsNonalt} is used in the alternative
case where two or more elements are present in the list.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1185	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1186	\noindent
Below we show a property involving $\rflts$, $\textit{rdistinct}$,
$\rsimp{}$ and $\rsimp_{ALTS}$,
which requires Lemma \ref{good1} to go through smoothly:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1190	\begin{lemma}\label{flattenRsimpalts}
An application of $\rsimp_{ALTS}$ can be ``absorbed''
if its output is prepended to a list that is then passed to $\rflts$:
\begin{center}
$\rflts \; ( (\rsimp_{ALTS} \;
(\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing})) ::
\map \; \rsimp{} \; rs' ) =
\rflts \; ( (\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}) @ (
\map \; \rsimp{} \; rs'))$
\end{center}
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1200
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1201
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1202	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1203	\begin{proof}
By Lemma \ref{good1}.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1205	\end{proof}
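\noindent
One way to see why the equation holds is to distinguish the three clauses of $\rsimp_{ALTS}$.
The following is only a sketch under assumptions: we abbreviate
$\rdistinct{(\rflts \; (\map \; \rsimp{}\; rs))}{\varnothing}$ as $L$ and
$\map \; \rsimp{} \; rs'$ as $rs_s$ (both names are introduced purely for this illustration),
and we rely on the clauses of $\rsimp_{ALTS}$ and $\rflts$ as we understand them from their definitions.
\begin{center}
\begin{tabular}{lcl}
$L = []$ & : & $\rflts \; (\RZERO :: rs_s) = \rflts \; rs_s = \rflts \; ([] @ rs_s)$\\
$L = [r]$ & : & $\rflts \; (r :: rs_s) = \rflts \; ([r] @ rs_s)$\\
$L = r_1 :: r_2 :: rs_1$ & : & $\rflts \; ((\sum L) :: rs_s) = L @ (\rflts \; rs_s)$\\
\end{tabular}
\end{center}
\noindent
In the last case the right-hand side of the lemma is $\rflts \; (L @ rs_s)$,
which agrees with $L @ (\rflts \; rs_s)$ provided that $\rflts$ leaves $L$ unchanged,
that is, provided that no element of $L$ is $\RZERO$ or an alternative.
This is where the goodness of simplified regular expressions is needed.
\par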
\noindent
We are now also ready to prove that $\textit{rsimp}$ is idempotent.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1213	\subsubsection{$\rsimp$ is Idempotent}
The idempotency of $\rsimp$ is very useful for
manipulating regular expression terms into the
forms needed by the key rewriting steps that lead to
the closed forms.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1218	\begin{lemma}\label{rsimpIdem}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1219	$\rsimp{r} = \rsimp{(\rsimp{r})}$
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1220	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1221
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1222	\begin{proof}
By Lemmas \ref{test} and \ref{good1}: $\rsimp{r}$ is either good, and hence
a fixed point of $\rsimp{}$, or equal to $\RZERO$, which $\rsimp{}$ leaves unchanged.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1224	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1225	\noindent
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1226	This property means we do not have to repeatedly
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1227	apply simplification in each step, which justifies
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1228	our definition of $\blexersimp$.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1229	This is in contrast to the work of Sulzmann and Lu where
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1230	the simplification is applied in a fixpoint manner.
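To illustrate idempotency on a concrete term, consider the following worked example.
It is only a sketch: $c$, $d$ and $e$ are assumed to be three pairwise different
character regular expressions, and the steps follow the clauses of $\rsimp{}$, $\rflts$,
$\textit{rdistinct}$ and $\rsimp_{ALTS}$ as we understand them from their definitions.
Simplifying $\sum [\sum [c, d], \RZERO, \RONE \cdot d, e]$ proceeds as follows:
\begin{center}
\begin{tabular}{lcl}
$\map \; \rsimp{} \; [\sum [c, d], \RZERO, \RONE \cdot d, e]$ & $=$ & $[\sum [c, d], \RZERO, d, e]$\\
$\rflts \; [\sum [c, d], \RZERO, d, e]$ & $=$ & $[c, d, d, e]$\\
$\rdistinct{[c, d, d, e]}{\varnothing}$ & $=$ & $[c, d, e]$\\
$\rsimp_{ALTS} \; [c, d, e]$ & $=$ & $\sum [c, d, e]$\\
\end{tabular}
\end{center}
\noindent
The result $\sum [c, d, e]$ is good. Simplifying it a second time changes nothing:
$\map \; \rsimp{}$, $\rflts$ and $\textit{rdistinct}$ all return the list $[c, d, e]$ unchanged,
and $\rsimp_{ALTS}$ rebuilds $\sum [c, d, e]$.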
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1231
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1232
On the other hand, as long as at least one simplification is applied
to a regular expression, we can add further $\rsimp{}$ applications to it
as many times as we want and wherever we need them:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1236	\begin{corollary}\label{headOneMoreSimp}
The following properties hold, directly from Lemma \ref{rsimpIdem}:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1238
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1239	\begin{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1240	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1241	$\map \; \rsimp{(r :: rs)} = \map \; \rsimp{} \; (\rsimp{r} :: rs)$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1242	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1243	$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1244	\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1245	\end{corollary}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1246	\noindent
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1247	This will be useful in the later closed-form proof's rewriting steps.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1248	Similarly, we state the following useful facts below:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1249	\begin{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1250	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1251	\begin{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1252	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1253	If $r = \sum rs$ then $\rsimpalts \; rs = \sum rs$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1254	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1255	If $r = \sum rs$ then $\rdistinct{rs}{\varnothing} = rs$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1256	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1257	$\rsimpalts \; (\rdistinct{\rflts \; [r]}{\varnothing}) = r$.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1258	\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1259	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1260	\begin{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1261	By application of lemmas \ref{rsimpIdem} and \ref{good1}.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1262	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1263
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1264	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1265	With the idempotency of $\textit{rsimp}$ and its corollaries,
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1266	we can start proving some key equalities leading to the
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1267	closed forms.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1268	Next we present a few equivalent terms under $\textit{rsimp}$.
To make the notation more concise,
we use $r_1 \sequal r_2 $ to denote that $\rsimp{r_1} = \rsimp{r_2}$.
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1271	%\begin{center}
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1272	%\begin{tabular}{lcl}
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1273	% $a \sequal b$ & $ \dn$ & $ \textit{rsimp} \; a = \textit{rsimp} \; b$
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1274	%\end{tabular}
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1275	%\end{center}
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1276	%\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1277	%\vspace{0em}
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1278	\begin{lemma}
The following equivalences hold:
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1280	\begin{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1281	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1282	$\rsimpalts \; (\RZERO :: rs) \sequal \rsimpalts\; rs$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1283	\item
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1284	$\rsimpalts \; rs \sequal \rsimpalts (\map \; \rsimp{} \; rs)$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1285	\item
$\RALTS{[\RALTS{rs}]} \sequal \RALTS{rs}$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1287	\item
$\sum ((\sum rs_a) :: rs_b) \sequal \sum (rs_a @ rs_b)$
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1289	\item
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1290	$\RALTS{rs} \sequal \RALTS{\map \; \rsimp{} \; rs}$
614 d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1291	\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1292	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1293	\begin{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1294	By induction on the lists involved.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1295	\end{proof}
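\noindent
As a worked instance of the equivalence
$\sum ((\sum rs_a) :: rs_b) \sequal \sum (rs_a @ rs_b)$,
take $rs_a = [c, d]$ and $rs_b = [d]$. This is only a sketch, in which $c$ and $d$ are assumed to be
two different character regular expressions and the clauses of $\rsimp{}$ are unfolded
as we understand them from their definitions:
\begin{center}
\begin{tabular}{lcl}
$\rsimp{(\sum [\sum [c, d], d])}$ & $=$ & $\rsimp_{ALTS} \; (\rdistinct{(\rflts \; [\sum [c, d], d])}{\varnothing})$\\
& $=$ & $\rsimp_{ALTS} \; (\rdistinct{[c, d, d]}{\varnothing})$\\
& $=$ & $\sum [c, d]$\\
$\rsimp{(\sum [c, d, d])}$ & $=$ & $\rsimp_{ALTS} \; (\rdistinct{(\rflts \; [c, d, d])}{\varnothing})$\\
& $=$ & $\rsimp_{ALTS} \; (\rdistinct{[c, d, d]}{\varnothing})$\\
& $=$ & $\sum [c, d]$\\
\end{tabular}
\end{center}
\noindent
Both sides simplify to the same term, so the two regular expressions are related by $\sequal$.
\par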
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1296	\noindent
The above allows us to prove
two similar equalities (which are a bit more involved).
They say that we can flatten the elements
before simplification and still get the same result.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1301	\begin{lemma}\label{simpFlatten3}
One can flatten an inner $\sum$ of a $\sum$ if it is being
simplified. Concretely,
\begin{itemize}
\item
If for all $r$ in $rs$, $rs'$ and $rs''$ we have $\good \; r $
or $r = \RZERO$, then $\sum (rs' @ rs @ rs'') \sequal
\sum (rs' @ [\sum rs] @ rs'')$ holds.
\item
As a corollary,
$\sum (rs' @ [\sum rs] @ rs'') \sequal \sum (rs' @ rs @ rs'')$.
\end{itemize}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1312	\end{lemma}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1313	\begin{proof}
By rewriting steps involving the use of Lemmas \ref{test} and \ref{rdistinctConcatGeneral}.
The second sub-lemma is a corollary of the first.
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1316	\end{proof}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1317	%Rewriting steps not put in--too long and complicated-------------------------------
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1318	\begin{comment}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1319	\begin{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1320	$\rsimp{\sum (rs' @ rs @ rs'')} \stackrel{def of bsimp}{=}$ \\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1321	$\rsimpalts \; (\rdistinct{\rflts \; ((\map \; \rsimp{}\; rs') @ (\map \; \rsimp{} \; rs ) @ (\map \; \rsimp{} \; rs''))}{\varnothing})$ \\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1322	$\stackrel{by \ref{test}}{=}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1323	\rsimpalts \; (\rdistinct{(\rflts \; rs' @ \rflts \; rs @ \rflts \; rs'')}{
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1324	\varnothing})$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1325	$\stackrel{by \ref{rdistinctConcatGeneral}}{=}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1326	\rsimpalts \; (\rdistinct{\rflts \; rs'}{\varnothing} @ \rdistinct{(
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1327	\rflts\; rs @ \rflts \; rs'')}{\rflts \; rs'})$\\
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1328
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1329	\end{center}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1330	\end{comment}
d5e9bcb384ec reorder Chengsong parents: 613 diff changeset	1331	%Rewriting steps not put in--too long and complicated-------------------------------
\noindent
We need more equalities like the above to enable a closed form lemma.
To obtain them, we first introduce a few rewrite relations.
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	1339
\subsection{The Rewrite Relations $\hrewrite$, $\scfrewrites$, $\frewrite$ and $\grewrite$}
Inspired by the success we had in the correctness proof
in Chapter \ref{Bitcoded2},
we follow suit here, defining atomic simplification
steps as ``small-step'' rewriting steps. This allows us to capture
similarities between terms that would otherwise be
hard to express.
aecf1ddf3541 more Chengsong parents: 554 diff changeset	1347
We use $\hrewrite$ for a one-step atomic rewrite of
a regular expression during simplification,
$\frewrite$ for rewriting a list of regular expressions using
the operations carried out in $\rflts$, and $\grewrite$ for
rewriting a list of regular expressions using the operations of both $\rflts$ and $\textit{rdistinct}$.
Their reflexive transitive closures are used to denote zero or more steps,
as was the case in the previous chapter.
As we have already
done something similar, the presentation of
these rewriting rules will be more concise than that in Chapter \ref{Bitcoded2}.
To differentiate between the rewriting steps for annotated regular expressions
and $\rrexp$s, we add the characters $h$ and $g$ below the squiggly arrow symbol
to mean atomic simplification transitions
of $\rrexp$s and $\rrexp$ lists, respectively.
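As a small illustration of how such atomic steps compose, the following sequence
uses the rules listed in Figure \ref{hRewrite}. It is only a sketch, in which
$c$, $d$ and $e$ are assumed to be three pairwise different character regular expressions:
\begin{center}
\begin{tabular}{lll}
& $\sum [\sum [c, d], \RZERO, \RONE \cdot d, e]$ & \\
$\hrewrite$ & $\sum [c, d, \RZERO, \RONE \cdot d, e]$ & (RALTSNested)\\
$\hrewrite$ & $\sum [c, d, \RONE \cdot d, e]$ & (RALTS0)\\
$\hrewrite$ & $\sum [c, d, d, e]$ & (RALTSChild with RSEQ1)\\
$\hrewrite$ & $\sum [c, d, e]$ & (RALTSDelete)\\
\end{tabular}
\end{center}
\noindent
None of the rules applies to the final term, since it contains no $\RZERO$,
no nested alternative, no duplicate, and more than one element.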
15d182ffbc76 more Chengsong parents: 553 diff changeset	1362
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1363
aecf1ddf3541 more Chengsong parents: 554 diff changeset	1364
aecf1ddf3541 more Chengsong parents: 554 diff changeset	1365
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1366	\begin{figure}[H]
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	1367	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1368	\begin{mathpar}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1369	\inferrule[RSEQ0L]{}{\RZERO \cdot r_2 \hrewrite \RZERO\\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1370
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1371	\inferrule[RSEQ0R]{}{r_1 \cdot \RZERO \hrewrite \RZERO\\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1372
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1373	\inferrule[RSEQ1]{}{(\RONE \cdot r) \hrewrite r\\}\\
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1374
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1375	\inferrule[RSEQL]{ r_1 \hrewrite r_2}{r_1 \cdot r_3 \hrewrite r_2 \cdot r_3\\}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1376
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1377	\inferrule[RSEQR]{ r_3 \hrewrite r_4}{r_1 \cdot r_3 \hrewrite r_1 \cdot r_4\\}\\
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1378
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1379	\inferrule[RALTSChild]{r \hrewrite r'}{\sum (rs_1 @ [r] @ rs_2) \hrewrite \sum (rs_1 @ [r'] @ rs_2)\\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1380
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1381	\inferrule[RALTS0]{}{\sum (rs_a @ [\RZERO] @ rs_b) \hrewrite \sum (rs_a @ rs_b)}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1382
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1383	\inferrule[RALTSNested]{}{\sum (rs_a @ [\sum rs_1] @ rs_b) \hrewrite \sum (rs_a @ rs_1 @ rs_b)}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1384
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1385	\inferrule[RALTSNil]{}{ \sum [] \hrewrite \RZERO\\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1386
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1387	\inferrule[RALTSSingle]{}{ \sum [r] \hrewrite r\\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1388
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1389	\inferrule[RALTSDelete]{\\ r_1 = r_2}{\sum (rs_a @ [r_1] @ rs_b @ [r_2] @ rs_c) \hrewrite \sum (rs_a @ [r_1] @ rs_b @ rs_c)}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1390
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1391	\end{mathpar}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1392	\end{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1393	\caption{List of one-step rewrite rules for r-regular expressions ($\hrewrite$)}\label{hRewrite}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1394	\end{figure}
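
\noindent
As a small illustration of how these rules compose (the concrete term is chosen
here only for illustration, with $r_1$ and $r_2$ arbitrary r-regular expressions),
the rules RALTS0, RALTSNested and RALTSDelete can be chained as follows:
\begin{center}
\begin{tabular}{lcl}
$\sum [\RZERO, \sum [r_1, r_2], r_1]$ & $\hrewrite$ & $\sum [\sum [r_1, r_2], r_1]$\\
 & $\hrewrite$ & $\sum [r_1, r_2, r_1]$\\
 & $\hrewrite$ & $\sum [r_1, r_2]$
\end{tabular}
\end{center}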
554 15d182ffbc76 more Chengsong parents: 553 diff changeset	1395
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1396
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1397	As with $\rightsquigarrow_s$, it is
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1398	convenient to define a rewrite relation on lists of regular expressions,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1399	where each element of one list can rewrite in many steps to the corresponding element of the other (scf stands for
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1400	li\emph{s}t \emph{c}losed \emph{f}orm). This relation is similar to the
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1401	$\stackrel{s*}{\rightsquigarrow}$ relation for annotated regular expressions.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1402
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1403	\begin{figure}[H]
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1404	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1405	\begin{mathpar}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1406	\inferrule{}{[] \scfrewrites [] }
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1407
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1408	\inferrule{r \hrewrites r' \\ rs \scfrewrites rs'}{r :: rs \scfrewrites r' :: rs'}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1409	\end{mathpar}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1410	\end{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1411	\caption{Rewrite rules for a list of r-regular expressions ($\scfrewrites$)}\label{scfRewrite}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1412	\end{figure}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1413	%frewrite
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1414	List of one-step rewrite rules for flattening
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1415	a list of regular expressions ($\frewrite$):
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1416	\begin{figure}[H]
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1417	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1418	\begin{mathpar}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1419	\inferrule{}{\RZERO :: rs \frewrite rs \\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1420
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1421	\inferrule{}{(\sum rs) :: rs_a \frewrite rs @ rs_a \\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1422
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1423	\inferrule{rs_1 \frewrite rs_2}{r :: rs_1 \frewrite r :: rs_2}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1424	\end{mathpar}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1425	\end{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1426	\caption{List of one-step rewrite rules characterising the $\rflts$ operation on a list}\label{fRewrites}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1427	\end{figure}
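
\noindent
For instance (again with an illustrative list, where $r_1$, $r_2$ and $r_3$ are
assumed to be neither $\RZERO$ nor alternatives), two $\frewrite$ steps
flatten a list in the same way as we would expect of $\rflts$:
\begin{center}
$[\RZERO, \sum [r_1, r_2], r_3] \frewrite [\sum [r_1, r_2], r_3] \frewrite [r_1, r_2, r_3]$
\end{center}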
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1428
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1429	List of one-step rewrite rules for flattening and de-duplicating
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1430	a list of regular expressions ($\grewrite$):
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1431	\begin{figure}[H]
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1432	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1433	\begin{mathpar}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1434	\inferrule{}{\RZERO :: rs \grewrite rs \\}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1435
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1436	\inferrule{}{(\sum rs) :: rs_a \grewrite rs @ rs_a \\}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1437
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1438	\inferrule{rs_1 \grewrite rs_2}{r :: rs_1 \grewrite r :: rs_2}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1439
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1440	\inferrule[dB]{}{rs_a @ [a] @ rs_b @ [a] @ rs_c \grewrite rs_a @ [a] @ rs_b @ rs_c}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1441	\end{mathpar}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1442	\end{center}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1443	\caption{List of one-step rewrite rules characterising the $\rflts$ and $\textit{rdistinct}$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1444	operations}\label{gRewrite}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1445	\end{figure}
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1446	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1447	We define
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1448	two separate list rewriting relations $\frewrite$ and $\grewrite$.
611 bc1df466150a more Chengsong parents: 610 diff changeset	1449	The rewriting steps that take place during
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1450	flattening are characterised by $\frewrite$.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1451	The rewrite relation $\grewrite$ characterises both flattening and de-duplicating.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1452	Sometimes $\grewrites$ is slightly too powerful
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1453	so we would rather use $\frewrites$ to prove
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1454	%because we only
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1455	equalities related to $\rflts$.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1456	%certain equivalence under the rewriting steps of $\frewrites$.
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1457	For example, when proving the closed-form for the alternative regular expression,
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1458	one of the equalities needed is:
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1459	\begin{center}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1460	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1461	\sum (\rDistinct \;\; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1462	$
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1463	\end{center}
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1464	\noindent
c27f04bb2262 hello Chengsong parents: 555 diff changeset	1465	This is proved by first showing
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1466	\begin{lemma}\label{earlyLaterDerFrewrites}
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1467	$\map \; (\_ \backslash x) \; (\rflts \; rs) \frewrites
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1468	\rflts \; (\map \; (\_ \backslash x) \; rs)$
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1469	\end{lemma}
c27f04bb2262 hello Chengsong parents: 555 diff changeset	1470	\noindent
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1471	and then the equivalence of two lists,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1472	one of which reduces in many steps to the other:
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1473	\begin{lemma}\label{frewritesSimpeq}
c27f04bb2262 hello Chengsong parents: 555 diff changeset	1474	If $rs_1 \frewrites rs_2 $, then $\sum (\rDistinct \; rs_1 \; \varnothing) \sequal
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1475	\sum (\rDistinct \; rs_2 \; \varnothing)$.
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1476	\end{lemma}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1477	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1478	These two lemmas can both be proven using a straightforward induction (and
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1479	the proofs for them are therefore omitted).
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1480
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1481	Now the above equality can be derived with ease:
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1482	\begin{corollary}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1483	$\sum (\rDistinct \;\; (\map \; (\_ \backslash x) \; (\rflts \; rs)) \;\; \varnothing) \sequal
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1484	\sum (\rDistinct \;\; (\rflts \; (\map \; (\_ \backslash x) \; rs)) \;\; \varnothing)
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1485	$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1486	\end{corollary}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1487	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1488	By lemmas \ref{earlyLaterDerFrewrites} and \ref{frewritesSimpeq}.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1489	\end{proof}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1490	But this trick will not work for $\grewrites$.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1491	For example, a rewriting step in proving
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1492	closed forms is:
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1493	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1494	$\rsimp{(\rsimpalts \; (\map \; (\_ \backslash x) \; (\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs}))))}{\varnothing})))}$\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1495	$=$ \\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1496	$\rsimp{(\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \; (\rflts \; (\map \; (\rsimp{} \; \circ \; (\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}))} $
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1497	\noindent
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1498	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1499	For this, one would hope to have a rewriting relation between the two lists involved,
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1500	similar to lemma \ref{earlyLaterDerFrewrites}. However, it turns out that
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1501	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1502	$\map \; (\_ \backslash x) \; (\rDistinct \; rs \; rset) \grewrites \rDistinct \; (\map \;
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1503	(\_ \backslash x) \; rs) \; ( rset \backslash x)$
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1504	\end{center}
c27f04bb2262 hello Chengsong parents: 555 diff changeset	1505	\noindent
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1506	does $\mathbf{not}$ hold in general.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1507	For this rewriting step we will introduce a slightly more cumbersome
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1508	proof technique later.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1509	The point is that $\frewrite$
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1510	allows us to prove equivalence in a straightforward way that is
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1511	not possible for $\grewrite$.
555 aecf1ddf3541 more Chengsong parents: 554 diff changeset	1512
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1513
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1514	\subsubsection{Terms That Can Be Rewritten Using $\hrewrites$, $\grewrites$, and $\frewrites$}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1515	In this part, we present lemmas stating
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1516	pairs of r-regular expressions and r-regular expression lists
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1517	where one can be rewritten in many steps to the other.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1518	Most of the proofs of these lemmas are straightforward, using
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1519	an induction on the corresponding rewriting relations.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1520	We therefore omit the proofs when this is the case.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1521	We present in the following lemma a few pairs of terms that are rewritable via
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1522	$\grewrites$:
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1523	\begin{lemma}\label{gstarRdistinctGeneral}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1524	\mbox{}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1525	\begin{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1526	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1527	$rs_1 @ rs \grewrites rs_1 @ (\rDistinct \; rs \; rs_1)$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1528	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1529	$rs \grewrites \rDistinct \; rs \; \varnothing$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1530	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1531	$rs_a @ (\rDistinct \; rs \; rs_a) \grewrites rs_a @ (\rDistinct \;
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1532	rs \; (\{\RZERO\} \cup rs_a))$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1533	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1534	$rs \;\; @ \;\; \rDistinct \; rs_a \; rset \grewrites rs @ \rDistinct \; rs_a \;
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1535	(rset \cup rs)$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1536
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1537	\end{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1538	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1539	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1540	If a list $rs_1$ can be rewritten via $\grewrites$ to a list $rs_2$,
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1541	then the two are equivalent under $\rsimp{}$:
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1542	\begin{lemma}\label{grewritesSimpalts}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1543	\mbox{}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1544	If $rs_1 \grewrites rs_2$, then
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1545	we have the following equivalence:
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1546	\begin{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1547	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1548	$\sum rs_1 \sequal \sum rs_2$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1549	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1550	$\rsimpalts \; rs_1 \sequal \rsimpalts \; rs_2$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1551	\end{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1552	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1553	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1554	Here are a few connecting lemmas showing that
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1555	if a list of regular expressions can be rewritten using $\grewrites$ or $\frewrites $ or
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1556	$\scfrewrites$,
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1557	then an alternative constructor taking the list can also be rewritten using $\hrewrites$:
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1558	\begin{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1559	\begin{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1560	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1561	If $rs \grewrites rs'$ then $\sum rs \hrewrites \sum rs'$.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1562	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1563	If $rs \grewrites rs'$ then $\sum rs \hrewrites \rsimpalts \; rs'$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1564	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1565	If $rs_1 \scfrewrites rs_2$ then $\sum (rs @ rs_1) \hrewrites \sum (rs @ rs_2)$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1566	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1567	If $rs_1 \scfrewrites rs_2$ then $\sum rs_1 \hrewrites \sum rs_2$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1568
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1569	\end{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1570	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1571	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1572	Now comes the core of the proof,
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1573	which says that once one r-regular expression can be rewritten to another,
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1574	the two are equivalent under $\textit{rsimp}$:
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1575	\begin{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1576	If $r_1 \hrewrites r_2$ then $r_1 \sequal r_2$.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1577	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1578
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1579	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1580	Similar to what we did in chapter \ref{Bitcoded2},
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1581	we prove that if one can rewrite from one r-regular expression ($r$)
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1582	to the other ($r'$), after taking derivatives one can still rewrite
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1583	the first ($r\backslash c$) to the other ($r'\backslash c$).
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1584	\begin{lemma}\label{interleave}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1585	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1586	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1587	\noindent
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1588	This allows us to prove more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1589	\begin{lemma}\label{insideSimpRemoval}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1590	$\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})} $
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1591	\end{lemma}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1592	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1593	\begin{proof}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1594	By \ref{interleave} and \ref{rsimpIdem}.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1595	\end{proof}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1596	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1597	And this unlocks more equivalent terms:
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1598	\begin{lemma}\label{Simpders}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1599	As corollaries of \ref{insideSimpRemoval}, we have
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1600	\begin{itemize}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1601	\item
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	1602	If $s \neq []$ then $\rderssimp{r}{s} = \rsimp{( r \backslash_{rs} s)}$.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1603	\item
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1604	$\rsimpalts \; (\map \; (\_ \backslash_r x) \;
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1605	(\rdistinct{rs}{\varnothing})) \sequal
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1606	\rsimpalts \; (\rDistinct \;
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1607	(\map \; (\_ \backslash_r x) rs) \;\varnothing )$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1608	\end{itemize}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1609	\end{lemma}
611 bc1df466150a more Chengsong parents: 610 diff changeset	1610	\begin{proof}
bc1df466150a more Chengsong parents: 610 diff changeset	1611	Part 1 is by lemma \ref{insideSimpRemoval},
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1612	part 2 is also by lemma \ref{insideSimpRemoval}.%and \ref{distinctDer}.
611 bc1df466150a more Chengsong parents: 610 diff changeset	1613	\end{proof}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1614	\noindent
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1615
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1616	\subsection{Closed Forms for $\sum rs$, $r_1\cdot r_2$ and $r^*$}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1617	Lemma \ref{Simpders} leads to our first closed form,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1618	which is for the alternative regular expression:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1619	\begin{theorem}\label{altsClosedForm}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1620	\mbox{}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1621	\begin{center}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1622	$\rderssimp{(\sum rs)}{s} \sequal
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1623	\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1624	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1625	\end{theorem}
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1626	\noindent
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1627	\begin{proof}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1628	By a reverse induction on the string $s$.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1629	One rewriting step, as we mentioned earlier,
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1630	involves
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1631	\begin{center}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1632	$\rsimpalts \; (\map \; (\_ \backslash x) \;
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1633	(\rdistinct{(\rflts \; (\map \; (\rsimp{} \; \circ \;
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1634	(\lambda r. \rderssimp{r}{xs}))))}{\varnothing}))
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1635	\sequal
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1636	\rsimpalts \; (\rdistinct{(\map \; (\_ \backslash x) \;
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1637	(\rflts \; (\map \; (\rsimp{} \; \circ \;
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1638	(\lambda r. \rderssimp{r}{xs})))) ) }{\varnothing}) $.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1639	\end{center}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1640	This can be proven by a combination of
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1641	\ref{grewritesSimpalts}, \ref{gstarRdistinctGeneral}, \ref{rderRsimpAltsCommute}, and
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1642	\ref{insideSimpRemoval}.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1643	\end{proof}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1644	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1645	This closed form has a variant which can be more convenient in later proofs:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1646	\begin{corollary}\label{altsClosedForm1}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1647	If $s \neq []$ then
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1648	$\rderssimp{(\sum \; rs)}{s} =
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1649	\rsimp{(\sum \; (\map \; (\rderssimp{\_}{s}) \; rs))}$.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1650	\end{corollary}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1651	\noindent
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1652	The harder closed forms are the sequence and star ones.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	1653	Before we obtain them, some preliminary definitions
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1654	are needed to make proof statements concise.
556 c27f04bb2262 hello Chengsong parents: 555 diff changeset	1655
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	1656
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	1657	\subsubsection{Closed Form for Sequence Regular Expressions}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1658	For the sequence regular expression,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1659	let us first look at a series of derivative steps on it
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1660	(assuming that each time a derivative is taken,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1661	the head of the sequence is always nullable):
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1662	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1663	\begin{tabular}{llll}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1664	$r_1 \cdot r_2$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1665	$\longrightarrow_{\backslash c}$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1666	$r_1\backslash c \cdot r_2 + r_2 \backslash c$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1667	$ \longrightarrow_{\backslash c'} $ \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1668	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1669	$(r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc'$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1670	$\longrightarrow_{\backslash c''} $ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1671	$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'')
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1672	+ r_2 \backslash cc'c''$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1673	$ \longrightarrow_{\backslash c'''} \quad \ldots$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1674	\end{tabular}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1675	\end{center}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	1676	Roughly speaking, $(r_1 \cdot r_2) \backslash s$ can be expressed as
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1677	a giant alternative taking a list of terms
671a83abccf3 haha Chengsong parents: 557 diff changeset	1678	$[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
671a83abccf3 haha Chengsong parents: 557 diff changeset	1679	where the head of the list is always the term
671a83abccf3 haha Chengsong parents: 557 diff changeset	1680	representing a match involving only $r_1$, and the tail of the list consisting of
671a83abccf3 haha Chengsong parents: 557 diff changeset	1681	terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
This intuition is also echoed by Murugesan and Sundaram \cite{Murugesan2014},
who gave
a pencil-and-paper derivation of $(r_1 \cdot r_2)\backslash s$:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1685	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1686	\begin{tabular}{lc}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1687	$L \; [ (r_1 \cdot r_2) \backslash_r (c_1 :: c_2 :: \ldots c_n) ]$ & $ =$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1688	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1689	\rule{0pt}{3ex} $L \; [ ((r_1 \backslash_r c_1) \cdot r_2 +
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1690	(\delta\; (\nullable \; r_1) \; (r_2 \backslash_r c_1) )) \backslash_r
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1691	(c_2 :: \ldots c_n) ]$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1692	$=$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1693	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1694	\rule{0pt}{3ex} $L \; [ ((r_1 \backslash_r c_1c_2 \cdot r_2 +
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1695	(\delta \; (\nullable \; r_1) \; (r_2 \backslash_r c_1c_2)))
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1696	$ & \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1697	\\
$\quad + (\delta \ (\nullable \; r_1 \backslash_r c_1)\; (r_2 \backslash_r c_2) ))
\backslash_r (c_3 \ldots c_n) ]$ & $\ldots$ \\
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1700	\end{tabular}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1701	\end{center}
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1702	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1703	The $\delta$ function
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1704	returns $r$ when the boolean condition
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1705	$b$ evaluates to true and
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1706	$\ZERO_r$ otherwise:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1707	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1708	\begin{tabular}{lcl}
$\delta \; b\; r$ & $\dn$ & $r \quad \textit{if} \; b \; \textit{is} \;\textit{true}$\\
& $\dn$ & $\ZERO_r \quad \textit{otherwise}$
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1711	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1712	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1713	\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1714	Note that the term
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1715	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1716	\begin{tabular}{lc}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1717	\rule{0pt}{3ex} $((r_1 \backslash_r c_1c_2 \cdot r_2 +
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1718	(\delta \; (\nullable \; r_1) \; (r_2 \backslash_r c_1c_2)))
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1719	$ & \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1720	\\
$\quad + (\delta \ (\nullable \; r_1 \backslash_r c_1)\; (r_2 \backslash_r c_2) ))
\backslash_r (c_3 \ldots c_n)$ &\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1723	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1724	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1725	\noindent
does not faithfully
represent what the intermediate derivatives actually look like
when the head $r_1 \backslash s'$ of one or more intermediate results
$r_1 \backslash s' \cdot r_2$ is not nullable.
For example, when $r_1$ and $r_1 \backslash_r c_1$ are not nullable,
the regular expression would look like
\[
	(r_1 \backslash_r c_1c_2) \cdot r_2
\]
instead of
\[
	((r_1 \backslash_r c_1c_2) \cdot r_2 + \ZERO_r ) + \ZERO_r.
\]
The redundant $\ZERO_r$s are not created in the
first place.
A closed form needs to take this into account (because
closed forms require exact equality rather than language equivalence)
and only generate the
$r_2 \backslash_r s''$ terms satisfying the property
\begin{center}
$\exists s'. \; \textit{such that} \; s'@s'' = s \;\; \land \;\;
r_1 \backslash s' \; \textit{is nullable}$.
\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1749	Given the arguments $s$ and $r_1$, we denote the list of strings
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1750	$s''$ satisfying the above property as $\vsuf{s}{r_1}$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1751	The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{
Perhaps a better name for it would be ``NullablePrefixSuffix''
to differentiate it from the list of \emph{all} prefixes of $s$, but
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1754	that is a bit too long for a function name and we are yet to find
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1755	a more concise and easy-to-understand name.}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1756	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1757	\begin{tabular}{lcl}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1758	$\vsuf{[]}{\_} $ & $=$ & $[]$\\
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1759	$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} \; (\rnullable{r_1}) \; \textit{then} \; (\vsuf{cs}{(\rder{c}{r_1})}) @ [c :: cs]$\\
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1760	&& $\textit{else} \; (\vsuf{cs}{(\rder{c}{r_1}) }) $
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1761	\end{tabular}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1762	\end{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	1763	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1764	The list starts with shorter suffixes
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1765	and ends with longer ones (in other words, the string elements $s''$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1766	in the list $\vsuf{s}{r_1}$ are sorted
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1767	in the same order as that of the terms $r_2\backslash s''$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1768	appearing in $(r_1\cdot r_2)\backslash s$).
In essence, $\vsuf{\_}{\_}$ performs a
``virtual derivative'' of $r_1 \cdot r_2$, but instead of producing
the entire result $(r_1 \cdot r_2) \backslash s$,
it only stores strings,
with each string $s''$ representing a term $r_2 \backslash s''$
that occurs in $(r_1\cdot r_2)\backslash s$.
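
\noindent
The recursion above can also be read as a small program.
The following Python sketch is not part of the formalisation: it assumes an
encoding of unannotated regular expressions as tagged tuples, and the names
\texttt{nullable}, \texttt{der} and \texttt{suffix} are chosen here purely for illustration.

```python
# Assumed encoding (illustrative only):
#   ("zero",)  ("one",)  ("chr", c)  ("seq", r1, r2)  ("alts", (r, ...))  ("star", r)
ZERO, ONE = ("zero",), ("one",)

def nullable(r):
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alts":
        return any(nullable(x) for x in r[1])
    return False  # zero, chr

def der(c, r):
    tag = r[0]
    if tag == "chr":
        return ONE if r[1] == c else ZERO
    if tag == "alts":
        return ("alts", tuple(der(c, x) for x in r[1]))
    if tag == "seq":
        head = ("seq", der(c, r[1]), r[2])
        # The r2-branch is only created when r1 is nullable.
        return ("alts", (head, der(c, r[2]))) if nullable(r[1]) else head
    if tag == "star":
        return ("seq", der(c, r[1]), r)
    return ZERO  # zero, one

def suffix(s, r1):
    # Mirrors the two clauses of the Suffix definition: recurse on the tail
    # with the derivative of r1, and append s itself when r1 is nullable.
    if s == "":
        return []
    rest = suffix(s[1:], der(s[0], r1))
    return rest + [s] if nullable(r1) else rest

a_star = ("star", ("chr", "a"))
print(suffix("ab", a_star))          # ['b', 'ab']
print(suffix("ab", ("chr", "a")))    # ['b']
```

For $r_1 = a^*$ and $s = ab$, both $r_1$ and $r_1 \backslash a$ are nullable, so the
sketch returns the suffixes $b$ and $ab$, with the shorter one first.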
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1775
With $\textit{Suffix}$ we are almost ready to express the
sequence regular expression's closed form,
but a few more definitions are needed first.
The first is the flattening function $\sflat{\_}$,
which takes an alternative regular expression and produces a flattened version
of it.
It is needed to convert
a left-associative nested sequence of alternatives into
a flattened list:
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1786	\[
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1787	\sum(\ldots ((r_1 + r_2) + r_3) + \ldots)
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1788	\stackrel{\sflat{\_}}{\rightarrow}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1789	\sum[r_1, r_2, r_3, \ldots]
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1790	\]
671a83abccf3 haha Chengsong parents: 557 diff changeset	1791	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1792	The definitions of $\sflat{\_}$ and helper functions
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1793	$\sflataux{\_}$ and $\llparenthesis \_ \rrparenthesis''$ are given below.
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1794	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1795	\begin{tabular}{lcl}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1796	$\sflataux{\sum r :: rs}$ & $\dn$ & $\sflataux{r} @ rs$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1797	$\sflataux{\sum []}$ & $ \dn $ & $ []$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1798	$\sflataux r$ & $\dn$ & $ [r]$
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1799	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1800	\end{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	1801
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1802	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1803	\begin{tabular}{lcl}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1804	$\sflat{(\sum r :: rs)}$ & $\dn$ & $\sum (\sflataux{r} @ rs)$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1805	$\sflat{\sum []}$ & $ \dn $ & $ \sum []$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1806	$\sflat r$ & $\dn$ & $ r$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1807	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1808	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1809
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1810	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1811	\begin{tabular}{lcl}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1812	$\sflataux{[]}'$ & $ \dn $ & $ []$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1813	$\sflataux{ (r_1 + r_2) :: rs }'$ & $\dn$ & $r_1 :: r_2 :: rs$\\
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1814	$\sflataux{r :: rs}'$ & $\dn$ & $ r::rs$
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1815	\end{tabular}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	1816	\end{center}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1817	\noindent
$\sflataux{\_}$ breaks up nested alternative regular expressions
of the (left-associated) shape $(\ldots((r_1 + r_2) + r_3) + \ldots )$
into a ``balanced'' list $[r_1,\, r_2 ,\, r_3, \ldots]$.
It returns the singleton list $[r]$ otherwise.
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1822	$\sflat{\_}$ works the same as $\sflataux{\_}$, except that it keeps
812e5d112f49 more changes Chengsong parents: 556 diff changeset	1823	the output type a regular expression, not a list.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1824	$\sflataux{\_}$ and $\sflat{\_}$ are only recursive on the
671a83abccf3 haha Chengsong parents: 557 diff changeset	1825	first element of the list.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1826	$\sflataux{\_}'$ takes a list of regular expressions as input, and outputs
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1827	a list of regular expressions.
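
\noindent
To make the three flattening functions concrete, here is a minimal Python sketch.
It assumes that alternatives are encoded as tagged tuples \texttt{("alts", (r, ...))}
and that every other regular expression is some other tagged tuple;
the names \texttt{alts}, \texttt{sflataux}, \texttt{sflat} and \texttt{sflataux\_prime} are
illustrative and do not come from the formalisation.

```python
def alts(*rs):
    # Hypothetical helper building an n-ary alternative.
    return ("alts", tuple(rs))

def sflataux(r):
    # Flatten only along the first element of each nested alternative.
    if r[0] == "alts" and len(r[1]) > 0:
        first, rest = r[1][0], r[1][1:]
        return sflataux(first) + list(rest)
    if r[0] == "alts":
        return []
    return [r]

def sflat(r):
    # Same flattening, but the result stays a regular expression.
    if r[0] == "alts" and len(r[1]) > 0:
        first, rest = r[1][0], r[1][1:]
        return ("alts", tuple(sflataux(first) + list(rest)))
    return r  # the empty alternative and all non-alternatives are unchanged

def sflataux_prime(rs):
    # Opens up a binary alternative at the head of a list, one level only.
    if rs and rs[0][0] == "alts" and len(rs[0][1]) == 2:
        r1, r2 = rs[0][1]
        return [r1, r2] + rs[1:]
    return rs

a, b, c, d = (("chr", x) for x in "abcd")
nested = alts(alts(alts(a, b), c), d)        # ((a + b) + c) + d
print(sflataux(nested) == [a, b, c, d])      # True
print(sflat(nested) == alts(a, b, c, d))     # True
print(sflataux_prime([alts(a, b), c]) == [a, b, c])  # True
```

Only the left spine is traversed: an alternative sitting in a non-head position,
such as the second component of \texttt{alts(a, alts(b, c))}, is left untouched.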
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1828	The use of $\sflataux{\_}$ and $\sflataux{\_}'$ is clear once we have
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1829	$\textit{createdBySequence}$ defined:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1830	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1831	\begin{mathpar}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1832	\inferrule{\mbox{}}{\textit{createdBySequence}\; (r_1 \cdot r_2)}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1833
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1834	\inferrule{\textit{createdBySequence} \; r_1}{\textit{createdBySequence} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1835	(r_1 + r_2)}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1836	\end{mathpar}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1837	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1838	\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1839	The predicate $\textit{createdBySequence}$ is used to describe the shape of
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1840	the derivative regular expressions $(r_1\cdot r_2) \backslash s$:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1841	\begin{lemma}\label{recursivelyDerseq}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1842	It is always the case that
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1843	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1844	$\textit{createdBySequence} \; ( (r_1\cdot r_2) \backslash_r s) $
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1845	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1846	holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1847	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1848	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1849	By a reverse induction on the string $s$, where the inductive cases are $[]$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1850	and $xs @ [x]$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1851	\end{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1852	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1853	If we have a regular expression $r$ whose shape
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1854	fits into those described by $\textit{createdBySequence}$,
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1855	then we can convert between
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1856	$r \backslash_r c$ and $(\sflataux{r}) \backslash_r c$ with
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1857	$\sflataux{\_}'$:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1858	\begin{lemma}\label{sfauIdemDer}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1859	If $\textit{createdBySequence} \; r$, then
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1860	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1861	$\sflataux{ r \backslash_r c} =
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1862	\llparenthesis (\map \; (\_ \backslash_r c) \; (\sflataux{r}) ) \rrparenthesis''$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1863	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1864	holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1865	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1866	\begin{proof}
By a simple induction on the inductive cases of $\textit{createdBySequence}$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1869	\end{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1870
Now we are ready to express
the shape of $(r_1 \cdot r_2) \backslash_r s$:
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1873	\begin{lemma}\label{seqSfau0}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1874	$\sflataux{(r_1 \cdot r_2) \backslash_r s} = (r_1 \backslash_r s) \cdot r_2
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1875	:: (\map \; (r_2 \backslash_r \_) \; (\textit{Suffix} \; s \; r_1))$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1876	\end{lemma}
671a83abccf3 haha Chengsong parents: 557 diff changeset	1877	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1878	By a reverse induction on the string $s$, where the inductive cases
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1879	are $[]$ and $xs @ [x]$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1880	For the inductive case, we know that $\textit{createdBySequence} \; ((r_1 \cdot r_2)
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1881	\backslash_r xs)$ holds from lemma \ref{recursivelyDerseq},
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1882	which can be used to prove
\[
\begin{array}{l}
	\map \; (r_2 \backslash_r \_) \; (\vsuf{[x]}{(r_1 \backslash_r xs)}) \;\; @ \;\;
	\map \; (\_ \backslash_r x) \; (\map \; (r_2 \backslash_r \_) \; (\vsuf{xs}{r_1}))\\
	\quad = \; \map \; (r_2 \backslash_r \_) \; (\vsuf{xs @ [x]}{r_1})
\end{array}
\]
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1891	using lemma \ref{sfauIdemDer}.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1892	This equality enables the inductive case to go through.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1893	\end{proof}
671a83abccf3 haha Chengsong parents: 557 diff changeset	1894	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1895	This lemma says that $(r_1\cdot r_2)\backslash s$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1896	can be flattened into a list whose head and tail meet the description
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1897	we gave earlier.
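
\noindent
The statement of lemma \ref{seqSfau0} can be tested on small inputs.
The Python sketch below is only a sanity check under assumptions, not a substitute for the proof:
regular expressions are encoded as tagged tuples, derivatives are taken without simplification,
and all function names (\texttt{der}, \texttt{ders}, \texttt{suffix}, \texttt{sflataux},
\texttt{lemma\_holds}) are illustrative.

```python
from itertools import product

ZERO, ONE = ("zero",), ("one",)

def nullable(r):
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alts":
        return any(nullable(x) for x in r[1])
    return False

def der(c, r):
    tag = r[0]
    if tag == "chr":
        return ONE if r[1] == c else ZERO
    if tag == "alts":
        return ("alts", tuple(der(c, x) for x in r[1]))
    if tag == "seq":
        head = ("seq", der(c, r[1]), r[2])
        return ("alts", (head, der(c, r[2]))) if nullable(r[1]) else head
    if tag == "star":
        return ("seq", der(c, r[1]), r)
    return ZERO

def ders(s, r):
    for c in s:
        r = der(c, r)
    return r

def suffix(s, r1):
    if s == "":
        return []
    rest = suffix(s[1:], der(s[0], r1))
    return rest + [s] if nullable(r1) else rest

def sflataux(r):
    if r[0] == "alts" and len(r[1]) > 0:
        return sflataux(r[1][0]) + list(r[1][1:])
    if r[0] == "alts":
        return []
    return [r]

def closed_form_terms(r1, r2, s):
    # Right-hand side of the lemma: (r1 \ s) . r2 :: map (r2 \ _) (Suffix s r1)
    return [("seq", ders(s, r1), r2)] + [ders(t, r2) for t in suffix(s, r1)]

def lemma_holds(r1, r2, s):
    return sflataux(ders(s, ("seq", r1, r2))) == closed_form_terms(r1, r2, s)

a, b = ("chr", "a"), ("chr", "b")
pairs = [(("star", a), b), (a, b), (("alts", (a, ONE)), ("star", b))]
strings = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
print(all(lemma_holds(r1, r2, s) for r1, r2 in pairs for s in strings))  # True
```

The check compares the two lists for exact (syntactic) equality, which is the notion of
equality the closed form relies on.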
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1898	%Note that this lemma does $\mathbf{not}$ depend on any
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1899	%specific definitions we used,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1900	%allowing people investigating derivatives to get an alternative
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1901	%view of what $r_1 \cdot r_2$ is.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1902
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1903	We now use $\textit{createdBySequence}$ and
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1904	$\sflataux{\_}$ to describe an intuition
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1905	behind the sequence closed form.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1906	If two regular expressions only differ in the way their
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1907	alternatives are nested, then we should be able to get the same result
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1908	once we apply simplification to both of them:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1909	\begin{lemma}\label{sflatRsimpeq}
If $r$ is created from a sequence through
a series of derivatives
(i.e.~if $\textit{createdBySequence} \; r$ holds),
and $\sflataux{r} = rs$,
then
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1916	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1917	$\textit{rsimp} \; r = \textit{rsimp} \; (\sum \; rs)$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1918	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1919	holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1920	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1921	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1922	By an induction on the inductive cases of $\textit{createdBySequence}$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1923	\end{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1924
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1925	Now we are ready for the closed form
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1926	for the sequence regular expressions (without the inner applications
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1927	of simplifications):
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1928	\begin{lemma}\label{seqClosedFormGeneral}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1929	$\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }
671a83abccf3 haha Chengsong parents: 557 diff changeset	1930	=\rsimp{(\sum ( (r_1 \backslash s) \cdot r_2 ::
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	1931	\map\; (r_2 \backslash \_) \; (\vsuf{s}{r_1})))}$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1932	\end{lemma}
671a83abccf3 haha Chengsong parents: 557 diff changeset	1933	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1934	We know that
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1935	$\sflataux{(r_1 \cdot r_2) \backslash_r s} = (r_1 \backslash_r s) \cdot r_2
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1936	:: (\map \; (r_2 \backslash_r \_) \; (\textit{Suffix} \; s \; r_1))$
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1937	holds
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1938	by lemma \ref{seqSfau0}.
This, together with lemma \ref{sflatRsimpeq}, makes the proof go through.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	1940	\end{proof}
Together with the idempotency property of $\rsimp{}$ (lemma \ref{rsimpIdem}),
it is possible to convert the above lemma to obtain the
proper closed form for $\backslash_{rsimps}$ rather than $\backslash_r$,
that is, for derivatives nested with simplification:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1945	\begin{theorem}\label{seqClosedForm}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1946	$\rderssimp{(r_1 \cdot r_2)}{s} = \rsimp{(\sum ((r_1 \backslash s) \cdot r_2 )
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1947	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1})))}$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1948	\end{theorem}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1949	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1950	By a case analysis of the string $s$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1951	When $s$ is an empty list, the rewrite is straightforward.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1952	When $s$ is a non-empty list, the
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1953	lemmas \ref{seqClosedFormGeneral} and \ref{Simpders} apply,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1954	making the proof go through.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1955	\end{proof}
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	1956	\subsubsection{Closed Forms for Star Regular Expressions}
The closed form for the star regular expression involves tricks similar
to those for the sequence regular expression.
The $\textit{Suffix}$ function is now replaced by something
slightly more complex, because the growth pattern of star
regular expressions' derivatives is a bit different:
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	1962	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1963	\begin{tabular}{lclc}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1964	$r^* $ & $\longrightarrow_{\backslash c}$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1965	$(r\backslash c) \cdot r^*$ & $\longrightarrow_{\backslash c'}$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1966	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1967	$r \backslash cc' \cdot r^* + r \backslash c' \cdot r^*$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1968	$\longrightarrow_{\backslash c''}$ &
$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) +
(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*)$ &
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1971	$\longrightarrow_{\backslash c'''}$ \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1972	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1973	$\ldots$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1974	\end{tabular}
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	1975	\end{center}
When we have a string $s = c :: c' :: c'' \ldots$ such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
$r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
the number of terms in $r^* \backslash s$ will grow exponentially, rather than linearly
as in the sequence case.
The good news is that the function $\textit{rsimp}$ will again
ignore the difference between different nesting patterns of alternatives,
and an exponentially growing star derivative like
\begin{center}
$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) +
(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
\end{center}
can be treated as
\begin{center}
$\RALTS{[r \backslash cc'c'' \cdot r^*, r \backslash c'' \cdot r^*,
r \backslash c'c'' \cdot r^*, r \backslash c'' \cdot r^*]}$
\end{center}
which can be de-duplicated by $\rDistinct$
and therefore bounded finitely.
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	1994
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1995	%and then de-duplicate terms of the form ($s'$ being a substring of $s$).
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	1996	%This allows us to use a similar technique as $r_1 \cdot r_2$ case,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	1997
Now the crux of this section is finding a suitable description
for $rs$ such that
\begin{center}
$\rderssimp{r^*}{s} = \rsimp{\sum rs}$
\end{center}
holds.
In addition, the list $rs$
shall be of the form
$\map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss$,
where $Ss$ is a list of strings. For example, in the sequence
closed form it is specified as $\textit{Suffix} \; s \; r_1$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2009	To get $Ss$ for the star regular expression,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2010	we need to introduce $\starupdate$ and $\starupdates$:
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2011	\begin{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2012	\begin{tabular}{lcl}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2013	$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
671a83abccf3 haha Chengsong parents: 557 diff changeset	2014	$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
671a83abccf3 haha Chengsong parents: 557 diff changeset	2015	& & $\textit{if} \;
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2016	(\rnullable \; (r \backslash_{rs} s))$ \\
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2017	& & $\textit{then} \;\; (s @ [c]) :: [c] :: (
671a83abccf3 haha Chengsong parents: 557 diff changeset	2018	\starupdate \; c \; r \; Ss)$ \\
671a83abccf3 haha Chengsong parents: 557 diff changeset	2019	& & $\textit{else} \;\; (s @ [c]) :: (
671a83abccf3 haha Chengsong parents: 557 diff changeset	2020	\starupdate \; c \; r \; Ss)$
671a83abccf3 haha Chengsong parents: 557 diff changeset	2021	\end{tabular}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2022	\end{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2023	\begin{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2024	\begin{tabular}{lcl}
$\starupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
$\starupdates \; (c :: cs) \; r \; Ss$ & $\dn$ & $\starupdates \; cs \; r \; (
\starupdate \; c \; r \; Ss)$
671a83abccf3 haha Chengsong parents: 557 diff changeset	2028	\end{tabular}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2029	\end{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2030	\noindent
Assume that
\begin{center}
$\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$
\end{center}
holds.
The idea of $\starupdate$ and $\starupdates$
is to update $Ss$ when another
derivative is taken on $\rderssimp{r^*}{s}$
w.r.t.\ a character $c$ and a string $s'$,
respectively.
Both $\starupdate$ and $\starupdates$ take three arguments as input:
the new character $c$ or string $s$ to take the derivative with,
the regular expression
$r$ under the star $r^*$, and the
list of strings $Ss$ for the derivative $r^* \backslash s$
up until this point,
such that
\begin{center}
$(r^*) \backslash s = \sum_{s' \in Ss} (r\backslash s') \cdot r^*$
\end{center}
is satisfied.
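
As an illustration (a worked sketch on a regular expression chosen here purely for the example), let $r$ be the single-character regular expression $a$, so that $r \backslash_{rs} [a] = \mathbf{1}$ is nullable while $r \backslash_{rs} [a, a] = \mathbf{0}$ is not. Starting from $Ss = [[a]]$, which describes $r^* \backslash a = (r \backslash a) \cdot r^*$, the definitions unfold as follows:
\begin{center}
\begin{tabular}{lcl}
$\starupdate \; a \; r \; [[a]]$ & $=$ & $[[a, a], [a]]$\\
$\starupdate \; a \; r \; [[a, a], [a]]$ & $=$ & $[[a, a, a], [a, a], [a]]$\\
$\starupdates \; [a, a] \; r \; [[a]]$ & $=$ & $[[a, a, a], [a, a], [a]]$
\end{tabular}
\end{center}
\noindent
In the first line the string $[a]$ takes the $\textit{then}$ branch, because $r \backslash_{rs} [a]$ is nullable, and therefore contributes both $[a, a]$ and $[a]$.
In the second line the string $[a, a]$ takes the $\textit{else}$ branch and contributes only $[a, a, a]$, whereas $[a]$ again contributes $[a, a]$ and $[a]$.
The final list corresponds to the three summands of
$r^* \backslash aaa = (r \backslash aaa) \cdot r^* + ((r \backslash aa) \cdot r^* + (r \backslash a) \cdot r^*)$.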
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2052
Functions $\starupdate$ and $\starupdates$ characterise what the
star derivatives will look like once ``straightened out'' into lists.
The helper functions for such operations are similar to
$\sflat{\_}$ and $\sflataux{\_}$, which we defined for sequences.
We use similar symbols to denote them, with a $*$ subscript to mark the difference.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2058	\begin{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2059	\begin{tabular}{lcl}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2060	$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\
671a83abccf3 haha Chengsong parents: 557 diff changeset	2061	$\hflataux{r}$ & $\dn$ & $[r]$
671a83abccf3 haha Chengsong parents: 557 diff changeset	2062	\end{tabular}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2063	\end{center}
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	2064
812e5d112f49 more changes Chengsong parents: 556 diff changeset	2065	\begin{center}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2066	\begin{tabular}{lcl}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2067	$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\
671a83abccf3 haha Chengsong parents: 557 diff changeset	2068	$\hflat{r}$ & $\dn$ & $r$
671a83abccf3 haha Chengsong parents: 557 diff changeset	2069	\end{tabular}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2070	\end{center}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2071	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2072	These definitions are tailor-made for dealing with alternatives that have
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2073	originated from a star's derivatives.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2074	A typical star derivative always has the structure of a balanced binary tree:
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2075	\begin{center}
$(r \backslash cc'c'' \cdot r^* + r \backslash c'' \cdot r^*) +
(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2078	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2079	All of the nested structures of alternatives
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2080	generated from derivatives are binary, and therefore
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2081	$\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2082	$\hflat{\_}$ ``untangles'' like the following:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2083	\[
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2084	\sum ((r_1 + r_2) + (r_3 + r_4)) + \ldots \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2085	\stackrel{\hflat{\_}}{\longrightarrow} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2086	\RALTS{[r_1, r_2, \ldots, r_n]}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2087	\]
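For instance, assuming that none of $r_1, \ldots, r_4$ is itself an alternative (which is the case for the sequence-shaped leaves of a star derivative), the definition of $\hflataux{\_}$ unfolds step by step as
\begin{center}
\begin{tabular}{lcl}
$\hflataux{(r_1 + r_2) + (r_3 + r_4)}$ & $=$ & $\hflataux{r_1 + r_2} \; @ \; \hflataux{r_3 + r_4}$\\
& $=$ & $(\hflataux{r_1} \; @ \; \hflataux{r_2}) \; @ \; (\hflataux{r_3} \; @ \; \hflataux{r_4})$\\
& $=$ & $[r_1, r_2, r_3, r_4]$
\end{tabular}
\end{center}
\noindent
and $\hflat{(r_1 + r_2) + (r_3 + r_4)}$ is then $\sum [r_1, r_2, r_3, r_4]$.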
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2088	Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2089	with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2090	and merges each of the element lists to form a flattened list.}:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2091	\begin{lemma}\label{stupdateInduct1}
\mbox{}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2093	For a list of strings $Ss$, the following hold.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2094	\begin{itemize}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2095	\item
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2096	If we do a derivative on the terms
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2097	$r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$),
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2098	the result will be the same as if we apply $\starupdate$ to $Ss$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2099	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2100	\begin{tabular}{c}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2101	$\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x)
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2102	\circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2103	$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2104	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2105	$=$ \\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2106	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2107	$\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2108	(\starupdate \; x \; r \; Ss)$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2109	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2110	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2111	\item
$\starupdates$ is ``composable'' w.r.t.\ a derivative:
it appends the character $x$ to the end of the string
$xs$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2115	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2116	\begin{tabular}{c}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2117	$\textit{concat} \; (\map \; \hflataux{\_} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2118	(\map \; (\_\backslash_r x) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2119	(\map \; (\lambda s.\;\; (r \backslash_r s) \cdot
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2120	(r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2121	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2122	$=$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2123	\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2124	$\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2125	(\starupdates \; (xs @ [x]) \; r \; Ss)$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2126	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2127	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2128	\end{itemize}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2129	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2130
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2131	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2132	Part 1 is by induction on $Ss$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2133	Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2134	\end{proof}
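
A small instance of the first part may help; the concrete values are chosen here only for illustration. Let $r$ be the single-character regular expression $a$, let $Ss = [[a]]$ and let $x = a$. The only term on the left-hand side is $(r \backslash_r [a]) \cdot r^*$. Since $r \backslash_r [a] = \mathbf{1}$ is nullable, its derivative w.r.t.\ $a$ is
\begin{center}
$((r \backslash_r [a]) \cdot r^*) \backslash_r a \; = \;
(r \backslash_r [a, a]) \cdot r^* + (r \backslash_r [a]) \cdot r^*$
\end{center}
\noindent
and applying $\hflataux{\_}$ followed by $\textit{concat}$ gives the list
$[(r \backslash_r [a, a]) \cdot r^*, \; (r \backslash_r [a]) \cdot r^*]$.
On the right-hand side, $\starupdate \; a \; r \; [[a]] = [[a, a], [a]]$, and mapping
$\lambda s. (r \backslash_r s) \cdot (r^*)$ over it yields the same list.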
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2135
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2136
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2137	Like $\textit{createdBySequence}$, we need
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2138	a predicate for ``star-created'' regular expressions:
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2139	\begin{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2140	\begin{mathpar}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2141	\inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} }
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2142
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2143	\inferrule{ \textit{createdByStar} \; r_1\; \land \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) }
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2144	\end{mathpar}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2145	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2146	\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2147	All regular expressions created by taking derivatives of
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2148	$r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2149	\begin{lemma}\label{starDersCbs}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2150	$\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2151	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2152	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2153	By a reverse induction on $s$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2154	\end{proof}
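To see why a single derivative step preserves the predicate, consider $(r_1 \cdot r_2^*) \backslash_r c$ (a sketch of the one-character case only).
If $r_1$ is not nullable, the result is $(r_1 \backslash_r c) \cdot r_2^*$, which matches the first rule directly.
If $r_1$ is nullable, the result is
\begin{center}
$(r_1 \backslash_r c) \cdot r_2^* + (r_2 \backslash_r c) \cdot r_2^*$
\end{center}
\noindent
where both summands match the first rule, so the second rule applies to their sum.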
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2155	If a regular expression conforms to the shape of a star's derivative,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2156	then we can push an application of $\hflataux{\_}$ inside a derivative of it:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2157	\begin{lemma}\label{hfauPushin}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2158	If $\textit{createdByStar} \; r$ holds, then
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2159	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2160	$\hflataux{r \backslash_r c} = \textit{concat} \; (
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2161	\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2162	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2163	holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2164	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2165	\begin{proof}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2166	By an induction on the inductive cases of $\textit{createdByStar}$.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2167	\end{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2168	%This is not entirely true for annotated regular expressions:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2169	%%TODO: bsimp bders \neq bderssimp
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2170	%\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2171	% $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2172	%\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2173	%For bit-codes, the order in which simplification is applied
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2174	%might cause a difference in the location they are placed.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2175	%If we want something like
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2176	%\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2177	% $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2178	%\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2179	%Some "canonicalization" procedure is required,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2180	%which either pushes all the common bitcodes to nodes
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2181	%as senior as possible:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2182	%\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2183	% $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2184	%\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2185	%or does the reverse. However bitcodes are not of interest if we are talking about
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2186	%the $\llbracket r \rrbracket$ size of a regex.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2187	%Therefore for the ease and simplicity of producing a
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2188	%proof for a size bound, we are happy to restrict ourselves to
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2189	%unannotated regular expressions, and obtain such equalities as
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2190	%TODO: rsimp sflat
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2191	% The simplification of a flattened out regular expression, provided it comes
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2192	%from the derivative of a star, is the same as the one nested.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2193
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2194
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2195
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2196	Now we introduce an inductive property
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2197	for $\starupdate$ and $\hflataux{\_}$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2198	\begin{lemma}\label{starHfauInduct}
If we take derivatives of $(r_0 \backslash c) \cdot r_0^*$
(that is, of $r_0^*$ after its first character $c$ has been consumed)
w.r.t.\ a string $s$,
and then flatten the result,
we obtain a list
of the shape $\map \; (\lambda s'. (r_0\backslash_r s') \cdot r_0^*) \; sS$,
where $sS = \starupdates \; s \; r_0 \; [[c]]$. Namely,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2205	\begin{center}
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2206	$\hflataux{(( (\rder{c}{r_0})\cdot(r_0^*))\backslash_{rs} s)} =
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2207	\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2208	(\starupdates \; s \; r_0 \; [[c]])$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2209	\end{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2210	holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2211	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2212	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2213	By an induction on $s$, the inductive cases
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2214	being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2215	\end{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2216	\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2217
The function $\hflataux{\_}$ has an effect similar to that of $\textit{flatten}$:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2219	\begin{lemma}\label{hflatauxGrewrites}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2220	$a :: rs \grewrites \hflataux{a} @ rs$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2221	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2222	\begin{proof}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2223	By induction on $a$. $rs$ is set to take arbitrary values.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2224	\end{proof}
638 dd9dde2d902b comments till chap4 Chengsong parents: 625 diff changeset	2225	It is also not surprising that
dd9dde2d902b comments till chap4 Chengsong parents: 625 diff changeset	2226	two regular expressions differing only in terms
dd9dde2d902b comments till chap4 Chengsong parents: 625 diff changeset	2227	of the
dd9dde2d902b comments till chap4 Chengsong parents: 625 diff changeset	2228	nesting of parentheses are equivalent w.r.t. $\textit{rsimp}$:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2229	\begin{lemma}\label{cbsHfauRsimpeq1}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2230	$\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2231	\end{lemma}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2232
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2233	\begin{proof}
By using the rewriting relation $\rightsquigarrow$.
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2235	\end{proof}
From this we obtain the following fact: a
regular expression created by star
is the same as its flattened version, up to equivalence under $\textit{rsimp}$:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2240	\begin{lemma}\label{hfauRsimpeq2}
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2241	$\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2242	\end{lemma}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2243	\begin{proof}
By structural induction on $r$, where the induction rules
are those of $\createdByStar{\_}$.
Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case.
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2247	\end{proof}
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2248
3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2249
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2250	%Here is a corollary that states the lemma in
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2251	%a more intuitive way:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2252	%\begin{corollary}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2253	% $\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2254	% (r^*))\; (\starupdates \; c\; r\; [[c]])$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2255	%\end{corollary}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2256	%\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2257	%Note that this is also agnostic of the simplification
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2258	%function we defined, and is therefore of more general interest.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2259
This is used together with the following facts about rewriting:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2261	\begin{lemma}\label{starClosedForm6Hrewrites}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2262	We have the following set of rewriting relations or equalities:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2263	\begin{itemize}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2264	\item
$\textit{rsimp} \; (r^* \backslash_r (c::s))
\sequal
\sum \; ( ( \map \; (\lambda s. (r\backslash_r s) \cdot r^*) \; (
\starupdates \; s \; r \; [ c::[]] ) ) )$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2269	\item
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2270	$r \backslash_{rsimps} (c::s) = \textit{rsimp} \; ( (
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2271	\sum ( (\map \; (\lambda s_1. (r\backslash s_1) \; r^*) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2272	(\starupdates \;s \; r \; [ c::[] ])))))$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2273	\item
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2274	$\sum ( (\map \; (\lambda s. (r\backslash s) \; r^*) \; Ss))
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2275	\sequal
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2276	\sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2277	r^*) \; Ss) )$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2278	\item
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2279	$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2280	\scfrewrites
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2281	\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2282	\item
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2283	$( ( \sum ( ( \map \ (\lambda s. \;\;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2284	(\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2285	s \; r \; [ c::[] ])))))$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2286	$\sequal$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2287	$( ( \sum ( ( \map \ (\lambda s. \;\;
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2288	( r \backslash_{rsimps} s)) \cdot r^*) \; (\starupdates \;
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2289	s \; r \; [ c::[] ]))))$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2290	\end{itemize}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2291	\end{lemma}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2292	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2293	Part 1 leads to part 2.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2294	The rest of them are routine.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2295	\end{proof}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2296	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2297	Next the closed form for star regular expressions can be derived:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2298	\begin{theorem}\label{starClosedForm}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2299	$\rderssimp{r^*}{c::s} =
671a83abccf3 haha Chengsong parents: 557 diff changeset	2300	\rsimp{
671a83abccf3 haha Chengsong parents: 557 diff changeset	2301	(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \;
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2302	(\starupdates \; s\; r \; [[c]])
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2303	)
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2304	)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2305	}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2306	$
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2307	\end{theorem}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2308	\begin{proof}
By induction on $s$, using
Lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites}
and \ref{hfauRsimpeq2}.
The equalities in Lemma \ref{starClosedForm6Hrewrites} are
used to link the left-hand side and the right-hand side.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2315	\end{proof}
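
As a quick sanity check of Theorem~\ref{starClosedForm}, consider the shortest
possible string $[c]$, that is, the case $s = []$.
The instance below is only a sketch: it assumes that $\starupdates$ returns its
accumulator unchanged on the empty string, which is how we read its definition
but which is not restated in this section.
\begin{center}
\begin{tabular}{lcl}
$\rderssimp{r^*}{[c]}$
 & $=$ & $\rsimp{(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; (\starupdates \; [] \; r \; [[c]])))}$\\
 & $=$ & $\rsimp{(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; [[c]]))}$\\
 & $=$ & $\rsimp{(\sum [(\rderssimp{r}{[c]})\cdot r^*])}$
\end{tabular}
\end{center}
The only string collected is $[c]$ itself, which matches the intuition that
after a single character there is exactly one way of having entered the star.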
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2316
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2317
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2318
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2319
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2320
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2321
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2322	%----------------------------------------------------------------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2323	% SECTION ??
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2324	%----------------------------------------------------------------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2325
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2326	%-----------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2327	% SECTION syntactic equivalence under simp
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2328	%-----------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2329
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2330
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2331	%----------------------------------------------------------------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2332	% SECTION ALTS CLOSED FORM
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2333	%----------------------------------------------------------------------------------------
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2334	%\section{A Closed Form for \textit{ALTS}}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2335	%Now we prove that $rsimp (rders\_simp (RALTS rs) s) = rsimp (RALTS (map (\lambda r. rders\_simp r s) rs))$.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2336	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2337	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2338	%There are a few key steps, one of these steps is
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2339	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2340	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2341	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2342	%One might want to prove this by something a simple statement like:
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2343	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2344	%For this to hold we want the $\textit{distinct}$ function to pick up
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2345	%the elements before and after derivatives correctly:
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2346	%$r \in rset \equiv (rder x r) \in (rder x rset)$.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2347	%which essentially requires that the function $\backslash$ is an injective mapping.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2348	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2349	%Unfortunately the function $\backslash c$ is not an injective mapping.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2350	%
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2351	%\subsection{function $\backslash c$ is not injective (1-to-1)}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2352	%\begin{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2353	% The derivative $w.r.t$ character $c$ is not one-to-one.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2354	% Formally,
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2355	% $\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2356	%\end{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2357	%This property is trivially true for the
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2358	%character regex example:
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2359	%\begin{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2360	% $r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2361	%\end{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2362	%But apart from the cases where the derivative
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2363	%output is $\ZERO$, are there non-trivial results
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2364	%of derivatives which contain strings?
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2365	%The answer is yes.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2366	%For example,
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2367	%\begin{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2368	% Let $r_1 = a^b\;\quad r_2 = (a\cdot a^)\cdot b + b$.\\
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2369	% where $a$ is not nullable.\\
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2370	% $r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2371	% $r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2372	%\end{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2373	%We start with two syntactically different regular expressions,
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2374	%and end up with the same derivative result.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2375	%This is not surprising as we have such
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2376	%equality as below in the style of Arden's lemma:\\
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2377	%\begin{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2378	% $L(A^B) = L(A\cdot A^ \cdot B + B)$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2379	%\end{center}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2380	\section{Bounding Closed Forms}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2381
In this section, we describe how we formalised the bound
on closed forms.
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2384	We first show that in general the number of regular expressions up to a certain
bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2385	size is finite.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2386	Then we prove that functions such as $\rflts$
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2387	will not cause the size of r-regular expressions to grow.
Putting this together with the finiteness
of the set of distinct regular expressions
up to a specific size, we obtain a bound on
the closed forms.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2392
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2393	\subsection{Finiteness of Distinct Regular Expressions}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2394	We define the set of regular expressions whose size is no more than
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2395	a certain size $N$ as $\textit{sizeNregex} \; N$:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2396	\[
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2397	\textit{sizeNregex} \; N \dn \{r\; \mid \; \llbracket r \rrbracket_r \leq N \}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2398	\]
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2399	We have that $\textit{sizeNregex} \; N$ is always a finite set:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2400	\begin{lemma}\label{finiteSizeN}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2401	$\textit{finite} \; (\textit{sizeNregex} \; N)$ holds.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2402	\end{lemma}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2403	\begin{proof}
By induction on $N$. In the inductive step we split the set
$\textit{sizeNregex} \; (N + 1)$ into
subsets according to the outermost constructor:
$\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$,
and so on. Each of these subsets is finite.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2408	\end{proof}
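
To make the case split concrete, one way of writing it is the inclusion below.
It is a sketch rather than the formal statement: it assumes that every
constructor contributes at least one to $\llbracket \cdot \rrbracket_r$, so that
all immediate subterms of an element of $\textit{sizeNregex} \; (N + 1)$ lie in
$\textit{sizeNregex} \; N$ and an alternative of size at most $N + 1$ has at
most $N$ children. The function $\textit{length}$ is used here only for
illustration.
\[
\begin{array}{lcl}
\textit{sizeNregex} \; (N + 1) & \subseteq &
  \{\ZERO_r, \ONE_r\} \cup \{c \mid c \; \mbox{is a character}\}\\
 & \cup & \{r^* \mid r \in \textit{sizeNregex} \; N\}\\
 & \cup & \{r_1 \cdot r_2 \mid r_1 \in \textit{sizeNregex} \; N \; \mbox{and} \; r_2 \in \textit{sizeNregex} \; N\}\\
 & \cup & \{\sum rs \mid \textit{set} \; rs \subseteq \textit{sizeNregex} \; N \; \mbox{and} \; \textit{length} \; rs \leq N\}
\end{array}
\]
Each set on the right-hand side is finite whenever
$\textit{sizeNregex} \; N$ and the alphabet are finite.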
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2409	\noindent
From this we get a corollary:
if $\rsize{r} \leq N$ for all $r \in rs$, then the output of
$\rdistinct{rs}{\varnothing}$ is a list of regular
expressions whose total size is bounded by a constant depending on $N$ only.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2414	\begin{corollary}\label{finiteSizeNCorollary}
If $\rsize{r} \leq N$ for all $r \in rs$, then
$\rsize{\rdistinct{rs}{\varnothing}} \leq c_N * N$ holds,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2416	where the constant $c_N$ is equal to $\textit{card} \; (\textit{sizeNregex} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2417	N)$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2418	\end{corollary}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2419	\begin{proof}
For all $r$ in
$\textit{set} \; (\rdistinct{rs}{\varnothing})$,
it is always the case that $\rsize{r} \leq N$.
In addition, the elements of the list are pairwise distinct members of
$\textit{sizeNregex} \; N$, so the list length is bounded by
$c_N$, yielding the desired bound.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2425	\end{proof}
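
A small instance may help to see what Corollary~\ref{finiteSizeNCorollary}
buys us. The numbers below are a sketch under assumptions not restated in this
section: that the size of a list is the sum of the sizes of its elements, that
a character has size $1$, that a star adds $1$ to the size of its body, and
that $\textit{rdistinct}$ keeps the first occurrence of each element.
Take two characters $c$ and $d$, let $N = 2$ and $rs = [c, d, c, c^*, d]$. Then
\[
\rdistinct{rs}{\varnothing} = [c, d, c^*]
\qquad \mbox{and} \qquad
\rsize{\rdistinct{rs}{\varnothing}} = 1 + 1 + 2 = 4 \leq 3 * 2 \leq c_2 * 2 .
\]
Appending further copies of $c$, $d$ or $c^*$ to $rs$ makes $rs$ arbitrarily
long but leaves the left-hand side unchanged, which is exactly why the bound
depends on $N$ only.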
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2426	\noindent
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2427	This fact will be handy in estimating the closed form sizes.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2428	%We have proven that the size of the
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2429	%output of $\textit{rdistinct} \; rs' \; \varnothing$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2430	%is bounded by a constant $N * c_N$ depending only on $N$,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2431	%provided that each of $rs'$'s element
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2432	%is bounded by $N$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2433
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2434	\subsection{$\textit{rsimp}$ Does Not Increase the Size}
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2435	Although it seems evident, we need a series
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2436	of non-trivial lemmas to establish that functions such as $\rflts$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2437	do not cause the regular expressions to grow.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2438	\begin{lemma}\label{rsimpMonoLemmas}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2439	\mbox{}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2440	\begin{itemize}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2441	\item
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2442	\[
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2443	\llbracket \rsimpalts \; rs \rrbracket_r \leq
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2444	\llbracket \sum \; rs \rrbracket_r
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2445	\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2446	\item
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2447	\[
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2448	\llbracket \rsimpseq \; r_1 \; r_2 \rrbracket_r \leq
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2449	\llbracket r_1 \cdot r_2 \rrbracket_r
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2450	\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2451	\item
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2452	\[
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2453	\llbracket \rflts \; rs \rrbracket_r \leq
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2454	\llbracket rs \rrbracket_r
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2455	\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2456	\item
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2457	\[
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2458	\llbracket \rDistinct \; rs \; ss \rrbracket_r \leq
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2459	\llbracket rs \rrbracket_r
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2460	\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2461	\item
If all elements $a$ of $as$ satisfy the property
that $\llbracket \textit{rsimp} \; a \rrbracket_r \leq
\llbracket a \rrbracket_r$, then we have
\[
\llbracket \; \rsimpalts \; (\textit{rdistinct} \;
(\textit{rflts} \; (\textit{map}\;\textit{rsimp} \; as)) \; \{\})
\rrbracket_r \leq
\llbracket \; \sum \; (\rDistinct \; (\rflts \;(\map \;
\textit{rsimp} \; as))\; \{ \} ) \rrbracket_r
\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2471	\]
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2472	\end{itemize}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2473	\end{lemma}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2474	\begin{proof}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2475	Points 1, 3, and 4 can be proven by an induction on $rs$.
613 b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2476	Point 2 is by case analysis on $r_1$ and $r_2$.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2477	The last part is a corollary of the previous ones.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2478	\end{proof}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2479	\noindent
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2480	With the lemmas for each inductive case in place, we are ready to get
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2481	the non-increasing property as a corollary:
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2482	\begin{corollary}\label{rsimpMono}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2483	$\llbracket \textit{rsimp} \; r \rrbracket_r \leq \llbracket r \rrbracket_r$
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2484	\end{corollary}
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2485	\begin{proof}
By structural induction on $r$, using Lemma \ref{rsimpMonoLemmas}.
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2487	\end{proof}
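
As an illustration of Corollary~\ref{rsimpMono}, we trace $\textit{rsimp}$ on a
small alternative built from two characters $c$ and $d$.
The trace is a sketch under assumptions that are not restated in this section:
that $\textit{rsimp} \; (\sum rs) = \rsimpalts \; (\rdistinct{(\rflts \; (\map \; \textit{rsimp} \; rs))}{\varnothing})$
(the shape appearing in the last item of Lemma~\ref{rsimpMonoLemmas}),
that $\rflts$ splices nested alternatives and drops $\ZERO_r$,
and that atoms have size $1$ while $\sum$ adds $1$ to the sizes of its children.
\begin{center}
\begin{tabular}{lcl}
$\map \; \textit{rsimp} \; [c, \sum [c, d], \ZERO_r]$ & $=$ & $[c, \sum [c, d], \ZERO_r]$\\
$\rflts \; [c, \sum [c, d], \ZERO_r]$ & $=$ & $[c, c, d]$\\
$\rdistinct{[c, c, d]}{\varnothing}$ & $=$ & $[c, d]$\\
$\rsimpalts \; [c, d]$ & $=$ & $\sum [c, d]$
\end{tabular}
\end{center}
Under these assumptions
$\llbracket \sum [c, \sum [c, d], \ZERO_r] \rrbracket_r = 1 + (1 + 3 + 1) = 6$
and $\llbracket \sum [c, d] \rrbracket_r = 1 + (1 + 1) = 3$,
so the size does not increase.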
b0f0d884a547 chap5 Chengsong parents: 611 diff changeset	2488
\subsection{Estimating the Closed Forms' Sizes}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2490	We recap the closed forms we obtained
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2491	earlier:
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2492	\begin{itemize}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2493	\item
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2494	$\rderssimp{(\sum rs)}{s} \sequal
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2495	\sum \; (\map \; (\rderssimp{\_}{s}) \; rs)$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2496	\item
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2497	$\rderssimp{(r_1 \cdot r_2)}{s} \sequal \sum ((r_1 \backslash s) \cdot r_2 )
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2498	:: (\map \; (r_2 \backslash \_) (\vsuf{s}{r_1}))$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2499	\item
671a83abccf3 haha Chengsong parents: 557 diff changeset	2500
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2501	$\rderssimp{r^*}{c::s} =
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2502	\rsimp{
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2503	(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \;
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2504	(\starupdates \; s\; r \; [[c]])
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2505	)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2506	)
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2507	}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2508	$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2509	\end{itemize}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2510	\noindent
The closed forms on the right-hand side
are all of the same shape: $\rsimp{ (\sum rs)} $.
Such a regular expression will be bounded by the size of $\sum rs'$,
where every element in $rs'$ is distinct, and each element
can be described by some inductive sub-structures
(for example when $r = r_1 \cdot r_2$ then $rs'$
will be solely comprised of $r_1 \backslash s'$
and $r_2 \backslash s''$, $s'$ and $s''$ being
sub-strings of $s$).
Each of these elements has a size upper bound
according to the inductive hypothesis, which controls $r \backslash s$.
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	2522
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2523	We elaborate the above reasoning by a series of lemmas
671a83abccf3 haha Chengsong parents: 557 diff changeset	2524	below, where straightforward proofs are omitted.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2525	%We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
We show that $\textit{rdistinct}$ and $\rflts$
working together are at least as
good as $\textit{rdistinct}$ alone, which can be written as
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2529	\begin{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2530	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2531	\leq
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2532	\llbracket \rdistinct{rs}{\varnothing} \rrbracket_r $.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2533	\end{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2534	We need this so that we know the outcome of our real
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2535	simplification is better than or equal to a rough estimate,
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2536	and therefore can be bounded by that estimate.
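
For instance, take two characters $c$ and $d$ and $rs = [\sum [c, d], c, d]$.
The computation below is a sketch which assumes that $\rflts$ splices nested
alternatives into the surrounding list, that the size of a list is the sum of
the sizes of its elements, and that atoms have size $1$ while $\sum$ adds $1$
to the sizes of its children.
\begin{center}
\begin{tabular}{lcl}
$\rdistinct{rs}{\varnothing}$ & $=$ & $[\sum [c, d], c, d]$\\
$\rflts \; rs$ & $=$ & $[c, d, c, d]$\\
$\rdistinct{(\rflts \; rs)}{\varnothing}$ & $=$ & $[c, d]$
\end{tabular}
\end{center}
Here $\llbracket \rdistinct{(\rflts \; rs)}{\varnothing} \rrbracket_r = 2$
whereas $\llbracket \rdistinct{rs}{\varnothing} \rrbracket_r = 3 + 1 + 1 = 5$:
flattening first exposes duplicates that $\textit{rdistinct}$ alone cannot see.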
This is a bit harder to establish than proving that
$\textit{rflts}$ does not make a list larger (which can
be proven using routine induction):
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2540	\begin{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2541	$\llbracket \textit{rflts}\; rs \rrbracket_r \leq
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2542	\llbracket \textit{rs} \rrbracket_r$
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2543	\end{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2544	We cannot simply prove how each helper function
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2545	reduces the size and then put them together:
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2546	From
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2547	\begin{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2548	$\llbracket \textit{rflts}\; rs \rrbracket_r \leq
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2549	\llbracket \textit{rs} \rrbracket_r$
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2550	\end{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2551	and
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2552	\begin{center}
$\llbracket \textit{rdistinct} \; rs \; \varnothing \rrbracket_r \leq
\llbracket rs \rrbracket_r$
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2555	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2556	one cannot infer
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2557	\begin{center}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2558	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r
671a83abccf3 haha Chengsong parents: 557 diff changeset	2559	\leq
671a83abccf3 haha Chengsong parents: 557 diff changeset	2560	\llbracket \rdistinct{rs}{\varnothing} \rrbracket_r $.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2561	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2562	What we can infer is that
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2563	\begin{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2564	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2565	\leq
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2566	\llbracket rs \rrbracket_r$
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2567	\end{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2568	but this estimate is too rough and $\llbracket rs \rrbracket_r$ is unbounded.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2569	The way we
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2570	get around this is by first proving a more general lemma
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2571	(so that the inductive case goes through):
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2572	\begin{lemma}\label{fltsSizeReductionAlts}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2573	If we have three accumulator sets:
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2574	$noalts\_set$, $alts\_set$ and $corr\_set$,
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2575	satisfying:
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2576	\begin{itemize}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2577	\item
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2578	$\forall r \in noalts\_set. \; \nexists xs.\; r = \sum xs$
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2579	\item
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2580	$\forall r \in alts\_set. \; \exists xs. \; r = \sum xs
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2581	\; \textit{and} \; set \; xs \subseteq corr\_set$
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2582	\end{itemize}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2583	then we have that
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2584	\begin{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2585	\begin{tabular}{lcl}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2586	$\llbracket (\textit{rdistinct} \; (\textit{rflts} \; as) \;
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2587	(noalts\_set \cup corr\_set)) \rrbracket_r$ & $\leq$ &\\
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2588	$\llbracket (\textit{rdistinct} \; as \; (noalts\_set \cup alts\_set \cup
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2589	\{ \ZERO_r \} )) \rrbracket_r$ & & \\
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2590	\end{tabular}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2591	\end{center}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2592	holds.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	2593	\end{lemma}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2594	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2595	We split the accumulator into two parts: the part
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2596	which contains alternative regular expressions ($alts\_set$), and
the part without any of them ($noalts\_set$).
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2598	This is because $\rflts$ opens up the alternatives in $as$,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2599	causing the accumulators on both sides of the inequality
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2600	to diverge slightly.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2601	If we want to compare the accumulators that are not
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2602	perfectly in sync, we need to consider the alternatives and non-alternatives
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2603	separately.
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2604	The set $corr\_set$ is the corresponding set
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2605	of $alts\_set$ with all elements under the alternative constructor
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2606	spilled out.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2607	\begin{proof}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2608	By induction on the list $as$. We make use of lemma \ref{rdistinctConcat}.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2609	\end{proof}
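\noindent
As a small sanity check of the lemma (an illustrative instance only, with $a$ and $b$
standing for two distinct character regular expressions), take
$as = [\sum [a, b], \; a]$, $noalts\_set = \varnothing$,
$alts\_set = \{\sum [a, b]\}$ and $corr\_set = \{a, b\}$.
Both side conditions hold, and assuming $\rflts$ and $\textit{rdistinct}$ behave
as defined earlier, we get
\begin{center}
\begin{tabular}{lcl}
$\textit{rdistinct} \; (\textit{rflts} \; as) \; \{a, b\}$ & $=$ &
$\textit{rdistinct} \; [a, b, a] \; \{a, b\} \; = \; []$\\
$\textit{rdistinct} \; as \; \{\sum [a, b], \ZERO_r\}$ & $=$ & $[a]$\\
\end{tabular}
\end{center}
\noindent
so the left-hand side has size $0$ and the right-hand side has size $\llbracket a \rrbracket_r$.
The element $a$ is dropped on the left because it is already in $corr\_set$,
whereas on the right only the alternative $\sum [a, b]$ is in the accumulator.
This is the sense in which the two accumulators are out of sync.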
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2610	By setting all three sets to the empty set, one gets the desired size estimate:
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2611	\begin{corollary}\label{interactionFltsDB}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2612	$\llbracket \rdistinct{(\rflts \; \textit{rs})}{\varnothing} \rrbracket_r
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2613	\leq
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2614	\llbracket \rdistinct{rs}{\varnothing} \rrbracket_r $.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2615	\end{corollary}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2616	\begin{proof}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2617	By using the lemma \ref{fltsSizeReductionAlts}.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2618	\end{proof}
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2619	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2620	The intuition for why this is true
is that if we remove duplicates from the $\textit{LHS}$, at least the same number of
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2622	duplicates will be removed from the list $\textit{rs}$ in the $\textit{RHS}$.
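
For instance (an illustrative example, assuming that the size of an alternative
is at least the sum of the sizes of its children), let $rs = [\sum [a, b], \; a]$
for two distinct character regular expressions $a$ and $b$. Then
\begin{center}
\begin{tabular}{lcl}
$\rdistinct{(\rflts \; \textit{rs})}{\varnothing}$ & $=$ &
$\rdistinct{[a, b, a]}{\varnothing} \; = \; [a, b]$\\
$\rdistinct{rs}{\varnothing}$ & $=$ & $[\sum [a, b], \; a]$\\
\end{tabular}
\end{center}
\noindent
and indeed
$\llbracket a \rrbracket_r + \llbracket b \rrbracket_r \leq
\llbracket \sum [a, b] \rrbracket_r + \llbracket a \rrbracket_r$.
The duplicate $a$ only becomes visible after flattening, which is why
the two simple size estimates cannot just be chained.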
671a83abccf3 haha Chengsong parents: 557 diff changeset	2623
Now the size of $\rsimp{\sum rs}$ can be estimated using the size of $\rdistinct{rs}{\varnothing}$:
671a83abccf3 haha Chengsong parents: 557 diff changeset	2625	\begin{lemma}\label{altsSimpControl}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2626	$\rsize{\rsimp{\sum rs}} \leq \rsize{\rdistinct{rs}{\varnothing}}+ 1$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	2627	\end{lemma}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2628	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2629	By using corollary \ref{interactionFltsDB}.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2630	\end{proof}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2631	\noindent
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	2632	This is a key lemma in establishing the bounds of all the
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2633	closed forms.
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2634	With this we are now ready to control the sizes of
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2635	$(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2636	\begin{theorem}\label{rBound}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2637	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2638	\end{theorem}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2639	\noindent
671a83abccf3 haha Chengsong parents: 557 diff changeset	2640	\begin{proof}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2641	We prove this by induction on $r$. The base cases for $\RZERO$,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2642	$\RONE $ and $\RCHAR{c}$ are straightforward.
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2643	In the sequence $r_1 \cdot r_2$ case,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2644	the inductive hypotheses state
$\exists N_1. \forall s. \; \llbracket \rderssimp{r_1}{s} \rrbracket_r \leq N_1$ and
$\exists N_2. \forall s. \; \llbracket \rderssimp{r_2}{s} \rrbracket_r \leq N_2$.
562 57e33978e55d more Chengsong parents: 561 diff changeset	2647
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2648	When the string $s$ is not empty, we can reason as follows
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2649	%
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2650	\begin{center}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2651	\begin{tabular}{lcll}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2652	& & $ \llbracket \rderssimp{r_1\cdot r_2 }{s} \rrbracket_r $\\
& $ = $ & $\llbracket \rsimp{(\sum(r_1 \backslash_{rsimps} s \cdot r_2 \; \; :: \; \;
\map \; (r_2\backslash_{rsimps} \_)\; (\vsuf{s}{r_1})))} \rrbracket_r $ & (1) \\
& $\leq$ & $\llbracket \rdistinct{(r_1 \backslash_{rsimps} s \cdot r_2 \; \; :: \; \;
\map \; (r_2\backslash_{rsimps} \_)\; (\vsuf{s}{r_1}))}{\varnothing} \rrbracket_r + 1$ & (2) \\
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2657	& $\leq$ & $2 + N_1 + \rsize{r_2} + (N_2 * (card\;(\sizeNregex \; N_2)))$ & (3)\\
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2658	\end{tabular}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2659	\end{center}
561 486fb297ac7c more done Chengsong parents: 559 diff changeset	2660	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2661	(1) is by theorem \ref{seqClosedForm}.
(2) is by lemma \ref{altsSimpControl}.
(3) is by corollary \ref{finiteSizeNCorollary}.
562 57e33978e55d more Chengsong parents: 561 diff changeset	2664
57e33978e55d more Chengsong parents: 561 diff changeset	2665
57e33978e55d more Chengsong parents: 561 diff changeset	2666	Combining the cases when $s = []$ and $s \neq []$, we get (4):
57e33978e55d more Chengsong parents: 561 diff changeset	2667	\begin{center}
57e33978e55d more Chengsong parents: 561 diff changeset	2668	\begin{tabular}{lcll}
$\rsize{\rderssimp{r_1 \cdot r_2}{s}}$ & $\leq$ &
$max \; (2 + N_1 +
\llbracket r_2 \rrbracket_r +
N_2 * (card\; (\sizeNregex \; N_2))) \; \rsize{r_1\cdot r_2}$ & (4)
57e33978e55d more Chengsong parents: 561 diff changeset	2673	\end{tabular}
57e33978e55d more Chengsong parents: 561 diff changeset	2674	\end{center}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2675
562 57e33978e55d more Chengsong parents: 561 diff changeset	2676	We reason similarly for $\STAR$.
57e33978e55d more Chengsong parents: 561 diff changeset	2677	The inductive hypothesis is
$\exists N. \forall s. \; \llbracket \rderssimp{r}{s} \rrbracket_r \leq N$.
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2679	Let $n_r = \llbracket r^* \rrbracket_r$.
562 57e33978e55d more Chengsong parents: 561 diff changeset	2680	When $s = c :: cs$ is not empty,
57e33978e55d more Chengsong parents: 561 diff changeset	2681	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2682	\begin{tabular}{lcll}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2683	& & $ \llbracket \rderssimp{r^* }{c::cs} \rrbracket_r $\\
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2684	& $ = $ & $\llbracket \rsimp{(\sum (\map \; (\lambda s. (r \backslash_{rsimps} s) \cdot r^*) \; (\starupdates\;
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2685	cs \; r \; [[c]] )) )} \rrbracket_r $ & (5) \\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2686	& $\leq$ & $\llbracket
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2687	\rdistinct{
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2688	(\map \;
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2689	(\lambda s. (r \backslash_{rsimps} s) \cdot r^*) \;
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2690	(\starupdates\; cs \; r \; [[c]] )
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2691	)}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2692	{\varnothing} \rrbracket_r + 1$ & (6) \\
57e33978e55d more Chengsong parents: 561 diff changeset	2693	& $\leq$ & $1 + (\textit{card} (\sizeNregex \; (N + n_r)))
57e33978e55d more Chengsong parents: 561 diff changeset	2694	* (1 + (N + n_r)) $ & (7)\\
57e33978e55d more Chengsong parents: 561 diff changeset	2695	\end{tabular}
57e33978e55d more Chengsong parents: 561 diff changeset	2696	\end{center}
57e33978e55d more Chengsong parents: 561 diff changeset	2697	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2698	(5) is by theorem \ref{starClosedForm}.
(6) is by lemma \ref{altsSimpControl}.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2700	(7) is by corollary \ref{finiteSizeNCorollary}.
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2701	Combining with the case when $s = []$, one obtains
562 57e33978e55d more Chengsong parents: 561 diff changeset	2702	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2703	\begin{tabular}{lcll}
$\rsize{\rderssimp{r^*}{s}}$ & $\leq$ & $max \; n_r \; (1 + (\textit{card} (\sizeNregex \; (N + n_r)))
* (1 + (N + n_r))) $ & (8)\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2706	\end{tabular}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2707	\end{center}
57e33978e55d more Chengsong parents: 561 diff changeset	2708	\noindent
57e33978e55d more Chengsong parents: 561 diff changeset	2709
57e33978e55d more Chengsong parents: 561 diff changeset	2710	The alternative case is slightly less involved.
57e33978e55d more Chengsong parents: 561 diff changeset	2711	The inductive hypothesis
is equivalent to $\exists N. \forall s. \; \forall r \in (\map \; (\_ \backslash_{rsimps} s) \; rs). \rsize{r} \leq N$.
57e33978e55d more Chengsong parents: 561 diff changeset	2713	In the case when $s = c::cs$, we have
57e33978e55d more Chengsong parents: 561 diff changeset	2714	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2715	\begin{tabular}{lcll}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2716	& & $ \llbracket \rderssimp{\sum rs }{c::cs} \rrbracket_r $\\
620 ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2717	& $ = $ & $\llbracket \rsimp{(\sum (\map \; (\_ \backslash_{rsimps} s) \; rs) )} \rrbracket_r $ & (9) \\
ae6010c14e49 chap6 almost done Chengsong parents: 618 diff changeset	2718	& $\leq$ & $\llbracket (\sum (\map \; (\_ \backslash_{rsimps} s) \; rs) ) \rrbracket_r $ & (10) \\
562 57e33978e55d more Chengsong parents: 561 diff changeset	2719	& $\leq$ & $1 + N * (length \; rs) $ & (11)\\
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2720	\end{tabular}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2721	\end{center}
57e33978e55d more Chengsong parents: 561 diff changeset	2722	\noindent
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2723	(9) is by theorem \ref{altsClosedForm}, (10) by lemma \ref{rsimpMono} and (11) by inductive hypothesis.
562 57e33978e55d more Chengsong parents: 561 diff changeset	2724
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	2725	Combining with the case when $s = []$, we obtain
562 57e33978e55d more Chengsong parents: 561 diff changeset	2726	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2727	\begin{tabular}{lcll}
$\rsize{\rderssimp{\sum rs}{s}}$ & $\leq$ & $max \; \rsize{\sum rs} \; (1+N*(length \; rs))$
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2729	& (12)\\
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	2730	\end{tabular}
562 57e33978e55d more Chengsong parents: 561 diff changeset	2731	\end{center}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2732	We have all the inductive cases proven.
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2733	\end{proof}
671a83abccf3 haha Chengsong parents: 557 diff changeset	2734
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2735	This leads to our main result on the size bound:
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2736	\begin{corollary}
For any annotated regular expression $a$, $\exists N_a. \forall s. \; \rsize{\bderssimp{a}{s}} \leq N_a$
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2738	\end{corollary}
3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2739	\begin{proof}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	2740	By lemma \ref{sizeRelations} and theorem \ref{rBound}.
564 3cbcd7cda0a9 more Chengsong parents: 562 diff changeset	2741	\end{proof}
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2742	\noindent
671a83abccf3 haha Chengsong parents: 557 diff changeset	2743
609 61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2744
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2745
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2746
61139fdddae0 chap1 totally done Chengsong parents: 601 diff changeset	2747
558 671a83abccf3 haha Chengsong parents: 557 diff changeset	2748	%-----------------------------------
671a83abccf3 haha Chengsong parents: 557 diff changeset	2749	% SECTION 2
671a83abccf3 haha Chengsong parents: 557 diff changeset	2750	%-----------------------------------
671a83abccf3 haha Chengsong parents: 557 diff changeset	2751
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2752	\section{Bounded Repetitions}
We promised in chapter \ref{Introduction}
that our lexing algorithm can potentially be extended
to handle bounded repetitions
in a natural and elegant way.
We now fulfil this promise by adding support for
the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
We add the corresponding clauses to the derivative-based lexing algorithm (with simplifications)
introduced in chapter \ref{Bitcoded2}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2761
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2762	\subsection{Augmented Definitions}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2763	There are a number of definitions that need to be augmented.
The most notable one is the rule describing which values inhabit $r^{\{n\}}$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2765	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2766	\begin{mathpar}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2767	\inferrule{\forall v \in vs_1. \vdash v:r \land
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2768	\|v\| \neq []\\ \forall v \in vs_2. \vdash v:r \land \|v\| = []\\
\textit{length} \; (vs_1 @ vs_2) = n}{\vdash \textit{Stars} \;
(vs_1 @ vs_2) : r^{\{n\}} }
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2771	\end{mathpar}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2772	\end{center}
As Ausaf pointed out \cite{Ausaf},
sometimes empty iterations have to be taken to get
a match with exactly $n$ repetitions,
hence the $vs_2$ part.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2777
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2778	Another important definition would be the size:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2779	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2780	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2781	$\llbracket r^{\{n\}} \rrbracket_r$ & $\dn$ &
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2782	$\llbracket r \rrbracket_r + n$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2783	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2784	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2785	\noindent
Arguably we should use $\log \; n$ for the size because
the number of digits needed to write the counter grows logarithmically in $n$.
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2788	For simplicity we choose to add the counter directly to the size.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2789
The derivative of a bounded regular expression with respect to a character $c$
is given as
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2792	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2793	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2794	$r^{\{n\}} \backslash_r c$ & $\dn$ &
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2795	$r\backslash_r c \cdot r^{\{n-1\}} \;\; \textit{if} \; n \geq 1$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2796	& & $\RZERO \;\quad \quad\quad \quad
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2797	\textit{otherwise}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2798	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2799	\end{center}
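\noindent
As a quick illustration (a worked instance of the clause above, assuming the usual
character derivative $a \backslash_r a = \RONE$), we have
\begin{center}
\begin{tabular}{lcl}
$a^{\{2\}} \backslash_r a$ & $=$ & $(a \backslash_r a) \cdot a^{\{1\}} \; = \; \RONE \cdot a^{\{1\}}$\\
$a^{\{0\}} \backslash_r a$ & $=$ & $\RZERO$\\
\end{tabular}
\end{center}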
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2800	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2801	For brevity, we sometimes use NTIMES to refer to bounded
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2802	regular expressions.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2803	The $\mkeps$ function clause for NTIMES would be
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2804	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2805	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2806	$\mkeps \; r^{\{n\}} $ & $\dn$ & $\Stars \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2807	(\textit{replicate} \; n\; (\mkeps \; r))$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2808	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2809	\end{center}
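\noindent
For example, if $r$ is nullable and we write $v$ for $\mkeps \; r$
(the name $v$ is only used for this illustration), the clause unfolds to
\begin{center}
\begin{tabular}{lcl}
$\mkeps \; r^{\{2\}}$ & $=$ & $\Stars \; (\textit{replicate} \; 2 \; v) \; = \; \Stars \; [v, v]$\\
$\mkeps \; r^{\{0\}}$ & $=$ & $\Stars \; (\textit{replicate} \; 0 \; v) \; = \; \Stars \; []$\\
\end{tabular}
\end{center}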
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2810	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2811	The injection looks like
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2812	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2813	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2814	$\inj \; r^{\{n\}} \; c\; (\Seq \;v \; (\Stars \; vs)) $ &
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2815	$\dn$ & $\Stars \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2816	((\inj \; r \;c \;v ) :: vs)$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2817	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2818	\end{center}
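\noindent
To see the clause at work, consider matching the string $aa$ against $a^{\{2\}}$.
The following is only a sketch: it assumes the standard character clause
$\inj \; a \; a \; \textit{Empty} = \textit{Char} \; a$, and it writes
$\textit{Empty}$ and $\textit{Char}$ for the value constructors.
After the first derivative step, the value for the remaining string $a$ is
$\Seq \; \textit{Empty} \; (\Stars \; [\textit{Char} \; a])$, and injecting the first
character back gives
\begin{center}
\begin{tabular}{lcl}
& & $\inj \; a^{\{2\}} \; a \; (\Seq \; \textit{Empty} \; (\Stars \; [\textit{Char} \; a]))$\\
& $=$ & $\Stars \; ((\inj \; a \; a \; \textit{Empty}) :: [\textit{Char} \; a])$\\
& $=$ & $\Stars \; [\textit{Char} \; a, \; \textit{Char} \; a]$\\
\end{tabular}
\end{center}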
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2819	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2820
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2821
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2822	\subsection{Proofs for the Augmented Lexing Algorithm}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2823	We need to maintain two proofs with the additional $r^{\{n\}}$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2824	construct: the
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2825	correctness proof in chapter \ref{Bitcoded2},
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2826	and the finiteness proof in chapter \ref{Finite}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2827
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2828	\subsubsection{Correctness Proof Augmentation}
The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions
has been proven by Ausaf and Urban~\cite{AusafDyckhoffUrban2016}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2831	As they have commented, once the definitions are in place,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2832	the proofs given for the basic regular expressions will extend to
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2833	bounded regular expressions, and there are no ``surprises''.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2834	We confirm this point because the correctness theorem would also
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2835	extend without surprise to $\blexersimp$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2836	The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2837	do not need to be changed,
and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted:
each needs one more line for the $r^{\{n\}}$ inductive case,
which can be solved by the Sledgehammer tool.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2841
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2842
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2843	\subsubsection{Finiteness Proof Augmentation}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2844	The bounded repetitions are
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2845	very similar to stars, and therefore the treatment
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2846	is similar, with minor changes to handle some slight complications
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2847	when the counter reaches 0.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2848	The exponential growth is similar:
\begin{center}
\begin{tabular}{ll}
$r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\
$(r\backslash c) \cdot
r^{\{n - 1\}}$ & $\longrightarrow_{\backslash c'}$\\
\\
$r \backslash cc' \cdot r^{\{n - 1\}} +
r \backslash c' \cdot r^{\{n - 2\}}$ &
$\longrightarrow_{\backslash c''}$\\
\\
$(r \backslash cc'c'' \cdot r^{\{n-1\}} +
r \backslash c''\cdot r^{\{n-2\}}) +
(r \backslash c'c'' \cdot r^{\{n-2\}} +
r \backslash c'' \cdot r^{\{n-3\}})$ &
$\longrightarrow_{\backslash c'''}$ \\
\\
$\ldots$\\
\end{tabular}
\end{center}
Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on
are all nullable.
The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$,
\begin{center}
$[r \backslash cc'c'' \cdot r^{\{n-1\}},\;
r \backslash c''\cdot r^{\{n-2\}}, \;
r \backslash c'c'' \cdot r^{\{n-2\}}, \;
r \backslash c'' \cdot r^{\{n-3\}},\; \ldots ]$
\end{center}
which comes from
\begin{center}
$(r \backslash cc'c'' \cdot r^{\{n-1\}} +
r \backslash c''\cdot r^{\{n-2\}}) +
(r \backslash c'c'' \cdot r^{\{n-2\}} +
r \backslash c'' \cdot r^{\{n-3\}})+ \ldots$
\end{center}
consists of sequences with different tails, whose counters
might differ.
The key observation for maintaining the bound is that
these counters never exceed $n$, the original
counter. Since only finitely many counter values can occur,
$\rDistinct$ will deduplicate and keep the list finite.
We state this idea as a lemma once we have described all
the necessary helper functions.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2892
Similar to the star case, we want
\begin{center}
$\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$,
\end{center}
where $rs$
is of the form
$\map \; f \; Ss$, where $f$ is a function and
$Ss$ a list of objects to act on.
For star, the object's datatype is string.
The list of strings $Ss$
is generated using functions
$\starupdate$ and $\starupdates$.
The function that takes a string and returns a regular expression
is the anonymous function $
(\lambda s'. \; r\backslash s' \cdot r^{*})$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2908	In the NTIMES setting,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2909	the $\starupdate$ and $\starupdates$ functions are replaced by
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2910	$\textit{nupdate}$ and $\textit{nupdates}$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2911	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2912	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2913	$\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2914	$\nupdate \; c \; r \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2915	(\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2916	$\textit{if} \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2917	(\rnullable \; (r \backslash_{rs} s))$ \\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2918	& & $\;\;\textit{then}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2919	\;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2920	\nupdate \; c \; r \; Ss)$ \\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2921	& & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2922	\nupdate \; c \; r \; Ss)$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2923	$\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ &
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2924	$(\None :: \nupdate \; c \; r \; Ss)$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2925	& & \\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2926	%\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2927	%\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2928	%\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2929	%\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2930	$\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2931	$\nupdates \; (c :: cs) \; r \; Ss$ & $\dn$ & $\nupdates \; cs \; r \; (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2932	\nupdate \; c \; r \; Ss)$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2933	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2934	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2935	\noindent
which take into account the case where the counter $n$ of a subterm
\begin{center}
$r \backslash_{rs} s \cdot r^{\{n\}}$
\end{center}
is 0, so that the subterm expands to
\begin{center}
$r \backslash_{rs} (s@[c]) \cdot r^{\{n\}} \;+
\; \ZERO$
\end{center}
after taking a derivative.
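\noindent
As a small worked instance (a sketch only: the concrete counter $2$ and the
assumption that $r \backslash_{rs} [c]$ is nullable are chosen purely for illustration),
one $\nupdate$ step with the character $c'$ unfolds as
\begin{center}
\begin{tabular}{lcl}
$\nupdate \; c' \; r \; [\Some \; ([c], 2)]$ & $=$ &
$\Some \; ([c] @ [c'], 2) :: \Some \; ([c'], 1) :: (\nupdate \; c' \; r \; [])$\\
& $=$ & $[\Some \; ([c, c'], 2), \; \Some \; ([c'], 1)]$
\end{tabular}
\end{center}
\noindent
The entry for the extended string keeps its counter, while the new entry
for $[c']$ carries a counter that is one smaller.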
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2947	The object now has type
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2948	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2949	$\textit{option} \;(\textit{string}, \textit{nat})$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2950	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2951	and therefore the function for converting such an option into
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2952	a regular expression term is called $\opterm$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2953
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2954	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2955	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2956	$\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2957	& & $\;\;\Some \; (s, n) \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2958	(r\backslash_{rs} s)\cdot r^{\{n\}}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2959	& & $\;\;\None \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2960	\ZERO$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2961	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2962	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2963	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2964	Put together, the list $\map \; f \; Ss$ is instantiated as
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2965	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2966	$\map \; (\opterm \; r) \; (\nupdates \; s \; r \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2967	[\Some \; ([c], n)])$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2968	\end{center}
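\noindent
For instance, suppose (for illustration only) that
$\nupdates \; [c'] \; r \; [\Some \; ([c], 2)]$ evaluates to
$[\Some \; ([c, c'], 2), \; \Some \; ([c'], 1)]$, which is the case when
$r \backslash_{rs} [c]$ is nullable. Mapping $\opterm \; r$ over this list then gives
\begin{center}
$[(r \backslash_{rs} [c, c']) \cdot r^{\{2\}}, \;
(r \backslash_{rs} [c']) \cdot r^{\{1\}}]$
\end{center}
\noindent
so each $\Some$ entry contributes exactly one sequence term to the sum.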
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2969	For the closed form to be bounded, we would like
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2970	simplification to be applied to each term in the list.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2971	Therefore we introduce some variants of $\opterm$,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2972	which help conveniently express the rewriting steps
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2973	needed in the closed form proof.
The three variants $\optermOsimp$, $\optermosimp$ and $\optermsimp$
differ only in where simplification is applied; having all three helps the proof go through:
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2976	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2977	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2978	$\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2979	& & $\;\;\Some \; (s, n) \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2980	\textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2981	& & $\;\;\None \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2982	\ZERO$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2983	\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2984	$\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2985	& & $\;\;\Some \; (s, n) \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2986	(\textit{rsimp} \; (r\backslash_{rs} s))
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2987	\cdot r^{\{n\}}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2988	& & $\;\;\None \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2989	\ZERO$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2990	\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2991	$\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2992	& & $\;\;\Some \; (s, n) \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2993	(r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2994	& & $\;\;\None \Rightarrow
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2995	\ZERO$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2996	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2997	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2998
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	2999
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3000	For a list of
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3001	$\textit{option} \;(\textit{string}, \textit{nat})$ elements,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3002	we define the highest power for it recursively:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3003	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3004	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3005	$\hpa \; [] \; n $ & $\dn$ & $n$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3006	$\hpa \; (\None :: os) \; n $ & $\dn$ & $\hpa \; os \; n$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3007	$\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ &
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3008	$\hpa \;os \; (\textit{max} \; n\; m)$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3009	\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3010	$\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3011	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3012	\end{center}
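\noindent
As a worked illustration (the list below is a hypothetical input chosen
only for this example), the definition unfolds as
\begin{center}
\begin{tabular}{lcl}
$\hpower \; [\Some \; ([c, c'], 2), \; \None, \; \Some \; ([c'], 1)]$
& $=$ & $\hpa \; [\Some \; ([c, c'], 2), \; \None, \; \Some \; ([c'], 1)] \; 0$\\
& $=$ & $\hpa \; [\None, \; \Some \; ([c'], 1)] \; (\textit{max} \; 2 \; 0)$\\
& $=$ & $\hpa \; [\Some \; ([c'], 1)] \; 2$\\
& $=$ & $\hpa \; [] \; (\textit{max} \; 1 \; 2)$\\
& $=$ & $2$
\end{tabular}
\end{center}
\noindent
The $\None$ entry is skipped, and the accumulator keeps the largest counter seen so far.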
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3013
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3014	Now the intuition that an NTIMES regular expression's power
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3015	does not increase can be easily expressed as
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3016	\begin{lemma}\label{nupdatesMono2}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3017	$\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3018	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3019	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3020	Note that the power is non-increasing after a $\nupdate$ application:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3021	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3022	$\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3023	\hpa\; \; Ss \; m$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3024	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3025	This is also the case for $\nupdates$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3026	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3027	$\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3028	\hpa\; \; Ss \; m$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3029	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3030	Therefore we have that
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3031	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3032	$\hpower \;\; (\nupdates \; s \; r \; Ss) \leq
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3033	\hpower \;\; Ss$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3034	\end{center}
Since $\hpower \; [\Some \; ([c], n)] = n$, instantiating $Ss$ with
$[\Some \; ([c], n)]$ proves the lemma.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3036
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3037	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3038
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3039
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3040	We also define the inductive rules for
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3041	the shape of derivatives of the NTIMES regular expressions:\\[-3em]
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3042	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3043	\begin{mathpar}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3044	\inferrule{\mbox{}}{\cbn \;\ZERO}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3045
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3046	\inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3047
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3048	\inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3049
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3050	\inferrule{\cbn \; r}{\cbn \; r + \ZERO}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3051	\end{mathpar}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3052	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3053	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3054	A derivative of NTIMES fits into the shape described by $\cbn$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3055	\begin{lemma}\label{ntimesDersCbn}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3056	$\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3057	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3058	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3059	By a reverse induction on $s$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3060	For the inductive case, note that if $\cbn \; r$ holds,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3061	then $\cbn \; (r\backslash_r c)$ holds.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3062	\end{proof}
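\noindent
To illustrate the rules (with $r_a$ and $r_b$ standing for arbitrary regular
expressions, names used only in this example), the expression
\begin{center}
$(r_a \cdot r^{\{2\}} + r_b \cdot r^{\{1\}}) + \ZERO$
\end{center}
\noindent
satisfies $\cbn$: both sequences are covered by the second rule, their sum
by the third rule, and the addition of $\ZERO$ by the fourth rule.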
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3063	\noindent
In addition, the derivative of a $\cbn$-shaped regular expression can be flattened
term by term:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3066	\begin{lemma}\label{ntimesHfauPushin}
If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} =
\textit{concat} \; (\map \; \hflataux{\_} \; (\map \; (\_\backslash_r c) \;
(\hflataux{r})))$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3070	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3071	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3072	By an induction on the inductive cases of $\cbn$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3073	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3074	\noindent
This time we do not need to define flattening functions specifically for NTIMES,
because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3077	\begin{lemma}\label{ntimesHfauInduct}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3078	$\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} =
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3079	\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3080	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3081	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3082	By a reverse induction on $s$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3083	The lemmas \ref{ntimesHfauPushin} and \ref{ntimesDersCbn} are used.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3084	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3085	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3086	We have a recursive property for NTIMES with $\nupdate$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3087	similar to that for STAR,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3088	and one for $\nupdates $ as well:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3089	\begin{lemma}\label{nupdateInduct1}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3090	\mbox{}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3091	\begin{itemize}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3092	\item
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3093	\begin{center}
$\textit{concat} \; (\map \; (\hflataux{\_} \circ (\_\backslash_r c) \circ (
\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
c \; r \; Ss)$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3097	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3098	holds.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3099	\item
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3100	\begin{center}
$\textit{concat} \; (\map \; \hflataux{\_}\;
(\map \; (\_\backslash_r x) \;
(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))))$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3104	$=$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3105	$\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3106	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3107	holds.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3108	\end{itemize}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3109	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3110	\begin{proof}
The first statement is by an induction on $Ss$,
the second by an induction on $xs$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3113	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3114	\noindent
The predicate $\nString$ conveniently expresses
that there are no empty strings in the
$\Some \;(s, n)$ elements generated by $\nupdate$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3118	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3119	\begin{tabular}{lcl}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3120	$\nString \; \None$ & $\dn$ & $ \textit{true}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3121	$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3122	$\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3123	\end{tabular}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3124	\end{center}
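\noindent
As a small illustration, the singleton list $[\Some \; ([c], n)]$, which seeds
$\nupdates$ in the closed form for $r^{\{n+1\}}$, satisfies $\nString$ on its only element,
because $[c]$ is $c::[]$ and the third clause applies:
\begin{center}
$\nString \; (\Some \; ([c], n)) = \nString \; (\Some \; (c::[], n)) = \textit{true}$
\end{center}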
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3125	\begin{lemma}\label{nupdatesNonempty}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3126	If for all elements $o \in \textit{set} \; Ss$,
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	3127	$\nString \; o$ holds, then we have that
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3128	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3129	$\nString \; o'$ holds.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3130	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3131	\begin{proof}
By induction on $s$, with $Ss$ generalised over all possible values.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3133	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3134
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3135	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3136
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3137	\begin{lemma}\label{ntimesClosedFormsSteps}
The following equalities and rewriting relations hold:\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3139	(i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) =
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3140	\textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3141	s \; r \; [\Some \; ([c], n)])))$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3142	(ii)
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3143	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3144	$\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3145	\Some \; ([c], n)]))$ \\ $ \sequal$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3146	$\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3147	s\;r \; [\Some \; ([c], n)]))$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3148	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3149	(iii)
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3150	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3151	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3152	([c], n)]))$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3153	$\sequal$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3154	$\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3155	([c], n)])) $\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3156	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3157	(iv)
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3158	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3159	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3160	([c], n)])) $ \\ $\sequal$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3161	$\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3162	([c], n)])) $\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3163	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3164	(v)
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3165	\begin{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3166	$\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3167	([c], n)])) $ \\ $\sequal$\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3168	$\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3169	(\nupdates \; s \; r \; [\Some \; ([c], n)]))$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3170	\end{center}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3171	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3172	\begin{proof}
Routine.
(iii) and (iv) use the fact that every string $s'$
in an element $\Some \; (s', m)$ of the list
$\nupdates \; s\;r\;[\Some\; ([c], n)]$ is non-empty,
which follows from lemma \ref{nupdatesNonempty}.
Whenever the string in $o = \Some \; (s', m)$ is
non-empty, $\optermsimp \; r \;o$,
$\optermosimp \; r \; o$ and $\optermOsimp \; r \; o$ are guaranteed
to be equal.
(v) uses lemma \ref{nupdateInduct1}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3183	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3184	\noindent
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3185	Now we are ready to present the closed form for NTIMES:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3186	\begin{theorem}\label{ntimesClosedForm}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3187	The derivative of $r^{\{n+1\}}$ can be described as an alternative
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3188	containing a list
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3189	of terms:\\
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3190	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3191	\sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3192	[\Some \; ([c], n)])))$
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3193	\end{theorem}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3194	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3195	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3196	\end{proof}
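\noindent
Spelled out, the steps of lemma \ref{ntimesClosedFormsSteps} chain together as follows,
where $L$ is our shorthand for $\nupdates \; s \; r \; [\Some \; ([c], n)]$
and we read $\sequal$ as relating terms that $\textit{rsimp}$ maps to the same result
(this is our reading of the notation used in the lemma; under it, (v) and (iv) can be used from right to left):
\begin{center}
\begin{tabular}{lcll}
$\sum (\map \; (\opterm \; r) \; L)$ & $\sequal$ & $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r)) \; L)$ & by (ii)\\
 & $\sequal$ & $\sum (\map \; (\optermOsimp \; r) \; L)$ & by (v)\\
 & $\sequal$ & $\sum (\map \; (\optermosimp \; r) \; L)$ & by (iv)\\
 & $\sequal$ & $\sum (\map \; (\optermsimp \; r) \; L)$ & by (iii)\\
\end{tabular}
\end{center}
Applying $\textit{rsimp}$ to both ends and combining the result with (i) then yields the statement of the theorem.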
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3197	\noindent
The key observation for bounding this closed form
is that the counter on $r^{\{n\}}$ can
only decrease when derivatives are taken:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3201	\begin{lemma}\label{nupdatesNLeqN}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3202	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3203	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3204	\; (s', m)$ for some string $s'$ and number $m \leq n$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3205	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3206	\noindent
The proof is routine and therefore omitted.
This allows us to characterise the terms
in the list $\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)])$;
each is either $\ZERO_r$ or a sequence whose tail is $r^{\{m\}}$
for some $m \leq n$:
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3213	\begin{lemma}\label{ntimesClosedFormListElemShape}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3214	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3215	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3216	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3217	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3218	\end{lemma}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3219	\begin{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3220	Using lemma \ref{nupdatesNLeqN}.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3221	\end{proof}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3222
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3223	\begin{theorem}\label{ntimesClosedFormBounded}
Assume that for any string $s$, $\llbracket r \backslash_{rsimps} s
\rrbracket_r \leq N$ holds. Then we have that\\
$\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq
\textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket_r+1)$,
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3228	where $c_N = \textit{card} \; (\textit{sizeNregex} \; (
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3229	N + \llbracket r^{\{n\}} \rrbracket_r+1))$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3230	\end{theorem}
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3231	\begin{proof}
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	3232	We have that for all regular expressions $r'$ in
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	3233	\begin{center}
80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	3234	$\textit{set} \; (\map \; (\optermsimp \; r) \; (
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3235	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	3236	\end{center}
the size of $r'$ is at most $N + \llbracket r^{\{n\}}
\rrbracket_r + 1$,
because $r'$ can only be either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
r^{\{m\}}$ for some string $s'$ and number
$m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}).
In addition, we know that the list
$\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)])$ contains at most
$c_N = \textit{card} \;
(\sizeNregex \; (N + \llbracket r^{\{n\}} \rrbracket_r + 1))$ distinct elements.
This gives us $\llbracket r^{\{n+1\}} \backslash_{rsimps} \;s \rrbracket_r
\leq (c_N + 1) * (N + \llbracket r^{\{n\}} \rrbracket_r + 1)$.
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3249	\end{proof}
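\noindent
The arithmetic behind the bound stated in the theorem can be sketched as follows.
We write $K$ as shorthand for $N + \llbracket r^{\{n\}} \rrbracket_r + 1$, and we assume
that a sequence or alternative constructor adds one to the size of its children
and that $\llbracket r^{\{m\}} \rrbracket_r \leq \llbracket r^{\{n\}} \rrbracket_r$ for $m \leq n$;
both are assumptions about the size function, which is defined elsewhere.
\begin{center}
\begin{tabular}{lcl}
$\llbracket (r \backslash_{rsimps} s') \cdot r^{\{m\}} \rrbracket_r$ & $=$ & $\llbracket r \backslash_{rsimps} s' \rrbracket_r + \llbracket r^{\{m\}} \rrbracket_r + 1 \;\leq\; K$\\
$1 + c_N * K$ & $\leq$ & $(c_N + 1) * K$\\
\end{tabular}
\end{center}
The first line bounds each term of the list by $K$; the second line bounds an alternative
over at most $c_N$ such terms, using $K \geq 1$.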
b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3250
Formalising the correctness and size bounds
for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
and so on is still work in progress.
We expect them to follow more or less the same recipe described in this section.
Once we know how to deal with them recursively using suitable auxiliary
definitions, we can routinely establish the proofs.
625 b797c9a709d9 section reorganising, related work Chengsong parents: 624 diff changeset	3257
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3258
557 812e5d112f49 more changes Chengsong parents: 556 diff changeset	3259	%----------------------------------------------------------------------------------------
812e5d112f49 more changes Chengsong parents: 556 diff changeset	3260	% SECTION 3
812e5d112f49 more changes Chengsong parents: 556 diff changeset	3261	%----------------------------------------------------------------------------------------
812e5d112f49 more changes Chengsong parents: 556 diff changeset	3262
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3263
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3264	\section{Comments and Future Improvements}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3265	\subsection{Some Experimental Results}
What guarantee does this bound give us?
It states that, whatever the regular expression is, its derivatives will not grow indefinitely.
Take our earlier regular expression $(a + aa)^*$ as an example:
cc54ce075db5 restructured Chengsong parents: diff changeset	3269	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3270	\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3271	\begin{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3272	\begin{axis}[
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3273	xlabel={number of $a$'s},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3274	x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3275	ylabel={regex size},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3276	enlargelimits=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3277	xtick={0,5,...,30},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3278	xmax=33,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3279	ymax= 40,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3280	ytick={0,10,...,40},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3281	scaled ticks=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3282	axis lines=left,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3283	width=5cm,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3284	height=4cm,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3285	legend entries={$(a + aa)^*$},
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3286	legend pos=south east,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3287	legend cell align=left]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3288	\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star.data};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3289	\end{axis}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3290	\end{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3291	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3292	\end{center}
With our simplification rules, the size of the derivatives of
$(a + aa)^*$
stays small as the input grows.
cc54ce075db5 restructured Chengsong parents: diff changeset	3296
cc54ce075db5 restructured Chengsong parents: diff changeset	3297
In our proof for the inductive case $r_1 \cdot r_2$, the dominant term in the bound
is $l_{N_2} * N_2$, where $N_2$ is the bound we have for $\llbracket \bderssimp{r_2}{s} \rrbracket$.
Given that $l_{N_2}$ is roughly $4^{N_2}$, the size bound for $\llbracket \bderssimp{r_1 \cdot r_2}{s} \rrbracket$
inflates the size bound for $\llbracket \bderssimp{r_2}{s} \rrbracket$ by the function
$f(x) = x * 4^x$.
This means the bound grows at least
tower-exponentially as the depth of the regular expression increases linearly.
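\noindent
To get a feeling for this growth, take the estimate $l_{N_2} \approx 4^{N_2}$ at face value,
so that one level of nesting maps a bound $x$ to roughly $x * 4^x$.
Starting from the purely illustrative value $x = 2$, two levels already give
\begin{center}
$2 \;\mapsto\; 2 * 4^2 = 32 \;\mapsto\; 32 * 4^{32} = 2^{5} * 2^{64} = 2^{69} \approx 5.9 * 10^{20}$
\end{center}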
cc54ce075db5 restructured Chengsong parents: diff changeset	3305
One might be sceptical about what this non-elementary
bound can bring us.
It turns out that these giant bounds are far from being reached in practice.
Here is some test data from randomly generated regular expressions:
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3310	\begin{figure}[H]
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3311	\begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3312	\begin{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3313	\begin{axis}[
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3314	xlabel={$n$},
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3315	x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3316	ylabel={regex size},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3317	enlargelimits=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3318	xtick={0,5,...,30},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3319	xmax=33,
611 bc1df466150a more Chengsong parents: 610 diff changeset	3320	%ymax=1000,
bc1df466150a more Chengsong parents: 610 diff changeset	3321	%ytick={0,100,...,1000},
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3322	scaled ticks=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3323	axis lines=left,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3324	width=4.75cm,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3325	height=3.8cm,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3326	legend entries={regex1},
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3327	legend pos=north east,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3328	legend cell align=left]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3329	\addplot[red,mark=*, mark options={fill=white}] table {regex1_size_change.data};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3330	\end{axis}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3331	\end{tikzpicture}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3332	&
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3333	\begin{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3334	\begin{axis}[
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3335	xlabel={$n$},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3336	x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3337	%ylabel={time in secs},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3338	enlargelimits=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3339	xtick={0,5,...,30},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3340	xmax=33,
611 bc1df466150a more Chengsong parents: 610 diff changeset	3341	%ymax=1000,
bc1df466150a more Chengsong parents: 610 diff changeset	3342	%ytick={0,100,...,1000},
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3343	scaled ticks=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3344	axis lines=left,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3345	width=4.75cm,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3346	height=3.8cm,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3347	legend entries={regex2},
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3348	legend pos=south east,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3349	legend cell align=left]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3350	\addplot[blue,mark=*, mark options={fill=white}] table {regex2_size_change.data};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3351	\end{axis}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3352	\end{tikzpicture}
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3353	&
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3354	\begin{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3355	\begin{axis}[
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3356	xlabel={$n$},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3357	x label style={at={(1.05,-0.05)}},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3358	%ylabel={time in secs},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3359	enlargelimits=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3360	xtick={0,5,...,30},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3361	xmax=33,
611 bc1df466150a more Chengsong parents: 610 diff changeset	3362	%ymax=1000,
bc1df466150a more Chengsong parents: 610 diff changeset	3363	%ytick={0,100,...,1000},
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3364	scaled ticks=false,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3365	axis lines=left,
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3366	width=4.75cm,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3367	height=3.8cm,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3368	legend entries={regex3},
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3369	legend pos=south east,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3370	legend cell align=left]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3371	\addplot[cyan,mark=*, mark options={fill=white}] table {regex3_size_change.data};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3372	\end{axis}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3373	\end{tikzpicture}\\
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3374	\multicolumn{3}{c}{}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3375	\end{tabular}
\caption{Size change of three randomly generated
regular expressions w.r.t.\ input string length.
The x-axis represents the length of the input.}
611 bc1df466150a more Chengsong parents: 610 diff changeset	3379	\end{figure}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3380	\noindent
The sizes of most of these regular expressions seem to stay within a polynomial bound w.r.t.\ the
original size.
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	3383	We will discuss improvements to this bound in the next chapter.
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3384
cc54ce075db5 restructured Chengsong parents: diff changeset	3385
cc54ce075db5 restructured Chengsong parents: diff changeset	3386
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3387	\subsection{Possible Further Improvements}
639 80cc6dc4c98b until chap 7 Chengsong parents: 638 diff changeset	3388	There are two problems with this finiteness result, though:\\
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3389	(i)
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3390	First, it is not yet a direct formalisation of our lexer's complexity,
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3391	as a complexity proof would require looking into
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3392	the time it takes to execute {\bf all} the operations
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3393	involved in the lexer (simp, collect, decode), not just the derivative.\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3394	(ii)
Second, the bound is not yet tight, and we seek to improve $N_a$ so that
it is polynomial in $\llbracket a \rrbracket$.\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3397	Still, we believe this contribution is useful,
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3398	because
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3399	\begin{itemize}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3400	\item
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3401
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3402	The size proof can serve as a starting point for a complexity
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3403	formalisation.
The derivative operation is the most important phase of our lexer algorithm.
Size properties of derivatives cover the majority of the algorithm
and are therefore a good indication of the complexity of the entire program.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3407	\item
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3408	The bound is already a strong indication that catastrophic
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3409	backtracking is much less likely to occur in our $\blexersimp$
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3410	algorithm.
We refine $\blexersimp$ into $\blexerStrong$ in the next chapter,
for which we conjecture that the bound becomes polynomial.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 577 diff changeset	3413	\end{itemize}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3414
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3415	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	3416	% SECTION 4
cc54ce075db5 restructured Chengsong parents: diff changeset	3417	%----------------------------------------------------------------------------------------
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3418
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3419
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3420
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3421
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3422
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3423
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3424
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3425
One might wonder about the actual bound, as opposed to the loose bound we gave
for the convenience of a more straightforward proof.
How much can the regex $r^* \backslash s$ grow?
As earlier graphs have shown,
%TODO: reference that graph where size grows quickly
its size can grow at most exponentially
w.r.t.\ the number of input characters,
but will eventually level off when the string $s$ is long enough.
If it grows to a size exponential w.r.t.\ the size of the original regex, our algorithm
would still be slow.
Unfortunately, we have concrete examples
where such regular expressions grow exponentially large before levelling off:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3438	\begin{center}
$(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots +
(\underbrace{a \ldots a}_{\text{n a's}})^*)^*$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3441	\end{center}
already has a maximum
size that is exponential in the number $n$
under our current simplification rules:
cc54ce075db5 restructured Chengsong parents: diff changeset	3445	%TODO: graph of a regex whose size increases exponentially.
cc54ce075db5 restructured Chengsong parents: diff changeset	3446	\begin{center}
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3447	\begin{tikzpicture}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3448	\begin{axis}[
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3449	height=0.5\textwidth,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3450	width=\textwidth,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3451	xlabel=number of a's,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3452	xtick={0,...,9},
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3453	ylabel=maximum size,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3454	ymode=log,
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3455	log basis y={2}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3456	]
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3457	\addplot[mark=*,blue] table {re-chengsong.data};
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3458	\end{axis}
83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3459	\end{tikzpicture}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3460	\end{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	3461
For convenience we use $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$
to denote $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots +
(\underbrace{a \ldots a}_{\text{n a's}})^*)^*$ in the discussion below.
The exponential size arises because the regex
$\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$
inside the $(\ldots) ^*$ has exponentially many
different derivatives, even though the differences between them are minor.
$(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^* \backslash \underbrace{a \ldots a}_{\text{m a's}}$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3470	will therefore contain the following terms (after flattening out all nested
cc54ce075db5 restructured Chengsong parents: diff changeset	3471	alternatives):
cc54ce075db5 restructured Chengsong parents: diff changeset	3472	\begin{center}
$(\sum_{i = 1}^{n} (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$\\[1mm]
$(1 \leq m' \leq m )$
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3476	\end{center}
There are at least exponentially
many such terms.\footnote{To be exact, these terms are
distinct for $m' \leq \textit{lcm}(1, \ldots, n)$. The details are omitted,
but the point is that the number is exponential in $n$.
}
As each new input character is consumed by taking the derivative of the intermediate result, more and more such distinct
terms accumulate.
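\noindent
To give a feeling for the growth mentioned in the footnote, the following table lists the least common multiple of $1, \ldots, n$ for the values of $n$ shown in the plot above.
This is a plain arithmetic illustration added here, not a measurement of the lexer:
\begin{center}
\begin{tabular}{l|rrrrrrrrr}
$n$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9\\
\hline
$\textit{lcm}(1, \ldots, n)$ & 1 & 2 & 6 & 12 & 60 & 60 & 420 & 840 & 2520\\
\end{tabular}
\end{center}
\noindent
It is a standard number-theoretic fact that this quantity grows like $e^{(1 + o(1))\,n}$, which is the sense in which the number of distinct terms is exponential in $n$.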
The function $\textit{distinctBy}$ will not be able to de-duplicate any two of these terms
\begin{center}
$(\sum_{i = 1}^{n}
(\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot
(\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot
(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$\\
$(\sum_{i = 1}^{n} (\underbrace{a \ldots a}_{\text{((i - (m'' \% i))\%i) a's}})\cdot
(\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot
(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$
\end{center}
\noindent
where $m' \neq m''$,
because they differ slightly.
This means that with our current simplification methods,
we will not be able to control the derivative so that
$\llbracket \bderssimp{r}{s} \rrbracket$ stays polynomial in $\llbracket r \rrbracket$. %\leq O((\llbracket r\rrbacket)^c)$
These terms are similar in the sense that their heads
all consist of sub-terms of the form
$(\underbrace{a \ldots a}_{\text{j a's}})\cdot (\underbrace{a \ldots a}_{\text{i a's}})^* $.
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3503	For $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$, there will be at most
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3504	$n * (n + 1) / 2$ such terms.
For example, the derivatives of $(a^* + (aa)^* + (aaa)^*)^*$
can be described by 6 terms:
$a^*$, $a\cdot (aa)^*$, $(aa)^*$,
$aa \cdot (aaa)^*$, $a \cdot (aaa)^*$, and $(aaa)^*$.
The total number of different ``head terms'', $n * (n + 1) / 2$,
is proportional to the number of characters in the regex
$(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*$.
If we can make our deduplication process smarter,
so that it only keeps track of these $n * (n+1) /2$ terms, then we can keep
the size growth polynomial again.
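
\noindent
As a worked illustration for $n = 3$ (our own calculation from the formula for the heads, writing $j = (i - (m' \,\%\, i)) \,\%\, i$ for the number of leading $a$'s and dropping an empty prefix when $j = 0$), the summands of the head for each $m'$ are:
\begin{center}
\begin{tabular}{c|lll}
$m'$ & $i = 1$ & $i = 2$ & $i = 3$\\
\hline
1 & $a^*$ & $a \cdot (aa)^*$ & $aa \cdot (aaa)^*$\\
2 & $a^*$ & $(aa)^*$ & $a \cdot (aaa)^*$\\
3 & $a^*$ & $a \cdot (aa)^*$ & $(aaa)^*$\\
4 & $a^*$ & $(aa)^*$ & $aa \cdot (aaa)^*$\\
5 & $a^*$ & $a \cdot (aa)^*$ & $a \cdot (aaa)^*$\\
6 & $a^*$ & $(aa)^*$ & $(aaa)^*$\\
\end{tabular}
\end{center}
\noindent
The six rows are pairwise different, and the row for $m' = 7$ would coincide with the row for $m' = 1$.
Every entry, however, is one of only $3 * 4 / 2 = 6$ sub-terms, which is why tracking sub-terms instead of whole rows would avoid the blow-up.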
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3515	This example also suggests a slightly different notion of size, which we call the
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3516	alphabetic width:
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3517	\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3518	\begin{tabular}{lcl}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3519	$\textit{awidth} \; \ZERO$ & $\dn$ & $0$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3520	$\textit{awidth} \; \ONE$ & $\dn$ & $0$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3521	$\textit{awidth} \; c$ & $\dn$ & $1$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3522	$\textit{awidth} \; r_1 + r_2$ & $\dn$ & $\textit{awidth} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3523	r_1 + \textit{awidth} \; r_2$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3524	$\textit{awidth} \; r_1 \cdot r_2$ & $\dn$ & $\textit{awidth} \;
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3525	r_1 + \textit{awidth} \; r_2$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3526	$\textit{awidth} \; r^*$ & $\dn$ & $\textit{awidth} \; r$\\
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3527	\end{tabular}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3528	\end{center}
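
\noindent
For instance, unfolding this definition on the $n = 3$ instance of the example regex gives (a worked calculation added for illustration):
\begin{center}
\begin{tabular}{lcl}
$\textit{awidth} \; (a^* + (aa)^* + (aaa)^*)^*$ & $=$ & $\textit{awidth} \; a^* + \textit{awidth} \; (aa)^* + \textit{awidth} \; (aaa)^*$\\
 & $=$ & $1 + 2 + 3$\\
 & $=$ & $6$\\
\end{tabular}
\end{center}
\noindent
In general the alphabetic width of the example regex is $\sum_{i=1}^{n} i = n * (n + 1) / 2$, which coincides with the number of head terms counted above.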
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3529
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3530
593 83fab852d72d more chap5 Chengsong parents: 591 diff changeset	3531
Antimirov~\parencite{Antimirov95} has proven that
$|\textit{PDER}_{UNIV}(r)| \leq \textit{awidth}(r)$,
where $\textit{PDER}_{UNIV}(r)$ is the set of all possible subterms
created by taking derivatives of $r$ against all possible strings.
If we can make sure that at any moment in our lexing algorithm our
intermediate result holds at most one copy of each of these
subterms, then we can get the same bound as Antimirov's.
cc54ce075db5 restructured Chengsong parents: diff changeset	3539	This leads to the algorithm in the next chapter.
cc54ce075db5 restructured Chengsong parents: diff changeset	3540
cc54ce075db5 restructured Chengsong parents: diff changeset	3541
cc54ce075db5 restructured Chengsong parents: diff changeset	3542
cc54ce075db5 restructured Chengsong parents: diff changeset	3543
cc54ce075db5 restructured Chengsong parents: diff changeset	3544
cc54ce075db5 restructured Chengsong parents: diff changeset	3545	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	3546	% SECTION 1
cc54ce075db5 restructured Chengsong parents: diff changeset	3547	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	3548
cc54ce075db5 restructured Chengsong parents: diff changeset	3549
cc54ce075db5 restructured Chengsong parents: diff changeset	3550	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	3551	% SUBSECTION 1
cc54ce075db5 restructured Chengsong parents: diff changeset	3552	%-----------------------------------
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3553	%\subsection{Syntactic Equivalence Under $\simp$}
640 bd1354127574 more proofreading done, last version before submission Chengsong parents: 639 diff changeset	3554	%We prove that minor differences can be annihilated
618 233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3555	%by $\simp$.
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3556	%For example,
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3557	%\begin{center}
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3558	% $\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) =
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3559	% \simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
233cf2b97d1a chapter 5 finished!! Chengsong parents: 614 diff changeset	3560	%\end{center}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	3561

author	Chengsong
	Mon, 10 Jul 2023 19:29:22 +0100
changeset 664	ba44144875b1
parent 663	0d1e68268d0f
child 668	3831621d7b14
permissions	-rwxr-xr-x