% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.



In this chapter we introduce simplifications
for annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions were inefficient and in some cases needed fixing.
%We contrast our simplification function
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness
%and reliability of formal proofs on algorithms.
%These ``aggressive'' simplifications would not be possible in the injection-based
%lexing we introduced in chapter \ref{Inj}.
%We then prove the correctness with the improved version of
%$\blexer$, called $\blexersimp$, by establishing
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
Consider the derivatives of the following example $(a^*a^*)^*$:
%and $(a^* + (aa)^*)^*$:
\begin{center}
\begin{tabular}{lcl}
$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ &
$ (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
&
$ \stackrel{\backslash a}{\longrightarrow} $ &
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
& $\stackrel{\backslash a}{
\longrightarrow} $ & $\ldots$\\
\end{tabular}
\end{center}
\noindent
As can be seen, there are several duplications.
A simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivative
regular expressions, namely
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
because the duplicates are
not next to each other and therefore the rule
$r+ r \rightarrow r$ from $\textit{simp}$ does not fire.
One would expect a better simplification function to work in the
following way:
\begin{gather*}
((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
\bigg\downarrow (1) \\
(a^*a^* + a^*
\color{gray} + a^* \color{black})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
\bigg\downarrow (2) \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
\bigg\downarrow (3) \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\end{gather*}
\noindent
In the first step, the nested alternative regular expression
$(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
Now the third term $a^*$ can clearly be identified as a duplicate
and therefore removed in the second step.
This makes the two
top-level terms identical, so that the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$
can be removed in the final step.
Sulzmann and Lu's simplification function (in our notation) aims to achieve this
kind of simplification:
\begin{center}
\begin{tabular}{lcl}
$\textit{simp}\_{SL} \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
$\textit{simp}\_{SL} \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
\textit{then} \;\; \ZERO$\\
& & $\textit{else}\;\;_{bs}((\textit{simp}\_{SL} \;r_1)\cdot
(\textit{simp}\_{SL} \; r_2))$\\
$\textit{simp}\_{SL} \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
$\textit{simp}\_{SL} \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
$\textit{simp}\_{SL} \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\textit{simp}\_{SL} \; r)$\\
$\textit{simp}\_{SL} \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
(\nub \; (\filter \; (\neg\zeroable)\;((\textit{simp}\_{SL} \; r) :: \map \; \textit{simp}\_{SL} \; rs)))$\\
\end{tabular}
\end{center}
\noindent
The $\textit{zeroable}$ predicate
tests whether the regular expression
is equivalent to $\ZERO$, and
can be defined as:
\begin{center}
\begin{tabular}{lcl}
$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
\zeroable \;_{[]}\sum\;rs $\\
$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
\end{tabular}
\end{center}
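\noindent
As a small worked instance of these clauses (the concrete expression is chosen purely
for illustration), consider a sequence whose first component is $\ZERO$:
\begin{center}
\begin{tabular}{lcl}
$\zeroable\;_{[]}(\ZERO \cdot {}_{[]}a^*)$ & $=$ & $\zeroable\; \ZERO \;\; \lor \;\; \zeroable \; _{[]}a^*$\\
& $=$ & $\textit{true} \;\; \lor \;\; \textit{false}$\\
& $=$ & $\textit{true}$
\end{tabular}
\end{center}
\noindent
The sequence clause of $\textit{simp}\_{SL}$ that tests both components with
$\textit{zeroable}$ therefore replaces such a sequence by $\ZERO$.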
\noindent
The
\begin{center}
\begin{tabular}{lcl}
$\textit{simp}\_{SL} \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
\end{tabular}
\end{center}
\noindent
clause does flatten the alternative as required in step (1),
but $\textit{simp}\_{SL}$ is insufficient if we want to do steps (2) and (3),
as these ``identical'' terms have different bit-annotations.
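To see where the differing annotations come from, here is a small instance of the
flattening clause. It is only an illustrative sketch: the expression is made up, and
we assume, as for $\blexer$, that $\fuse \; bs \; r$ prepends $bs$ to the bitcode of $r$:
\begin{center}
$\textit{simp}\_{SL} \; \big(_{[]}\sum [\,_{Z}\sum[\,_{Z}b,\; _{S}a^*\,],\; _{S}a^*\,]\big) \;=\;
{}_{[]}\sum [\,_{ZZ}b,\; _{ZS}a^*,\; _{S}a^*\,]$
\end{center}
\noindent
The two copies of $a^*$ now sit in the same list, but they carry the bitcodes $ZS$ and $S$.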
Sulzmann and Lu also suggested that the $\textit{simp}\_{SL} $ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{SLSimp}$:
\begin{center}
\begin{tabular}{lcl}
$\textit{SLSimp} \; r$ & $\dn$ &
$\textit{while}((\textit{simp}\_{SL} \; r)\; \cancel{=} \; r)$ \\
& & $\quad r := \textit{simp}\_{SL} \; r$\\
& & $\textit{return} \; r$
\end{tabular}
\end{center}
We call the operation of alternately
applying derivatives and simplifications
(until the string is exhausted) Sulz-simp-derivative,
written $\backslash_{SLSimp}$:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{SLSimp} (c\!::\!s) $ & $\dn$ & $(\textit{SLSimp} \; (r \backslash c)) \backslash_{SLSimp}\, s$ \\
$r \backslash_{SLSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
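\noindent
For instance, unfolding the two clauses on a two-character string gives
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{SLSimp} (a\!::\!a\!::\![\,])$ & $=$ & $(\textit{SLSimp} \; (r \backslash a)) \backslash_{SLSimp}\, (a\!::\![\,])$\\
& $=$ & $(\textit{SLSimp} \; ((\textit{SLSimp} \; (r \backslash a)) \backslash a)) \backslash_{SLSimp}\, [\,]$\\
& $=$ & $\textit{SLSimp} \; ((\textit{SLSimp} \; (r \backslash a)) \backslash a)$
\end{tabular}
\end{center}
\noindent
so the fixpoint iteration $\textit{SLSimp}$ is run once after every single-character derivative.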
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as $\blexer$:
\begin{center}
\begin{tabular}{lcl}
$\textit{blexer\_SLSimp}\;r\,s$ & $\dn$ &
$\textit{let}\;a = (r^\uparrow)\backslash_{SLSimp}\, s\;\textit{in}$\\
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
& & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala,
and found that the size of the final derivative
still grows exponentially (note the logarithmic scale):
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={size},
ymode = log,
legend entries={Final Derivative Size},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture}
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n= 20$ we already get an out-of-memory error with Scala's normal
JVM heap size settings.
In fact, their simplification does not improve much over
the simple-minded simplifications we have shown in \ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={time},
%ymode = log,
legend entries={time in secs},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture}
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
which seems like a counterexample to
Sulzmann and Lu's linear complexity claim
in their paper \cite{Sulzmann2014}:
\begin{quote}\it
``Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input.''
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
would stay below a finite constant does not hold, at least not in the
examples we considered.
The main reasons are that (i) Haskell's $\textit{nub}$
function requires two annotated regular expressions to have
identical annotations to qualify as duplicates,
and therefore cannot simplify cases like $_{SZZ}a^*+_{SZS}a^*$
even though both $a^*$ denote the same language, and
(ii) the ``flattening'' only applies to the head of the list
in the
\begin{center}
\begin{tabular}{lcl}
$\textit{simp}\_{SL} \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
\end{tabular}
\end{center}
\noindent
clause, and therefore is not strong enough to simplify all
needed parts of the regular expression. Moreover,
the $\textit{simp}\_{SL}$ function is applied repeatedly
in each derivative step until a fixed point is reached,
which makes the algorithm even more
unpredictable and inefficient.
%To not get ``caught off guard'' by
%these counterexamples,
%one needs to be more careful when designing the
%simplification function and making claims about them.

\section{Our $\textit{Simp}$ Function}
We will now introduce our own simplification function.
%by making a contrast with $\textit{simp}\_{SL}$.
We also describe
the ideas behind Sulzmann and Lu's $\textit{simp}\_{SL}$
algorithm
and why it fails to achieve the desired effect of keeping the sizes of derivatives finitely bounded.
In addition, our simplification function will come with a formal
correctness proof.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
$\textit{simp}\_{SL} \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
is that it allows
duplicate removal of regular expressions at different
``levels'' of alternatives.
For example, this would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	270	The problem is that only the head element
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	271	is ``spilled out''.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	272	It is more desirable
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	273	to flatten
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	274	an entire list to open up possibilities for further simplifications
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	275	with later regular expressions.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	276	Not flattening the rest of the elements also means that
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	277	the later de-duplication process
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	278	does not fully remove further duplicates.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	279	For example,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	280	using $\textit{simp}\_{SL}$ we cannot
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	281	simplify
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	282	\begin{center}
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	283	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	284	((a^*a^*)+a^*)\cdot (a^*a^*)^*$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	285	\end{center}
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	286	due to the underlined part not being the head
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	287	of the alternative.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	288
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	289	We define our flatten operation so that it flattens
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	290	the entire list:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	291	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	292	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	293	$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	294	(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	295	$\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	296	$\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	297	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	298	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	299	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	300	Our $\flts$ operation
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	301	also throws away $\ZERO$s
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	302	as they do not contribute to a lexing result.
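The three clauses of $\textit{flts}$ can be illustrated with a minimal Python sketch. The encoding of annotated regular expressions as nested tuples, the use of strings over \texttt{Z} and \texttt{S} for bitcodes, and the helper names are assumptions made only for this illustration; they are not taken from the formalisation.

```python
# Hypothetical tuple encoding of annotated regular expressions:
#   ("ZERO",), ("ONE", bs), ("CHAR", bs, c), ("ALTS", bs, (a1, ..., an))
# where bs is a string of bits such as "ZS".
ZERO = ("ZERO",)

def CHAR(c, bs=""):
    return ("CHAR", bs, c)

def ALTS(rs, bs=""):
    return ("ALTS", bs, tuple(rs))

def fuse(bs, a):
    """Prepend the bits bs to the top-level bitcode of a (ZERO carries none)."""
    if a[0] == "ZERO":
        return a
    return (a[0], bs + a[1]) + a[2:]

def flts(rs):
    """Flatten every nested alternative in the list (one level) and drop ZEROs."""
    out = []
    for a in rs:
        if a[0] == "ALTS":
            # spill the children out, pushing the bits of the nested sum onto them
            out.extend(fuse(a[1], x) for x in a[2])
        elif a[0] == "ZERO":
            continue
        else:
            out.append(a)
    return out
```

In this sketch a nested alternative is opened up regardless of its position in the list, for instance when it is the last element rather than the head.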
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	303	\subsection{Duplicate Removal}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	304	After flattening is done, we can deduplicate.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	305	The de-duplication function is called $\distinctBy$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	306	and that is where we make our second improvement over
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	307	Sulzmann and Lu's simplification method.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	308	The process goes as follows:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	309	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	310	$rs \stackrel{\textit{flts}}{\longrightarrow}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	311	rs_{flat}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	312	\xrightarrow{\distinctBy \;
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	313	rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	314	%\stackrel{\distinctBy \;
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	315	%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	316	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	317	where the $\distinctBy$ function is defined as:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	318	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	319	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	320	$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	321	$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & $\quad \textit{if} (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	322	& & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	323	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	324	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	325	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	326	The reason we define a distinct function under a mapping $f$ is that
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	327	we want to eliminate regular expressions that are syntactically the same,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	328	but have different bit-codes.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	329	For example, we can remove the second $a^*a^*$ from
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	330	$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	331	represents a match with shorter initial sub-match
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	332	(and therefore is definitely not POSIX),
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	333	and will be discarded by $\bmkeps$ later.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	334	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	335	$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; times\quad}\underbrace{a^*}_{Z: \;match\; 1 \;times} +
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	336	_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	337	$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	338	\end{center}
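The effect of $\distinctBy$ on the example above can be sketched in Python. This is only an illustration: the pairs of a bitcode and a plain regular expression string stand in for annotated regular expressions, and the projection onto the second component plays the role of the erasing function $f$; both are assumptions of this sketch.

```python
def distinct_by(xs, f, acc=frozenset()):
    """Keep the leftmost element for every value of f, mirroring the
    recursive definition: acc collects the f-images seen so far."""
    seen = set(acc)
    out = []
    for x in xs:
        key = f(x)
        if key not in seen:
            seen.add(key)
            out.append(x)
    return out

# (bitcode, erased regular expression) pairs; a hypothetical stand-in
# for annotated regular expressions
alternatives = [("ZSZ", "a*a*"), ("SZZ", "a*a*"), ("S", "b")]
result = distinct_by(alternatives, lambda p: p[1])
```

Here \texttt{result} retains \texttt{("ZSZ", "a*a*")} and \texttt{("S", "b")}: the second copy of \texttt{a*a*} is dropped although its bitcode differs, because only the image under $f$ is compared.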
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	339	%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	340	Due to the way our algorithm works,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	341	the matches that conform to the POSIX standard
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	342	will always be placed further to the left. When we
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	343	traverse the list from left to right,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	344	regular expressions we have already seen
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	345	will definitely not contribute to a POSIX value,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	346	even if they are attached with different bitcodes.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	347	These duplicates therefore need to be removed.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	348	To achieve this, we call $\rerases$ as the function $f$ during the distinction
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	349	operation. The function
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	350	$\rerases$ is very similar to $\erase$, except that it preserves the structure
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	351	when erasing an alternative regular expression.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	352	The reason why we use $\rerases$ instead of $\erase$ is that
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	353	it keeps the structures of alternative
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	354	annotated regular expressions
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	355	whereas $\erase$ would turn it back into a binary tree structure.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	356	Not having to mess with the structure
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	357	greatly simplifies the finiteness proof in chapter
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	358	\ref{Finite}.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	359	We give the definitions of $\rerases$ here together with
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	360	the new datatype used by $\rerases$ (as our plain
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	361	regular expression datatype does not allow non-binary alternatives).
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	362	For now we can think of
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	363	$\rerases$ as the function $(\_)_\downarrow$ defined in chapter \ref{Bitcoded1}
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	364	and $\rrexp$ as plain regular expressions, but having a general list constructor
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	365	for alternatives:
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	366	\begin{figure}[H]
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	367	\begin{center}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	368	$\rrexp ::= \RZERO \mid \RONE
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	369	\mid \RCHAR{c}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	370	\mid \RSEQ{r_1}{r_2}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	371	\mid \RALTS{rs}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	372	\mid \RSTAR{r} $
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	373	\end{center}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	374	\caption{$\rrexp$: plain regular expressions, but with $\sum$ alternative
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	375	constructor}\label{rrexpDef}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	376	\end{figure}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	377	The function $\rerases$ we define as follows:
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	378	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	379	\begin{tabular}{lcl}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	380	$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	381	$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	382	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	383	$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	384	$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	385	$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	386	\end{tabular}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	387	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	388
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	389	\subsection{Putting Things Together}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	390	We can now give the definition of our simplification function:
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	391	%that looks somewhat similar to our Scala code is
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	392	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	393	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	394
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	395	$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	396	$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flts} ( \textit{map} \; bsimp \; as)) \; \rerases \; \varnothing) $ \\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	397	$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	398	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	399	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	400
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	401	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	402	The simplification (named $\textit{bsimp}$ for \emph{b}it-coded)
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	403	performs pattern matching on the regular expression.
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	404	When it detects that the regular expression is an alternative or
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	405	sequence, it will try to simplify its children regular expressions
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	406	recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	407	$\ONE$, which might trigger further simplification at the current level.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	408	Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	409	using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	410	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	411	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	412	$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	413	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	414	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	415	&&$\quad\textit{case} \; (_{bs1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs@bs_1) \; a_2'$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	416	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	417	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	418	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	419	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	420	The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	421	the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	422	and then call $\distinctBy$ on that list. The predicate determining whether two
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	423	elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	424	Finally, depending on whether the regular expression list $as'$ has turned into a
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	425	singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{ALTS}$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	426	decides whether to keep the current level constructor $\sum$ as it is, and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	427	removes it when there are fewer than two elements:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	428	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	429	\begin{tabular}{lcl}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	430	$\textit{bsimp}_{ALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	431	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	432	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	433	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	434	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	435
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	436	\end{center}
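The two helper functions $\textit{bsimp}_{ASEQ}$ and $\textit{bsimp}_{ALTS}$ can be sketched in Python as follows. The tuple encoding of annotated regular expressions and the string representation of bitcodes are assumptions made for this illustration only, and the arguments are expected to be already simplified children.

```python
# Hypothetical tuple encoding: ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("SEQ", bs, a1, a2), ("ALTS", bs, (a1, ..., an)); bs is a string of bits.
ZERO = ("ZERO",)

def fuse(bs, a):
    """Prepend bs to the top-level bitcode of a (ZERO carries no bits)."""
    return a if a[0] == "ZERO" else (a[0], bs + a[1]) + a[2:]

def bsimp_aseq(bs, a1, a2):
    """Simplify a sequence at the current level."""
    if a1 == ZERO or a2 == ZERO:
        return ZERO                      # ZERO . r  and  r . ZERO  collapse to ZERO
    if a1[0] == "ONE":
        return fuse(bs + a1[1], a2)      # ONE . r becomes r, keeping both bitcodes
    return ("SEQ", bs, a1, a2)

def bsimp_alts(bs, rs):
    """Remove the sum constructor when fewer than two alternatives remain."""
    if len(rs) == 0:
        return ZERO
    if len(rs) == 1:
        return fuse(bs, rs[0])           # the bits of the sum move onto the survivor
    return ("ALTS", bs, tuple(rs))
```

In both functions the bits attached to a removed constructor are fused onto the surviving regular expression, so that no lexing information is lost in this sketch.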
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	437	Having defined the $\textit{bsimp}$ function,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	438	we add it as a phase after a derivative is taken.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	439	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	440	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	441	$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	442	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	443	\end{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	444	%Following previous notations
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	445	%when extending from derivatives w.r.t.~character to derivative
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	446	%w.r.t.~string, we define the derivative that nests simplifications
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	447	%with derivatives:%\comment{simp in the [] case?}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	448	We extend this from characters to strings:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	449	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	450	\begin{tabular}{lcl}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	451	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	452	$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	453	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	454	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	455
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	456	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	457	The lexer that extracts bitcodes from the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	458	derivatives with simplifications from our $\textit{bsimp}$ function
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	459	is called $\blexersimp$:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	460	\begin{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	461	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	462	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	463	$\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	464	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	465	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	466	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	467	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	468	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	469	\noindent
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	470	This algorithm keeps the regular expression size small,
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	471	as we shall demonstrate with some examples in the next section.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	472
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	473
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	474	\subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	475	After Simplification}
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	476	Recall the
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	477	previous $(a^*a^*)^*$ example,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	478	where $\textit{simp}\_{SL}$ could not
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	479	prevent the fast growth (over
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	480	3 million nodes for input lengths just below $20$).
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	481	With $\textit{bsimp}$ the size is reduced to just 15 and stays constant no matter how long the
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	482	input string is.
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	483	This is shown in the graphs below.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	484	\begin{figure}[H]
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	485	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	486	\begin{tabular}{ll}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	487	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	488	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	489	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	490	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	491	width=7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	492	height=4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	493	legend entries={Lexer with $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	494	legend pos= south east,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	495	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	496	\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	497	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	498	\end{tikzpicture} %\label{fig:BitcodedLexer}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	499	&
7cf9f17aa179 more Chengsong parents: 538 diff changeset	500	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	501	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	502	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	503	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	504	width = 7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	505	height = 4cm,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	506	legend entries={Lexer with $\textit{simp}\_{SL}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	507	legend pos= north west,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	508	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	509	\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	510	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	511	\end{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	512	\end{tabular}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	513	\end{center}
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	514	\caption{Our Improvement over Sulzmann and Lu's in terms of size of the derivatives.}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	515	\end{figure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	516	\noindent
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	517	Given the size difference, it is not
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	518	surprising that our $\blexersimp$ significantly outperforms
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	519	$\textit{blexer\_SLSimp}$ by Sulzmann and Lu.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	520	In the next section we are going to establish that our
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	521	simplification preserves the correctness of the algorithm.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	522	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	523	% SECTION rewrite relation
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	524	%----------------------------------------------------------------------------------------
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	525	\section{Correctness of $\blexersimp$}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	526	We first introduce the rewriting relation \emph{rrewrite}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	527	($\rrewrite$) between two regular expressions,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	528	which stands for an atomic
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	529	simplification.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	530	We then prove properties about
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	531	this rewriting relation and its reflexive transitive closure.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	532	Finally we leverage these properties to show
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	533	an equivalence between the results generated by
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	534	$\blexer$ and $\blexersimp$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	535
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	536	\subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	537	In the correctness proof of $\blexer$, we
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	538	did not directly derive the fact that $\blexer$ generates the POSIX value,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	539	but first proved that $\blexer$ generates the same result as $\lexer$.
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	540	We then re-used
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	541	the correctness of $\lexer$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	542	to obtain
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	543	\begin{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	544	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = \Some \; v$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	545	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	546	r\;s = \None$.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	547	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	548	%\begin{center}
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	549	% $(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	550	%\end{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	551	Here we apply this
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	552	modularised technique again
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	553	by first proving that
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	554	$\blexersimp \; r \; s $
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	555	produces the same output as $\blexer \; r\; s$,
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	556	and then piecing it together with
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	557	$\blexer$'s correctness to achieve our main
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	558	theorem:
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	559	\begin{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	560	$(r, s) \rightarrow v \; \; \textit{iff} \;\; \blexersimp \; r \; s = \Some \;v$
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	561	\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	562	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	563	r\;s = \None$
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	564	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	565	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	566	The overall idea for the proof
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	567	of $\blexer \;r \;s = \blexersimp \; r \;s$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	568	is that the transition from $r$ to $\textit{bsimp}\; r$ can be
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	569	broken down into smaller rewrite steps of the form:
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	570	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	571	$r \rightsquigarrow^* \textit{bsimp} \; r$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	572	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	573	where each rewrite step, written $\rightsquigarrow$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	574	is an ``atomic'' simplification that
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	575	is similar to a small-step reduction in operational semantics
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	576	(see Figure~\ref{rrewriteRules} for the rules):
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	577	\begin{figure}[H]
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	578	\begin{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	579	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	580
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	581	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	582
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	583	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	584
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	585
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	586
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	587	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	588
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	589	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	590
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	591	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	592
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	593	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	594
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	595	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow _{bs}\sum rs_2}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	596
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	597	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	598
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	599	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	600
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	601	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	602
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	603	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	604
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	605	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rs_b) }
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	606
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	607	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	608
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	609	\end{mathpar}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	610	\caption{
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	611	The rewrite rules that generate simplified regular expressions
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	612	in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	613	and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	614	lists of bitcoded regular expressions.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	615	Of particular interest is the $LD$ rule, which allows a duplicate regular expression
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	616	to be removed provided that a regular expression
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	617	earlier in the list has the same erasure and hence matches the same strings.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	618	}\label{rrewriteRules}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	619	\end{figure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	620	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	621	The rules $LT$ and $LH$ are for rewriting two regular expression lists
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	622	such that one regular expression
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	623	in the left-hand-side list is rewritable in one step
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	624	to the right-hand-side's regular expression at the same position.
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	625	This helps with defining the ``context rule'' $AL$.
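As a small illustration of how these rules compose, consider the alternative
$_{bs}\sum [\ZERO, \, _{bs'}\ONE]$. The derivation below is only a sketch: it assumes
that $\fuse$ prepends its bit-sequence to the topmost annotation, so that
$\fuse \; bs \; (_{bs'}\ONE) = \, _{bs @ bs'}\ONE$; the names $bs$ and $bs'$ are chosen
purely for this example.
\begin{center}
\begin{tabular}{lcll}
$[\ZERO, \, _{bs'}\ONE]$ & $\stackrel{s}{\rightsquigarrow}$ & $[_{bs'}\ONE]$ & by $L\ZERO$\\
$_{bs}\sum [\ZERO, \, _{bs'}\ONE]$ & $\rightsquigarrow$ & $_{bs}\sum [_{bs'}\ONE]$ & by $AL$\\
$_{bs}\sum [_{bs'}\ONE]$ & $\rightsquigarrow$ & $\fuse \; bs \; (_{bs'}\ONE) \; = \; _{bs @ bs'}\ONE$ & by $A1$
\end{tabular}
\end{center}
The first line is a step on lists, which the context rule $AL$ lifts to a step on the
enclosing alternative; the last line then collapses the singleton alternative.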
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	626
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	627	The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	628	are defined in the usual way:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	629	\begin{figure}[H]
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	630	\centering
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	631	\begin{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	632	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	633
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	634	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	635
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	636	\inferrule{r_1 \rightsquigarrow^* r_2 \land \; r_2 \rightsquigarrow^* r_3}{r_1 \rightsquigarrow^* r_3\\}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	637
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	638	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{s*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	639	\end{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	640	\caption{The Reflexive Transitive Closure of
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	641	$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	642	\end{figure}
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	643	%Two rewritable terms will remain rewritable to each other
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	644	%even after a derivative is taken:
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	645	The main point of our rewriting relation
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	646	is that it is preserved under derivatives,
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	647	namely
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	648	\begin{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	649	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	650	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	651	Moreover, if one term can be rewritten to another in finitely many steps,
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	652	then both produce the same bitcodes (provided they are nullable):
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	653	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	654	$\textit{if} \;\; r \rightsquigarrow^* r' \;\; \textit{then} \; \; \bmkeps \; r = \bmkeps \; r'$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	655	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	656	The decoding phase of both $\blexer$ and $\blexersimp$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	657	is the same, which means that if they receive the same
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	658	bitcodes before the decoding phase,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	659	they generate the same value after decoding is done.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	660	We will prove the three properties
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	661	we mentioned above in the next sub-section.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	662	\subsection{Important Properties of $\rightsquigarrow$}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	663	First we prove some basic facts
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	664	about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	665	$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	666	which will be needed later.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	667	The inference rules (Figure~\ref{rrewriteRules}) we
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	668	gave in the previous section
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	669	have ``many-step'' versions:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	670
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	671	\begin{lemma}\label{squig1}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	672	\hspace{0em}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	673	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	674	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	675	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{}{\rightsquigarrow} _{bs} \sum rs_2$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	676	\item
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	677	$r \rightsquigarrow^* r' \implies _{bs} \sum (r :: rs)\; \rightsquigarrow^*\; _{bs} \sum (r' :: rs)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	678
826af400b068 more chap4 Chengsong parents: 585 diff changeset	679	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	680	The rewriting in many steps property is composable
826af400b068 more chap4 Chengsong parents: 585 diff changeset	681	in terms of the sequence constructor:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	682	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	683	\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	684	_{bs} r_2 \cdot r_3 \quad $
826af400b068 more chap4 Chengsong parents: 585 diff changeset	685	and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	686	$\quad r_3 \rightsquigarrow^* r_4
826af400b068 more chap4 Chengsong parents: 585 diff changeset	687	\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* _{bs} \; r_1 \cdot r_4$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	688	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	689	The many-step rewriting relations
826af400b068 more chap4 Chengsong parents: 585 diff changeset	690	$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	691	are preserved under the function $\fuse$:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	692	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	693	\implies \fuse \; bs \; r_1 \rightsquigarrow^* \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	694	\fuse \; bs \; r_2 \quad $ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	695	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	696	\implies \map \; (\fuse \; bs) \; rs_1
826af400b068 more chap4 Chengsong parents: 585 diff changeset	697	\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	698	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	699	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	700	\begin{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	701	By an induction on
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	702	the inductive cases of $\stackrel{s}{\rightsquigarrow}$ and $\rightsquigarrow^*$ respectively.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	703	The third and fourth points are
826af400b068 more chap4 Chengsong parents: 585 diff changeset	704	by the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow \fuse \; bs \; r_2$ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	705	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3
826af400b068 more chap4 Chengsong parents: 585 diff changeset	706	\implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	707	which can be inductively proven by the inductive cases of $\rightsquigarrow$ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	708	$\stackrel{s}{\rightsquigarrow}$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	709	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	710	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	711	The inference rules of $\stackrel{s}{\rightsquigarrow}$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	712	are defined in terms of the list cons operation. Next
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	713	we establish that the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	714	$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	715	relations are also preserved w.r.t.\ appending and prepending of a list.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	716	In addition, we
826af400b068 more chap4 Chengsong parents: 585 diff changeset	717	also prove some relations
826af400b068 more chap4 Chengsong parents: 585 diff changeset	718	between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	719	\begin{lemma}\label{ssgqTossgs}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	720	\hspace{0em}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	721	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	722	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	723	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	724
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	725	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	726	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	727	rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	728	\textit{and} \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	729	rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	730
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	731	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	732	The $\stackrel{s}{\rightsquigarrow} $ relation after appending
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	733	a list becomes $\stackrel{s*}{\rightsquigarrow}$:\\
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	734	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	735	\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	736	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	737
826af400b068 more chap4 Chengsong parents: 585 diff changeset	738	$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	739	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	740
826af400b068 more chap4 Chengsong parents: 585 diff changeset	741	$rs_3 \stackrel{s}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	742	r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	743	\item
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	744	If we can rewrite a regular expression
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	745	in many steps to $\ZERO$, then
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	746	we can also rewrite any sequence containing it to $\ZERO$:\\
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	747	$r_1 \rightsquigarrow^* \ZERO
826af400b068 more chap4 Chengsong parents: 585 diff changeset	748	\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	749	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	750	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	751	\begin{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	752	The first part is by induction on the list $rs$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	753	The second part is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	754	The third part is
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	755	by rule induction of $\stackrel{s}{\rightsquigarrow}$.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	756	The fourth sub-lemma is
826af400b068 more chap4 Chengsong parents: 585 diff changeset	757	by rule induction of
826af400b068 more chap4 Chengsong parents: 585 diff changeset	758	$\stackrel{s*}{\rightsquigarrow}$ and using parts one to three.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	759	The fifth part is a corollary of part four.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	760	The last part is proven by rule induction again on $\rightsquigarrow^*$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	761	\end{proof}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	762	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	763	Now we are ready to give the proofs of the following properties:
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	764	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	765	\item
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	766	$r \rightsquigarrow^* r'\land \bnullable \; r
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	767	\implies \bmkeps \; r = \bmkeps \; r'$. \\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	768	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	769	$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	770	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	771	$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	772	\end{itemize}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	773
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	774	\subsubsection{Property 1: $r \rightsquigarrow^* r'\land \bnullable \; r
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	775	\implies \bmkeps \; r = \bmkeps \; r'$}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	776	Intuitively, this property says we can
826af400b068 more chap4 Chengsong parents: 585 diff changeset	777	extract the same bitcodes using $\bmkeps$ from the nullable
826af400b068 more chap4 Chengsong parents: 585 diff changeset	778	components of two regular expressions $r$ and $r'$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	779	if we can rewrite from one to the other in finitely
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	780	many steps.
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	781
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	782	For convenience,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	783	we define a predicate for a list of regular expressions
826af400b068 more chap4 Chengsong parents: 585 diff changeset	784	having at least one nullable regular expression:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	785	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	786	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	787	\end{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	788	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	789	The rewriting relation $\rightsquigarrow$ preserves (b)nullability:
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	790	\begin{lemma}\label{rewritesBnullable}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	791	\hspace{0em}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	792	\begin{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	793	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	794	$\text{If} \; r_1 \rightsquigarrow r_2, \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	795	\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	796	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	797	$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	798	\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	799	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	800	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	801	\implies \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	802	\end{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	803	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	804	\begin{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	805	By rule induction of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	806	The third point is a corollary of the first.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	807	\end{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	808	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	809	For convenience again,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	810	we define $\bmkepss$ on a list $rs$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	811	which extracts the bitcodes of the first $\bnullable$ element in $rs$:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	812	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	813	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	814	$\bmkepss \; [] $ & $\dn$ & $[]$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	815	$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	816	\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	817	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	818	\end{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	819	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	820	If both regular expressions in a rewriting relation are nullable, then they
826af400b068 more chap4 Chengsong parents: 585 diff changeset	821	produce the same bitcodes:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	822	\begin{lemma}\label{rewriteBmkepsAux}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	823	\hspace{0em}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	824	\begin{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	825	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	826	$r_1 \rightsquigarrow r_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	827	(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
826af400b068 more chap4 Chengsong parents: 585 diff changeset	828	\bmkeps \; r_2)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	829	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	831	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	832	\implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	833	\bmkepss \; rs_1 = \bmkepss \; rs_2)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	834	\end{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	835	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	836	\begin{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	837	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	838	\end{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	839	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	840	With Lemma~\ref{rewriteBmkepsAux} in place we are ready to prove its
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	841	many-step version:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	842	\begin{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	843	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	844	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	845	\begin{proof}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	846	By rule induction on $\stackrel{*}{\rightsquigarrow} $. Lemma
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	847	\ref{rewritesBnullable} gives us that both $r$ and $r'$ are nullable.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	848	Lemma \ref{rewriteBmkepsAux} then solves the inductive case.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	849	\end{proof}
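\noindent
To see the lemma at work on a concrete instance, consider the rewrite
$_{bs}\sum [\ZERO, \, _{bs'}\ONE] \rightsquigarrow^* \, _{bs @ bs'}\ONE$,
which is obtained with the rules $L\ZERO$, $AL$ and $A1$.
The calculation below is only a sketch under stated assumptions: it assumes that
$\fuse \; bs \; (_{bs'}\ONE) = \, _{bs @ bs'}\ONE$, and it assumes the usual clauses
$\bmkeps \; (_{bs}\ONE) = bs$ and
$\bmkeps \; (_{bs}\sum rs) = bs \, @ \, \bmkepss \; rs$, which are not restated in this section.
\begin{center}
\begin{tabular}{lcl}
$\bmkeps \; (_{bs}\sum [\ZERO, \, _{bs'}\ONE])$ & $=$ & $bs \, @ \, \bmkepss \; [\ZERO, \, _{bs'}\ONE]$\\
& $=$ & $bs \, @ \, \bmkepss \; [_{bs'}\ONE]$\\
& $=$ & $bs \, @ \, \bmkeps \; (_{bs'}\ONE)$\\
& $=$ & $bs \, @ \, bs'$\\
$\bmkeps \; (_{bs @ bs'}\ONE)$ & $=$ & $bs \, @ \, bs'$
\end{tabular}
\end{center}
The second step uses that $\ZERO$ is not nullable, so $\bmkepss$ skips it.
Both sides yield the same bitcodes, as the lemma predicts.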
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	850
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	851	\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$}
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	852	Now we get to the key part of the proof,
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	853	which says that our simplification's helper functions
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	854	such as $\distinctBy$ and $\flts$ describe
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	855	reducts of $\stackrel{s*}{\rightsquigarrow}$ and
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	856	$\rightsquigarrow^* $.
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	857
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	858	The first lemma to prove is a more general version of
826af400b068 more chap4 Chengsong parents: 585 diff changeset	859	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	860	\begin{lemma}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	861	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	862	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	863	\end{lemma}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	864	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	865	It says that for a list made of two parts $rs_1 @ rs_2$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	866	one can throw away the duplicate
826af400b068 more chap4 Chengsong parents: 585 diff changeset	867	elements in $rs_2$, as well as those that have appeared in $rs_1$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	868	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	869	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	870	\end{proof}
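\noindent
As a small illustration (the concrete lists are chosen here purely for exposition),
take $rs_1 = [_{Z}c]$ and $rs_2 = [_{S}c,\; _{Z}d,\; _{S}d]$.
The accumulator starts out as the erased elements of $rs_1$, so $_{S}c$ is dropped
because its erased form $c$ has already been seen; $_{Z}d$ is kept, and $_{S}d$ is
dropped because its erased form $d$ has by then been added to the accumulator:
\begin{center}
$[_{Z}c,\; _{S}c,\; _{Z}d,\; _{S}d] \stackrel{s*}{\rightsquigarrow} [_{Z}c,\; _{Z}d]$
\end{center}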
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	871	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	872	Setting $rs_1$ to be the empty list (and renaming $rs_2$ to $rs_1$),
826af400b068 more chap4 Chengsong parents: 585 diff changeset	873	we get the corollary
826af400b068 more chap4 Chengsong parents: 585 diff changeset	874	\begin{corollary}\label{dBPreserves}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	875	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	876	\end{corollary}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	877	\noindent
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	878	Similarly the flatten function $\flts$ describes a reduct of
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	879	$\stackrel{s*}{\rightsquigarrow}$ as well:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	880
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	881	\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	882	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	883	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	884	\begin{proof}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	885	By an induction on $rs$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	886	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	887	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	888	The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	889	\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	890	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	891	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	892	\noindent
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	893	The simplification function
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	894	$\textit{bsimp}$ only transforms the regular expression using steps specified by
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	895	$\rightsquigarrow^*$ and nothing else:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	896	\begin{lemma}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	897	$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	898	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	899	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	900	By an induction on $r$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	901	The most involved case is the alternative,
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	902	where we use lemmas \ref{bsimpaltsPreserves},
826af400b068 more chap4 Chengsong parents: 585 diff changeset	903	\ref{fltsPreserves} and \ref{dBPreserves} to carry out a series of rewriting steps:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	904	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	905	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	906	$rs$ & $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	907	& $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	908	& $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	909	(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	910	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	911	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	912	Using this we can derive the following rewrite sequence:\\
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	913	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	914	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	915	$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	916	& $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	917	& $\rightsquigarrow^*$ & $\ldots$ \\ [1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	918	& $\rightsquigarrow^*$ & $\bsimpalts \; bs \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	919	(\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs))
826af400b068 more chap4 Chengsong parents: 585 diff changeset	920	\; \rerases \; \phi)$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	921	%& $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	922	%(\flts \; (\map \; \textit{bsimp}\; rs)) \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	923	%\rerases \; \;\phi) $\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	924	& $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	925	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	926	\end{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	927	\end{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	928	\subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	929	The rewrite relation
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	930	$\rightsquigarrow$ changes into $\stackrel{*}{\rightsquigarrow}$
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	931	after derivatives are taken on both sides:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	932	\begin{lemma}\label{rewriteBder}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	933	\hspace{0em}
80e1114d6421 data Chengsong parents: 586 diff changeset	934	\begin{itemize}
80e1114d6421 data Chengsong parents: 586 diff changeset	935	\item
80e1114d6421 data Chengsong parents: 586 diff changeset	936	If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c
80e1114d6421 data Chengsong parents: 586 diff changeset	937	\rightsquigarrow^* r_2 \backslash c$
80e1114d6421 data Chengsong parents: 586 diff changeset	938	\item
80e1114d6421 data Chengsong parents: 586 diff changeset	939	If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $
80e1114d6421 data Chengsong parents: 586 diff changeset	940	\map \; (\_\backslash c) \; rs_1
80e1114d6421 data Chengsong parents: 586 diff changeset	941	\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
80e1114d6421 data Chengsong parents: 586 diff changeset	942	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	943	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	944	\begin{proof}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	945	By induction on $\rightsquigarrow$
80e1114d6421 data Chengsong parents: 586 diff changeset	946	and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
80e1114d6421 data Chengsong parents: 586 diff changeset	947	\end{proof}
80e1114d6421 data Chengsong parents: 586 diff changeset	948	\noindent
80e1114d6421 data Chengsong parents: 586 diff changeset	949	Now we can prove property 3, as an immediate corollary:
80e1114d6421 data Chengsong parents: 586 diff changeset	950	\begin{corollary}\label{rewritesBder}
80e1114d6421 data Chengsong parents: 586 diff changeset	951	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*
80e1114d6421 data Chengsong parents: 586 diff changeset	952	r_2 \backslash c$
80e1114d6421 data Chengsong parents: 586 diff changeset	953	\end{corollary}
80e1114d6421 data Chengsong parents: 586 diff changeset	954	\begin{proof}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	955	By rule induction on $\stackrel{*}{\rightsquigarrow}$ and lemma \ref{rewriteBder}.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	956	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	957	\noindent
588 80e1114d6421 data Chengsong parents: 586 diff changeset	958	This can be extended and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	959	to obtain the correspondence between
588 80e1114d6421 data Chengsong parents: 586 diff changeset	960	$\blexer$ and $\blexersimp$'s intermediate
80e1114d6421 data Chengsong parents: 586 diff changeset	961	derivative regular expressions:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	962	\begin{lemma}\label{bderBderssimp}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	963	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	964	\end{lemma}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	965	\begin{proof}
80e1114d6421 data Chengsong parents: 586 diff changeset	966	By an induction on $s$.
80e1114d6421 data Chengsong parents: 586 diff changeset	967	\end{proof}
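\noindent
The inductive step can be sketched as follows, under the assumption that
$\bderssimp{a}{(c::s)}$ unfolds to $\bderssimp{(\textit{bsimp} \; (a \backslash c))}{s}$
and that the induction hypothesis is stated for an arbitrary $a$:
\begin{center}
\begin{tabular}{lcl}
$a \backslash (c::s)$ & $=$ & $(a \backslash c) \backslash s$\\
& $\rightsquigarrow^*$ & $(\textit{bsimp} \; (a \backslash c)) \backslash s$\\
& $\rightsquigarrow^*$ & $\bderssimp{(\textit{bsimp} \; (a \backslash c))}{s}$\\
& $=$ & $\bderssimp{a}{(c::s)}$
\end{tabular}
\end{center}
\noindent
The second line uses $r \rightsquigarrow^* \textit{bsimp} \; r$ together with
corollary \ref{rewritesBder}, applied once for each character of $s$;
the third line is the induction hypothesis.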
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	968	\subsection{Main Theorem}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	969	Now with lemma \ref{bderBderssimp} in place we are ready for the main theorem.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	970	\begin{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	971	$\blexer \; r \; s = \blexersimp \; r \; s$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	972	\end{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	973	\noindent
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	974	\begin{proof}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	975	By lemma \ref{bderBderssimp}, the derivative regular expression
588 80e1114d6421 data Chengsong parents: 586 diff changeset	976	computed by the original lexer rewrites in many steps to the one
80e1114d6421 data Chengsong parents: 586 diff changeset	977	computed by the lexer with simplification applied:
80e1114d6421 data Chengsong parents: 586 diff changeset	978	\begin{center}
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	979	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $.
588 80e1114d6421 data Chengsong parents: 586 diff changeset	980	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	981	Since $\bmkeps$ is preserved by $\rightsquigarrow^*$ on nullable expressions, both generate the same bits if the lexing result is a match:
588 80e1114d6421 data Chengsong parents: 586 diff changeset	982	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	983	$\bnullable \; (a \backslash s)
80e1114d6421 data Chengsong parents: 586 diff changeset	984	\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
80e1114d6421 data Chengsong parents: 586 diff changeset	985	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	986	Since they generate the same bits, they give the same value after decoding:
588 80e1114d6421 data Chengsong parents: 586 diff changeset	987	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	988	$\bnullable \; (a \backslash s)
80e1114d6421 data Chengsong parents: 586 diff changeset	989	\implies \decode \; r \; (\bmkeps \; (a \backslash s)) =
80e1114d6421 data Chengsong parents: 586 diff changeset	990	\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
80e1114d6421 data Chengsong parents: 586 diff changeset	991	\end{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	992	This is what is required by our proof goal:
588 80e1114d6421 data Chengsong parents: 586 diff changeset	993	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	994	$\blexer \; r \; s = \blexersimp \; r \; s$.
80e1114d6421 data Chengsong parents: 586 diff changeset	995	\end{center}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	996	\end{proof}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	997	\noindent
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	998	As a corollary,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	999	we can link this result with the lemma we proved earlier that
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	1000	\begin{center}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1001	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = \Some \;v$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1002	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1003	r\;s = \None$.
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	1004	\end{center}
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1005	and obtain the property that the bit-coded lexer with simplification
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1006	indeed correctly generates a POSIX lexing result, if such a result exists.
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	1007	\begin{corollary}
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1008	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = \Some \; v$\\
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1009	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1010	r\;s = \None$.
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	1011	\end{corollary}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	1012
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	1013	\subsection{Comments on the Proof Techniques Used}
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1014	Straightforward and simple as the proof may seem,
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1015	the efforts we spent obtaining it were far from trivial.
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1016	We initially attempted to re-use the argument
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1017	in \cref{flex_retrieve}.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1018	The problem is that both functions $\inj$ and $\retrieve$ require
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1019	that the annotated regular expressions stay unsimplified,
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1020	so that one can
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1021	correctly compare $v_{i+1}$, $r_i$ and $v_i$
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1022	in diagram \ref{graph:inj}.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	1023
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1024	We also tried to prove
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1025	\begin{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1026	$\textit{bsimp} \;\; (\bderssimp{a}{s}) =
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1027	\textit{bsimp} \;\; (a\backslash s)$,
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1028	\end{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1029	but this turns out not to be true.
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1030	A counterexample is
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1031	\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\;
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1032	\text{and} \;\; s = bb.
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1033	\]
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1034	\noindent
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1035	Then we would have
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1036	\begin{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1037	$\textit{bsimp}\;\; ( a \backslash s )$ =
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1038	$_{[]}(_{ZZ}\ONE + _{ZS}c ) $
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1039	\end{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1040	\noindent
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1041	whereas
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1042	\begin{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1043	$\textit{bsimp} \;\;( \bderssimp{a}{s} )$ =
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1044	$_{Z}(_{Z} \ONE + _{S} c)$.
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1045	\end{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1046	Unfortunately,
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1047	whenever we apply $\textit{bsimp}$ at different stages of the derivative computation,
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1048	we will have this kind of discrepancy.
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1049	This is due to
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1050	the $\map \; (\fuse\; bs) \; as$ operation
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1051	happening at different locations in the regular expression.
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1052
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1053	The rewriting relation
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1054	$\rightsquigarrow^*$
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1055	allows us to ignore this discrepancy
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1056	and view the expressions
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1057	\begin{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1058	$_{[]}(_{ZZ}\ONE + _{ZS}c ) $\\
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1059	and\\
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1060	$_{Z}(_{Z} \ONE + _{S} c)$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	1061
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1062	\end{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	1063	as equal, because they were both re-written
639 80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1064	from the same expression.
80cc6dc4c98b until chap 7 Chengsong parents: 624 diff changeset	1065
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1066	The simplification rewriting rules
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1067	given in \ref{rrewriteRules} are by no means
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1068	final;
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1069	one could come up with new rules
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1070	such as
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1071	$\SEQ r_1 \cdot (\SEQ r_2 \cdot r_3) \rightarrow
fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1072	\SEQs [r_1, r_2, r_3]$.
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1073	Such a rule does not fit with the proof technique
600 fd068f39ac23 chap4 comments done Chengsong parents: 591 diff changeset	1074	of our main theorem, although it does not seem to violate the POSIX
624 8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1075	property.
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1076
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1077	Having established the correctness of our
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1078	$\blexersimp$, in the next chapter we shall prove that with our $\simp$ function,
8ffa28fce271 all comments incorporated!!+related work Chengsong parents: 601 diff changeset	1079	for a given $r$, the derivative size is always
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	1080	finitely bounded by a constant.

author	Chengsong
	Fri, 30 Dec 2022 17:37:51 +0000
changeset 639	80cc6dc4c98b
parent 624	8ffa28fce271
child 640	bd1354127574
permissions	-rwxr-xr-x