% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.



In this chapter we introduce simplifications
on annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already proposed some bit-coded simplifications,
but their simplification functions were inefficient.
We contrast our simplification function
with Sulzmann and Lu's, highlighting the simplicity of our algorithm.
This makes another case for the usefulness
and reliability of formal proofs on algorithms.
These ``aggressive'' simplifications would not be possible in the injection-based
lexing we introduced in Chapter~\ref{Inj}.
We then prove the correctness of the improved version of
$\blexer$, called $\blexersimp$, by establishing
$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.

\section{Simplifications by Sulzmann and Lu}
Consider the derivatives of examples such as $(a^*a^*)^*$
and $(a^* + (aa)^*)^*$:
\begin{center}
$(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} $\\
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} \ldots$
\end{center}
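\noindent
The first two derivatives in this chain can be checked mechanically.
Below is a minimal Python sketch, not taken from the thesis' own Scala or
Isabelle code. It assumes unannotated regular expressions encoded as tuples
with binary alternatives, and a smart constructor \texttt{seq} that applies
only the rules $\ZERO\cdot r \rightarrow \ZERO$, $r\cdot\ZERO \rightarrow \ZERO$
and $\ONE\cdot r \rightarrow r$. All function names are illustrative, and the
size measure (one unit per constructor) is an assumption of the sketch.
\begin{verbatim}
```python
# Unannotated regular expressions as tuples (an illustrative encoding):
# ("ZERO",), ("ONE",), ("CHAR", c), ("ALT", r1, r2), ("SEQ", r1, r2), ("STAR", r)
ZERO, ONE = ("ZERO",), ("ONE",)

def char(c):
    return ("CHAR", c)

def alt(r1, r2):
    return ("ALT", r1, r2)

def star(r):
    return ("STAR", r)

def seq(r1, r2):
    # Smart constructor: only 0.r -> 0, r.0 -> 0 and 1.r -> r are applied.
    if r1 == ZERO or r2 == ZERO:
        return ZERO
    if r1 == ONE:
        return r2
    return ("SEQ", r1, r2)

def nullable(r):
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALT":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # SEQ

def der(c, r):
    """Brzozowski derivative of r with respect to the character c."""
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return ZERO
    if tag == "CHAR":
        return ONE if r[1] == c else ZERO
    if tag == "ALT":
        return alt(der(c, r[1]), der(c, r[2]))
    if tag == "STAR":
        return seq(der(c, r[1]), r)
    if nullable(r[1]):  # SEQ whose first component is nullable
        return alt(seq(der(c, r[1]), r[2]), der(c, r[2]))
    return seq(der(c, r[1]), r[2])

def size(r):
    # Count constructors; the character payload of CHAR is not a sub-expression.
    return 1 + sum(size(x) for x in r[1:] if isinstance(x, tuple))

A = star(char("a"))   # a*
r0 = star(seq(A, A))  # (a*a*)*
r1 = der("a", r0)     # (a*a* + a*) . (a*a*)*
r2 = der("a", r1)     # ((a*a* + a*) + a*) . (a*a*)* + (a*a* + a*) . (a*a*)*
print([size(r) for r in (r0, r1, r2)])  # [6, 15, 34]
```
\end{verbatim}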
\noindent
As can be seen, there is a lot of duplication
in this example, which we have already mentioned in
(\ref{eqn:growth2}).
A simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivatives,
namely
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
because the duplicates are
not next to each other, and therefore the rule
$r+ r \rightarrow r$ does not fire.
One would expect a better simplification function to work in the
following way:
\begin{gather*}
((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}\\
\bigg\downarrow \\
(a^*a^* + a^*
\color{gray} + a^* \color{black})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
\bigg\downarrow \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
\bigg\downarrow \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\end{gather*}
\noindent
In the first step, the nested alternative regular expression
$(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
Now the third term $a^*$ is clearly identified as a duplicate
and is therefore removed in the second step. This makes the two
top-level terms identical, so the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$
is removed in the final step.\\
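\noindent
The three steps can be sketched in Python as follows. This is an
illustration under stated assumptions, not the simplification function
formalised in this thesis: regular expressions are unannotated tuples with
$n$-ary alternatives, and the hypothetical \texttt{simp} works bottom-up,
splices nested alternatives into their parent (\texttt{flatten}), drops
$\ZERO$s, removes syntactic duplicates (\texttt{dedup}) and does not simplify
underneath stars. Applied to the third derivative above, it returns
$(a^*a^* + a^*)\cdot(a^*a^*)^*$.
\begin{verbatim}
```python
# Unannotated regular expressions with n-ary alternatives (illustrative encoding):
# ("ZERO",), ("ONE",), ("CHAR", c), ("SEQ", r1, r2), ("ALTS", (r, ...)), ("STAR", r)
ZERO, ONE = ("ZERO",), ("ONE",)

def char(c):
    return ("CHAR", c)

def star(r):
    return ("STAR", r)

def seq(r1, r2):
    return ("SEQ", r1, r2)

def alts(*rs):
    return ("ALTS", tuple(rs))

def flatten(rs):
    """Splice nested alternatives into the list and drop ZEROs.

    The children are already simplified, hence already flat, so one level suffices.
    """
    out = []
    for r in rs:
        if r[0] == "ALTS":
            out.extend(r[1])
        elif r != ZERO:
            out.append(r)
    return out

def dedup(rs):
    """Keep only the first occurrence of each regular expression."""
    seen, out = set(), []
    for r in rs:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def simp(r):
    if r[0] == "ALTS":
        rs = dedup(flatten([simp(x) for x in r[1]]))
        if not rs:
            return ZERO
        if len(rs) == 1:
            return rs[0]
        return ("ALTS", tuple(rs))
    if r[0] == "SEQ":
        r1, r2 = simp(r[1]), simp(r[2])
        if r1 == ZERO or r2 == ZERO:
            return ZERO
        if r1 == ONE:
            return r2
        return seq(r1, r2)
    return r  # ZERO, ONE, CHAR and STAR are left as they are in this sketch

A = star(char("a"))
AA = seq(A, A)
S = star(AA)
third = alts(seq(alts(alts(AA, A), A), S), seq(alts(AA, A), S))
print(simp(third) == seq(alts(AA, A), S))  # True
```
\end{verbatim}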
This motivating example comes from testing Sulzmann and Lu's
algorithm: their simplification does
not work!
Consider their simplification function (in our notation):
\begin{center}
\begin{tabular}{lcl}
$\simpsulz \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
$\simpsulz \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
\textit{then} \;\; \ZERO$\\
& & $\textit{else}\;\;_{bs}((\simpsulz \;r_1)\cdot
(\simpsulz \; r_2))$\\
$\simpsulz \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
$\simpsulz \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
$\simpsulz \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simpsulz \; r)$\\
$\simpsulz \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
(\nub \; (\filter \; (\neg \circ \zeroable)\;((\simpsulz \; r) :: \map \; \simpsulz \; rs)))$\\
$\simpsulz \; r$ & $\dn$ & $r$ \quad(otherwise)
\end{tabular}
\end{center}
\noindent
where the $\textit{zeroable}$ predicate,
which tests whether a regular expression
is equivalent to $\ZERO$,
can be defined as:
\begin{center}
\begin{tabular}{lcl}
$\zeroable \; _{bs}\sum []$ & $\dn$ & $\textit{true}$\\
$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
\zeroable \;_{[]}\sum\;rs $\\
$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
\end{tabular}
\end{center}
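\noindent
To experiment with these definitions, they can be transcribed into Python
roughly as below. The encoding is an assumption: annotated regular
expressions are tuples whose second component is the bitcode, written as a
string over \texttt{Z} and \texttt{S}; the helper \texttt{fuse} prepends bits
to the outermost annotation; clauses are tried in the order of the table; and
any case the table does not cover is returned unchanged. The two checks at the
end illustrate the weaknesses discussed below: $_{SZZ}a^* + _{SZS}a^*$ is left
untouched because the two copies of $a^*$ carry different bits, and a nested
alternative that is not at the head of the list is never flattened.
\begin{verbatim}
```python
# Annotated regular expressions (illustrative encoding), bits as strings over "Z"/"S":
# ("ZERO",), ("ONE", bs), ("CHAR", bs, c), ("SEQ", bs, r1, r2),
# ("ALTS", bs, (r, ...)), ("STAR", bs, r)
ZERO = ("ZERO",)

def fuse(bs, r):
    """Prepend the bits bs to the outermost annotation of r."""
    if r == ZERO:
        return ZERO
    return (r[0], bs + r[1]) + r[2:]

def zeroable(r):
    tag = r[0]
    if tag == "ZERO":
        return True
    if tag == "ALTS":
        return all(zeroable(x) for x in r[2])  # true for the empty alternative
    if tag == "SEQ":
        return zeroable(r[2]) or zeroable(r[3])
    return False  # ONE, CHAR, STAR

def nub(rs):
    """Remove later copies of elements that are identical *including* their bits."""
    out = []
    for r in rs:
        if r not in out:
            out.append(r)
    return out

def simp_sulz(r):
    tag = r[0]
    if tag == "SEQ":
        bs, r1, r2 = r[1], r[2], r[3]
        if r1[0] == "ONE":
            return ZERO if zeroable(r2) else fuse(bs + r1[1], r2)
        if zeroable(r1) or zeroable(r2):
            return ZERO
        return ("SEQ", bs, simp_sulz(r1), simp_sulz(r2))
    if tag == "ALTS":
        bs, rs = r[1], r[2]
        if not rs:
            return ZERO
        if rs[0][0] == "ALTS":  # only the head alternative is spliced in
            inner = rs[0]
            return ("ALTS", bs, tuple(fuse(inner[1], x) for x in inner[2]) + rs[1:])
        if len(rs) == 1:
            return fuse(bs, simp_sulz(rs[0]))
        simped = [simp_sulz(x) for x in rs]
        return ("ALTS", bs, tuple(nub([x for x in simped if not zeroable(x)])))
    return r  # ZERO, ONE, CHAR and STAR are returned unchanged

def char(bs, c):
    return ("CHAR", bs, c)

# (i) Two copies of a* that differ only in their bits are both kept.
r_i = ("ALTS", "", (("STAR", "SZZ", char("", "a")), ("STAR", "SZS", char("", "a"))))
print(simp_sulz(r_i) == r_i)  # True
# (ii) A nested alternative that is not the head of the list is not flattened.
r_ii = ("ALTS", "", (char("", "a"), ("ALTS", "", (char("", "b"), char("", "a")))))
print(simp_sulz(r_ii) == r_ii)  # True
```
\end{verbatim}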
\noindent
They suggested that the $\simpsulz $ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
$\textit{sulzSimp} \; r$ & $\dn$ &
$\textit{while}\;((\simpsulz \; r)\; \neq \; r)$ \\
& & $\quad r := \simpsulz \; r$\\
& & $\textit{return} \; r$
\end{tabular}
\end{center}
We call the operation of alternately
applying derivatives and simplifications
(until the string is exhausted) the Sulz-simp-derivative,
written $\backslash_{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{sulzSimp} (c\!::\!s) $ & $\dn$ & $(\textit{sulzSimp} \; (r \backslash c)) \backslash_{sulzSimp}\, s$ \\
$r \backslash_{sulzSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
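\noindent
The control structure of $\textit{sulzSimp}$ and $\backslash_{sulzSimp}$ does
not depend on the concrete regular expressions, so it can be sketched
generically in Python, with the derivative and the one-step simplification
passed in as parameters. The names \texttt{sulz\_simp} and
\texttt{der\_sulz\_simp} are hypothetical, and the usage lines plug in toy
string functions rather than regular expressions, purely to exercise the
loop. As with the definition above, the loop does not terminate if the
one-step simplification never reaches a fixpoint.
\begin{verbatim}
```python
from typing import Callable, Iterable, TypeVar

R = TypeVar("R")

def sulz_simp(simp: Callable[[R], R], r: R) -> R:
    """Apply simp repeatedly until r no longer changes (the while-loop above)."""
    nxt = simp(r)
    while nxt != r:
        r, nxt = nxt, simp(nxt)
    return r

def der_sulz_simp(der: Callable[[str, R], R], simp: Callable[[R], R],
                  r: R, s: Iterable[str]) -> R:
    """For each character of s: take the derivative, then simplify to a fixpoint."""
    for c in s:
        r = sulz_simp(simp, der(c, r))
    return r

# Toy stand-ins (hypothetical, not regular expressions).
def toy_der(c: str, r: str) -> str:
    return r + c  # toy "derivative": remember the character

def toy_simp(r: str) -> str:
    return r.replace("aa", "a", 1)  # toy one-step "simplification"

print(der_sulz_simp(toy_der, toy_simp, "", "aaab"))  # ab
```
\end{verbatim}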
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as in $\blexer$:
\begin{center}
\begin{tabular}{lcl}
$\textit{blexer\_sulzSimp}\;r\,s$ & $\dn$ &
$\textit{let}\;a = (r^\uparrow)\backslash_{sulzSimp}\, s\;\textit{in}$\\
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
& & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala
and found that the size of the final derivative regular expression
grows exponentially:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={size},
ymode = log,
legend entries={Final Derivative Size},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture}
\caption{Size of the final derivative when lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n= 20$ we already get an out-of-memory error with the default
JVM heap size settings.
In fact, their simplification does not improve over
the simple-minded simplifications we have shown in Figure~\ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={time},
ymode = log,
legend entries={time in secs},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture}
\caption{Time needed for lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
which appears to be a counterexample to
their linear-time complexity claim:
\begin{quote}\it
Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input.
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
stays below a finite constant does not hold.
The main reasons for this are that (i) the $\textit{nub}$
function requires two annotated regular expressions
to have identical annotations in order to qualify as duplicates,
and therefore cannot simplify cases like $_{SZZ}a^* + _{SZS}a^*$,
even though both $a^*$ denote the same language;
(ii) the ``flattening'' only applies to the head of the list
in the
\begin{center}
\begin{tabular}{lcl}

$\simpsulz \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
\end{tabular}
\end{center}
\noindent
clause, and is therefore not thorough enough to simplify all
the relevant parts of the regular expression.\\
Moreover, even if the size of the regular expressions
does stay finite, one has to take into account that
the $\simpsulz$ function is applied many times
in each derivative step, and that this number is not necessarily
a constant with respect to the size of the regular expression.
To avoid being ``caught off guard'' by
such counterexamples,
one needs to be more careful when designing
simplification functions and making claims about them.

\section{Our $\textit{Simp}$ Function}
We now introduce our simplification function
by contrasting it with $\simpsulz$.
For each component of their algorithm, we describe
the idea behind it
and why it fails to achieve the desired effect, followed
by our solution. These solutions come with correctness
statements that are backed up by formal proofs.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
$\simpsulz \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
is that it allows
duplicates to be removed across different
``levels'' of alternatives.
For example, it would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
The problem here is that only the head element
is ``spilled out'',
whereas we want to flatten
the entire list to open up possibilities for further simplifications.
Not flattening the remaining elements also means that
the subsequent de-duplication process
does not remove all duplicates.
For example,
using $\simpsulz$ we could not
simplify
\begin{center}
$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^* +
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
because the underlined part is not the first element
of its enclosing alternative.\\
We define a flattening operation that flattens not only
the first regular expression of an alternative,
but the entire list:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{flts} \; []$ & $\dn$ & $[]$ \\
$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
$\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
$\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
\end{tabular}
\end{center}
\noindent
Our $\flts$ operation
also throws away $\ZERO$s,
as they do not contribute to a lexing result.
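\par\noindent
A minimal Python sketch of $\textit{flts}$ follows. It assumes annotated
regular expressions encoded as tuples whose second component is the bitcode
(a string over \texttt{Z} and \texttt{S}); \texttt{fuse} and \texttt{char}
are illustrative helpers. The loop is an iterative rendering of the recursive
clauses above: every alternative in the list is spliced in, not only the head,
and, as in the definition, the spliced elements are not flattened again.
\begin{verbatim}
```python
# Annotated regular expressions (illustrative encoding), bits as strings over "Z"/"S":
# ("ZERO",), ("ONE", bs), ("CHAR", bs, c), ("SEQ", bs, r1, r2),
# ("ALTS", bs, (r, ...)), ("STAR", bs, r)
ZERO = ("ZERO",)

def fuse(bs, r):
    """Prepend the bits bs to the outermost annotation of r."""
    if r == ZERO:
        return ZERO
    return (r[0], bs + r[1]) + r[2:]

def flts(rs):
    """Splice every alternative in rs into the list, fusing its bits into the
    spliced elements, and drop ZEROs."""
    out = []
    for r in rs:
        if r[0] == "ALTS":
            out.extend(fuse(r[1], x) for x in r[2])
        elif r != ZERO:
            out.append(r)
    return out

def char(bs, c):
    return ("CHAR", bs, c)

rs = [char("", "a"), ("ALTS", "Z", (char("", "b"), char("S", "a"))), ZERO,
      ("ALTS", "S", (char("", "c"),))]
print(flts(rs))
# [('CHAR', '', 'a'), ('CHAR', 'Z', 'b'), ('CHAR', 'ZS', 'a'), ('CHAR', 'S', 'c')]
```
\end{verbatim}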
\subsection{Duplicate Removal}
After flattening is done, we are ready to deduplicate.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's simplification.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow}
rs_{flat}
\xrightarrow{\distinctBy \;
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \;
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
where the $\distinctBy$ function is defined as:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & $\quad \textit{if}\; (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
& & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
\end{tabular}
\end{center}
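For illustration only, here is a small Python sketch of $\distinctBy$. It replaces the recursion with a loop over the list and stores the accumulator $acc$ in a Python set. The name \texttt{distinct\_by} and the test data, which are pairs of a bitcode and a regular expression written as strings, are our own assumptions and not part of the formal development.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def distinct_by(xs: Iterable[T], f: Callable[[T], Hashable],
                acc: frozenset = frozenset()) -> List[T]:
    """Keep x unless f(x) is already in the accumulator; earlier elements win.

    A loop-based rendering of the recursive definition: the accumulator
    grows with the f-image of every element that is kept.
    """
    seen = set(acc)
    kept = []
    for x in xs:  # left to right, like the recursion on x :: xs
        key = f(x)
        if key in seen:
            continue  # "then" branch: drop x, accumulator unchanged
        seen.add(key)  # "else" branch: keep x and record f(x)
        kept.append(x)
    return kept


# Pairs (bitcode, regex): only the regex part is compared, playing the role of rerase.
terms = [("ZSZ", "a*a*"), ("SZZ", "a*a*"), ("Z", "a")]
print(distinct_by(terms, lambda t: t[1]))  # [('ZSZ', 'a*a*'), ('Z', 'a')]
```

The list is scanned from left to right, so the first element of each group with equal $f$-images is kept, as in the \textit{else} branch of the definition.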
\noindent
We define the distinct function relative to a mapping $f$ because
we want to eliminate regular expressions that are syntactically the same
but carry different bitcodes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match
(and therefore is definitely not POSIX)
and would be discarded by $\bmkeps$ later anyway.
\begin{center}
$_{ZSZ}\underbrace{a^*}_{ZS:\; \text{match 1 time}\quad}\underbrace{a^*}_{Z: \;\text{match 1 time}} +
_{SZZ}\underbrace{a^*}_{S: \; \text{match 0 times}\quad}\underbrace{a^*}_{ZZ: \; \text{match 2 times}}
$
\end{center}
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard
are always placed further to the left. When we
traverse the list from left to right,
any regular expression whose erasure we have already seen
cannot contribute to a POSIX value,
even if it is annotated with different bitcodes.
These duplicates can therefore be safely removed.
To achieve this, we use $\rerases$ as the function $f$ in
$\distinctBy$.\\
$\rerases$ is very similar to $\erase$, except that it preserves the structure
of alternative annotated regular expressions:
$\erase$ would turn an $n$-ary alternative back into nested binary alternatives,
whereas $\rerases$ keeps it $n$-ary.
Not having to change this structure
greatly simplifies the finiteness proof in Chapter~\ref{Finite}
(we will give more details there).
We give the definition of $\rerases$ here together with
the new datatype used by $\rerases$ (as our plain
regular expression datatype does not allow non-binary alternatives).
For the moment the reader can simply think of
$\rerases$ as $\erase$ and of $\rrexp$ as plain regular expressions.
\begin{figure}[H]
\begin{center}
$\rrexp ::= \RZERO \mid \RONE
\mid \RCHAR{c}
\mid \RSEQ{r_1}{r_2}
\mid \RALTS{rs}
\mid \RSTAR{r} $
\end{center}
\caption{$\rrexp$: plain regular expressions, but with an $n$-ary $\sum$ alternative
constructor}\label{rrexpDef}
\end{figure}
The notation for $\rerases$ also follows that of $\erase$,
namely a postfix operator written as a subscript,
except that it has an \emph{r} attached to distinguish it from $\erase$:
\begin{center}
\begin{tabular}{lcl}
$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a^*}$ & $\dn$ & $\RSTAR{\rerase{a}}$
\end{tabular}
\end{center}
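The following Python sketch of $\rerases$ assumes a hypothetical tuple encoding. Annotated regular expressions are tuples tagged \texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS} and \texttt{STAR}, with bitcodes as tuples of \texttt{'Z'} and \texttt{'S'}. Values of $\rrexp$ use the same shapes without bits, with tags prefixed by \texttt{R}. The sketch shows two things: annotated regular expressions that differ only in their bitcodes have the same $\rerases$-image, and the children of an alternative are mapped one by one rather than re-nested.

```python
# Hypothetical tuple encoding of annotated regular expressions:
#   ('ZERO',)  ('ONE', bs)  ('CHAR', bs, c)  ('SEQ', bs, a1, a2)
#   ('ALTS', bs, (a, ...))  ('STAR', bs, a)   where bs is a tuple of 'Z'/'S' bits.
# rrexp values use the same shapes without bits and with tags prefixed by 'R'.

def rerase(a):
    """Erase all bitcodes, keeping n-ary alternatives n-ary (as RALTS)."""
    tag = a[0]
    if tag == 'ZERO':
        return ('RZERO',)
    if tag == 'ONE':
        return ('RONE',)
    if tag == 'CHAR':
        return ('RCHAR', a[2])
    if tag == 'SEQ':
        return ('RSEQ', rerase(a[2]), rerase(a[3]))
    if tag == 'ALTS':
        # map rerase over the children instead of nesting binary alternatives
        return ('RALTS', tuple(rerase(x) for x in a[2]))
    if tag == 'STAR':
        return ('RSTAR', rerase(a[2]))
    raise ValueError(f"unknown constructor {tag!r}")


a_star = ('STAR', (), ('CHAR', (), 'a'))
left = ('SEQ', ('Z', 'S', 'Z'), a_star, a_star)
right = ('SEQ', ('S', 'Z', 'Z'), a_star, a_star)
print(rerase(left) == rerase(right))  # True: only the bitcodes differ
```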

\subsection{Putting Things Together}
A recursive definition of our simplification function
is given below:
%that looks somewhat similar to our Scala code is
\begin{center}
\begin{tabular}{@{}lcl@{}}

$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{AALTS} \; \textit{bs} \; (\distinctBy \; ( \flts \; ( \textit{map} \; \textit{bsimp} \; as)) \; \rerases \; \varnothing) $ \\
$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
\end{tabular}
\end{center}

\noindent
The simplification (named $\textit{bsimp}$ for \emph{b}it-coded)
performs pattern matching on the regular expression.
When it detects that the regular expression is an alternative or a
sequence, it first simplifies its children recursively
and then checks whether one of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
For sequences, these current-level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \; \textit{match}$\\
&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
&&$\quad\textit{case} \; (_{bs_1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs @ bs_1) \; a_2'$ \\
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
\end{tabular}
\end{center}
\noindent
The most involved part is the $\sum$ clause, where we first call $\flts$ on
the list of simplified children $\textit{map}\; \textit{bsimp}\; \textit{as}$,
and then call $\distinctBy$ on the resulting list, where two
elements $r_1$ and $r_2$ are considered the same if $\rerase{r_1} = \rerase{r_2}$.
Finally, $\textit{bsimp}_{AALTS}$ checks whether the resulting list $as'$ has become a
singleton or an empty list after $\flts$ and $\distinctBy$.
It keeps the current-level constructor $\sum$ if the list has two or more elements,
and removes it if the list has fewer than two elements:
\begin{center}
\begin{tabular}{lcl}
$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse} \; bs \; a$ \\
&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
\end{tabular}

\end{center}
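The following Python sketch of $\textit{bsimp}$ makes the interplay of $\flts$, $\distinctBy$, $\textit{bsimp}_{ASEQ}$ and $\textit{bsimp}_{AALTS}$ concrete. It assumes a hypothetical tuple encoding of annotated regular expressions: tags \texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS} and \texttt{STAR}, with bitcodes as tuples of \texttt{'Z'}/\texttt{'S'}. The helper \texttt{fuse} prepends bits to the top-level annotation. This is an illustrative sketch, not the formalised definition.

```python
# Hypothetical tuple encoding of annotated regular expressions:
#   ('ZERO',)  ('ONE', bs)  ('CHAR', bs, c)  ('SEQ', bs, a1, a2)
#   ('ALTS', bs, (a, ...))  ('STAR', bs, a)   where bs is a tuple of 'Z'/'S' bits
ZERO = ('ZERO',)


def fuse(bs, a):
    """Prepend the bits bs to the top-level annotation of a (ZERO has none)."""
    return a if a == ZERO else (a[0], bs + a[1]) + a[2:]


def rerase(a):
    """Erase bitcodes but keep n-ary alternatives (the key used by distinctBy)."""
    if a[0] == 'CHAR':
        return ('CHAR', a[2])
    if a[0] == 'SEQ':
        return ('SEQ', rerase(a[2]), rerase(a[3]))
    if a[0] == 'ALTS':
        return ('ALTS', tuple(rerase(x) for x in a[2]))
    if a[0] == 'STAR':
        return ('STAR', rerase(a[2]))
    return (a[0],)  # ZERO and ONE


def flts(rs):
    """Drop ZEROs and splice in the children of nested alternatives, fusing their bits."""
    out = []
    for a in rs:
        if a[0] == 'ALTS':
            out.extend(fuse(a[1], x) for x in a[2])
        elif a != ZERO:
            out.append(a)
    return out


def distinct_by(rs, f):
    """Keep the first element of each group with equal f-images."""
    seen, out = set(), []
    for a in rs:
        if f(a) not in seen:
            seen.add(f(a))
            out.append(a)
    return out


def bsimp_aseq(bs, a1, a2):
    if a1 == ZERO or a2 == ZERO:
        return ZERO
    if a1[0] == 'ONE':
        return fuse(bs + a1[1], a2)
    return ('SEQ', bs, a1, a2)


def bsimp_aalts(bs, rs):
    if not rs:
        return ZERO
    if len(rs) == 1:
        return fuse(bs, rs[0])
    return ('ALTS', bs, tuple(rs))


def bsimp(a):
    if a[0] == 'SEQ':
        return bsimp_aseq(a[1], bsimp(a[2]), bsimp(a[3]))
    if a[0] == 'ALTS':
        return bsimp_aalts(a[1], distinct_by(flts([bsimp(x) for x in a[2]]), rerase))
    return a  # ZERO, ONE, CHAR and STAR are left unchanged
```

For example, applied to $_{ZSZ}a^*a^* + _{SZZ}a^*a^*$ this sketch keeps only the first summand, because both summands have the same $\rerases$-image.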
Having defined the $\bsimp$ function,
we apply it after every derivative step,
so that the intermediate regular expressions stay small:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
\end{tabular}
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications
%with derivatives:%\comment{simp in the [] case?}
We extend this from characters to strings:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}

\noindent
The lexer that computes derivatives simplified by $\textit{bsimp}$
and then extracts the bitcodes from the final result
is called $\blexersimp$:
\begin{center}
\begin{tabular}{lcl}
$\blexersimp\;r\,s$ & $\dn$ &
$\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
& & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}

\noindent
With these simplifications, the intermediate regular expressions computed by $\blexersimp$ stay small.
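The sketch below is again in Python and again only illustrative. It combines a bit-coded derivative with a compact version of $\textit{bsimp}$ and measures the size of the derivatives of $(a^*a^*)^*$. Everything here is an assumption on our part:
\begin{itemize}
\item the tuple encoding (tags \texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS}, \texttt{STAR}; bitcodes as tuples of \texttt{'Z'}/\texttt{'S'});
\item our rendering of the bit-coded derivative \texttt{bder} and of $\textit{bmkeps}$;
\item a size measure that counts constructor nodes but not bits.
\end{itemize}
The function \texttt{blexer\_simp\_bits} stops short of $\textit{decode}$ and returns the bitcode itself.

```python
# Hypothetical encoding: ('ZERO',) ('ONE', bs) ('CHAR', bs, c) ('SEQ', bs, a1, a2)
# ('ALTS', bs, (a, ...)) ('STAR', bs, a), where bs is a tuple of 'Z'/'S' bits.
ZERO = ('ZERO',)

def fuse(bs, a):
    """Prepend bits bs to the top-level annotation (ZERO carries none)."""
    return a if a == ZERO else (a[0], bs + a[1]) + a[2:]

def bnullable(a):
    if a[0] in ('ONE', 'STAR'):
        return True
    if a[0] == 'SEQ':
        return bnullable(a[2]) and bnullable(a[3])
    if a[0] == 'ALTS':
        return any(bnullable(x) for x in a[2])
    return False  # ZERO and CHAR

def bmkeps(a):
    """Bits of the empty-string match of a nullable a; leftmost alternative wins."""
    if a[0] == 'ONE':
        return a[1]
    if a[0] == 'SEQ':
        return a[1] + bmkeps(a[2]) + bmkeps(a[3])
    if a[0] == 'ALTS':
        return a[1] + bmkeps(next(x for x in a[2] if bnullable(x)))
    return a[1] + ('S',)  # STAR: S ends the iterations

def bder(c, a):
    tag = a[0]
    if tag in ('ZERO', 'ONE'):
        return ZERO
    if tag == 'CHAR':
        return ('ONE', a[1]) if a[2] == c else ZERO
    if tag == 'ALTS':
        return ('ALTS', a[1], tuple(bder(c, x) for x in a[2]))
    if tag == 'SEQ':
        bs, a1, a2 = a[1:]
        if bnullable(a1):
            return ('ALTS', bs, (('SEQ', (), bder(c, a1), a2),
                                 fuse(bmkeps(a1), bder(c, a2))))
        return ('SEQ', bs, bder(c, a1), a2)
    # STAR: Z marks one more iteration
    return ('SEQ', a[1], fuse(('Z',), bder(c, a[2])), ('STAR', (), a[2]))

def rerase(a):
    """Erase all bits but keep n-ary alternatives."""
    if a[0] == 'CHAR':
        return ('CHAR', a[2])
    if a[0] == 'SEQ':
        return ('SEQ', rerase(a[2]), rerase(a[3]))
    if a[0] == 'ALTS':
        return ('ALTS', tuple(rerase(x) for x in a[2]))
    if a[0] == 'STAR':
        return ('STAR', rerase(a[2]))
    return (a[0],)  # ZERO and ONE

def bsimp(a):
    if a[0] == 'SEQ':  # bsimp_ASEQ on the simplified children
        a1, a2 = bsimp(a[2]), bsimp(a[3])
        if ZERO in (a1, a2):
            return ZERO
        return fuse(a[1] + a1[1], a2) if a1[0] == 'ONE' else ('SEQ', a[1], a1, a2)
    if a[0] == 'ALTS':
        flat = []  # flts: drop ZEROs and splice nested alternatives
        for x in map(bsimp, a[2]):
            if x[0] == 'ALTS':
                flat.extend(fuse(x[1], y) for y in x[2])
            elif x != ZERO:
                flat.append(x)
        seen, rs = set(), []  # distinctBy with rerase: the first copy wins
        for x in flat:
            if rerase(x) not in seen:
                seen.add(rerase(x))
                rs.append(x)
        if not rs:  # bsimp_AALTS
            return ZERO
        return fuse(a[1], rs[0]) if len(rs) == 1 else ('ALTS', a[1], tuple(rs))
    return a

def bders_simp(a, s):
    for c in s:  # simplify after every single-character derivative
        a = bsimp(bder(c, a))
    return a

def blexer_simp_bits(a, s):
    """Like blexer_simp, but returns the bitcode instead of decoding it."""
    d = bders_simp(a, s)
    return bmkeps(d) if bnullable(d) else None

def size(a):
    """Number of constructor nodes; bits are not counted."""
    if a[0] == 'SEQ':
        return 1 + size(a[2]) + size(a[3])
    if a[0] == 'ALTS':
        return 1 + sum(size(x) for x in a[2])
    return 1 + size(a[2]) if a[0] == 'STAR' else 1

A = ('STAR', (), ('CHAR', (), 'a'))  # a*
R = ('STAR', (), ('SEQ', (), A, A))  # (a* a*)*
print([size(bders_simp(R, 'a' * n)) for n in (0, 1, 5, 50)])  # [6, 15, 15, 15]
```

Under these assumptions the sketch prints the sizes 6, 15, 15 and 15. After the first character the size stays at 15 however long the input grows. This agrees with the figure of 15 nodes quoted for this example, although the size measure used there may differ in detail.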


\subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
After Simplification}
Recall the
previous $(a^*a^*)^*$ example,
where $\simpsulz$ could not
prevent the fast growth of the derivatives (over
3 million nodes for inputs of length just below $20$).
With our simplification, the size is reduced to just 15 and stays constant no matter how long the
input string is.
This is shown in the graphs below.
\begin{figure}[H]
\begin{center}
\begin{tabular}{ll}
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={derivative size},
width=7cm,
height=4cm,
legend entries={Lexer with $\textit{bsimp}$},
legend pos= south east,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
\end{axis}
\end{tikzpicture} %\label{fig:BitcodedLexer}
&
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={derivative size},
width = 7cm,
height = 4cm,
legend entries={Lexer with $\simpsulz$},
legend pos= north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
\end{axis}
\end{tikzpicture}
\end{tabular}
\end{center}
\caption{Our improvement over Sulzmann and Lu's simplification in terms of derivative size: $\textit{bsimp}$ (left) versus $\simpsulz$ (right).}
\end{figure}
\noindent
Given the size difference, it is not
surprising that our $\blexersimp$ significantly outperforms
$\textit{blexer\_sulzSimp}$.
In the next section we establish the
first important property of our lexer: its correctness.
%----------------------------------------------------------------------------------------
% SECTION rewrite relation
%----------------------------------------------------------------------------------------
\section{Correctness of $\blexersimp$}
In this section we give details
of the correctness proof of $\blexersimp$,
one of the contributions of this thesis.\\
We first introduce the rewriting relation \emph{rrewrite}
($\rrewrite$) between two regular expressions,
which expresses an atomic
simplification step.
We then prove properties of
this rewriting relation and its reflexive transitive closure.
Finally, we leverage these properties to show
an equivalence between the internal data structures of
$\blexer$ and $\blexersimp$.

\subsection{The Rewriting Relation $\rrewrite$ ($\rightsquigarrow$)}
In the correctness proof of $\blexer$, we
did not directly show that $\blexer$ generates the POSIX value.
Instead, we first proved that $\blexer$ produces the same output as $\lexer$.
Then we reused
the correctness of $\lexer$
to obtain
\begin{center}
$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
\end{center}
Here we apply this
modular technique again
by first proving that
$\blexersimp \; r \; s $
produces the same output as $\blexer \; r\; s$,
and then combining this with
$\blexer$'s correctness to obtain our main
theorem:\footnote{The case when
$s$ is not in $L \; r$ is routine to establish.}
\begin{center}
$(r, s) \rightarrow v \; \; \textit{iff} \;\; \blexersimp \; r \; s = v$
\end{center}
\noindent
The overall idea for the proof
of $\blexer \;r \;s = \blexersimp \; r \;s$
is that the transition from $r$ to $\textit{bsimp}\; r$ can be
broken down into finitely many rewrite steps:
\begin{center}
$r \rightsquigarrow^* \textit{bsimp} \; r$
\end{center}
where each rewrite step, written $\rightsquigarrow$,
is an ``atomic'' simplification that
is similar to a small-step reduction in operational semantics:
\begin{figure}[H]
\begin{mathpar}
\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}

\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}

\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\



\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}

\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\

\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}

\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}

\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow _{bs}\sum rs_2}

\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}

\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }

\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}

\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}

\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} (\map \; (\fuse \; bs_1) \; rs_1) @ rs_b }

\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}

\end{mathpar}
\caption{
The rewrite rules that generate simplified regular expressions
in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for
lists of bitcoded regular expressions.
Of particular interest is the $LD$ rule, which allows a regular expression
to be removed from a list provided an earlier element of the list
has the same erasure, and hence matches the same strings.
}\label{rrewriteRules}
\end{figure}
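\noindent
To make the list rules concrete, the Python sketch below (an illustration only, not part of the formalisation) enumerates the lists reachable from a list $rs$ by one application of $L\ZERO$, $LS$ or $LD$, where $LT$ supplies the unchanged prefix in front of $L\ZERO$ and $LS$. The tuple encoding of annotated regular expressions and the names \texttt{fuse}, \texttt{erase} and \texttt{list\_steps} are our own assumptions; in particular, \texttt{erase} is simplified and keeps alternatives as lists.

\begin{verbatim}
# Hypothetical tuple encoding of annotated regular expressions (an assumption):
#   ("ZERO",)  ("ONE", bs)  ("CHAR", bs, c)  ("ALTS", bs, rs)
#   ("SEQ", bs, r1, r2)  ("STAR", bs, r)     -- bs and rs are tuples

def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]

def erase(r):
    """Drop all bitcodes (simplified: alternatives stay lists)."""
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return (tag,)
    if tag == "CHAR":
        return ("CHAR", r[2])
    if tag == "ALTS":
        return ("ALTS", tuple(erase(x) for x in r[2]))
    if tag == "SEQ":
        return ("SEQ", erase(r[2]), erase(r[3]))
    return ("STAR", erase(r[2]))

def list_steps(rs):
    """Lists reachable from the tuple rs by one L0, LS or LD step
    (L0 and LS are applied below an unchanged prefix, as LT allows)."""
    results = []
    for i, r in enumerate(rs):
        prefix, suffix = rs[:i], rs[i + 1:]
        if r[0] == "ZERO":                                # rule L0
            results.append(prefix + suffix)
        if r[0] == "ALTS":                                # rule LS
            fused = tuple(fuse(r[1], x) for x in r[2])
            results.append(prefix + fused + suffix)
        if any(erase(a) == erase(r) for a in prefix):     # rule LD
            results.append(prefix + suffix)
    return results
\end{verbatim}

\noindent
For instance, if $a_1$ and $a_2$ are two copies of the character $a$ carrying different bits, the list $[a_1, \ZERO, a_2]$ can drop either the $\ZERO$ (rule $L\ZERO$) or $a_2$ (rule $LD$).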
\noindent
Rules such as $LT$ and $LH$ allow one list of regular expressions
to be rewritten into another list that differs in a single position,
where the element at that position in the left-hand list
is rewritten in one step into the element at the same position in the right-hand list.
This is needed for defining ``context rules'' such as $AL$.\\
The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$,
written $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
are defined in the usual way:
\begin{figure}[H]
\centering
\begin{mathpar}
\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}

\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}

\inferrule{r_1 \rightsquigarrow^* r_2 \land \; r_2 \rightsquigarrow r_3}{r_1 \rightsquigarrow^* r_3\\}

\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{s}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
\end{mathpar}
\caption{The Reflexive Transitive Closure of
$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
\end{figure}
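\noindent
Operationally, these rules say that $x \rightsquigarrow^* y$ holds exactly when $y$ can be reached from $x$ by zero or more single steps. The Python sketch below (our own illustration, not part of the formal development) computes this set of reachable terms by breadth-first search, given a hypothetical function \texttt{step} that returns the one-step successors of a term; it terminates only when the reachable set is finite. The toy relation \texttt{drop\_later\_duplicate} is likewise hypothetical: it deletes a later copy of an element, in the spirit of $LD$ but on plain strings.

\begin{verbatim}
from collections import deque

def reachable(step, start):
    """All y with start ~>* y, where step(x) lists the one-step successors of x.
    The start term is included (reflexivity), and every newly found term is
    extended by one more step (the second closure rule)."""
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in step(x):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def drop_later_duplicate(rs):
    """Toy one-step relation on tuples of strings: remove rs[j] if it occurred earlier."""
    return [rs[:j] + rs[j + 1:] for j in range(len(rs)) if rs[j] in rs[:j]]
\end{verbatim}

\noindent
For example, starting from the tuple \texttt{("a", "b", "a", "a")} the search finds that tuple itself, \texttt{("a", "b", "a")} and \texttt{("a", "b")}.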
%Two rewritable terms will remain rewritable to each other
%even after a derivative is taken:
Rewriting is preserved under derivatives,
although a single step may turn into several steps:
\begin{center}
$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
\end{center}
Finally, if $r$ can be rewritten into $r'$ and $r$ is nullable,
then both produce the same bitcodes:
\begin{center}
$\textit{if} \;\; r \rightsquigarrow^* r' \;\; \textit{and} \;\; \bnullable \; r \;\; \textit{then} \; \; \bmkeps \; r = \bmkeps \; r'$
\end{center}
$\blexer$ and $\blexersimp$ share the same decoding phase,
which means that if they produce the same
bitcodes before decoding,
they produce the same value after decoding.
We prove the three properties
mentioned above in the next subsection.
\subsection{Important Properties of $\rightsquigarrow$}
First we prove some basic facts
about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$,
$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
which will be needed later.\\
The inference rules in Figure~\ref{rrewriteRules}
have the following ``many-step'' versions:

\begin{lemma}\label{squig1}
\hspace{0em}
\begin{itemize}
\item
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies {}_{bs} \sum rs_1 \rightsquigarrow {}_{bs} \sum rs_2$
\item
$r \rightsquigarrow^* r' \implies {}_{bs} \sum (r :: rs)\; \rightsquigarrow^*\; {}_{bs} \sum (r' :: rs)$

\item
Many-step rewriting is compositional
with respect to the sequence constructor:\\
$r_1 \rightsquigarrow^* r_2
\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \;
{}_{bs} r_2 \cdot r_3 \quad $
and
$\quad r_3 \rightsquigarrow^* r_4
\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \; {}_{bs} r_1 \cdot r_4$
\item
Rewriting is preserved under the function $\fuse$,
both for single regular expressions and for lists:\\
$r_1 \rightsquigarrow^* r_2
\implies \fuse \; bs \; r_1 \rightsquigarrow^* \;
\fuse \; bs \; r_2 \quad $ and
$\quad rs_1 \stackrel{s}{\rightsquigarrow} rs_2
\implies \map \; (\fuse \; bs) \; rs_1
\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
\end{itemize}
\end{lemma}
\begin{proof}
The first two points are proven by induction on
the derivations of $\stackrel{s}{\rightsquigarrow}$ and $\rightsquigarrow^*$, respectively.
The third and fourth points follow from
the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow^* \fuse \; bs \; r_2$ and
$rs_2 \stackrel{s}{\rightsquigarrow} rs_3
\implies \map \; (\fuse \; bs) \; rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
which can be proven by rule induction on $\rightsquigarrow$ and
$\stackrel{s}{\rightsquigarrow}$.
\end{proof}
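\noindent
Two simple facts about $\fuse$ help explain why rewriting survives it: fusing twice is the same as fusing once with the concatenated bits, and $\fuse$ does not change the erasure of a regular expression, so the premise of rule $LD$ is unaffected by $\map \; (\fuse \; bs)$. The Python sketch below checks both facts on one example; the tuple encoding and function names are our own assumptions, and \texttt{erase} is simplified.

\begin{verbatim}
# Hypothetical encoding (an assumption): ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r); bs and rs are tuples.

def fuse(bs, r):
    """Prepend bs to the top-level bits of r; ZERO carries no bits."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]

def erase(r):
    """Drop all bitcodes (simplified: alternatives stay lists)."""
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return (tag,)
    if tag == "CHAR":
        return ("CHAR", r[2])
    if tag == "ALTS":
        return ("ALTS", tuple(erase(x) for x in r[2]))
    if tag == "SEQ":
        return ("SEQ", erase(r[2]), erase(r[3]))
    return ("STAR", erase(r[2]))

r = ("SEQ", (0,), ("CHAR", (), "a"), ("STAR", (1,), ("ONE", ())))
# Fusing twice equals fusing once with the concatenated bits.
assert fuse((1,), fuse((0,), r)) == fuse((1, 0), r)
# Fusing never changes the erasure.
assert erase(fuse((1, 1), r)) == erase(r)
\end{verbatim}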
\noindent
The inference rules of $\stackrel{s}{\rightsquigarrow}$
are defined in terms of the list cons operation.
Here we establish that the relations
$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$
are also preserved when a list is prepended or appended.
In addition, we
prove some relations
between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
\begin{lemma}\label{ssgqTossgs}
\hspace{0em}
\begin{itemize}
\item
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$

\item
$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies
rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
\textit{and} \; \;
rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$

\item
Appending a list to both sides turns
$\stackrel{s}{\rightsquigarrow}$ into $\stackrel{s*}{\rightsquigarrow}$:\\
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
\item
$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
\item
$rs_3 \stackrel{s}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
\item
If a regular expression can be rewritten
in many steps to $\ZERO$, then
so can any sequence that has it as its first component:\\
$r_1 \rightsquigarrow^* \ZERO
\implies {}_{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
\end{itemize}
\end{lemma}
\begin{proof}
The first part is by induction on the list $rs$.
The second part is by rule induction on $\stackrel{s*}{\rightsquigarrow}$.
The third part is
by rule induction on $\stackrel{s}{\rightsquigarrow}$.
The fourth part is
by rule induction on
$\rightsquigarrow^*$, using parts one to three.
The fifth part is a corollary of part four.
The last part is again proven by rule induction on $\rightsquigarrow^*$.
\end{proof}
\noindent
We are now ready to prove the three properties mentioned earlier:
\begin{itemize}
\item
$(r \rightsquigarrow^* r'\land \bnullable \; r)
\implies \bmkeps \; r = \bmkeps \; r'$. \\
\item
$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
\item
$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
\end{itemize}
Together, these properties lead to the correctness theorem.
\subsubsection{Property 1: $(r \rightsquigarrow^* r'\land \bnullable \; r)
\implies \bmkeps \; r = \bmkeps \; r'$}
Intuitively, this property says that
if $r$ can be rewritten into $r'$ in finitely many steps,
then $\bmkeps$ extracts the same bitcodes
from the nullable components of $r$ and $r'$.\\
For convenience,
we define a predicate stating that a list of regular expressions
contains at least one nullable regular expression:
\begin{center}
$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
\end{center}
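\noindent
As an illustration, $\textit{bnullables}$ can be computed directly from $\bnullable$. The Python sketch below assumes the usual clauses of $\bnullable$ (the empty-string regular expression and stars are nullable, characters and $\ZERO$ are not, an alternative is nullable if some child is, and a sequence if both components are); the tuple encoding is our own assumption.

\begin{verbatim}
# Hypothetical encoding (an assumption): ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r).

def bnullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    return bnullable(r[2]) and bnullable(r[3])      # SEQ

def bnullables(rs):
    """At least one element of rs is nullable."""
    return any(bnullable(r) for r in rs)
\end{verbatim}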
\noindent
The rewriting relation $\rightsquigarrow$ preserves nullability:
\begin{lemma}\label{rewritesBnullable}
\hspace{0em}
\begin{itemize}
\item
$\text{If} \; r_1 \rightsquigarrow r_2, \;
\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
\item
$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2, \;
\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
\item
$r_1 \rightsquigarrow^* r_2
\implies \bnullable \; r_1 = \bnullable \; r_2$
\end{itemize}
\end{lemma}
\begin{proof}
The first two points are proven by rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
The third point follows from the first by rule induction on $\rightsquigarrow^*$.
\end{proof}
\noindent
For convenience, we also define $\bmkepss$ on a list $rs$,
which extracts the bitcodes of the first $\bnullable$ element of $rs$:
\begin{center}
\begin{tabular}{lcl}
$\bmkepss \; [] $ & $\dn$ & $[]$\\
$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\;
\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
\end{tabular}
\end{center}
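\noindent
The Python sketch below implements $\bmkepss$ literally, together with a $\bmkeps$ for the nullable cases. For $\bmkeps$ we assume the standard clauses of the bitcoded algorithm (the bits of the empty-string regular expression; the outer bits followed by $\bmkepss$ of the children for an alternative; the outer bits followed by both components for a sequence; the outer bits followed by one terminating bit for a star). The tuple encoding and the constant \texttt{STOP} standing for that terminating bit are our own assumptions.

\begin{verbatim}
# Hypothetical encoding (an assumption): ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r); bits are ints.
STOP = 1   # assumed bit that bmkeps appends to end a star

def bnullable(r):
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    return bnullable(r[2]) and bnullable(r[3])      # SEQ

def bmkeps(r):
    """Bits describing how a nullable r matches the empty string."""
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "ALTS":
        return r[1] + bmkepss(r[2])
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    if tag == "STAR":
        return r[1] + (STOP,)
    raise ValueError("bmkeps is only defined for nullable regular expressions")

def bmkepss(rs):
    """Bits of the first nullable element of rs, or () if there is none."""
    for r in rs:
        if bnullable(r):
            return bmkeps(r)
    return ()
\end{verbatim}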
\noindent
If both sides of a single rewrite step are nullable, then they
produce the same bitcodes:
\begin{lemma}\label{rewriteBmkepsAux}
\hspace{0em}
\begin{itemize}
\item
$r_1 \rightsquigarrow r_2 \implies
(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
\bmkeps \; r_2)$
\item
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
\implies (\textit{bnullables} \; rs_1 \land \textit{bnullables} \; rs_2 \implies
\bmkepss \; rs_1 = \bmkepss \; rs_2)$
\end{itemize}
\end{lemma}
\begin{proof}
By rule induction on the derivations of $r_1 \rightsquigarrow r_2$
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$.
\end{proof}
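\noindent
The $LD$ case of the second part is perhaps the least obvious one. The removed element $a_2$ has the same erasure as the earlier $a_1$, so either both are nullable or neither is; since $\bmkepss$ returns the bits of the first nullable element, $a_2$ is never the element it picks, and deleting $a_2$ cannot change the result. The Python sketch below checks this on one example. It covers only the empty-string, character and alternative constructors, and its encoding and function names are our own assumptions.

\begin{verbatim}
# Restricted hypothetical encoding (an assumption):
#   ("ONE", bs), ("CHAR", bs, c), ("ALTS", bs, rs), with bs and rs tuples.

def erase(r):
    if r[0] == "ONE":
        return ("ONE",)
    if r[0] == "CHAR":
        return ("CHAR", r[2])
    return ("ALTS", tuple(erase(x) for x in r[2]))

def bnullable(r):
    if r[0] == "ONE":
        return True
    if r[0] == "CHAR":
        return False
    return any(bnullable(x) for x in r[2])

def bmkeps(r):
    """Only called on nullable r."""
    if r[0] == "ONE":
        return r[1]
    return r[1] + bmkepss(r[2])                    # ALTS

def bmkepss(rs):
    for r in rs:
        if bnullable(r):
            return bmkeps(r)
    return ()

def ld_steps(rs):
    """All results of one LD step: drop rs[j] if an earlier rs[i] has the same erasure."""
    return [rs[:j] + rs[j + 1:] for j in range(len(rs))
            if any(erase(rs[i]) == erase(rs[j]) for i in range(j))]

rs = (("CHAR", (0,), "a"), ("ONE", (1, 0)), ("CHAR", (1, 1), "a"), ("ONE", (1, 1, 1)))
for smaller in ld_steps(rs):
    assert bmkepss(smaller) == bmkepss(rs)        # both are (1, 0)
\end{verbatim}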
\noindent
With Lemma~\ref{rewriteBmkepsAux} we are ready to prove its
many-step version:
\begin{lemma}
$\text{If} \;\; r \rightsquigarrow^* r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
\end{lemma}
\begin{proof}
By rule induction on $\rightsquigarrow^*$.
Lemma~\ref{rewritesBnullable} tells us that both $r$ and $r'$ are nullable,
and Lemma~\ref{rewriteBmkepsAux} solves the inductive case.
\end{proof}

\subsubsection{Property 2: $r \rightsquigarrow^* \bsimp{r}$}
Now we come to the ``meaty'' part of the proof,
which shows that the helper functions of our simplification,
such as $\distinctBy$ and $\flts$, conform to
the rewriting relations $\stackrel{s*}{\rightsquigarrow}$ and
$\rightsquigarrow^*$.\\
The first lemma to prove is a more general version of
$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$:
\begin{lemma}
$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow}
(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
\end{lemma}
\noindent
It says that for a list made up of two parts $rs_1 @ rs_2$,
one can remove from $rs_2$ both its duplicate elements
and the elements that already appear in $rs_1$,
where two elements count as equal if they have the same erasure.
\begin{proof}
By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
\end{proof}
\noindent
Setting $rs_1$ to be empty (and renaming $rs_2$ to $rs_1$),
we get the corollary
\begin{corollary}\label{dBPreserves}
$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$.
\end{corollary}
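\noindent
For illustration, here is a Python sketch of $\distinctBy$, assuming the usual definition: walk through the list, keep an element only if its image under the given function is not yet in the accumulator, and add the image of each kept element to the accumulator. Regular expressions are abbreviated to pairs of bits and a character, with a hypothetical \texttt{erase} that keeps only the character; the example has the shape of the lemma above with $rs_1 = [a]$.

\begin{verbatim}
def distinct_by(rs, f, acc):
    """Keep each element whose f-image is not yet in acc; add kept images to acc."""
    seen = set(acc)
    kept = []
    for r in rs:
        key = f(r)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept

# Hypothetical stand-in for annotated characters: (bits, character).
def erase(r):
    return r[1]

rs1 = [((0,), "a")]
rs2 = [((1,), "b"), ((1, 0), "a"), ((1, 1), "b"), ((0, 0), "c")]
# rs1 @ rs2  rewrites to  rs1 @ distinctBy rs2 erase (map erase rs1)
result = rs1 + distinct_by(rs2, erase, [erase(r) for r in rs1])
\end{verbatim}

\noindent
Here \texttt{result} keeps $a$ from $rs_1$ together with the first $b$ and the $c$ of $rs_2$.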
\noindent
The flatten function $\flts$ conforms to
$\stackrel{s*}{\rightsquigarrow}$ as well:

\begin{lemma}\label{fltsPreserves}
$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
\end{lemma}
\begin{proof}
By induction on $rs$.
\end{proof}
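\noindent
The Python sketch of $\flts$ below (encoding and names are our own assumptions) shows why each of its steps is an instance of the list rules: dropping a $\ZERO$ corresponds to $L\ZERO$, and splicing in the children of an alternative ${}_{bs}\sum rs_1$, fused with $bs$, corresponds to $LS$. We assume that the clauses of $\flts$ flatten only one level of alternatives.

\begin{verbatim}
# Hypothetical encoding (an assumption): ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("ALTS", bs, rs), ...; bs and rs are tuples.

def fuse(bs, r):
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]

def flts(rs):
    """Remove ZEROs (rule L0) and splice in fused children of alternatives (rule LS)."""
    out = []
    for r in rs:
        if r[0] == "ZERO":
            continue
        if r[0] == "ALTS":
            out.extend(fuse(r[1], x) for x in r[2])
        else:
            out.append(r)
    return out
\end{verbatim}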
\noindent
The function $\bsimpalts$ also conforms to $\rightsquigarrow^*$:
\begin{lemma}\label{bsimpaltsPreserves}
${}_{bs} \sum rs \rightsquigarrow^* \bsimpalts \; bs \; rs$
\end{lemma}
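\noindent
Assuming the three clauses of $\bsimpalts$ (an empty list becomes $\ZERO$, a singleton $[r]$ becomes $\fuse \; bs \; r$, and any other list is kept as ${}_{bs}\sum rs$), it can be sketched in Python as follows; the encoding is our own assumption.

\begin{verbatim}
# Hypothetical encoding (an assumption): ("ZERO",), ("CHAR", bs, c), ("ALTS", bs, rs), ...

def fuse(bs, r):
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]

def bsimp_alts(bs, rs):
    """Build an alternative, avoiding empty and singleton alternatives."""
    if len(rs) == 0:
        return ("ZERO",)
    if len(rs) == 1:
        return fuse(bs, rs[0])       # the outer bits move into the only child
    return ("ALTS", bs, tuple(rs))
\end{verbatim}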
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	866	\noindent
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	867	The simplification function
826af400b068 more chap4 Chengsong parents: 585 diff changeset	868	$\textit{bsimp}$ only transforms the regex $r$ using steps specified by
826af400b068 more chap4 Chengsong parents: 585 diff changeset	869	$\rightsquigarrow^*$ and nothing else.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	870	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	871	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	872	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	873	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	874	By an induction on $r$.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	875	The most involved case would be the alternative,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	876	where we use lemmas \ref{bsimpaltsPreserves},
826af400b068 more chap4 Chengsong parents: 585 diff changeset	877	\ref{fltsPreserves} and \ref{dBPreserves} to do a series of rewriting:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	878	\begin{center}
\begin{tabular}{lcl}
$rs$ & $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
& $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
& $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \;
(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
\end{tabular}
\end{center}
Using this, we obtain the following chain of rewrites:
\begin{center}
\begin{tabular}{lcl}
$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
& $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
& $\rightsquigarrow^*$ & $\ldots$ \\[1.5ex]
& $\rightsquigarrow^*$ & $\bsimpalts \; bs \;
(\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs))
\; \rerases \; \phi)$\\[1.5ex]
%& $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \;
%(\flts \; (\map \; \textit{bsimp}\; rs)) \; \;
%\rerases \; \;\phi) $\\[1.5ex]
& $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
\end{tabular}
\end{center}
\end{proof}
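The three stages of the derivation above can be illustrated by a small Python sketch. These stages are simplifying each alternative, flattening with $\flts$, and removing duplicates with $\distinctBy$ up to erasure. The sketch is not the formalised definition. It covers only zero, one, characters and alternatives, and it models bit-sequences as tuples of the strings \texttt{Z} and \texttt{S}. All class and function names (\texttt{AAlts}, \texttt{flts}, \texttt{distinct\_by}, \texttt{bsimp\_alts}, \ldots) are hypothetical. We also assume that erasure turns $n$-ary alternatives into right-nested binary ones.

```python
from dataclasses import dataclass, replace

# Hypothetical annotated regexes (zero, one, characters, alternatives only).
# Bit-sequences are tuples over "Z" and "S".
@dataclass(frozen=True)
class AZero: pass
@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str
@dataclass(frozen=True)
class AAlts: bs: tuple; rs: tuple

def fuse(bs, r):
    """Prepend bs to the top-level bits of r (zero carries no bits)."""
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def erase(r):
    """Forget all bits; n-ary alternatives become right-nested binary ones."""
    if isinstance(r, AZero):
        return "ZERO"
    if isinstance(r, AOne):
        return "ONE"
    if isinstance(r, AChar):
        return ("CHAR", r.c)
    if not r.rs:
        return "ZERO"
    if len(r.rs) == 1:
        return erase(r.rs[0])
    return ("ALT", erase(r.rs[0]), erase(AAlts((), r.rs[1:])))

def flts(rs):
    """Drop zeros and splice nested alternatives, fusing their bits inwards."""
    out = []
    for r in rs:
        if isinstance(r, AZero):
            continue
        if isinstance(r, AAlts):
            out.extend(fuse(r.bs, x) for x in r.rs)
        else:
            out.append(r)
    return out

def distinct_by(rs, key):
    """Keep only the first element for each key value, preserving order."""
    seen, out = set(), []
    for r in rs:
        k = key(r)
        if k not in seen:
            seen.add(k)
            out.append(r)
    return out

def bsimp_alts(bs, rs):
    """Rebuild an alternative, collapsing the empty and singleton cases."""
    if not rs:
        return AZero()
    if len(rs) == 1:
        return fuse(bs, rs[0])
    return AAlts(bs, tuple(rs))

def bsimp(r):
    if not isinstance(r, AAlts):
        return r
    stage1 = [bsimp(x) for x in r.rs]    # map bsimp rs
    stage2 = flts(stage1)                # flts (map bsimp rs)
    stage3 = distinct_by(stage2, erase)  # distinctBy (...) erase {}
    return bsimp_alts(r.bs, stage3)

# _[] (0 + _Z(_Z a + _S 1) + _S a)  simplifies to  _[] (_ZZ a + _ZS 1)
example = AAlts((), (AZero(),
                     AAlts(("Z",), (AChar(("Z",), "a"), AOne(("S",)))),
                     AChar(("S",), "a")))
print(bsimp(example))
```

In the example, the zero alternative is dropped. Flattening pushes the bit $Z$ of the nested alternative into its two children. The final character $a$ is then removed by $\distinctBy$, because it erases to the same plain regular expression as the first one.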
\subsubsection{Property 3: $r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^* r_2 \backslash c$}
The rewritability relation
$\rightsquigarrow$ is preserved under derivatives,
although a single rewriting step may have to be
matched by several steps after the derivative is taken:
\begin{lemma}\label{rewriteBder}
\hspace{0em}
\begin{itemize}
\item
If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c
\rightsquigarrow^* r_2 \backslash c$.
\item
If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $
\map \; (\_\backslash c) \; rs_1
\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$.
\end{itemize}
\end{lemma}
\begin{proof}
By induction on $\rightsquigarrow$
and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
\end{proof}
\noindent
Property 3 now follows as an immediate corollary:
\begin{corollary}\label{rewritesBder}
$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*
r_2 \backslash c$
\end{corollary}
\begin{proof}
By rule induction on $\rightsquigarrow^*$, using Lemma~\ref{rewriteBder}.
\end{proof}
\noindent
Extending this from characters to strings and combining it with $r \rightsquigarrow^* \textit{bsimp} \; r$,
we obtain that the intermediate derivative regular expressions
of $\blexer$ can be rewritten into those of $\blexersimp$:
\begin{lemma}\label{bderBderssimp}
$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
\end{lemma}
\begin{proof}
By induction on $s$.
\end{proof}
\subsection{Main Theorem}
With Lemma~\ref{bderBderssimp} we are now ready to prove the main theorem.
\begin{theorem}
$\blexer \; r \; s = \blexersimp \; r \; s$
\end{theorem}
\noindent
\begin{proof}
Let $a$ be the annotated version of $r$.
By Lemma~\ref{bderBderssimp}, the derivatives computed by the original lexer
can be rewritten in many steps into those computed by the
lexer with simplification:
\begin{center}
$a \backslash s \rightsquigarrow^* \bderssimp{a}{s}$.
\end{center}
Since rewriting preserves the bits produced by $\bmkeps$,
both give out the same bits if the lexing result is a match:
\begin{center}
$\bnullable \; (a \backslash s)
\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
\end{center}
Because the bits are the same, decoding them yields the same value:
\begin{center}
$\bnullable \; (a \backslash s)
\implies \decode \; r \; (\bmkeps \; (a \backslash s)) =
\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
\end{center}
Rewriting also preserves nullability, so both lexers fail on the same inputs.
Hence this is equivalent to our proof goal:
\begin{center}
$\blexer \; r \; s = \blexersimp \; r \; s$.
\end{center}
\end{proof}
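To make the statement of the main theorem concrete, the following Python sketch implements unsimplified and simplified bit-coded derivatives side by side. It then compares the bits that $\blexer$ and $\blexersimp$ would pass to $\decode$ on a few strings.

It is only an illustration under our own assumptions:
\begin{itemize}
\item plain regular expressions are encoded as nested tuples, and bit-sequences as tuples of \texttt{Z} and \texttt{S};
\item all names (\texttt{internalise}, \texttt{lex\_bits}, \ldots) are hypothetical;
\item $Z$ marks a left alternative or a further star iteration, and $S$ marks a right alternative or the end of a star;
\item $\decode$ is omitted, since equal bits decode to equal values.
\end{itemize}

```python
from dataclasses import dataclass, replace

# Hypothetical bit-coded regexes; bits are tuples over "Z" and "S".
@dataclass(frozen=True)
class AZero: pass
@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str
@dataclass(frozen=True)
class AAlts: bs: tuple; rs: tuple
@dataclass(frozen=True)
class ASeq: bs: tuple; r1: object; r2: object
@dataclass(frozen=True)
class AStar: bs: tuple; r: object

def fuse(bs, r):
    """Prepend bs to the top-level bits of r (zero carries no bits)."""
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def internalise(r):
    """Plain regexes: ("0",) ("1",) ("c", ch) ("+", r1, r2) (".", r1, r2) ("*", r)."""
    tag = r[0]
    if tag == "0": return AZero()
    if tag == "1": return AOne(())
    if tag == "c": return AChar((), r[1])
    if tag == "*": return AStar((), internalise(r[1]))
    r1, r2 = internalise(r[1]), internalise(r[2])
    if tag == "+": return AAlts((), (fuse(("Z",), r1), fuse(("S",), r2)))
    return ASeq((), r1, r2)

def bnullable(r):
    if isinstance(r, (AOne, AStar)): return True
    if isinstance(r, AAlts): return any(bnullable(x) for x in r.rs)
    if isinstance(r, ASeq): return bnullable(r.r1) and bnullable(r.r2)
    return False  # AZero, AChar

def bmkeps(r):
    """Bits for matching the empty string; only called on nullable r."""
    if isinstance(r, AOne): return r.bs
    if isinstance(r, AStar): return r.bs + ("S",)  # stop iterating
    if isinstance(r, ASeq): return r.bs + bmkeps(r.r1) + bmkeps(r.r2)
    return r.bs + bmkeps(next(x for x in r.rs if bnullable(x)))  # first nullable alternative

def bder(c, r):
    if isinstance(r, AChar): return AOne(r.bs) if r.c == c else AZero()
    if isinstance(r, AAlts): return AAlts(r.bs, tuple(bder(c, x) for x in r.rs))
    if isinstance(r, AStar): return ASeq(r.bs, fuse(("Z",), bder(c, r.r)), AStar((), r.r))
    if isinstance(r, ASeq):
        if bnullable(r.r1):
            return AAlts(r.bs, (ASeq((), bder(c, r.r1), r.r2),
                                fuse(bmkeps(r.r1), bder(c, r.r2))))
        return ASeq(r.bs, bder(c, r.r1), r.r2)
    return AZero()  # AZero, AOne

def erase(r):
    """Forget all bits; n-ary alternatives become right-nested binary ones."""
    if isinstance(r, AZero): return "ZERO"
    if isinstance(r, AOne): return "ONE"
    if isinstance(r, AChar): return ("CHAR", r.c)
    if isinstance(r, ASeq): return ("SEQ", erase(r.r1), erase(r.r2))
    if isinstance(r, AStar): return ("STAR", erase(r.r))
    if not r.rs: return "ZERO"
    if len(r.rs) == 1: return erase(r.rs[0])
    return ("ALT", erase(r.rs[0]), erase(AAlts((), r.rs[1:])))

def bsimp(r):
    if isinstance(r, ASeq):
        r1, r2 = bsimp(r.r1), bsimp(r.r2)
        if isinstance(r1, AZero) or isinstance(r2, AZero): return AZero()
        if isinstance(r1, AOne): return fuse(r.bs + r1.bs, r2)
        return ASeq(r.bs, r1, r2)
    if isinstance(r, AAlts):
        flat = []  # flts (map bsimp rs)
        for x in map(bsimp, r.rs):
            if isinstance(x, AAlts): flat.extend(fuse(x.bs, y) for y in x.rs)
            elif not isinstance(x, AZero): flat.append(x)
        seen, rs = set(), []  # distinctBy ... erase {}
        for x in flat:
            if erase(x) not in seen:
                seen.add(erase(x))
                rs.append(x)
        if not rs: return AZero()
        return fuse(r.bs, rs[0]) if len(rs) == 1 else AAlts(r.bs, tuple(rs))
    return r  # characters, ones, zeros and stars are left unchanged

def lex_bits(r, s, simp):
    """Bits from blexer (simp=False) or blexersimp (simp=True); None if no match."""
    a = internalise(r)
    for c in s:
        a = bsimp(bder(c, a)) if simp else bder(c, a)
    return bmkeps(a) if bnullable(a) else None

r = ("*", ("+", ("c", "a"), (".", ("c", "a"), ("c", "b"))))  # (a + ab)*
for s in ["", "a", "ab", "aab", "abab", "b"]:
    print(repr(s), lex_bits(r, s, False), lex_bits(r, s, True))
```

For the string $ab$, for instance, both variants return the bits $Z\,S\,S$: one star iteration choosing the right alternative $ab$, followed by the end of the star.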
\noindent
Combining this result with the lemma proved earlier,
\begin{center}
$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$,
\end{center}
we obtain that the bit-coded lexer with simplification
indeed outputs the POSIX lexing result, if such a result exists:
\begin{corollary}
$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = v$
\end{corollary}

\subsection{Comments on the Proof Techniques Used}
Straightforward and simple as the proof may seem,
obtaining it was far from trivial.\\
We initially attempted to re-use the argument
in \cref{flex_retrieve}.
The problem was that both functions $\inj$ and $\retrieve$ require
the annotated regular expressions to stay unsimplified,
so that one can
correctly compare $v_{i+1}$, $r_i$ and $v_i$
in diagram \ref{graph:inj} and
``fit the key into the lock hole''.

\noindent
We also tried to prove
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) =
\textit{bsimp} \;\; (a\backslash s)$,
\end{center}
but this turns out to be false.
A counterexample is
\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\;
\text{and} \;\; s = bb.
\]
\noindent
For these we have
\begin{center}
$\textit{bsimp}\;\; (a \backslash s) = {}_{[]}(_{ZZ}\ONE + _{ZS}c)$,
\end{center}
\noindent
whereas
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) = {}_{Z}(_{Z} \ONE + _{S} c)$.
\end{center}
Unfortunately,
if $\textit{bsimp}$ is applied at different stages,
this kind of discrepancy cannot be avoided in general.
This is due to
the $\map \; (\fuse\; bs) \; as$ operation
happening at different locations in the regular expression.\\
The rewriting relation
$\rightsquigarrow^*$
allows us to ignore this discrepancy
and treat the expressions
\begin{center}
$_{[]}(_{ZZ}\ONE + _{ZS}c ) $\\
and\\
$_{Z}(_{Z} \ONE + _{S} c)$
\end{center}
as equivalent, because both were rewritten
from the same expression.\\
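The two expressions in this counterexample can be compared mechanically. The following Python sketch checks three things about them:
\begin{itemize}
\item they differ as annotated regular expressions, yet have the same erasure;
\item they produce the same bits under $\bmkeps$;
\item the first is obtained from the second by pushing the outer bit $Z$ into the alternatives with $\map \; (\fuse \; bs)$.
\end{itemize}
All names (\texttt{push\_bits}, \texttt{lhs}, \texttt{rhs}, \ldots) are hypothetical, bit-sequences are modelled as tuples, and alternatives are erased $n$-ary for brevity.

```python
from dataclasses import dataclass, replace

# Hypothetical annotated nodes, just enough for the counterexample.
@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str
@dataclass(frozen=True)
class AAlts: bs: tuple; rs: tuple

def fuse(bs, r):
    """Prepend bs to the top-level bits of r."""
    return replace(r, bs=bs + r.bs)

def bnullable(r):
    if isinstance(r, AOne): return True
    if isinstance(r, AAlts): return any(bnullable(x) for x in r.rs)
    return False  # characters are not nullable

def bmkeps(r):
    """Bits for matching the empty string; only called on nullable r."""
    if isinstance(r, AOne): return r.bs
    return r.bs + bmkeps(next(x for x in r.rs if bnullable(x)))

def erase(r):
    """Forget all bits (alternatives kept n-ary for simplicity)."""
    if isinstance(r, AOne): return "ONE"
    if isinstance(r, AChar): return ("CHAR", r.c)
    return ("ALTS",) + tuple(erase(x) for x in r.rs)

def push_bits(r):
    """map (fuse bs): move the bits of an alternative into its children."""
    return AAlts((), tuple(fuse(r.bs, x) for x in r.rs))

Z, S = "Z", "S"
lhs = AAlts((), (AOne((Z, Z)), AChar((Z, S), "c")))  # bsimp (a \ s)
rhs = AAlts((Z,), (AOne((Z,)), AChar((S,), "c")))    # bsimp (bderssimp a s)
print(lhs == rhs, erase(lhs) == erase(rhs), bmkeps(lhs), bmkeps(rhs))
```

Both expressions yield the bits $Z\,Z$, so the discrepancy is invisible once $\bmkeps$ and $\decode$ are applied.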
The simplification rewriting rules
given in \ref{rrewriteRules} are by no means
final;
one could come up with new rules
such as
$\SEQ r_1 \cdot (\SEQ r_2 \cdot r_3) \rightarrow
\SEQs [r_1, r_2, r_3]$.
Such a rule does not fit the proof technique
of our main theorem, but it does not seem to violate the POSIX
property.\\
Having the correctness property is good,
but we would also like a guarantee that the lexer is not slow in
some sense, for example, that it does not grind to a halt on any input.
As we have already seen, Sulzmann and Lu's simplification function
$\simpsulz$ cannot achieve this, because their claim that
the regular expression size does not grow arbitrarily large
was not true.
In the next chapter we shall prove that, with our $\simp$,
for a given $r$, the size of the internal derivatives is always
bounded by a constant that depends only on $r$.
