% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.


In this chapter we introduce the simplifications
on annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
We contrast this simplification function
with Sulzmann and Lu's original
simplifications, indicating the simplicity of our algorithm and
the improvements we made, and demonstrating
the usefulness and reliability of formal proofs on algorithms.
These ``aggressive'' simplifications would not be possible in the injection-based
lexing we introduced in Chapter \ref{Inj}.
We then go on to prove the correctness of the improved version of
$\blexer$, called $\blexersimp$, by establishing
$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.

\section{Simplifications by Sulzmann and Lu}
The first thing we notice about the fast growth of the derivatives of examples such as
$(a^*a^*)^*$ and $(a^* + (aa)^*)^*$ is that many duplicated sub-patterns
are scattered across different levels, and therefore require
de-duplication at different levels:
\begin{center}
$(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} $\\
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} \ldots$
\end{center}
\noindent
As we have already mentioned in \ref{eqn:growth2},
a simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivative
regular expressions:
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
One would expect a better simplification function to work in the
following way:
\begin{gather*}
((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
\bigg\downarrow \\
(a^*a^* + a^*
\color{gray} + a^* \color{black})\cdot(a^*a^*)^* +
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
\bigg\downarrow \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
\bigg\downarrow \\
(a^*a^* + a^*
)\cdot(a^*a^*)^*
\end{gather*}
\noindent
This motivating example came from testing Sulzmann and Lu's
algorithm: their simplification fails to
remove such duplicates.
We quote their $\textit{simp}$ function verbatim here:
\begin{center}
\begin{tabular}{lcl}
$\simpsulz \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
$\simpsulz \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
\textit{then} \;\; \ZERO$\\
& & $\textit{else}\;\;_{bs}((\simpsulz \;r_1)\cdot
(\simpsulz \; r_2))$\\
$\simpsulz \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
$\simpsulz \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
$\simpsulz \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simpsulz \; r)$\\
$\simpsulz \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
(\nub \; (\filter \; (\textit{not} \circ \zeroable)\;((\simpsulz \; r) :: \map \; \simpsulz \; rs)))$\\
\end{tabular}
\end{center}
\noindent
The $\textit{zeroable}$ predicate,
which tests whether the regular expression
is equivalent to $\ZERO$,
is defined as:
\begin{center}
\begin{tabular}{lcl}
$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
\zeroable \;_{[]}\sum\;rs $\\
$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
\end{tabular}
\end{center}
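\noindent
As a small worked instance of these clauses (the bitcodes $bs$, $bs'$ and the
regular expression $r$ are placeholders chosen only for this illustration),
a sequence whose first component is $\ZERO$ is zeroable whatever its second component is:
\begin{center}
$\zeroable\;{}_{bs}({}_{bs'}\ZERO \cdot r) \;\;=\;\; \zeroable\;{}_{bs'}\ZERO \;\lor\; \zeroable\;r \;\;=\;\; \textit{true}$
\end{center}
so the second clause of $\simpsulz$ rewrites such a sequence to $\ZERO$.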
\noindent
They suggested that the $\simpsulz $ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
$\textit{sulzSimp} \; r$ & $\dn$ &
$\textit{while}((\simpsulz \; r)\; \cancel{=} \; r)$ \\
& & $\quad r := \simpsulz \; r$\\
& & $\textit{return} \; r$
\end{tabular}
\end{center}
We call the operation of alternately
applying derivatives and simplifications
(until the string is exhausted) Sulz-simp-derivative,
written $\backslash_{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{sulzSimp} (c\!::\!s) $ & $\dn$ & $(\textit{sulzSimp} \; (r \backslash c)) \backslash_{sulzSimp}\, s$ \\
$r \backslash_{sulzSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
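\noindent
Unfolding this definition on a two-character string $[c_1, c_2]$
(the names $c_1$ and $c_2$ are ours, introduced only for this illustration)
shows how derivative and simplification steps interleave:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{sulzSimp} [c_1, c_2]$ & $=$ & $(\textit{sulzSimp} \; (r \backslash c_1)) \backslash_{sulzSimp}\, [c_2]$\\
& $=$ & $(\textit{sulzSimp} \; ((\textit{sulzSimp} \; (r \backslash c_1)) \backslash c_2)) \backslash_{sulzSimp}\, [\,]$\\
& $=$ & $\textit{sulzSimp} \; ((\textit{sulzSimp} \; (r \backslash c_1)) \backslash c_2)$
\end{tabular}
\end{center}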
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as $\blexer$:
\begin{center}
\begin{tabular}{lcl}
$\textit{blexer\_sulzSimp}\;r\,s$ & $\dn$ &
$\textit{let}\;a = (r^\uparrow)\backslash_{sulzSimp}\, s\;\textit{in}$\\
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
& & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala,
and found that the size of the final derivative regular expression
grows exponentially fast:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={size},
ymode = log,
legend entries={Final Derivative Size},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture}
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n= 20$ we already get an out-of-memory error with Scala's default
JVM heap size settings.
In fact their simplification does not improve over
the simple-minded simplifications we have shown in Figure \ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={$n$},
ylabel={time},
ymode = log,
legend entries={time in secs},
legend pos=north west,
legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture}
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
This seems to be a counterexample to
their linear complexity claim:
\begin{quote}\it
Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input.
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
would stay below a finite constant is not true.
In addition, even if the sizes of the regular expressions
did stay bounded, one has to take into account that
the $\simpsulz$ function is applied many times
in each derivative step, and that number is not necessarily
a constant with respect to the size of the regular expression.
To not get ``caught off guard'' by
such counterexamples,
one needs to be more careful when designing the
simplification function and making claims about it.

\section{Our $\textit{Simp}$ Function}
We will now introduce our simplification function
by contrasting it with $\simpsulz$.
We describe
the ideas behind the components of their algorithm
and why they fail to achieve the desired effect, followed
by our solution. These solutions come with correctness
statements that are backed up by formal proofs.
\subsection{Flattening Nested Alternatives}
The idea behind the
\begin{center}
$\simpsulz \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
clause is that it allows
duplicate removal of regular expressions at different
levels.
For example, this would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
The problem here is that only the head element
is ``spilled out'',
whereas we would want to flatten
an entire list to open up possibilities for further simplifications.\\
Not flattening the rest of the elements also means that
the later de-duplication process
does not fully remove apparent duplicates.
For example,
using $\simpsulz$ we could not
simplify
\begin{center}
$((a^* a^*)+ \underline{(a^* + a^*)})\cdot (a^*a^*)^*+
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
because the underlined part is not the first element
of the alternative.\\
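To make this ``head only'' behaviour concrete, consider an alternative whose
two elements are both nested alternatives
(the names $r_1,\ldots,r_4$, $bs_1$ and $bs_2$ are placeholders of ours, not taken from Sulzmann and Lu).
Instantiating the flattening clause of $\simpsulz$ with $rs_1 = [r_1, r_2]$ gives
\begin{center}
$\simpsulz \; {}_{bs}\sum [{}_{bs_1}\sum [r_1, r_2],\; {}_{bs_2}\sum [r_3, r_4]] \quad = \quad
{}_{bs}\sum [\fuse \; bs_1 \; r_1,\; \fuse \; bs_1 \; r_2,\; {}_{bs_2}\sum [r_3, r_4]]$
\end{center}
so the second nested alternative is left untouched by this step.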
We define a flatten operation that flattens not only
the first regular expression of an alternative,
but the entire list:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; \textit{as'} $ \\
$\textit{flts} \; \ZERO :: \textit{as'}$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
$\textit{flts} \; a :: \textit{as'}$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
\end{tabular}
\end{center}
\noindent
Our $\flts$ operation
also throws away $\ZERO$s
as they do not contribute to a lexing result.
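
\noindent
As a small worked instance of these clauses, take a list containing a $\ZERO$,
a nested alternative, and one further element $a_3$ that is neither $\ZERO$ nor an alternative.
The names $a_1$, $a_2$, $a_3$ and $bs_1$ are ours, and we assume the usual base case
$\flts \; [] = []$, which is not spelled out in the clauses above:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\flts \; [\ZERO,\; {}_{bs_1}\sum [a_1, a_2],\; a_3]$ & $=$ & $\flts \; [{}_{bs_1}\sum [a_1, a_2],\; a_3]$\\
& $=$ & $[\fuse \; bs_1 \; a_1,\; \fuse \; bs_1 \; a_2] \; @ \; \flts \; [a_3]$\\
& $=$ & $[\fuse \; bs_1 \; a_1,\; \fuse \; bs_1 \; a_2,\; a_3]$
\end{tabular}
\end{center}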
\subsection{Duplicate Removal}
After flattening is done, we are ready to de-duplicate.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's function.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow}
rs_{flat}
\xrightarrow{\distinctBy \;
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \;
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
where the $\distinctBy$ function is defined as:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & $\quad \textit{if} (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
& & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
\end{tabular}
\end{center}
\noindent
The reason we define a distinct function under a mapping $f$ is that
we want to eliminate regular expressions that are syntactically the same,
but carry different bitcodes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match
(and therefore is definitely not POSIX),
and would be discarded by $\bmkeps$ later anyway.
\begin{center}
$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; times\quad}\underbrace{a^*}_{Z: \;match\; 1 \;times} +
_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
$
\end{center}
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard
will always be placed further to the left. When we
traverse the list from left to right,
a regular expression that we have already seen
will definitely not contribute to a POSIX value when it occurs again,
even if it is annotated with different bitcodes.
These duplicates therefore need to be removed.
To achieve this, we use $\rerases$ as the function $f$ during the distinction
operation.\\
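Tracing $\distinctBy$ on the two-element list from the example above illustrates this.
Here we write $a^*a^*$ for the common bitcode-free result of applying $\rerases$ to
either element, which is a simplification of ours made only for this illustration:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\distinctBy \; [{}_{ZSZ}a^*a^*,\; {}_{SZZ}a^*a^*] \; \rerases \; \varnothing$ & $=$ &
${}_{ZSZ}a^*a^* :: (\distinctBy \; [{}_{SZZ}a^*a^*] \; \rerases \; \{a^*a^*\})$\\
& $=$ & ${}_{ZSZ}a^*a^* :: (\distinctBy \; [] \; \rerases \; \{a^*a^*\})$\\
& $=$ & $[{}_{ZSZ}a^*a^*]$
\end{tabular}
\end{center}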
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	304	$\rerases$ is very similar to $\erase$, except that it preserves the structure
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	305	when erasing an alternative regular expression.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	306	We use $\rerases$ instead of $\erase$ because
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	307	it keeps the list structure of alternative
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	308	annotated regular expressions,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	309	whereas $\erase$ would turn them back into binary alternatives.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	310	Not having to change the structure
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	311	greatly simplifies the finiteness proof in Chapter
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	312	\ref{Finite} (we will follow up with more details there).
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	313	We give the definitions of $\rerases$ here together with
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	314	the new datatype used by $\rerases$ (as our plain
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	315	regular expression datatype does not allow non-binary alternatives).
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	316	For the moment the reader can just think of
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	317	$\rerases$ as $\erase$ and $\rrexp$ as plain regular expressions.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	318	\begin{figure}[H]
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	319	\begin{center}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	320	$\rrexp ::= \RZERO \mid \RONE
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	321	\mid \RCHAR{c}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	322	\mid \RSEQ{r_1}{r_2}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	323	\mid \RALTS{rs}
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	324	\mid \RSTAR{r} $
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	325	\end{center}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	326	\caption{$\rrexp$: plain regular expressions, but with a $\sum$ alternative
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	327	constructor}\label{rrexpDef}
988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 589 diff changeset	328	\end{figure}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	329	The notation of $\rerases$ also follows that of $\erase$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	330	which is a postfix operator written as a subscript,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	331	except that it has an \emph{r} attached to it to distinguish it from $\erase$:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	332	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	333	\begin{tabular}{lcl}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	334	$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	335	$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	336	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	337	$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	338	$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	339	$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	340	\end{tabular}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	341	\end{center}
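The following Python fragment is a minimal sketch of $\rerases$, not the formalised definition. It assumes a tuple encoding of regular expressions that is introduced here purely for illustration (the tags, the string representation of bitcodes and the name \texttt{rerase} are all assumptions of the sketch). It shows that two sequences of the shape discussed above, which differ only in their bitcodes, are erased to the same $\rrexp$.

```python
# Tuple encoding used only for this sketch (bitcodes are strings over "Z"/"S"):
#   annotated: ("ZERO",)  ("ONE", bs)  ("CHAR", bs, c)
#              ("SEQ", bs, a1, a2)  ("ALTS", bs, (a, ...))  ("STAR", bs, a)
#   erased:    the same tags with the bs component removed

def rerase(a):
    """Drop every bitcode but keep the list-shaped alternative constructor."""
    tag = a[0]
    if tag in ("ZERO", "ONE"):
        return (tag,)
    if tag == "CHAR":
        return ("CHAR", a[2])
    if tag == "SEQ":
        return ("SEQ", rerase(a[2]), rerase(a[3]))
    if tag == "ALTS":
        # The children stay in one list; no nesting into binary alternatives.
        return ("ALTS", tuple(rerase(x) for x in a[2]))
    if tag == "STAR":
        return ("STAR", rerase(a[2]))
    raise ValueError(f"unknown constructor: {tag!r}")

# Two copies of a* . a* whose bitcodes differ (attached to the whole
# alternative here, which is a simplification made for this sketch).
a_star = ("STAR", "", ("CHAR", "", "a"))
left = ("SEQ", "ZSZ", a_star, a_star)
right = ("SEQ", "SZZ", a_star, a_star)

assert left != right                    # different as annotated expressions
assert rerase(left) == rerase(right)    # identical once bitcodes are erased
```

Because the erased forms coincide, a left-to-right deduplication keyed on \texttt{rerase} keeps \texttt{left} and drops \texttt{right}.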
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	342
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	343	\subsection{Putting Things Together}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	344	A recursive definition of our simplification function
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	345	is given below:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	346	%that looks somewhat similar to our Scala code is
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	347	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	348	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	349
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	350	$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	351	$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{AALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; \textit{bsimp} \; \textit{as})) \; \rerases \; \varnothing) $ \\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	352	$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	353	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	354	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	355
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	356	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	357	The simplification (named $\textit{bsimp}$ for \emph{b}it-coded)
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	358	pattern-matches on the regular expression.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	359	When it detects that the regular expression is an alternative or
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	360	sequence, it simplifies the children regular expressions
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	361	recursively and then checks whether one of the children has turned into $\ZERO$ or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	362	$\ONE$, which might trigger further simplification at the current level.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	363	For sequences, current-level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	364	using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	365	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	366	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	367	$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	368	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	369	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	370	&&$\quad\textit{case} \; (_{bs_1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs@bs_1) \; a_2'$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	371	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	372	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	373	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	374	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	375	The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	376	the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	377	and then call $\distinctBy$ on that list. Two
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	378	elements are considered the same if $\rerases \; r_1 = \rerases\; r_2$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	379	Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	380	singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	381	decides whether to keep the current level constructor $\sum$ as it is, and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	382	removes it when there are fewer than two elements:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	383	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	384	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	385	$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	386	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	387	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse} \; bs \; a$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	388	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	389	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	390
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	391	\end{center}
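The clauses above can be read as the following Python sketch. It is an illustration under assumptions rather than the thesis's Scala or Isabelle code: regular expressions are encoded as tagged tuples, bitcodes as strings, and the helper names (\texttt{fuse}, \texttt{flts}, \texttt{distinct\_by}, \texttt{bsimp\_aseq}, \texttt{bsimp\_aalts}) are chosen here to mirror the functions in the text. The behaviour assumed for \texttt{flts} (drop $\ZERO$, open up nested alternatives and fuse their bitcodes into the children) and for \texttt{fuse} (prepend bitcodes at the top level) is the standard one and is restated here as an assumption.

```python
# Tuple encoding assumed for this sketch (bitcodes are strings over "Z"/"S"):
#   ("ZERO",)  ("ONE", bs)  ("CHAR", bs, c)
#   ("SEQ", bs, a1, a2)  ("ALTS", bs, (a, ...))  ("STAR", bs, a)

def fuse(bs, a):
    """Prepend bs to the top-level bitcode of a (ZERO carries no bitcode)."""
    return a if a[0] == "ZERO" else (a[0], bs + a[1]) + a[2:]

def rerase(a):
    """Erase all bitcodes, keeping alternatives as lists."""
    tag = a[0]
    if tag in ("ZERO", "ONE"):
        return (tag,)
    if tag == "CHAR":
        return ("CHAR", a[2])
    if tag == "SEQ":
        return ("SEQ", rerase(a[2]), rerase(a[3]))
    if tag == "ALTS":
        return ("ALTS", tuple(rerase(x) for x in a[2]))
    return ("STAR", rerase(a[2]))

def flts(rs):
    """Drop ZERO and open up nested alternatives, fusing their bitcodes."""
    out = []
    for a in rs:
        if a[0] == "ZERO":
            continue
        if a[0] == "ALTS":
            out.extend(fuse(a[1], x) for x in a[2])
        else:
            out.append(a)
    return out

def distinct_by(xs, f):
    """Keep the leftmost element for each value of f, preserving order."""
    seen, out = set(), []
    for x in xs:
        key = f(x)
        if key not in seen:
            seen.add(key)
            out.append(x)
    return out

def bsimp_aseq(bs, a1, a2):
    if a1[0] == "ZERO" or a2[0] == "ZERO":
        return ("ZERO",)
    if a1[0] == "ONE":
        return fuse(bs + a1[1], a2)
    return ("SEQ", bs, a1, a2)

def bsimp_aalts(bs, rs):
    if not rs:
        return ("ZERO",)
    if len(rs) == 1:
        return fuse(bs, rs[0])
    return ("ALTS", bs, tuple(rs))

def bsimp(a):
    if a[0] == "SEQ":
        return bsimp_aseq(a[1], bsimp(a[2]), bsimp(a[3]))
    if a[0] == "ALTS":
        children = [bsimp(x) for x in a[2]]
        return bsimp_aalts(a[1], distinct_by(flts(children), rerase))
    return a  # ZERO, ONE, CHAR and STAR are left unchanged

# A ZERO-headed sequence and a duplicate that differs only in its bitcode
# both disappear; the leftmost copy survives.
example = ("ALTS", "", (("SEQ", "", ("ZERO",), ("CHAR", "", "a")),
                        ("CHAR", "Z", "a"),
                        ("CHAR", "S", "a")))
assert bsimp(example) == ("CHAR", "Z", "a")
```

Keying the deduplication on \texttt{rerase} rather than on the annotated expression itself is what lets the second copy be dropped even though its bitcode differs.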
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	392	Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	393	we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	394	so that the derivative stays small:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	395	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	396	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	397	$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	398	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	399	\end{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	400	%Following previous notations
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	401	%when extending from derivatives w.r.t.~character to derivative
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	402	%w.r.t.~string, we define the derivative that nests simplifications
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	403	%with derivatives:%\comment{simp in the [] case?}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	404	We extend this from character to string:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	405	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	406	\begin{tabular}{lcl}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	407	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	408	$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	409	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	410	\end{center}
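The string version is a left fold over the input that simplifies after every character. The Python sketch below makes this explicit. The derivative and simplification functions are passed in as parameters, and the stand-ins \texttt{toy\_der} and \texttt{toy\_simp} are hypothetical: they treat a list of words as a finite language and are not the bit-coded functions of this chapter.

```python
from functools import reduce

def bders_simp(r, s, bder, bsimp):
    """Derivative of r w.r.t. the string s, simplifying after each character.

    For the empty string r is returned unchanged, as in the second clause.
    """
    return reduce(lambda acc, c: bsimp(bder(acc, c)), s, r)

# Hypothetical stand-ins so that the fold can be run: a "regular expression"
# is a list of words, the derivative strips a leading character, and the
# simplification removes duplicates while keeping the leftmost copy.
def toy_der(words, c):
    return [w[1:] for w in words if w.startswith(c)]

def toy_simp(words):
    return list(dict.fromkeys(words))

r = ["ab", "ab", "abc", "b"]
print(bders_simp(r, "ab", toy_der, toy_simp))
```

Simplifying inside the fold, instead of once at the end, is what keeps every intermediate result small.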
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	411
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	412	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	413	The lexer that extracts bitcodes from the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	414	derivatives with simplifications from our $\bsimp$ function
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	415	is called $\blexersimp$:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	416	\begin{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	417	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	418	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	419	$\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	420	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	421	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	422	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	423	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	424	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	425
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	426	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	427	This algorithm keeps the regular expression size small.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	428
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	429
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	430	\subsection{$(a+aa)^*$ and $(a^*\cdot a^*)^*$ against
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	431	$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	432	For example,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	433	with our simplification the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	434	previous $(a^*a^*)^*$ example,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	435	where $\simpsulz$ could not
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	436	stop the fast growth (over
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	437	3 million nodes at input lengths just below $20$),
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	438	is reduced to a size of just 15 and stays constant, no matter how long the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	439	input string is.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	440	This is demonstrated in the graphs below.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	441	\begin{figure}[H]
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	442	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	443	\begin{tabular}{ll}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	444	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	445	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	446	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	447	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	448	width=7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	449	height=4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	450	legend entries={Lexer with $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	451	legend pos= south east,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	452	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	453	\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	454	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	455	\end{tikzpicture} %\label{fig:BitcodedLexer}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	456	&
7cf9f17aa179 more Chengsong parents: 538 diff changeset	457	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	458	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	459	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	460	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	461	width = 7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	462	height = 4cm,
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	463	legend entries={Lexer with $\simpsulz$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	464	legend pos= north west,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	465	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	466	\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	467	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	468	\end{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	469	\end{tabular}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	470	\end{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	471	\caption{Our Improvement over Sulzmann and Lu's in terms of size}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	472	\end{figure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	473	\noindent
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	474	Given the size difference, it is not
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	475	surprising that our $\blexersimp$ significantly outperforms
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	476	$\textit{blexer\_sulzSimp}$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	477	In the next section we establish the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	478	first important property of our lexer: its correctness.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	479	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	480	% SECTION rewrite relation
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	481	%----------------------------------------------------------------------------------------
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	482	\section{Correctness of $\blexersimp$}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	483	In this section we give details
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	484	of the correctness proof of $\blexersimp$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	485	an important contribution of this thesis.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	486	We first introduce the rewriting relation \emph{rrewrite}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	487	($\rrewrite$) between two regular expressions,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	488	which expresses an atomic
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	489	simplification step from the left-hand-side
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	490	to the right-hand-side.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	491	We then prove properties about
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	492	this rewriting relation and its reflexive transitive closure.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	493	Finally we leverage these properties to show
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	494	an equivalence between the internal data structures of
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	495	$\blexer$ and $\blexersimp$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	496
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	497	\subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	498	In the correctness proof of $\blexer$, we
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	499	did not directly derive the fact that $\blexer$ produces the POSIX value,
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	500	but first proved that $\blexer$ is linked with $\lexer$.
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	501	Then we re-used
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	502	the correctness of $\lexer$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	503	to obtain
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	504	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	505	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	506	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	507	Here we apply this
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	508	modularised technique again
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	509	by first proving that
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	510	$\blexersimp \; r \; s $
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	511	produces the same output as $\blexer \; r\; s$,
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	512	and then piecing it together with
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	513	$\blexer$'s correctness to achieve our main
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	514	theorem:\footnote{The case when
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	515	$s$ is not in $L \; r$ is routine to establish.}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	516	\begin{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	517	$(r, s) \rightarrow v \; \; \textit{iff} \;\; \blexersimp \; r \; s = v$
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	518	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	519	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	520	The overall idea for the proof
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	521	of $\blexer \;r \;s = \blexersimp \; r \;s$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	522	is that the transition from $r$ to $\textit{bsimp}\; r$ can be
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	523	broken down into finitely many rewrite steps:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	524	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	525	$r \rightsquigarrow^* \textit{bsimp} \; r$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	526	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	527	where each rewrite step, written $\rightsquigarrow$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	528	is an ``atomic'' simplification that
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	529	cannot be broken down any further:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	530	\begin{figure}[H]
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	531	\begin{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	532	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	533
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	534	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	535
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	536	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	537
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	538
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	539
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	540	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	541
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	542	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	543
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	544	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	545
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	546	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	547
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	548	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow _{bs}\sum rs_2}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	549
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	550	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	551
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	552	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	553
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	554	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	555
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	556	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	557
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	558	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rs_b) }
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	559
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	560	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	561
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	562	\end{mathpar}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	563	\caption{
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	564	The rewrite rules that generate simplified regular expressions
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	565	in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	566	and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	567	lists of bitcoded regular expressions.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	568	Of particular interest is the $LD$ rule, which allows copies of regular expressions
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	569	to be removed provided a regular expression
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	570	earlier in the list is the same up to bitcodes (and therefore matches the same strings).
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	571	}\label{rrewriteRules}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	572	\end{figure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	573	\noindent
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	574	The rules such as $LT$ and $LH$ are for rewriting between two regular expression lists
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	575	such that one regular expression
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	576	in the left-hand-side list is rewritable in one step
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	577	to the right-hand-side's regular expression at the same position.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	578	This helps with defining the ``context rules'' such as $AL$.\\
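As a small worked example of how these rules compose (constructed here for illustration, with arbitrary bitcodes $bs$, $bs_1$ and $bs_2$), consider an alternative whose children are a $\ZERO$ and two copies of the character $a$ that differ only in their bitcodes:
\begin{center}
\begin{tabular}{lcll}
$_{bs}\sum [\ZERO,\; _{bs_1}\mathbf{a},\; _{bs_2}\mathbf{a}]$ & $\rightsquigarrow$ & $_{bs}\sum [\,_{bs_1}\mathbf{a},\; _{bs_2}\mathbf{a}]$ & by $AL$ and $L\ZERO$\\
 & $\rightsquigarrow$ & $_{bs}\sum [\,_{bs_1}\mathbf{a}]$ & by $AL$ and $LD$\\
 & $\rightsquigarrow$ & $\fuse \; bs \; (_{bs_1}\mathbf{a})$ & by $A1$
\end{tabular}
\end{center}
The $LD$ step applies with $rs_a = rs_b = rs_c = []$ because $\rerase{_{bs_1}\mathbf{a}} = \rerase{_{bs_2}\mathbf{a}} = \RCHAR{a}$, and it keeps the copy further to the left. Unfolding the definition of $\textit{bsimp}$ on the same starting expression gives the same result, $\fuse \; bs \; (_{bs_1}\mathbf{a})$.\\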
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	579	The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	580	are defined in the usual way:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	581	\begin{figure}[H]
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	582	\centering
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	583	\begin{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	584	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	585
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	586	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	587
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	588	\inferrule{r_1 \rightsquigarrow^* r_2 \land \; r_2 \rightsquigarrow r_3}{r_1 \rightsquigarrow^* r_3\\}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	589	
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	590	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{s}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	591	\end{mathpar}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	592	\caption{The Reflexive Transitive Closure of
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	593	$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	594	\end{figure}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	595	If one term rewrites to another, then this remains the case
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	596	(in possibly several steps) after a derivative is taken:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	597	\begin{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	598	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	599	\end{center}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	600	And finally, if two terms are rewritable to each other,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	601	then they produce the same bitcodes:
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	602	\begin{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	603	$r \rightsquigarrow^* r' \implies \bmkeps \; r = \bmkeps \; r'$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	604	\end{center}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	605	The decoding phase of both $\blexer$ and $\blexersimp$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	606	is the same, which means that if they get the same
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	607	bitcodes before the decoding phase,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	608	they get the same value after decoding is done.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	609	We will prove the three properties
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	610	we mentioned above in the next sub-section.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	611	\subsection{Important Properties of $\rightsquigarrow$}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	612	First we prove some basic facts
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	613	about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	614	$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	615	which will be needed later.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	616	The inference rules (\ref{rrewriteRules}) we
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	617	gave in the previous section
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	618	have their ``many-steps version'':
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	619
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	620	\begin{lemma}\label{squig1}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	621	\hspace{0em}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	622	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	623	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	624	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{}{\rightsquigarrow} _{bs} \sum rs_2$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	625	\item
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	626	$r \rightsquigarrow^* r' \implies _{bs} \sum (r :: rs)\; \rightsquigarrow^*\; _{bs} \sum (r' :: rs)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	627
826af400b068 more chap4 Chengsong parents: 585 diff changeset	628	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	629	The rewriting in many steps property is composable
826af400b068 more chap4 Chengsong parents: 585 diff changeset	630	in terms of the sequence constructor:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	631	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	632	\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	633	_{bs} r_2 \cdot r_3 \quad $
826af400b068 more chap4 Chengsong parents: 585 diff changeset	634	and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	635	$\quad r_3 \rightsquigarrow^* r_4
826af400b068 more chap4 Chengsong parents: 585 diff changeset	636	\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* _{bs} \; r_1 \cdot r_4$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	637	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	638	Rewritability with respect to
826af400b068 more chap4 Chengsong parents: 585 diff changeset	639	$\stackrel{}{\rightsquigarrow}$ and $\stackrel{s}{\rightsquigarrow}$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	640	is preserved under the function $\fuse$:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	641	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	642	\implies \fuse \; bs \; r_1 \rightsquigarrow^* \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	643	\fuse \; bs \; r_2 \quad $ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	644	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	645	\implies \map \; (\fuse \; bs) \; rs_1
826af400b068 more chap4 Chengsong parents: 585 diff changeset	646	\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	647	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	648	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	649	\begin{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	650	By an induction on
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	651	the inductive cases of $\stackrel{s}{\rightsquigarrow}$ and $\rightsquigarrow^*$ respectively.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	652	The third and fourth points are
826af400b068 more chap4 Chengsong parents: 585 diff changeset	653	by the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow \fuse \; bs \; r_2$ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	654	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3
826af400b068 more chap4 Chengsong parents: 585 diff changeset	655	\implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	656	which can be inductively proven by the inductive cases of $\rightsquigarrow$ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	657	$\stackrel{s}{\rightsquigarrow}$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	658	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	659	\noindent
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	660	The inference rules of $\stackrel{s}{\rightsquigarrow}$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	661	are defined in terms of the list cons operation. Here
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	662	we establish that the
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	663	$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	664	relations are also preserved w.r.t.\ appending and prepending of a list.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	665	In addition, we
826af400b068 more chap4 Chengsong parents: 585 diff changeset	666	also prove some relations
826af400b068 more chap4 Chengsong parents: 585 diff changeset	667	between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	668	\begin{lemma}\label{ssgqTossgs}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	669	\hspace{0em}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	670	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	671	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	672	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	673
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	674	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	675	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	676	rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	677	\textit{and} \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	678	rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	679
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	680	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	681	The $\stackrel{s}{\rightsquigarrow} $ relation after appending
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	682	a list becomes $\stackrel{s*}{\rightsquigarrow}$:\\
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	683	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	684	\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	685	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	686
826af400b068 more chap4 Chengsong parents: 585 diff changeset	687	$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	688	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	689
826af400b068 more chap4 Chengsong parents: 585 diff changeset	690	$rs_3 \stackrel{s}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	691	r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	692	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	693	If we can rewrite a regular expression
826af400b068 more chap4 Chengsong parents: 585 diff changeset	694	in many steps to $\ZERO$, then
826af400b068 more chap4 Chengsong parents: 585 diff changeset	695	we can also rewrite any sequence that has it as its first component to $\ZERO$:\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	696	$r_1 \rightsquigarrow^* \ZERO
826af400b068 more chap4 Chengsong parents: 585 diff changeset	697	\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	698	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	699	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	700	\begin{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	701	The first part is by induction on the list $rs$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	702	The second part is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$.
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	703	The third part is
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	704	by rule induction of $\stackrel{s}{\rightsquigarrow}$.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	705	The fourth part is
826af400b068 more chap4 Chengsong parents: 585 diff changeset	706	by rule induction on
826af400b068 more chap4 Chengsong parents: 585 diff changeset	707	$\stackrel{s*}{\rightsquigarrow}$ and using parts one to three.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	708	The fifth part is a corollary of part four.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	709	The last part is proven by rule induction again on $\rightsquigarrow^*$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	710	\end{proof}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	711	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	712	Now we are ready to prove the following properties:
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	713	\begin{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	714	\item
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	715	$(r \rightsquigarrow^* r'\land \bnullable \; r)
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	716	\implies \bmkeps \; r = \bmkeps \; r'$. \\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	717	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	718	$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	719	\item
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	720	$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	721	\end{itemize}
4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	722	Together, these properties establish the correctness theorem.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	723	\subsubsection{Property 1: $(r \rightsquigarrow^* r'\land \bnullable \; r)
826af400b068 more chap4 Chengsong parents: 585 diff changeset	724	\implies \bmkeps \; r = \bmkeps \; r'$}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	725	Intuitively, this property says we can
826af400b068 more chap4 Chengsong parents: 585 diff changeset	726	extract the same bitcodes using $\bmkeps$ from the nullable
826af400b068 more chap4 Chengsong parents: 585 diff changeset	727	components of two regular expressions $r$ and $r'$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	728	if we can rewrite from one to the other in finitely
826af400b068 more chap4 Chengsong parents: 585 diff changeset	729	many steps.\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	730	For convenience,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	731	we define a predicate for a list of regular expressions
826af400b068 more chap4 Chengsong parents: 585 diff changeset	732	having at least one nullable regular expression:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	733	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	734	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	735	\end{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	736	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	737	The rewriting relation $\rightsquigarrow$ preserves nullability:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	738	\begin{lemma}\label{rewritesBnullable}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	739	\hspace{0em}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	740	\begin{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	741	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	742	$\text{If} \; r_1 \rightsquigarrow r_2, \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	743	\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	744	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	745	$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	746	\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	747	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	748	$r_1 \rightsquigarrow^* r_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	749	\implies \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	750	\end{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	751	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	752	\begin{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	753	By rule induction of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	754	The third point is a corollary of the first.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	755	\end{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	756	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	757	For convenience again,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	758	we define $\bmkepss$ on a list $rs$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	759	which extracts the bitcodes from the first $\bnullable$ element in $rs$:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	760	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	761	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	762	$\bmkepss \; [] $ & $\dn$ & $[]$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	763	$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	764	\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	765	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	766	\end{center}
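\noindent
As a small worked example (a sketch that assumes the definitions from the earlier chapter, namely that $\ZERO$ is not $\bnullable$, and that an annotated one, written here as $_{bs}\mathbf{1}$, is $\bnullable$ with $\bmkeps \; (_{bs}\mathbf{1}) = bs$), $\bmkepss$ skips the non-nullable head and returns the bits of the first nullable element only:
\begin{center}
\begin{tabular}{lcl}
$\bmkepss \; [\ZERO, \; _{bs_1}\mathbf{1}, \; _{bs_2}\mathbf{1}]$ & $=$ & $\bmkepss \; [\,_{bs_1}\mathbf{1}, \; _{bs_2}\mathbf{1}]$\\
& $=$ & $\bmkeps \; (_{bs_1}\mathbf{1})$\\
& $=$ & $bs_1$
\end{tabular}
\end{center}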
826af400b068 more chap4 Chengsong parents: 585 diff changeset	767	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	768	If both regular expressions in a rewriting relation are nullable, then they
826af400b068 more chap4 Chengsong parents: 585 diff changeset	769	produce the same bitcodes:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	770	\begin{lemma}\label{rewriteBmkepsAux}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	771	\hspace{0em}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	772	\begin{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	773	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	774	$r_1 \rightsquigarrow r_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	775	(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
826af400b068 more chap4 Chengsong parents: 585 diff changeset	776	\bmkeps \; r_2)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	777	\item
826af400b068 more chap4 Chengsong parents: 585 diff changeset	778	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2
826af400b068 more chap4 Chengsong parents: 585 diff changeset	779	\implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	780	(\bnullables \; rs_1 \land \bnullables \; rs_2 \implies
826af400b068 more chap4 Chengsong parents: 585 diff changeset	781	\bmkepss \; rs_1 = \bmkepss \; rs_2)$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	782	\end{itemize}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	783	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	784	\begin{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	785	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	786	\end{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	787	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	788	With lemma \ref{rewriteBmkepsAux} we are ready to prove its
826af400b068 more chap4 Chengsong parents: 585 diff changeset	789	many-step version:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	790	\begin{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	791	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
826af400b068 more chap4 Chengsong parents: 585 diff changeset	792	\end{lemma}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	793	\begin{proof}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	794	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	795	Lemma \ref{rewritesBnullable} tells us both $r$ and $r'$ are nullable.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	796	Lemma \ref{rewriteBmkepsAux} solves the inductive case.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	797	\end{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	798
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	799	\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	800	Now we get to the ``meaty'' part of the proof,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	801	which says that our simplification's helper functions
826af400b068 more chap4 Chengsong parents: 585 diff changeset	802	such as $\distinctBy$ and $\flts$ conform to
826af400b068 more chap4 Chengsong parents: 585 diff changeset	803	the $\stackrel{s*}{\rightsquigarrow}$ and
826af400b068 more chap4 Chengsong parents: 585 diff changeset	804	$\rightsquigarrow^* $ rewriting relations.\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	805	The first lemma to prove is a more general version of
826af400b068 more chap4 Chengsong parents: 585 diff changeset	806	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	807	\begin{lemma}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	808	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	809	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	810	\end{lemma}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	811	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	812	It says that for a list made of two parts $rs_1 @ rs_2$,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	813	one can throw away the duplicate
826af400b068 more chap4 Chengsong parents: 585 diff changeset	814	elements in $rs_2$, as well as those that have appeared in $rs_1$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	815	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	816	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	817	\end{proof}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	818	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	819	Setting $rs_1$ to be the empty list (and renaming $rs_2$ to $rs_1$),
826af400b068 more chap4 Chengsong parents: 585 diff changeset	820	we get the corollary
826af400b068 more chap4 Chengsong parents: 585 diff changeset	821	\begin{corollary}\label{dBPreserves}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	822	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$.
826af400b068 more chap4 Chengsong parents: 585 diff changeset	823	\end{corollary}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	824	\noindent
826af400b068 more chap4 Chengsong parents: 585 diff changeset	825	The flatten function $\flts$ conforms to
826af400b068 more chap4 Chengsong parents: 585 diff changeset	826	$\stackrel{s*}{\rightsquigarrow}$ as well:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	827
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	828	\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	829	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	830	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	831	\begin{proof}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	832	By an induction on $rs$.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	833	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	834	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	835	The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	836	\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	837	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	838	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	839	\noindent
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	840	The simplification function
826af400b068 more chap4 Chengsong parents: 585 diff changeset	841	$\textit{bsimp}$ only transforms the regex $r$ using steps specified by
826af400b068 more chap4 Chengsong parents: 585 diff changeset	842	$\rightsquigarrow^*$ and nothing else.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	843	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	844	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	845	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	846	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	847	By an induction on $r$.
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	848	The most involved case is the alternative case,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	849	where we use lemmas \ref{bsimpaltsPreserves},
826af400b068 more chap4 Chengsong parents: 585 diff changeset	850	\ref{fltsPreserves} and \ref{dBPreserves} to do a series of rewriting steps:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	851	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	852	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	853	$rs$ & $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	854	& $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	855	& $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	856	(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
826af400b068 more chap4 Chengsong parents: 585 diff changeset	857	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	858	\end{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	859	Using this we derive the following rewrite relation:
826af400b068 more chap4 Chengsong parents: 585 diff changeset	860	\begin{center}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	861	\begin{tabular}{lcl}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	862	$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	863	& $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	864	& $\rightsquigarrow^*$ & $\ldots$ \\ [1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	865	& $\rightsquigarrow^*$ & $\bsimpalts \; bs \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	866	(\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs))
826af400b068 more chap4 Chengsong parents: 585 diff changeset	867	\; \rerases \; \phi)$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	868	%& $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	869	%(\flts \; (\map \; \textit{bsimp}\; rs)) \; \;
826af400b068 more chap4 Chengsong parents: 585 diff changeset	870	%\rerases \; \;\phi) $\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	871	& $=$ & $\textit{bsimp} \; r$\\[1.5ex]
826af400b068 more chap4 Chengsong parents: 585 diff changeset	872	\end{tabular}
826af400b068 more chap4 Chengsong parents: 585 diff changeset	873	\end{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	874	\end{proof}
585 4969ef817d92 chap4 more Chengsong parents: 584 diff changeset	875	\subsubsection{Property 3: $r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \rightsquigarrow^* r_2 \backslash c$}
586 826af400b068 more chap4 Chengsong parents: 585 diff changeset	876	The rewritability relation
826af400b068 more chap4 Chengsong parents: 585 diff changeset	877	$\rightsquigarrow$ is preserved under derivatives,
826af400b068 more chap4 Chengsong parents: 585 diff changeset	878	although we might need multiple steps
588 80e1114d6421 data Chengsong parents: 586 diff changeset	879	where originally only one step was needed:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	880	\begin{lemma}\label{rewriteBder}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	881	\hspace{0em}
80e1114d6421 data Chengsong parents: 586 diff changeset	882	\begin{itemize}
80e1114d6421 data Chengsong parents: 586 diff changeset	883	\item
80e1114d6421 data Chengsong parents: 586 diff changeset	884	If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c
80e1114d6421 data Chengsong parents: 586 diff changeset	885	\rightsquigarrow^* r_2 \backslash c$
80e1114d6421 data Chengsong parents: 586 diff changeset	886	\item
80e1114d6421 data Chengsong parents: 586 diff changeset	887	If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $
80e1114d6421 data Chengsong parents: 586 diff changeset	888	\map \; (\_\backslash c) \; rs_1
80e1114d6421 data Chengsong parents: 586 diff changeset	889	\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
80e1114d6421 data Chengsong parents: 586 diff changeset	890	\end{itemize}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	891	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	892	\begin{proof}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	893	By induction on $\rightsquigarrow$
80e1114d6421 data Chengsong parents: 586 diff changeset	894	and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
80e1114d6421 data Chengsong parents: 586 diff changeset	895	\end{proof}
80e1114d6421 data Chengsong parents: 586 diff changeset	896	\noindent
80e1114d6421 data Chengsong parents: 586 diff changeset	897	Now we can prove property 3, as an immediate corollary:
80e1114d6421 data Chengsong parents: 586 diff changeset	898	\begin{corollary}\label{rewritesBder}
80e1114d6421 data Chengsong parents: 586 diff changeset	899	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*
80e1114d6421 data Chengsong parents: 586 diff changeset	900	r_2 \backslash c$
80e1114d6421 data Chengsong parents: 586 diff changeset	901	\end{corollary}
80e1114d6421 data Chengsong parents: 586 diff changeset	902	\begin{proof}
80e1114d6421 data Chengsong parents: 586 diff changeset	903	By rule induction on $\stackrel{*}{\rightsquigarrow} $, using lemma \ref{rewriteBder}.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	904	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	905	\noindent
This can be extended from single characters to strings and combined with
$r \rightsquigarrow^* \textit{bsimp} \; r$
to show that the intermediate derivatives of $\blexer$
rewrite to those of $\blexersimp$:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	910	\begin{lemma}\label{bderBderssimp}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	911	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	912	\end{lemma}
588 80e1114d6421 data Chengsong parents: 586 diff changeset	913	\begin{proof}
80e1114d6421 data Chengsong parents: 586 diff changeset	914	By an induction on $s$.
80e1114d6421 data Chengsong parents: 586 diff changeset	915	\end{proof}
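\noindent
The induction step can be sketched as follows. The sketch assumes that the
string is extended at the end (reverse induction) and that
$\bderssimp{a}{(s @ [c])}$ unfolds to
$\textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$; both are assumptions
about how the earlier definitions are stated, not facts taken from this section.
\begin{center}
\begin{tabular}{lcll}
$a \backslash (s @ [c])$ & $=$ & $(a \backslash s) \backslash c$ & \\
 & $\rightsquigarrow^*$ & $(\bderssimp{a}{s}) \backslash c$ & (induction hypothesis and Corollary~\ref{rewritesBder})\\
 & $\rightsquigarrow^*$ & $\textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$ & (since $r \rightsquigarrow^* \textit{bsimp} \; r$)\\
 & $=$ & $\bderssimp{a}{(s @ [c])}$ &
\end{tabular}
\end{center}
The two rewriting steps compose because $\rightsquigarrow^*$ is transitive.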
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	916	\subsection{Main Theorem}
With Lemma~\ref{bderBderssimp} we are ready for the main theorem.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	918	\begin{theorem}
$\blexer \; r \; s = \blexersimp \; r \; s$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	920	\end{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	921	\noindent
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	922	\begin{proof}
By Lemma~\ref{bderBderssimp}, the derivative computed by the original lexer
rewrites in many steps to the derivative computed by
the lexer with simplification:
\begin{center}
$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
\end{center}
Since rewriting preserves the bits extracted by $\bmkeps$, both give out the same bits if the lexing result is a match:
80e1114d6421 data Chengsong parents: 586 diff changeset	930	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	931	$\bnullable \; (a \backslash s)
80e1114d6421 data Chengsong parents: 586 diff changeset	932	\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
80e1114d6421 data Chengsong parents: 586 diff changeset	933	\end{center}
Because the bits are the same, decoding them gives the same value:
80e1114d6421 data Chengsong parents: 586 diff changeset	935	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	936	$\bnullable \; (a \backslash s)
80e1114d6421 data Chengsong parents: 586 diff changeset	937	\implies \decode \; r \; (\bmkeps \; (a \backslash s)) =
80e1114d6421 data Chengsong parents: 586 diff changeset	938	\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
80e1114d6421 data Chengsong parents: 586 diff changeset	939	\end{center}
This is equivalent to our proof goal:
80e1114d6421 data Chengsong parents: 586 diff changeset	941	\begin{center}
80e1114d6421 data Chengsong parents: 586 diff changeset	942	$\blexer \; r \; s = \blexersimp \; r \; s$.
80e1114d6421 data Chengsong parents: 586 diff changeset	943	\end{center}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	944	\end{proof}
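\noindent
To see why the last step of the proof is an equivalence, it may help to unfold both lexers.
The shapes below are assumed from the earlier definitions, with
$a$ standing for the annotated version of $r$; the exact
notation used there may differ.
\begin{center}
\begin{tabular}{lcl}
$\blexer \; r \; s$ & $=$ & $\textit{if} \; \bnullable \; (a \backslash s)$\\
 & & $\textit{then} \; \decode \; r \; (\bmkeps \; (a \backslash s)) \; \textit{else} \; \textit{None}$\\
$\blexersimp \; r \; s$ & $=$ & $\textit{if} \; \bnullable \; (\bderssimp{a}{s})$\\
 & & $\textit{then} \; \decode \; r \; (\bmkeps \; (\bderssimp{a}{s})) \; \textit{else} \; \textit{None}$
\end{tabular}
\end{center}
Provided the two nullability tests agree (which is expected, since rewriting should not change nullability),
the \textit{else}-branches coincide and the \textit{then}-branches are equal by the implication shown in the proof.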
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	945	\noindent
We link this result with the lemma we proved earlier, namely
\begin{center}
$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$,
\end{center}
and obtain as a corollary that the bit-coded lexer with simplification
indeed outputs the POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	953	\begin{corollary}
$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = v$
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	955	\end{corollary}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	956
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	957	\subsection{Comments on the Proof Techniques Used}
Straightforward and simple as the proof may seem,
the effort we spent obtaining it was far from trivial.\\
We initially attempted to re-use the argument
in \cref{flex_retrieve}.
The problem was that both functions $\inj$ and $\retrieve$ require
the annotated regular expressions to stay unsimplified,
so that one can
correctly compare $v_{i+1}$, $r_i$ and $v_i$
in diagram~\ref{graph:inj} and
``fit the key into the lock hole''.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	968
589 86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	969	\noindent
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	970	We also tried to prove
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	971	\begin{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	972	$\textit{bsimp} \;\; (\bderssimp{a}{s}) =
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	973	\textit{bsimp} \;\; (a\backslash s)$,
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	974	\end{center}
but this turns out not to hold.
A counterexample is
\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\;
\text{and} \;\; s = bb.
\]
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	980	\noindent
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	981	Then we would have
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	982	\begin{center}
$\textit{bsimp}\;\; ( a \backslash s ) = {}_{[]}(_{ZZ}\ONE + _{ZS}c ) $
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	985	\end{center}
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	986	\noindent
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	987	whereas
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	988	\begin{center}
$\textit{bsimp} \;\;( \bderssimp{a}{s} ) = {}_{Z}(_{Z} \ONE + _{S} c)$.
86e0203db2da chap4 finished Chengsong parents: 588 diff changeset	991	\end{center}
Unfortunately,
applying $\textit{bsimp}$ at different stages of the computation
can always produce such a discrepancy.
This is because
the $\map \; (\fuse\; bs) \; as$ operation
then happens at different locations in the regular expression.\\
The rewriting relation
$\rightsquigarrow^*$
allows us to ignore this discrepancy
and treat the expressions
\begin{center}
$_{[]}(_{ZZ}\ONE + _{ZS}c ) $\\
and\\
$_{Z}(_{Z} \ONE + _{S} c)$
\end{center}
as equivalent, because both are rewritten
from the same expression.\\
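\noindent
A quick check on the two expressions above suggests that they do produce the same bits.
The calculation assumes the usual clauses
$\bmkeps \; (_{bs}\ONE) = bs$ and
$\bmkeps \; (_{bs}(a_1 + a_2)) = bs \,@\, \bmkeps \; a_1$ when $a_1$ is nullable,
as in the earlier definition of $\bmkeps$.
\begin{center}
\begin{tabular}{lclcl}
$\bmkeps \; (_{[]}(_{ZZ}\ONE + _{ZS}c))$ & $=$ & $[] \,@\, [Z, Z]$ & $=$ & $[Z, Z]$\\
$\bmkeps \; (_{Z}(_{Z}\ONE + _{S}c))$ & $=$ & $[Z] \,@\, [Z]$ & $=$ & $[Z, Z]$
\end{tabular}
\end{center}
The only difference is where the leading bit $Z$ sits: in the first expression it has been
pushed into the alternatives by $\fuse$, in the second it is still attached to the outer constructor.\\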
Having a correctness property is good,
but we would also like a guarantee that the lexer is not slow in
some sense, for example, that it does not grind to a halt, whatever the input.
As we have already seen, Sulzmann and Lu's simplification function
$\simpsulz$ cannot achieve this, because their claim that
the regular expression size does not grow arbitrarily large
is not true.
In the next chapter we shall prove that with our $\simp$,
for a given $r$, the size of the internal derivatives is always
bounded by a constant.

author	Chengsong
	Wed, 31 Aug 2022 23:57:42 +0100 (2022-08-31)
changeset 590	988e92a70704
parent 589	86e0203db2da
child 591	b2d0de6aee18
permissions	-rwxr-xr-x