lexing: ChengsongTanPhdThesis/Chapters/Bitcoded2.tex@1734bd5975a3 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4	\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
cc54ce075db5 restructured Chengsong parents: diff changeset	11
cc54ce075db5 restructured Chengsong parents: diff changeset	12
cc54ce075db5 restructured Chengsong parents: diff changeset	13
583 4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	14	In this chapter we introduce the simplifications
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	15	on annotated regular expressions that can be applied to
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	16	each intermediate derivative result. This allows
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	17	us to make $\blexer$ much more efficient.
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	18	We contrast this simplification function
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	19	with Sulzmann and Lu's original
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	20	simplifications, indicating the simplicity of our algorithm and
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	21	improvements we made, demostrating
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	22	the usefulness and reliability of formal proofs on algorithms.
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	23	These ``aggressive'' simplifications would not be possible in the injection-based
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	24	lexing we introduced in chapter \ref{Inj}.
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	25	We then go on to prove the correctness with the improved version of
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	26	$\blexer$, called $\blexersimp$, by establishing
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	27	$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	28
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	29	\section{Simplification for Annotated Regular Expressions}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	30	The first thing we notice in the fast growth of examples such as $(a^a^)^*$'s
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	31	and $(a^* + (aa)^)^$'s derivatives is that a lot of duplicated sub-patterns
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	32	are scattered around different levels, and therefore requires
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	33	de-duplication at different levels:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	34	\begin{center}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	35	$(a^a^)^* \rightarrow (a^a^ + a^)\cdot(a^a^)^ \rightarrow $\\
583 4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	36	$((a^a^ + a^) + a^)\cdot(a^a^)^* + (a^a^ + a^)\cdot(a^a^)^ \rightarrow \ldots$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	37	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	38	\noindent
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	39	As we have already mentioned in \ref{eqn:growth2},
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	40	a simple-minded simplification function cannot simplify
583 4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	41	\begin{center}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	42	$((a^a^ + a^) + a^)\cdot(a^a^)^* + (a^a^ + a^)\cdot(a^a^)^$
583 4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	43	\end{center}
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	44	any further, one would expect a better simplification function to work in the
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	45	following way:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	46	\begin{gather*}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	47	((a^a^ + \underbrace{a^}_\text{A})+\underbrace{a^}_\text{duplicate of A})\cdot(a^a^)^* +
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	48	\underbrace{(a^a^ + a^)\cdot(a^a^)^}_\text{further simp removes this}.\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	49	\bigg\downarrow \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	50	(a^a^ + a^*
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	51	\color{gray} + a^* \color{black})\cdot(a^a^)^* +
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	52	\underbrace{(a^a^ + a^)\cdot(a^a^)^}_\text{further simp removes this} \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	53	\bigg\downarrow \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	54	(a^a^ + a^*
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	55	)\cdot(a^a^)^*
583 4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	56	\color{gray} + (a^a^ + a^) \cdot(a^a^)^\\
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	57	\bigg\downarrow \\
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	58	(a^a^ + a^*
4aabb0629e4b chap4 Chengsong parents: 582 diff changeset	59	)\cdot(a^a^)^*
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	60	\end{gather*}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	61	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	62	This motivating example came from testing Sulzmann and Lu's
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	63	algorithm: their simplification does
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	64	not work!
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	65	We quote their $\textit{simp}$ function verbatim here:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	66	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	67	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	68	$\simp\; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	69	$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	70	& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	71	$\simp\;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	72	\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	73	\textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	74	& & $\textit{else}\;\;_{bs}((\simp\;r_1)\cdot
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	75	(\simp\; r_2))$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	76	$\simp \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	77	$\simp \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	78	$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	79	$\simp \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simp \; r)$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	80	$\simp \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	81	(\nub \; (\filter \; (\not \circ \zeroable)\;((\simp \; r) :: \map \; \simp \; rs)))$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	82
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	83	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	84	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	85	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	86	the $\textit{zeroable}$ predicate
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	87	which tests whether the regular expression
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	88	is equivalent to $\ZERO$,
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	89	is defined as:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	90	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	91	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	92	$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	93	\zeroable \;_{[]}\sum\;rs $\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	94	$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	95	$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	96	$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	97	$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	98	$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	99	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	100	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	101	\noindent
584 1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	102	They suggested that the $\simp$ function should be
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	103	applied repeatedly until a fixpoint is reached.
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	104	We implemented the algorithm as required in Scala,
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	105	and found that final derivative regular expression
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	106	size grows exponentially fast:
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	107	\begin{center}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	108	\begin{figure}[H]
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	109	\begin{tikzpicture}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	110	\begin{axis}[
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	111	xlabel={$n$},
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	112	ylabel={time in secs},
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	113	ymode = log,
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	114	legend entries={Sulzmann and Lu's $\simp$},
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	115	legend pos=north west,
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	116	legend cell align=left]
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	117	\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	118	\end{axis}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	119	\end{tikzpicture}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	120	\caption{Matching the regular expression $(a^a^)^*$ against strings of the form
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	121	$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	122	$ using Sulzmann and Lu's lexer algorithm}\label{SulzmannLuLexer}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	123	\end{figure}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	124	\end{center}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	125	\noindent
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	126	At $n= 20$ we already get an out of memory error with Scala's normal
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	127	JVM heap size settings,
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	128	which seems a counterexample for their linear complexity claim:
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	129	\begin{quote}\it
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	130	Linear-Time Complexity Claim It is easy to see that each call of one of the functions/operations\b,simp,fuse,mkEpsBCandisPhi leadstosubcallswhose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. We yet need to work out detailed estimates regarding the space complexity of our algorithm.
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	131	\end{quote}
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	132
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	133
1734bd5975a3 chap4 nub Chengsong parents: 583 diff changeset	134
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	135	Despite that we have already implemented the simple-minded simplifications
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	136	such as throwing away useless $\ONE$s and $\ZERO$s.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	137	The simplification rule $r + r \rightarrow $ cannot make a difference either
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	138	since it only removes duplicates on the same level, not something like
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	139	$(r+a)+r \rightarrow r+a$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	140	This requires us to break up nested alternatives into lists, for example
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	141	using the flatten operation similar to the one defined for any function language's
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	142	list datatype:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	143
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	144	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	145	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	146	$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	147	(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	148	$\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	149	$\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	150	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	151	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	152
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	153	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	154	There is a minor difference though, in that our $\flts$ operation defined by us
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	155	also throws away $\ZERO$s.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	156	For a flattened list of regular expressions, a de-duplication can be done efficiently:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	157
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	158
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	159	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	160	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	161	$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	162	$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	163	& & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	164	& & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	165	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	166	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	167	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	168	The reason we define a distinct function under a mapping $f$ is because
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	169	we want to eliminate regular expressions that are the same syntactically,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	170	but having different bit-codes (more on the reason why we can do this later).
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	171	To achieve this, we call $\erase$ as the function $f$ during the distinction
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	172	operation.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	173	A recursive definition of our simplification function
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	174	that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	175
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	176	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	177	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	178
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	179	$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	180	$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	181	$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	182	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	183	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	184
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	185	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	186	The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	187	When it detected that the regular expression is an alternative or
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	188	sequence, it will try to simplify its children regular expressions
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	189	recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	190	$\ONE$, which might trigger further simplification at the current level.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	191	Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	192	using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	193	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	194	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	195	$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	196	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	197	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	198	&&$\quad\textit{case} \; (_{bs1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs@bs_1) \; a_2'$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	199	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	200	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	201	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	202	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	203	The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	204	the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	205	and then call $\distinctBy$ on that list, the predicate determining whether two
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	206	elements are the same is $\erase \; r_1 = \erase\; r_2$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	207	Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	208	singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	209	decides whether to keep the current level constructor $\sum$ as it is, and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	210	removes it when there are less than two elements:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	211	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	212	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	213	$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	214	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	215	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	216	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	217	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	218
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	219	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	220	Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	221	we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	222	so it stays small:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	223	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	224	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	225	$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	226	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	227	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	228	Following previous notation of natural
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	229	extension from derivative w.r.t.~character to derivative
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	230	w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in the [] case?}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	231
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	232	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	233	\begin{tabular}{lcl}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	234	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	235	$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	236	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	237	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	238
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	239	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	240	Extracting bit-codes from the derivatives that had been simplified repeatedly after
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	241	each derivative run, the simplified $\blexer$, called $\blexersimp$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	242	is defined as
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	243	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	244	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	245	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	246	$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	247	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	248	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	249	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	250	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	251	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	252
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	253	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	254	This algorithm keeps the regular expression size small, for example,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	255	with this simplification our previous $(a + aa)^*$ example's fast growth (over
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	256	$10^5$ nodes at around $20$ input length)
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	257	will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	258	input string is.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	259	We show some graphs to better demonstrate this imrpovement.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	260
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	261
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	262	\section{$(a+aa)^$ and $(a^\cdot a^)^$ against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	263	For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	264	at only around $15$ input characters:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	265	\begin{center}
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	266	\begin{tabular}{ll}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	267	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	268	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	269	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	270	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	271	width=7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	272	height=4cm,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	273	legend entries={Lexer with $\textit{bsimp}$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	274	legend pos= south east,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	275	legend cell align=left]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	276	\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	277	\end{axis}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	278	\end{tikzpicture} %\label{fig:BitcodedLexer}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	279	&
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	280	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	281	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	282	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	283	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	284	width = 7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	285	height = 4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	286	legend entries={Lexer without $\textit{bsimp}$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	287	legend pos= north west,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	288	legend cell align=left]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	289	\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	290	\end{axis}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	291	\end{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	292	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	293	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	294	And for $(a^\cdot a^)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	295	simplification rules (we put the graphs together to show the contrast)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	296	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	297	\begin{tabular}{ll}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	298	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	299	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	300	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	301	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	302	width=7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	303	height=4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	304	legend entries={Lexer with $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	305	legend pos= south east,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	306	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	307	\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	308	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	309	\end{tikzpicture} %\label{fig:BitcodedLexer}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	310	&
7cf9f17aa179 more Chengsong parents: 538 diff changeset	311	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	312	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	313	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	314	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	315	width = 7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	316	height = 4cm,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	317	legend entries={Lexer without $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	318	legend pos= north west,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	319	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	320	\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	321	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	322	\end{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	323	\end{tabular}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	324	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	325
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	326	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	327	% SECTION rewrite relation
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	328	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	329	\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	330	In the $\blexer$'s correctness proof, we
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	331	did not directly derive the fact that $\blexer$ gives out the POSIX value,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	332	but first prove that $\blexer$ is linked with $\lexer$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	333	Then we re-use
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	334	the correctness of $\lexer$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	335	Here we follow suit by first proving that
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	336	$\blexersimp \; r \; s $
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	337	produces the same output as $\blexer \; r\; s$,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	338	and then putting it together with
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	339	$\blexer$'s correctness result
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	340	($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	341	to obtain half of the correctness property (the other half
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	342	being when $s$ is not $L \; r$ which is routine to establish)
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	343	\begin{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	344	$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	345	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	346	\noindent
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	347	because it makes the proof simpler
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	348	The overall idea for the correctness of our simplified bitcoded lexer
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	349	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	350	is that the $\textit{bsimp}$ will not change the regular expressions so much that
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	351	it becomes impossible to extract the $\POSIX$ values.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	352	To capture this "similarity" between unsimplified regular expressions and simplified ones,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	353	we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	354
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	355	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	356	\begin{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	357	\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	358
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	359	\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	360
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	361	\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	362
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	363
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	364
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	365	\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	366
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	367	\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	368
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	369	\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	370
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	371	\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	372
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	373	\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	374
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	375	\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	376
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	377	\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	378
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	379	\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	380
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	381	\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	382
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	383	\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	384
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	385	\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	386
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	387	\end{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	388	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	389	These "rewrite" steps define the atomic simplifications we could impose on regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	390	under our simplification algorithm.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	391	For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	392	a list of regular exression to another list.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	393	The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	394	which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	395
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	396	Usually more than one steps are taking place during the simplification of a regular expression,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	397	therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	398	relations:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	399
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	400	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	401	\begin{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	402	\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	403	\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	404	\inferrule{\\r_1 \stackrel{}{\rightsquigarrow} r_2 \land \; r_2 \stackrel{}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	405	\inferrule{\\rs_1 \stackrel{}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	406	\end{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	407	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	408	Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	409	three properties about how these relations connect to $\blexersimp$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	410	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	411	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	412	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	413	The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	414	$\stackrel{*}{\rightsquigarrow}$ and nothing else.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	415	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	416	$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	417	The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	418	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	419	$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	420	expression in finitely many atomic simplification steps, then these two regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	421	will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	422
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	423	\end{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	424	\section{Three Important properties}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	425	These properties would work together towards the correctness theorem.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	426	We start proving each of these lemmas below.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	427	\subsection{$(r \stackrel{}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{}{\rightsquigarrow} \bsimp{r}$}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	428	The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	429	and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	430	$\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	431	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	432	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{}{\rightsquigarrow} _{bs} \sum rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	433	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	434	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	435	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	436	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	437	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	438	$r \stackrel{}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{}{\rightsquigarrow} _{bs} \sum r' :: rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	439	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	440	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	441	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	442	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	443	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	444	Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a list:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	445	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	446	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	447	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	448	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	449	By induction on the list $rs$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	450	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	451
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	452	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	453	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	454	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	455	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	456	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	457	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	458
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	459	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	460	The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	461	\begin{lemma}\label{ssgqTossgs}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	462	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	463	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	464	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	465	By rule induction of $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	466	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	467	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	468	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	469	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	470	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	471	By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	472	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	473	Here are two lemmas relating $\stackrel{}{\rightsquigarrow}$ and $\stackrel{s}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	474	\begin{lemma}\label{singleton}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	475	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s}{\rightsquigarrow} [r_2]$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	476	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	477	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	478	By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	479	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	480	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	481	$rs_3 \stackrel{s}{\rightsquigarrow} rs_4 \land r_1 \stackrel{}{\rightsquigarrow} r_2 \implies
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	482	r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	483	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	484	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	485	By using \ref{singleton}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	486	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	487	Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	488	$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	489	The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	490	elements in $rs_2$, as well as those that have appeared in $rs_1$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	491	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	492	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\; \; (\map\;\; \rerase{\_}\; \; rs_1)))$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	493	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	494	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	495	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	496	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	497	The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	498	\begin{lemma}\label{dBPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	499	$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	500	\end{lemma}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	501
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	502
532 cc54ce075db5 restructured Chengsong parents: diff changeset	503
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	504	The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	505	\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	506	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	507	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	508
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	509	The rewriting in many steps property is composible in terms of regular expression constructors:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	510	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	511	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{}{\rightsquigarrow} \; _{bs} r_2 \cdot r_3 \quad $ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	512	$r_3 \stackrel{}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	513	\end{lemma}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	514
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	515	The rewriting in many steps properties $\stackrel{}{\rightsquigarrow}$ and $\stackrel{s}{\rightsquigarrow}$ is preserved under the function $\fuse$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	516	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	517	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	518	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	519	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	520	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	521	By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	522	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	523	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	524	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	525	If we could rewrite a regular expression in many steps to $\ZERO$, then
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	526	we could also rewrite any sequence containing it to $\ZERO$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	527	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	528	$r_1 \stackrel{}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{}{\rightsquigarrow} \ZERO$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	529	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	530	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	531	$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	532	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	533	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	534	The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	535	\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	536	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	537	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	538	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	539	Before we give out the next lemmas, we define a predicate for a list of regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	540	having at least one nullable regular expressions:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	541	\begin{definition}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	542	$\textit{bnullables} \; rs \dn \exists r \in rs. \bnullable r$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	543	\end{definition}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	544	The rewriting relation $\rightsquigarrow$ preserves nullability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	545	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	546	$r_1 \rightsquigarrow r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	547	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	548	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	549	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	550	By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	551	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	552	So does the many steps rewriting:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	553	\begin{lemma}\label{rewritesBnullable}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	554	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	555	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	556	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	557	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	558	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	559	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	560	And if both regular expressions in a rewriting relation are nullable, then they
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	561	produce the same bit-codes:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	562
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	563	\begin{lemma}\label{rewriteBmkepsAux}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	564	$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	565	\bmkeps \; r_2)$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	566	$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	567	\bmkepss \; rs_1 = \bmkepss \; rs2)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	568	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	569	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	570	The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	571	is $bnullable$:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	572	\begin{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	573	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	574	$\bmkepss \; [] $ & $\dn$ & $[]$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	575	$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	576	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	577	\end{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	578	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	579	And now we are ready to prove the key property that if you
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	580	have two regular expressions, one rewritable in many steps to the other,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	581	and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	582	\begin{lemma}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	583	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	584	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	585	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	586	By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	587	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	588	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	589	the other property is also ready:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	590	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	591	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	592	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	593	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	594	By an induction on $r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	595	The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	596	$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	597	$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	598	$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	599	Then we could do the following regular expresssion many steps rewrite:\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	600	$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	601	\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	602
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	603	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	604	\section{Proof for the Property: $r_1 \stackrel{}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{}{\rightsquigarrow} r_2 \backslash c$}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	605	The first thing we prove is that if we could rewrite in one step, then after derivative
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	606	we could rewrite in many steps:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	607	\begin{lemma}\label{rewriteBder}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	608	$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	609	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	610	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	611	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	612	By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	613	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	614	Now we can prove that once we could rewrite from one expression to another in many steps,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	615	then after a derivative on both sides we could still rewrite one to another in many steps:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	616	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	617	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{}{\rightsquigarrow} r_2 \backslash c$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	618	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	619	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	620	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	621	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	622	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	623	This can be extended and combined with the previous two important properties
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	624	so that a regular expression's successivve derivatives can be rewritten in many steps
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	625	to its simplified counterpart:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	626	\begin{lemma}\label{bderBderssimp}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	627	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	628	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	629	\subsection{Main Theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	630	Now with \ref{bdersBderssimp} we are ready for the main theorem.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	631	To link $\blexersimp$ and $\blexer$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	632	we first say that they give out the same bits, if the lexing result is a match:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	633	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	634	$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	635	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	636	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	637	Now that they give out the same bits, we know that they give the same value after decoding,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	638	which we know is correct value as $\blexer$ is correct:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	639	\begin{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	640	$\blexer \; r \; s = \blexersimp{r}{s}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	641	\end{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	642	\noindent
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	643	\begin{proof}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	644	One can rewrite in many steps from the original lexer's derivative regular expressions to the
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	645	lexer with simplification applied:
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	646	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	647	If two regular expressions are rewritable, then they produce the same bits.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	648	Under the condition that $ r_1$ is nullable, we have
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	649	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	650	This proves the \emph{if} (interesting) branch of
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	651	$\blexer \; r \; s = \blexersimp{r}{s}$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	652
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	653	\end{proof}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	654	\noindent
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	655	As a corollary,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	656	we link this result with the lemma we proved earlier that
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	657	\begin{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	658	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	659	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	660	and obtain the corollary that the bit-coded lexer with simplification is
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	661	indeed correctly outputting POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	662	\begin{corollary}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	663	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	664	\end{corollary}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	665
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	666	\subsection{Comments on the Proof Techniques Used}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	667	The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured Chengsong parents: diff changeset	668	compared with not having simplification is that we can no longer use the argument
cc54ce075db5 restructured Chengsong parents: diff changeset	669	in \cref{flex_retrieve}.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	670	The function \retrieve needs the cumbersome structure of the (umsimplified)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	671	annotated regular expression to
532 cc54ce075db5 restructured Chengsong parents: diff changeset	672	agree with the structure of the value, but simplification will always mess with the
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	673	structure.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	674
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	675	We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	676	but this turns out to be not true, A counterexample of this being
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	677	\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	678	\]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	679
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	680	Then we would have $\bsimp{a \backslash s}$ being
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	681	$_{[]}(_{ZZ}\ONE + _{ZS}c ) $
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	682	whereas $\bsimp{\bderssimp{a}{s}}$ would be
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	683	$_{Z}(_{Z} \ONE + _{S} c)$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	684	Unfortunately if we apply $\textit{bsimp}$ at different
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	685	stages we will always have this discrepancy, due to
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	686	whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	687	is taken at some points will be entirely dependant on when the simplification
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	688	take place whether there is a larger alternative structure surrounding the
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	689	alternative being simplified.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	690	The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	691	us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	692	are taken, but simply say that they can be taken to make two similar
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	693	regular expressions equal, and can be done after interleaving derivatives
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	694	and simplifications.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	695
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	696
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	697	Having correctness property is good. But we would also like the lexer to be efficient in
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	698	some sense, for exampe, not grinding to a halt at certain cases.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	699	In the next chapter we shall prove that for a given $r$, the internal derivative size is always
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	700	finitely bounded by a constant.
582 3e19073e91f4 chap3 done Chengsong parents: 579 diff changeset	701	we would expect in the

author	Chengsong
	Tue, 23 Aug 2022 22:59:49 +0100
changeset 584	1734bd5975a3
parent 583	4aabb0629e4b
child 585	4969ef817d92
permissions	-rwxr-xr-x