lexing: ChengsongTanPhdThesis/Chapters/Bitcoded2.tex@35df9cdd36ca (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4	\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
cc54ce075db5 restructured Chengsong parents: diff changeset	11
cc54ce075db5 restructured Chengsong parents: diff changeset	12
cc54ce075db5 restructured Chengsong parents: diff changeset	13
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	14	Now we introduce the simplifications, which is why we introduce the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	15	bitcodes in the first place.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	16
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	17	\section{Simplification for Annotated Regular Expressions}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	18	The first thing we notice in the fast growth of examples such as $(a^a^)^*$'s
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	19	and $(a^* + (aa)^)^$'s derivatives is that a lot of duplicated sub-patterns
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	20	are scattered around different levels, and therefore requires
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	21	de-duplication at different levels:
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	22	\begin{center}
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	23	$(a^a^)^* \rightarrow (a^a^ + a^)\cdot(a^a^)^ \rightarrow $\\
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	24	$((a^a^ + a^) + a^)\cdot(a^a^)^* + (a^a^ + a^)\cdot(a^a^)^$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	25	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	26	\noindent
579 35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	27	As we have already mentioned in \ref{eqn:growth2},
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	28	a simple-minded simplification function cannot simplify
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	29	$((a^a^ + a^) + a^)\cdot(a^a^)^* + (a^a^ + a^)\cdot(a^a^)^$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	30	any further.
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	31	we would expect a better simplification function to work in the
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	32	following way:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	33	\begin{gather*}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	34	((a^a^ + \underbrace{a^}_\text{A})+\underbrace{a^}_\text{duplicate of A})\cdot(a^a^)^* +
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	35	\underbrace{(a^a^ + a^)\cdot(a^a^)^}_\text{further simp removes this}.\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	36	\bigg\downarrow \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	37	(a^a^ + a^*
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	38	\color{gray} + a^* \color{black})\cdot(a^a^)^* +
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	39	\underbrace{(a^a^ + a^)\cdot(a^a^)^}_\text{further simp removes this} \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	40	\bigg\downarrow \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	41	(a^a^ + a^*
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	42	)\cdot(a^a^)^*
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	43	\color{gray} + (a^a^ + a^) \cdot(a^a^)^
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	44	\end{gather*}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	45	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	46	This motivating example came from testing Sulzmann and Lu's
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	47	algorithm: their simplification does
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	48	not work!
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	49	We quote their $\textit{simp}$ function verbatim here:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	50	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	51	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	52	$\simp\; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	53	$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	54	& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	55	$\simp\;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	56	\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	57	\textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	58	& & $\textit{else}\;\;_{bs}((\simp\;r_1)\cdot
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	59	(\simp\; r_2))$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	60	$\simp \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	61	$\simp \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	62	$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	63	$\simp \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simp \; r)$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	64	$\simp \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	65	(\nub \; (\filter \; (\not \circ \zeroable)\;((\simp \; r) :: \map \; \simp \; rs)))$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	66
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	67	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	68	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	69	\noindent
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	70	the $\textit{zeroable}$ predicate
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	71	which tests whether the regular expression
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	72	is equivalent to $\ZERO$,
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	73	is defined as:
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	74	\begin{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	75	\begin{tabular}{lcl}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	76	$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	77	\zeroable \;_{[]}\sum\;rs $\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	78	$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	79	$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	80	$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	81	$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	82	$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	83	\end{tabular}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	84	\end{center}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	85	%\[
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	86	%\begin{aligned}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	87	%&\textit{isPhi} \left(b s @ r i^{*}\right)=\text { False } \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	88	%&\textit{isPhi}\left(b s @ r i_{1} r i_{2}\right)=\textit{isPhi} r i_{1} \vee \textit{isPhi} r i_{2} \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	89	%&\textit{isPhi} \left(b s @ r i_{1} \oplus r i_{2}\right)=\textit{isPhi} r i_{1} \wedge \textit{isPhi} r i_{2} \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	90	%&\textit{isPhi}(b s @ l)= False \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	91	%&\textit{isPhi}(b s @ \epsilon)= False \\
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	92	%&\textit{isPhi} \; \ZERO = True
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	93	%\end{aligned}
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	94	%\]
35df9cdd36ca more chap3 Chengsong parents: 576 diff changeset	95	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	96	Despite that we have already implemented the simple-minded simplifications
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	97	such as throwing away useless $\ONE$s and $\ZERO$s.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	98	The simplification rule $r + r \rightarrow $ cannot make a difference either
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	99	since it only removes duplicates on the same level, not something like
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	100	$(r+a)+r \rightarrow r+a$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	101	This requires us to break up nested alternatives into lists, for example
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	102	using the flatten operation similar to the one defined for any function language's
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	103	list datatype:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	104
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	105	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	106	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	107	$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	108	(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	109	$\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	110	$\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	111	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	112	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	113
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	114	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	115	There is a minor difference though, in that our $\flts$ operation defined by us
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	116	also throws away $\ZERO$s.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	117	For a flattened list of regular expressions, a de-duplication can be done efficiently:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	118
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	119
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	120	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	121	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	122	$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	123	$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	124	& & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	125	& & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	126	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	127	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	128	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	129	The reason we define a distinct function under a mapping $f$ is because
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	130	we want to eliminate regular expressions that are the same syntactically,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	131	but having different bit-codes (more on the reason why we can do this later).
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	132	To achieve this, we call $\erase$ as the function $f$ during the distinction
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	133	operation.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	134	A recursive definition of our simplification function
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	135	that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	136
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	137	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	138	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	139
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	140	$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	141	$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	142	$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	143	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	144	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	145
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	146	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	147	The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	148	When it detected that the regular expression is an alternative or
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	149	sequence, it will try to simplify its children regular expressions
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	150	recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	151	$\ONE$, which might trigger further simplification at the current level.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	152	Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	153	using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	154	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	155	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	156	$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	157	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	158	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	159	&&$\quad\textit{case} \; (_{bs1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs@bs_1) \; a_2'$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	160	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	161	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	162	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	163	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	164	The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	165	the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	166	and then call $\distinctBy$ on that list, the predicate determining whether two
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	167	elements are the same is $\erase \; r_1 = \erase\; r_2$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	168	Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	169	singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	170	decides whether to keep the current level constructor $\sum$ as it is, and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	171	removes it when there are less than two elements:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	172	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	173	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	174	$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	175	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	176	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	177	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	178	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	179
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	180	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	181	Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	182	we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	183	so it stays small:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	184	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	185	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	186	$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	187	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	188	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	189	Following previous notation of natural
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	190	extension from derivative w.r.t.~character to derivative
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	191	w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in the [] case?}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	192
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	193	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	194	\begin{tabular}{lcl}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	195	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	196	$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	197	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	198	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	199
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	200	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	201	Extracting bit-codes from the derivatives that had been simplified repeatedly after
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	202	each derivative run, the simplified $\blexer$, called $\blexersimp$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	203	is defined as
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	204	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	205	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	206	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	207	$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	208	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	209	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	210	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	211	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	212	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	213
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	214	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	215	This algorithm keeps the regular expression size small, for example,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	216	with this simplification our previous $(a + aa)^*$ example's fast growth (over
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	217	$10^5$ nodes at around $20$ input length)
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	218	will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	219	input string is.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	220	We show some graphs to better demonstrate this imrpovement.
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	221
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	222
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	223	\section{$(a+aa)^$ and $(a^\cdot a^)^$ against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	224	For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	225	at only around $15$ input characters:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	226	\begin{center}
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	227	\begin{tabular}{ll}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	228	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	229	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	230	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	231	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	232	width=7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	233	height=4cm,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	234	legend entries={Lexer with $\textit{bsimp}$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	235	legend pos= south east,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	236	legend cell align=left]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	237	\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	238	\end{axis}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	239	\end{tikzpicture} %\label{fig:BitcodedLexer}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	240	&
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	241	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	242	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	243	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	244	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	245	width = 7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	246	height = 4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	247	legend entries={Lexer without $\textit{bsimp}$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	248	legend pos= north west,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	249	legend cell align=left]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	250	\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	251	\end{axis}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	252	\end{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	253	\end{tabular}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	254	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	255	And for $(a^\cdot a^)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	256	simplification rules (we put the graphs together to show the contrast)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	257	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	258	\begin{tabular}{ll}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	259	\begin{tikzpicture}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	260	\begin{axis}[
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	261	xlabel={$n$},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	262	ylabel={derivative size},
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	263	width=7cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	264	height=4cm,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	265	legend entries={Lexer with $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	266	legend pos= south east,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	267	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	268	\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	269	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	270	\end{tikzpicture} %\label{fig:BitcodedLexer}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	271	&
7cf9f17aa179 more Chengsong parents: 538 diff changeset	272	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	273	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	274	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	275	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	276	width = 7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	277	height = 4cm,
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	278	legend entries={Lexer without $\textit{bsimp}$},
539 7cf9f17aa179 more Chengsong parents: 538 diff changeset	279	legend pos= north west,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	280	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	281	\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	282	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	283	\end{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	284	\end{tabular}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	285	\end{center}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	286
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	287	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	288	% SECTION rewrite relation
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	289	%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	290	\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	291	In the $\blexer$'s correctness proof, we
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	292	did not directly derive the fact that $\blexer$ gives out the POSIX value,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	293	but first prove that $\blexer$ is linked with $\lexer$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	294	Then we re-use
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	295	the correctness of $\lexer$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	296	Here we follow suit by first proving that
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	297	$\blexersimp \; r \; s $
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	298	produces the same output as $\blexer \; r\; s$,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	299	and then putting it together with
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	300	$\blexer$'s correctness result
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	301	($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	302	to obtain half of the correctness property (the other half
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	303	being when $s$ is not $L \; r$ which is routine to establish)
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	304	\begin{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	305	$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	306	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	307	\noindent
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	308	because it makes the proof simpler
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	309	The overall idea for the correctness of our simplified bitcoded lexer
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	310	\noindent
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	311	is that the $\textit{bsimp}$ will not change the regular expressions so much that
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	312	it becomes impossible to extract the $\POSIX$ values.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	313	To capture this "similarity" between unsimplified regular expressions and simplified ones,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	314	we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	315
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	316	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	317	\begin{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	318	\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	319
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	320	\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	321
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	322	\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	323
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	324
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	325
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	326	\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	327
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	328	\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	329
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	330	\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	331
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	332	\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	333
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	334	\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	335
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	336	\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	337
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	338	\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	339
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	340	\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	341
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	342	\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	343
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	344	\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	345
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	346	\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	347
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	348	\end{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	349	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	350	These "rewrite" steps define the atomic simplifications we could impose on regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	351	under our simplification algorithm.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	352	For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	353	a list of regular exression to another list.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	354	The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	355	which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	356
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	357	Usually more than one steps are taking place during the simplification of a regular expression,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	358	therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	359	relations:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	360
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	361	\begin{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	362	\begin{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	363	\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	364	\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	365	\inferrule{\\r_1 \stackrel{}{\rightsquigarrow} r_2 \land \; r_2 \stackrel{}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	366	\inferrule{\\rs_1 \stackrel{}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	367	\end{mathpar}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	368	\end{center}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	369	Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	370	three properties about how these relations connect to $\blexersimp$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	371	\begin{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	372	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	373	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	374	The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	375	$\stackrel{*}{\rightsquigarrow}$ and nothing else.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	376	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	377	$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	378	The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	379	\item
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	380	$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	381	expression in finitely many atomic simplification steps, then these two regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	382	will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	383
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	384	\end{itemize}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	385	\section{Three Important properties}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	386	These properties would work together towards the correctness theorem.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	387	We start proving each of these lemmas below.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	388	\subsection{$(r \stackrel{}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{}{\rightsquigarrow} \bsimp{r}$}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	389	The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	390	and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	391	$\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	392	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	393	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{}{\rightsquigarrow} _{bs} \sum rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	394	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	395	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	396	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	397	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	398	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	399	$r \stackrel{}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{}{\rightsquigarrow} _{bs} \sum r' :: rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	400	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	401	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	402	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	403	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	404	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	405	Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a list:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	406	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	407	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	408	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	409	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	410	By induction on the list $rs$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	411	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	412
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	413	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	414	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	415	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	416	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	417	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	418	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	419
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	420	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	421	The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	422	\begin{lemma}\label{ssgqTossgs}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	423	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	424	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	425	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	426	By rule induction of $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	427	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	428	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	429	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	430	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	431	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	432	By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	433	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	434	Here are two lemmas relating $\stackrel{}{\rightsquigarrow}$ and $\stackrel{s}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	435	\begin{lemma}\label{singleton}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	436	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s}{\rightsquigarrow} [r_2]$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	437	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	438	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	439	By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	440	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	441	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	442	$rs_3 \stackrel{s}{\rightsquigarrow} rs_4 \land r_1 \stackrel{}{\rightsquigarrow} r_2 \implies
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	443	r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	444	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	445	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	446	By using \ref{singleton}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	447	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	448	Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	449	$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	450	The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	451	elements in $rs_2$, as well as those that have appeared in $rs_1$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	452	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	453	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\; \; (\map\;\; \rerase{\_}\; \; rs_1)))$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	454	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	455	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	456	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	457	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	458	The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	459	\begin{lemma}\label{dBPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	460	$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	461	\end{lemma}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	462
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	463
532 cc54ce075db5 restructured Chengsong parents: diff changeset	464
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	465	The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	466	\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	467	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	468	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	469
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	470	The rewriting in many steps property is composible in terms of regular expression constructors:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	471	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	472	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{}{\rightsquigarrow} \; _{bs} r_2 \cdot r_3 \quad $ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	473	$r_3 \stackrel{}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	474	\end{lemma}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	475
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	476	The rewriting in many steps properties $\stackrel{}{\rightsquigarrow}$ and $\stackrel{s}{\rightsquigarrow}$ is preserved under the function $\fuse$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	477	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	478	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	479	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	480	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	481	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	482	By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	483	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	484	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	485	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	486	If we could rewrite a regular expression in many steps to $\ZERO$, then
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	487	we could also rewrite any sequence containing it to $\ZERO$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	488	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	489	$r_1 \stackrel{}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{}{\rightsquigarrow} \ZERO$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	490	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	491	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	492	$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	493	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	494	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	495	The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	496	\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	497	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	498	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	499	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	500	Before we give out the next lemmas, we define a predicate for a list of regular expressions
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	501	having at least one nullable regular expressions:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	502	\begin{definition}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	503	$\textit{bnullables} \; rs \dn \exists r \in rs. \bnullable r$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	504	\end{definition}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	505	The rewriting relation $\rightsquigarrow$ preserves nullability:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	506	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	507	$r_1 \rightsquigarrow r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	508	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	509	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	510	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	511	By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	512	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	513	So does the many steps rewriting:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	514	\begin{lemma}\label{rewritesBnullable}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	515	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	516	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	517	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	518	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	519	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	520	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	521	And if both regular expressions in a rewriting relation are nullable, then they
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	522	produce the same bit-codes:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	523
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	524	\begin{lemma}\label{rewriteBmkepsAux}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	525	$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	526	\bmkeps \; r_2)$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	527	$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	528	\bmkepss \; rs_1 = \bmkepss \; rs2)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	529	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	530	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	531	The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	532	is $bnullable$:
532 cc54ce075db5 restructured Chengsong parents: diff changeset	533	\begin{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	534	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	535	$\bmkepss \; [] $ & $\dn$ & $[]$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	536	$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	537	\end{tabular}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	538	\end{center}
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	539	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	540	And now we are ready to prove the key property that if you
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	541	have two regular expressions, one rewritable in many steps to the other,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	542	and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	543	\begin{lemma}
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	544	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	545	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	546	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	547	By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	548	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	549	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	550	the other property is also ready:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	551	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	552	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	553	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	554	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	555	By an induction on $r$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	556	The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	557	$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	558	$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	559	$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	560	Then we could do the following regular expresssion many steps rewrite:\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	561	$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	562	\\
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	563
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	564	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	565	\section{Proof for the Property: $r_1 \stackrel{}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{}{\rightsquigarrow} r_2 \backslash c$}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	566	The first thing we prove is that if we could rewrite in one step, then after derivative
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	567	we could rewrite in many steps:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	568	\begin{lemma}\label{rewriteBder}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	569	$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$ and
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	570	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	571	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	572	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	573	By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	574	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	575	Now we can prove that once we could rewrite from one expression to another in many steps,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	576	then after a derivative on both sides we could still rewrite one to another in many steps:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	577	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	578	$r_1 \stackrel{}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{}{\rightsquigarrow} r_2 \backslash c$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	579	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	580	\begin{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	581	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	582	\end{proof}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	583	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	584	This can be extended and combined with the previous two important properties
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	585	so that a regular expression's successivve derivatives can be rewritten in many steps
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	586	to its simplified counterpart:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	587	\begin{lemma}\label{bderBderssimp}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	588	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	589	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	590	\subsection{Main Theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	591	Now with \ref{bdersBderssimp} we are ready for the main theorem.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	592	To link $\blexersimp$ and $\blexer$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	593	we first say that they give out the same bits, if the lexing result is a match:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	594	\begin{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	595	$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	596	\end{lemma}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	597	\noindent
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	598	Now that they give out the same bits, we know that they give the same value after decoding,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	599	which we know is correct value as $\blexer$ is correct:
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	600	\begin{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	601	$\blexer \; r \; s = \blexersimp{r}{s}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	602	\end{theorem}
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	603	\noindent
576 3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	604	\begin{proof}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	605	One can rewrite in many steps from the original lexer's derivative regular expressions to the
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	606	lexer with simplification applied:
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	607	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	608	If two regular expressions are rewritable, then they produce the same bits.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	609	Under the condition that $ r_1$ is nullable, we have
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	610	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	611	This proves the \emph{if} (interesting) branch of
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	612	$\blexer \; r \; s = \blexersimp{r}{s}$.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	613
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	614	\end{proof}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	615	\noindent
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	616	As a corollary,
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	617	we link this result with the lemma we proved earlier that
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	618	\begin{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	619	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	620	\end{center}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	621	and obtain the corollary that the bit-coded lexer with simplification is
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	622	indeed correctly outputting POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	623	\begin{corollary}
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	624	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
3e1b699696b6 thesis chap5 Chengsong parents: 543 diff changeset	625	\end{corollary}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	626
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	627	\subsection{Comments on the Proof Techniques Used}
532 cc54ce075db5 restructured Chengsong parents: diff changeset	628	The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured Chengsong parents: diff changeset	629	compared with not having simplification is that we can no longer use the argument
cc54ce075db5 restructured Chengsong parents: diff changeset	630	in \cref{flex_retrieve}.
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	631	The function \retrieve needs the cumbersome structure of the (umsimplified)
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	632	annotated regular expression to
532 cc54ce075db5 restructured Chengsong parents: diff changeset	633	agree with the structure of the value, but simplification will always mess with the
543 b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	634	structure.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	635
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	636	We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	637	but this turns out to be not true, A counterexample of this being
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	638	\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	639	\]
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	640
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	641	Then we would have $\bsimp{a \backslash s}$ being
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	642	$_{[]}(_{ZZ}\ONE + _{ZS}c ) $
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	643	whereas $\bsimp{\bderssimp{a}{s}}$ would be
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	644	$_{Z}(_{Z} \ONE + _{S} c)$.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	645	Unfortunately if we apply $\textit{bsimp}$ at different
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	646	stages we will always have this discrepancy, due to
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	647	whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	648	is taken at some points will be entirely dependant on when the simplification
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	649	take place whether there is a larger alternative structure surrounding the
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	650	alternative being simplified.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	651	The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	652	us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	653	are taken, but simply say that they can be taken to make two similar
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	654	regular expressions equal, and can be done after interleaving derivatives
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	655	and simplifications.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	656
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	657
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	658	Having correctness property is good. But we would also like the lexer to be efficient in
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	659	some sense, for exampe, not grinding to a halt at certain cases.
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	660	In the next chapter we shall prove that for a given $r$, the internal derivative size is always
b2bea5968b89 thesis_thys Chengsong parents: 539 diff changeset	661	finitely bounded by a constant.

author	Chengsong
	Wed, 17 Aug 2022 01:09:13 +0100 (2022-08-17)
changeset 579	35df9cdd36ca
parent 576	3e1b699696b6
child 582	3e19073e91f4
permissions	-rwxr-xr-x