lexing: ChengsongTanPhdThesis/Chapters/Bitcoded2.tex@8016a2480704 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4	\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
cc54ce075db5 restructured Chengsong parents: diff changeset	11
cc54ce075db5 restructured Chengsong parents: diff changeset	12
cc54ce075db5 restructured Chengsong parents: diff changeset	13
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	14	Now we introduce the simplifications, which is why we introduce the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	15	bitcodes in the first place.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	16
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	17	\subsection*{Simplification Rules}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	18
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	19	This section introduces aggressive (in terms of size) simplification rules
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	20	on annotated regular expressions
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	21	to keep derivatives small. Such simplifications are promising
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	22	as we have
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	23	generated test data that show
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	24	that a good tight bound can be achieved. We could only
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	25	partially cover the search space as there are infinitely many regular
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	26	expressions and strings.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	27
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	28	One modification we introduced is to allow a list of annotated regular
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	29	expressions in the $\sum$ constructor. This allows us to not just
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	30	delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	31	also unnecessary ``copies'' of regular expressions (very similar to
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	32	simplifying $r + r$ to just $r$, but in a more general setting). Another
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	33	modification is that we use simplification rules inspired by Antimirov's
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	34	work on partial derivatives. They maintain the idea that only the first
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	35	``copy'' of a regular expression in an alternative contributes to the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	36	calculation of a POSIX value. All subsequent copies can be pruned away from
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	37	the regular expression. A recursive definition of our simplification function
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	38	that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	39	%\comment{Use $\ZERO$, $\ONE$ and so on.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	40	%Is it $ALTS$ or $ALTS$?}\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	41
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	42	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	43	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	44
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	45	$\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	46	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	47	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	48	&&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	49	&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	50	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	51
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	52	$\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	53	&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	54	&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	55	&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	56
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	57	$\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	58	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	59	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	60
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	61	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	62	The simplification does a pattern matching on the regular expression.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	63	When it detected that the regular expression is an alternative or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	64	sequence, it will try to simplify its child regular expressions
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	65	recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	66	$\ONE$, which might trigger further simplification at the current level.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	67	The most involved part is the $\sum$ clause, where we use two
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	68	auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	69	alternatives and reduce as many duplicates as possible. Function
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	70	$\textit{distinct}$ keeps the first occurring copy only and removes all later ones
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	71	when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	72	Its recursive definition is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	73
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	74	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	75	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	76	$\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	77	(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	78	$\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	79	$\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise)
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	80	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	81	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	82
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	83	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	84	Here $\textit{flatten}$ behaves like the traditional functional programming flatten
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	85	function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	86	removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	87
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	88	Having defined the $\simp$ function,
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	89	we can use the previous notation of natural
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	90	extension from derivative w.r.t.~character to derivative
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	91	w.r.t.~string:%\comment{simp in the [] case?}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	92
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	93	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	94	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	95	$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	96	$r \backslash_{simp} [\,] $ & $\dn$ & $r$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	97	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	98	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	99
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	100	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	101	to obtain an optimised version of the algorithm:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	102
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	103	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	104	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	105	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	106	$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	107	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	108	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	109	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	110	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	111	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	112
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	113	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	114	This algorithm keeps the regular expression size small, for example,
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	115	with this simplification our previous $(a + aa)^*$ example's 8000 nodes
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	116	will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	117	input string is.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	118
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	119
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	120
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	121
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	122
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	123
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	124
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	125
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	126
532 cc54ce075db5 restructured Chengsong parents: diff changeset	127
cc54ce075db5 restructured Chengsong parents: diff changeset	128	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	129	% SECTION common identities
cc54ce075db5 restructured Chengsong parents: diff changeset	130	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	131	\section{Common Identities In Simplification-Related Functions}
cc54ce075db5 restructured Chengsong parents: diff changeset	132	Need to prove these before starting on the big correctness proof.
cc54ce075db5 restructured Chengsong parents: diff changeset	133
cc54ce075db5 restructured Chengsong parents: diff changeset	134	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	135	% SUBSECTION
cc54ce075db5 restructured Chengsong parents: diff changeset	136	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	137	\subsection{Idempotency of $\simp$}
cc54ce075db5 restructured Chengsong parents: diff changeset	138
cc54ce075db5 restructured Chengsong parents: diff changeset	139	\begin{equation}
cc54ce075db5 restructured Chengsong parents: diff changeset	140	\simp \;r = \simp\; \simp \; r
cc54ce075db5 restructured Chengsong parents: diff changeset	141	\end{equation}
cc54ce075db5 restructured Chengsong parents: diff changeset	142	This property means we do not have to repeatedly
cc54ce075db5 restructured Chengsong parents: diff changeset	143	apply simplification in each step, which justifies
cc54ce075db5 restructured Chengsong parents: diff changeset	144	our definition of $\blexersimp$.
cc54ce075db5 restructured Chengsong parents: diff changeset	145	It will also be useful in future proofs where properties such as
cc54ce075db5 restructured Chengsong parents: diff changeset	146	closed forms are needed.
cc54ce075db5 restructured Chengsong parents: diff changeset	147	The proof is by structural induction on $r$.
cc54ce075db5 restructured Chengsong parents: diff changeset	148
cc54ce075db5 restructured Chengsong parents: diff changeset	149	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	150	% SUBSECTION
cc54ce075db5 restructured Chengsong parents: diff changeset	151	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	152	\subsection{Syntactic Equivalence Under $\simp$}
cc54ce075db5 restructured Chengsong parents: diff changeset	153	We prove that minor differences can be annhilated
cc54ce075db5 restructured Chengsong parents: diff changeset	154	by $\simp$.
cc54ce075db5 restructured Chengsong parents: diff changeset	155	For example,
cc54ce075db5 restructured Chengsong parents: diff changeset	156	\begin{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	157	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) =
cc54ce075db5 restructured Chengsong parents: diff changeset	158	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
cc54ce075db5 restructured Chengsong parents: diff changeset	159	\end{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	160
cc54ce075db5 restructured Chengsong parents: diff changeset	161
cc54ce075db5 restructured Chengsong parents: diff changeset	162
cc54ce075db5 restructured Chengsong parents: diff changeset	163
cc54ce075db5 restructured Chengsong parents: diff changeset	164
cc54ce075db5 restructured Chengsong parents: diff changeset	165
cc54ce075db5 restructured Chengsong parents: diff changeset	166
cc54ce075db5 restructured Chengsong parents: diff changeset	167	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	168	% SECTION corretness proof
cc54ce075db5 restructured Chengsong parents: diff changeset	169	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	170	\section{Proof Technique of Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	171	The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured Chengsong parents: diff changeset	172	compared with not having simplification is that we can no longer use the argument
cc54ce075db5 restructured Chengsong parents: diff changeset	173	in \cref{flex_retrieve}.
cc54ce075db5 restructured Chengsong parents: diff changeset	174	The function \retrieve needs the structure of the annotated regular expression to
cc54ce075db5 restructured Chengsong parents: diff changeset	175	agree with the structure of the value, but simplification will always mess with the
cc54ce075db5 restructured Chengsong parents: diff changeset	176	structure:
cc54ce075db5 restructured Chengsong parents: diff changeset	177	%TODO: after simp does not agree with each other: (a + 0) --> a v.s. Left(Char(a))

author	Chengsong
	Thu, 09 Jun 2022 12:57:53 +0100
changeset 538	8016a2480704
parent 532	cc54ce075db5
child 539	7cf9f17aa179
permissions	-rwxr-xr-x