lexing: ChengsongTanPhdThesis/Chapters/Bitcoded2.tex@7cf9f17aa179 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	% Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4	\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	5
cc54ce075db5 restructured Chengsong parents: diff changeset	6	\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
cc54ce075db5 restructured Chengsong parents: diff changeset	8	%simplifications and therefore introduce our version of the bitcoded algorithm and
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%its correctness proof in
cc54ce075db5 restructured Chengsong parents: diff changeset	10	%Chapter 3\ref{Chapter3}.
cc54ce075db5 restructured Chengsong parents: diff changeset	11
cc54ce075db5 restructured Chengsong parents: diff changeset	12
cc54ce075db5 restructured Chengsong parents: diff changeset	13
We now introduce the simplifications, which were the reason for
introducing bitcodes in the first place.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	16
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	17	\subsection*{Simplification Rules}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	18
This section introduces simplification rules on annotated regular
expressions that are aggressive in terms of size reduction and
keep derivatives small. Such simplifications are promising:
the test data we have generated suggest
that a tight bound on the derivative size can be achieved. These tests can only
partially cover the search space, however, as there are infinitely many regular
expressions and strings.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	27
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	28	One modification we introduced is to allow a list of annotated regular
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	29	expressions in the $\sum$ constructor. This allows us to not just
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	30	delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	31	also unnecessary ``copies'' of regular expressions (very similar to
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	32	simplifying $r + r$ to just $r$, but in a more general setting). Another
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	33	modification is that we use simplification rules inspired by Antimirov's
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	34	work on partial derivatives. They maintain the idea that only the first
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	35	``copy'' of a regular expression in an alternative contributes to the
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	36	calculation of a POSIX value. All subsequent copies can be pruned away from
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	37	the regular expression. A recursive definition of our simplification function
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	38	that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	39	%\comment{Use $\ZERO$, $\ONE$ and so on.
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	40	%Is it $ALTS$ or $ALTS$?}\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	41
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	42	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	43	\begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	44
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	45	$\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	46	&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	47	&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	48	&&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	49	&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	50	&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	51
$\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; \textit{simp} \; \textit{as})) \; \textit{match} $ \\
&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse} \; bs \; a$ \\
&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	56
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	57	$\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	58	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	59	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	60
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	61	\noindent
The simplification function pattern-matches on the regular expression.
When it detects that the regular expression is an alternative or
sequence, it simplifies the child regular expressions
recursively and then checks whether one of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
The most involved part is the $\sum$ clause, where we use two
auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
alternatives and remove as many duplicates as possible. The function
$\textit{distinct}$ keeps only the first occurrence of a duplicated regular
expression and removes all later ones. The function $\textit{flatten}$ opens up nested $\sum$s.
Its recursive definition is given below:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	73
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	74	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	75	\begin{tabular}{@{}lcl@{}}
$\textit{flatten} \; []$ & $\dn$ & $[]$ \\
$\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
$\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\
$\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise)
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	80	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	81	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	82
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	83	\noindent
Here $\textit{flatten}$ behaves like the traditional functional programming flatten
function, except that it also removes $\ZERO$s. In terms of regular expressions, it
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
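
\noindent
As a small worked example of the definitions above (a sketch only, assuming the
base case $\textit{flatten} \; [] = []$, that $a_1$ and $a_3$ are annotated character
regular expressions, and that $\textit{distinct}$ treats syntactically identical
elements as duplicates), flattening a list that contains a $\ZERO$ and a nested
alternative gives
\begin{center}
$\textit{flatten} \; [\ZERO,\; _{bs}\sum [a_1, a_2],\; a_3] \;=\;
[\textit{fuse} \; bs \; a_1,\; \textit{fuse} \; bs \; a_2,\; a_3]$
\end{center}

\noindent
Under the same assumptions, simplifying an alternative that contains a $\ZERO$ and
a repeated copy of $a_1$ proceeds in the following steps:
\begin{center}
\begin{tabular}{lcl}
$\textit{map} \; \textit{simp} \; [a_1, \ZERO, a_1]$ & $=$ & $[a_1, \ZERO, a_1]$ \\
$\textit{flatten} \; [a_1, \ZERO, a_1]$ & $=$ & $[a_1, a_1]$ \\
$\textit{distinct} \; [a_1, a_1]$ & $=$ & $[a_1]$ \\
$\textit{simp} \; (_{bs}\sum [a_1, \ZERO, a_1])$ & $=$ & $\textit{fuse} \; bs \; a_1$
\end{tabular}
\end{center}

\noindent
The last line uses the singleton case of the $\sum$ clause, which pushes the
bitcodes $bs$ of the removed alternative into the only remaining child.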
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	87
Having defined the $\simp$ function,
we can extend the derivative w.r.t.~a character
to the derivative w.r.t.~a string
in the same way as before:%\comment{simp in the [] case?}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	92
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	93	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	94	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	95	$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	96	$r \backslash_{simp} [\,] $ & $\dn$ & $r$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	97	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	98	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	99
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	100	\noindent
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	101	to obtain an optimised version of the algorithm:
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	102
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	103	\begin{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	104	\begin{tabular}{lcl}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	105	$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	106	$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	107	& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	108	& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	109	& & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	110	\end{tabular}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	111	\end{center}
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	112
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	113	\noindent
This algorithm keeps the regular expression size small. For example,
with this simplification the 8000 nodes of our previous $(a + aa)^*$ example
are reduced to just 17, and the size stays constant no matter how long the
input string is.
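
\noindent
To make the unfolding concrete, for a two-character string $[c_1, c_2]$ the
definition of the string derivative above gives
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{simp} [c_1, c_2]$ & $=$ & $(r \backslash_{simp}\, c_1) \backslash_{simp}\, [c_2]$ \\
& $=$ & $((r \backslash_{simp}\, c_1) \backslash_{simp}\, c_2) \backslash_{simp}\, [\,]$ \\
& $=$ & $(r \backslash_{simp}\, c_1) \backslash_{simp}\, c_2$
\end{tabular}
\end{center}

\noindent
Under the assumption that the character-level operation $r \backslash_{simp}\, c$
abbreviates $\textit{simp} \; (r \backslash c)$ (a reading that the definition above
leaves implicit), simplification is applied after every single derivative step
rather than once at the end.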
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	118
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	119
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	120
\section{$(a^*\cdot a^*)^*$ against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
Unlike in Figure~\ref{fig:BetterWaterloo}, the size of the derivatives now stays constant under the current
simplification rules (we put the two graphs side by side to show the contrast):
7cf9f17aa179 more Chengsong parents: 538 diff changeset	125	\begin{tabular}{ll}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	126	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	127	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	128	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	129	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	130	width=7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	131	height=4cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	132	legend entries={Lexer with $\bsimp$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	133	legend pos= south east,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	134	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	135	\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	136	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	137	\end{tikzpicture} %\label{fig:BitcodedLexer}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	138	&
7cf9f17aa179 more Chengsong parents: 538 diff changeset	139	\begin{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	140	\begin{axis}[
7cf9f17aa179 more Chengsong parents: 538 diff changeset	141	xlabel={$n$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	142	ylabel={derivative size},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	143	width = 7cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	144	height = 4cm,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	145	legend entries={Lexer without $\bsimp$},
7cf9f17aa179 more Chengsong parents: 538 diff changeset	146	legend pos= north west,
7cf9f17aa179 more Chengsong parents: 538 diff changeset	147	legend cell align=left]
7cf9f17aa179 more Chengsong parents: 538 diff changeset	148	\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
7cf9f17aa179 more Chengsong parents: 538 diff changeset	149	\end{axis}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	150	\end{tikzpicture}
7cf9f17aa179 more Chengsong parents: 538 diff changeset	151	\end{tabular}
538 8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	152
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	153
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	154
8016a2480704 intro and chap2 Chengsong parents: 532 diff changeset	155
532 cc54ce075db5 restructured Chengsong parents: diff changeset	156
cc54ce075db5 restructured Chengsong parents: diff changeset	157	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	158	% SECTION common identities
cc54ce075db5 restructured Chengsong parents: diff changeset	159	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	160	\section{Common Identities In Simplification-Related Functions}
We need to prove the following identities before starting on the main correctness proof.
cc54ce075db5 restructured Chengsong parents: diff changeset	162
cc54ce075db5 restructured Chengsong parents: diff changeset	163	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	164	% SUBSECTION
cc54ce075db5 restructured Chengsong parents: diff changeset	165	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	166	\subsection{Idempotency of $\simp$}
cc54ce075db5 restructured Chengsong parents: diff changeset	167
cc54ce075db5 restructured Chengsong parents: diff changeset	168	\begin{equation}
\simp \;r = \simp\; (\simp \; r)
cc54ce075db5 restructured Chengsong parents: diff changeset	170	\end{equation}
cc54ce075db5 restructured Chengsong parents: diff changeset	171	This property means we do not have to repeatedly
cc54ce075db5 restructured Chengsong parents: diff changeset	172	apply simplification in each step, which justifies
cc54ce075db5 restructured Chengsong parents: diff changeset	173	our definition of $\blexersimp$.
cc54ce075db5 restructured Chengsong parents: diff changeset	174	It will also be useful in future proofs where properties such as
cc54ce075db5 restructured Chengsong parents: diff changeset	175	closed forms are needed.
cc54ce075db5 restructured Chengsong parents: diff changeset	176	The proof is by structural induction on $r$.
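
\noindent
As an illustration (a sketch only, assuming that $a_1$ is an annotated character
regular expression, that $\textit{fuse} \; bs \; a_1$ merely prepends $bs$ to the
bitcodes of $a_1$, and that $\textit{distinct}$ removes syntactically identical
copies), consider one concrete instance of the property:
\begin{center}
\begin{tabular}{lcl}
$\simp \; (_{bs}\sum [a_1, \ZERO, a_1])$ & $=$ & $\textit{fuse} \; bs \; a_1$ \\
$\simp \; (\textit{fuse} \; bs \; a_1)$ & $=$ & $\textit{fuse} \; bs \; a_1$
\end{tabular}
\end{center}

\noindent
The second line holds because $\textit{fuse} \; bs \; a_1$ is again an annotated
character, which falls under the last clause of $\simp$. A second application of
$\simp$ therefore changes nothing in this instance.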
cc54ce075db5 restructured Chengsong parents: diff changeset	177
cc54ce075db5 restructured Chengsong parents: diff changeset	178	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	179	% SUBSECTION
cc54ce075db5 restructured Chengsong parents: diff changeset	180	%-----------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	181	\subsection{Syntactic Equivalence Under $\simp$}
We prove that minor syntactic differences between regular expressions can be annihilated
by $\simp$.
For example,
cc54ce075db5 restructured Chengsong parents: diff changeset	185	\begin{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	186	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) =
cc54ce075db5 restructured Chengsong parents: diff changeset	187	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
cc54ce075db5 restructured Chengsong parents: diff changeset	188	\end{center}
cc54ce075db5 restructured Chengsong parents: diff changeset	189
cc54ce075db5 restructured Chengsong parents: diff changeset	190
cc54ce075db5 restructured Chengsong parents: diff changeset	191
cc54ce075db5 restructured Chengsong parents: diff changeset	192
cc54ce075db5 restructured Chengsong parents: diff changeset	193
cc54ce075db5 restructured Chengsong parents: diff changeset	194
cc54ce075db5 restructured Chengsong parents: diff changeset	195
cc54ce075db5 restructured Chengsong parents: diff changeset	196	%----------------------------------------------------------------------------------------
% SECTION correctness proof
cc54ce075db5 restructured Chengsong parents: diff changeset	198	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	199	\section{Proof Technique of Correctness of Bit-coded Algorithm with Simplification}
The non-trivial part of proving the correctness of the algorithm with simplification,
compared with the algorithm without simplification, is that we can no longer use the argument
in \cref{flex_retrieve}.
The function \retrieve needs the structure of the annotated regular expression to
agree with the structure of the value, but simplification generally changes this
structure:
cc54ce075db5 restructured Chengsong parents: diff changeset	206	%TODO: after simp does not agree with each other: (a + 0) --> a v.s. Left(Char(a))
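
\noindent
A minimal instance of this mismatch, written with bitcodes omitted and with the
value constructors $\textit{Left}$ and $\textit{Char}$ set in plain italics (an
assumption about notation), is the following:
\begin{center}
\begin{tabular}{lcl}
$\simp \; (a + \ZERO)$ & $=$ & $a$ \\
value for the string $a$ w.r.t.~$a + \ZERO$ & $=$ & $\textit{Left}\,(\textit{Char}\;a)$
\end{tabular}
\end{center}

\noindent
The $\textit{Left}$ constructor records a choice in an alternative that no longer
exists in the simplified regular expression $a$, so the structure of the value and
the structure of the simplified regular expression no longer agree.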

author	Chengsong
	Thu, 09 Jun 2022 22:07:44 +0100
changeset 539	7cf9f17aa179
parent 538	8016a2480704
child 543	b2bea5968b89
permissions	-rwxr-xr-x