% Source: ChengsongTanPhdThesis/Chapters/Cubic.tex@dd9dde2d902b

% Chapter Template

%We also present the idempotency property proof
%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
%This reinforces our claim that the fixpoint construction
%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
%Last but not least, we present our efforts and challenges we met
%in further improving the algorithm by data
%structures such as zippers.
%----------------------------------------------------------------------------------------
% SECTION strongsimp
%----------------------------------------------------------------------------------------
%TODO: search for isabelle proofs of algorithms that check equivalence

\chapter{A Better Size Bound for Derivatives} % Main chapter title

\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.
\lstset{style=myScalastyle}


This chapter is a ``work-in-progress''
chapter which records
extensions to our $\blexersimp$.
We conjecture that the finite
size bound from the previous chapter can be
improved to a cubic bound.
We have implemented the stronger simplification behind this conjecture in Scala
and intend to formalise this part in Isabelle/HOL at a
later stage.
%we have not been able to finish due to time constraints of the PhD.
Nevertheless, we outline the ideas we intend to use for the proof.

\section{A Stronger Version of Simplification}

We present further improvements
for our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same
alternative level,
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
%Unfortunately we did not have time to
%work out the proofs, like in
%the previous chapters.
We conjecture that both
\begin{center}
$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{center}
and
\begin{center}
$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
hold.
%but a formalisation
%is still future work.
We give an informal justification
for why the correctness and cubic size bound proofs
should be achievable,
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Antimirov's partial derivatives.

In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
For example, the regular expression
\[
aa \cdot a^* + a \cdot a^* + aa \cdot a^*
\]
contains three terms,
expressing three possibilities for how it can match some input.
The first and the third terms are identical, which means we can eliminate
the latter as it will not contribute to a POSIX value.
In $\bsimps$, the $\distinctBy$ function takes care of
such instances.
The criterion $\distinctBy$ uses for removing a duplicate
$a_2$ in the list
\begin{center}
$rs_a@[a_1]@rs_b@[a_2]@rs_c$
\end{center}
is that
the two erased regular expressions are equal:
\begin{center}
$\rerase{a_1} = \rerase{a_2}$.
\end{center}
This is characterised as the $LD$
rewrite rule in Figure~\ref{rrewriteRules}.
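\noindent
As a small illustration (our own instantiation, with hypothetical bitcodes $bs_1$, $bs_2$ and $bs_3$ that do not come from a concrete run), the $LD$ rule would act on the three-term example above roughly as follows:
\[
[\,{}_{bs_1}(aa \cdot a^*),\; {}_{bs_2}(a \cdot a^*),\; {}_{bs_3}(aa \cdot a^*)\,]
\;\rightsquigarrow\;
[\,{}_{bs_1}(aa \cdot a^*),\; {}_{bs_2}(a \cdot a^*)\,]
\]
\noindent
The third element is dropped because its erasure $aa \cdot a^*$ coincides with that of the first, even if $bs_1 \neq bs_3$.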
The problem, however, is that identical components
in two slightly different regular expressions cannot be removed
by the $LD$ rule.
Consider the stronger simplification
\begin{equation}
\label{eqn:partialDedup}
(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
\end{equation}
where the underlined $a$ in $(\underline{a}+c+e)\cdot r_1$ is deleted
from the right alternative.
This is permissible because we already have $(a+\ldots)\cdot r_1$ in the left
alternative.
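\noindent
At the level of languages, this step can be justified by the fact that sequence distributes over alternative. The following is only a sketch of that calculation, writing $L\,r$ for the language of $r$; it says nothing yet about whether the POSIX value is preserved, which is a separate matter discussed below.
\[
\begin{array}{lcl}
L\,((a+b+d)\cdot r_1 + (a+c+e)\cdot r_1)
& = & L\,(a\cdot r_1) \cup L\,(b\cdot r_1) \cup L\,(d\cdot r_1) \cup L\,(a\cdot r_1) \cup L\,(c\cdot r_1) \cup L\,(e\cdot r_1)\\
& = & L\,(a\cdot r_1) \cup L\,(b\cdot r_1) \cup L\,(d\cdot r_1) \cup L\,(c\cdot r_1) \cup L\,(e\cdot r_1)\\
& = & L\,((a+b+d)\cdot r_1 + (c+e)\cdot r_1)
\end{array}
\]
\noindent
The second occurrence of $L\,(a \cdot r_1)$ contributes nothing to the union, which is why it can be dropped.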
The difficulty is that such ``buried''
alternative-sequence components are not easily recognised.
Simplifications like this
cannot be omitted, however, if we want to have a better bound.
For example, the size of derivatives can still
blow up even with our $\textit{bsimp}$
function:
consider again the example
$\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$.
Even for a relatively small number such as $n=5$, we get the following
exponential growth:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
%xlabel={$n$},
myplotstyle,
xlabel={input length},
ylabel={size},
]
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
\end{axis}
\end{tikzpicture}
\caption{Size of derivatives of $\blexersimp$ from chapter 5 for matching
$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
with strings
of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
\end{figure}
\noindent
One possible approach would be to apply the rewriting
rule
\[
(a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
\]
\noindent
which pushes the sequence into the alternatives
in our $\simp$ function.
This would then make the simplification shown in \eqref{eqn:partialDedup} possible.
Translating this rule into our $\textit{bsimp}$ function would simply
involve adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
&& $\ldots$\\
&&$\quad\textit{case} \; (_{bs1}\sum as, a_2') \Rightarrow _{bs1}\sum (
\map \; (_{[]}\textit{ASEQ} \; \_ \; a_2') \; as)$\\
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
\end{tabular}
\end{center}
\noindent
Unfortunately,
if we introduce this clause in our
setting we would lose the POSIX property of our calculated values.
For example, given the regular expression
\begin{center}
$(a + ab)(bc + c)$
\end{center}
and the string $abc$,
our algorithm generates the following
correct POSIX value
\begin{center}
$\Seq \; (\Right \; ab) \; (\Right \; c)$.
\end{center}
Essentially it matches the string with the longer Right-alternative
in the first component of the sequence (and
then the ``rest'' with the character regular expression $c$ from the second component).
If we add the simplification above, however,
then we would obtain the following value
\begin{center}
$\Left \; (\Seq \; a \; (\Left \; bc))$
\end{center}
where the $\Left$-alternatives get priority.
This violates the POSIX rules.
The reason for getting this undesired value
is that the new rule splits this regular expression up into
a topmost alternative
\begin{center}
$a\cdot(b c + c) + ab \cdot (bc + c)$,
\end{center}
which is a regular expression with a
quite different meaning: the original regular expression
is a sequence, but the simplified regular expression is an alternative.
With an alternative the maximal munch rule no longer works.
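
\noindent
To spell out the example: both values above flatten to the string $abc$, which can be split between the two components of the sequence in two ways. The table below is our own summary of these two candidate splits, using the value notation from above.
\begin{center}
\begin{tabular}{@{}lll@{}}
split & first component & second component\\
\hline
$a \mid bc$ & $\Left \; a$ & $\Left \; bc$\\
$ab \mid c$ & $\Right \; ab$ & $\Right \; c$\\
\end{tabular}
\end{center}
\noindent
For the sequence $(a + ab)(bc + c)$, the POSIX rules prefer the split whose first component matches the longest possible prefix, which is $ab \mid c$.
After distributing, the split $a \mid bc$ lies entirely inside the left alternative $a\cdot(bc + c)$, and since a left alternative that matches the whole string takes priority, the split $ab \mid c$ in the right alternative is never chosen.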
%

One way to resolve this problem is to do the
transformation in \eqref{eqn:partialDedup} ``non-invasively'',
meaning that we traverse the list of regular expressions
%\begin{center}
% $rs_a@[a]@rs_c$
%\end{center}
inside alternatives
\begin{center}
$\sum ( rs_a@[a]@rs_c)$
\end{center}
using a function similar to $\distinctBy$,
but this time
we allow the following more general rewrite rule:
\begin{equation}
\label{eqn:cubicRule}
%\mbox{
%\begin{mathpar}
\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c
\stackrel{s}{\rightsquigarrow }
rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
%\end{mathpar}
%\caption{The rule capturing the pruning simplification needed to achieve
%a cubic bound}
\end{equation}
\noindent
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
where $\textit{prune} \;a \; acc$ traverses $a$
without altering the structure of $a$, but removes components in $a$
that have appeared in the accumulator $acc$.
For example,
\begin{center}
$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
\end{center}
should be equal to
\begin{center}
$(r_g+r_h)r_d$
\end{center}
because $r_gr_d$ and
$r_hr_d$ are the only terms
that do not appear in the accumulator list
\begin{center}
$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
\end{center}
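\noindent
One way to read this example, under our assumption that an accumulator element ``appears'' via the terms obtained by distributing the sequence over its alternative, is the following term-level calculation. It is only a sketch of the intended result and not a trace of the implementation.
\[
\begin{array}{lcl}
\textit{terms of the accumulator} & = & \{r_a r_d,\; r_b r_d,\; r_c r_d,\; r_e r_d,\; r_f r_d\}\\
\textit{terms of } (r_a+r_f+r_g+r_h)r_d & = & \{r_a r_d,\; r_f r_d,\; r_g r_d,\; r_h r_d\}\\
\textit{terms not yet seen} & = & \{r_g r_d,\; r_h r_d\}
\end{array}
\]
\noindent
Re-assembling the two remaining terms under the original sequence structure gives $(r_g+r_h)r_d$.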
We implemented the
function $\textit{prune}$ in Scala (see Figure~\ref{fig:pruneFunc}).
The function $\textit{prune}$
is a stronger version of $\textit{distinctBy}$.
It does not just walk through a list looking for exact duplicates,
but prunes sub-expressions recursively.
It manages proper contexts by the helper functions
$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
\begin{figure}%[H]
\begin{lstlisting}[numbers=left]
def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
  case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
  {
    //all components have been removed, meaning this is effectively a duplicate
    //flats will take care of removing this AZERO
    case Nil => AZERO
    case r::Nil => fuse(bs, r)
    case rs1 => AALTS(bs, rs1)
  }
  case ASEQ(bs, r1, r2) =>
    //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
    prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
      //after pruning, returns 0
      case AZERO => AZERO
      //after pruning, got r1'.r2, where r1' is equal to 1
      case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
      //assemble the pruned head r1p with r2
      case r1p => ASEQ(bs, r1p, r2)
    }
  //this does the duplicate component removal task
  case r => if(acc(erase(r))) AZERO else r
}
\end{lstlisting}
\caption{The function $\textit{prune}$ is called recursively in the
alternative case (line 2) and in the sequence case (line 12).
In the alternative case we keep all the accumulators the same, but
in the sequence case we are making necessary changes to the accumulators
to allow correct de-duplication.}\label{fig:pruneFunc}
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	269	\noindent
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	270
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	271	\begin{figure}
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	272	\begin{lstlisting}[numbers=left]
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	273	def atMostEmpty(r: Rexp) : Boolean = r match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	274	case ZERO => true
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	275	case ONE => true
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	276	case STAR(r) => atMostEmpty(r)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	277	case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	278	case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	279	case CHAR(_) => false
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	280	}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	281
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	282	def isOne(r: Rexp) : Boolean = r match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	283	case ONE => true
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	284	case SEQ(r1, r2) => isOne(r1) && isOne(r2)
case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	286	case STAR(r0) => atMostEmpty(r0)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	287	case CHAR(c) => false
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	288	case ZERO => false
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	289	}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	290
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	291	def removeSeqTail(r: Rexp, tail: Rexp) : Rexp =
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	292	if (r == tail)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	293	ONE
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	294	else {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	295	r match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	296	case SEQ(r1, r2) =>
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	297	if(r2 == tail) r1 else ZERO
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	298	case r => ZERO
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	299	}
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	300	}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	301
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	302
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	303	\end{lstlisting}
638 dd9dde2d902b comments till chap4 Chengsong parents: 630 diff changeset	304	\caption{The helper functions of $\textit{prune}$:
dd9dde2d902b comments till chap4 Chengsong parents: 630 diff changeset	305	$\textit{atMostEmpty}$, $\textit{isOne}$ and $\textit{removeSeqTail}$}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	306	\end{figure}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	307	\noindent
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	308	Suppose we feed
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	309	\begin{center}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	310	$r= (\underline{\ONE}+(\underline{f}+b)\cdot g)\cdot (a\cdot(d\cdot e))$
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	311	\end{center}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	312	and
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	313	\begin{center}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	314	$acc = \{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}$
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	315	\end{center}
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	316	as the input into $\textit{prune}$.
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	317	The end result will be
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	318	\[
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	319	b\cdot(g\cdot(a\cdot(d\cdot e)))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	320	\]
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	321	where the underlined components in $r$ are eliminated.
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	322	Looking more closely, at the topmost call
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	323	\[
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	324	\textit{prune} \quad (\ONE+
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	325	(f+b)\cdot g)\cdot (a\cdot(d\cdot e)) \quad
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	326	\{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	327	\]
the sequence clause is triggered,
in which the sub-call
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	330	\[
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	331	\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	332	\]
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	333	is made. The terms in the new accumulator $\{\ONE,\; f\cdot g \}$ come from
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	334	the two calls to $\textit{removeSeqTail}$:
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	335	\[
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	336	\textit{removeSeqTail} \quad\;\; a \cdot(d\cdot e) \quad\;\; a \cdot(d\cdot e)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	337	\]
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	338	and
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	339	\[
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	340	\textit{removeSeqTail} \quad \;\;
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	341	f\cdot(g\cdot (a \cdot(d\cdot e)))\quad \;\; a \cdot(d\cdot e).
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	342	\]
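\noindent
The behaviour of $\textit{removeSeqTail}$ can be tried out in isolation with the
following minimal sketch. The datatype declaration and the names
\texttt{tail}, \texttt{z1}, \texttt{z2} and \texttt{z3} are assumptions made only for this
illustration. The function as listed compares only the second component of the
outermost sequence with the tail, so the second accumulator element is bracketed
here as $(f\cdot g)\cdot(a\cdot(d\cdot e))$.
\begin{lstlisting}
// Plain regular expressions, mirroring the constructors of the listings above
// (this declaration is an assumption made for the illustration).
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALTS(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

// Same definition as in the helper-function listing.
def removeSeqTail(r: Rexp, tail: Rexp): Rexp =
  if (r == tail) ONE
  else r match {
    case SEQ(r1, r2) => if (r2 == tail) r1 else ZERO
    case _ => ZERO
  }

// tail = a . (d . e)
val tail: Rexp = SEQ(CHAR('a'), SEQ(CHAR('d'), CHAR('e')))

// An accumulator element equal to the tail zooms in to ONE.
val z1 = removeSeqTail(tail, tail)
// An element of the form r1 . tail zooms in to r1, here f . g.
val z2 = removeSeqTail(SEQ(SEQ(CHAR('f'), CHAR('g')), tail), tail)
// An element that is neither the tail nor a sequence ending in it becomes ZERO.
val z3 = removeSeqTail(CHAR('b'), tail)
\end{lstlisting}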
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	343
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	344	The idea behind $\textit{removeSeqTail}$ is that
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	345	when pruning recursively, we need to ``zoom in''
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	346	to sub-expressions, and this ``zoom in'' needs to be performed
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	347	on the
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	348	accumulators as well, otherwise the deletion will not work.
The sub-call
$\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}$
is simpler: it triggers the alternative clause, which
prunes each alternative of $(\ONE+(f+b)\cdot g)$ in turn,
leaving us with $b\cdot g$ only.
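
\noindent
The steps of this sub-call can be spelled out as follows. This is only a sketch:
it ignores bitcodes and assumes that the sequence clause of $\textit{prune}$
adjusts the accumulator with $\textit{removeSeqTail}$ in the way described above.
First, $\ONE$ is deleted because it is an element of the accumulator.
Second, for $(f+b)\cdot g$ the tail is $g$, and the accumulator element
$f\cdot g$ is zoomed in to
\[
\textit{removeSeqTail} \;\; (f\cdot g) \;\; g \;=\; f.
\]
Third, pruning $f+b$ against an accumulator containing $f$ deletes $f$ and
keeps $b$. Re-attaching the tail $g$ to the surviving component gives $b\cdot g$.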
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	354
Our new lexer with stronger simplification
uses $\textit{prune}$ as
the core component of the de-duplicating function
$\textit{distinctWith}$.
The function $\textit{distinctWith}$ ensures that all verbose
parts of a regular expression are pruned away.
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	361
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	362	\begin{figure}[H]
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	363	\begin{lstlisting}[numbers=left]
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	364	def turnIntoTerms(r: Rexp): List[Rexp] = r match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	365	case SEQ(r1, r2) =>
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	366	turnIntoTerms(r1).flatMap(r11 => furtherSEQ(r11, r2))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	367	case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	368	case ZERO => Nil
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	369	case _ => r :: Nil
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	370	}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	371
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	372	def distinctWith(rs: List[ARexp],
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	373	pruneFunction: (ARexp, Set[Rexp]) => ARexp,
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	374	acc: Set[Rexp] = Set()) : List[ARexp] =
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	375	rs match{
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	376	case Nil => Nil
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	377	case r :: rs =>
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	378	if(acc(erase(r)))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	379	distinctWith(rs, pruneFunction, acc)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	380	else {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	381	val pruned_r = pruneFunction(r, acc)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	382	pruned_r ::
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	383	distinctWith(rs,
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	384	pruneFunction,
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	385	turnIntoTerms(erase(pruned_r)) ++: acc
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	386	)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	387	}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	388	}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	389
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	390	\end{lstlisting}
\caption{The function $\textit{distinctWith}$: a stronger version of $\textit{distinctBy}$}
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	392	\end{figure}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	393	\noindent
Once a regular expression has been pruned,
all its components are added to the accumulator,
so that duplicates of these components can be removed from the regular expressions that follow.
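
\noindent
How the accumulator grows can be seen in the following simplified sketch.
It works on plain regular expressions rather than annotated ones, and the names
\texttt{terms} and \texttt{toyPrune} are hypothetical stand-ins introduced
only for this illustration: \texttt{terms} splits alternatives but not sequences,
and \texttt{toyPrune} merely drops alternatives that are already in the accumulator.
Neither is meant to reproduce $\textit{turnIntoTerms}$ or $\textit{prune}$ exactly.
\begin{lstlisting}
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALTS(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp

// Simplified stand-in for turnIntoTerms: only alternatives are split.
def terms(r: Rexp): List[Rexp] = r match {
  case ALTS(r1, r2) => terms(r1) ::: terms(r2)
  case ZERO => Nil
  case _ => r :: Nil
}

// Hypothetical toy pruning function: removes alternatives found in acc.
def toyPrune(r: Rexp, acc: Set[Rexp]): Rexp = r match {
  case ALTS(r1, r2) => (toyPrune(r1, acc), toyPrune(r2, acc)) match {
    case (ZERO, p2) => p2
    case (p1, ZERO) => p1
    case (p1, p2) => ALTS(p1, p2)
  }
  case _ => if (acc(r)) ZERO else r
}

// Same recursion scheme as distinctWith above, on plain regular expressions.
def distinctWith(rs: List[Rexp],
                 pruneFunction: (Rexp, Set[Rexp]) => Rexp,
                 acc: Set[Rexp] = Set()): List[Rexp] = rs match {
  case Nil => Nil
  case r :: rest =>
    if (acc(r)) distinctWith(rest, pruneFunction, acc)
    else {
      val pruned = pruneFunction(r, acc)
      // the components of the pruned expression extend the accumulator
      pruned :: distinctWith(rest, pruneFunction, acc ++ terms(pruned))
    }
}

val a = CHAR('a'); val b = CHAR('b'); val c = CHAR('c')
// (a + b) is kept, a is a duplicate of a component of (a + b),
// and (b + c) is pruned to c because b is already in the accumulator.
val result = distinctWith(List(ALTS(a, b), a, ALTS(b, c)), toyPrune)
\end{lstlisting}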
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	397
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	398	The function $\textit{bsimpStrong}$
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	399	is very much the same as $\textit{bsimp}$, just with
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	400	$\textit{distinctBy}$ replaced
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	401	by $\textit{distinctWith}$.
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	402	\begin{figure}[H]
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	403	\begin{lstlisting}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	404
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	405	def bsimpStrong(r: ARexp): ARexp =
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	406	{
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	407	r match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	408	case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	409	case (AZERO, _) => AZERO
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	410	case (_, AZERO) => AZERO
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	411	case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	412	case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	413	case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	414	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	415	case AALTS(bs1, rs) => {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	416	distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	417	case Nil => AZERO
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	418	case s :: Nil => fuse(bs1, s)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	419	case rs => AALTS(bs1, rs)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	420	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	421	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	422	case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	423	case r => r
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	424	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	425	}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	426	def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	427	case Nil => r
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	428	case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	429	}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	430
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	431	\end{lstlisting}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	432	\caption{The function
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	433	$\textit{bsimpStrong}$: a stronger version of $\textit{bsimp}$}
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	434	\end{figure}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	435	\noindent
The benefit of using
$\textit{prune}$, namely refining the finiteness bound
to a cubic bound, has not been formalised yet.
Therefore we present Scala code rather than an Isabelle-style formal
definition like we did for $\simp$, as the definitions might still change
to suit our proof needs.
We will follow this convention in the rest of the chapter.
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	443
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	444	%The function $\textit{prune}$ is used in $\distinctWith$.
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	445	%$\distinctWith$ is a stronger version of $\distinctBy$
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	446	%which not only removes duplicates as $\distinctBy$ would
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	447	%do, but also uses the $\textit{pruneFunction}$
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	448	%argument to prune away verbose components in a regular expression.\\
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	449	%\begin{figure}[H]
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	450	%\begin{lstlisting}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	451	% //a stronger version of simp
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	452	% def bsimpStrong(r: ARexp): ARexp =
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	453	% {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	454	% r match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	455	% case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	456	% //normal clauses same as simp
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	457	% case (AZERO, _) => AZERO
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	458	% case (_, AZERO) => AZERO
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	459	% case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	460	% //bs2 can be discarded
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	461	% case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	462	% case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	463	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	464	% case AALTS(bs1, rs) => {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	465	% //distinctBy(flat_res, erase)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	466	% distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	467	% case Nil => AZERO
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	468	% case s :: Nil => fuse(bs1, s)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	469	% case rs => AALTS(bs1, rs)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	470	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	471	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	472	% //stars that can be treated as 1
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	473	% case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	474	% case r => r
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	475	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	476	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	477	% def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	478	% case Nil => r
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	479	% case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	480	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	481	%\end{lstlisting}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	482	%\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	483	%\end{figure}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	484	%\noindent
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	485	%$\distinctWith$, is in turn used in $\bsimpStrong$:
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	486	%\begin{figure}[H]
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	487	%\begin{lstlisting}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	488	% //Conjecture: [\| bdersStrong(s, r) \|] = O([\| r \|]^3)
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	489	% def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	490	% case Nil => r
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	491	% case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	492	% }
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	493	%\end{lstlisting}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	494	%\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	495	%\end{figure}
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	496	%\noindent
We conjecture that the above Scala function $\bdersStrongs$,
written $\bdersStrong{\_}{\_}$ in infix notation,
satisfies the following property:
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	500	\begin{conjecture}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	501	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	502	\end{conjecture}
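\noindent
One way to unfold the $O$-notation in this conjecture is the following.
This reading is an assumption about the intended quantification and not a
formalised statement: there is a constant $c$, independent of both $a$ and $s$, such that
\[
\llbracket \bdersStrong{a}{s} \rrbracket \;\leq\; c \cdot \llbracket a \rrbracket^3
\]
holds for all strings $s$.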
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	503	\noindent
In Scala, the stronger version of $\blexersimp$
looks as follows:
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	506	\begin{figure}[H]
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	507	\begin{lstlisting}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	508	def strongBlexer(r: Rexp, s: String) : Option[Val] = {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	509	Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	510	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	511	def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	512	case Nil => {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	513	if (bnullable(r)) {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	514	mkepsBC(r)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	515	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	516	else
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	517	throw new Exception("Not matched")
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	518	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	519	case c::cs => {
strong_blex_simp(bsimpStrong(bder(c, r)), cs)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	521	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	522	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	523	\end{lstlisting}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	524	\end{figure}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	525	\noindent
We call this lexer $\blexerStrong$.
This version is able to reduce the
size of the derivatives that
otherwise
trigger exponential behaviour in
$\blexersimp$.
Consider again matching the regular expression
$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
of the form $\protect\underbrace{aa..a}_{n}$.
This produces the following graphs ($\blexerStrong$ on the left-hand side and
$\blexersimp$ on the right-hand side).
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	537	\begin{figure}[H]
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	538	\centering
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	539	\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	540	\begin{tikzpicture}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	541	\begin{axis}[
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	542	%xlabel={$n$},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	543	myplotstyle,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	544	xlabel={input length},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	545	ylabel={size},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	546	width = 7cm,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	547	height = 5cm,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	548	]
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	549	\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	550	\end{axis}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	551	\end{tikzpicture}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	552	&
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	553	\begin{tikzpicture}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	554	\begin{axis}[
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	555	%xlabel={$n$},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	556	myplotstyle,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	557	xlabel={input length},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	558	ylabel={size},
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	559	width = 7cm,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	560	height = 5cm,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	561	]
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	562	\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	563	\end{axis}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	564	\end{tikzpicture}\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	565	\multicolumn{2}{l}{}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	566	\end{tabular}
\caption{Sizes of the derivatives of $((a^* + (aa)^* + \ldots + (aaaaa)^*)^*)^*$
against the input length: $\blexerStrong$ (left) and $\blexersimp$ (right)}\label{fig:aaaaaStarStar}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	568	\end{figure}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	569	\noindent
We expect that correctness is preserved, which we state as the following conjecture.
%We would like to preserve the correctness like the one
%we had for $\blexersimp$:
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	575	\begin{conjecture}\label{cubicConjecture}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	576	$\blexerStrong \;r \; s = \blexer\; r\;s$
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	577	\end{conjecture}
592 7f4c353c0f6b more Chengsong parents: 591 diff changeset	578	\noindent
More precisely, key lemmas from
chapter \ref{Bitcoded2} such as
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
need to be maintained once the rewrite relation is extended
with the new rewriting rule
shown in equation \eqref{eqn:cubicRule}.

In the next subsection
we describe why we
believe a cubic size bound can be achieved with
the stronger simplification.
For this we give a short introduction to
partial derivatives,
which were invented by Antimirov \cite{Antimirov95},
and then link them with the result of the function
$\bdersStrongs$.
7f4c353c0f6b more Chengsong parents: 591 diff changeset	594
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	595	\subsection{Antimirov's partial derivatives}
Partial derivatives were first introduced by
Antimirov \cite{Antimirov95}.
They are very similar
to Brzozowski derivatives,
but split the children of alternative regular expressions into
multiple independent terms. This means the output of
a partial derivative is a
set of regular expressions, defined as follows:
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	604	\begin{center}
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	605	\begin{tabular}{lcl@{\hspace{-5mm}}l}
$\partial_x \; (r_1 \cdot r_2)$ &
$\dn$ & $(\partial_x \; r_1) \cdot r_2 \cup
\partial_x \; r_2 \;$ & $ \textit{if} \; \; \nullable\; r_1$\\
& & $(\partial_x \; r_1)\cdot r_2 \quad\quad$ & $
\textit{otherwise}$\\
$\partial_x \; r^*$ & $\dn$ & $(\partial_x \; r) \cdot r^*$\\
$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \;
\textit{then} \;
\{ \ONE\} \;\;\textit{else} \; \varnothing$\\
$\partial_x(r_1+r_2)$ & $\dn$ & $\partial_x(r_1) \cup \partial_x(r_2)$\\
$\partial_x(\ONE)$ & $\dn$ & $\varnothing$\\
$\partial_x(\ZERO)$ & $\dn$ & $\varnothing$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	618	\end{tabular}
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	619	\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	620	\noindent
The $\cdot$ in the expression
$(\partial_x \; r_1) \cdot r_2 $
is a shorthand notation for the cartesian product
$(\partial_x \; r_1) \times \{ r_2\}$,
where each pair $(r', r_2)$ is read as the sequence $r' \cdot r_2$.
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	625	%Each element in the set generated by a partial derivative
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	626	%corresponds to a (potentially partial) match
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	627	%TODO: define derivatives w.r.t string s
Rather than joining the calculated derivatives $\partial_x r_1$ and $\partial_x r_2$ together
using the $\sum$ constructor, Antimirov put them into
a set. This means duplicate subterms are removed automatically,
because a set contains each element only once.
For example, to compute the derivative of the regular expression
$x^*(xx + y)^*$
w.r.t. $x$, one can compute a partial derivative
and get the two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y)^* \}$
from $\partial_x(x^*) \cdot (xx + y)^*$ and $\partial_x((xx + y)^*)$
(identifying $\ONE \cdot r$ with $r$).

The partial derivative w.r.t. a string is defined recursively, with the base case
$\partial_{[]} \; r \dn \{r\}$ for the empty string:
\[
\partial_{c::cs} r \dn \bigcup_{r'\in (\partial_c r)}
\partial_{cs} r'
\]
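\noindent
As a small illustration (a sketch that assumes the usual base case
$\partial_{[]} \; r \dn \{r\}$ for the empty string, and that does not
simplify $\ONE \cdot r$ to $r$), consider the regular expression $a^*$
and the string $aa$:
\begin{center}
\begin{tabular}{lcl}
$\partial_a \; (a^*)$ & $=$ & $(\partial_a \; a) \cdot a^* \;=\; \{\ONE \cdot a^*\}$\\
$\partial_{aa} \; (a^*)$ & $=$ & $\partial_a \; (\ONE \cdot a^*)$\\
 & $=$ & $(\partial_a \; \ONE) \cdot a^* \;\cup\; \partial_a \; (a^*)$\\
 & $=$ & $\varnothing \;\cup\; \{\ONE \cdot a^*\} \;=\; \{\ONE \cdot a^*\}$\\
\end{tabular}
\end{center}
\noindent
The second character produces the same term again, so the set does not grow.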
Given an alphabet $\Sigma$, we write $\Sigma^*$ for the set of all possible strings
over the alphabet.
The set of all possible partial derivatives is then defined
as the union of the partial derivatives w.r.t.~all these strings:
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	647	\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	648	\begin{tabular}{lcl}
$\textit{PDER}_{\Sigma^*} \; r $ & $\dn $ & $\bigcup_{w \in \Sigma^*}\partial_w \; r$
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	650	\end{tabular}
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	651	\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	652	\noindent
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	653	Consider now again our pathological case where we apply the more
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	654	aggressive
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	655	simplification
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	656	\begin{center}
$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	658	\end{center}
Let us abbreviate this regular expression with $r$.
Then we have that
\begin{center}
$\textit{PDER}_{\Sigma^*} \; r =
\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{
(\underbrace{a \ldots a}_{\text{$j$ a's}}\cdot
(\underbrace{a \ldots a}_{\text{$i$ a's}})^*)\cdot r \}$.
\end{center}
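\noindent
For instance, instantiating the right-hand side with $n = 2$
(and reading $j = 0$ as the empty prefix) gives three terms,
one for $i = 1$ and two for $i = 2$:
\begin{center}
$\{ (a)^* \cdot r \} \;\cup\; \{ (aa)^* \cdot r \} \;\cup\; \{ (a \cdot (aa)^*) \cdot r \}$
\end{center}
\noindent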
The union on the right-hand side has $n * (n + 1) / 2$ terms,
since for each $i$ there are $i$ choices for $j$ and
$1 + 2 + \ldots + n = n * (n + 1) / 2$.
This leads us to believe that the maximum number of terms needed
in our derivative is also only $n * (n + 1) / 2$.
Therefore
we conjecture that $\bsimpStrong$ is also able to achieve this
upper limit in general:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	673	\begin{conjecture}\label{bsimpStrongInclusionPder}
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	674	Using a suitable transformation $f$, we have that
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	675	\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	676	$\forall s.\; f \; (r \bdersStrong \; s) \subseteq
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	677	\textit{PDER}_{\Sigma^*} \; r$
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	678	\end{center}
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	679	holds.
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	680	\end{conjecture}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	681	\noindent
The reason is that our rule \eqref{eqn:cubicRule} will keep only one copy of each term,
where the function $\textit{prune}$ takes care of maintaining
a set-like structure similar to partial derivatives.
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	685	%We might need to adjust $\textit{prune}$
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	686	%slightly to make sure all duplicate terms are eliminated,
d50a309a0645 with Christian Chengsong parents: 628 diff changeset	687	%which should be doable.
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	688
Antimirov proved that the sum of the sizes of all partial derivative
terms is bounded by the cube of the size of the regular
expression:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	692	\begin{property}\label{pderBound}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	693	$\llbracket \textit{PDER}_{\Sigma^*} \; r \rrbracket \leq O(\llbracket r \rrbracket^3)$
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	694	\end{property}
630 d50a309a0645 with Christian Chengsong parents: 628 diff changeset	695	\noindent
This property was formalised by Wu et al. \cite{Wu2014}, and the
details can be found in the Archive of Formal Proofs\footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}.
Once conjecture \ref{bsimpStrongInclusionPder} is proven, property \ref{pderBound}
would provide us with a cubic bound for our $\blexerStrong$ algorithm:
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	700	\begin{conjecture}\label{strongCubic}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	701	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	702	\end{conjecture}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	703	\noindent
7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	704	We leave this as future work.
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	705
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	706
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	707	%To get all the "atomic" components of a regular expression's possible derivatives,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	708	%there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	709	%whatever character is available at the head of the string inside the language of a
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	710	%regular expression, and gives back the character and the derivative regular expression
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	711	%as a pair (which he called "monomial"):
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	712	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	713	% \begin{tabular}{ccc}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	714	% $\lf(\ONE)$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	715	%$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	716	% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
% $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	718	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	719	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	720	%%TODO: completion
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	721	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	722	%There is a slight difference in the last three clauses compared
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	723	%with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	724	%expression $r$ with every element inside $\textit{rset}$ to create a set of
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	725	%sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	726	%on a set of monomials (which Antimirov called "linear form") and a regular
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	727	%expression, and returns a linear form:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	728	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	729	% \begin{tabular}{ccc}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	730	% $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	731	% $l \bigodot (\ONE)$ & $=$ & $l$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	732	% $\phi \bigodot t$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	733	% $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	734	% $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	735	% $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	736	% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
% $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	738	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	739	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	740	%%TODO: completion
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	741	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	742	% Some degree of simplification is applied when doing $\bigodot$, for example,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	743	% $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	744	% and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	745	% $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	746	% and so on.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	747	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	748	% With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	749	% an iterative procedure:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	750	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	751	% \begin{tabular}{llll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	752	%$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	753	% & $\Delta_{i+1}$ & $ =$ & $\lf(\Delta_i) - \PD_i$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	754	% & $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	755	%$\partial_{UNIV}(r)$ & $=$ & $\PD$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	756	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	757	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	758	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	759	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	760	% $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	761
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	762
532 cc54ce075db5 restructured Chengsong parents: diff changeset	763
cc54ce075db5 restructured Chengsong parents: diff changeset	764
cc54ce075db5 restructured Chengsong parents: diff changeset	765	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	766	% SECTION 2
cc54ce075db5 restructured Chengsong parents: diff changeset	767	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	768
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	769
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	770	%The closed form for them looks like:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	771	%%\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	772	%% \begin{tabular}{llrclll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	773	%% $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	774	%% $\textit{rsimp}$ & $($ & $
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	775	%% \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	776	%% & & & & $\textit{nupdates} \;$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	777	%% $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	778	%% & & & & $)$ &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	779	%% & & $)$ & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	780	%% & $)$ & & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	781	%% \end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	782	%%\end{center}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	783	%\begin{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	784	% \begin{tabular}{llrcllrllll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	785	% $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	786	% &&&&$\textit{rsimp}$ & $($ & $
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	787	% \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	788	% &&&& & & & & $\;\; \textit{nupdates} \;$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	789	% $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	790	% &&&& & & & & $)$ &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	791	% &&&& & & $)$ & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	792	% &&&& & $)$ & & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	793	% \end{tabular}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	794	%\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	795	%The $\textit{optermsimp}$ function with the argument $r$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	796	%chooses from two options: $\ZERO$ or
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	797	%We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	798	%and $\starupdates$:
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	799	%\begin{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	800	% \begin{tabular}{lcl}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	801	% $\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	802	% $\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	803	% & & $\textit{if} \;
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	804	% (\rnullable \; (\rders \; r \; s))$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	805	% & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	806	% \starupdate \; c \; r \; Ss)$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	807	% & & $\textit{else} \;\; (s @ [c]) :: (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	808	% \starupdate \; c \; r \; Ss)$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	809	% \end{tabular}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	810	%\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	811	%\noindent
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	812	%As a generalisation from characters to strings,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	813	%$\starupdates$ takes a string instead of a character
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	814	%as the first input argument, and is otherwise the same
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	815	%as $\starupdate$.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	816	%\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	817	% \begin{tabular}{lcl}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	818	% $\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	819	% $\starupdates \; (c :: cs) \; r \; Ss$ & $=$ & $\starupdates \; cs \; r \; (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	820	% \starupdate \; c \; r \; Ss)$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	821	% \end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	822	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	823	%\noindent
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	824
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	825
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	826
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	827	%\section{Zippers}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	828	%Zipper is a data structure designed to operate on
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	829	%and navigate between local parts of a tree.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	830	%It was first formally described by Huet \cite{HuetZipper}.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	831	%Typical applications of zippers involve text editor buffers
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	832	%and proof system databases.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	833	%In our setting, the idea is to compactify the representation
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	834	%of derivatives with zippers, thereby making our algorithm faster.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	835	%Some initial results
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	836	%We first give a brief introduction to what zippers are,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	837	%and other works
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	838	%that apply zippers to derivatives
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	839	%When dealing with large trees, it would be a waste to
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	840	%traverse the entire tree if
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	841	%the operation only
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	842	%involves a small fraction of it.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	843	%The idea is to put the focus on that subtree, turning other parts
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	844	%of the tree into a context
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	845	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	846	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	847	%One observation about our derivative-based lexing algorithm is that
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	848	%the derivative operation sometimes traverses the entire regular expression
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	849	%unnecessarily:
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	850
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	851
612 8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	852	%----------------------------------------------------------------------------------------
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	853	% SECTION 1
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	854	%----------------------------------------------------------------------------------------
532 cc54ce075db5 restructured Chengsong parents: diff changeset	855
612 8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	856	%\section{Adding Support for the Negation Construct, and its Correctness Proof}
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	857	%We now add support for the negation regular expression:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	858	%\[ r ::= \ZERO \mid \ONE
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	859	% \mid c
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	860	% \mid r_1 \cdot r_2
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	861	% \mid r_1 + r_2
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	862	% \mid r^*
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	863	% \mid \sim r
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	864	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	865	%The $\textit{nullable}$ function's clause for it would be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	866	%\[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	867	%\textit{nullable}(~r) = \neg \nullable(r)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	868	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	869	%The derivative would be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	870	%\[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	871	%~r \backslash c = ~ (r \backslash c)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	872	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	873	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	874	%The most tricky part of lexing for the $~r$ regular expression
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	875	% is creating a value for it.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	876	% For other regular expressions, the value aligns with the
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	877	% structure of the regular expression:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	878	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	879	% \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	880	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	881	%But for the $~r$ regular expression, $s$ is a member of it if and only if
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	882	%$s$ does not belong to $L(r)$.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	883	%That means when there
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	884	%is a match for the not regular expression, it is not possible to generate how the string $s$ matched
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	885	%with $r$.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	886	%What we can do is preserve the information of how $s$ was not matched by $r$,
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	887	%and there are a number of options to do this.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	888	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	889	%We could give a partial value when there is a partial match for the regular expression inside
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	890	%the $\mathbf{not}$ construct.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	891	%For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	892	%A value for it could be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	893	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	894	% \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	895	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	896	% The above example demonstrates what value to construct
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	897	% when the string $s$ is a proper prefix
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	898	% of some string in $L(r)$. When $s$ is instead not a prefix of any string
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	899	% in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	900	% constructor.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	901	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	902	% Another option would be to store either the string $s$ that resulted in
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	903	% a mismatch for $r$ or a dummy value as a placeholder:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	904	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	905	% \vdash \textit{Not}(abcd) : ~( r_1 )
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	906	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	907	%or
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	908	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	909	% \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	910	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	911	% We choose to implement the dummy-value option, as it is the most straightforward:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	912	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	913	% \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	914	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	915	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	916	%
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	917	%\begin{center}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	918	% \begin{tabular}{lcl}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	919	% $\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	920	% cs \; r \; [\Some \; ([c], n)]$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	921	% $\ntset \; r\; 0 \; \_$ & $\dn$ & $\None$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	922	% $\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	923	% \end{tabular}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	924	%\end{center}
628 7af4e2420a8c ready to submit~~ Chengsong parents: 625 diff changeset	925

author	Chengsong
	Fri, 30 Dec 2022 01:52:32 +0000
changeset 638	dd9dde2d902b