% Chapter Template

\chapter{A Better Bound and Other Extensions} % Main chapter title

\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.
\lstset{style=myScalastyle}


This chapter is a ``work-in-progress''
chapter which records
extensions to our $\blexersimp$.
We intend to formalise this part, but
have not been able to finish it due to the time constraints of the PhD.
Nevertheless, we outline the ideas we intend to use for the proofs.

We present further improvements
made to our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same
alternative level,
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
We conjecture that both
\begin{center}
$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{center}
and
\begin{center}
$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
hold, but formalising
them is still work in progress.
We give reasons why the correctness and cubic size bound proofs
can be achieved,
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Antimirov's partial derivatives.\\
%We also present the idempotency property proof
%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
%This reinforces our claim that the fixpoint construction
%originally required by Sulzmann and Lu can be removed in $\blexersimp$.

%Last but not least, we present our efforts and challenges we met
%in further improving the algorithm by data
%structures such as zippers.



%----------------------------------------------------------------------------------------
% SECTION strongsimp
%----------------------------------------------------------------------------------------
\section{A Stronger Version of Simplification}
%TODO: search for isabelle proofs of algorithms that check equivalence
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
For example, the regular expression
\[
aa \cdot a^* + a \cdot a^* + aa\cdot a^*
\]
contains three terms,
expressing three possibilities for how it can match future input.
The first and the third terms are identical, which means we can eliminate
the latter as we know it will not be picked up by $\bmkeps$.
In $\bsimps$, the $\distinctBy$ function takes care of this.
The criterion $\distinctBy$ uses for removing a duplicate
$a_2$ in the list
\begin{center}
$rs_a@[a_1]@rs_b@[a_2]@rs_c$
\end{center}
is that
\begin{center}
$\rerase{a_1} = \rerase{a_2}$.
\end{center}
It can be characterised as the $LD$
rewrite rule in \ref{rrewriteRules}.\\
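
\noindent
The duplicate-removal criterion above can be sketched in Scala as follows.
This is only a minimal illustration and not the definition used in $\bsimps$:
the datatypes \texttt{Rexp} and \texttt{ARexp} are cut down to what the example needs,
bitcodes are assumed to be lists of Booleans, and the way \texttt{erase} turns an
$n$-ary alternative into nested binary ones is our assumption.
The names \texttt{erase} and \texttt{distinctBy} merely mirror the erase function and $\distinctBy$.
\begin{lstlisting}
// Simplified datatypes, assumed for illustration only.
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALTS(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp

sealed trait ARexp
case object AZERO extends ARexp
case class AONE(bs: List[Boolean]) extends ARexp
case class ACHAR(bs: List[Boolean], c: Char) extends ARexp
case class AALTS(bs: List[Boolean], rs: List[ARexp]) extends ARexp
case class ASEQ(bs: List[Boolean], r1: ARexp, r2: ARexp) extends ARexp

// Forget all bitcodes; n-ary alternatives become nested binary ones.
def erase(a: ARexp): Rexp = a match {
  case AZERO => ZERO
  case AONE(_) => ONE
  case ACHAR(_, c) => CHAR(c)
  case AALTS(_, Nil) => ZERO
  case AALTS(_, r :: Nil) => erase(r)
  case AALTS(bs, r :: rs) => ALTS(erase(r), erase(AALTS(bs, rs)))
  case ASEQ(_, r1, r2) => SEQ(erase(r1), erase(r2))
}

// Keep the first element for every key f(x); an element whose key
// is already in the accumulator acc is dropped.
def distinctBy[A, B](xs: List[A], f: A => B,
                     acc: Set[B] = Set[B]()): List[A] = xs match {
  case Nil => Nil
  case x :: rest =>
    val key = f(x)
    if (acc.contains(key)) distinctBy(rest, f, acc)
    else x :: distinctBy(rest, f, acc + key)
}
\end{lstlisting}
\noindent
Calling \texttt{distinctBy[ARexp, Rexp](rs, erase)} on a list \texttt{rs} keeps the first of any two
annotated regular expressions that differ only in their bitcodes.
Elements that differ structurally are all kept, even when they share components.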
The problem, however, is that identical components
in two slightly different regular expressions cannot be removed:
\begin{figure}[H]
\[
(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
\]
\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
\end{figure}
\noindent
A simplification like this actually
cannot be omitted,
as without it the size could blow up even with our $\textit{bsimp}$
function: for the example from Chapter \ref{Finite},
$\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
even when $n$ is set to a small number,
the size grows exponentially and quickly becomes huge:
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
%xlabel={$n$},
myplotstyle,
xlabel={input length},
ylabel={size},
]
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
\end{axis}
\end{tikzpicture}
\caption{Runtime of $\blexersimp$ for matching
$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
with strings
of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
\end{figure}
\noindent
We would like to apply the following rewriting at some stage
in our $\simp$ function,
so that the simplification in \ref{partialDedup} becomes possible:
\begin{figure}[H]
\[
(a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
\]
\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
\end{figure}
\noindent
Translating the rule into our $\textit{bsimp}$ function simply
involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
&& $\ldots$\\
&&$\quad\textit{case} \; (_{bs1}\sum as, a_2') \Rightarrow _{bs1}\sum (
\map \; (_{[]}\textit{ASEQ} \; \_ \; a_2') \; as)$\\
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
\end{tabular}
\end{center}



Unfortunately,
if we introduce this clause in our
setting we would lose the POSIX property of our calculated values.
For example, given the regular expression
\begin{center}
$(a + ab)(bc + c)$
\end{center}
and the string
\begin{center}
$abc$,
\end{center}
then our algorithm generates the following
correct POSIX value
\begin{center}
$\Seq \; (\Right \; ab) \; (\Right \; c)$.
\end{center}
Essentially it matches the string with the longer $\Right$-alternative
in the first component of the sequence (and
then the `rest' with the character regular expression $c$ from the second component).
If we add the simplification above, then we obtain the following value
\begin{center}
$\Left \; (\Seq \; a \; (\Left \; bc))$
\end{center}
where the $\Left$-alternatives get priority.
However, this violates the POSIX rules.
The reason for getting this undesired value
is that the new rule splits this regular expression up into
\begin{center}
$a\cdot(b c + c) + ab \cdot (bc + c)$,
\end{center}
which is a regular expression with a
totally different structure: the original
was a sequence, and now it becomes an alternative.
With an alternative the maximum munch rule no longer works.\\
A method to reconcile this is to do the
transformation in \ref{desiredSimp} ``non-invasively'',
meaning that we traverse the list of regular expressions
\begin{center}
$rs_a@[a]@rs_c$
\end{center}
in the alternative
\begin{center}
$\sum ( rs_a@[a]@rs_c)$
\end{center}
using a function similar to $\distinctBy$,
but this time
we allow a more general list rewrite:
\begin{mathpar}\label{cubicRule}
\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c
\stackrel{s}{\rightsquigarrow }
rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
\end{mathpar}
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
where $\textit{prune} \;a \; acc$ traverses $a$
without altering the structure of $a$, removing components in $a$
that have appeared in the accumulator $acc$.
For example,
\begin{center}
$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
\end{center}
should be equal to
\begin{center}
$(r_g+r_h)r_d$
\end{center}
because $r_gr_d$ and
$r_hr_d$ are the only terms
that have not appeared in the accumulator list
\begin{center}
$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
\end{center}
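
\noindent
The example can be unfolded step by step.
The following is only a sketch under two assumptions that are ours and made for illustration:
the regular expressions $r_a, \ldots, r_h$ are pairwise distinct and are neither alternatives nor sequences,
and the accumulator is read as the set of its individual terms, written $acc$ below.
Stripping the common tail $r_d$ from these terms gives a second set, written $acc'$:
\[
\begin{array}{lcl}
acc & = & \{r_ar_d,\; r_br_d,\; r_cr_d,\; r_er_d,\; r_fr_d\}\\
acc' & = & \{r_a,\; r_b,\; r_c,\; r_e,\; r_f\}\\
\textit{prune} \; (r_a+r_f+r_g+r_h) \; acc' & = & r_g+r_h\\
\textit{prune} \; ((r_a+r_f+r_g+r_h)r_d) \; acc & = & (r_g+r_h)r_d
\end{array}
\]
In the third line $r_a$ and $r_f$ are dropped because they occur in $acc'$,
whereas $r_g$ and $r_h$ do not and are kept.
The last line reattaches the tail $r_d$ to the pruned head, so the result is still a sequence.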
We implemented the
function $\textit{prune}$ in Scala
and incorporated it into our lexer
by replacing the $\simp$ function
with a stronger version called $\bsimpStrong$
that prunes regular expressions.
590 988e92a70704 more chap5 and chap6 bsimp_idem Chengsong parents: 538 diff changeset	213	\begin{figure}[H]
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	214
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	215	\begin{lstlisting}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	216	def atMostEmpty(r: Rexp) : Boolean = r match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	217	case ZERO => true
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	218	case ONE => true
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	219	case STAR(r) => atMostEmpty(r)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	220	case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	221	case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	222	case CHAR(_) => false
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	223	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	224
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	225
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	226	def isOne(r: Rexp) : Boolean = r match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	227	case ONE => true
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	228	case SEQ(r1, r2) => isOne(r1) && isOne(r2)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	229	case ALTS(r1, r2) => (isOne(r1) \|\| isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	230	case STAR(r0) => atMostEmpty(r0)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	231	case CHAR(c) => false
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	232	case ZERO => false
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	233	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	234
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	235	//r = r' ~ tail' : If tail' matches tail => returns r'
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	236	def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	237	case SEQ(r1, r2) =>
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	238	if(r2 == tail)
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	239	r1
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	240	else
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	241	ZERO
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	242	case r => ZERO
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	243	}
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	244
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	245	def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	246	case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	247	{
      //all components have been removed, meaning this is effectively a duplicate
      //flats will take care of removing this AZERO
      case Nil => AZERO
      case r::Nil => fuse(bs, r)
      case rs1 => AALTS(bs, rs1)
    }
  case ASEQ(bs, r1, r2) =>
    //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
    prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
      //after pruning, returns 0
      case AZERO => AZERO
      //after pruning, got r1'.r2, where r1' is equal to 1
      case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
      //assemble the pruned head r1p with r2
      case r1p => ASEQ(bs, r1p, r2)
    }
  //this does the duplicate component removal task
  case r => if(acc(erase(r))) AZERO else r
}
\end{lstlisting}
\caption{The pruning function together with its helper functions}
\end{figure}
\noindent
The benefits of using
$\textit{prune}$, such as refining the finiteness bound
to a cubic bound, have not been formalised yet.
We therefore present Scala code rather than an Isabelle-style formal
definition like the one we gave for $\simp$, as the definitions might still change
to suit the needs of the proof.
We follow this convention in the rest of the chapter.
\begin{figure}[H]
\begin{lstlisting}
def distinctWith(rs: List[ARexp],
                 pruneFunction: (ARexp, Set[Rexp]) => ARexp,
                 acc: Set[Rexp] = Set()) : List[ARexp] =
  rs match {
    case Nil => Nil
    case r :: rs =>
      if(acc(erase(r)))
        distinctWith(rs, pruneFunction, acc)
      else {
        val pruned_r = pruneFunction(r, acc)
        pruned_r ::
          distinctWith(rs,
            pruneFunction,
            turnIntoTerms(erase(pruned_r)) ++: acc
          )
      }
  }
\end{lstlisting}
\caption{A Stronger Version of $\textit{distinctBy}$}
\end{figure}
\noindent
The function $\textit{prune}$ is used in $\distinctWith$.
$\distinctWith$ is a stronger version of $\distinctBy$:
it not only removes duplicates as $\distinctBy$ would
do, but also uses the $\textit{pruneFunction}$
argument to prune away those components of a regular expression
that already occur in the accumulator.
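
As a small illustration of the difference, consider an alternative whose flattened
children are $a \cdot c$ and $(a + b)\cdot c$, with bitcodes omitted.
The sketch below is our own hand calculation and assumes that $\textit{turnIntoTerms}$
splits an erased regular expression into its top-level alternative terms and that
$\textit{removeSeqTail}$ strips the given tail from a sequence; the exact
intermediate sets may differ if these helpers behave otherwise.
\begin{center}
	\begin{tabular}{lcl}
		$\distinctBy$ on $[a\cdot c,\; (a+b)\cdot c]$ & gives & $[a \cdot c,\; (a+b)\cdot c]$\\
		$\distinctWith$ on $[a\cdot c,\; (a+b)\cdot c]$ & gives & $[a\cdot c,\; b \cdot c]$
	\end{tabular}
\end{center}
\noindent
In the second line the accumulator is $\{a \cdot c\}$ when $(a+b)\cdot c$ is reached.
Stripping the tail $c$ leaves $\{a\}$, so $\textit{prune}$ drops the $a$ inside
$a + b$ and reassembles $b \cdot c$.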
\begin{figure}[H]
\begin{lstlisting}
//a stronger version of simp
def bsimpStrong(r: ARexp): ARexp =
{
  r match {
    case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
      //normal clauses same as simp
      case (AZERO, _) => AZERO
      case (_, AZERO) => AZERO
      case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
      //bs2 can be discarded
      case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
      case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
    }
    case AALTS(bs1, rs) => {
      //simp uses distinctBy(flat_res, erase) here
      distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
        case Nil => AZERO
        case s :: Nil => fuse(bs1, s)
        case rs => AALTS(bs1, rs)
      }
    }
    //stars that can be treated as 1
    case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
    case r => r
  }
}
\end{lstlisting}
\caption{The function $\bsimpStrong$}
\end{figure}
\noindent
The function $\distinctWith$ is in turn used in $\bsimpStrong$,
which $\bdersStrongs$ applies after each derivative step:
\begin{figure}[H]
\begin{lstlisting}
//Conjecture: [\| bdersStrong(s, r) \|] = O([\| r \|]^3)
def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
  case Nil => r
  case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
}
\end{lstlisting}
\caption{The function $\bdersStrongs$}
\end{figure}
\noindent
We conjecture that the above Scala function $\bdersStrongs$,
written $\bdersStrong{\_}{\_}$ in infix notation,
satisfies the following property:
\begin{conjecture}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{conjecture}
\noindent
The stronger version of $\blexersimp$
looks as follows in Scala:
\begin{figure}[H]
\begin{lstlisting}
import scala.util.Try

def strongBlexer(r: Rexp, s: String) : Option[Val] = {
  //strong_blex_simp throws when s is not matched, which yields None
  Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
}
def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
  case Nil => {
    if (bnullable(r)) {
      mkepsBC(r)
    }
    else
      throw new Exception("Not matched")
  }
  case c::cs => {
    strong_blex_simp(bsimpStrong(bder(c, r)), cs)
  }
}
\end{lstlisting}
\end{figure}
\noindent
We call this lexer $\blexerStrong$.
For regular expressions that trigger exponential size growth in
$\blexersimp$, $\blexerStrong$ is able to drastically reduce the
size of the internal data structures.
\begin{figure}[H]
\centering
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
\begin{tikzpicture}
\begin{axis}[
    %xlabel={$n$},
    myplotstyle,
    xlabel={input length},
    ylabel={size},
    width = 7cm,
    height = 5cm,
    ]
\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
\end{axis}
\end{tikzpicture}
  &
\begin{tikzpicture}
\begin{axis}[
    %xlabel={$n$},
    myplotstyle,
    xlabel={input length},
    ylabel={size},
    width = 7cm,
    height = 5cm,
    ]
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
\end{axis}
\end{tikzpicture}\\
\multicolumn{2}{l}{}
\end{tabular}
\caption{Size of the derivatives computed by $\blexerStrong$ (left) and $\blexersimp$ (right) when matching
	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
	of the form $\protect\underbrace{aa\ldots a}_{n}$.}\label{fig:aaaaaStarStar}
\end{figure}
\noindent
We would like $\blexerStrong$ to satisfy the same correctness
property that we proved for $\blexersimp$:
\begin{conjecture}\label{cubicConjecture}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{conjecture}
\noindent
The idea is to maintain the key lemmas of
Chapter \ref{Bitcoded2}, such as
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$,
under the new rewriting rule \ref{cubicRule}.

In the next sub-section
we describe why we
believe a cubic bound can be achieved.
We give an introduction to
partial derivatives,
which were introduced by Antimirov \cite{Antimirov95},
and then link them with the result of the function
$\bdersStrongs$.

\subsection{Antimirov's partial derivatives}
Partial derivatives were first introduced by
Antimirov \cite{Antimirov95}.
They are computed in a similar way to Brzozowski's derivatives,
but the children of alternative regular expressions are split into
multiple independent terms, so that the result is a
set of regular expressions:
\begin{center}
	\begin{tabular}{lcl}
		$\partial_x \; (a \cdot b)$ & $\dn$ & $\partial_x \; a\cdot b \,\cup\,
		\partial_x \; b \quad \textit{if} \; \nullable\; a$\\
		& & $\partial_x \; a\cdot b \quad
		\textit{otherwise}$\\
		$\partial_x \; r^*$ & $\dn$ & $\partial_x \; r \cdot r^*$\\
		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \;
		\textit{then} \;
		\{ \ONE\} \;\textit{else} \; \varnothing$\\
		$\partial_x \; (a+b)$ & $\dn$ & $\partial_x \; a \cup \partial_x \; b$\\
		$\partial_x \; \ONE$ & $\dn$ & $\varnothing$\\
		$\partial_x \; \ZERO$ & $\dn$ & $\varnothing$\\
	\end{tabular}
\end{center}
\noindent
The $\cdot$ in, for example,
$\partial_x \; a\cdot b $
is shorthand for attaching $b$ to every element of the set
$\partial_x \; a$, that is $\{ r \cdot b \mid r \in \partial_x \; a \}$.
%Each element in the set generated by a partial derivative
%corresponds to a (potentially partial) match
%TODO: define derivatives w.r.t string s
Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
using the $\sum$ constructor, Antimirov put them into
a set. This causes maximum de-duplication to happen,
allowing us to understand what the ``atomic'' components of a derivative are.
For example, to compute what the derivative of the regular expression $x^*(xx + y)^*$
with respect to $x$ is made of, one can take its partial derivative
and get the two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y)^* \}$
from $\partial_x(x^*) \cdot (xx + y)^*$ and $\partial_x((xx + y)^*)$.
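
The clauses above can be transcribed almost directly into Scala.
The following is a minimal sketch on plain (unannotated) regular expressions and
not the code of $\blexerStrong$: the datatype \texttt{Rexp} is declared locally,
and the names \texttt{seqAll} and \texttt{pder} are chosen for this illustration only.
It assumes that $\ONE \cdot r$ is identified with $r$ when a regular expression is
attached to a set, which is what yields the two singleton sets of the example.
\begin{lstlisting}
// plain regular expressions (illustrative, declared locally)
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// rs . r: attach r to every element of rs;
// assumption: 1 . r is simplified to r
def seqAll(rs: Set[Rexp], r: Rexp): Set[Rexp] =
  rs.map {
    case ONE => r
    case r1  => SEQ(r1, r)
  }

// partial derivative of r w.r.t. the character x
def pder(x: Char, r: Rexp): Set[Rexp] = r match {
  case ZERO => Set.empty[Rexp]
  case ONE => Set.empty[Rexp]
  case CHAR(c) => if (x == c) Set[Rexp](ONE) else Set.empty[Rexp]
  case ALT(r1, r2) => pder(x, r1) ++ pder(x, r2)
  case SEQ(r1, r2) =>
    if (nullable(r1)) seqAll(pder(x, r1), r2) ++ pder(x, r2)
    else seqAll(pder(x, r1), r2)
  case STAR(r1) => seqAll(pder(x, r1), STAR(r1))
}

// the example from the text: x^* . (xx + y)^*
val r2 = STAR(ALT(SEQ(CHAR('x'), CHAR('x')), CHAR('y')))
val r  = SEQ(STAR(CHAR('x')), r2)
// contains exactly  x^* . (xx + y)^*  and  x . (xx + y)^*
val terms = pder('x', r)
\end{lstlisting}
\noindent
Because the result is a \texttt{Set}, syntactically equal terms are merged
automatically, which is the de-duplication effect described above.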

The set of all possible partial derivatives is defined
as the union of the partial derivatives with respect to all strings over the alphabet $A$:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{PDER}_{UNIV} \; r $ & $\dn $ & $\bigcup_{w \in A^*}\partial_w \; r$
	\end{tabular}
\end{center}
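\noindent
This definition relies on partial derivatives with respect to a string $w$,
which this section does not spell out.
Assuming the usual character-by-character extension (our reading, writing $[]$
for the empty string and $c\!::\!s$ for a string with head $c$ and tail $s$),
it can be stated as:
\begin{center}
	\begin{tabular}{lcl}
		$\partial_{[]} \; r$ & $\dn$ & $\{ r \}$\\
		$\partial_{c::s} \; r$ & $\dn$ & $\bigcup_{r' \in \partial_c \, r} \partial_s \; r'$\\
	\end{tabular}
\end{center}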
\noindent
Returning to our example
\begin{center}
	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{\text{$n$ a's}})^* )^*)^*$
\end{center}
and denoting this regular expression by $A$,
we have that
\begin{center}
$\textit{PDER}_{UNIV} \; A =
\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{
	(\underbrace{a \ldots a}_{\text{$j$ a's}}\cdot
(\underbrace{a \ldots a}_{\text{$i$ a's}})^*)\cdot A \}$,
\end{center}
with exactly $n (n + 1) / 2$ terms, one for each of the
$\sum_{i=1}^{n} i$ pairs $(i, j)$.
This is in line with our speculation that only $n(n+1)/2$ terms are
needed. We conjecture that $\bsimpStrong$ is also able to achieve this
upper limit in general:
\begin{conjecture}\label{bsimpStrongInclusionPder}
	Using a suitable transformation $f$, we have
	\begin{center}
		$\forall s.\; f \; (\bdersStrong{r}{s}) \subseteq
		 \textit{PDER}_{UNIV} \; r$
	\end{center}
\end{conjecture}
\noindent
The reason is that our rule \ref{cubicRule} keeps only one copy of each term,
with the function $\textit{prune}$ maintaining
a set-like structure similar to partial derivatives.
We anticipate that $\textit{prune}$ might need to be adjusted
slightly to make sure all duplicate terms are eliminated,
which should be doable.

Antimirov proved that the sum of the sizes of all partial derivative
terms is bounded by the cube of the size of the regular
expression:
\begin{property}\label{pderBound}
	$\llbracket \textit{PDER}_{UNIV} \; r \rrbracket = O(\llbracket r \rrbracket^3)$
\end{property}
This property was formalised by Urban, and the details are in the PDERIVS.thy file
in our repository.
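
As a small sanity check of this bound (our own hand calculation, not taken from
the formalisation), take $r = x^*(xx+y)^*$ over the alphabet $\{x, y\}$,
identify $\ONE \cdot r'$ with $r'$, and assume that the size
$\llbracket \cdot \rrbracket$ counts one for each constructor and each character:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{PDER}_{UNIV} \; r$ & $=$ &
		$\{\, x^* \cdot (xx+y)^*,\;\; x \cdot (xx+y)^*,\;\; (xx+y)^* \,\}$\\
		$\llbracket \textit{PDER}_{UNIV} \; r \rrbracket$ & $=$ & $9 + 8 + 6 = 23$\\
		$\llbracket r \rrbracket^3$ & $=$ & $9^3 = 729$\\
	\end{tabular}
\end{center}
\noindent
The set is closed because every further partial derivative of these three terms
with respect to $x$ or $y$ is again one of them, or empty.
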
Once Conjecture \ref{bsimpStrongInclusionPder} is proven, Property \ref{pderBound}
would yield a cubic bound for our $\blexerStrong$ algorithm:
\begin{conjecture}\label{strongCubic}
	$\llbracket \bdersStrong{r}{s} \rrbracket = O(\llbracket r \rrbracket^3)$
\end{conjecture}


621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	533	%To get all the "atomic" components of a regular expression's possible derivatives,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	534	%there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	535	%whatever character is available at the head of the string inside the language of a
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	536	%regular expression, and gives back the character and the derivative regular expression
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	537	%as a pair (which he called "monomial"):
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	538	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	539	% \begin{tabular}{ccc}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	540	% $\lf(\ONE)$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	541	%$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	542	% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	543	% $\lf(r^)$ & $=$ & $\lf(r) \bigodot \lf(r^)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	544	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	545	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	546	%%TODO: completion
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	547	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	548	%There is a slight difference in the last three clauses compared
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	549	%with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	550	%expression $r$ to every element inside $\textit{rset}$ to create a set of
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	551	%sequence derivatives, it uses the ``circle dot'' operator $\bigodot$ which operates
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	552	%on a set of monomials (which Antimirov called a ``linear form'') and a regular
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	553	%expression, and returns a linear form:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	554	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	555	% \begin{tabular}{ccc}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	556	% $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	557	% $l \bigodot (\ONE)$ & $=$ & $l$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	558	% $\phi \bigodot t$ & $=$ & $\phi$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	559	% $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	560	% $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	561	% $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	562	% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	563	% $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	564	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	565	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	566	%%TODO: completion
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	567	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	568	% Some degree of simplification is applied when doing $\bigodot$, for example,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	569	% $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	570	% and $l \bigodot (\ONE) = l$ to $r \cdot \ONE \rightsquigarrow r$, and
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	571	% $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot t \rightsquigarrow \ZERO$,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	572	% and so on.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	573	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	574	% With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	575	% an iterative procedure:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	576	% \begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	577	% \begin{tabular}{llll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	578	%$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	579	% & $\Delta_{i+1}$ & $ =$ & $\lf(\Delta_i) - \PD_i$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	580	% & $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	581	%$\partial_{UNIV}(r)$ & $=$ & $\PD$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	582	%\end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	583	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	584	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	585	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	586	% $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
591 b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	587
b2d0de6aee18 more polishing integrated comments chap2 Chengsong parents: 590 diff changeset	588
532 cc54ce075db5 restructured Chengsong parents: diff changeset	589
cc54ce075db5 restructured Chengsong parents: diff changeset	590
cc54ce075db5 restructured Chengsong parents: diff changeset	591	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	592	% SECTION 2
cc54ce075db5 restructured Chengsong parents: diff changeset	593	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	594
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	595
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	596	%The closed form for them looks like:
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	597	%%\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	598	%% \begin{tabular}{llrclll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	599	%% $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	600	%% $\textit{rsimp}$ & $($ & $
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	601	%% \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	602	%% & & & & $\textit{nupdates} \;$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	603	%% $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	604	%% & & & & $)$ &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	605	%% & & $)$ & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	606	%% & $)$ & & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	607	%% \end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	608	%%\end{center}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	609	%\begin{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	610	% \begin{tabular}{llrcllrllll}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	611	% $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	612	% &&&&$\textit{rsimp}$ & $($ & $
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	613	% \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	614	% &&&& & & & & $\;\; \textit{nupdates} \;$ &
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	615	% $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	616	% &&&& & & & & $)$ &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	617	% &&&& & & $)$ & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	618	% &&&& & $)$ & & & &\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	619	% \end{tabular}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	620	%\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	621	%The $\textit{optermsimp}$ function with the argument $r$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	622	%chooses from two options: $\ZERO$ or
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	623	%We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	624	%and $\starupdates$:
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	625	%\begin{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	626	% \begin{tabular}{lcl}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	627	% $\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	628	% $\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	629	% & & $\textit{if} \;
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	630	% (\rnullable \; (\rders \; r \; s))$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	631	% & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	632	% \starupdate \; c \; r \; Ss)$ \\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	633	% & & $\textit{else} \;\; (s @ [c]) :: (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	634	% \starupdate \; c \; r \; Ss)$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	635	% \end{tabular}
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	636	%\end{center}
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	637	%\noindent
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	638	%As a generalisation from characters to strings,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	639	%$\starupdates$ takes a string instead of a character
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	640	%as the first input argument, and is otherwise the same
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	641	%as $\starupdate$.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	642	%\begin{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	643	% \begin{tabular}{lcl}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	644	% $\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	645	% $\starupdates \; (c :: cs) \; r \; Ss$ & $=$ & $\starupdates \; cs \; r \; (
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	646	% \starupdate \; c \; r \; Ss)$
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	647	% \end{tabular}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	648	%\end{center}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	649	%\noindent
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	650
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	651
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	652
621 17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	653	%\section{Zippers}
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	654	%A zipper is a data structure designed to operate on
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	655	%and navigate between local parts of a tree.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	656	%It was first formally described by Huet \cite{HuetZipper}.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	657	%Typical applications of zippers involve text editor buffers
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	658	%and proof system databases.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	659	%In our setting, the idea is to compactify the representation
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	660	%of derivatives with zippers, thereby making our algorithm faster.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	661	%Some initial results
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	662	%We first give a brief introduction to what zippers are,
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	663	%and other works
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	664	%that apply zippers to derivatives
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	665	%When dealing with large trees, it would be a waste to
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	666	%traverse the entire tree if
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	667	%the operation only
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	668	%involves a small fraction of it.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	669	%The idea is to put the focus on that subtree, turning other parts
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	670	%of the tree into a context.
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	671	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	672	%
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	673	%One observation about our derivative-based lexing algorithm is that
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	674	%the derivative operation sometimes traverses the entire regular expression
17c7611fb0a9 chap6 Chengsong parents: 620 diff changeset	675	%unnecessarily:
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	676
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	677
612 8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	678	%----------------------------------------------------------------------------------------
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	679	% SECTION 1
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	680	%----------------------------------------------------------------------------------------
532 cc54ce075db5 restructured Chengsong parents: diff changeset	681
612 8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	682	%\section{Adding Support for the Negation Construct, and its Correctness Proof}
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	683	%We now add support for the negation regular expression:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	684	%\[ r ::= \ZERO \mid \ONE
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	685	% \mid c
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	686	% \mid r_1 \cdot r_2
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	687	% \mid r_1 + r_2
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	688	% \mid r^*
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	689	% \mid \sim r
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	690	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	691	%The $\textit{nullable}$ function's clause for it would be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	692	%\[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	693	%\nullable(\sim r) = \neg \nullable(r)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	694	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	695	%The derivative would be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	696	%\[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	697	%(\sim r) \backslash c = \sim (r \backslash c)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	698	%\]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	699	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	700	%The trickiest part of lexing for the $\sim r$ regular expression
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	701	% is creating a value for it.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	702	% For other regular expressions, the value aligns with the
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	703	% structure of the regular expression:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	704	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	705	% \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	706	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	707	%But a string $s$ is in the language of $\sim r$ if and only if
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	708	%$s$ does not belong to $L(r)$.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	709	%That means when there
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	710	%is a match for the negation, it is not possible to record how the string $s$ matched
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	711	%$r$, since it did not match.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	712	%What we can do is preserve the information of how $s$ was not matched by $r$,
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	713	%and there are a number of options to do this.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	714	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	715	%We could give a partial value when there is a partial match for the regular expression inside
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	716	%the $\mathbf{not}$ construct.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	717	%For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	718	%A value for it could be
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	719	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	720	% \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : \sim ((a \cdot b ) \cdot c)
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	721	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	722	% The above example demonstrates what value to construct
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	723	% when the string $s$ is a proper prefix
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	724	% of some string in $L(r)$. When $s$ instead is not a prefix of any string
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	725	% in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	726	% constructor.
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	727	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	728	% Another option would be to either store the string $s$ that resulted in
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	729	% a mismatch for $r$ or a dummy value as a placeholder:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	730	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	731	% \vdash \textit{Not}(abcd) : \sim ( r_1 )
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	732	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	733	%or
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	734	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	735	% \vdash \textit{Not}(\textit{Dummy}) : \sim ( r_1 )
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	736	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	737	% We choose the dummy value, as it is the most straightforward to implement:
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	738	% \[
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	739	% \mkeps(\sim r) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	740	% \]
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	741	%
8c234a1bc7e0 chap6 Chengsong parents: 596 diff changeset	742	%
620 ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	743	%\begin{center}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	744	% \begin{tabular}{lcl}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	745	% $\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	746	% cs \; r \; [\Some \; ([c], n)]$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	747	% $\ntset \; r\; 0 \; \_$ & $\dn$ & $\None$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	748	% $\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	749	% \end{tabular}
ae6010c14e49 chap6 almost done Chengsong parents: 613 diff changeset	750	%\end{center}

author	Chengsong
	Thu, 17 Nov 2022 23:13:57 +0000
changeset 625	b797c9a709d9
parent 621	17c7611fb0a9
child 628	7af4e2420a8c
permissions	-rwxr-xr-x