lexing: ChengsongTanPhdThesis/Chapters/Cubic.tex@cc54ce075db5 (annotated)

532 cc54ce075db5 restructured Chengsong parents: diff changeset	1	% Chapter Template
cc54ce075db5 restructured Chengsong parents: diff changeset	2
cc54ce075db5 restructured Chengsong parents: diff changeset	3	\chapter{A Better Bound and Other Extensions} % Main chapter title
cc54ce075db5 restructured Chengsong parents: diff changeset	4
cc54ce075db5 restructured Chengsong parents: diff changeset	5	\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
cc54ce075db5 restructured Chengsong parents: diff changeset	6	%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
cc54ce075db5 restructured Chengsong parents: diff changeset	7	%algorithm to include constructs such as bounded repetitions and negations.
cc54ce075db5 restructured Chengsong parents: diff changeset	8
cc54ce075db5 restructured Chengsong parents: diff changeset	9	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	10	% SECTION strongsimp
cc54ce075db5 restructured Chengsong parents: diff changeset	11	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	12	\section{A Stronger Version of Simplification}
cc54ce075db5 restructured Chengsong parents: diff changeset	13	%TODO: search for isabelle proofs of algorithms that check equivalence
cc54ce075db5 restructured Chengsong parents: diff changeset	14	Two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
cc54ce075db5 restructured Chengsong parents: diff changeset	15	are always expressed in the "derivative regular expression" as two
cc54ce075db5 restructured Chengsong parents: diff changeset	16	disjoint alternative terms at the current (sub-)regex level. The fact that they
cc54ce075db5 restructured Chengsong parents: diff changeset	17	are parallel tells us when we retrieve the information from this (sub-)regex
cc54ce075db5 restructured Chengsong parents: diff changeset	18	there will always be a choice of which alternative term to take.
cc54ce075db5 restructured Chengsong parents: diff changeset	19
cc54ce075db5 restructured Chengsong parents: diff changeset	20	Here is an example for this.
cc54ce075db5 restructured Chengsong parents: diff changeset	21
cc54ce075db5 restructured Chengsong parents: diff changeset	22	Assume we have the derivative regex
cc54ce075db5 restructured Chengsong parents: diff changeset	23	\[(a^* + (aa)^* + (aaa)^)\cdot(a^ + (aa)^* + (aaa)^)^ +
cc54ce075db5 restructured Chengsong parents: diff changeset	24	(a^* + a\cdot(aa)^* + (aa)\cdot (aaa)^)\cdot(a^ + (aa)^* + (aaa)^)^
cc54ce075db5 restructured Chengsong parents: diff changeset	25	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	26
cc54ce075db5 restructured Chengsong parents: diff changeset	27	occurring in an intermediate step in successive
cc54ce075db5 restructured Chengsong parents: diff changeset	28	derivatives of an input string $\underbrace{aa\ldots a}_{\text{n \; a's}}$.
cc54ce075db5 restructured Chengsong parents: diff changeset	29	In this expression, there will be 6 "parallel" terms if we break up regexes
cc54ce075db5 restructured Chengsong parents: diff changeset	30	of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
cc54ce075db5 restructured Chengsong parents: diff changeset	31	\begin{align}
cc54ce075db5 restructured Chengsong parents: diff changeset	32	(\nonumber \\
cc54ce075db5 restructured Chengsong parents: diff changeset	33	a^* + & \label{term:1} \\
cc54ce075db5 restructured Chengsong parents: diff changeset	34	(aa)^* + & \label{term:2} \\
cc54ce075db5 restructured Chengsong parents: diff changeset	35	(aaa)^* & \label{term:3} \\
cc54ce075db5 restructured Chengsong parents: diff changeset	36	& )\cdot(a^* + (aa)^* + (aaa)^)^ \; + \; (\nonumber \\
cc54ce075db5 restructured Chengsong parents: diff changeset	37	a^* + & \label{term:4} \\
cc54ce075db5 restructured Chengsong parents: diff changeset	38	a \cdot (aa)^* + & \label{term:5} \\
cc54ce075db5 restructured Chengsong parents: diff changeset	39	aa \cdot (aaa)^* \label{term:6}\\
cc54ce075db5 restructured Chengsong parents: diff changeset	40	& )\cdot(a^* + (aa)^* + (aaa)^)^\nonumber
cc54ce075db5 restructured Chengsong parents: diff changeset	41	\end{align}
cc54ce075db5 restructured Chengsong parents: diff changeset	42
cc54ce075db5 restructured Chengsong parents: diff changeset	43
cc54ce075db5 restructured Chengsong parents: diff changeset	44	They have been labelled, and each label denotes
cc54ce075db5 restructured Chengsong parents: diff changeset	45	one term, for example, \ref{term:1} denotes
cc54ce075db5 restructured Chengsong parents: diff changeset	46	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	47	a^\cdot(a^ + (aa)^* + (aaa)^)^
cc54ce075db5 restructured Chengsong parents: diff changeset	48	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	49	\noindent
cc54ce075db5 restructured Chengsong parents: diff changeset	50	and \ref{term:3} denotes
cc54ce075db5 restructured Chengsong parents: diff changeset	51	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	52	(aaa)^\cdot(a^ + (aa)^* + (aaa)^)^.
cc54ce075db5 restructured Chengsong parents: diff changeset	53	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	54	These terms can be interpreted in terms of
cc54ce075db5 restructured Chengsong parents: diff changeset	55	their current input position's most recent
cc54ce075db5 restructured Chengsong parents: diff changeset	56	partial match.
cc54ce075db5 restructured Chengsong parents: diff changeset	57	Term \ref{term:1}, \ref{term:2}, and \ref{term:3}
cc54ce075db5 restructured Chengsong parents: diff changeset	58	denote the partial-match ending at the current position:
cc54ce075db5 restructured Chengsong parents: diff changeset	59	\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
cc54ce075db5 restructured Chengsong parents: diff changeset	60
cc54ce075db5 restructured Chengsong parents: diff changeset	61
cc54ce075db5 restructured Chengsong parents: diff changeset	62
cc54ce075db5 restructured Chengsong parents: diff changeset	63	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	64	% SECTION 1
cc54ce075db5 restructured Chengsong parents: diff changeset	65	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	66
cc54ce075db5 restructured Chengsong parents: diff changeset	67	\section{Adding Support for the Negation Construct, and its Correctness Proof}
cc54ce075db5 restructured Chengsong parents: diff changeset	68	We now add support for the negation regular expression:
cc54ce075db5 restructured Chengsong parents: diff changeset	69	\[ r ::= \ZERO \mid \ONE
cc54ce075db5 restructured Chengsong parents: diff changeset	70	\mid c
cc54ce075db5 restructured Chengsong parents: diff changeset	71	\mid r_1 \cdot r_2
cc54ce075db5 restructured Chengsong parents: diff changeset	72	\mid r_1 + r_2
cc54ce075db5 restructured Chengsong parents: diff changeset	73	\mid r^*
cc54ce075db5 restructured Chengsong parents: diff changeset	74	\mid \sim r
cc54ce075db5 restructured Chengsong parents: diff changeset	75	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	76	The $\textit{nullable}$ function's clause for it would be
cc54ce075db5 restructured Chengsong parents: diff changeset	77	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	78	\textit{nullable}(~r) = \neg \nullable(r)
cc54ce075db5 restructured Chengsong parents: diff changeset	79	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	80	The derivative would be
cc54ce075db5 restructured Chengsong parents: diff changeset	81	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	82	~r \backslash c = ~ (r \backslash c)
cc54ce075db5 restructured Chengsong parents: diff changeset	83	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	84
cc54ce075db5 restructured Chengsong parents: diff changeset	85	The most tricky part of lexing for the $~r$ regex
cc54ce075db5 restructured Chengsong parents: diff changeset	86	is creating a value for it.
cc54ce075db5 restructured Chengsong parents: diff changeset	87	For other regular expressions, the value aligns with the
cc54ce075db5 restructured Chengsong parents: diff changeset	88	structure of the regex:
cc54ce075db5 restructured Chengsong parents: diff changeset	89	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	90	\vdash \Seq(\Char(a), \Char(b)) : a \cdot b
cc54ce075db5 restructured Chengsong parents: diff changeset	91	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	92	But for the $~r$ regex, $s$ is a member of it if and only if
cc54ce075db5 restructured Chengsong parents: diff changeset	93	$s$ does not belong to $L(r)$.
cc54ce075db5 restructured Chengsong parents: diff changeset	94	That means when there
cc54ce075db5 restructured Chengsong parents: diff changeset	95	is a match for the not regex, it is not possible to generate how the string $s$ matched
cc54ce075db5 restructured Chengsong parents: diff changeset	96	with $r$.
cc54ce075db5 restructured Chengsong parents: diff changeset	97	What we can do is preserve the information of how $s$ was not matched by $r$,
cc54ce075db5 restructured Chengsong parents: diff changeset	98	and there are a number of options to do this.
cc54ce075db5 restructured Chengsong parents: diff changeset	99
cc54ce075db5 restructured Chengsong parents: diff changeset	100	We could give a partial value when there is a partial match for the regex inside
cc54ce075db5 restructured Chengsong parents: diff changeset	101	the $\mathbf{not}$ construct.
cc54ce075db5 restructured Chengsong parents: diff changeset	102	For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
cc54ce075db5 restructured Chengsong parents: diff changeset	103	A value for it could be
cc54ce075db5 restructured Chengsong parents: diff changeset	104	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	105	\vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
cc54ce075db5 restructured Chengsong parents: diff changeset	106	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	107	The above example demonstrates what value to construct
cc54ce075db5 restructured Chengsong parents: diff changeset	108	when the string $s$ is at most a real prefix
cc54ce075db5 restructured Chengsong parents: diff changeset	109	of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
cc54ce075db5 restructured Chengsong parents: diff changeset	110	in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
cc54ce075db5 restructured Chengsong parents: diff changeset	111	constructor.
cc54ce075db5 restructured Chengsong parents: diff changeset	112
cc54ce075db5 restructured Chengsong parents: diff changeset	113	Another option would be to either store the string $s$ that resulted in
cc54ce075db5 restructured Chengsong parents: diff changeset	114	a mis-match for $r$ or a dummy value as a placeholder:
cc54ce075db5 restructured Chengsong parents: diff changeset	115	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	116	\vdash \textit{Not}(abcd) : ~(a^*)
cc54ce075db5 restructured Chengsong parents: diff changeset	117	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	118	or
cc54ce075db5 restructured Chengsong parents: diff changeset	119	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	120	\vdash \textit{Not}(\textit{Dummy}) : ~(a^*)
cc54ce075db5 restructured Chengsong parents: diff changeset	121	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	122	We choose to implement this as it is most straightforward:
cc54ce075db5 restructured Chengsong parents: diff changeset	123	\[
cc54ce075db5 restructured Chengsong parents: diff changeset	124	\mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
cc54ce075db5 restructured Chengsong parents: diff changeset	125	\]
cc54ce075db5 restructured Chengsong parents: diff changeset	126
cc54ce075db5 restructured Chengsong parents: diff changeset	127	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	128	% SECTION 2
cc54ce075db5 restructured Chengsong parents: diff changeset	129	%----------------------------------------------------------------------------------------
cc54ce075db5 restructured Chengsong parents: diff changeset	130
cc54ce075db5 restructured Chengsong parents: diff changeset	131	\section{Bounded Repetitions}
cc54ce075db5 restructured Chengsong parents: diff changeset	132
cc54ce075db5 restructured Chengsong parents: diff changeset	133

author	Chengsong
	Fri, 03 Jun 2022 16:45:30 +0100
changeset 532	cc54ce075db5
child 533	6acbc939af6a
permissions	-rwxr-xr-x