diff -r c334f0b3ef52 -r cc54ce075db5 ChengsongTanPhdThesis/Chapters/Cubic.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex Fri Jun 03 16:45:30 2022 +0100 @@ -0,0 +1,133 @@ +% Chapter Template + +\chapter{A Better Bound and Other Extensions} % Main chapter title + +\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound +%in Chapter 4 to a polynomial one, and demonstrate how one can extend the +%algorithm to include constructs such as bounded repetitions and negations. + +%---------------------------------------------------------------------------------------- +% SECTION strongsimp +%---------------------------------------------------------------------------------------- +\section{A Stronger Version of Simplification} +%TODO: search for isabelle proofs of algorithms that check equivalence +Two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair +are always expressed in the "derivative regular expression" as two +disjoint alternative terms at the current (sub-)regex level. The fact that they +are parallel tells us when we retrieve the information from this (sub-)regex +there will always be a choice of which alternative term to take. + +Here is an example for this. + +Assume we have the derivative regex +\[(a^* + (aa)^* + (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^* + +(a^* + a\cdot(aa)^* + (aa)\cdot (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^* +\] + +occurring in an intermediate step in successive +derivatives of an input string $\underbrace{aa\ldots a}_{\text{n \; a's}}$. +In this expression, there will be 6 "parallel" terms if we break up regexes +of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$). +\begin{align} +(\nonumber \\ +a^* + & \label{term:1} \\ +(aa)^* + & \label{term:2} \\ +(aaa)^* & \label{term:3} \\ +& )\cdot(a^* + (aa)^* + (aaa)^*)^* \; + \; (\nonumber \\ +a^* + & \label{term:4} \\ +a \cdot (aa)^* + & \label{term:5} \\ +aa \cdot (aaa)^* \label{term:6}\\ +& )\cdot(a^* + (aa)^* + (aaa)^*)^*\nonumber +\end{align} + + +They have been labelled, and each label denotes +one term, for example, \ref{term:1} denotes +\[ +a^*\cdot(a^* + (aa)^* + (aaa)^*)^* +\] +\noindent +and \ref{term:3} denotes +\[ +(aaa)^*\cdot(a^* + (aa)^* + (aaa)^*)^*. +\] +These terms can be interpreted in terms of +their current input position's most recent +partial match. +Term \ref{term:1}, \ref{term:2}, and \ref{term:3} +denote the partial-match ending at the current position: +\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$} + + + +%---------------------------------------------------------------------------------------- +% SECTION 1 +%---------------------------------------------------------------------------------------- + +\section{Adding Support for the Negation Construct, and its Correctness Proof} +We now add support for the negation regular expression: +\[ r ::= \ZERO \mid \ONE + \mid c + \mid r_1 \cdot r_2 + \mid r_1 + r_2 + \mid r^* + \mid \sim r +\] +The $\textit{nullable}$ function's clause for it would be +\[ +\textit{nullable}(~r) = \neg \nullable(r) +\] +The derivative would be +\[ +~r \backslash c = ~ (r \backslash c) +\] + +The most tricky part of lexing for the $~r$ regex + is creating a value for it. + For other regular expressions, the value aligns with the + structure of the regex: + \[ + \vdash \Seq(\Char(a), \Char(b)) : a \cdot b + \] +But for the $~r$ regex, $s$ is a member of it if and only if +$s$ does not belong to $L(r)$. +That means when there +is a match for the not regex, it is not possible to generate how the string $s$ matched +with $r$. +What we can do is preserve the information of how $s$ was not matched by $r$, +and there are a number of options to do this. + +We could give a partial value when there is a partial match for the regex inside +the $\mathbf{not}$ construct. +For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$, +A value for it could be + \[ + \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c) + \] + The above example demonstrates what value to construct + when the string $s$ is at most a real prefix + of the strings in $L(r)$. When $s$ instead is not a prefix of any strings + in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$ + constructor. + + Another option would be to either store the string $s$ that resulted in + a mis-match for $r$ or a dummy value as a placeholder: + \[ + \vdash \textit{Not}(abcd) : ~(a^*) + \] +or + \[ + \vdash \textit{Not}(\textit{Dummy}) : ~(a^*) + \] + We choose to implement this as it is most straightforward: + \[ + \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy}) + \] + +%---------------------------------------------------------------------------------------- +% SECTION 2 +%---------------------------------------------------------------------------------------- + +\section{Bounded Repetitions} + +