--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex Fri Jun 03 16:45:30 2022 +0100
@@ -0,0 +1,133 @@
+% Chapter Template
+
+\chapter{A Better Bound and Other Extensions} % Main chapter title
+
+\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
+%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
+%algorithm to include constructs such as bounded repetitions and negations.
+
+%----------------------------------------------------------------------------------------
+% SECTION strongsimp
+%----------------------------------------------------------------------------------------
+\section{A Stronger Version of Simplification}
+%TODO: search for isabelle proofs of algorithms that check equivalence
+Two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
+are always expressed in the "derivative regular expression" as two
+disjoint alternative terms at the current (sub-)regex level. The fact that they
+are parallel tells us when we retrieve the information from this (sub-)regex
+there will always be a choice of which alternative term to take.
+
+Here is an example for this.
+
+Assume we have the derivative regex
+\[(a^* + (aa)^* + (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^* +
+(a^* + a\cdot(aa)^* + (aa)\cdot (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^*
+\]
+
+occurring in an intermediate step in successive
+derivatives of an input string $\underbrace{aa\ldots a}_{\text{n \; a's}}$.
+In this expression, there will be 6 "parallel" terms if we break up regexes
+of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
+\begin{align}
+(\nonumber \\
+a^* + & \label{term:1} \\
+(aa)^* + & \label{term:2} \\
+(aaa)^* & \label{term:3} \\
+& )\cdot(a^* + (aa)^* + (aaa)^*)^* \; + \; (\nonumber \\
+a^* + & \label{term:4} \\
+a \cdot (aa)^* + & \label{term:5} \\
+aa \cdot (aaa)^* \label{term:6}\\
+& )\cdot(a^* + (aa)^* + (aaa)^*)^*\nonumber
+\end{align}
+
+
+They have been labelled, and each label denotes
+one term, for example, \ref{term:1} denotes
+\[
+a^*\cdot(a^* + (aa)^* + (aaa)^*)^*
+\]
+\noindent
+and \ref{term:3} denotes
+\[
+(aaa)^*\cdot(a^* + (aa)^* + (aaa)^*)^*.
+\]
+These terms can be interpreted in terms of
+their current input position's most recent
+partial match.
+Term \ref{term:1}, \ref{term:2}, and \ref{term:3}
+denote the partial-match ending at the current position:
+\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
+
+
+
+%----------------------------------------------------------------------------------------
+% SECTION 1
+%----------------------------------------------------------------------------------------
+
+\section{Adding Support for the Negation Construct, and its Correctness Proof}
+We now add support for the negation regular expression:
+\[ r ::= \ZERO \mid \ONE
+ \mid c
+ \mid r_1 \cdot r_2
+ \mid r_1 + r_2
+ \mid r^*
+ \mid \sim r
+\]
+The $\textit{nullable}$ function's clause for it would be
+\[
+\textit{nullable}(~r) = \neg \nullable(r)
+\]
+The derivative would be
+\[
+~r \backslash c = ~ (r \backslash c)
+\]
+
+The most tricky part of lexing for the $~r$ regex
+ is creating a value for it.
+ For other regular expressions, the value aligns with the
+ structure of the regex:
+ \[
+ \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
+ \]
+But for the $~r$ regex, $s$ is a member of it if and only if
+$s$ does not belong to $L(r)$.
+That means when there
+is a match for the not regex, it is not possible to generate how the string $s$ matched
+with $r$.
+What we can do is preserve the information of how $s$ was not matched by $r$,
+and there are a number of options to do this.
+
+We could give a partial value when there is a partial match for the regex inside
+the $\mathbf{not}$ construct.
+For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
+A value for it could be
+ \[
+ \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
+ \]
+ The above example demonstrates what value to construct
+ when the string $s$ is at most a real prefix
+ of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
+ in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
+ constructor.
+
+ Another option would be to either store the string $s$ that resulted in
+ a mis-match for $r$ or a dummy value as a placeholder:
+ \[
+ \vdash \textit{Not}(abcd) : ~(a^*)
+ \]
+or
+ \[
+ \vdash \textit{Not}(\textit{Dummy}) : ~(a^*)
+ \]
+ We choose to implement this as it is most straightforward:
+ \[
+ \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
+ \]
+
+%----------------------------------------------------------------------------------------
+% SECTION 2
+%----------------------------------------------------------------------------------------
+
+\section{Bounded Repetitions}
+
+