lexing: comparison ChengsongTanPhdThesis/Chapters/Chapter5.tex

equal deleted inserted replaced

-:856d025dbc15
+:26584b9d47f4
+% Chapter Template
+\chapter{A Better Bound and Other Extensions} % Main chapter title
+\label{Chapter5} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
+%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
+%algorithm to include constructs such as bounded repetitions and negations.
+%----------------------------------------------------------------------------------------
+%	SECTION strongsimp
+%----------------------------------------------------------------------------------------
+\section{A Stronger Version of Simplification Inspired by Antimirov}
+%TODO: search for isabelle proofs of algorithms that check equivalence
+%----------------------------------------------------------------------------------------
+%	SECTION 1
+%----------------------------------------------------------------------------------------
+\section{Adding Support for the Negation Construct, and its Correctness Proof}
+We now add support for the negation regular expression:
+\[			r ::=   \ZERO \mid  \ONE
+			 \mid  c
+			 \mid  r_1 \cdot r_2
+			 \mid  r_1 + r_2
+			 \mid r^*
+			 \mid \sim r
+\]
+The $\textit{nullable}$ function's clause for it would be
+\[
+\textit{nullable}(~r) = \neg \nullable(r)
+\]
+The derivative would be
+\[
+~r \backslash c = ~ (r \backslash c)
+\]
+The most tricky part of lexing for the $~r$ regex
+is creating a value for it.
+For other regular expressions, the value aligns with the
+structure of the regex:
+\[
+\vdash \Seq(\Char(a), \Char(b)) : a \cdot b
+\]
+But for the $~r$ regex, $s$ is a member of it if and only if
+$s$ does not belong to $L(r)$.
+That means when there
+is a match for the not regex, it is not possible to generate how the string $s$ matched
+with $r$.
+What we can do is preserve the information of how $s$ was not matched by $r$,
+and there are a number of options to do this.
+We could give a partial value when there is a partial match for the regex inside
+the $\mathbf{not}$ construct.
+For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
+A value for it could be
+\[
+\vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
+\]
+The above example demonstrates what value to construct
+when the string $s$ is at most a real prefix
+of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
+in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
+constructor.
+Another option would be to either store the string $s$ that resulted in
+a mis-match for $r$ or a dummy value as a placeholder:
+\[
+\vdash \textit{Not}(abcd) : ~(a^*)
+\]
+or
+\[
+\vdash \textit{Not}(\textit{Dummy}) : ~(a^*)
+\]
+We choose to implement this as it is most straightforward:
+\[
+\mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
+\]
+%----------------------------------------------------------------------------------------
+%	SECTION 2
+%----------------------------------------------------------------------------------------
+\section{Bounded Repetitions}