ChengsongTanPhdThesis/Chapters/Cubic.tex
author Chengsong
Fri, 03 Jun 2022 16:45:30 +0100
changeset 532 cc54ce075db5
child 533 6acbc939af6a
permissions -rwxr-xr-x
restructured
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
\chapter{A Better Bound and Other Extensions} % Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%algorithm to include constructs such as bounded repetitions and negations.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%	SECTION strongsimp
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
\section{A Stronger Version of Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
%TODO: search for isabelle proofs of algorithms that check equivalence 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    14
Two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    15
are always expressed in the "derivative regular expression" as two
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    16
disjoint alternative terms at the current (sub-)regex level. The fact that they
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    17
are parallel tells us when we retrieve the information from this (sub-)regex 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    18
there will always be a choice of which alternative term to take.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    19
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    20
Here is an example for this.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    21
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    22
Assume we have the derivative regex 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    23
\[(a^* + (aa)^* + (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^* + 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    24
(a^* + a\cdot(aa)^* + (aa)\cdot (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^*
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    25
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    26
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    27
occurring in an intermediate step in successive
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    28
derivatives of an input string $\underbrace{aa\ldots a}_{\text{n \; a's}}$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    29
In this expression, there will be 6 "parallel" terms if we break up regexes 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    30
of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    31
\begin{align}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    32
(\nonumber \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    33
a^* + &    \label{term:1} \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    34
(aa)^* + &  \label{term:2} \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    35
(aaa)^* &  \label{term:3} \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    36
&	)\cdot(a^* + (aa)^* + (aaa)^*)^* \; + \; (\nonumber \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    37
a^* + & \label{term:4} \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    38
a \cdot (aa)^* + & \label{term:5} \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    39
aa \cdot (aaa)^* \label{term:6}\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    40
&      )\cdot(a^* + (aa)^* + (aaa)^*)^*\nonumber
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    41
\end{align}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    42
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    43
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    44
They have been labelled, and each label denotes  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    45
one term, for example, \ref{term:1} denotes
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    46
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    47
a^*\cdot(a^* + (aa)^* + (aaa)^*)^* 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    48
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    49
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    50
and \ref{term:3} denotes
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    51
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    52
(aaa)^*\cdot(a^* + (aa)^* + (aaa)^*)^*.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    53
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    54
These terms can be interpreted in terms of 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    55
their current input position's most recent 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    56
partial match.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    57
Term \ref{term:1}, \ref{term:2}, and \ref{term:3}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    58
denote the partial-match ending at the current position:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    59
\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    60
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    61
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    62
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    63
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    64
%	SECTION 1
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    65
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    66
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    67
\section{Adding Support for the Negation Construct, and its Correctness Proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    68
We now add support for the negation regular expression:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    69
\[			r ::=   \ZERO \mid  \ONE
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    70
			 \mid  c  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    71
			 \mid  r_1 \cdot r_2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    72
			 \mid  r_1 + r_2   
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    73
			 \mid r^*    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    74
			 \mid \sim r
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    75
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    76
The $\textit{nullable}$ function's clause for it would be 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    77
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    78
\textit{nullable}(~r) = \neg \nullable(r)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    79
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    80
The derivative would be
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    81
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    82
~r \backslash c = ~ (r \backslash c)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    83
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    84
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    85
The most tricky part of lexing for the $~r$ regex
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    86
 is creating a value for it.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    87
 For other regular expressions, the value aligns with the 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    88
 structure of the regex:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    89
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    90
 \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    91
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    92
But for the $~r$ regex, $s$ is a member of it if and only if
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    93
$s$ does not belong to $L(r)$. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    94
That means when there
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    95
is a match for the not regex, it is not possible to generate how the string $s$ matched
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    96
with $r$. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    97
What we can do is preserve the information of how $s$ was not matched by $r$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    98
and there are a number of options to do this.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    99
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   100
We could give a partial value when there is a partial match for the regex inside
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   101
the $\mathbf{not}$ construct.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   102
For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   103
A value for it could be 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   104
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   105
 \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   106
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   107
 The above example demonstrates what value to construct 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   108
 when the string $s$ is at most a real prefix
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   109
 of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   110
 in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   111
 constructor.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   112
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   113
 Another option would be to either store the string $s$ that resulted in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   114
 a mis-match for $r$ or a dummy value as a placeholder:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   115
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   116
 \vdash \textit{Not}(abcd) : ~(a^*)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   117
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   118
or
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   119
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   120
 \vdash \textit{Not}(\textit{Dummy}) : ~(a^*)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   121
 \] 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   122
 We choose to implement this as it is most straightforward:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   123
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   124
 \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   125
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   126
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   127
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   128
%	SECTION 2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   129
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   130
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   131
\section{Bounded Repetitions}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   132
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   133