ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 532 cc54ce075db5
child 533 6acbc939af6a
equal deleted inserted replaced
531:c334f0b3ef52 532:cc54ce075db5
       
     1 % Chapter Template
       
     2 
       
     3 \chapter{A Better Bound and Other Extensions} % Main chapter title
       
     4 
       
     5 \label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
       
     6 %in Chapter 4 to a polynomial one, and demonstrate how one can extend the
       
     7 %algorithm to include constructs such as bounded repetitions and negations.
       
     8  
       
     9 %----------------------------------------------------------------------------------------
       
    10 %	SECTION strongsimp
       
    11 %----------------------------------------------------------------------------------------
       
    12 \section{A Stronger Version of Simplification}
       
    13 %TODO: search for isabelle proofs of algorithms that check equivalence 
       
    14 Two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
       
    15 are always expressed in the "derivative regular expression" as two
       
    16 disjoint alternative terms at the current (sub-)regex level. The fact that they
       
    17 are parallel tells us when we retrieve the information from this (sub-)regex 
       
    18 there will always be a choice of which alternative term to take.
       
    19 
       
    20 Here is an example for this.
       
    21 
       
    22 Assume we have the derivative regex 
       
    23 \[(a^* + (aa)^* + (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^* + 
       
    24 (a^* + a\cdot(aa)^* + (aa)\cdot (aaa)^*)\cdot(a^* + (aa)^* + (aaa)^*)^*
       
    25 \]
       
    26 
       
    27 occurring in an intermediate step in successive
       
    28 derivatives of an input string $\underbrace{aa\ldots a}_{\text{n \; a's}}$.
       
    29 In this expression, there will be 6 "parallel" terms if we break up regexes 
       
    30 of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
       
    31 \begin{align}
       
    32 (\nonumber \\
       
    33 a^* + &    \label{term:1} \\
       
    34 (aa)^* + &  \label{term:2} \\
       
    35 (aaa)^* &  \label{term:3} \\
       
    36 &	)\cdot(a^* + (aa)^* + (aaa)^*)^* \; + \; (\nonumber \\
       
    37 a^* + & \label{term:4} \\
       
    38 a \cdot (aa)^* + & \label{term:5} \\
       
    39 aa \cdot (aaa)^* \label{term:6}\\
       
    40 &      )\cdot(a^* + (aa)^* + (aaa)^*)^*\nonumber
       
    41 \end{align}
       
    42 
       
    43 
       
    44 They have been labelled, and each label denotes  
       
    45 one term, for example, \ref{term:1} denotes
       
    46 \[
       
    47 a^*\cdot(a^* + (aa)^* + (aaa)^*)^* 
       
    48 \]
       
    49 \noindent
       
    50 and \ref{term:3} denotes
       
    51 \[
       
    52 (aaa)^*\cdot(a^* + (aa)^* + (aaa)^*)^*.
       
    53 \]
       
    54 These terms can be interpreted in terms of 
       
    55 their current input position's most recent 
       
    56 partial match.
       
    57 Term \ref{term:1}, \ref{term:2}, and \ref{term:3}
       
    58 denote the partial-match ending at the current position:
       
    59 \mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
       
    60 
       
    61 
       
    62 
       
    63 %----------------------------------------------------------------------------------------
       
    64 %	SECTION 1
       
    65 %----------------------------------------------------------------------------------------
       
    66 
       
    67 \section{Adding Support for the Negation Construct, and its Correctness Proof}
       
    68 We now add support for the negation regular expression:
       
    69 \[			r ::=   \ZERO \mid  \ONE
       
    70 			 \mid  c  
       
    71 			 \mid  r_1 \cdot r_2
       
    72 			 \mid  r_1 + r_2   
       
    73 			 \mid r^*    
       
    74 			 \mid \sim r
       
    75 \]
       
    76 The $\textit{nullable}$ function's clause for it would be 
       
    77 \[
       
    78 \textit{nullable}(~r) = \neg \nullable(r)
       
    79 \]
       
    80 The derivative would be
       
    81 \[
       
    82 ~r \backslash c = ~ (r \backslash c)
       
    83 \]
       
    84  
       
    85 The most tricky part of lexing for the $~r$ regex
       
    86  is creating a value for it.
       
    87  For other regular expressions, the value aligns with the 
       
    88  structure of the regex:
       
    89  \[
       
    90  \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
       
    91  \]
       
    92 But for the $~r$ regex, $s$ is a member of it if and only if
       
    93 $s$ does not belong to $L(r)$. 
       
    94 That means when there
       
    95 is a match for the not regex, it is not possible to generate how the string $s$ matched
       
    96 with $r$. 
       
    97 What we can do is preserve the information of how $s$ was not matched by $r$,
       
    98 and there are a number of options to do this.
       
    99 
       
   100 We could give a partial value when there is a partial match for the regex inside
       
   101 the $\mathbf{not}$ construct.
       
   102 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
       
   103 A value for it could be 
       
   104  \[
       
   105  \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
       
   106  \]
       
   107  The above example demonstrates what value to construct 
       
   108  when the string $s$ is at most a real prefix
       
   109  of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
       
   110  in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
       
   111  constructor.
       
   112  
       
   113  Another option would be to either store the string $s$ that resulted in 
       
   114  a mis-match for $r$ or a dummy value as a placeholder:
       
   115  \[
       
   116  \vdash \textit{Not}(abcd) : ~(a^*)
       
   117  \]
       
   118 or
       
   119  \[
       
   120  \vdash \textit{Not}(\textit{Dummy}) : ~(a^*)
       
   121  \] 
       
   122  We choose to implement this as it is most straightforward:
       
   123  \[
       
   124  \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
       
   125  \]
       
   126  
       
   127 %----------------------------------------------------------------------------------------
       
   128 %	SECTION 2
       
   129 %----------------------------------------------------------------------------------------
       
   130 
       
   131 \section{Bounded Repetitions}
       
   132 
       
   133