ChengsongTanPhdThesis/Chapters/Cubic.tex
author Chengsong
Tue, 02 Aug 2022 22:11:22 +0100
changeset 575 3178f0e948ac
parent 538 8016a2480704
child 590 988e92a70704
permissions -rwxr-xr-x
more
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
\chapter{A Better Bound and Other Extensions} % Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%algorithm to include constructs such as bounded repetitions and negations.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%	SECTION strongsimp
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
\section{A Stronger Version of Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
%TODO: search for isabelle proofs of algorithms that check equivalence 
533
Chengsong
parents: 532
diff changeset
    14
In our bit-coded lexing algorithm, with or without simplification, 
Chengsong
parents: 532
diff changeset
    15
two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    16
are always expressed in the "derivative regular expression" as two
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    17
disjoint alternative terms at the current (sub-)regex level. The fact that they
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    18
are parallel tells us when we retrieve the information from this (sub-)regex 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    19
there will always be a choice of which alternative term to take.
533
Chengsong
parents: 532
diff changeset
    20
As an example, the regular expression $aa \cdot a^*+ a \cdot a^*$ (omitting bit-codes)
Chengsong
parents: 532
diff changeset
    21
expresses two possibilities it will match future input.
Chengsong
parents: 532
diff changeset
    22
It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
Chengsong
parents: 532
diff changeset
    23
\begin{figure}\label{string_3parts1}
Chengsong
parents: 532
diff changeset
    24
\begin{center}
Chengsong
parents: 532
diff changeset
    25
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
    26
    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
Chengsong
parents: 532
diff changeset
    27
        {Consumed Input
Chengsong
parents: 532
diff changeset
    28
        \nodepart{two} Expects $aa$
Chengsong
parents: 532
diff changeset
    29
         \nodepart{three} Then expects $a^*$};
Chengsong
parents: 532
diff changeset
    30
Chengsong
parents: 532
diff changeset
    31
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
    32
\end{center}
Chengsong
parents: 532
diff changeset
    33
\end{figure}
Chengsong
parents: 532
diff changeset
    34
\noindent
Chengsong
parents: 532
diff changeset
    35
Or it will match at least 1 more $a$:
Chengsong
parents: 532
diff changeset
    36
\begin{figure}\label{string_3parts2}
Chengsong
parents: 532
diff changeset
    37
\begin{center}
Chengsong
parents: 532
diff changeset
    38
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
    39
    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
Chengsong
parents: 532
diff changeset
    40
        {Consumed
Chengsong
parents: 532
diff changeset
    41
        \nodepart{two} Expects $a$
Chengsong
parents: 532
diff changeset
    42
         \nodepart{three} Then expects $a^*$};
Chengsong
parents: 532
diff changeset
    43
Chengsong
parents: 532
diff changeset
    44
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
    45
\end{center}
Chengsong
parents: 532
diff changeset
    46
\end{figure}
Chengsong
parents: 532
diff changeset
    47
If these two possibilities are identical, we can eliminate
Chengsong
parents: 532
diff changeset
    48
the second one as we know the second one corresponds to 
Chengsong
parents: 532
diff changeset
    49
a match that is not $\POSIX$.
Chengsong
parents: 532
diff changeset
    50
Chengsong
parents: 532
diff changeset
    51
Chengsong
parents: 532
diff changeset
    52
If two identical regexes 
Chengsong
parents: 532
diff changeset
    53
happen to be grouped into different sequences, one cannot use
Chengsong
parents: 532
diff changeset
    54
the $a + a \rightsquigarrow a$ rule to eliminate them, even if they 
Chengsong
parents: 532
diff changeset
    55
are "parallel" to each other:
Chengsong
parents: 532
diff changeset
    56
\[
Chengsong
parents: 532
diff changeset
    57
(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
Chengsong
parents: 532
diff changeset
    58
\]
Chengsong
parents: 532
diff changeset
    59
and that's because they are followed by possibly 
Chengsong
parents: 532
diff changeset
    60
different "suffixes" $r_1$ and $r_2$, and if 
Chengsong
parents: 532
diff changeset
    61
they do turn out to be different then doing 
Chengsong
parents: 532
diff changeset
    62
\[
Chengsong
parents: 532
diff changeset
    63
(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
Chengsong
parents: 532
diff changeset
    64
\]
Chengsong
parents: 532
diff changeset
    65
might cause a possible match where the string is in $L(a \cdot r_2)$ to be lost.
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    66
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    67
Here is an example for this.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    68
Assume we have the derivative regex 
533
Chengsong
parents: 532
diff changeset
    69
\[( r_1  +  r_2  +  r_3)\cdot r+ 
Chengsong
parents: 532
diff changeset
    70
( r_1  +  r_5  +  r_6)\cdot( r_1  +  r_2  +  r_3)^*
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    71
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    72
533
Chengsong
parents: 532
diff changeset
    73
occurring in an intermediate step in our bit-coded lexing algorithm.
Chengsong
parents: 532
diff changeset
    74
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    75
In this expression, there will be 6 "parallel" terms if we break up regexes 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    76
of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    77
\begin{align}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    78
(\nonumber \\
533
Chengsong
parents: 532
diff changeset
    79
 r_1  + &    \label{term:1} \\
Chengsong
parents: 532
diff changeset
    80
 r_2  + &  \label{term:2} \\
Chengsong
parents: 532
diff changeset
    81
 r_3 &  \label{term:3} \\
Chengsong
parents: 532
diff changeset
    82
&	)\cdot r\; + \; (\nonumber \\
Chengsong
parents: 532
diff changeset
    83
 r_1  + & \label{term:4} \\
Chengsong
parents: 532
diff changeset
    84
 r_5  + & \label{term:5} \\
Chengsong
parents: 532
diff changeset
    85
 r_6 \label{term:6}\\
Chengsong
parents: 532
diff changeset
    86
&      )\cdot r\nonumber
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    87
\end{align}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    88
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    89
They have been labelled, and each label denotes  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    90
one term, for example, \ref{term:1} denotes
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    91
\[
533
Chengsong
parents: 532
diff changeset
    92
 r_1 \cdot r
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    93
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    94
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    95
and \ref{term:3} denotes
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    96
\[
533
Chengsong
parents: 532
diff changeset
    97
 r_3\cdot r.
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    98
\]
533
Chengsong
parents: 532
diff changeset
    99
From a lexer's point of view, \ref{term:4} will never 
Chengsong
parents: 532
diff changeset
   100
be picked up in later phases of matching because there
Chengsong
parents: 532
diff changeset
   101
is a term \ref{term:1} giving identical matching information.
Chengsong
parents: 532
diff changeset
   102
The first term \ref{term:1} will match a string in $L(r_1 \cdot  r)$, 
Chengsong
parents: 532
diff changeset
   103
and so on for term \ref{term:2}, \ref{term:3}, etc.
Chengsong
parents: 532
diff changeset
   104
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   105
\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
533
Chengsong
parents: 532
diff changeset
   106
\begin{center}\label{string_2parts}
Chengsong
parents: 532
diff changeset
   107
Chengsong
parents: 532
diff changeset
   108
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
   109
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
Chengsong
parents: 532
diff changeset
   110
        {$r_{x1}$
Chengsong
parents: 532
diff changeset
   111
         \nodepart{two} $r_1\cdot r$ };
Chengsong
parents: 532
diff changeset
   112
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 532
diff changeset
   113
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
   114
Chengsong
parents: 532
diff changeset
   115
\end{center}
Chengsong
parents: 532
diff changeset
   116
For term 1 \ref{term:1}, whatever was before the current
Chengsong
parents: 532
diff changeset
   117
input position was fully matched and the regex corresponding
Chengsong
parents: 532
diff changeset
   118
to it reduced to $\ONE$, 
Chengsong
parents: 532
diff changeset
   119
and in the next input token, it will start on $r_1\cdot r$.
Chengsong
parents: 532
diff changeset
   120
The resulting value will be something of a similar configuration:
Chengsong
parents: 532
diff changeset
   121
\begin{center}\label{value_2parts}
Chengsong
parents: 532
diff changeset
   122
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 532
diff changeset
   123
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
   124
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
Chengsong
parents: 532
diff changeset
   125
        {$v_{x1}$
Chengsong
parents: 532
diff changeset
   126
         \nodepart{two} $v_{r_1\cdot r}$ };
Chengsong
parents: 532
diff changeset
   127
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
   128
\end{center}
Chengsong
parents: 532
diff changeset
   129
For term 2 \ref{term:2} we have a similar value partition:
Chengsong
parents: 532
diff changeset
   130
\begin{center}\label{value_2parts2}
Chengsong
parents: 532
diff changeset
   131
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 532
diff changeset
   132
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
   133
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
Chengsong
parents: 532
diff changeset
   134
        {$v_{x2}$
Chengsong
parents: 532
diff changeset
   135
         \nodepart{two} $v_{r_2\cdot r}$ };
Chengsong
parents: 532
diff changeset
   136
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
   137
\end{center}
Chengsong
parents: 532
diff changeset
   138
\noindent
Chengsong
parents: 532
diff changeset
   139
and so on.
Chengsong
parents: 532
diff changeset
   140
We note that for term 4 \ref{term:4} its result value 
Chengsong
parents: 532
diff changeset
   141
after any position beyond  the current one will always
Chengsong
parents: 532
diff changeset
   142
be the same:
Chengsong
parents: 532
diff changeset
   143
\begin{center}\label{value_2parts4}
Chengsong
parents: 532
diff changeset
   144
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 532
diff changeset
   145
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 532
diff changeset
   146
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
Chengsong
parents: 532
diff changeset
   147
        {$v_{x4}$
Chengsong
parents: 532
diff changeset
   148
         \nodepart{two} $v_{r_1\cdot r}$ };
Chengsong
parents: 532
diff changeset
   149
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
   150
\end{center}
Chengsong
parents: 532
diff changeset
   151
And $\POSIX$ rules says that we can eliminate at least one of them.
Chengsong
parents: 532
diff changeset
   152
Our algorithm always puts the regex with the longest initial sub-match at the left of the 
Chengsong
parents: 532
diff changeset
   153
alternatives, so we safely throw away \ref{term:4}.
Chengsong
parents: 532
diff changeset
   154
The fact that term 1 and 4 are not immediately in the same alternative
Chengsong
parents: 532
diff changeset
   155
expression does not prevent them from being two duplicate matches at
Chengsong
parents: 532
diff changeset
   156
the current point of view.
Chengsong
parents: 532
diff changeset
   157
To implement this idea into an algorithm, we define a "pruning function"
Chengsong
parents: 532
diff changeset
   158
$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
Chengsong
parents: 532
diff changeset
   159
$\textit{acc}$, which acts as an element
Chengsong
parents: 532
diff changeset
   160
is a stronger version of $\textit{distinctBy}$.
Chengsong
parents: 532
diff changeset
   161
Here is a concise version of it (in the style of Scala):
Chengsong
parents: 532
diff changeset
   162
\begin{verbatim}
Chengsong
parents: 532
diff changeset
   163
def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) : 
Chengsong
parents: 532
diff changeset
   164
List[ARexp] =  rs match {
Chengsong
parents: 532
diff changeset
   165
  case Nil => Nil
Chengsong
parents: 532
diff changeset
   166
  case r :: rs =>
Chengsong
parents: 532
diff changeset
   167
    if(acc.contains(erase(r)))
Chengsong
parents: 532
diff changeset
   168
      distinctByPlus(rs, acc)
Chengsong
parents: 532
diff changeset
   169
    else 
Chengsong
parents: 532
diff changeset
   170
      prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
Chengsong
parents: 532
diff changeset
   171
}
Chengsong
parents: 532
diff changeset
   172
Chengsong
parents: 532
diff changeset
   173
\end{verbatim}
Chengsong
parents: 532
diff changeset
   174
But for the function $\textit{prune}$ there is a difficulty:
Chengsong
parents: 532
diff changeset
   175
how could one define the $L$ function in a "computable" way,
Chengsong
parents: 532
diff changeset
   176
so that they generate a (lazy infinite) set of strings in $L(r)$.
Chengsong
parents: 532
diff changeset
   177
We also need a function that tests whether $L(r_1) \subseteq L(r_2)$
Chengsong
parents: 532
diff changeset
   178
is true.
Chengsong
parents: 532
diff changeset
   179
For the moment we cut corners and do not define these two functions
Chengsong
parents: 532
diff changeset
   180
as they are not used by Antimirov (and will probably not contribute
Chengsong
parents: 532
diff changeset
   181
to a bound better than Antimirov's cubic bound).
Chengsong
parents: 532
diff changeset
   182
Rather, we do not try to eliminate in every instance of regular expressions
Chengsong
parents: 532
diff changeset
   183
that have "language duplicates", but only eliminate the "exact duplicates".
Chengsong
parents: 532
diff changeset
   184
For this we need to break up terms $(a+b)\cdot c$ into two
Chengsong
parents: 532
diff changeset
   185
terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   186
\begin{figure}
533
Chengsong
parents: 532
diff changeset
   187
\begin{verbatim}
Chengsong
parents: 532
diff changeset
   188
def distinctWith(rs: List[ARexp], 
Chengsong
parents: 532
diff changeset
   189
                pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
Chengsong
parents: 532
diff changeset
   190
                acc: Set[Rexp] = Set()) : List[ARexp] = 
Chengsong
parents: 532
diff changeset
   191
  rs match{
Chengsong
parents: 532
diff changeset
   192
    case Nil => Nil
Chengsong
parents: 532
diff changeset
   193
    case r :: rs => 
Chengsong
parents: 532
diff changeset
   194
      if(acc(erase(r)))
Chengsong
parents: 532
diff changeset
   195
        distinctWith(rs, pruneFunction, acc)
Chengsong
parents: 532
diff changeset
   196
      else {
Chengsong
parents: 532
diff changeset
   197
        val pruned_r = pruneFunction(r, acc)
Chengsong
parents: 532
diff changeset
   198
        pruned_r :: 
Chengsong
parents: 532
diff changeset
   199
        distinctWith(rs, 
Chengsong
parents: 532
diff changeset
   200
          pruneFunction, 
Chengsong
parents: 532
diff changeset
   201
          turnIntoTerms(erase(pruned_r)) ++: acc
Chengsong
parents: 532
diff changeset
   202
        )
Chengsong
parents: 532
diff changeset
   203
      }
Chengsong
parents: 532
diff changeset
   204
  }
Chengsong
parents: 532
diff changeset
   205
\end{verbatim}
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   206
\caption{A Stronger Version of $\textit{distinctBy}$}
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   207
\end{figure}
533
Chengsong
parents: 532
diff changeset
   208
\noindent
Chengsong
parents: 532
diff changeset
   209
This process is done by the function $\textit{turnIntoTerms}$:
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   210
\begin{figure}
533
Chengsong
parents: 532
diff changeset
   211
\begin{verbatim}
Chengsong
parents: 532
diff changeset
   212
def turnIntoTerms(r: Rexp): List[Rexp] = r match {
Chengsong
parents: 532
diff changeset
   213
  case SEQ(r1, r2)  => if(isOne(r1)) 
Chengsong
parents: 532
diff changeset
   214
                          turnIntoTerms(r2) 
Chengsong
parents: 532
diff changeset
   215
                       else 
Chengsong
parents: 532
diff changeset
   216
                          turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
Chengsong
parents: 532
diff changeset
   217
  case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
Chengsong
parents: 532
diff changeset
   218
  case ZERO => Nil
Chengsong
parents: 532
diff changeset
   219
  case _ => r :: Nil
Chengsong
parents: 532
diff changeset
   220
}
Chengsong
parents: 532
diff changeset
   221
\end{verbatim}
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   222
\caption{function that breaks up regular expression into "atomic" terms}
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   223
\end{figure}
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   224
533
Chengsong
parents: 532
diff changeset
   225
\noindent
Chengsong
parents: 532
diff changeset
   226
The pruning function can be defined recursively:
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   227
\begin{figure}
533
Chengsong
parents: 532
diff changeset
   228
\begin{verbatim}
Chengsong
parents: 532
diff changeset
   229
def prune7(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
Chengsong
parents: 532
diff changeset
   230
  case AALTS(bs, rs) => rs.map(r => prune7(r, acc)).filter(_ != ZERO) match
Chengsong
parents: 532
diff changeset
   231
  {
Chengsong
parents: 532
diff changeset
   232
    case Nil => AZERO
Chengsong
parents: 532
diff changeset
   233
    case r::Nil => fuse(bs, r)
Chengsong
parents: 532
diff changeset
   234
    case rs1 => AALTS(bs, rs1)
Chengsong
parents: 532
diff changeset
   235
  }
Chengsong
parents: 532
diff changeset
   236
  case ASEQ(bs, r1, r2) => prune7(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
Chengsong
parents: 532
diff changeset
   237
    case AZERO => AZERO
Chengsong
parents: 532
diff changeset
   238
    case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
Chengsong
parents: 532
diff changeset
   239
    case r1p => ASEQ(bs, r1p, r2)
Chengsong
parents: 532
diff changeset
   240
  }
Chengsong
parents: 532
diff changeset
   241
  case r => if(acc(erase(r))) AZERO else r
Chengsong
parents: 532
diff changeset
   242
}
Chengsong
parents: 532
diff changeset
   243
\end{verbatim}
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   244
\caption{pruning function}
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   245
\end{figure}
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   246
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   247
533
Chengsong
parents: 532
diff changeset
   248
Chengsong
parents: 532
diff changeset
   249
\begin{figure}
Chengsong
parents: 532
diff changeset
   250
\centering
Chengsong
parents: 532
diff changeset
   251
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
Chengsong
parents: 532
diff changeset
   252
\begin{tikzpicture}
Chengsong
parents: 532
diff changeset
   253
\begin{axis}[
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   254
    %xlabel={$n$},
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   255
    myplotstyle,
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   256
    xlabel={input length},
533
Chengsong
parents: 532
diff changeset
   257
    ylabel={size},
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   258
    ]
533
Chengsong
parents: 532
diff changeset
   259
\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
Chengsong
parents: 532
diff changeset
   260
\end{axis}
Chengsong
parents: 532
diff changeset
   261
\end{tikzpicture}
Chengsong
parents: 532
diff changeset
   262
  &
Chengsong
parents: 532
diff changeset
   263
\begin{tikzpicture}
Chengsong
parents: 532
diff changeset
   264
\begin{axis}[
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   265
    %xlabel={$n$},
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   266
    myplotstyle,
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   267
    xlabel={input length},
533
Chengsong
parents: 532
diff changeset
   268
    ylabel={size},
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   269
    ]
533
Chengsong
parents: 532
diff changeset
   270
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
Chengsong
parents: 532
diff changeset
   271
\end{axis}
Chengsong
parents: 532
diff changeset
   272
\end{tikzpicture}\\
535
ce91c29d2885 fixed plotting error 6.1
Chengsong
parents: 533
diff changeset
   273
\multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
533
Chengsong
parents: 532
diff changeset
   274
           of the form $\underbrace{aa..a}_{n}$.}
Chengsong
parents: 532
diff changeset
   275
\end{tabular}    
Chengsong
parents: 532
diff changeset
   276
\caption{aaaaaStarStar} \label{fig:aaaaaStarStar}
Chengsong
parents: 532
diff changeset
   277
\end{figure}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   278
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   279
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   280
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   281
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   282
%	SECTION 1
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   283
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   284
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   285
\section{Adding Support for the Negation Construct, and its Correctness Proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   286
We now add support for the negation regular expression:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   287
\[			r ::=   \ZERO \mid  \ONE
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   288
			 \mid  c  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   289
			 \mid  r_1 \cdot r_2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   290
			 \mid  r_1 + r_2   
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   291
			 \mid r^*    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   292
			 \mid \sim r
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   293
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   294
The $\textit{nullable}$ function's clause for it would be 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   295
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   296
\textit{nullable}(~r) = \neg \nullable(r)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   297
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   298
The derivative would be
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   299
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   300
~r \backslash c = ~ (r \backslash c)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   301
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   302
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   303
The most tricky part of lexing for the $~r$ regex
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   304
 is creating a value for it.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   305
 For other regular expressions, the value aligns with the 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   306
 structure of the regex:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   307
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   308
 \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   309
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   310
But for the $~r$ regex, $s$ is a member of it if and only if
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   311
$s$ does not belong to $L(r)$. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   312
That means when there
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   313
is a match for the not regex, it is not possible to generate how the string $s$ matched
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   314
with $r$. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   315
What we can do is preserve the information of how $s$ was not matched by $r$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   316
and there are a number of options to do this.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   317
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   318
We could give a partial value when there is a partial match for the regex inside
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   319
the $\mathbf{not}$ construct.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   320
For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   321
A value for it could be 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   322
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   323
 \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   324
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   325
 The above example demonstrates what value to construct 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   326
 when the string $s$ is at most a real prefix
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   327
 of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   328
 in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   329
 constructor.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   330
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   331
 Another option would be to either store the string $s$ that resulted in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   332
 a mis-match for $r$ or a dummy value as a placeholder:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   333
 \[
533
Chengsong
parents: 532
diff changeset
   334
 \vdash \textit{Not}(abcd) : ~( r_1 )
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   335
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   336
or
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   337
 \[
533
Chengsong
parents: 532
diff changeset
   338
 \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   339
 \] 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   340
 We choose to implement this as it is most straightforward:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   341
 \[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   342
 \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   343
 \]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   344
 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   345
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   346
%	SECTION 2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   347
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   348
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   349
\section{Bounded Repetitions}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   350
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   351