ChengsongTanPhdThesis/Chapters/Chapter2.tex
author Chengsong
Mon, 02 May 2022 00:23:39 +0100
changeset 500 4d9eecfc936a
parent 468 a0f27e21b42c
child 503 35b80e37dfe7
permissions -rwxr-xr-x
sad
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     1
% Chapter Template
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     2
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     3
\chapter{Chapter Title Here} % Main chapter title
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     4
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     5
\label{ChapterX} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     6
500
Chengsong
parents: 468
diff changeset
     7
Chengsong
parents: 468
diff changeset
     8
Chengsong
parents: 468
diff changeset
     9
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    10
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    11
%	SECTION 1
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    12
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    13
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    14
\section{Properties of $\backslash c$}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    15
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    16
To have a clear idea of what we can and 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    17
need to prove about the algorithms involving
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    18
Brzozowski's derivatives, there are a few 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    19
properties we need to be clear about 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    20
it.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    21
\subsection{function $\backslash c$ is not 1-to-1}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    22
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    23
The derivative $w.r.t$ character $c$ is not one-to-one.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    24
Formally,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    25
	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    26
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    27
This property is trivially true for the
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    28
character regex example:
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    29
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    30
	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    31
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    32
But apart from the cases where the derivative
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    33
output is $\ZERO$, are there non-trivial results
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    34
of derivatives which contain strings?
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    35
The answer is yes.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    36
For example,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    37
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    38
	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    39
	where $a$ is not nullable.\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    40
	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    41
	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    42
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    43
We start with two syntactically different regexes,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    44
and end up with the same derivative result, which is
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    45
a "meaningful" regex because it contains strings.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    46
We have rediscovered Arden's lemma:\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    47
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    48
	$A^*B = A\cdot A^* \cdot B + B$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    49
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    50
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    51
	
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    52
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    53
%	SUBSECTION 1
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    54
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    55
\subsection{Subsection 1}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    56
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    57
Nunc posuere quam at lectus tristique eu ultrices augue venenatis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam erat volutpat. Vivamus sodales tortor eget quam adipiscing in vulputate ante ullamcorper. Sed eros ante, lacinia et sollicitudin et, aliquam sit amet augue. In hac habitasse platea dictumst.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    58
500
Chengsong
parents: 468
diff changeset
    59
Chengsong
parents: 468
diff changeset
    60
Chengsong
parents: 468
diff changeset
    61
Chengsong
parents: 468
diff changeset
    62
Chengsong
parents: 468
diff changeset
    63
Chengsong
parents: 468
diff changeset
    64
Chengsong
parents: 468
diff changeset
    65
Chengsong
parents: 468
diff changeset
    66
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    67
%-----------------------------------
500
Chengsong
parents: 468
diff changeset
    68
%	SECTION 2
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    69
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    70
500
Chengsong
parents: 468
diff changeset
    71
\section{Finiteness Property}
Chengsong
parents: 468
diff changeset
    72
In this section let us sketch our argument for why the size of the simplified
Chengsong
parents: 468
diff changeset
    73
derivatives with the aggressive simplification function can be finitely bounded.
Chengsong
parents: 468
diff changeset
    74
For convenience, we use a new datatype which we call $\textit{rrexp}$ to denote
Chengsong
parents: 468
diff changeset
    75
the difference between it and annotated regular expressions. 
Chengsong
parents: 468
diff changeset
    76
For $\rrexp$ we throw away the bitcodes on the annotated regular expressions, but keeps
Chengsong
parents: 468
diff changeset
    77
everything else intact.
Chengsong
parents: 468
diff changeset
    78
It is similar to annotated regular expression being $\erase$ed, but with all its structure being intact
Chengsong
parents: 468
diff changeset
    79
(the $\erase$ function unfortunately does not preserve structure in the case of empty and singleton
Chengsong
parents: 468
diff changeset
    80
$\AALTS$.
Chengsong
parents: 468
diff changeset
    81
We denote the operation of erasing the bits and turning an annotated regular expression 
Chengsong
parents: 468
diff changeset
    82
into an $\rrexp$ as $\rerase$.
Chengsong
parents: 468
diff changeset
    83
%TODO: definition of rerase
Chengsong
parents: 468
diff changeset
    84
That we can bound the size of annotated regular expressions by 
Chengsong
parents: 468
diff changeset
    85
$\rrexp$ is that the bitcodes grow linearly w.r.t the input string, and don't contribute to the structural size of 
Chengsong
parents: 468
diff changeset
    86
an annotated regular expression:
Chengsong
parents: 468
diff changeset
    87
$\rsize (\rerase a) = \asize a$
Chengsong
parents: 468
diff changeset
    88
Chengsong
parents: 468
diff changeset
    89
 Suppose
Chengsong
parents: 468
diff changeset
    90
we have a size function for bitcoded regular expressions, written
Chengsong
parents: 468
diff changeset
    91
$\llbracket r\rrbracket$, which counts the number of nodes if we regard $r$ as a tree
Chengsong
parents: 468
diff changeset
    92
(we omit the precise definition; ditto for lists $\llbracket r\!s\rrbracket$). For this we show that for every $r$
Chengsong
parents: 468
diff changeset
    93
there exists a bound $N$
Chengsong
parents: 468
diff changeset
    94
such that 
Chengsong
parents: 468
diff changeset
    95
Chengsong
parents: 468
diff changeset
    96
\begin{center}
Chengsong
parents: 468
diff changeset
    97
$\forall s. \; \llbracket{\bderssimp r s}\rrbracket \leq N$
Chengsong
parents: 468
diff changeset
    98
\end{center}
Chengsong
parents: 468
diff changeset
    99
Chengsong
parents: 468
diff changeset
   100
\noindent
Chengsong
parents: 468
diff changeset
   101
We prove this by induction on $r$. The base cases for $\AZERO$,
Chengsong
parents: 468
diff changeset
   102
$\AONE \textit{bs}$ and $\ACHAR \textit{bs} c$ are straightforward. The interesting case is
Chengsong
parents: 468
diff changeset
   103
for sequences of the form $\ASEQ \textit{bs} r_1 r_2$. In this case our induction
Chengsong
parents: 468
diff changeset
   104
hypotheses state $\exists N_1. \forall s. \; \llbracket \bderssimp(r_1, s) \rrbracket \leq N_1$ and
Chengsong
parents: 468
diff changeset
   105
$\exists N_2. \forall s. \; \llbracket \bderssimp(r_2, s) \rrbracket \leq N_2$. We can reason as follows
Chengsong
parents: 468
diff changeset
   106
%
Chengsong
parents: 468
diff changeset
   107
\begin{center}
Chengsong
parents: 468
diff changeset
   108
\begin{tabular}{lcll}
Chengsong
parents: 468
diff changeset
   109
& & $ \llbracket \bderssimp( (\ASEQ\, \textit{bs} \, r_1 \,r_2),  s) \rrbracket$\\
Chengsong
parents: 468
diff changeset
   110
& $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((ASEQ [] (\bderssimp r_1 s) r_2) ::
Chengsong
parents: 468
diff changeset
   111
    [\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)]))\rrbracket $ & (1) \\
Chengsong
parents: 468
diff changeset
   112
& $\leq$ &
Chengsong
parents: 468
diff changeset
   113
    $\llbracket\textit{\distinctWith}\,((\ASEQ [] (\bderssimp r_1 s) r_2$) ::
Chengsong
parents: 468
diff changeset
   114
    [$\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ & (2) \\
Chengsong
parents: 468
diff changeset
   115
& $\leq$ & $\llbracket\ASEQ [] (\bderssimp r_1 s) r_2\rrbracket +
Chengsong
parents: 468
diff changeset
   116
             \llbracket\textit{distinctWith}\,[\bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)]\,\approx\;{}\rrbracket + 1 $ & (3) \\
Chengsong
parents: 468
diff changeset
   117
& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
Chengsong
parents: 468
diff changeset
   118
      \llbracket \distinctWith\,[ \bderssimp r_2 s' \;|\; s' \in \textit{Suffix}( r_1, s)] \,\approx\;\rrbracket$ & (4)\\
Chengsong
parents: 468
diff changeset
   119
& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
Chengsong
parents: 468
diff changeset
   120
\end{tabular}
Chengsong
parents: 468
diff changeset
   121
\end{center}
Chengsong
parents: 468
diff changeset
   122
Chengsong
parents: 468
diff changeset
   123
% tell Chengsong about Indian paper of closed forms of derivatives
Chengsong
parents: 468
diff changeset
   124
Chengsong
parents: 468
diff changeset
   125
\noindent
Chengsong
parents: 468
diff changeset
   126
where in (1) the $\textit{Suffix}( r_1, s)$ are the all the suffixes of $s$ where $\bderssimp r_1 s'$ is nullable ($s'$ being a suffix of $s$).
Chengsong
parents: 468
diff changeset
   127
The reason why we could write the derivatives of a sequence this way is described in section 2.
Chengsong
parents: 468
diff changeset
   128
The term (2) is used to control (1) since it we know that you can obtain an overall
Chengsong
parents: 468
diff changeset
   129
smaller regex list
Chengsong
parents: 468
diff changeset
   130
by flattening it and removing $\ZERO$s first before applying $\distinctWith$ on it.
Chengsong
parents: 468
diff changeset
   131
Section 3 is dedicated to its proof.
Chengsong
parents: 468
diff changeset
   132
In (3) we know that  $\llbracket\ASEQ [] (\bderssimp r_1 s) r_2\rrbracket$ is 
Chengsong
parents: 468
diff changeset
   133
bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
Chengsong
parents: 468
diff changeset
   134
than $N_2$. The list length after  $\distinctWith$ is bounded by a number, which we call $l_{N_2}$. It stands
Chengsong
parents: 468
diff changeset
   135
for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
Chengsong
parents: 468
diff changeset
   136
We reason similarly for  $\STAR$.\medskip
Chengsong
parents: 468
diff changeset
   137
Chengsong
parents: 468
diff changeset
   138
\noindent
Chengsong
parents: 468
diff changeset
   139
Clearly we give in this finiteness argument (Step (5)) a very loose bound that is
Chengsong
parents: 468
diff changeset
   140
far from the actual bound we can expect. We can do better than this, but this does not improve
Chengsong
parents: 468
diff changeset
   141
the finiteness property we are proving. If we are interested in a polynomial bound,
Chengsong
parents: 468
diff changeset
   142
one would hope to obtain a similar tight bound as for partial
Chengsong
parents: 468
diff changeset
   143
derivatives introduced by Antimirov \cite{Antimirov95}. After all the idea with
Chengsong
parents: 468
diff changeset
   144
 $\distinctWith$ is to maintain a ``set'' of alternatives (like the sets in
Chengsong
parents: 468
diff changeset
   145
partial derivatives). Unfortunately to obtain the exact same bound would mean
Chengsong
parents: 468
diff changeset
   146
we need to introduce simplifications, such as
Chengsong
parents: 468
diff changeset
   147
 $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
Chengsong
parents: 468
diff changeset
   148
which exist for partial derivatives. However, if we introduce them in our
Chengsong
parents: 468
diff changeset
   149
setting we would lose the POSIX property of our calculated values. We leave better
Chengsong
parents: 468
diff changeset
   150
bounds for future work.\\[-6.5mm]
Chengsong
parents: 468
diff changeset
   151
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   152
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   153
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   154
%	SECTION 2
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   155
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   156
500
Chengsong
parents: 468
diff changeset
   157
\section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
Chengsong
parents: 468
diff changeset
   158
There is a nice property about the compound regular expression 
Chengsong
parents: 468
diff changeset
   159
$r_1 \cdot r_2$ in general,
Chengsong
parents: 468
diff changeset
   160
that the derivatives of it against a string $s$ can be described by
Chengsong
parents: 468
diff changeset
   161
the derivatives w.r.t $r_1$ and $r_2$ over substrings of $s$:
Chengsong
parents: 468
diff changeset
   162
\begin{lemma}
Chengsong
parents: 468
diff changeset
   163
$\textit{sflat}\_{aux} ( (r_1 \cdot r_2) \backslash s) = (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf s r_1))$
Chengsong
parents: 468
diff changeset
   164
\end{lemma}
Chengsong
parents: 468
diff changeset
   165
Chengsong
parents: 468
diff changeset
   166
Chengsong
parents: 468
diff changeset
   167
%----------------------------------------------------------------------------------------
Chengsong
parents: 468
diff changeset
   168
%	SECTION 3
Chengsong
parents: 468
diff changeset
   169
%----------------------------------------------------------------------------------------
Chengsong
parents: 468
diff changeset
   170
Chengsong
parents: 468
diff changeset
   171
\section{interaction between $\distinctWith$ and $\flts$}
Chengsong
parents: 468
diff changeset
   172
Note that it is not immediately obvious that 
Chengsong
parents: 468
diff changeset
   173
$\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   174
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   175
Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit.