ChengsongTanPhdThesis/Chapters/Chapter2.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Sun, 08 May 2022 09:58:50 +0100
changeset 504 79fdaad6f36b
parent 503 35b80e37dfe7
child 505 5ce3bd8e5696
permissions -rwxr-xr-x
added ASEQs version of Blexer
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     1
% Chapter Template
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     2
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     3
\chapter{Chapter Title Here} % Main chapter title
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     4
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     5
\label{ChapterX} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
     6
500
Chengsong
parents: 468
diff changeset
     7
Chengsong
parents: 468
diff changeset
     8
Chengsong
parents: 468
diff changeset
     9
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    10
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    11
%	SECTION 1
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    12
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    13
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    14
\section{Properties of $\backslash c$}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    15
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    16
To have a clear idea of what we can and 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    17
need to prove about the algorithms involving
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    18
Brzozowski's derivatives, there are a few 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    19
properties we need to be clear about 
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    20
it.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    21
\subsection{function $\backslash c$ is not 1-to-1}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    22
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    23
The derivative $w.r.t$ character $c$ is not one-to-one.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    24
Formally,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    25
	$\exists r_1 \;r_2. r_1 \neq r_2 \mathit{and} r_1 \backslash c = r_2 \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    26
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    27
This property is trivially true for the
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    28
character regex example:
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    29
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    30
	$r_1 = e; \; r_2 = d;\; r_1 \backslash c = \ZERO = r_2 \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    31
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    32
But apart from the cases where the derivative
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    33
output is $\ZERO$, are there non-trivial results
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    34
of derivatives which contain strings?
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    35
The answer is yes.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    36
For example,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    37
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    38
	Let $r_1 = a^*b\;\quad r_2 = (a\cdot a^*)\cdot b + b$.\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    39
	where $a$ is not nullable.\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    40
	$r_1 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    41
	$r_2 \backslash c = ((a \backslash c)\cdot a^*)\cdot c + b \backslash c$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    42
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    43
We start with two syntactically different regexes,
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    44
and end up with the same derivative result, which is
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    45
a "meaningful" regex because it contains strings.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    46
We have rediscovered Arden's lemma:\\
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    47
\begin{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    48
	$A^*B = A\cdot A^* \cdot B + B$
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    49
\end{center}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    50
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    51
	
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    52
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    53
%	SUBSECTION 1
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    54
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    55
\subsection{Subsection 1}
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    56
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    57
Nunc posuere quam at lectus tristique eu ultrices augue venenatis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam erat volutpat. Vivamus sodales tortor eget quam adipiscing in vulputate ante ullamcorper. Sed eros ante, lacinia et sollicitudin et, aliquam sit amet augue. In hac habitasse platea dictumst.
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    58
500
Chengsong
parents: 468
diff changeset
    59
Chengsong
parents: 468
diff changeset
    60
Chengsong
parents: 468
diff changeset
    61
Chengsong
parents: 468
diff changeset
    62
Chengsong
parents: 468
diff changeset
    63
Chengsong
parents: 468
diff changeset
    64
Chengsong
parents: 468
diff changeset
    65
Chengsong
parents: 468
diff changeset
    66
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    67
%-----------------------------------
500
Chengsong
parents: 468
diff changeset
    68
%	SECTION 2
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    69
%-----------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
    70
500
Chengsong
parents: 468
diff changeset
    71
\section{Finiteness Property}
Chengsong
parents: 468
diff changeset
    72
In this section let us sketch our argument for why the size of the simplified
Chengsong
parents: 468
diff changeset
    73
derivatives with the aggressive simplification function can be finitely bounded.
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    74
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    75
For convenience, we use a new datatype which we call $\rrexp$ to denote
500
Chengsong
parents: 468
diff changeset
    76
the difference between it and annotated regular expressions. 
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    77
\[			\rrexp ::=   \RZERO \mid  \RONE
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    78
			 \mid  \RCHAR{c}  
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    79
			 \mid  \RSEQ{r_1}{r_2}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    80
			 \mid  \RALTS{rs}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    81
			 \mid \RSTAR{r}        
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    82
\]
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    83
For $\rrexp$ we throw away the bitcodes on the annotated regular expressions, 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    84
but keep everything else intact.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    85
It is similar to annotated regular expressions being $\erase$-ed, but with all its structure preserved
500
Chengsong
parents: 468
diff changeset
    86
(the $\erase$ function unfortunately does not preserve structure in the case of empty and singleton
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    87
$\ALTS$).
500
Chengsong
parents: 468
diff changeset
    88
We denote the operation of erasing the bits and turning an annotated regular expression 
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    89
into an $\rrexp{}$ as $\rerase{}$.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    90
\begin{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    91
\begin{tabular}{lcr}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    92
$\rerase{\AZERO}$ & $=$ & $\RZERO$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    93
$\rerase{\ASEQ{bs}{r_1}{r_2}}$ & $=$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    94
$\rerase{\AALTS{bs}{rs}}$ & $ =$ & $ \RALTS{rs}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    95
\end{tabular}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    96
\end{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    97
%TODO: FINISH definition of rerase
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    98
Similarly we could define the derivative  and simplification on 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
    99
$\rrexp$, which would be identical to those we defined for plain $\rexp$s in chapter1, 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   100
except that now they can operate on alternatives taking multiple arguments.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   101
%TODO: definition of rder rsimp (maybe only the alternative clause)
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   102
The reason why these definitions  mirror precisely the corresponding operations
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   103
on annotated regualar expressions is that we can calculate sizes more easily:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   104
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   105
\begin{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   106
$\rsize{\rerase a} = \asize a$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   107
\end{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   108
A similar equation holds for annotated regular expressions' simplification
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   109
and the plain regular expressions' simplification:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   110
\begin{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   111
$\asize{\bsimp{a}} = \rsize{\rsimp{\rerase{a}}}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   112
\end{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   113
Putting these two together we get a property that allows us to estimate the 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   114
size of an annotated regular expression derivative using its un-annotated counterpart:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   115
\begin{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   116
$\asize{\bderssimp{r}{s}} =  \rsize{\rderssimp{\rerase{r}}{s}}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   117
\end{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   118
Unless stated otherwise in this chapter all $\textit{rexp}$s without
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   119
 bitcodes are seen as $\rrexp$s.
500
Chengsong
parents: 468
diff changeset
   120
Chengsong
parents: 468
diff changeset
   121
 Suppose
Chengsong
parents: 468
diff changeset
   122
we have a size function for bitcoded regular expressions, written
Chengsong
parents: 468
diff changeset
   123
$\llbracket r\rrbracket$, which counts the number of nodes if we regard $r$ as a tree
Chengsong
parents: 468
diff changeset
   124
(we omit the precise definition; ditto for lists $\llbracket r\!s\rrbracket$). For this we show that for every $r$
Chengsong
parents: 468
diff changeset
   125
there exists a bound $N$
Chengsong
parents: 468
diff changeset
   126
such that 
Chengsong
parents: 468
diff changeset
   127
Chengsong
parents: 468
diff changeset
   128
\begin{center}
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   129
$\forall s. \; \llbracket{\bderssimp{r}{s}}\rrbracket \leq N$
500
Chengsong
parents: 468
diff changeset
   130
\end{center}
Chengsong
parents: 468
diff changeset
   131
Chengsong
parents: 468
diff changeset
   132
\noindent
Chengsong
parents: 468
diff changeset
   133
We prove this by induction on $r$. The base cases for $\AZERO$,
Chengsong
parents: 468
diff changeset
   134
$\AONE \textit{bs}$ and $\ACHAR \textit{bs} c$ are straightforward. The interesting case is
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   135
for sequences of the form $\ASEQ{bs}{r_1}{r_2}$. In this case our induction
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   136
hypotheses state $\exists N_1. \forall s. \; \llbracket \bderssimp{r}{s} \rrbracket \leq N_1$ and
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   137
$\exists N_2. \forall s. \; \llbracket \bderssimp{r_2}{s} \rrbracket \leq N_2$. We can reason as follows
500
Chengsong
parents: 468
diff changeset
   138
%
Chengsong
parents: 468
diff changeset
   139
\begin{center}
Chengsong
parents: 468
diff changeset
   140
\begin{tabular}{lcll}
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   141
& & $ \llbracket   \bderssimp{\ASEQ{bs}{r_1}{r_2} }{s} \rrbracket$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   142
& $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;(\ASEQ{nil}{\bderssimp{ r_1}{s}}{ r_2} ::
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   143
    [\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)]))\rrbracket $ & (1) \\
500
Chengsong
parents: 468
diff changeset
   144
& $\leq$ &
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   145
    $\llbracket\textit{\distinctWith}\,((\ASEQ{nil}{\bderssimp{r_1}{s}}{r_2}$) ::
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   146
    [$\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ & (2) \\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   147
& $\leq$ & $\llbracket\ASEQ{bs}{\bderssimp{ r_1}{s}}{r_2}\rrbracket +
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   148
             \llbracket\textit{distinctWith}\,[\bderssimp{r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)]\,\approx\;{}\rrbracket + 1 $ & (3) \\
500
Chengsong
parents: 468
diff changeset
   149
& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 +
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   150
      \llbracket \distinctWith\,[ \bderssimp{r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)] \,\approx\;\rrbracket$ & (4)\\
500
Chengsong
parents: 468
diff changeset
   151
& $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5)
Chengsong
parents: 468
diff changeset
   152
\end{tabular}
Chengsong
parents: 468
diff changeset
   153
\end{center}
Chengsong
parents: 468
diff changeset
   154
Chengsong
parents: 468
diff changeset
   155
% tell Chengsong about Indian paper of closed forms of derivatives
Chengsong
parents: 468
diff changeset
   156
Chengsong
parents: 468
diff changeset
   157
\noindent
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   158
where in (1) the $\textit{Suffix}( r_1, s)$ are the all the suffixes of $s$ where $\bderssimp{ r_1}{s'}$ is nullable ($s'$ being a suffix of $s$).
500
Chengsong
parents: 468
diff changeset
   159
The reason why we could write the derivatives of a sequence this way is described in section 2.
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   160
The term (2) is used to control (1). 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   161
That is because one can obtain an overall
500
Chengsong
parents: 468
diff changeset
   162
smaller regex list
Chengsong
parents: 468
diff changeset
   163
by flattening it and removing $\ZERO$s first before applying $\distinctWith$ on it.
Chengsong
parents: 468
diff changeset
   164
Section 3 is dedicated to its proof.
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   165
In (3) we know that  $\llbracket \ASEQ{bs}{(\bderssimp{ r_1}{s}}{r_2}\rrbracket$ is 
500
Chengsong
parents: 468
diff changeset
   166
bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
Chengsong
parents: 468
diff changeset
   167
than $N_2$. The list length after  $\distinctWith$ is bounded by a number, which we call $l_{N_2}$. It stands
Chengsong
parents: 468
diff changeset
   168
for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
Chengsong
parents: 468
diff changeset
   169
We reason similarly for  $\STAR$.\medskip
Chengsong
parents: 468
diff changeset
   170
Chengsong
parents: 468
diff changeset
   171
\noindent
Chengsong
parents: 468
diff changeset
   172
Clearly we give in this finiteness argument (Step (5)) a very loose bound that is
Chengsong
parents: 468
diff changeset
   173
far from the actual bound we can expect. We can do better than this, but this does not improve
Chengsong
parents: 468
diff changeset
   174
the finiteness property we are proving. If we are interested in a polynomial bound,
Chengsong
parents: 468
diff changeset
   175
one would hope to obtain a similar tight bound as for partial
Chengsong
parents: 468
diff changeset
   176
derivatives introduced by Antimirov \cite{Antimirov95}. After all the idea with
Chengsong
parents: 468
diff changeset
   177
 $\distinctWith$ is to maintain a ``set'' of alternatives (like the sets in
Chengsong
parents: 468
diff changeset
   178
partial derivatives). Unfortunately to obtain the exact same bound would mean
Chengsong
parents: 468
diff changeset
   179
we need to introduce simplifications, such as
Chengsong
parents: 468
diff changeset
   180
 $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
Chengsong
parents: 468
diff changeset
   181
which exist for partial derivatives. However, if we introduce them in our
Chengsong
parents: 468
diff changeset
   182
setting we would lose the POSIX property of our calculated values. We leave better
Chengsong
parents: 468
diff changeset
   183
bounds for future work.\\[-6.5mm]
Chengsong
parents: 468
diff changeset
   184
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   185
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   186
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   187
%	SECTION 2
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   188
%----------------------------------------------------------------------------------------
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   189
500
Chengsong
parents: 468
diff changeset
   190
\section{"Closed Forms" of regular expressions' derivatives w.r.t strings}
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   191
To embark on getting the "closed forms" of regexes, we first
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   192
define a few auxiliary definitions to make expressing them smoothly.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   193
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   194
 \begin{center}  
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   195
 \begin{tabular}{ccc}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   196
 $\sflataux{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   197
$\sflataux{\AALTS{ }{[]}}$ & $ = $ & $ []$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   198
$\sflataux r$ & $=$ & $ [r]$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   199
\end{tabular}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   200
\end{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   201
The intuition behind $\sflataux{\_}$ is to break up nested regexes 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   202
of the $(\ldots((r_1 + r_2) + r_3) + \ldots )$(left-associated) shape
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   203
into a more "balanced" list: $\AALTS{\_}{[r_1,\, r_2 ,\, r_3, \ldots]}$.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   204
It will return the singleton list $[r]$ otherwise.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   205
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   206
$\sflat{\_}$ works the same  as $\sflataux{\_}$, except that it keeps
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   207
the output type a regular expression, not a list:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   208
 \begin{center} 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   209
 \begin{tabular}{ccc}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   210
 $\sflat{\AALTS{ }{r :: rs}}$ & $=$ & $\sflataux{r} @ rs$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   211
$\sflat{\AALTS{ }{[]}}$ & $ = $ & $ \AALTS{ }{[]}$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   212
$\sflat r$ & $=$ & $ [r]$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   213
\end{tabular}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   214
\end{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   215
$\sflataux{\_}$  and $\sflat{\_}$ is only recursive in terms of the
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   216
 first element of the list of children of
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   217
an alternative regex ($\AALTS{ }{rs}$), and is purposefully built for  dealing with the regular expression
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   218
$r_1 \cdot r_2 \backslash s$.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   219
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   220
With $\sflat{\_}$ and $\sflataux{\_}$ we make 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   221
precise what  "closed forms" we have for the sequence derivatives and their simplifications,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   222
in other words, how can we construct $(r_1 \cdot r_2) \backslash s$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   223
and $\bderssimp{(r_1\cdot r_2)}{s}$,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   224
if we are only allowed to use a combination of $r_1 \backslash s'$ ($\bderssimp{r_1}{s'}$)
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   225
and  $r_2 \backslash s'$ ($\bderssimp{r_2}{s'}$), where $s'$  ranges over 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   226
the substring of $s$?
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   227
First let's look at a series of derivatives steps on a sequence 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   228
regular expression,  (assuming) that each time the first
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   229
component of the sequence is always nullable):
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   230
\begin{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   231
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   232
$r_1 \cdot r_2 \quad \longrightarrow_{\backslash c}  \quad   r_1  \backslash c \cdot r_2 + r_2 \backslash c \quad \longrightarrow_{\backslash c'} \quad (r_1 \backslash cc' \cdot r_2 + r_2 \backslash c') + r_2 \backslash cc' \longrightarrow_{\backslash c''} \quad$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   233
$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'') + r_2 \backslash cc'c''   \longrightarrow_{\backslash c''} \quad
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   234
 \ldots$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   235
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   236
\end{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   237
%TODO: cite indian paper
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   238
Indianpaper have  come up with a slightly more formal way of putting the above process:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   239
\begin{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   240
$r_1 \cdot r_2 \backslash (c_1 :: c_2 ::\ldots c_n) \myequiv r_1 \backslash (c_1 :: c_2:: \ldots c_n) +
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   241
\delta(\nullable(r_1 \backslash (c_1 :: c_2 \ldots c_{n-1}) ), r_2 \backslash (c_n)) + \ldots
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   242
+ \delta (\nullable(r_1), r_2\backslash (c_1 :: c_2 ::\ldots c_n))$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   243
\end{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   244
where  $\delta(b, r)$ will produce $r$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   245
if $b$ evaluates to true, 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   246
and $\ZERO$ otherwise.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   247
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   248
 But the $\myequiv$ sign in the above equation means language equivalence instead of syntactical
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   249
 equivalence. To make this intuition useful 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   250
 for a formal proof, we need something
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   251
stronger than language equivalence.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   252
With the help of $\sflat{\_}$ we can state the equation in Indianpaper
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   253
more rigorously:
500
Chengsong
parents: 468
diff changeset
   254
\begin{lemma}
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   255
$\sflat{(r_1 \cdot r_2) \backslash s} = \AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}$
500
Chengsong
parents: 468
diff changeset
   256
\end{lemma}
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   257
With this property we can prove the finiteness of the size of the regex $(r_1 \cdot r_2) \backslash s$,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   258
much like a recursive function's termination proof.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   259
The function $\vsuf{\_}{\_}$ is defined this way:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   260
%TODO: DEF of vsuf
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   261
\begin{center}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   262
\begin{tabular}{lcl}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   263
$\vsuf{[]}{\_} $ & $=$ &  $[]$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   264
$\vsuf{c::cs}{r_1}$ & $ =$ & $ \textit{if} (\rnullable{r_1}) \textit{then}  (\vsuf{cs}{(rder c r1)}) @ [c :: cs]$\\
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   265
                                     && $\textit{else}  (\vsuf{cs}{rder c r1})  $
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   266
\end{tabular}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   267
\end{center}                   
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   268
It takes into account which prefixes $s'$ of $s$ would make $r \backslash s'$ nullable,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   269
and keep a list of suffixes $s''$ such that $s' @ s'' = s$. The list is sorted in
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   270
the order $r_2\backslash s''$ appears in $(r_1\cdot r_2)\backslash s$.
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   271
In essence, $\vsuf{\_}{\_}$ is doing a "virtual derivative" of $r$, but instead of producing 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   272
the entire result of  $(r_1 \cdot r_2) \backslash s$, 
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   273
it only stores all the $r_2 \backslash s''$ terms.
500
Chengsong
parents: 468
diff changeset
   274
503
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   275
With this we can also add simplifications to both sides and get
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   276
\begin{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   277
$\rsimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   278
\end{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   279
Together with the idempotency property of $\rsimp$,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   280
%TODO: state the idempotency property of rsimp
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   281
it is possible to convert the above lemma to obtain a "closed form"
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   282
for our lexer's intermediate result without bitcodes:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   283
\begin{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   284
$\rderssimp{\sflat{(r_1 \cdot r_2) \backslash s} }= \rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   285
\end{lemma}
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   286
And now the reason we have (2) in section 1 is clear:
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   287
$\rsize{\rsimp{\AALTS{[[]}{ (r_1 \backslash s) \cdot r_2 :: (\map (r_2 \backslash \_) (\vsuf{s}{r_1}))}}}$
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   288
is bounded by     $\llbracket\textit{\distinctWith}\,((\ASEQ{nil}{\bderssimp{r_1}{s}}{r_2}$) ::
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   289
    [$\bderssimp{ r_2}{s'} \;|\; s' \in \textit{Suffix}( r_1, s)])\,\approx\;{}\rrbracket + 1 $ ,
35b80e37dfe7 new writing
Chengsong
parents: 500
diff changeset
   290
    as $\vsuf{s}{r_1}$ is a subset of s $\textit{Suffix}( r_1, s)])$.
500
Chengsong
parents: 468
diff changeset
   291
Chengsong
parents: 468
diff changeset
   292
%----------------------------------------------------------------------------------------
Chengsong
parents: 468
diff changeset
   293
%	SECTION 3
Chengsong
parents: 468
diff changeset
   294
%----------------------------------------------------------------------------------------
Chengsong
parents: 468
diff changeset
   295
Chengsong
parents: 468
diff changeset
   296
\section{interaction between $\distinctWith$ and $\flts$}
Chengsong
parents: 468
diff changeset
   297
Note that it is not immediately obvious that 
Chengsong
parents: 468
diff changeset
   298
$\llbracket \distinctWith (\flts \textit{rs}) = \phi \rrbracket   \leq \llbracket \distinctWith \textit{rs} = \phi \rrbracket  $
468
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   299
a0f27e21b42c all texrelated
Chengsong
parents:
diff changeset
   300
Sed ullamcorper quam eu nisl interdum at interdum enim egestas. Aliquam placerat justo sed lectus lobortis ut porta nisl porttitor. Vestibulum mi dolor, lacinia molestie gravida at, tempus vitae ligula. Donec eget quam sapien, in viverra eros. Donec pellentesque justo a massa fringilla non vestibulum metus vestibulum. Vestibulum in orci quis felis tempor lacinia. Vivamus ornare ultrices facilisis. Ut hendrerit volutpat vulputate. Morbi condimentum venenatis augue, id porta ipsum vulputate in. Curabitur luctus tempus justo. Vestibulum risus lectus, adipiscing nec condimentum quis, condimentum nec nisl. Aliquam dictum sagittis velit sed iaculis. Morbi tristique augue sit amet nulla pulvinar id facilisis ligula mollis. Nam elit libero, tincidunt ut aliquam at, molestie in quam. Aenean rhoncus vehicula hendrerit.