ChengsongTanPhdThesis/Chapters/Cubic.tex
author Chengsong
Fri, 30 Dec 2022 17:37:51 +0000
changeset 639 80cc6dc4c98b
parent 638 dd9dde2d902b
permissions -rwxr-xr-x
until chap 7
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     3
%We also present the idempotency property proof
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     4
%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     5
%This reinforces our claim that the fixpoint construction
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     6
%originally required by Sulzmann and Lu can be removed in $\blexersimp$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     7
%Last but not least, we present our efforts and challenges we met
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     8
%in further improving the algorithm by data
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
     9
%structures such as zippers.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    10
%----------------------------------------------------------------------------------------
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    11
%	SECTION strongsimp
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    12
%----------------------------------------------------------------------------------------
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    13
%TODO: search for isabelle proofs of algorithms that check equivalence 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    14
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    15
\chapter{A Better Size Bound for Derivatives} % Main chapter title
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    16
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    17
\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    18
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    19
%algorithm to include constructs such as bounded repetitions and negations.
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    20
\lstset{style=myScalastyle}
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    21
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    22
625
b797c9a709d9 section reorganising, related work
Chengsong
parents: 621
diff changeset
    23
This chapter is a ``work-in-progress''
b797c9a709d9 section reorganising, related work
Chengsong
parents: 621
diff changeset
    24
chapter which records
b797c9a709d9 section reorganising, related work
Chengsong
parents: 621
diff changeset
    25
extensions to our $\blexersimp$.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    26
We make a conjecture that the finite
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    27
size bound from the previous chapter can be 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    28
improved to a cubic bound.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    29
We implemented our conjecture in Scala.
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
    30
We have not formalised this part in Isabelle/HOL. 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    31
%we have not been able to finish due to time constraints of the PhD.
625
b797c9a709d9 section reorganising, related work
Chengsong
parents: 621
diff changeset
    32
Nevertheless, we outline the ideas we intend to use for the proof.
b797c9a709d9 section reorganising, related work
Chengsong
parents: 621
diff changeset
    33
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    34
\section{A Stronger Version of Simplification}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    35
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
    36
Let us first present further improvements
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    37
for our lexer algorithm $\blexersimp$.
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    38
We devise a stronger simplification algorithm,
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    39
called $\bsimpStrong$, which can prune away
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    40
similar components in two regular expressions at the same 
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    41
alternative level,
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    42
even if these regular expressions are not exactly the same.
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    43
We call the lexer that uses this stronger simplification function
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    44
$\blexerStrong$.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    45
%Unfortunately we did not have time to 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    46
%work out the proofs, like in
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    47
%the previous chapters.
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    48
We conjecture that both
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    49
\begin{center}
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    50
	$\blexerStrong \;r \; s = \blexer\; r\;s$
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    51
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    52
and 
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    53
\begin{center}
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    54
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    55
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    56
hold.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    57
%but a formalisation
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    58
%is still future work.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    59
We give an informal justification 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    60
why the correctness and cubic size bound proofs
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    61
can be achieved
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    62
by exploring the connection between the internal
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    63
data structure of our $\blexerStrong$ and
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    64
Animirov's partial derivatives.
621
Chengsong
parents: 620
diff changeset
    65
590
988e92a70704 more chap5 and chap6 bsimp_idem
Chengsong
parents: 538
diff changeset
    66
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    67
For example, the regular expression 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    68
\[
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    69
	aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    70
\]
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    71
contains three terms, 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
    72
expressing three possibilities for how it can match some more
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
    73
input of the form $a \ldots a$.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    74
The first and the third terms are identical, which means we can eliminate
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    75
the latter as it will not contribute to a POSIX value.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    76
In $\bsimps$, the $\distinctBy$ function takes care of 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    77
such instances.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    78
The criteria $\distinctBy$ uses for removing a duplicate
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    79
$a_2$ in the list
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    80
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    81
	$rs_a@[a_1]@rs_b@[a_2]@rs_c$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    82
\end{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    83
is that 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    84
the two erased regular expressions are equal
533
Chengsong
parents: 532
diff changeset
    85
\begin{center}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    86
	$\rerase{a_1} = \rerase{a_2}$.
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    87
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    88
This is characterised as the $LD$ 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    89
rewrite rule in figure \ref{rrewriteRules}.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    90
The problem, however, is that identical components
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    91
in two slightly different regular expressions cannot be removed
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    92
by the $LD$ rule.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    93
Consider the stronger simplification
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    94
\begin{equation}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    95
	\label{eqn:partialDedup}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
    96
	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
    97
\end{equation}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    98
where the $(\underline{a}+c+e)\cdot r_1$ is deleted in the right alternative
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
    99
$a+c+e$.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   100
This is permissible because we have $(a+\ldots)\cdot r_1$ in the left
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   101
alternative.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   102
The difficulty is that such  ``buried''
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   103
alternatives-sequences are not easily recognised.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   104
But simplification like this actually
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   105
cannot be omitted, if we want to have a better bound.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   106
For example, the size of derivatives can still
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   107
blow up even with our $\textit{bsimp}$ 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   108
function: 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   109
consider again the example
621
Chengsong
parents: 620
diff changeset
   110
$\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   111
and set $n$ to a relatively small number like $n=5$, then we get the following
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   112
exponential growth:
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   113
\begin{figure}[H]
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   114
\centering
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   115
\begin{tikzpicture}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   116
\begin{axis}[
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   117
    %xlabel={$n$},
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   118
    myplotstyle,
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   119
    xlabel={input length},
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   120
    ylabel={size},
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   121
    ]
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   122
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   123
\end{axis}
533
Chengsong
parents: 532
diff changeset
   124
\end{tikzpicture}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   125
\caption{Size of derivatives of $\blexersimp$ from chapter 5 for matching 
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   126
	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   127
	with strings 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   128
	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   129
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   130
\noindent
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   131
One possible approach would be to apply the rewriting
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   132
rule
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   133
\[
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   134
	(a+b+d) \cdot r_1  \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   135
\]
533
Chengsong
parents: 532
diff changeset
   136
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   137
which pushes the sequence into the alternatives
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   138
in our $\simp$ function.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   139
This would then make the simplification shown in \eqref{eqn:partialDedup} possible.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   140
Translating this rule into our $\textit{bsimp}$ function would simply
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   141
involve adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
621
Chengsong
parents: 620
diff changeset
   142
\begin{center}
Chengsong
parents: 620
diff changeset
   143
	\begin{tabular}{@{}lcl@{}}
Chengsong
parents: 620
diff changeset
   144
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
Chengsong
parents: 620
diff changeset
   145
						       && $\ldots$\\
Chengsong
parents: 620
diff changeset
   146
   &&$\quad\textit{case} \; (_{bs1}\sum as, a_2') \Rightarrow _{bs1}\sum (
Chengsong
parents: 620
diff changeset
   147
   \map \; (_{[]}\textit{ASEQ} \; \_ \; a_2') \; as)$\\
Chengsong
parents: 620
diff changeset
   148
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
Chengsong
parents: 620
diff changeset
   149
	\end{tabular}
Chengsong
parents: 620
diff changeset
   150
\end{center}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   151
\noindent
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   152
Unfortunately,
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   153
if we introduce this clause in our
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   154
setting we would lose the POSIX property of our calculated values. 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   155
For example given the regular expression 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   156
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   157
	$(a + ab)(bc + c)$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   158
\end{center}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   159
and the string $ab$,
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   160
then our algorithm generates the following
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   161
correct POSIX value
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   162
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   163
	$\Seq \; (\Right \; ab) \; (\Right \; c)$.
533
Chengsong
parents: 532
diff changeset
   164
\end{center}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   165
Essentially it matches the string with the longer Right-alternative
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   166
in the first sequence (and
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   167
then the 'rest' with the character regular expression $c$ from the second sequence).
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   168
If we add the simplification above, however, 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   169
then we would obtain the following value
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   170
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   171
	$\Left \; (\Seq \; a \; (\Left \; bc))$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   172
\end{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   173
where the $\Left$-alternatives get priority. 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   174
This violates the POSIX rules.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   175
The reason for getting this undesired value
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   176
is that the new rule splits this regular expression up into 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   177
a topmost alternative
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   178
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   179
	$a\cdot(b c + c) + ab \cdot (bc + c)$,
533
Chengsong
parents: 532
diff changeset
   180
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   181
which is a regular expression with a 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   182
quite different meaning: the original regular expression
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   183
is a sequence, but the simplified regular expression is an alternative.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   184
With an alternative the maximal munch rule no longer works.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   185
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   186
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   187
A method to reconcile this problem is to do the 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   188
transformation in \eqref{eqn:partialDedup} ``non-invasively'',
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   189
meaning that we traverse the list of regular expressions
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   190
%\begin{center}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   191
%	$rs_a@[a]@rs_c$
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   192
%\end{center}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   193
inside alternatives
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   194
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   195
	$\sum ( rs_a@[a]@rs_c)$
533
Chengsong
parents: 532
diff changeset
   196
\end{center}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   197
using  a function similar to $\distinctBy$,
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   198
but this time 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   199
we allow the following more general rewrite rule:
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   200
\begin{equation}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   201
\label{eqn:cubicRule}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   202
%\mbox{
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   203
%\begin{mathpar}	
621
Chengsong
parents: 620
diff changeset
   204
	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c 
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   205
			\stackrel{s}{\rightsquigarrow }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   206
		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   207
%\end{mathpar}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   208
%\caption{The rule capturing the pruning simplification needed to achieve
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   209
%a cubic bound}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   210
\end{equation}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   211
\noindent
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   212
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   213
where $\textit{prune} \;a \; acc$ traverses $a$
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   214
without altering the structure of $a$, but removing components in $a$
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   215
that have appeared in the accumulator $acc$.
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   216
For example
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   217
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   218
	$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   219
\end{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   220
should be equal to 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   221
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   222
	$(r_g+r_h)r_d$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   223
\end{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   224
because $r_gr_d$ and 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   225
$r_hr_d$ are the only terms
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   226
that do not appeared in the accumulator list 
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   227
\begin{center}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   228
$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   229
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   230
We implemented the
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   231
function $\textit{prune}$ in Scala (see figure \ref{fig:pruneFunc})
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   232
The function $\textit{prune}$ 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   233
is a stronger version of $\textit{distinctBy}$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   234
It does not just walk through a list looking for exact duplicates,
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   235
but prunes sub-expressions recursively.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   236
It manages proper contexts by the helper functions
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   237
$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   238
\begin{figure}%[H]
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   239
\begin{lstlisting}[numbers=left]
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   240
    def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   241
      case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   242
      {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   243
        //all components have been removed, meaning this is effectively a duplicate
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   244
        //flats will take care of removing this AZERO 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   245
        case Nil => AZERO
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   246
        case r::Nil => fuse(bs, r)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   247
        case rs1 => AALTS(bs, rs1)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   248
      }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   249
      case ASEQ(bs, r1, r2) => 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   250
        //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   251
        prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   252
          //after pruning, returns 0
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   253
          case AZERO => AZERO
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   254
          //after pruning, got r1'.r2, where r1' is equal to 1
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   255
          case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   256
          //assemble the pruned head r1p with r2
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   257
          case r1p => ASEQ(bs, r1p, r2)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   258
        }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   259
        //this does the duplicate component removal task
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   260
      case r => if(acc(erase(r))) AZERO else r
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   261
    }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   262
\end{lstlisting}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   263
\caption{The function $\textit{prune}$ is called recursively in the 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   264
alternative case (line 2) and in the sequence case (line 12).
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   265
In the alternative case we keep all the accumulators the same, but
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   266
in the sequence case we are making necessary changes to the accumulators 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   267
to allow correct de-duplication.}\label{fig:pruneFunc}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   268
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   269
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   270
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   271
\begin{figure}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   272
\begin{lstlisting}[numbers=left]
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   273
    def atMostEmpty(r: Rexp) : Boolean = r match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   274
      case ZERO => true
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   275
      case ONE => true
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   276
      case STAR(r) => atMostEmpty(r)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   277
      case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   278
      case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   279
      case CHAR(_) => false
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   280
    }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   281
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   282
    def isOne(r: Rexp) : Boolean = r match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   283
      case ONE => true
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   284
      case SEQ(r1, r2) => isOne(r1) && isOne(r2)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   285
      case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   286
      case STAR(r0) => atMostEmpty(r0)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   287
      case CHAR(c) => false
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   288
      case ZERO => false
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   289
    }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   290
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   291
    def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   292
      if (r == tail)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   293
        ONE
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   294
      else {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   295
        r match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   296
          case SEQ(r1, r2) => 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   297
            if(r2 == tail) r1 else ZERO
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   298
          case r => ZERO
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   299
        }
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   300
      }
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   301
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   302
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   303
\end{lstlisting}
638
dd9dde2d902b comments till chap4
Chengsong
parents: 630
diff changeset
   304
\caption{The helper functions of $\textit{prune}$: 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   305
$\textit{atMostEmpty}$, $\textit{isOne}$ and $\textit{removeSeqTail}$. $\textit{atMostEmpty}
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   306
$ is a function that takes a regular expression and returns true only in case that it
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   307
contains nothing more than the empty string in its language. $\textit{isOne}$ tests whether
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   308
a regular expression is equivalent to $\ONE$. $\textit{removeSeqTail}$ is a function that
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   309
takes away the tail of a sequence regular expression.}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   310
\end{figure}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   311
\noindent
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   312
Suppose we feed 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   313
\begin{center}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   314
	$r= (\underline{\ONE}+(\underline{f}+b)\cdot g)\cdot (a\cdot(d\cdot e))$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   315
\end{center}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   316
and 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   317
\begin{center}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   318
	$acc = \{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   319
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   320
as the input into $\textit{prune}$.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   321
The end result will be
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   322
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   323
	b\cdot(g\cdot(a\cdot(d\cdot e)))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   324
\]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   325
where the underlined components in $r$ are eliminated.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   326
Looking more closely, at the topmost call 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   327
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   328
	\textit{prune} \quad (\ONE+
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   329
	(f+b)\cdot g)\cdot (a\cdot(d\cdot e)) \quad
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   330
	\{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   331
\]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   332
The sequence clause will be called,
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   333
where a sub-call
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   334
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   335
	\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   336
\]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   337
is made. The terms in the new accumulator $\{\ONE,\; f\cdot g \}$ come from
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   338
the two calls to $\textit{removeSeqTail}$:
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   339
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   340
	\textit{removeSeqTail} \quad\;\; a \cdot(d\cdot e) \quad\;\; a \cdot(d\cdot e) 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   341
\]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   342
and
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   343
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   344
	\textit{removeSeqTail} \quad \;\; 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   345
	f\cdot(g\cdot (a \cdot(d\cdot e)))\quad  \;\; a \cdot(d\cdot e).
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   346
\]
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   347
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   348
The idea behind $\textit{removeSeqTail}$ is that
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   349
when pruning recursively, we need to ``zoom in''
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   350
to sub-expressions, and this ``zoom in'' needs to be performed
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   351
on the
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   352
accumulators as well, otherwise the deletion will not work.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   353
The sub-call 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   354
$\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   355
is simpler, which will trigger the alternative clause, causing
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   356
a pruning on each element in $(\ONE+(f+b)\cdot g)$,
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   357
leaving us with $b\cdot g$ only.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   358
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   359
Our new lexer with stronger simplification
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   360
uses $\textit{prune}$ by making it 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   361
the core component of the deduplicating function
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   362
called $\textit{distinctWith}$.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   363
$\textit{DistinctWith}$ ensures that all verbose
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   364
parts of a regular expression are pruned away.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   365
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   366
\begin{figure}[H]
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   367
\begin{lstlisting}[numbers=left]
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   368
    def turnIntoTerms(r: Rexp): List[Rexp] = r match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   369
      case SEQ(r1, r2)  => 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   370
        turnIntoTerms(r1).flatMap(r11 => furtherSEQ(r11, r2))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   371
          case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   372
          case ZERO => Nil
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   373
          case _ => r :: Nil
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   374
    }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   375
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   376
    def distinctWith(rs: List[ARexp], 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   377
      pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   378
      acc: Set[Rexp] = Set()) : List[ARexp] = 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   379
        rs match{
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   380
          case Nil => Nil
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   381
          case r :: rs => 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   382
            if(acc(erase(r)))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   383
              distinctWith(rs, pruneFunction, acc)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   384
            else {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   385
              val pruned_r = pruneFunction(r, acc)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   386
              pruned_r :: 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   387
              distinctWith(rs, 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   388
                pruneFunction, 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   389
                turnIntoTerms(erase(pruned_r)) ++: acc
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   390
                )
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   391
            }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   392
        }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   393
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   394
\end{lstlisting}
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   395
\caption{A Stronger Version of $\textit{distinctBy}$. This function allows ``partial de-duplication''
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   396
of a regular expression. When part of a regular expression has appeared before in the accumulator, we 
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   397
can remove that verbose component.}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   398
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   399
\noindent
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   400
Once a regular expression has been pruned,
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   401
all its components will be added to the accumulator
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   402
to remove any future regular expressions' duplicate components.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   403
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   404
The function $\textit{bsimpStrong}$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   405
is very much the same as $\textit{bsimp}$, just with
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   406
$\textit{distinctBy}$ replaced 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   407
by $\textit{distinctWith}$.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   408
\begin{figure}[H]
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   409
\begin{lstlisting}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   410
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   411
    def bsimpStrong(r: ARexp): ARexp =
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   412
    {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   413
      r match {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   414
        case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   415
        case (AZERO, _) => AZERO
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   416
        case (_, AZERO) => AZERO
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   417
        case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   418
        case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   419
        case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   420
        }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   421
        case AALTS(bs1, rs) => {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   422
          distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   423
            case Nil => AZERO
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   424
            case s :: Nil => fuse(bs1, s)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   425
            case rs => AALTS(bs1, rs)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   426
          }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   427
        }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   428
        case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   429
        case r => r
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   430
      }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   431
    }
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   432
    def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   433
        case Nil => r
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   434
        case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   435
      }
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   436
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   437
\end{lstlisting}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   438
\caption{The function
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   439
$\textit{bsimpStrong}$: a stronger version of $\textit{bsimp}$}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   440
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   441
\noindent
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   442
The benefits of using 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   443
$\textit{prune}$ refining the finiteness bound
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   444
to a cubic bound has not been formalised yet.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   445
Therefore we choose to use Scala code rather than an Isabelle-style formal 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   446
definition like we did for $\textit{bsimp}$, as the definitions might change
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   447
to suit our proof needs.
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   448
%In the rest of the chapter we will use this convention consistently.
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   449
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   450
%The function $\textit{prune}$ is used in $\distinctWith$.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   451
%$\distinctWith$ is a stronger version of $\distinctBy$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   452
%which not only removes duplicates as $\distinctBy$ would
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   453
%do, but also uses the $\textit{pruneFunction}$
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   454
%argument to prune away verbose components in a regular expression.\\
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   455
%\begin{figure}[H]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   456
%\begin{lstlisting}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   457
%   //a stronger version of simp
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   458
%    def bsimpStrong(r: ARexp): ARexp =
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   459
%    {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   460
%      r match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   461
%        case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   462
%          //normal clauses same as simp
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   463
%        case (AZERO, _) => AZERO
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   464
%        case (_, AZERO) => AZERO
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   465
%        case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   466
%        //bs2 can be discarded
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   467
%        case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   468
%        case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   469
%        }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   470
%        case AALTS(bs1, rs) => {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   471
%          //distinctBy(flat_res, erase)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   472
%          distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   473
%            case Nil => AZERO
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   474
%            case s :: Nil => fuse(bs1, s)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   475
%            case rs => AALTS(bs1, rs)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   476
%          }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   477
%        }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   478
%        //stars that can be treated as 1
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   479
%        case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   480
%        case r => r
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   481
%      }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   482
%    }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   483
%    def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   484
%        case Nil => r
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   485
%        case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   486
%      }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   487
%\end{lstlisting}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   488
%\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   489
%\end{figure}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   490
%\noindent
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   491
%$\distinctWith$, is in turn used in $\bsimpStrong$:
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   492
%\begin{figure}[H]
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   493
%\begin{lstlisting}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   494
%      //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   495
%      def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   496
%        case Nil => r
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   497
%        case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   498
%      }
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   499
%\end{lstlisting}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   500
%\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   501
%\end{figure}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   502
%\noindent
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   503
We conjecture that the above Scala function $\bdersStrongs$,
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   504
written $\bdersStrong{\_}{\_}$ as an infix notation, 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   505
satisfies the following property:
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   506
\begin{conjecture}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   507
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   508
\end{conjecture}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   509
\noindent
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   510
The stronger version of $\blexersimp$'s
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   511
code in Scala looks like: 
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   512
\begin{figure}[H]
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   513
\begin{lstlisting}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   514
      def strongBlexer(r: Rexp, s: String) : Option[Val] = {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   515
        Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   516
      }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   517
      def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   518
        case Nil => {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   519
          if (bnullable(r)) {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   520
            mkepsBC(r)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   521
          }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   522
          else
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   523
            throw new Exception("Not matched")
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   524
        }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   525
        case c::cs => {
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   526
          strong_blex_simp(strongBsimp(bder(c, r)), cs)
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   527
        }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   528
      }
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   529
\end{lstlisting}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   530
\end{figure}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   531
\noindent
621
Chengsong
parents: 620
diff changeset
   532
We call this lexer $\blexerStrong$.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   533
This version is able to reduce the
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   534
size of the derivatives which 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   535
otherwise 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   536
triggered exponential behaviour in
621
Chengsong
parents: 620
diff changeset
   537
$\blexersimp$.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   538
Consider again the runtime for matching 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   539
$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   540
of the form $\protect\underbrace{aa..a}_{n}$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   541
They produce the folloiwng graphs ($\blexerStrong$ on the left-hand-side and
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   542
$\blexersimp$ on the right-hand-side).
621
Chengsong
parents: 620
diff changeset
   543
\begin{figure}[H]
Chengsong
parents: 620
diff changeset
   544
\centering
Chengsong
parents: 620
diff changeset
   545
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
Chengsong
parents: 620
diff changeset
   546
\begin{tikzpicture}
Chengsong
parents: 620
diff changeset
   547
\begin{axis}[
Chengsong
parents: 620
diff changeset
   548
    %xlabel={$n$},
Chengsong
parents: 620
diff changeset
   549
    myplotstyle,
Chengsong
parents: 620
diff changeset
   550
    xlabel={input length},
Chengsong
parents: 620
diff changeset
   551
    ylabel={size},
Chengsong
parents: 620
diff changeset
   552
    width = 7cm,
Chengsong
parents: 620
diff changeset
   553
    height = 5cm,
Chengsong
parents: 620
diff changeset
   554
    ]
Chengsong
parents: 620
diff changeset
   555
\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
Chengsong
parents: 620
diff changeset
   556
\end{axis}
Chengsong
parents: 620
diff changeset
   557
\end{tikzpicture}
Chengsong
parents: 620
diff changeset
   558
  &
Chengsong
parents: 620
diff changeset
   559
\begin{tikzpicture}
Chengsong
parents: 620
diff changeset
   560
\begin{axis}[
Chengsong
parents: 620
diff changeset
   561
    %xlabel={$n$},
Chengsong
parents: 620
diff changeset
   562
    myplotstyle,
Chengsong
parents: 620
diff changeset
   563
    xlabel={input length},
Chengsong
parents: 620
diff changeset
   564
    ylabel={size},
Chengsong
parents: 620
diff changeset
   565
    width = 7cm,
Chengsong
parents: 620
diff changeset
   566
    height = 5cm,
Chengsong
parents: 620
diff changeset
   567
    ]
Chengsong
parents: 620
diff changeset
   568
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
Chengsong
parents: 620
diff changeset
   569
\end{axis}
Chengsong
parents: 620
diff changeset
   570
\end{tikzpicture}\\
Chengsong
parents: 620
diff changeset
   571
\multicolumn{2}{l}{}
Chengsong
parents: 620
diff changeset
   572
\end{tabular}    
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   573
\caption{}\label{fig:aaaaaStarStar}
621
Chengsong
parents: 620
diff changeset
   574
\end{figure}
Chengsong
parents: 620
diff changeset
   575
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   576
We hope the correctness is preserved.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   577
%We would like to preserve the correctness like the one 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   578
%we had for $\blexersimp$:
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   579
The proof idea is to preserve the key lemma in chapter \ref{Bitcoded2}
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   580
such as in equation \eqref{eqn:cubicRule}.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   581
\begin{conjecture}\label{cubicConjecture}
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   582
	$\blexerStrong \;r \; s = \blexer\; r\;s$
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   583
\end{conjecture}
592
Chengsong
parents: 591
diff changeset
   584
\noindent
621
Chengsong
parents: 620
diff changeset
   585
The idea is to maintain key lemmas in
Chengsong
parents: 620
diff changeset
   586
chapter \ref{Bitcoded2} like
Chengsong
parents: 620
diff changeset
   587
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   588
with the new rewriting rule 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   589
shown in figure \eqref{eqn:cubicRule} .
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   590
621
Chengsong
parents: 620
diff changeset
   591
In the next sub-section,
592
Chengsong
parents: 591
diff changeset
   592
we will describe why we 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   593
believe a cubic size bound can be achieved with
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   594
the stronger simplification.
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   595
For this we give a short introduction to the
592
Chengsong
parents: 591
diff changeset
   596
partial derivatives,
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   597
which were invented by Antimirov \cite{Antimirov95},
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   598
and then link them with the result of the function
592
Chengsong
parents: 591
diff changeset
   599
$\bdersStrongs$.
Chengsong
parents: 591
diff changeset
   600
 
621
Chengsong
parents: 620
diff changeset
   601
\subsection{Antimirov's partial derivatives}
Chengsong
parents: 620
diff changeset
   602
Partial derivatives were first introduced by 
Chengsong
parents: 620
diff changeset
   603
Antimirov \cite{Antimirov95}.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   604
They are very similar
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   605
to Brzozowski derivatives,
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   606
but split children of alternative regular expressions into 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   607
multiple independent terms. This means the output of
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   608
partial derivatives is a 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   609
set of regular expressions, defined as follows
621
Chengsong
parents: 620
diff changeset
   610
\begin{center}  
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   611
	\begin{tabular}{lcl@{\hspace{-5mm}}l}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   612
		$\partial_x \; (r_1 \cdot r_2)$ & 
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   613
		$\dn$ & $(\partial_x \; r_1) \cdot r_2 \cup
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   614
		\partial_x \; r_2 \;$ & $ \textit{if} \; \; \nullable\; r_1$\\
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   615
				      & & $(\partial_x \; r_1)\cdot r_2 \quad\quad$ & $ 
621
Chengsong
parents: 620
diff changeset
   616
		      \textit{otherwise}$\\ 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   617
		$\partial_x \; r^*$ & $\dn$ & $(\partial_x \; r) \cdot r^*$\\
621
Chengsong
parents: 620
diff changeset
   618
		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \; 
Chengsong
parents: 620
diff changeset
   619
		\textit{then} \;
Chengsong
parents: 620
diff changeset
   620
		\{ \ONE\} \;\;\textit{else} \; \varnothing$\\
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   621
		$\partial_x(r_1+r_2)$ & $=$ & $\partial_x(r_1) \cup \partial_x(r_2)$\\
621
Chengsong
parents: 620
diff changeset
   622
		$\partial_x(\ONE)$ & $=$ & $\varnothing$\\
Chengsong
parents: 620
diff changeset
   623
		$\partial_x(\ZERO)$ & $\dn$ & $\varnothing$\\
Chengsong
parents: 620
diff changeset
   624
	\end{tabular}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   625
\end{center}
621
Chengsong
parents: 620
diff changeset
   626
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   627
The $\cdot$ in the example 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   628
$(\partial_x \; r_1) \cdot r_2 $ 
621
Chengsong
parents: 620
diff changeset
   629
is a shorthand notation for the cartesian product 
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   630
$(\partial_x \; r_1) \times \{ r_2\}$.
621
Chengsong
parents: 620
diff changeset
   631
%Each element in the set generated by a partial derivative
Chengsong
parents: 620
diff changeset
   632
%corresponds to a (potentially partial) match
Chengsong
parents: 620
diff changeset
   633
%TODO: define derivatives w.r.t string s
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   634
Rather than joining the calculated derivatives $\partial_x r_1$ and $\partial_x r_2$ together
621
Chengsong
parents: 620
diff changeset
   635
using the $\sum$ constructor, Antimirov put them into
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   636
a set.  This means many subterms will be de-duplicated
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   637
because they are sets.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   638
For example, to compute what the derivative of the regular expression 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   639
$x^*(xx + y)^*$
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   640
w.r.t. $x$ is, one can compute a partial derivative
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   641
and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   642
from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
621
Chengsong
parents: 620
diff changeset
   643
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   644
The partial derivative w.r.t. a string is defined recursively:
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   645
\[
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   646
	\partial_{c::cs} r \dn \bigcup_{r'\in (\partial_c r)}
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   647
	\partial_{cs} r'
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   648
\]
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   649
Suppose an alphabet $\Sigma$, we use $\Sigma^*$ for the set of all possible strings
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   650
from the alphabet. 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   651
The set of all possible partial derivatives is then defined
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   652
as the union of derivatives w.r.t all the strings: 
621
Chengsong
parents: 620
diff changeset
   653
\begin{center}
Chengsong
parents: 620
diff changeset
   654
	\begin{tabular}{lcl}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   655
		$\textit{PDER}_{\Sigma^*} \; r $ & $\dn $ & $\bigcup_{w \in \Sigma^*}\partial_w \; r$
621
Chengsong
parents: 620
diff changeset
   656
	\end{tabular}
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   657
\end{center}
621
Chengsong
parents: 620
diff changeset
   658
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   659
Consider now again our pathological case where we apply the more
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   660
aggressive 
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   661
simplification
621
Chengsong
parents: 620
diff changeset
   662
\begin{center}
Chengsong
parents: 620
diff changeset
   663
	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
Chengsong
parents: 620
diff changeset
   664
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   665
let use abbreviate theis regular expression with $r$,
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   666
then we have that
621
Chengsong
parents: 620
diff changeset
   667
\begin{center}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   668
$\textit{PDER}_{\Sigma^*} \; r = 
621
Chengsong
parents: 620
diff changeset
   669
\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{ 
Chengsong
parents: 620
diff changeset
   670
	(\underbrace{a \ldots a}_{\text{j a's}}\cdot
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   671
(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot r \}$, 
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   672
\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   673
The union on the right-hand-side has $n * (n + 1) / 2$ terms.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   674
This leads us to believe that the maximum number of terms needed
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   675
in our derivative is also only $n * (n + 1) / 2$.
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   676
Therefore
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   677
we conjecture that $\bsimpStrong$ is also able to achieve this
639
80cc6dc4c98b until chap 7
Chengsong
parents: 638
diff changeset
   678
upper limit in general.
621
Chengsong
parents: 620
diff changeset
   679
\begin{conjecture}\label{bsimpStrongInclusionPder}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   680
	Using a suitable transformation $f$, we have that
621
Chengsong
parents: 620
diff changeset
   681
	\begin{center}
Chengsong
parents: 620
diff changeset
   682
		$\forall s.\; f \; (r \bdersStrong  \; s) \subseteq
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   683
		 \textit{PDER}_{\Sigma^*} \; r$
621
Chengsong
parents: 620
diff changeset
   684
	\end{center}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   685
	holds.
621
Chengsong
parents: 620
diff changeset
   686
\end{conjecture}
Chengsong
parents: 620
diff changeset
   687
\noindent
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   688
The reason is that our \eqref{eqn:cubicRule} will keep only one copy of each term,
621
Chengsong
parents: 620
diff changeset
   689
where the function $\textit{prune}$ takes care of maintaining
Chengsong
parents: 620
diff changeset
   690
a set like structure similar to partial derivatives.
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   691
%We might need to adjust $\textit{prune}$
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   692
%slightly to make sure all duplicate terms are eliminated,
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   693
%which should be doable.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   694
621
Chengsong
parents: 620
diff changeset
   695
Antimirov had proven that the sum of all the partial derivative 
Chengsong
parents: 620
diff changeset
   696
terms' sizes is bounded by the cubic of the size of that regular
Chengsong
parents: 620
diff changeset
   697
expression:
Chengsong
parents: 620
diff changeset
   698
\begin{property}\label{pderBound}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   699
	$\llbracket \textit{PDER}_{\Sigma^*} \; r \rrbracket \leq O(\llbracket r \rrbracket^3)$
621
Chengsong
parents: 620
diff changeset
   700
\end{property}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   701
\noindent
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   702
This property was formalised by Wu et al. \cite{Wu2014}, and the 
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   703
details can be found in the Archive of Formal Froofs\footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}.
621
Chengsong
parents: 620
diff changeset
   704
Once conjecture \ref{bsimpStrongInclusionPder} is proven, then property \ref{pderBound}
630
d50a309a0645 with Christian
Chengsong
parents: 628
diff changeset
   705
would provide us with a cubic bound for our $\blexerStrong$ algorithm: 
621
Chengsong
parents: 620
diff changeset
   706
\begin{conjecture}\label{strongCubic}
Chengsong
parents: 620
diff changeset
   707
	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
Chengsong
parents: 620
diff changeset
   708
\end{conjecture}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   709
\noindent
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   710
We leave this as future work.
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   711
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   712
621
Chengsong
parents: 620
diff changeset
   713
%To get all the "atomic" components of a regular expression's possible derivatives,
Chengsong
parents: 620
diff changeset
   714
%there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
Chengsong
parents: 620
diff changeset
   715
%whatever character is available at the head of the string inside the language of a
Chengsong
parents: 620
diff changeset
   716
%regular expression, and gives back the character and the derivative regular expression
Chengsong
parents: 620
diff changeset
   717
%as a pair (which he called "monomial"):
Chengsong
parents: 620
diff changeset
   718
% \begin{center}  
Chengsong
parents: 620
diff changeset
   719
% \begin{tabular}{ccc}
Chengsong
parents: 620
diff changeset
   720
% $\lf(\ONE)$ & $=$ & $\phi$\\
Chengsong
parents: 620
diff changeset
   721
%$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
Chengsong
parents: 620
diff changeset
   722
% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
Chengsong
parents: 620
diff changeset
   723
% $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
Chengsong
parents: 620
diff changeset
   724
%\end{tabular}
Chengsong
parents: 620
diff changeset
   725
%\end{center}
Chengsong
parents: 620
diff changeset
   726
%%TODO: completion
Chengsong
parents: 620
diff changeset
   727
%
Chengsong
parents: 620
diff changeset
   728
%There is a slight difference in the last three clauses compared
Chengsong
parents: 620
diff changeset
   729
%with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
Chengsong
parents: 620
diff changeset
   730
%expression $r$ with every element inside $\textit{rset}$ to create a set of 
Chengsong
parents: 620
diff changeset
   731
%sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
Chengsong
parents: 620
diff changeset
   732
%on a set of monomials (which Antimirov called "linear form") and a regular 
Chengsong
parents: 620
diff changeset
   733
%expression, and returns a linear form:
Chengsong
parents: 620
diff changeset
   734
% \begin{center}  
Chengsong
parents: 620
diff changeset
   735
% \begin{tabular}{ccc}
Chengsong
parents: 620
diff changeset
   736
% $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
Chengsong
parents: 620
diff changeset
   737
% $l \bigodot (\ONE)$ & $=$ & $l$\\
Chengsong
parents: 620
diff changeset
   738
% $\phi \bigodot t$ & $=$ & $\phi$\\
Chengsong
parents: 620
diff changeset
   739
% $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
Chengsong
parents: 620
diff changeset
   740
% $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
Chengsong
parents: 620
diff changeset
   741
%  $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
Chengsong
parents: 620
diff changeset
   742
% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
Chengsong
parents: 620
diff changeset
   743
% $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
Chengsong
parents: 620
diff changeset
   744
%\end{tabular}
Chengsong
parents: 620
diff changeset
   745
%\end{center}
Chengsong
parents: 620
diff changeset
   746
%%TODO: completion
Chengsong
parents: 620
diff changeset
   747
%
Chengsong
parents: 620
diff changeset
   748
% Some degree of simplification is applied when doing $\bigodot$, for example,
Chengsong
parents: 620
diff changeset
   749
% $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
Chengsong
parents: 620
diff changeset
   750
% and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
Chengsong
parents: 620
diff changeset
   751
%  $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
Chengsong
parents: 620
diff changeset
   752
%  and so on.
Chengsong
parents: 620
diff changeset
   753
%  
Chengsong
parents: 620
diff changeset
   754
%  With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with 
Chengsong
parents: 620
diff changeset
   755
%  an iterative procedure:
Chengsong
parents: 620
diff changeset
   756
%   \begin{center}  
Chengsong
parents: 620
diff changeset
   757
% \begin{tabular}{llll}
Chengsong
parents: 620
diff changeset
   758
%$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
Chengsong
parents: 620
diff changeset
   759
% 		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
Chengsong
parents: 620
diff changeset
   760
%		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
Chengsong
parents: 620
diff changeset
   761
%$\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
Chengsong
parents: 620
diff changeset
   762
%\end{tabular}
Chengsong
parents: 620
diff changeset
   763
%\end{center}
Chengsong
parents: 620
diff changeset
   764
%
Chengsong
parents: 620
diff changeset
   765
%
Chengsong
parents: 620
diff changeset
   766
% $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   767
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   768
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   769
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   770
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   771
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   772
%	SECTION 2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   773
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   774
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   775
621
Chengsong
parents: 620
diff changeset
   776
%The closed form for them looks like:
Chengsong
parents: 620
diff changeset
   777
%%\begin{center}
Chengsong
parents: 620
diff changeset
   778
%%	\begin{tabular}{llrclll}
Chengsong
parents: 620
diff changeset
   779
%%		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
Chengsong
parents: 620
diff changeset
   780
%%		$\textit{rsimp}$ & $($ & $
Chengsong
parents: 620
diff changeset
   781
%%		\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
Chengsong
parents: 620
diff changeset
   782
%%			 & & & & $\textit{nupdates} \;$ & 
Chengsong
parents: 620
diff changeset
   783
%%			 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
Chengsong
parents: 620
diff changeset
   784
%%			 & & & & $)$ &\\
Chengsong
parents: 620
diff changeset
   785
%%			 & &  $)$ & & &\\
Chengsong
parents: 620
diff changeset
   786
%%			 & $)$ & & & &\\
Chengsong
parents: 620
diff changeset
   787
%%	\end{tabular}
Chengsong
parents: 620
diff changeset
   788
%%\end{center}
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   789
%\begin{center}
621
Chengsong
parents: 620
diff changeset
   790
%	\begin{tabular}{llrcllrllll}
Chengsong
parents: 620
diff changeset
   791
%		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
Chengsong
parents: 620
diff changeset
   792
%			      &&&&$\textit{rsimp}$ & $($ & $
Chengsong
parents: 620
diff changeset
   793
%			      \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
Chengsong
parents: 620
diff changeset
   794
%					  &&&& & & & & $\;\; \textit{nupdates} \;$ & 
Chengsong
parents: 620
diff changeset
   795
%			 		  $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
Chengsong
parents: 620
diff changeset
   796
%					  &&&& & & & & $)$ &\\
Chengsong
parents: 620
diff changeset
   797
%					  &&&& & &  $)$ & & &\\
Chengsong
parents: 620
diff changeset
   798
%					  &&&& & $)$ & & & &\\
Chengsong
parents: 620
diff changeset
   799
%	\end{tabular}
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   800
%\end{center}
621
Chengsong
parents: 620
diff changeset
   801
%The $\textit{optermsimp}$ function with the argument $r$ 
Chengsong
parents: 620
diff changeset
   802
%chooses from two options: $\ZERO$ or 
Chengsong
parents: 620
diff changeset
   803
%We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
Chengsong
parents: 620
diff changeset
   804
%and $\starupdates$:
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   805
%\begin{center}
621
Chengsong
parents: 620
diff changeset
   806
%	\begin{tabular}{lcl}
Chengsong
parents: 620
diff changeset
   807
%		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
Chengsong
parents: 620
diff changeset
   808
%		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
Chengsong
parents: 620
diff changeset
   809
%						     & & $\textit{if} \; 
Chengsong
parents: 620
diff changeset
   810
%						     (\rnullable \; (\rders \; r \; s))$ \\
Chengsong
parents: 620
diff changeset
   811
%						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
Chengsong
parents: 620
diff changeset
   812
%						     \starupdate \; c \; r \; Ss)$ \\
Chengsong
parents: 620
diff changeset
   813
%						     & & $\textit{else} \;\; (s @ [c]) :: (
Chengsong
parents: 620
diff changeset
   814
%						     \starupdate \; c \; r \; Ss)$
Chengsong
parents: 620
diff changeset
   815
%	\end{tabular}
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   816
%\end{center}
621
Chengsong
parents: 620
diff changeset
   817
%\noindent
Chengsong
parents: 620
diff changeset
   818
%As a generalisation from characters to strings,
Chengsong
parents: 620
diff changeset
   819
%$\starupdates$ takes a string instead of a character
Chengsong
parents: 620
diff changeset
   820
%as the first input argument, and is otherwise the same
Chengsong
parents: 620
diff changeset
   821
%as $\starupdate$.
Chengsong
parents: 620
diff changeset
   822
%\begin{center}
Chengsong
parents: 620
diff changeset
   823
%	\begin{tabular}{lcl}
Chengsong
parents: 620
diff changeset
   824
%		$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
Chengsong
parents: 620
diff changeset
   825
%		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
Chengsong
parents: 620
diff changeset
   826
%		\starupdate \; c \; r \; Ss)$
Chengsong
parents: 620
diff changeset
   827
%	\end{tabular}
Chengsong
parents: 620
diff changeset
   828
%\end{center}
Chengsong
parents: 620
diff changeset
   829
%\noindent
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   830
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   831
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   832
621
Chengsong
parents: 620
diff changeset
   833
%\section{Zippers}
Chengsong
parents: 620
diff changeset
   834
%Zipper is a data structure designed to operate on 
Chengsong
parents: 620
diff changeset
   835
%and navigate between local parts of a tree.
Chengsong
parents: 620
diff changeset
   836
%It was first formally described by Huet \cite{HuetZipper}.
Chengsong
parents: 620
diff changeset
   837
%Typical applications of zippers involve text editor buffers
Chengsong
parents: 620
diff changeset
   838
%and proof system databases.
Chengsong
parents: 620
diff changeset
   839
%In our setting, the idea is to compactify the representation
Chengsong
parents: 620
diff changeset
   840
%of derivatives with zippers, thereby making our algorithm faster.
Chengsong
parents: 620
diff changeset
   841
%Some initial results 
Chengsong
parents: 620
diff changeset
   842
%We first give a brief introduction to what zippers are,
Chengsong
parents: 620
diff changeset
   843
%and other works
Chengsong
parents: 620
diff changeset
   844
%that apply zippers to derivatives
Chengsong
parents: 620
diff changeset
   845
%When dealing with large trees, it would be a waste to 
Chengsong
parents: 620
diff changeset
   846
%traverse the entire tree if
Chengsong
parents: 620
diff changeset
   847
%the operation only
Chengsong
parents: 620
diff changeset
   848
%involves a small fraction of it.
Chengsong
parents: 620
diff changeset
   849
%The idea is to put the focus on that subtree, turning other parts
Chengsong
parents: 620
diff changeset
   850
%of the tree into a context
Chengsong
parents: 620
diff changeset
   851
%
Chengsong
parents: 620
diff changeset
   852
%
Chengsong
parents: 620
diff changeset
   853
%One observation about our derivative-based lexing algorithm is that
Chengsong
parents: 620
diff changeset
   854
%the derivative operation sometimes traverses the entire regular expression
Chengsong
parents: 620
diff changeset
   855
%unnecessarily:
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   856
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   857
612
Chengsong
parents: 596
diff changeset
   858
%----------------------------------------------------------------------------------------
Chengsong
parents: 596
diff changeset
   859
%	SECTION 1
Chengsong
parents: 596
diff changeset
   860
%----------------------------------------------------------------------------------------
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   861
612
Chengsong
parents: 596
diff changeset
   862
%\section{Adding Support for the Negation Construct, and its Correctness Proof}
Chengsong
parents: 596
diff changeset
   863
%We now add support for the negation regular expression:
Chengsong
parents: 596
diff changeset
   864
%\[			r ::=   \ZERO \mid  \ONE
Chengsong
parents: 596
diff changeset
   865
%			 \mid  c  
Chengsong
parents: 596
diff changeset
   866
%			 \mid  r_1 \cdot r_2
Chengsong
parents: 596
diff changeset
   867
%			 \mid  r_1 + r_2   
Chengsong
parents: 596
diff changeset
   868
%			 \mid r^*    
Chengsong
parents: 596
diff changeset
   869
%			 \mid \sim r
Chengsong
parents: 596
diff changeset
   870
%\]
Chengsong
parents: 596
diff changeset
   871
%The $\textit{nullable}$ function's clause for it would be 
Chengsong
parents: 596
diff changeset
   872
%\[
Chengsong
parents: 596
diff changeset
   873
%\textit{nullable}(~r) = \neg \nullable(r)
Chengsong
parents: 596
diff changeset
   874
%\]
Chengsong
parents: 596
diff changeset
   875
%The derivative would be
Chengsong
parents: 596
diff changeset
   876
%\[
Chengsong
parents: 596
diff changeset
   877
%~r \backslash c = ~ (r \backslash c)
Chengsong
parents: 596
diff changeset
   878
%\]
Chengsong
parents: 596
diff changeset
   879
% 
Chengsong
parents: 596
diff changeset
   880
%The most tricky part of lexing for the $~r$ regular expression
Chengsong
parents: 596
diff changeset
   881
% is creating a value for it.
Chengsong
parents: 596
diff changeset
   882
% For other regular expressions, the value aligns with the 
Chengsong
parents: 596
diff changeset
   883
% structure of the regular expression:
Chengsong
parents: 596
diff changeset
   884
% \[
Chengsong
parents: 596
diff changeset
   885
% \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
Chengsong
parents: 596
diff changeset
   886
% \]
Chengsong
parents: 596
diff changeset
   887
%But for the $~r$ regular expression, $s$ is a member of it if and only if
Chengsong
parents: 596
diff changeset
   888
%$s$ does not belong to $L(r)$. 
Chengsong
parents: 596
diff changeset
   889
%That means when there
Chengsong
parents: 596
diff changeset
   890
%is a match for the not regular expression, it is not possible to generate how the string $s$ matched
Chengsong
parents: 596
diff changeset
   891
%with $r$. 
Chengsong
parents: 596
diff changeset
   892
%What we can do is preserve the information of how $s$ was not matched by $r$,
Chengsong
parents: 596
diff changeset
   893
%and there are a number of options to do this.
Chengsong
parents: 596
diff changeset
   894
%
Chengsong
parents: 596
diff changeset
   895
%We could give a partial value when there is a partial match for the regular expression inside
Chengsong
parents: 596
diff changeset
   896
%the $\mathbf{not}$ construct.
Chengsong
parents: 596
diff changeset
   897
%For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
Chengsong
parents: 596
diff changeset
   898
%A value for it could be 
Chengsong
parents: 596
diff changeset
   899
% \[
Chengsong
parents: 596
diff changeset
   900
% \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
Chengsong
parents: 596
diff changeset
   901
% \]
Chengsong
parents: 596
diff changeset
   902
% The above example demonstrates what value to construct 
Chengsong
parents: 596
diff changeset
   903
% when the string $s$ is at most a real prefix
Chengsong
parents: 596
diff changeset
   904
% of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
Chengsong
parents: 596
diff changeset
   905
% in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
Chengsong
parents: 596
diff changeset
   906
% constructor.
Chengsong
parents: 596
diff changeset
   907
% 
Chengsong
parents: 596
diff changeset
   908
% Another option would be to either store the string $s$ that resulted in 
Chengsong
parents: 596
diff changeset
   909
% a mis-match for $r$ or a dummy value as a placeholder:
Chengsong
parents: 596
diff changeset
   910
% \[
Chengsong
parents: 596
diff changeset
   911
% \vdash \textit{Not}(abcd) : ~( r_1 )
Chengsong
parents: 596
diff changeset
   912
% \]
Chengsong
parents: 596
diff changeset
   913
%or
Chengsong
parents: 596
diff changeset
   914
% \[
Chengsong
parents: 596
diff changeset
   915
% \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
Chengsong
parents: 596
diff changeset
   916
% \] 
Chengsong
parents: 596
diff changeset
   917
% We choose to implement this as it is most straightforward:
Chengsong
parents: 596
diff changeset
   918
% \[
Chengsong
parents: 596
diff changeset
   919
% \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
Chengsong
parents: 596
diff changeset
   920
% \]
Chengsong
parents: 596
diff changeset
   921
% 
Chengsong
parents: 596
diff changeset
   922
%
620
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   923
%\begin{center}
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   924
%	\begin{tabular}{lcl}
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   925
%		$\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   926
%		cs \; r \; [\Some \; ([c], n)]$\\
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   927
%		$\ntset \; r\; 0 \; \_$ &  $\dn$ &  $\None$\\
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   928
%		$\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   929
%	\end{tabular}
ae6010c14e49 chap6 almost done
Chengsong
parents: 613
diff changeset
   930
%\end{center}
628
7af4e2420a8c ready to submit~~
Chengsong
parents: 625
diff changeset
   931