% ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
% author: Chengsong, Sun, 18 Jun 2023 17:54:52 +0100
% changeset 649 ef2b8abcbc55 (parent 640 bd1354127574, child 650 a365d1364640)
% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
%simplifications and therefore introduce our version of the bitcoded algorithm and 
%its correctness proof in 
%Chapter 3\ref{Chapter3}. 
\section{Overview}

This chapter
is the point from which the novel contributions of this PhD project are introduced
in detail; the previous
chapters provide the essential background for the formal proof we
are about to describe.
In particular, the correctness theorem 
of the un-optimised bit-coded lexer $\blexer$ from
chapter \ref{Bitcoded1}, formalised by Ausaf et al.,
relies on lemma \ref{retrieveStepwise}, which says that
any value can be retrieved in a stepwise manner:
\begin{center}	
	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
\end{center}
This no longer holds once we introduce simplifications.
To control the size of regular expressions during derivatives, 
one has to eliminate redundant sub-expressions with a
simplification procedure, which we call $\textit{bsimp}$.

Given the $\textit{bsimp}$ function,
we add it as a phase after each derivative is taken:
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
	\end{tabular}
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications 
%with derivatives:%\comment{simp in  the [] case?}
We extend this from characters to strings:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
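The following minimal Python sketch illustrates the shape of $\backslash_{bsimps}$: the string derivative is a left fold over the string that simplifies after every character. To stay short, it works on plain, un-annotated regular expressions (an assumption of this sketch), and \texttt{simp} below is a hypothetical small simplifier removing $\ZERO$ and $\ONE$ units and identical neighbouring alternatives; it is not the $\textit{bsimp}$ function of this thesis. The names \texttt{der\_simp}, \texttt{ders\_simp} and \texttt{matches} are likewise only illustrative.

```python
from dataclasses import dataclass
from typing import Union

# Plain (un-annotated) regular expressions, standing in for the
# bit-annotated ones of this chapter (an assumption for brevity).
@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class One:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Alt:
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Star:
    r: "Rexp"

Rexp = Union[Zero, One, Char, Alt, Seq, Star]

def nullable(r: Rexp) -> bool:
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Char

def der(c: str, r: Rexp) -> Rexp:
    # Brzozowski derivative of r with respect to the character c.
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def simp(r: Rexp) -> Rexp:
    # Hypothetical small simplifier, applied bottom-up:
    # 0 + r -> r,  r + 0 -> r,  r + r -> r,  0.r -> 0,  r.0 -> 0,  1.r -> r
    if isinstance(r, Alt):
        r1, r2 = simp(r.r1), simp(r.r2)
        if isinstance(r1, Zero):
            return r2
        if isinstance(r2, Zero) or r1 == r2:
            return r1
        return Alt(r1, r2)
    if isinstance(r, Seq):
        r1, r2 = simp(r.r1), simp(r.r2)
        if isinstance(r1, Zero) or isinstance(r2, Zero):
            return Zero()
        if isinstance(r1, One):
            return r2
        return Seq(r1, r2)
    return r

def der_simp(c: str, r: Rexp) -> Rexp:
    # r \backslash_{simp} c  =  simp(derivative of r w.r.t. c)
    return simp(der(c, r))

def ders_simp(s: str, r: Rexp) -> Rexp:
    # r \backslash_{simps} (c :: s) = (r \backslash_{simp} c) \backslash_{simps} s
    # r \backslash_{simps} []       = r
    for c in s:
        r = der_simp(c, r)
    return r

def matches(r: Rexp, s: str) -> bool:
    return nullable(ders_simp(s, r))
```

Since every rule of this small \texttt{simp} preserves the language, \texttt{matches} gives the same answer as matching without simplification; only the size of the intermediate regular expressions changes.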

\noindent
The lexer that extracts bitcodes from the 
derivatives simplified by $\textit{bsimp}$
is called $\blexersimp$:

\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
In $\blexer$, the redundant sub-expressions are kept after each derivative operation.
This preserves the exact structure of each intermediate result,
so that pairs of inhabitation relations of the form $\vdash v : r_{c} $ and
$\vdash v^{c} : r $ hold (where we use the abbreviations $r_{c} \dn r\backslash c$
and $v^{c} \dn \inj \;r \; c \; v$).
In particular, the $\blexer$ proof relies on this lockstep POSIX
correspondence between the lexical value and the
regular expression in each derivative and injection step.
But $\blexersimp$ applies a simplification after each derivative
to reduce redundancy, and such simplifications cause structural changes
to the regular expressions.
Therefore a new formal proof of the correctness of $\blexersimp$ is needed:
we cannot straightforwardly extend the
proof of theorem \ref{blexerCorrect}, because lemma \ref{flex_retrieve} does
not hold anymore.
The structural induction on the stepwise correctness
of $\inj$ breaks down, because the pairs of $r_i$ and $v_i$ described
in chapters \ref{Inj} and \ref{Bitcoded1} no longer correspond to
each other.
The proof details of $\blexer$ are nevertheless necessary material for this thesis,
because they explain why we need a new framework for the proof of $\blexersimp$.

In this chapter we introduce simplifications
for annotated regular expressions that can be applied to 
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions could have been more efficient and in some cases needed fixing.

%We contrast our simplification function 
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness 
%and reliability of formal proofs on algorithms.
%These ``aggressive'' simplifications would not be possible in the injection-based 
%lexing we introduced in chapter \ref{Inj}.
%We then prove the correctness with the improved version of 
%$\blexer$, called $\blexersimp$, by establishing 
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
The algorithms $\lexer$ and $\blexer$ work beautifully as functional 
programs, but not as practical code. One main reason for their slowness is
the size of the intermediate representations: the derivative regular expressions
tend to grow without bound if the matching involves a large number of possible matches.
Consider the derivatives of $(a^*a^*)^*$, where trivial $\ONE$ factors
have been removed after each derivative step:
%and $(a^* + (aa)^*)^*$:
\begin{center}
	\begin{tabular}{lcl}
		$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ & 
		$ (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
			     & 
		$ \stackrel{\backslash a}{\longrightarrow} $ & 
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
							     & $\stackrel{\backslash a}{
	\longrightarrow} $ & $\ldots$\\
	\end{tabular}
\end{center}
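The growth can be observed with a small Python sketch that repeatedly takes derivatives of $(a^*a^*)^*$ with respect to $a$ \emph{without any} simplification and records the size of each result. The plain (un-annotated) representation and the size measure, which counts one unit per constructor, are assumptions made for this illustration; \texttt{derivative\_sizes} is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class One:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Alt:
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Star:
    r: "Rexp"

Rexp = Union[Zero, One, Char, Alt, Seq, Star]

def nullable(r: Rexp) -> bool:
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Char

def der(c: str, r: Rexp) -> Rexp:
    # Brzozowski derivative, no simplification at all.
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def size(r: Rexp) -> int:
    # Number of constructor nodes in r.
    if isinstance(r, (Alt, Seq)):
        return 1 + size(r.r1) + size(r.r2)
    if isinstance(r, Star):
        return 1 + size(r.r)
    return 1

def derivative_sizes(r: Rexp, s: str) -> List[int]:
    # Sizes of r, r\c1, (r\c1)\c2, ... for s = c1 c2 ...
    sizes = [size(r)]
    for c in s:
        r = der(c, r)
        sizes.append(size(r))
    return sizes

a_star = Star(Char("a"))
example = Star(Seq(a_star, a_star))  # (a*a*)*

if __name__ == "__main__":
    for n, sz in enumerate(derivative_sizes(example, "a" * 10)):
        print(f"n = {n:2d}: size {sz}")
```

One can show by induction that a derivative is never smaller than the regular expression it is taken of under this measure, so the printed sizes never decrease; for $(a^*a^*)^*$ they roughly triple with each additional $a$.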
\noindent
As can be seen, there are several duplications.
A simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivative
regular expressions, namely
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
because the duplicates are
not next to each other, and therefore the rule
$r+ r \rightarrow r$ from $\textit{simp}$ does not fire.
One would expect a better simplification function to work in the 
following way:
\begin{gather*}
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}\\
	\bigg\downarrow (1) \\
	(a^*a^* + a^* 
	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
	\bigg\downarrow (2) \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
	\bigg\downarrow (3) \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
\end{gather*}
\noindent
In the first step, the nested alternative regular expression
$(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
Now the third term $a^*$ can clearly be identified as a duplicate
and therefore removed in the second step. 
This causes the two
top-level terms to become identical, so that the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$ 
can be removed in the final step.
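The three steps can be sketched in Python on plain regular expressions with $n$-ary alternatives: children are simplified first, then nested alternatives are spliced into their parent (step (1)) and only the first occurrence of each alternative is kept (steps (2) and (3)). The names \texttt{Alts}, \texttt{flatten}, \texttt{dedup} and \texttt{simp} are hypothetical; since this sketch has no bit-annotations, syntactically equal terms are genuine duplicates.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Seq:
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Star:
    r: "Rexp"

@dataclass(frozen=True)
class Alts:
    rs: Tuple["Rexp", ...]  # n-ary alternative r_1 + ... + r_n

Rexp = Union[Char, Seq, Star, Alts]

def flatten(rs: List[Rexp]) -> List[Rexp]:
    # Step (1): splice the children of nested alternatives into the parent.
    # Children were simplified first, so one level of splicing suffices.
    out: List[Rexp] = []
    for r in rs:
        if isinstance(r, Alts):
            out.extend(r.rs)
        else:
            out.append(r)
    return out

def dedup(rs: List[Rexp]) -> List[Rexp]:
    # Steps (2) and (3): keep only the first occurrence of each term.
    seen = set()
    out: List[Rexp] = []
    for r in rs:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def simp(r: Rexp) -> Rexp:
    # Bottom-up: simplify children, then flatten and deduplicate alternatives.
    if isinstance(r, Seq):
        return Seq(simp(r.r1), simp(r.r2))
    if isinstance(r, Star):
        return Star(simp(r.r))
    if isinstance(r, Alts):
        rs = dedup(flatten([simp(x) for x in r.rs]))
        return rs[0] if len(rs) == 1 else Alts(tuple(rs))
    return r

a_star = Star(Char("a"))
aa = Seq(a_star, a_star)  # a*a*
tail = Star(aa)           # (a*a*)*
# ((a*a* + a*) + a*).(a*a*)* + (a*a* + a*).(a*a*)*
third = Alts((
    Seq(Alts((Alts((aa, a_star)), a_star)), tail),
    Seq(Alts((aa, a_star)), tail),
))
```

Calling \texttt{simp(third)} yields the representation of $(a^*a^* + a^*)\cdot(a^*a^*)^*$, matching the result of step (3).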
Sulzmann and Lu's simplification function, written $\textit{simp}\_{SL}$ in our notation,
is intended to achieve this kind of simplification:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{simp}\_{SL} \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
		$\textit{simp}\_{SL} \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
		\textit{then} \;\; \ZERO$\\
							    & & $\textit{else}\;\;_{bs}((\textit{simp}\_{SL} \;r_1)\cdot
							    (\textit{simp}\_{SL} \; r_2))$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\textit{simp}\_{SL}  \; r)$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
		(\nub \; (\filter \; (\neg\zeroable)\;((\textit{simp}\_{SL}  \; r) :: \map \; \textit{simp}\_{SL}  \; rs)))$
	\end{tabular}
\end{center}
\noindent
The $\textit{zeroable}$ predicate 
tests whether the regular expression
is equivalent to $\ZERO$, that is, whether it matches no string at all. It
can be defined as:
\begin{center}
	\begin{tabular}{lcl}
		$\zeroable \; _{bs}\sum []$ & $\dn$ & $\textit{true}$\\
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
		\zeroable \;_{[]}\sum\;rs $\\
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
	\end{tabular}
\end{center}
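For experimentation, $\zeroable$ can be transcribed into Python over a hypothetical encoding of annotated regular expressions as frozen dataclasses, with bitcodes represented as tuples of the strings \texttt{Z} and \texttt{S} (an assumption of this sketch). As the recursive clause for alternatives requires, the empty alternative $_{bs}\sum[]$ is treated as zeroable.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Bits = Tuple[str, ...]  # hypothetical bitcode encoding, e.g. ("Z", "S")

@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AOne:
    bs: Bits

@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple["ARexp", ...]

@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: "ARexp"
    r2: "ARexp"

@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: "ARexp"

ARexp = Union[AZero, AOne, AChar, AAlts, ASeq, AStar]

def zeroable(r: ARexp) -> bool:
    # True iff r matches no string; bit-annotations play no role.
    if isinstance(r, AZero):
        return True
    if isinstance(r, AAlts):
        # all() of an empty sequence is True: the empty alternative is zeroable.
        return all(zeroable(x) for x in r.rs)
    if isinstance(r, ASeq):
        return zeroable(r.r1) or zeroable(r.r2)
    return False  # AOne, AChar, AStar
```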
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   227
\noindent
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   228
The 
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   229
\begin{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   230
	\begin{tabular}{lcl}
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   231
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   232
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   233
	\end{tabular}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   234
\end{center}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   235
\noindent
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   236
clause does flatten the alternative as required in step (1),
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   237
but $\textit{simp}\_{SL}$ is insufficient if we want to do steps (2) and (3),
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   238
as these ``identical'' terms have different bit-annotations.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   239
They also suggested that the $\textit{simp}\_{SL} $ function should be
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   240
applied repeatedly until a fixpoint is reached.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   241
We call this construction $\textit{SLSimp}$:
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   242
\begin{center}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   243
	\begin{tabular}{lcl}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   244
		$\textit{SLSimp} \; r$ & $\dn$ & 
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   245
		$\textit{while}((\textit{simp}\_{SL}  \; r)\; \cancel{=} \; r)$ \\
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   246
					 & & $\quad r := \textit{simp}\_{SL}  \; r$\\
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   247
		& & $\textit{return} \; r$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   248
	\end{tabular}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   249
\end{center}
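The following Python sketch transcribes $\textit{simp}\_{SL}$ and $\textit{SLSimp}$ clause by clause over a hypothetical encoding of annotated regular expressions (frozen dataclasses, bitcodes as tuples of the strings \texttt{Z} and \texttt{S}). The order of the \texttt{if}-tests mirrors the order of the clauses, and we assume that any regular expression not covered by a clause is returned unchanged. The example uses made-up bit-annotations to mimic step (2): flattening succeeds, but the two copies of $a^*$ carry different bits, so \texttt{nub} keeps both.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple, Union

Bits = Tuple[str, ...]  # hypothetical bitcode encoding

@dataclass(frozen=True)
class AZero:
    pass
@dataclass(frozen=True)
class AOne:
    bs: Bits
@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str
@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple["ARexp", ...]
@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: "ARexp"
    r2: "ARexp"
@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: "ARexp"

ARexp = Union[AZero, AOne, AChar, AAlts, ASeq, AStar]

def zeroable(r: ARexp) -> bool:
    if isinstance(r, AZero):
        return True
    if isinstance(r, AAlts):
        return all(zeroable(x) for x in r.rs)
    if isinstance(r, ASeq):
        return zeroable(r.r1) or zeroable(r.r2)
    return False

def fuse(bs: Bits, r: ARexp) -> ARexp:
    # Prepend bs to the top-level annotation of r (ZERO carries none).
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def nub(rs: List[ARexp]) -> List[ARexp]:
    # Drop later copies of syntactically equal terms, bits included.
    out: List[ARexp] = []
    for r in rs:
        if r not in out:
            out.append(r)
    return out

def simp_sl(r: ARexp) -> ARexp:
    if isinstance(r, ASeq):
        if isinstance(r.r1, AOne):  # _bs(_bs' ONE . r2)
            return AZero() if zeroable(r.r2) else fuse(r.bs + r.r1.bs, r.r2)
        if zeroable(r.r1) or zeroable(r.r2):
            return AZero()
        return ASeq(r.bs, simp_sl(r.r1), simp_sl(r.r2))
    if isinstance(r, AAlts):
        if not r.rs:  # sum []
            return AZero()
        head, rest = r.rs[0], r.rs[1:]
        if isinstance(head, AAlts):  # flatten a nested head alternative
            return AAlts(r.bs, tuple(fuse(head.bs, x) for x in head.rs) + rest)
        if not rest:  # sum [r]
            return fuse(r.bs, simp_sl(head))
        simplified = [simp_sl(head)] + [simp_sl(x) for x in rest]
        kept = [x for x in simplified if not zeroable(x)]
        return AAlts(r.bs, tuple(nub(kept)))
    return r

def sl_simp(r: ARexp) -> ARexp:
    # SLSimp: apply simp_sl until a fixpoint is reached (assumed to exist).
    while True:
        nxt = simp_sl(r)
        if nxt == r:
            return r
        r = nxt

a_star = AStar((), AChar((), "a"))
aa = ASeq((), a_star, a_star)
# (a*a* + a*) + a*, where the two copies of a* carry different (made-up) bits
example = AAlts((), (AAlts((), (aa, fuse(("Z",), a_star))), fuse(("S",), a_star)))
result = sl_simp(example)
```

Here \texttt{result} is the flattened alternative $a^*a^* + a^* + a^*$ with all three terms kept. Erasing the bits would make the last two terms equal, which suggests that steps (2) and (3) require comparing terms with their bit-annotations erased.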
We call the operation of alternately 
applying derivatives and simplifications
(until the string is exhausted) Sulz-simp-derivative,
written $\backslash_{SLSimp}$:
\begin{center}
\begin{tabular}{lcl}
	$r \backslash_{SLSimp} (c\!::\!s) $ & $\dn$ & $(\textit{SLSimp} \; (r \backslash c)) \backslash_{SLSimp}\, s$ \\
$r \backslash_{SLSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as in $\blexer$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_SLSimp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{SLSimp}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala 
and found that the size of the final derivative regular expression
still grows exponentially (note the logarithmic scale):
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={size},
    ymode = log,
    legend entries={Final Derivative Size},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture} 
\caption{Size of the final derivative when lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n = 20$ we already get an out-of-memory error with Scala's default
JVM heap size settings.
In fact, their simplification does not improve much over
the simple-minded simplifications we have shown in Figure~\ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={time},
    %ymode = log,
    legend entries={time in secs},
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture}
\caption{Time taken to lex the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
which appears to be a counterexample to
Sulzmann and Lu's linear complexity claim
in their paper \cite{Sulzmann2014}:
\begin{quote}\it
``Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input.''
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
stays below a finite constant does not hold, at least not in the
examples we considered.
There are two main reasons for this: (i) Haskell's $\textit{nub}$
function requires two annotated regular expressions to have
identical annotations in order to qualify as duplicates,
and therefore cannot simplify cases like $_{SZZ}a^*+_{SZS}a^*$,
even though both are the same regular expression $a^*$ once the annotations are erased; and
(ii) the ``flattening'' only applies to the head of the list
in the
\begin{center}
	\begin{tabular}{lcl}
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
	\end{tabular}
\end{center}
\noindent
clause, and is therefore not strong enough to simplify all
necessary parts of the regular expression. Moreover,
the $\textit{simp}\_{SL}$ function is applied repeatedly
in each derivative step until a fixed point is reached,
which makes the algorithm even more
unpredictable and inefficient.
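
Both weaknesses can be reproduced in a small toy model. The following Python sketch is our own illustration, not Sulzmann and Lu's Haskell code: the classes \texttt{AChar}, \texttt{AStar}, \texttt{AAlts} and the functions \texttt{fuse}, \texttt{nub} and \texttt{flatten\_head\_only} are hypothetical names, and bitcodes are represented as tuples of the strings \texttt{Z} and \texttt{S}. Here \texttt{nub} treats two elements as duplicates only if they agree on their bitcodes, and \texttt{flatten\_head\_only} mirrors the $\textit{simp}\_{SL}$ clause above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Bits = Tuple[str, ...]

# Toy encoding (an assumption for illustration): each node carries its bitcodes.
@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: object

@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple[object, ...]

def fuse(bs: Bits, r):
    """Prepend bs to the top-level bitcodes of r."""
    return replace(r, bs=bs + r.bs)

def nub(rs: List[object]) -> List[object]:
    """Equality-based de-duplication in the spirit of Haskell's nub:
    two elements are duplicates only if they are equal *including* bitcodes."""
    out: List[object] = []
    for r in rs:
        if r not in out:
            out.append(r)
    return out

def flatten_head_only(rs: List[object]) -> List[object]:
    """Spill out only a leading alternative, as in the simp_SL clause;
    nested alternatives further down the list stay nested."""
    if rs and isinstance(rs[0], AAlts):
        return [fuse(rs[0].bs, r) for r in rs[0].rs] + rs[1:]
    return list(rs)
```

With these definitions, \texttt{nub} keeps both $_{SZZ}a^*$ and $_{SZS}a^*$, and a nested alternative that is not at the head of the list passes through \texttt{flatten\_head\_only} unchanged.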
%To not get ``caught off guard'' by
%these counterexamples,
%one needs to be more careful when designing the
%simplification function and making claims about them.

\section{Our $\textit{Simp}$ Function}
We now introduce our own simplification function.
%by making a contrast with $\textit{simp}\_{SL}$.
Along the way, we describe
the ideas behind Sulzmann and Lu's $\textit{simp}\_{SL}$
algorithm
and why it fails to keep the sizes of derivatives finitely bounded.
In addition, our simplification function comes with a formal
correctness proof.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
	$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
is that it allows
duplicates to be removed across different
``levels'' of alternatives.
For example, it enables the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
The problem is that only the head element
is ``spilled out''.
It is preferable
to flatten
the entire list, which opens up possibilities for further simplification
with later regular expressions.
Not flattening the remaining elements also means that
the subsequent de-duplication step
cannot remove all duplicates.
For example,
using $\textit{simp}\_{SL}$ we cannot
simplify
\begin{center}
	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
because the underlined part is not the head
of the alternative.

We define our flattening operation so that it flattens
the entire list:
 \begin{center}
  \begin{tabular}{@{}lcl@{}}
  $\textit{flts} \; []$ & $\dn$ & $[]$ \\
  $\textit{flts} \; ((_{bs}\sum \textit{as}) :: \textit{as'})$ & $\dn$ & $(\textit{map} \;
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
  $\textit{flts} \; (\ZERO :: as')$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
    $\textit{flts} \; (a :: as')$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
\end{tabular}
\end{center}
\noindent
Our $\flts$ operation
also throws away $\ZERO$s,
as they do not contribute to a lexing result.
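
The $\flts$ clauses can be transcribed into a short Python sketch. It is an illustration under our own toy encoding (the classes \texttt{AZero}, \texttt{AChar}, \texttt{AAlts} and the helper \texttt{fuse} are hypothetical stand-ins for the annotated constructors and $\textit{fuse}$), and it uses a loop instead of recursion. Like the definition, it flattens only one level: the children of a spliced-in alternative are not flattened again.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Bits = Tuple[str, ...]

# Toy encoding (an assumption for illustration): each node carries its bitcodes.
@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple[object, ...]

def fuse(bs: Bits, r):
    """Prepend bs to the top-level bitcodes of r; ZERO carries no bits."""
    if isinstance(r, AZero):
        return r
    return replace(r, bs=bs + r.bs)

def flts(rs: List[object]) -> List[object]:
    """Splice the children of every alternative in the list (not just the
    head) into the list, fusing the alternative's bits into each child,
    and drop ZEROs occurring directly in the list."""
    out: List[object] = []
    for r in rs:
        if isinstance(r, AAlts):
            out.extend(fuse(r.bs, a) for a in r.rs)
        elif not isinstance(r, AZero):
            out.append(r)
    return out
```

For the example $(a+r)+r$ above, with $r$ played by \texttt{b}, \texttt{flts} turns the list \texttt{[AAlts((), (a, b)), b]} into \texttt{[a, b, b]}, after which de-duplication can remove the second \texttt{b}.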
\subsection{Duplicate Removal}
After flattening, we can remove duplicates.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's simplification method.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow}
rs_{flat}
\xrightarrow{\distinctBy \;
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \;
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
where the $\distinctBy$ function is defined as:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\distinctBy \; [] \; f\; acc $ & $\dn$ & $ []$\\
		$\distinctBy \; (x :: xs) \; f \; acc$ & $\dn$ & $\quad \textit{if}\; (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
						       & & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
	\end{tabular}
\end{center}
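
The definition of $\distinctBy$ translates almost directly into Python. The sketch below is iterative rather than recursive and keeps the accumulator $acc$ as a Python set of images already seen. The name \texttt{distinct\_by} and the requirement that $f$ return hashable values are assumptions of this sketch.

```python
from typing import Callable, Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T")

def distinct_by(xs: Iterable[T], f: Callable[[T], Hashable],
                acc: Set[Hashable]) -> List[T]:
    """Keep the first element of each f-equivalence class, in list order.
    acc holds the images f(x) already seen; it is copied, not mutated."""
    seen = set(acc)
    out: List[T] = []
    for x in xs:
        key = f(x)
        if key in seen:
            continue          # an earlier element has the same image: drop x
        seen.add(key)         # corresponds to {f x} union acc
        out.append(x)
    return out
```

For instance, \texttt{distinct\_by([3, 1, 4, 1, 5, 9, 2, 6], lambda x: x \% 3, set())} keeps only the first element of each residue class, namely \texttt{[3, 1, 5]}.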
\noindent
We parameterise the distinct function by a mapping $f$ because
we want to eliminate regular expressions that are syntactically the same
but have different bitcodes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match
(and therefore is definitely not POSIX),
and would be discarded by $\bmkeps$ later anyway.
\begin{center}
	$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; time\quad}\underbrace{a^*}_{Z: \;match\; 1 \;time} +
	_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
	$
\end{center}
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard
are always placed further to the left. When we
traverse the list from left to right,
any regular expression whose bitcode-erased form we have already seen
cannot contribute to a POSIX value,
so such duplicates can be removed.
To achieve this, we use $\rerases$ as the function $f$ during the de-duplication
operation. The function
$\rerases$ is very similar to $\erase$, except that it preserves the
$n$-ary structure of alternatives,
whereas $\erase$ would turn them back into nested binary alternatives.
Not having to deal with this change of structure
greatly simplifies the finiteness proof in Chapter~\ref{Finite}.
We give the definition of $\rerases$ here together with
the new datatype used by $\rerases$ (as our plain
regular expression datatype does not allow non-binary alternatives).
For now we can think of
$\rerases$ as the function $(\_)_\downarrow$ defined in Chapter~\ref{Bitcoded1}
and $\rrexp$ as plain regular expressions, but with a general list constructor
for alternatives:
\begin{figure}[H]
\begin{center}
	$\rrexp ::=   \RZERO \mid  \RONE
			 \mid  \RCHAR{c}
			 \mid  \RSEQ{r_1}{r_2}
			 \mid  \RALTS{rs}
			 \mid \RSTAR{r}        $
\end{center}
\caption{$\rrexp$: plain regular expressions, but with a $\sum$ alternative
constructor}\label{rrexpDef}
\end{figure}
We define the function $\rerases$ as follows:
\begin{center}
\begin{tabular}{lcl}
$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
\end{tabular}
\end{center}
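
The following Python sketch illustrates $\rerases$ and the de-duplication of the $a^*a^*$ example above. It assumes a toy encoding of our own: annotated regular expressions are frozen dataclasses carrying a tuple of bits, and plain $\rrexp$ values are nested tuples such as \texttt{("SEQ", r1, r2)}, so that they are hashable and can be collected in a set. The names \texttt{rerase} and \texttt{dedup} are hypothetical, and for brevity the bits inside the two sequences are left empty.

```python
from dataclasses import dataclass
from typing import List, Tuple

Bits = Tuple[str, ...]

# Toy annotated regular expressions (an assumption for illustration).
@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AOne:
    bs: Bits

@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: object
    r2: object

@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple[object, ...]

@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: object

def rerase(r) -> tuple:
    """Drop all bitcodes but keep n-ary alternatives n-ary.
    Plain rrexp values are encoded as nested (hashable) tuples."""
    if isinstance(r, AZero):
        return ("ZERO",)
    if isinstance(r, AOne):
        return ("ONE",)
    if isinstance(r, AChar):
        return ("CHAR", r.c)
    if isinstance(r, ASeq):
        return ("SEQ", rerase(r.r1), rerase(r.r2))
    if isinstance(r, AAlts):
        return ("ALTS", tuple(rerase(a) for a in r.rs))
    if isinstance(r, AStar):
        return ("STAR", rerase(r.r))
    raise TypeError(f"not an annotated regular expression: {r!r}")

def dedup(rs: List[object]) -> List[object]:
    """distinctBy rs rerase {}: keep the leftmost element of each erased form."""
    seen, out = set(), []
    for r in rs:
        key = rerase(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```

Applied to the list containing $_{ZSZ}a^*a^*$ followed by $_{SZZ}a^*a^*$, \texttt{dedup} keeps only the first, leftmost element, as required for POSIX.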
\subsection{Putting Things Together}
We can now give the definition of our simplification function:
%that looks somewhat similar to our Scala code is
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flts} \; ( \textit{map} \; \textit{bsimp} \; as)) \; \rerases \; \varnothing) $ \\
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
\end{tabular}
\end{center}

\noindent
The simplification (named $\textit{bsimp}$ for \emph{b}it-coded)
performs pattern matching on the regular expression.
When it detects that the regular expression is an alternative or a
sequence, it recursively simplifies the child regular expressions
and then checks whether one of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
For sequences, simplifications at the current level are handled by the function $\textit{bsimp}_{ASEQ}$,
using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \; \textit{match}$\\
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \;  (_{bs_1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$
	\end{tabular}
\end{center}
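
A minimal Python sketch of $\textit{bsimp}_{ASEQ}$ follows, assuming the same kind of toy encoding as before: hypothetical frozen dataclasses with a tuple of bits, and a helper \texttt{fuse} that prepends bits to the top-level annotation. The cases are tried in the order of the definition above, and both arguments are assumed to be already simplified.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Bits = Tuple[str, ...]

# Toy annotated regular expressions (an assumption for illustration).
@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AOne:
    bs: Bits

@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: object
    r2: object

def fuse(bs: Bits, r):
    """Prepend bs to the top-level bitcodes of r; ZERO carries no bits."""
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def bsimp_aseq(bs: Bits, a1, a2):
    """Current-level simplification of the sequence _{bs} a1 . a2."""
    if isinstance(a1, AZero) or isinstance(a2, AZero):
        return AZero()                 # ZERO . r -> ZERO and r . ZERO -> ZERO
    if isinstance(a1, AOne):
        return fuse(bs + a1.bs, a2)    # ONE . r -> r, keeping the bits of ONE
    return ASeq(bs, a1, a2)            # nothing to simplify at this level
```

The bits of the removed $\ONE$ are not lost: they are fused, after $bs$, into the remaining regular expression, so that decoding still reconstructs the correct value.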
\noindent
The most involved part is the $\sum$ clause, where we first call $\flts$ on
the list of simplified children $\textit{map}\; \textit{bsimp}\; \textit{as}$,
and then call $\distinctBy$ on the result. The predicate used in $\distinctBy$ for determining whether two
elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
Finally, depending on whether the resulting list $as'$ has become a
singleton or the empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{ALTS}$
decides whether to keep the current-level constructor $\sum$: it
removes $\sum$ when there are fewer than two elements:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{bsimp}_{ALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse} \; bs \; a$ \\
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\
	\end{tabular}
\end{center}
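
All pieces can be combined into one self-contained Python sketch of $\textit{bsimp}$. This is our own illustration under a toy encoding (hypothetical frozen dataclasses with a tuple of bits, and plain $\rrexp$ values as hashable nested tuples), not the Isabelle formalisation or the Scala implementation. As in the definition, $\ZERO$, $\ONE$, characters and stars are returned unchanged.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Bits = Tuple[str, ...]

# Toy annotated regular expressions (an assumption for illustration).
@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AOne:
    bs: Bits

@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str

@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: object
    r2: object

@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: Tuple[object, ...]

@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: object

def fuse(bs: Bits, r):
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def rerase(r) -> tuple:
    # bitcodes dropped, n-ary alternatives kept; tuples so results are hashable
    if isinstance(r, AZero):
        return ("ZERO",)
    if isinstance(r, AOne):
        return ("ONE",)
    if isinstance(r, AChar):
        return ("CHAR", r.c)
    if isinstance(r, ASeq):
        return ("SEQ", rerase(r.r1), rerase(r.r2))
    if isinstance(r, AAlts):
        return ("ALTS", tuple(rerase(a) for a in r.rs))
    return ("STAR", rerase(r.r))

def flts(rs: List[object]) -> List[object]:
    out: List[object] = []
    for r in rs:
        if isinstance(r, AAlts):
            out.extend(fuse(r.bs, a) for a in r.rs)   # splice children in
        elif not isinstance(r, AZero):                # drop ZEROs
            out.append(r)
    return out

def distinct_by(xs, f, acc) -> list:
    seen, out = set(acc), []
    for x in xs:
        key = f(x)
        if key not in seen:
            seen.add(key)
            out.append(x)
    return out

def bsimp_aseq(bs: Bits, a1, a2):
    if isinstance(a1, AZero) or isinstance(a2, AZero):
        return AZero()
    if isinstance(a1, AOne):
        return fuse(bs + a1.bs, a2)
    return ASeq(bs, a1, a2)

def bsimp_alts(bs: Bits, rs: List[object]):
    if not rs:
        return AZero()                 # no alternatives left
    if len(rs) == 1:
        return fuse(bs, rs[0])         # drop the sum constructor
    return AAlts(bs, tuple(rs))

def bsimp(r):
    if isinstance(r, ASeq):
        return bsimp_aseq(r.bs, bsimp(r.r1), bsimp(r.r2))
    if isinstance(r, AAlts):
        children = flts([bsimp(a) for a in r.rs])
        return bsimp_alts(r.bs, distinct_by(children, rerase, set()))
    return r                           # ZERO, ONE, characters and stars unchanged
```

Applied to $((a^* a^*)+(a^* + a^*))\cdot (a^*a^*)^*+((a^*a^*)+a^*)\cdot (a^*a^*)^*$ from the flattening example, with bits $Z$ and $S$ on the two sequences, this sketch returns the single sequence $_{Z}((a^*a^*)+a^*)\cdot (a^*a^*)^*$: the inner $a^* + a^*$ collapses to $a^*$, after which the two sequences have the same erased form and only the leftmost is kept.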
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   547
Having defined the $\textit{bsimp}$ function,
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   548
we add it as a phase after a derivative is taken.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   549
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   550
	\begin{tabular}{lcl}
649
Chengsong
parents: 640
diff changeset
   551
		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   552
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   553
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications 
%with derivatives:%\comment{simp in  the [] case?}
We extend this from characters to strings:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
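The two equations above nest one simplification after every single-character derivative, which amounts to a left fold over the input string. The Python sketch below only illustrates this recursion scheme. The derivative and simplification functions are passed in as parameters (a hypothetical interface of our own), and the symbolic stand-ins \texttt{show\_der} and \texttt{show\_simp} merely build strings so that the nesting becomes visible.

```python
from functools import reduce


def ders_simp_rec(r, s, der, simp):
    """Direct transcription of the two defining equations of r \\bsimps s."""
    if not s:                                   # r \bsimps []  =  r
        return r
    return ders_simp_rec(simp(der(r, s[0])), s[1:], der, simp)  # (r \bsimp c) \bsimps s


def ders_simp(r, s, der, simp):
    """The same function written as a left fold over the characters of s."""
    return reduce(lambda acc, c: simp(der(acc, c)), s, r)


# Symbolic stand-ins that only record how the calls are nested.
def show_der(r, c):
    return "(" + r + "\\" + c + ")"


def show_simp(r):
    return "bsimp" + r
```

For instance, \texttt{ders\_simp("r", "ab", show\_der, show\_simp)} yields the string for $\textit{bsimp}(\textit{bsimp}(r\backslash a)\backslash b)$. Each simplification is therefore applied before the next character is processed, not once at the end.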

\noindent
The lexer that extracts bitcodes from the 
derivatives simplified by $\textit{bsimp}$
is called $\blexersimp$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
This algorithm keeps the size of the regular expressions small, 
as we demonstrate with some examples in the next subsection.
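The Python sketch below puts the pieces together under our own assumptions. Plain and annotated regular expressions are encoded as tuples, bits 0 and 1 stand for $Z$ and $S$, and \texttt{intern}, \texttt{bder}, \texttt{bnullable} and \texttt{bmkeps} follow the definitions of $\blexer$ given earlier, as we understand them. To stay short, the hypothetical function \texttt{blexer\_bits} stops before $\textit{decode}$ and returns the bitcodes (or \texttt{None}). With the default identity as \texttt{simp} it sketches $\blexer$; passing \texttt{bsimp} sketches $\blexersimp$. The helper \texttt{erase} is again a simplified stand-in for erasure.

```python
# Plain:     ('ZERO',) ('ONE',) ('CHAR', c) ('ALT', r1, r2) ('SEQ', r1, r2) ('STAR', r)
# Annotated: ('ZERO',) ('ONE', bs) ('CHAR', bs, c) ('ALTS', bs, rs) ('SEQ', bs, r1, r2)
#            ('STAR', bs, r), where bs is a tuple of bits (0 = Z, 1 = S), rs a tuple

def fuse(bs, r):
    return r if r[0] == 'ZERO' else (r[0], bs + r[1]) + r[2:]

def intern(r):
    t = r[0]
    if t == 'ZERO':
        return r
    if t == 'ONE':
        return ('ONE', ())
    if t == 'CHAR':
        return ('CHAR', (), r[1])
    if t == 'ALT':
        return ('ALTS', (), (fuse((0,), intern(r[1])), fuse((1,), intern(r[2]))))
    if t == 'SEQ':
        return ('SEQ', (), intern(r[1]), intern(r[2]))
    return ('STAR', (), intern(r[1]))

def bnullable(r):
    if r[0] == 'ALTS':
        return any(bnullable(x) for x in r[2])
    if r[0] == 'SEQ':
        return bnullable(r[2]) and bnullable(r[3])
    return r[0] in ('ONE', 'STAR')

def bmkeps(r):  # only called on nullable r
    if r[0] == 'ONE':
        return r[1]
    if r[0] == 'ALTS':  # bits of the first nullable alternative
        return r[1] + bmkeps(next(x for x in r[2] if bnullable(x)))
    if r[0] == 'SEQ':
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    return r[1] + (1,)  # STAR: stop iterating

def bder(c, r):
    t = r[0]
    if t in ('ZERO', 'ONE'):
        return ('ZERO',)
    if t == 'CHAR':
        return ('ONE', r[1]) if r[2] == c else ('ZERO',)
    if t == 'ALTS':
        return ('ALTS', r[1], tuple(bder(c, x) for x in r[2]))
    if t == 'SEQ':
        bs, r1, r2 = r[1:]
        d1 = bder(c, r1)
        if bnullable(r1):
            return ('ALTS', bs, (('SEQ', (), d1, r2), fuse(bmkeps(r1), bder(c, r2))))
        return ('SEQ', bs, d1, r2)
    return ('SEQ', r[1], fuse((0,), bder(c, r[2])), ('STAR', (), r[2]))  # STAR

def erase(r):  # bit-free key used to detect duplicate alternatives
    if r[0] in ('ZERO', 'ONE', 'CHAR'):
        return (r[0],) + r[2:]
    kids = r[2] if r[0] == 'ALTS' else r[2:]
    return (r[0],) + tuple(erase(x) for x in kids)

def bsimp(r):
    if r[0] == 'SEQ':
        bs, r1, r2 = r[1], bsimp(r[2]), bsimp(r[3])
        if r1[0] == 'ZERO' or r2[0] == 'ZERO':
            return ('ZERO',)
        return fuse(bs + r1[1], r2) if r1[0] == 'ONE' else ('SEQ', bs, r1, r2)
    if r[0] == 'ALTS':
        out, seen = [], set()
        for x in map(bsimp, r[2]):
            if x[0] == 'ZERO':
                continue  # drop ZEROs
            parts = [fuse(x[1], y) for y in x[2]] if x[0] == 'ALTS' else [x]
            for y in parts:  # splice nested alternatives; keep first copies only
                if erase(y) not in seen:
                    seen.add(erase(y))
                    out.append(y)
        if not out:
            return ('ZERO',)
        return fuse(r[1], out[0]) if len(out) == 1 else ('ALTS', r[1], tuple(out))
    return r

def size(r):  # one node per constructor
    kids = r[2] if r[0] == 'ALTS' else r[2:] if r[0] in ('SEQ', 'STAR') else ()
    return 1 + sum(size(x) for x in kids)

def blexer_bits(r, s, simp=lambda a: a):
    """Bitcodes of the lexing result, or None; decoding is omitted."""
    a = intern(r)
    for c in s:
        a = simp(bder(c, a))
    return bmkeps(a) if bnullable(a) else None
```

On $(a^*\cdot a^*)^*$ and short strings $a^n$, both variants of the sketch return the same bitcodes, as the correctness result of this chapter predicts for the real definitions. With the size measure used in the sketch (one node per constructor), the simplified derivatives of $(a^*\cdot a^*)^*$ have exactly 15 nodes for every $n \geq 1$.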

\subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
After Simplification}
Recall the
previous $(a^*\cdot a^*)^*$ example,
where $\textit{simp}\_{SL}$ could not
prevent the fast growth of the derivatives (over
3 million nodes for input strings of length just below $20$).
With $\textit{bsimp}$, the derivative size is reduced to just 15 and stays constant no matter how long the
input string is.
This is shown in the graphs below.
\begin{figure}[H]
\begin{center}
\begin{tabular}{ll}
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
    width=7cm,
    height=4cm, 
    legend entries={Lexer with $\textit{bsimp}$},  
    legend pos=  south east,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
\end{axis}
\end{tikzpicture} %\label{fig:BitcodedLexer}
&
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
    width = 7cm,
    height = 4cm,
    legend entries={Lexer with $\textit{simp}\_{SL}$},  
    legend pos=  north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
\end{axis}
\end{tikzpicture} 
\end{tabular}
\end{center}
\caption{Our improvement over Sulzmann and Lu's lexer in terms of the size of the derivatives of $(a^*\cdot a^*)^*$.}
\end{figure}
\noindent
Given the size difference, it is not
surprising that our $\blexersimp$ significantly outperforms
$\textit{blexer\_SLSimp}$ by Sulzmann and Lu.
In the next section we establish that our
simplification preserves the correctness of the algorithm.
%----------------------------------------------------------------------------------------
%	SECTION rewrite relation
%----------------------------------------------------------------------------------------
\section{Correctness of $\blexersimp$}
We first introduce the rewriting relation \emph{rrewrite}
($\rrewrite$) between two bitcoded regular expressions,
which stands for an atomic
simplification.
We then prove properties about
this rewriting relation and its reflexive transitive closure.
Finally we leverage these properties to show
that $\blexer$ and $\blexersimp$ generate the same results.

\subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
In the correctness proof of $\blexer$, we
did not directly derive the fact that $\blexer$ generates the POSIX value.
Instead, we first proved that $\blexer$ generates the same result as $\lexer$,
and then re-used
the correctness of $\lexer$
to obtain 
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = \Some \; v$\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
	r\;s = \None$.
\end{center}
%\begin{center}
%	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
%\end{center}
Here we apply this
modularised technique again
by first proving that
$\blexersimp \; r \; s $ 
produces the same output as $\blexer \; r\; s$,
and then piecing it together with 
$\blexer$'s correctness to achieve our main
theorem:
\begin{center}
	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = \Some \;v$
	\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
	r\;s = \None$
\end{center}
\noindent
The overall idea for the proof
of $\blexer \;r \;s = \blexersimp \; r \;s$ 
is that the transition from $r$ to $\textit{bsimp}\; r$ can be
broken down into a sequence of smaller rewrite steps:
\begin{center}
	$r \rightsquigarrow^* \textit{bsimp} \; r$
\end{center}
where each rewrite step, written $\rightsquigarrow$,
is an ``atomic'' simplification
similar to a small-step reduction in operational semantics
(see Figure~\ref{rrewriteRules} for the rules):
\begin{figure}[H]
\begin{mathpar}
	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\

	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow {}_{bs} r_2 \cdot r_3\\}

	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow {}_{bs} r_1 \cdot r_4\\}\\

	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}

	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}

	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow {}_{bs}\sum rs_2}

	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}

	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }

	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}

	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}

	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rs_b) }

	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}

\end{mathpar}
\caption{
The rewrite rules that generate simplified regular expressions 
in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions 
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for 
lists of bitcoded regular expressions. 
Of particular interest is the $LD$ rule, which allows copies of regular expressions 
to be removed provided a regular expression 
earlier in the list is the same after erasing bitcodes (and hence matches the same strings).
}\label{rrewriteRules}
\end{figure}
\noindent
The rule $LH$ rewrites the head of a list in one step,
and $LT$ lifts a list rewrite past an unchanged head;
together they allow a single regular expression
at any position of a list to be rewritten in one step.
This helps with defining the ``context rule'' $AL$.
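The rules can be read as a nondeterministic one-step rewriting function. The Python sketch below is our own illustration, using a hypothetical tuple encoding of annotated regular expressions with bits 0 and 1. It enumerates all results of applying a single rule anywhere in a term. \texttt{steps} covers $S\ZERO_l$, $S\ZERO_r$, $S_1$, $SL$, $SR$, $A0$, $A1$ and $AL$. \texttt{list\_steps} covers $L\ZERO$, $LS$, $LD$ and $LH$, with $LT$ realised by trying every position. The trivial steps built from $LE$, which leave a list unchanged, are omitted. The helper \texttt{erase} is a simplified stand-in for erasure.

```python
# Annotated regexes: ('ZERO',) ('ONE', bs) ('CHAR', bs, c) ('ALTS', bs, rs)
# ('SEQ', bs, r1, r2) ('STAR', bs, r); bs and rs are tuples.


def fuse(bs, r):
    return r if r[0] == 'ZERO' else (r[0], bs + r[1]) + r[2:]


def erase(r):
    if r[0] in ('ZERO', 'ONE', 'CHAR'):
        return (r[0],) + r[2:]
    kids = r[2] if r[0] == 'ALTS' else r[2:]
    return (r[0],) + tuple(erase(x) for x in kids)


def steps(r):
    """All r2 with r ~> r2 in exactly one rule application."""
    out = []
    if r[0] == 'SEQ':
        bs, r1, r2 = r[1:]
        if r1[0] == 'ZERO' or r2[0] == 'ZERO':
            out.append(('ZERO',))                          # S0_l, S0_r
        if r1[0] == 'ONE':
            out.append(fuse(bs + r1[1], r2))               # S1
        out += [('SEQ', bs, x, r2) for x in steps(r1)]     # SL
        out += [('SEQ', bs, r1, x) for x in steps(r2)]     # SR
    elif r[0] == 'ALTS':
        bs, rs = r[1], r[2]
        if len(rs) == 0:
            out.append(('ZERO',))                          # A0
        if len(rs) == 1:
            out.append(fuse(bs, rs[0]))                    # A1
        out += [('ALTS', bs, xs) for xs in list_steps(rs)]  # AL
    return out


def list_steps(rs):
    """All rs2 with rs s~> rs2, excluding the identity steps built from LE."""
    out = []
    for i, r in enumerate(rs):
        pre, post = rs[:i], rs[i + 1:]                     # LT skips the prefix pre
        out += [pre + (x,) + post for x in steps(r)]       # LH
        if r[0] == 'ZERO':
            out.append(pre + post)                         # L0
        if r[0] == 'ALTS':
            out.append(pre + tuple(fuse(r[1], x) for x in r[2]) + post)  # LS
        if any(erase(p) == erase(r) for p in pre):
            out.append(pre + post)                         # LD: drop a later copy
    return out
```

A term for which \texttt{steps} returns the empty list, such as $_{[]}\sum[a, b]$, cannot be rewritten any further by these rules.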

The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
are defined in the usual way:
\begin{figure}[H]
	\centering
\begin{mathpar}
	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}

	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}

	\inferrule{r_1 \rightsquigarrow^*  r_2 \land \; r_2 \rightsquigarrow r_3}{r_1 \rightsquigarrow^* r_3\\}
	
	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{s}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
\end{mathpar}
\caption{The Reflexive Transitive Closure of 
$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
\end{figure}
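Read operationally, the closure rules say that $\rightsquigarrow^*$ relates a term to everything reachable from it in zero or more steps. The Python sketch below computes this reachable set by breadth-first search for any one-step function \texttt{step}. As an illustration, we use a hypothetical relation on tuples of integers that deletes one 0 at a time, loosely mimicking $L\ZERO$ with 0 standing for $\ZERO$. The search terminates only when the set of reachable terms is finite.

```python
from collections import deque


def reachable(step, start):
    """All y with start ->* y, where step(x) lists the one-step successors of x."""
    seen = {start}              # reflexivity: start ->* start
    todo = deque([start])
    while todo:
        x = todo.popleft()
        for y in step(x):       # start ->* x and x -> y give start ->* y
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def drop_one_zero(xs):
    """Hypothetical one-step relation: delete a single 0 from the tuple xs."""
    return [xs[:i] + xs[i + 1:] for i, x in enumerate(xs) if x == 0]
```

For example, \texttt{reachable(drop\_one\_zero, (0, 1, 0))} is the set containing \texttt{(0, 1, 0)}, \texttt{(1, 0)}, \texttt{(0, 1)} and \texttt{(1,)}.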
%Two rewritable terms will remain rewritable to each other
%even after a derivative is taken:
The main point of our rewriting relation
is that it is preserved under derivatives,
namely
\begin{center}
	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
\end{center}
Moreover, if a nullable term rewrites to another term in many steps,
then both produce the same bitcodes:
\begin{center}
	$r \rightsquigarrow^* r' \land \textit{bnullable}\; r \implies \bmkeps \; r = \bmkeps \; r'$
\end{center}
The decoding phase of $\blexer$ and $\blexersimp$
is the same, which means that if they receive the same
bitcodes before the decoding phase,
they generate the same value after decoding is done.
We will prove the three properties 
mentioned above in the next subsection.
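The Python sketch below runs a quick sanity check of the $\bmkeps$ property on a few concrete rule instances. It uses a hypothetical tuple encoding of annotated regular expressions (bits 0 and 1) and is not a proof. For instances of $A1$, $S_1$, $L\ZERO$ and $LD$ (the latter two inside $AL$), it only shows that a nullable left-hand side and its rewritten form yield the same bitcodes. It also shows why $LD$ removes the later copy. Since $\bmkeps$ of an alternative follows its first nullable child, dropping the earlier of two copies could change the bitcodes.

```python
# Annotated regexes: ('ZERO',) ('ONE', bs) ('CHAR', bs, c) ('ALTS', bs, rs)
# ('SEQ', bs, r1, r2) ('STAR', bs, r); bs and rs are tuples, bits 0 = Z, 1 = S.


def fuse(bs, r):
    return r if r[0] == 'ZERO' else (r[0], bs + r[1]) + r[2:]


def bnullable(r):
    if r[0] == 'ALTS':
        return any(bnullable(x) for x in r[2])
    if r[0] == 'SEQ':
        return bnullable(r[2]) and bnullable(r[3])
    return r[0] in ('ONE', 'STAR')


def bmkeps(r):
    """Bits recording how the nullable r matches the empty string."""
    if r[0] == 'ONE':
        return r[1]
    if r[0] == 'ALTS':  # follow the first nullable alternative
        return r[1] + bmkeps(next(x for x in r[2] if bnullable(x)))
    if r[0] == 'SEQ':
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    return r[1] + (1,)  # STAR


a = ('CHAR', (), 'a')
# (rule, left-hand side, right-hand side) for instances of the rewrite rules
EXAMPLES = [
    ('A1', ('ALTS', (1,), (('STAR', (0,), a),)), fuse((1,), ('STAR', (0,), a))),
    ('S1', ('SEQ', (1,), ('ONE', (0,)), ('STAR', (), a)), fuse((1, 0), ('STAR', (), a))),
    ('L0', ('ALTS', (), (('ZERO',), ('ONE', (1,)))), ('ALTS', (), (('ONE', (1,)),))),
    ('LD', ('ALTS', (), (a, ('ONE', (0,)), ('ONE', (1,)))), ('ALTS', (), (a, ('ONE', (0,))))),
]

# Dropping the *earlier* copy instead (not an LD step) changes the bitcodes.
NOT_LD = (('ALTS', (), (('ONE', (0,)), ('ONE', (1,)))), ('ALTS', (), (('ONE', (1,)),)))
```

Every pair in \texttt{EXAMPLES} has a nullable left-hand side with the same $\bmkeps$ result as its right-hand side. For \texttt{NOT\_LD}, the bitcodes differ (\texttt{(0,)} versus \texttt{(1,)}).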
\subsection{Important Properties of $\rightsquigarrow$}
First we prove some basic facts 
about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
which will be needed later.
The inference rules in Figure~\ref{rrewriteRules}
have ``many-steps'' versions:

\begin{lemma}\label{squig1}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies {}_{bs} \sum rs_1 \rightsquigarrow^* {}_{bs} \sum rs_2$
		\item
			$r \rightsquigarrow^* r' \implies {}_{bs} \sum (r :: rs)\; \rightsquigarrow^*\;  {}_{bs} \sum (r' :: rs)$

		\item
			Rewriting in many steps is compatible 
			with the sequence constructor:\\
			$r_1 \rightsquigarrow^* r_2 
			\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \;  
			{}_{bs} r_2 \cdot r_3 \quad $ 
			and 
			$\quad r_3 \rightsquigarrow^* r_4 
			\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \; {}_{bs} r_1 \cdot r_4$
		\item
			The many-steps relations 
			$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$ 
			are preserved under the function $\fuse$:\\
				$r_1 \rightsquigarrow^* r_2 
				\implies \fuse \; bs \; r_1 \rightsquigarrow^* \; 
				\fuse \; bs \; r_2 \quad  $ and 
				$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
				\implies \map \; (\fuse \; bs) \; rs_1 
				\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on 
	the derivations of $\stackrel{s*}{\rightsquigarrow}$ and $\rightsquigarrow^*$, respectively.
	The fourth point relies on 
	the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow^* \fuse \; bs \; r_2$ and
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 
	\implies \map \; (\fuse \; bs)\; rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
	which can be proven by induction on the derivations of $\rightsquigarrow$ and 
	$\stackrel{s}{\rightsquigarrow}$.
\end{proof}
\noindent
The inference rules of $\stackrel{s}{\rightsquigarrow}$
are defined in terms of the list cons operation.
We therefore establish how the 
$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
relations behave when a list is prepended or appended.
In addition, we
prove some relations 
between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
\begin{lemma}\label{ssgqTossgs}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$

		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies 
			rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
			\textit{and} \; \;
			rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
			
		\item
			After appending a list, a $\stackrel{s}{\rightsquigarrow}$ step 
			becomes a $\stackrel{s*}{\rightsquigarrow}$ rewrite:\\
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
		\item
			$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
		\item
			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
			r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
		\item			
			If we can rewrite a regular expression 
			in many steps to $\ZERO$, then 
			we can also rewrite any sequence that has it as its first component to $\ZERO$:\\
			$r_1 \rightsquigarrow^* \ZERO 
			\implies {}_{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part is by induction on the list $rs$.
	The second part is by rule induction on $\stackrel{s*}{\rightsquigarrow}$.
	The third part is 
	by rule induction on $\stackrel{s}{\rightsquigarrow}$.
	The fourth part is 
	by rule induction on $\rightsquigarrow^*$, using parts one to three. 
	The fifth part is a corollary of part four.
	The last part is proven by rule induction, again on $\rightsquigarrow^*$.
\end{proof}
\noindent
Now we are ready to give the proofs of the following properties:
\begin{itemize}
	\item
		$r \rightsquigarrow^* r'\land \bnullable \; r 
		\implies \bmkeps \; r = \bmkeps \; r'$. \\
	\item
		$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
	\item
		$r \rightsquigarrow^* r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
\end{itemize}

\subsubsection{Property 1: $r \rightsquigarrow^* r'\land \bnullable \; r 
		\implies \bmkeps \; r = \bmkeps \; r'$}
Intuitively, this property says that $\bmkeps$ extracts the same bitcodes
from the nullable components of two regular expressions $r$ and $r'$
whenever $r$ can be rewritten into $r'$ in finitely
many steps.

For convenience, 
we define a predicate stating that a list of regular expressions
contains at least one nullable regular expression:
\begin{center}
	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
\end{center}
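For illustration only, the predicate can be sketched in Python on a toy encoding of annotated regular expressions as nested tuples, with bits written \texttt{"Z"} and \texttt{"S"}. The encoding and the function names are our own assumptions and are not part of the formal development.

```python
# Hypothetical tuple encoding of annotated regular expressions (an assumption):
#   ("ZERO",)  ("ONE", bs)  ("CHAR", bs, c)
#   ("ALTS", bs, rs)  ("SEQ", bs, r1, r2)  ("STAR", bs, r)
# where bs is a tuple of bits, written "Z" and "S".
ZERO = ("ZERO",)


def bnullable(r) -> bool:
    """Does r match the empty string? The bitcodes play no role here."""
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    if tag == "SEQ":
        return bnullable(r[2]) and bnullable(r[3])
    raise ValueError(f"unknown constructor {tag!r}")


def bnullables(rs) -> bool:
    """bnullables rs holds iff at least one element of rs is bnullable."""
    return any(bnullable(r) for r in rs)


a = ("CHAR", (), "a")
a_star = ("STAR", ("Z",), a)
print(bnullables([a, ZERO]))    # False
print(bnullables([a, a_star]))  # True
print(bnullables([]))           # False
```

Since bitcodes never influence nullability, rewriting steps that merely move bits around cannot change the result of $\textit{bnullables}$.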
\noindent
The rewriting relation $\rightsquigarrow$ preserves (b)nullability:
\begin{lemma}\label{rewritesBnullable}
	\hspace{0em}
	\begin{itemize}
		\item
			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
			\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
		\item 	
			$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2, \;
			\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
		\item
			$r_1 \rightsquigarrow^* r_2 
			\implies \bnullable \; r_1 = \bnullable \; r_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first two parts are proven simultaneously by rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
	The third part follows from the first by rule induction on $\rightsquigarrow^*$.
\end{proof}
\noindent
For convenience, we also
define $\bmkepss$ on a list $rs$,
which extracts, using $\bmkeps$, the bitcodes of the first $\bnullable$ element of $rs$:
\begin{center}
	\begin{tabular}{lcl}
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
		$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\; 
		\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
	\end{tabular}
\end{center}
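A minimal Python sketch of $\bmkepss$, and of the $\bmkeps$ it calls, is given below on a toy tuple encoding. The clauses for $\bmkeps$, in particular the bit $S$ that closes a star, are the ones we assume from the bit-coded algorithm of the earlier chapters; the encoding and the names are illustrative assumptions.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r);
# bs is a tuple of bits "Z"/"S".


def bnullable(r) -> bool:
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    return bnullable(r[2]) and bnullable(r[3])  # SEQ


def bmkeps(r) -> tuple:
    """Bits describing how a nullable r matches the empty string."""
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "ALTS":
        return r[1] + bmkepss(r[2])
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    if tag == "STAR":
        return r[1] + ("S",)  # assumed convention: S ends the star iterations
    raise ValueError("bmkeps applied to a non-nullable expression")


def bmkepss(rs) -> tuple:
    """Bits of the first bnullable element of rs; the empty tuple if none exists."""
    for r in rs:
        if bnullable(r):
            return bmkeps(r)
    return ()


rs = [("CHAR", ("Z",), "a"), ("ONE", ("S", "Z")), ("ONE", ("S", "S"))]
print(bmkepss(rs))  # ('S', 'Z')
print(bmkepss([]))  # ()
```

Elements after the first nullable one are never inspected, so a rewrite that removes or alters them cannot affect the bits returned.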
\noindent
If both regular expressions in a rewriting step are nullable, then they 
produce the same bitcodes:
\begin{lemma}\label{rewriteBmkepsAux}
	\hspace{0em}
	\begin{itemize}
		\item
			$r_1 \rightsquigarrow r_2 \implies 
			(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
			\bmkeps \; r_2)$ 
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
			\bmkepss \; rs_1 = \bmkepss \; rs_2)$
	\end{itemize}
\end{lemma}
\begin{proof}
	By simultaneous rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$
	and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$.
\end{proof}
\noindent
With Lemma~\ref{rewriteBmkepsAux} in place, we are ready to prove its
many-step version: 
\begin{lemma}
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
\end{lemma}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow}$. Lemma~\ref{rewritesBnullable}
	ensures that $r'$, like every regular expression in between, is nullable as well,
	so Lemma~\ref{rewriteBmkepsAux} solves the inductive case.
\end{proof}

\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$}
Now we get to the key part of the proof, 
which says that the helper functions of our simplification,
such as $\distinctBy$ and $\flts$, compute
reducts of $\stackrel{s*}{\rightsquigarrow}$ and 
$\rightsquigarrow^* $.

The first lemma to prove is a more general version of 
$rs \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs \; \rerases \; \phi$:
\begin{lemma}
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
\end{lemma}
\noindent
It says that for a list made of two parts $rs_1 @ rs_2$, 
one can throw away those elements of $rs_2$ that duplicate
either an earlier element of $rs_2$ or an element of $rs_1$,
where two elements count as duplicates if they are equal after $\rerases$,
that is, when their bitcodes are ignored.
\begin{proof}
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
\end{proof}
\noindent
Setting $rs_1$ to be empty (and renaming $rs_2$ to $rs$),
we get the following corollary:
\begin{corollary}\label{dBPreserves}
	$rs \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs \; \rerases \; \phi$.
\end{corollary}
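The behaviour described by the lemma and Corollary~\ref{dBPreserves} can be tried out with the following Python sketch: $\distinctBy$ keeps the first element of each group of elements that are equal after erasing their bitcodes. We model $\rerases$ by a function that strips all bitcodes and the accumulator $\phi$ by an empty collection; the encoding and all names are assumptions made for illustration.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r).


def rerase(r):
    """Strip every bitcode, keeping alternatives as (tuple) lists."""
    tag = r[0]
    if tag == "ZERO":
        return r
    if tag in ("ONE", "CHAR"):
        return (tag, ()) + r[2:]
    if tag == "ALTS":
        return (tag, (), tuple(rerase(x) for x in r[2]))
    if tag == "SEQ":
        return (tag, (), rerase(r[2]), rerase(r[3]))
    return (tag, (), rerase(r[2]))  # STAR


def distinct_by(rs, f, acc=()):
    """Keep an element only if its f-image is neither in acc nor seen before."""
    seen = set(acc)
    out = []
    for r in rs:
        key = f(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


a_Z, a_S, b = ("CHAR", ("Z",), "a"), ("CHAR", ("S",), "a"), ("CHAR", (), "b")
# Corollary: an empty accumulator; keeps a_Z and one copy of b.
print(distinct_by([a_Z, a_S, b, b], rerase))
# General lemma: elements of rs2 that duplicate elements of rs1 are dropped
# too; the result consists of a_S followed by b.
rs1, rs2 = [a_S], [a_Z, b]
print(rs1 + distinct_by(rs2, rerase, [rerase(x) for x in rs1]))
```

Because the first occurrence is always kept, the element that $\bmkepss$ would select is never removed, which is why these deletions are harmless for the bitcodes.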
\noindent
Similarly, the flattening function $\flts$ computes a reduct of
$\stackrel{s*}{\rightsquigarrow}$:

\begin{lemma}\label{fltsPreserves}
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
\end{lemma}
\begin{proof}
	By induction on $rs$.
\end{proof}
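The flattening step can be sketched in Python as follows, assuming the clauses for $\flts$ that we use elsewhere: occurrences of $\ZERO$ are dropped, the children of a nested alternative are spliced into the list after the alternative's bitcodes have been fused into them, and all other elements are kept. Only one level of nesting is removed. The helper \texttt{fuse}, the tuple encoding and all names are assumptions for illustration.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r).
ZERO = ("ZERO",)


def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r (ZERO has none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], tuple(bs) + r[1]) + r[2:]


def flts(rs):
    """Remove ZEROs and splice nested alternatives into the surrounding list."""
    out = []
    for r in rs:
        if r[0] == "ZERO":
            continue
        if r[0] == "ALTS":
            # the children inherit the bitcodes of the alternative they came from
            out.extend(fuse(r[1], x) for x in r[2])
        else:
            out.append(r)
    return out


a, b, c = ("CHAR", (), "a"), ("CHAR", ("S",), "b"), ("CHAR", (), "c")
nested = ("ALTS", ("Z",), (a, b))
print(flts([ZERO, nested, c]))
# -> [('CHAR', ('Z',), 'a'), ('CHAR', ('Z', 'S'), 'b'), ('CHAR', (), 'c')]
```

Fusing the outer bitcodes into the children is what keeps the bits produced by $\bmkeps$ unchanged when an inner alternative disappears.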
\noindent
Applying $\bsimpalts$ to the components of an alternative is also a many-step rewrite:
\begin{lemma}\label{bsimpaltsPreserves}
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; rs$
\end{lemma}
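The three cases of $\bsimpalts$ that we assume here (an empty list becomes $\ZERO$, a singleton list has the outer bitcodes fused into its only element, and longer lists remain an alternative) can be sketched in Python as below; the names and the tuple encoding are illustrative assumptions.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r).
ZERO = ("ZERO",)


def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r."""
    if r[0] == "ZERO":
        return r
    return (r[0], tuple(bs) + r[1]) + r[2:]


def bsimp_alts(bs, rs):
    """Build an alternative with bitcodes bs, avoiding the degenerate cases."""
    if len(rs) == 0:
        return ZERO
    if len(rs) == 1:
        return fuse(bs, rs[0])
    return ("ALTS", tuple(bs), tuple(rs))


a, b = ("CHAR", ("S",), "a"), ("CHAR", (), "b")
print(bsimp_alts(("Z",), []))      # ('ZERO',)
print(bsimp_alts(("Z",), [a]))     # ('CHAR', ('Z', 'S'), 'a')
print(bsimp_alts(("Z",), [a, b]))  # ('ALTS', ('Z',), (('CHAR', ('S',), 'a'), ('CHAR', (), 'b')))
```

In the singleton case the bits $bs$ are not lost but moved into the remaining element.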
\noindent
The simplification function
$\textit{bsimp}$ only transforms a regular expression using steps specified by 
$\rightsquigarrow^*$ and nothing else:
\begin{lemma}
	$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
\end{lemma}
\begin{proof}
	By induction on $r$.
	The most involved case is the alternative $r = {}_{bs}\sum rs$,
	where we use the induction hypotheses together with
	Lemma~\ref{fltsPreserves} and Corollary~\ref{dBPreserves} to obtain a series of rewriting steps on lists:\\
	\begin{center}
		\begin{tabular}{lcl}
			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
		\end{tabular}
	\end{center}
	Lifting these list rewrites to the alternative and then applying Lemma~\ref{bsimpaltsPreserves},
	we can derive the following rewrite sequence:\\
	\begin{center}
		\begin{tabular}{lcl}
			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
			    & $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \; 
				(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi) $\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; 
			    (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) 
			    \; \rerases \; \phi)$\\[1.5ex]
			    & $=$ & $\textit{bsimp} \; r$\\[1.5ex]
		\end{tabular}
	\end{center}	
\end{proof}
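To see the functions from the proof working together, here is a self-contained Python sketch of $\textit{bsimp}$ that reimplements $\flts$, $\distinctBy$ and $\bsimpalts$ inline. The sequence clause ($\ZERO$ on either side gives $\ZERO$; a $\ONE$ on the left is dropped after fusing its bitcodes into the right component) and the alternative clause are the ones we assume for $\textit{bsimp}$; the encoding and the Python names are our own.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r).
ZERO = ("ZERO",)


def fuse(bs, r):
    return r if r[0] == "ZERO" else (r[0], tuple(bs) + r[1]) + r[2:]


def rerase(r):
    """Strip all bitcodes."""
    tag = r[0]
    if tag == "ZERO":
        return r
    if tag in ("ONE", "CHAR"):
        return (tag, ()) + r[2:]
    if tag == "ALTS":
        return (tag, (), tuple(rerase(x) for x in r[2]))
    if tag == "SEQ":
        return (tag, (), rerase(r[2]), rerase(r[3]))
    return (tag, (), rerase(r[2]))  # STAR


def flts(rs):
    out = []
    for r in rs:
        if r[0] == "ALTS":
            out.extend(fuse(r[1], x) for x in r[2])
        elif r[0] != "ZERO":
            out.append(r)
    return out


def distinct_by(rs, f):
    seen, out = set(), []
    for r in rs:
        if f(r) not in seen:
            seen.add(f(r))
            out.append(r)
    return out


def bsimp_alts(bs, rs):
    if not rs:
        return ZERO
    return fuse(bs, rs[0]) if len(rs) == 1 else ("ALTS", bs, tuple(rs))


def bsimp(r):
    if r[0] == "SEQ":
        r1, r2 = bsimp(r[2]), bsimp(r[3])
        if r1 == ZERO or r2 == ZERO:
            return ZERO
        if r1[0] == "ONE":
            return fuse(r[1] + r1[1], r2)
        return ("SEQ", r[1], r1, r2)
    if r[0] == "ALTS":
        # map bsimp, then flts, then distinctBy modulo rerase, then bsimp_alts
        rs = distinct_by(flts([bsimp(x) for x in r[2]]), rerase)
        return bsimp_alts(r[1], rs)
    return r  # ZERO, ONE, CHAR and STAR are left unchanged


a_Z, a_S, b = ("CHAR", ("Z",), "a"), ("CHAR", ("S",), "a"), ("CHAR", (), "b")
r = ("ALTS", ("S",), (ZERO, ("ALTS", (), (a_Z, a_S)), b))
print(bsimp(r))  # ('ALTS', ('S',), (('CHAR', ('Z',), 'a'), ('CHAR', (), 'b')))
print(bsimp(("SEQ", ("Z",), ("ONE", ("S",)), b)))  # ('CHAR', ('Z', 'S'), 'b')
```

Each line of the alternative case corresponds to one step of the list rewrite sequence in the proof above.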
\subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
A single rewrite step 
$\rightsquigarrow$ becomes a many-step rewrite $\stackrel{*}{\rightsquigarrow}$
after derivatives are taken on both sides:
\begin{lemma}\label{rewriteBder}
	\hspace{0em}
	\begin{itemize}
		\item
			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
			\rightsquigarrow^*  r_2 \backslash c$ 
		\item	
			If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $ 
			\map \; (\_\backslash c) \; rs_1 
			\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	By simultaneous rule induction on $\rightsquigarrow$ 
	and $\stackrel{s}{\rightsquigarrow}$, using several of the previous lemmas.
\end{proof}
\noindent
Now we can prove Property 3 as an immediate corollary:
\begin{corollary}\label{rewritesBder}
	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
	r_2 \backslash c$
\end{corollary}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow}$, using Lemma~\ref{rewriteBder}.
\end{proof}
\noindent
This can be extended from characters to strings and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
to obtain the correspondence between the intermediate
derivative regular expressions of $\blexer$ and $\blexersimp$:
\begin{lemma}\label{bderBderssimp}
	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
\end{lemma}
\begin{proof}
	By induction on $s$.
\end{proof}
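Lemma~\ref{bderBderssimp}, together with Property 1 and Lemma~\ref{rewritesBnullable}, predicts that $a \backslash s$ and $\bderssimp{a}{s}$ agree on nullability and, when nullable, on the bits produced by $\bmkeps$. The self-contained Python sketch below allows one to observe this on examples. The clauses for the bit-coded derivative (in particular fusing $Z$ in the star case, and fusing the $\bmkeps$ bits of a nullable first component in the sequence case) are those we assume for the algorithm of the earlier chapters; the encoding, the example and all names are illustrative assumptions.

```python
# Hypothetical tuple encoding (an assumption): ("ZERO",), ("ONE", bs),
# ("CHAR", bs, c), ("ALTS", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r);
# bs is a tuple of bits "Z"/"S".
ZERO = ("ZERO",)

def fuse(bs, r):
    return r if r[0] == "ZERO" else (r[0], tuple(bs) + r[1]) + r[2:]

def rerase(r):
    tag = r[0]
    if tag == "ZERO":
        return r
    if tag in ("ONE", "CHAR"):
        return (tag, ()) + r[2:]
    if tag == "ALTS":
        return (tag, (), tuple(rerase(x) for x in r[2]))
    if tag == "SEQ":
        return (tag, (), rerase(r[2]), rerase(r[3]))
    return (tag, (), rerase(r[2]))  # STAR

def bnullable(r):
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    return bnullable(r[2]) and bnullable(r[3])  # SEQ

def bmkeps(r):
    # only called on nullable expressions
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "ALTS":  # bits of the first nullable alternative
        return r[1] + next(bmkeps(x) for x in r[2] if bnullable(x))
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    return r[1] + ("S",)  # STAR: S ends the iterations (assumed convention)

def bder(c, r):
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return ZERO
    if tag == "CHAR":
        return ("ONE", r[1]) if r[2] == c else ZERO
    if tag == "ALTS":
        return ("ALTS", r[1], tuple(bder(c, x) for x in r[2]))
    if tag == "SEQ":
        bs, r1, r2 = r[1], r[2], r[3]
        if bnullable(r1):
            return ("ALTS", bs, (("SEQ", (), bder(c, r1), r2),
                                 fuse(bmkeps(r1), bder(c, r2))))
        return ("SEQ", bs, bder(c, r1), r2)
    # STAR: Z marks one more iteration (assumed convention)
    return ("SEQ", r[1], fuse(("Z",), bder(c, r[2])), ("STAR", (), r[2]))

def flts(rs):
    out = []
    for r in rs:
        if r[0] == "ALTS":
            out.extend(fuse(r[1], x) for x in r[2])
        elif r[0] != "ZERO":
            out.append(r)
    return out

def bsimp(r):
    if r[0] == "SEQ":
        r1, r2 = bsimp(r[2]), bsimp(r[3])
        if r1 == ZERO or r2 == ZERO:
            return ZERO
        return fuse(r[1] + r1[1], r2) if r1[0] == "ONE" else ("SEQ", r[1], r1, r2)
    if r[0] == "ALTS":
        seen, rs = set(), []
        for x in flts([bsimp(y) for y in r[2]]):  # distinctBy modulo rerase
            if rerase(x) not in seen:
                seen.add(rerase(x))
                rs.append(x)
        if not rs:
            return ZERO
        return fuse(r[1], rs[0]) if len(rs) == 1 else ("ALTS", r[1], tuple(rs))
    return r

def bders(r, s, simp=False):
    """Derivative w.r.t. the string s; with simp=True, apply bsimp after each step."""
    for c in s:
        r = bsimp(bder(c, r)) if simp else bder(c, r)
    return r

# (a + aa)*, with Z/S selecting the left/right alternative
a_or_aa = ("ALTS", (), (("CHAR", ("Z",), "a"),
                        ("SEQ", ("S",), ("CHAR", (), "a"), ("CHAR", (), "a"))))
r = ("STAR", (), a_or_aa)
for s in ["", "a", "aa", "aaa", "aaaa", "ab"]:
    d, ds = bders(r, s), bders(r, s, simp=True)
    assert bnullable(d) == bnullable(ds)
    if bnullable(d):
        assert bmkeps(d) == bmkeps(ds)
print(bmkeps(bders(r, "aa", simp=True)))  # ('Z', 'S', 'S')
```

The two derivatives generally differ in shape, yet the bits agree; testing of this kind is only a sanity check, whereas the lemmas above establish the agreement for all inputs.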
\subsection{Main Theorem}
With Lemma~\ref{bderBderssimp} in place, we are ready to prove the main theorem.
\begin{theorem}
	$\blexer \; r \; s = \blexersimp{r}{s}$
\end{theorem}
\noindent
\begin{proof}
	By Lemma~\ref{bderBderssimp}, the derivative regular expression computed by the original lexer
	can be rewritten in many steps into the one computed by the lexer with simplification:
	\begin{center}
		$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $.
	\end{center}
	By Lemma~\ref{rewritesBnullable}, either both are nullable or neither is;
	in the latter case both lexers return $\None$.
	In the former case, that is, when the lexing result is a match,
	Property 1 tells us that they generate the same bits:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
	\end{center}
	Since they generate the same bits, they also give the same value after decoding:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
	\end{center}
	This is exactly what is required for our proof goal:
	\begin{center}
		$\blexer \; r \; s = \blexersimp \; r \; s$.
	\end{center}	
\end{proof}
\noindent
As a corollary,
we can link this result with the lemma, proved earlier, stating that 
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = \Some \;v$\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
	r\;s = \None$.
\end{center}
and obtain that the bit-coded lexer with simplification
indeed generates the POSIX lexing result whenever such a result exists, and returns $\None$ otherwise:
\begin{corollary}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = \Some \; v$\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
	r\;s = \None$.
\end{corollary}

\subsection{Comments on the Proof Techniques Used}
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
  1124
Straightforward as the proof may seem,
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
  1125
the efforts we spent obtaining it were far from trivial.
589
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1126
We initially attempted to re-use the argument 
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1127
in \cref{flex_retrieve}. 
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
  1128
The problem is that both functions $\inj$ and $\retrieve$ require 
589
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1129
that the annotated regular expressions stay unsimplified, 
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1130
so that one can 
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1131
correctly compare $v_{i+1}$ and $r_i$  and $v_i$ 
639
80cc6dc4c98b until chap 7
Chengsong
parents: 624
diff changeset
  1132
in diagram \ref{graph:inj}.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
  1133
589
86e0203db2da chap4 finished
Chengsong
parents: 588
diff changeset
  1134
We also tried to prove 
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) = 
\textit{bsimp} \;\;  (a\backslash s)$,
\end{center}
but this turns out to be false.
A counterexample is
\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\; 
	\text{and} \;\; s = bb.
\]
\noindent
Then we have 
\begin{center}
	$\textit{bsimp}\;\; ( a \backslash s ) =
	{}_{[]}(_{ZZ}\ONE +  _{ZS}c )$
\end{center}
\noindent
whereas 
\begin{center}
	$\textit{bsimp} \;\;( \bderssimp{a}{s} ) =
	{}_{Z}(_{Z} \ONE + _{S} c)$.
\end{center}
Unfortunately, 
whenever $\textit{bsimp}$ is applied at different stages of the computation,
discrepancies of this kind can arise.
The cause is that
the $\map \; (\fuse\; bs) \; as$ operation,
which pushes the bits $bs$ of an alternative into its children $as$,
is performed at different locations in the regular expression.

The rewriting relation 
$\rightsquigarrow^*$ 
allows us to ignore this discrepancy
and view the expressions 
\begin{center}
	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
	and\\
	$_{Z}(_{Z} \ONE + _{S} c)$
\end{center}
as equivalent, because both are obtained by rewriting
from the same expression.
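The same fragment can illustrate the rewriting view. The Python sketch below (3.10 or later) implements only two bit-preserving rules, modelled on how $\textit{bsimp}$ treats alternatives. The first removes a singleton alternative via $\fuse$. The second flattens a nested alternative via $\map\;(\fuse\;bs)$. Rewriting inside children is also allowed. The names \texttt{steps} and \texttt{reachable} are hypothetical. The starting expression ${}_{[]}(_{Z}(_{Z}\ONE + _{S}c))$ is chosen purely for illustration and is not taken from the proof. Both expressions above are reachable from it, and neither can be rewritten any further.

```python
# Same fragment: ("ONE", bs), ("CHAR", bs, c), ("ALTS", bs, rs); bits stored at index 1.

def fuse(bs, r):
    # Prepend bs to the top-level annotation of r.
    return (r[0], bs + r[1]) + r[2:]

def steps(r):
    """All expressions reachable from r in exactly one rewrite step."""
    out = set()
    if r[0] != "ALTS":
        return out
    _, bs, rs = r
    if len(rs) == 1:                      # singleton alternative: ALTS bs [r1] -> fuse bs r1
        out.add(fuse(bs, rs[0]))
    for i, x in enumerate(rs):
        if x[0] == "ALTS":                # flatten: push the child's bits into its children
            pushed = tuple(fuse(x[1], y) for y in x[2])
            out.add(("ALTS", bs, rs[:i] + pushed + rs[i + 1:]))
        for y in steps(x):                # rewrite inside the i-th child
            out.add(("ALTS", bs, rs[:i] + (y,) + rs[i + 1:]))
    return out

def reachable(r):
    """Reflexive-transitive closure of steps, computed by graph search."""
    seen, todo = {r}, [r]
    while todo:
        for s in steps(todo.pop()):
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return seen

ancestor = ("ALTS", "", (("ALTS", "Z", (("ONE", "Z"), ("CHAR", "S", "c"))),))
form1 = ("ALTS", "", (("ONE", "ZZ"), ("CHAR", "ZS", "c")))   # _{[]}(_{ZZ}1 + _{ZS}c)
form2 = ("ALTS", "Z", (("ONE", "Z"), ("CHAR", "S", "c")))    # _{Z}(_{Z}1 + _{S}c)
```

Here \texttt{reachable(ancestor)} is exactly \texttt{\{ancestor, form1, form2\}}, and both forms have no further rewrite steps. A single expression can therefore have two syntactically different normal forms. This is why comparing expressions up to $\rightsquigarrow^*$, rather than up to syntactic equality, avoids the problem.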

The simplification rewriting rules
given in \ref{rrewriteRules} are by no means
final;
one could come up with new rules
such as 
$\SEQ r_1 \cdot (\SEQ r_2 \cdot r_3) \rightarrow
\SEQs [r_1, r_2, r_3]$.
However, such a rule does not fit the proof technique
of our main theorem, even though it does not seem to violate the POSIX
property.

Having established the correctness of our
$\blexersimp$, we shall prove in the next chapter that, with our $\simp$ function,
the size of every derivative of a given $r$ is
bounded by a constant that depends only on $r$.