% ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
% author Chengsong, Thu, 01 Sep 2022 23:47:37 +0100
% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
%simplifications and therefore introduce our version of the bitcoded algorithm and 
%its correctness proof in 
%Chapter 3\ref{Chapter3}. 

In this chapter we introduce simplifications
on annotated regular expressions that can be applied to 
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
We contrast our simplification function 
with Sulzmann and Lu's original
simplifications, indicating the simplicity of our algorithm and
the improvements we made, and demonstrating
the usefulness and reliability of formal proofs on algorithms.
These ``aggressive'' simplifications would not be possible in the injection-based 
lexing we introduced in Chapter \ref{Inj}.
We then go on to prove the correctness of the improved version of 
$\blexer$, called $\blexersimp$, by establishing 
$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.

\section{Simplifications by Sulzmann and Lu}
The first thing we notice in the fast growth of the derivatives of examples such as $(a^*a^*)^*$
and $(a^* + (aa)^*)^*$ is that many duplicated sub-patterns
are scattered across different levels, and therefore require 
de-duplication at different levels:
\begin{center}
	$(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} $\\
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} \ldots$
\end{center}
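The growth of the derivatives of $(a^*a^*)^*$ can be observed with a small experiment. The following Python sketch is only an illustration and not the thesis's Scala implementation: it works on plain (unannotated) regular expressions, applies no simplification at all (so the $\ONE$ factors that the chain above omits are still present), and the names \texttt{der}, \texttt{size} and \texttt{derivative\_sizes} are chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class One:
    pass


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Alt:
    r1: object
    r2: object


@dataclass(frozen=True)
class Seq:
    r1: object
    r2: object


@dataclass(frozen=True)
class Star:
    r: object


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Char


def der(c, r):
    """Brzozowski derivative of r with respect to c, without simplification."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One


def size(r):
    """Number of nodes in the syntax tree of r."""
    if isinstance(r, (Alt, Seq)):
        return 1 + size(r.r1) + size(r.r2)
    if isinstance(r, Star):
        return 1 + size(r.r)
    return 1


def derivative_sizes(r, c, n):
    """Sizes of r and of its first n derivatives with respect to c."""
    sizes = [size(r)]
    for _ in range(n):
        r = der(c, r)
        sizes.append(size(r))
    return sizes


a_star = Star(Char("a"))
r0 = Star(Seq(a_star, a_star))  # (a^*a^*)^*
sizes = derivative_sizes(r0, "a", 6)
```

Every derivative after the first contains its predecessor as a sub-term, which is why the sizes in \texttt{sizes} increase at every step.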
\noindent
As we have already mentioned in \ref{eqn:growth2},
a simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivatives:
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
\noindent
One would expect a better simplification function to work in the 
following way:
\begin{gather*}
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}\\
	\bigg\downarrow \\
	(a^*a^* + a^* 
	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
	\bigg\downarrow \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
	\bigg\downarrow \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
\end{gather*}
\noindent
This motivating example came from testing Sulzmann and Lu's 
algorithm: their simplification does 
not achieve this.
We quote their $\textit{simp}$ function verbatim here:
\begin{center}
	\begin{tabular}{lcl}
		$\simpsulz \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
		$\simpsulz \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
		\textit{then} \;\; \ZERO$\\
					     & & $\textit{else}\;\;_{bs}((\simpsulz \;r_1)\cdot
					     (\simpsulz \; r_2))$\\
		$\simpsulz  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
		$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
		$\simpsulz  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simpsulz  \; r)$\\
		$\simpsulz  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
		(\nub \; (\filter \; (\not \circ \zeroable)\;((\simpsulz  \; r) :: \map \; \simpsulz  \; rs)))$\\ 
	\end{tabular}
\end{center}
\noindent
The $\textit{zeroable}$ predicate,
which tests whether a regular expression
is equivalent to $\ZERO$,
is defined as:
\begin{center}
	\begin{tabular}{lcl}
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
		\zeroable \;_{[]}\sum\;rs $\\
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
	\end{tabular}
\end{center}
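The clauses of $\zeroable$ can be transcribed almost literally into code. The following Python sketch assumes a hypothetical encoding of annotated regular expressions as frozen dataclasses, with bitcodes represented as tuples of the strings \texttt{"Z"} and \texttt{"S"}; the class names are chosen here for illustration. The table above has no clause for the empty alternative, so treating it as zeroable is an assumption of this sketch, made because the recursive clause for $\sum (r::rs)$ needs a base case.

```python
from dataclasses import dataclass
from typing import Tuple

Bits = Tuple[str, ...]  # a bitcode, e.g. ("Z", "S")


@dataclass(frozen=True)
class AZero:
    pass


@dataclass(frozen=True)
class AOne:
    bs: Bits


@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str


@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: tuple  # the alternatives, in order


@dataclass(frozen=True)
class ASeq:
    bs: Bits
    r1: object
    r2: object


@dataclass(frozen=True)
class AStar:
    bs: Bits
    r: object


def zeroable(r):
    """Syntactic test for 'equivalent to ZERO', following the clauses above."""
    if isinstance(r, AZero):
        return True
    if isinstance(r, AAlts):
        # Assumption: the empty alternative counts as zeroable (all() of nothing).
        return all(zeroable(x) for x in r.rs)
    if isinstance(r, ASeq):
        return zeroable(r.r1) or zeroable(r.r2)
    return False  # AOne, AChar and AStar


a = AChar((), "a")
dead_alt = AAlts((), (AZero(), ASeq((), AZero(), a)))
live_alt = AAlts((), (AZero(), a))
```

Here \texttt{zeroable(dead\_alt)} holds because every alternative is zeroable, while \texttt{zeroable(live\_alt)} does not.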
\noindent
They suggested that the $\simpsulz $ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{sulzSimp}$:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{sulzSimp} \; r$ & $\dn$ & 
		$\textit{while}((\simpsulz  \; r)\; \cancel{=} \; r)$ \\
		& & $\quad r := \simpsulz  \; r$\\
		& & $\textit{return} \; r$
	\end{tabular}
\end{center}
We call the operation of alternately 
applying derivatives and simplifications
(until the string is exhausted) the Sulz-simp-derivative,
written $\backslash_{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
	$r \backslash_{sulzSimp} (c\!::\!s) $ & $\dn$ & $(\textit{sulzSimp} \; (r \backslash c)) \backslash_{sulzSimp}\, s$ \\
$r \backslash_{sulzSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
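The control structure of $\textit{sulzSimp}$ and of $\backslash_{sulzSimp}$ can be sketched independently of the regular expression datatype. In the following Python sketch the one-step simplification and the derivative are passed in as parameters; the function names and the integer stand-ins used in the example are purely illustrative assumptions and have nothing to do with regular expressions beyond exercising the two loops.

```python
def sulz_simp(r, simp_step):
    """Apply simp_step repeatedly until it no longer changes r (a fixpoint)."""
    while True:
        r_next = simp_step(r)
        if r_next == r:
            return r
        r = r_next


def ders_sulz_simp(r, s, der, simp_step):
    """Alternate one derivative step with fixpoint simplification over s."""
    for c in s:
        r = sulz_simp(der(r, c), simp_step)
    return r


# Toy stand-ins (assumptions for illustration only): "regular expressions"
# are integers, a "derivative" multiplies by the digit read, and one
# "simplification" step halves an even non-zero number.
def halve(n):
    return n // 2 if n % 2 == 0 and n != 0 else n


def times(r, c):
    return r * int(c)


fixpoint = sulz_simp(40, halve)                    # 40 -> 20 -> 10 -> 5
result = ders_sulz_simp(5, "48", times, halve)     # 5 -> 20 -> 5 -> 40 -> 5
```

Note that termination of \texttt{sulz\_simp} depends entirely on the step function eventually reaching a fixpoint, and the number of iterations per derivative step is not bounded by the loop itself.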
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as in $\blexer$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_sulzSimp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{sulzSimp}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala, 
and found that the size of the final derivative
grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={size},
    ymode = log,
    legend entries={Final Derivative Size},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture} 
\caption{Size of the final derivative when lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{$n$ \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n= 20$ we already get an out-of-memory error with Scala's normal 
JVM heap size settings.
In fact their simplification does not improve over
the simple-minded simplifications we have shown in \ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={time},
    ymode = log,
    legend entries={time in secs},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture} 
\caption{Time required for lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{$n$ \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
This appears to be a counterexample to 
their linear complexity claim:
\begin{quote}\it
Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. 
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
would stay below a finite constant is not true.
In addition, even if the sizes of the regular expressions
did stay finite, one would have to take into account that
the $\simpsulz$ function is applied many times
in each derivative step, and that number is not necessarily
a constant with respect to the size of the regular expression.
To not get ``caught off guard'' by
such counterexamples,
one needs to be more careful when designing the
simplification function and making claims about it.

\section{Our $\textit{Simp}$ Function}
We now introduce our simplification function
by contrasting it with $\simpsulz$.
We describe
the ideas behind the components of their algorithm 
and why they fail to achieve the desired effect, followed
by our solution. Our solutions come with correctness
statements that are backed up by formal proofs.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
is that it allows
duplicate removal of regular expressions at different
levels.
For example, this would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
The problem here is that only the head element
is ``spilled out'',
whereas we would want to flatten
an entire list to open up possibilities for further simplifications.\\
Not flattening the rest of the elements also means that
the later de-duplication process 
does not fully remove apparent duplicates.
For example,
using $\simpsulz$ we could not 
simplify
\begin{center}
$((a^* a^*)+ \underline{(a^* + a^*)})\cdot (a^*a^*)^*+
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
because the underlined nested alternative is not the first element
of its enclosing alternative.\\
We define a flatten operation that flattens not only 
the first regular expression of an alternative,
but the entire list: 
 \begin{center}
  \begin{tabular}{@{}lcl@{}}
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) 
\end{tabular}    
\end{center}  
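The three clauses of $\textit{flts}$ can be read as a single left-to-right pass over the list. The following Python sketch assumes a hypothetical, cut-down encoding of annotated regular expressions (only $\ZERO$, characters and alternatives, with bitcodes as tuples of strings) and assumes that $\textit{fuse}$ prepends a bitcode to the top-level bitcode of a regular expression and leaves $\ZERO$ unchanged; the empty list is mapped to the empty list, which is an assumption added here as the base case.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Bits = Tuple[str, ...]  # a bitcode, e.g. ("Z", "S")


@dataclass(frozen=True)
class AZero:
    pass


@dataclass(frozen=True)
class AChar:
    bs: Bits
    c: str


@dataclass(frozen=True)
class AAlts:
    bs: Bits
    rs: tuple  # the alternatives, in order


def fuse(bs, r):
    """Prepend bs to the top-level bitcode of r (assumed behaviour of fuse)."""
    if isinstance(r, AZero):
        return r
    return replace(r, bs=bs + r.bs)


def flts(rs):
    """Flatten nested alternatives one level deep and drop ZEROs."""
    out = []
    for r in rs:
        if isinstance(r, AAlts):
            # Spill the children into the surrounding list, keeping the
            # bitcode of the nested alternative by fusing it onto each child.
            out.extend(fuse(r.bs, x) for x in r.rs)
        elif isinstance(r, AZero):
            continue
        else:
            out.append(r)
    return out


a = AChar((), "a")
b = AChar(("Z",), "b")
nested = AAlts(("S",), (a, b))
flat = flts([AZero(), nested, a])
```

In contrast to the corresponding clause of $\simpsulz$, the nested alternative is spilled out here even though it is not the head of the list.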
\noindent
Our $\flts$ operation 
also throws away $\ZERO$s
as they do not contribute to a lexing result.
\subsection{Duplicate Removal}
After flattening is done, we are ready to de-duplicate.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's algorithm.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow} 
rs_{flat} 
\xrightarrow{\distinctBy \; 
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \; 
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
where the $\distinctBy$ function is defined as:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
		$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & $\quad \textit{if} (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
						       & & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
	\end{tabular}
\end{center}
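The definition of $\distinctBy$ amounts to a left-to-right traversal that keeps the first element for each image under $f$. The following Python sketch is an iterative rendering of the two clauses; the function name \texttt{distinct\_by} and the example data are illustrative assumptions. In the example each element is a pair of a bitcode and a plain regular expression written as a string, and projecting to the second component serves as a stand-in for erasing the bitcodes.

```python
def distinct_by(xs, f, acc=frozenset()):
    """Keep the first element of xs for each value of f not already in acc."""
    seen = set(acc)
    out = []
    for x in xs:
        key = f(x)
        if key in seen:
            continue  # an element with the same image under f occurred earlier
        seen.add(key)
        out.append(x)
    return out


# Stand-in data (an assumption for illustration): (bitcode, erased regex).
rs_flat = [("ZSZ", "a*a*"), ("SZZ", "a*a*"), ("S", "a*")]


def erase_bits(pair):
    return pair[1]


rs_distinct = distinct_by(rs_flat, erase_bits)
```

Because the traversal runs from left to right, the leftmost of several elements with the same image under $f$ is the one that survives; here the element carrying the bitcode \texttt{SZZ} is dropped.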
\noindent
The reason we define a distinct function under a mapping $f$ is that
we want to eliminate regular expressions that are syntactically the same
but carry different bitcodes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match 
(and therefore is definitely not POSIX),
and will be discarded by $\bmkeps$ later.
\begin{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   289
	$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; times\quad}\underbrace{a^*}_{Z: \;match\; 1 \;times} + 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   290
	_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   291
	$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   292
\end{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   293
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard 
are always placed further to the left. When we 
traverse the list from left to right,
a regular expression that duplicates one we have already seen
will therefore definitely not contribute to a POSIX value,
even if it carries a different bitcode.
Such duplicates can therefore be removed.
To achieve this, we use $\rerases$ as the function $f$ during the distinction
operation.\\
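As a small sketch of how this works on the example above, write $r_e$ for the
common value of $\rerases$ on both elements (the name $r_e$ is introduced only for
this illustration), and assume the base case $\distinctBy \; [] \; f \; acc = []$.
Unfolding the definition of $\distinctBy$ then gives:
\begin{center}
	\begin{tabular}{cl}
		& $\distinctBy \; [{}_{ZSZ}a^*a^*,\; {}_{SZZ}a^*a^*] \; \rerases \; \varnothing$\\
		$=$ & ${}_{ZSZ}a^*a^* :: (\distinctBy \; [{}_{SZZ}a^*a^*] \; \rerases \; \{r_e\})$\\
		$=$ & ${}_{ZSZ}a^*a^* :: (\distinctBy \; [] \; \rerases \; \{r_e\})$\\
		$=$ & $[{}_{ZSZ}a^*a^*]$
	\end{tabular}
\end{center}
\noindent
The first element is kept because $r_e \notin \varnothing$; the second is dropped
because $r_e \in \{r_e\}$, even though its bitcode differs.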
$\rerases$ is very similar to $\erase$, except that it preserves the structure
when erasing an alternative regular expression.
We use $\rerases$ instead of $\erase$ because
it keeps the list structure of alternative 
annotated regular expressions,
whereas $\erase$ would turn them back into binary alternatives.
Not having to change the structure 
greatly simplifies the finiteness proof in Chapter~\ref{Finite}
(we will follow up with more details there).
We give the definition of $\rerases$ here together with
the new datatype $\rrexp$ used by $\rerases$ (our plain
regular expression datatype does not allow non-binary alternatives).
For the moment the reader can just think of 
$\rerases$ as $\erase$ and of $\rrexp$ as plain regular expressions.
\begin{figure}[H]
\begin{center}	
	$\rrexp ::=   \RZERO \mid  \RONE
			 \mid  \RCHAR{c}  
			 \mid  \RSEQ{r_1}{r_2}
			 \mid  \RALTS{rs}
			 \mid \RSTAR{r}        $
\end{center}
\caption{$\rrexp$: plain regular expressions, but with $\sum$ alternative 
constructor}\label{rrexpDef}
\end{figure}
The notation for $\rerases$ follows that of $\erase$,
which is a postfix operator written as a subscript,
except that it has an \emph{r} attached to distinguish it from $\erase$:
\begin{center}
\begin{tabular}{lcl}
$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
\end{tabular}
\end{center}
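For instance, for an alternative with three character children (an example made up
for illustration), the definition above unfolds to a flat list:
\begin{center}
	$\rerase{_{bs}\sum [{}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{b},\; {}_{bs_3}\mathbf{c}]} \;=\; \RALTS{[\RCHAR{a},\; \RCHAR{b},\; \RCHAR{c}]}$
\end{center}
\noindent
In contrast, $\erase$ would produce a nested binary alternative for the same input,
for example $a + (b + c)$ if we assume that it nests to the right.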
\subsection{Putting Things Together}
A recursive definition of our simplification function 
is given below:
%that looks somewhat similar to our Scala code is 
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{AALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; \textit{bsimp} \; \textit{as})) \; \rerases \; \varnothing) $ \\
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
\end{tabular}    
\end{center}    
\noindent
The simplification function (named $\textit{bsimp}$ for \emph{b}it-coded) 
pattern-matches on the regular expression.
When it detects that the regular expression is an alternative or
a sequence, it first simplifies the children regular expressions
recursively and then checks whether one of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
For sequences, the current-level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \; \textit{match}$\\
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \;  (_{bs_1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ 
	\end{tabular}
\end{center}
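The following instances are a sketch of how the cases of $\textit{bsimp}_{ASEQ}$ apply.
The characters $c$ and $d$ and the bitcodes $bs_1$ and $bs_2$ are chosen only for illustration,
and the last step of the second line assumes that $\textit{fuse}$ prepends its bitcode
argument to the bitcode at the top of the regular expression:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{bsimp}_{ASEQ} \; bs \; \ZERO \; ({}_{bs_2}\mathbf{d})$ & $=$ & $\ZERO$\\
		$\textit{bsimp}_{ASEQ} \; bs \; ({}_{bs_1}\ONE) \; ({}_{bs_2}\mathbf{d})$ & $=$ & $\textit{fuse} \; (bs @ bs_1) \; ({}_{bs_2}\mathbf{d}) \;=\; {}_{bs @ bs_1 @ bs_2}\mathbf{d}$\\
		$\textit{bsimp}_{ASEQ} \; bs \; ({}_{bs_1}\mathbf{c}) \; ({}_{bs_2}\mathbf{d})$ & $=$ & ${}_{bs}({}_{bs_1}\mathbf{c} \cdot {}_{bs_2}\mathbf{d})$
	\end{tabular}
\end{center}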
\noindent
The most involved part is the $\sum$ clause, where we first call $\flts$ on
the list of simplified children $\textit{map}\; \textit{bsimp}\; \textit{as}$,
and then call $\distinctBy$ on the resulting list, where two 
elements $r_1$ and $r_2$ are considered the same if $\rerases \; r_1 = \rerases\; r_2$.
Finally, depending on whether the regular expression list $as'$ has turned into a
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
decides whether to keep the current-level constructor $\sum$ as it is or to
remove it when there are fewer than two elements:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse} \; bs \; a$ \\
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
	\end{tabular}
\end{center}
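To see how the pieces of the $\sum$ clause fit together, consider the input
${}_{bs}\sum [\ZERO,\; {}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}]$ (an example chosen for
illustration). The sketch below assumes that $\flts$ removes $\ZERO$ from the list and that
$\textit{fuse}$ prepends its bitcode argument:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{map} \; \textit{bsimp} \; [\ZERO,\; {}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}]$ & $=$ & $[\ZERO,\; {}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}]$\\
		$\flts \; [\ZERO,\; {}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}]$ & $=$ & $[{}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}]$\\
		$\distinctBy \; [{}_{bs_1}\mathbf{a},\; {}_{bs_2}\mathbf{a}] \; \rerases \; \varnothing$ & $=$ & $[{}_{bs_1}\mathbf{a}]$\\
		$\textit{bsimp}_{AALTS} \; bs \; [{}_{bs_1}\mathbf{a}]$ & $=$ & $\textit{fuse} \; bs \; ({}_{bs_1}\mathbf{a}) \;=\; {}_{bs @ bs_1}\mathbf{a}$
	\end{tabular}
\end{center}
\noindent
Hence $\textit{bsimp}$ maps the whole alternative to the single character ${}_{bs @ bs_1}\mathbf{a}$.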
Having defined the $\bsimp$ function,
we add it as a phase after each derivative is taken,
so that the derivative stays small:
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
	\end{tabular}
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications 
%with derivatives:%\comment{simp in  the [] case?}
We extend this from character to string:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
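For example, unfolding this definition for a two-character string $[c_1, c_2]$
(a sketch; the names $c_1$ and $c_2$ are placeholders) gives
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{bsimps} [c_1, c_2]$ & $=$ & $(r \backslash_{bsimp}\, c_1) \backslash_{bsimps}\, [c_2]$\\
		& $=$ & $((r \backslash_{bsimp}\, c_1) \backslash_{bsimp}\, c_2) \backslash_{bsimps}\, [\,]$\\
		& $=$ & $\textit{bsimp}\,((\textit{bsimp}\,(r \backslash c_1)) \backslash c_2)$
	\end{tabular}
\end{center}
\noindent
so that a simplification is applied after every single derivative step rather than once at the end.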
\noindent
The lexer that extracts bitcodes from the 
derivatives simplified by our $\bsimp$ function
is called $\blexersimp$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
This algorithm keeps the sizes of the intermediate regular expressions small.

\subsection{$(a+aa)^*$ and $(a^*\cdot a^*)^*$  against 
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
For example,
with our simplification, the derivatives of the
previous $(a^*a^*)^*$ example,
where $\simpsulz$ could not
stop the fast growth (over
3 million nodes at an input length just below $20$),
are reduced to just 15 nodes, and this size stays constant no matter how long the
input string is.
This is demonstrated in the graphs below.
\begin{figure}[H]
\begin{center}
\begin{tabular}{ll}
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
        width=7cm,
    height=4cm, 
    legend entries={Lexer with $\textit{bsimp}$},  
    legend pos=  south east,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
\end{axis}
\end{tikzpicture} %\label{fig:BitcodedLexer}
&
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
    width = 7cm,
    height = 4cm,
    legend entries={Lexer with $\simpsulz$},  
    legend pos=  north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
\end{axis}
\end{tikzpicture} 
\end{tabular}
\end{center}
\caption{Our Improvement over Sulzmann and Lu's in terms of size}
\end{figure}
\noindent
Given the size difference, it is not
surprising that our $\blexersimp$ significantly outperforms
$\textit{blexer\_sulzSimp}$.
In the next section we establish the
first important property of our lexer: its correctness.
%----------------------------------------------------------------------------------------
%	SECTION rewrite relation
%----------------------------------------------------------------------------------------
\section{Correctness of $\blexersimp$}
In this section we give details
of the correctness proof of $\blexersimp$,
an important contribution of this thesis.\\
We first introduce the rewriting relation \emph{rrewrite}
($\rrewrite$) between two regular expressions,
which expresses an atomic
simplification step from the left-hand side
to the right-hand side.
We then prove properties about
this rewriting relation and its reflexive transitive closure.
Finally we leverage these properties to show
an equivalence between the internal data structures of 
$\blexer$ and $\blexersimp$.
\subsection{The Rewriting Relation $\rrewrite$ ($\rightsquigarrow$)}
In the correctness proof of $\blexer$, we
did not directly derive the fact that $\blexer$ produces the POSIX value,
but first proved that $\blexer$ is linked with $\lexer$.
We then re-used
the correctness of $\lexer$
to obtain
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
\end{center}
Here we apply this
modularised technique again
by first proving that
$\blexersimp \; r \; s $ 
produces the same output as $\blexer \; r\; s$,
and then combining this with 
the correctness of $\blexer$ to obtain our main
theorem:\footnote{The case when 
$s$ is not in $L \; r$ is routine to establish.}
\begin{center}
	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = v$
\end{center}
\noindent
The overall idea for the proof
of $\blexer \;r \;s = \blexersimp \; r \;s$ 
is that the transition from $r$ to $\textit{bsimp}\; r$ can be
broken down into finitely many rewrite steps:
\begin{center}
	$r \rightsquigarrow^* \textit{bsimp} \; r$
\end{center}
where each rewrite step, written $\rightsquigarrow$,
is an ``atomic'' simplification that
cannot be broken down any further:
\begin{figure}[H]
\begin{mathpar}
	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\

	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}

	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\

	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}

	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}

	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow _{bs}\sum rs_2}

	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}

	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }

	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}

	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}

	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rs_b) }

	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}

\end{mathpar}
\caption{
The rewrite rules that generate simplified regular expressions 
in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions 
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for 
lists of bitcoded regular expressions. 
Of particular interest is the $LD$ rule, which allows copies of regular expressions 
to be removed provided a regular expression 
earlier in the list can match the same strings.
}\label{rrewriteRules}
\end{figure}
\noindent
Rules such as $LT$ and $LH$ rewrite between two lists of regular expressions
such that one regular expression
in the left-hand-side list is rewritable in one step
to the regular expression at the same position in the right-hand-side list.
This helps with defining the ``context rules'' such as $AL$.\\
The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
are defined in the usual way:
\begin{figure}[H]
	\centering
\begin{mathpar}
	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}

	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}

	\inferrule{r_1 \rightsquigarrow^*  r_2 \land \; r_2 \rightsquigarrow r_3}{r_1 \rightsquigarrow^* r_3\\}

	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{s}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
\end{mathpar}
\caption{The Reflexive Transitive Closure of 
$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
\end{figure}
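\noindent
As a small illustration of the $LD$ rule (the concrete list is our own example; we assume that a bitcoded character is written ${}_{bs}\mathbf{a}$ and that $\rerase{\cdot}$ discards the bitcode annotations), take
$rs_a = []$, $a_1 = {}_{bs_1}\mathbf{a}$, $rs_b = [r]$, $a_2 = {}_{bs_2}\mathbf{a}$ and $rs_c = []$.
The two copies of $\mathbf{a}$ differ only in their bitcodes, so $\rerase{a_1} = \rerase{a_2}$ and $LD$ gives
\begin{center}
	$[{}_{bs_1}\mathbf{a},\; r,\; {}_{bs_2}\mathbf{a}] \stackrel{s}{\rightsquigarrow} [{}_{bs_1}\mathbf{a},\; r]$
\end{center}
\noindent
The earlier copy is kept and the later one is discarded, so the bitcodes $bs_1$ survive the step.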
Two terms related by a rewriting step remain rewritable to each other
(possibly in several steps) after a derivative is taken:
\begin{center}
	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
\end{center}
And finally, if a nullable term can be rewritten to another term,
then both produce the same bitcodes:
\begin{center}
	$(r \rightsquigarrow^* r' \land \bnullable \; r) \implies \bmkeps \; r = \bmkeps \; r'$
\end{center}
The decoding phase of $\blexer$ and $\blexersimp$
is the same, which means that if they obtain the same
bitcodes before the decoding phase,
they obtain the same value after decoding.
We will prove the three properties 
mentioned above in the next subsection.
\subsection{Important Properties of $\rightsquigarrow$}
First we prove some basic facts 
about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
which will be needed later.\\
The inference rules in Figure \ref{rrewriteRules}, which we 
gave in the previous section, 
have ``many-step'' versions:

\begin{lemma}\label{squig1}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies {}_{bs} \sum rs_1 \rightsquigarrow^* {}_{bs} \sum rs_2$
		\item
			$r \rightsquigarrow^* r' \implies {}_{bs} \sum (r :: rs)\; \rightsquigarrow^*\;  {}_{bs} \sum (r' :: rs)$

		\item
			Rewriting in many steps is compatible 
			with the sequence constructor:\\
			$r_1 \rightsquigarrow^* r_2 
			\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \;  
			{}_{bs} r_2 \cdot r_3 \quad $ 
			and 
			$\quad r_3 \rightsquigarrow^* r_4 
			\implies {}_{bs} r_1 \cdot r_3 \rightsquigarrow^* \; {}_{bs} r_1 \cdot r_4$
		\item
			The many-step rewriting relations 
			$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$ 
			are preserved under the function $\fuse$:\\
				$r_1 \rightsquigarrow^* r_2 
				\implies \fuse \; bs \; r_1 \rightsquigarrow^* \; 
				\fuse \; bs \; r_2 \quad  $ and 
				$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
				\implies \map \; (\fuse \; bs) \; rs_1 
				\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on 
	the inductive cases of $\stackrel{s*}{\rightsquigarrow}$ and $\rightsquigarrow^*$ respectively.
	The third and fourth points follow from
	the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow \fuse \; bs \; r_2$ and
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 
	\implies \map \; (\fuse \; bs) \; rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
	which can be proven by induction on the inductive cases of $\rightsquigarrow$ and 
	$\stackrel{s}{\rightsquigarrow}$.
\end{proof}
\noindent
The inference rules of $\stackrel{s}{\rightsquigarrow}$
are defined in terms of the list cons operation. Here
we establish that the 
$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
relations are also preserved w.r.t.\ appending and prepending of a list.
In addition, we
also prove some relations 
between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
\begin{lemma}\label{ssgqTossgs}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$

		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies 
			rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
			\textit{and} \; \;
			rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$

		\item
			The $\stackrel{s}{\rightsquigarrow} $ relation after appending 
			a list becomes $\stackrel{s*}{\rightsquigarrow}$:\\
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
		\item

			$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
		\item

			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
			r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
		\item			
			If we can rewrite a regular expression 
			in many steps to $\ZERO$, then 
			we can also rewrite any sequence starting with it to $\ZERO$:\\
			$r_1 \rightsquigarrow^* \ZERO 
			\implies {}_{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part is by induction on the list $rs$.
	The second part is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$.
	The third part is 
	by rule induction on $\stackrel{s}{\rightsquigarrow}$.
	The fourth part is 
	by rule induction on 
	$\rightsquigarrow^*$, using parts one to three. 
	The fifth part is a corollary of part four.
	The last part is proven by rule induction again on $\rightsquigarrow^*$.
\end{proof}
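\noindent
To illustrate how these parts combine, the following sketch shows how one obtains
$r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$ from
$r_1 \rightsquigarrow^* r_2$ and $rs_3 \stackrel{s*}{\rightsquigarrow} rs_4$
(we read $r :: rs$ as $[r] @ rs$; the annotations refer to the parts of Lemma \ref{ssgqTossgs}):
\begin{center}
	\begin{tabular}{lcll}
		$r_1 :: rs_3$ & $=$ & $[r_1] @ rs_3$ & \\
		& $\stackrel{s*}{\rightsquigarrow}$ & $[r_2] @ rs_3$ & (part four, then part two: appending $rs_3$)\\
		& $\stackrel{s*}{\rightsquigarrow}$ & $[r_2] @ rs_4$ & (part two: prepending $[r_2]$)\\
		& $=$ & $r_2 :: rs_4$ &
	\end{tabular}
\end{center}
\noindent
The two many-step rewritings are joined by transitivity of $\stackrel{s*}{\rightsquigarrow}$.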
\noindent
We are now ready to prove the following properties:
\begin{itemize}
	\item
		$(r \rightsquigarrow^* r'\land \bnullable \; r) 
		\implies \bmkeps \; r = \bmkeps \; r'$.
	\item
		$r \rightsquigarrow^* \textit{bsimp} \;r$.
	\item
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.
\end{itemize}
These properties work together to establish the correctness theorem.
\subsubsection{Property 1: $(r \rightsquigarrow^* r'\land \bnullable \; r) 
		\implies \bmkeps \; r = \bmkeps \; r'$}
Intuitively, this property says that we can 
extract the same bitcodes using $\bmkeps$ from
two nullable regular expressions $r$ and $r'$
if we can rewrite one into the other in finitely
many steps.\\
For convenience, 
we define a predicate that holds for a list of regular expressions
containing at least one nullable regular expression:
\begin{center}
	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
\end{center}
\noindent
The rewriting relation $\rightsquigarrow$ preserves nullability:
\begin{lemma}\label{rewritesBnullable}
	\hspace{0em}
	\begin{itemize}
		\item
			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
			\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
		\item 	
			$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2, \;
			\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
		\item
			$r_1 \rightsquigarrow^* r_2 
			\implies \bnullable \; r_1 = \bnullable \; r_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first two points are by rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
	The third point follows from the first by rule induction on $\rightsquigarrow^*$.
\end{proof}
\noindent
For convenience again,
we define $\bmkepss$ on a list $rs$,
which extracts the bitcodes of the first $\bnullable$ element in $rs$:
\begin{center}
	\begin{tabular}{lcl}
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
		$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\; 
		\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
	\end{tabular}
\end{center}
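\noindent
For example (an illustrative unfolding of the definition, where $r_1$, $r_2$ and $r_3$ are arbitrary regular expressions, $r_1$ is assumed not to be nullable and $r_2$ is assumed to be nullable):
\begin{center}
	\begin{tabular}{lcl}
		$\bmkepss \; [r_1, r_2, r_3]$ & $=$ & $\bmkepss \; [r_2, r_3]$\\
		& $=$ & $\bmkeps \; r_2$
	\end{tabular}
\end{center}
\noindent
The element $r_3$ is never inspected, regardless of whether it is nullable.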
\noindent
If both regular expressions in a rewriting relation are nullable, then they 
produce the same bitcodes:
\begin{lemma}\label{rewriteBmkepsAux}
	\hspace{0em}
	\begin{itemize}
		\item
			$r_1 \rightsquigarrow r_2 \implies 
			(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
			\bmkeps \; r_2)$ 
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
			\bmkepss \; rs_1 = \bmkepss \; rs_2)$
	\end{itemize}
\end{lemma}
\begin{proof}
	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$
	and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$.
\end{proof}
\noindent
With Lemma \ref{rewriteBmkepsAux} we are ready to prove its
many-step version: 
\begin{lemma}
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
\end{lemma}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow} $.
	Lemma \ref{rewritesBnullable} tells us that both $r$ and $r'$ are nullable,
	and Lemma \ref{rewriteBmkepsAux} solves the inductive case.
\end{proof}

\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
Now we get to the ``meaty'' part of the proof, 
which says that our simplification's helper functions 
such as $\distinctBy$ and $\flts$ conform to 
the $\stackrel{s*}{\rightsquigarrow}$ and 
$\rightsquigarrow^* $ rewriting relations.\\
The first lemma to prove is a more general version of 
$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$:
\begin{lemma}
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
\end{lemma}
\noindent
It says that for a list made of two parts $rs_1 @ rs_2$, 
one can throw away the duplicate
elements in $rs_2$, as well as those that have appeared in $rs_1$.
\begin{proof}
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
\end{proof}
\noindent
Setting $rs_1$ to be empty (and then renaming $rs_2$ to $rs_1$),
we get the corollary
\begin{corollary}\label{dBPreserves}
	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$.
\end{corollary}
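\noindent
Spelled out, the corollary is the instance of the preceding lemma in which the first list is empty (a sketch, writing $rs$ for the remaining list and assuming that $\map \; \rerases \; []$ yields the empty accumulator $\phi$):
\begin{center}
	\begin{tabular}{lcl}
		$rs$ & $=$ & $[] @ rs$\\
		& $\stackrel{s*}{\rightsquigarrow}$ & $[] @ (\distinctBy \; rs \; \rerases \; (\map \; \rerases \; []))$\\
		& $=$ & $\distinctBy \; rs \; \rerases \; \phi$
	\end{tabular}
\end{center}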
\noindent
The flatten function $\flts$ conforms to
$\stackrel{s*}{\rightsquigarrow}$ as well:

\begin{lemma}\label{fltsPreserves}
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
\end{lemma}
\begin{proof}
	By induction on $rs$.
\end{proof}
\noindent
The function $\bsimpalts$ preserves rewritability:
\begin{lemma}\label{bsimpaltsPreserves}
	${}_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; rs$
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   839
\noindent
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   840
The simplification function
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   841
$\textit{bsimp}$ only transforms the regex $r$ using steps specified by 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   842
$\rightsquigarrow^*$ and nothing else.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   843
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   844
	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   845
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   846
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   847
	By an induction on $r$.
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   848
	The most involved case would be the alternative, 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   849
	where we use lemmas \ref{bsimpaltsPreserves},
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   850
	\ref{fltsPreserves} and \ref{dBPreserves} to do a series of rewriting:\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   851
	\begin{center}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   852
		\begin{tabular}{lcl}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   853
			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   854
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   855
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   856
			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   857
		\end{tabular}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   858
	\end{center}
	Using this we derive the following rewrite relation:
	\begin{center}
		\begin{tabular}{lcl}
			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
			    & $\rightsquigarrow^*$ & $\ldots$ \\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; 
			    (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) 
			    \; \rerases \; \phi)$\\[1.5ex]
			    %& $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \; 
				%(\flts \; (\map \; \textit{bsimp}\; rs)) \; \;
				%\rerases \; \;\phi) $\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
		\end{tabular}
	\end{center}	
\end{proof}
\subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
The rewritability relation 
$\rightsquigarrow$ is preserved under derivatives;
it is just that we might need multiple steps 
where originally only one step was needed:
\begin{lemma}\label{rewriteBder}
	\hspace{0em}
	\begin{itemize}
		\item
			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
			\rightsquigarrow^*  r_2 \backslash c$.
		\item	
			If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $ 
			\map \; (\_\backslash c) \; rs_1 
			\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$.
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on $\rightsquigarrow$ 
	and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
\end{proof}
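\noindent
To see why several steps may be needed, consider the following sketch.
It assumes that the rewrite rules include
$_{bs}(r_1 \cdot \ZERO) \rightsquigarrow \ZERO$,
$\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs$ and
$_{bs}\sum [] \rightsquigarrow \ZERO$,
and that the derivative of an annotated sequence with a nullable first
component has the shape used in the first line; these shapes are our reading
of the earlier definitions and are not restated here.
Take the single step $_{bs}(r_1 \cdot \ZERO) \rightsquigarrow \ZERO$
with $\bnullable \; r_1$. After taking the derivative with respect to $c$
we need four steps to reach $\ZERO \backslash c = \ZERO$:
\begin{center}
	\begin{tabular}{lcl}
		$(_{bs}(r_1 \cdot \ZERO)) \backslash c$ 
		& $=$ & $_{bs}\sum [\,_{[]}((r_1 \backslash c) \cdot \ZERO), \;
		\fuse \; (\bmkeps \; r_1) \; (\ZERO \backslash c)\,]$\\
		& $=$ & $_{bs}\sum [\,_{[]}((r_1 \backslash c) \cdot \ZERO), \; \ZERO\,]$\\
		& $\rightsquigarrow$ & $_{bs}\sum [\ZERO, \; \ZERO]$\\
		& $\rightsquigarrow$ & $_{bs}\sum [\ZERO]$\\
		& $\rightsquigarrow$ & $_{bs}\sum []$\\
		& $\rightsquigarrow$ & $\ZERO$\\
	\end{tabular}
\end{center}
The first rewrite step applies the sequence rule inside the list, the next two
remove a $\ZERO$ from the list, and the last one collapses the empty alternative.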
\noindent
Now we can prove property 3 as an immediate corollary:
\begin{corollary}\label{rewritesBder}
	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
	r_2 \backslash c$.
\end{corollary}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow}$, using Lemma \ref{rewriteBder}.
\end{proof}
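\noindent
Spelled out, the rule induction has two cases. This is a sketch, assuming
$\rightsquigarrow^*$ is the reflexive-transitive closure of $\rightsquigarrow$
in which a chain is extended by one step on the right:
\begin{itemize}
	\item
		Reflexivity: from $r \rightsquigarrow^* r$ we need 
		$r \backslash c \rightsquigarrow^* r \backslash c$, 
		which holds again by reflexivity.
	\item
		Step: from $r_1 \rightsquigarrow^* r_2$ and $r_2 \rightsquigarrow r_3$
		we have $r_1 \backslash c \rightsquigarrow^* r_2 \backslash c$ 
		by the induction hypothesis and 
		$r_2 \backslash c \rightsquigarrow^* r_3 \backslash c$ 
		by Lemma \ref{rewriteBder}. Transitivity of $\rightsquigarrow^*$ gives
		$r_1 \backslash c \rightsquigarrow^* r_3 \backslash c$.
\end{itemize}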
\noindent
This can be extended and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
to obtain the rewritability between
the intermediate derivative regular expressions
of $\blexer$ and $\blexersimp$:
\begin{lemma}\label{bderBderssimp}
	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
\end{lemma}
\begin{proof}
	By an induction on $s$.
\end{proof}
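\noindent
The inductive step can be sketched as follows. We assume here that the
induction proceeds from the right end of the string, and that the simplifying
derivative satisfies
$\bderssimp{a}{(s \,@\, [c])} = \textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$,
which is how we read its definition. 
The base case $s = []$ holds by reflexivity, since both sides are $a$.
\begin{center}
	\begin{tabular}{lcll}
		$a \backslash (s \,@\, [c])$ 
		& $=$ & $(a \backslash s) \backslash c$ & \\
		& $\rightsquigarrow^*$ & $(\bderssimp{a}{s}) \backslash c$ 
		& (induction hypothesis and Corollary \ref{rewritesBder})\\
		& $\rightsquigarrow^*$ & $\textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$ 
		& (by $r \rightsquigarrow^* \textit{bsimp} \; r$)\\
		& $=$ & $\bderssimp{a}{(s \,@\, [c])}$ & \\
	\end{tabular}
\end{center}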
\subsection{Main Theorem}
With Lemma \ref{bderBderssimp} in place, we are ready for the main theorem.
\begin{theorem}
	$\blexer \; r \; s = \blexersimp \; r \; s$
\end{theorem}
\noindent
\begin{proof}
	By Lemma \ref{bderBderssimp}, the derivative regular expression
	computed by the original lexer rewrites in many steps to the one
	computed by the lexer with simplification:
	\begin{center}
		$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
	\end{center}
	Hence we know that they produce the same bits, if the lexing result is a match:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
	\end{center}
	Since they produce the same bits, they also give the same value after decoding:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
	\end{center}
	This is equivalent to our proof goal:
	\begin{center}
		$\blexer \; r \; s = \blexersimp \; r \; s$.
	\end{center}	
\end{proof}
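\noindent
The last step of the proof relies on both lexers having the same shape.
As a sketch, assuming the definitions given earlier, with $a$ standing for the
annotated version of $r$ and $\textit{None}$ for a failed match, they unfold to
\begin{center}
	\begin{tabular}{lcl}
		$\blexer \; r \; s$ & $=$ & 
		$\textit{if} \; \bnullable \; (a \backslash s) \;
		\textit{then} \; \decode \; r \; (\bmkeps \; (a \backslash s)) \;
		\textit{else} \; \textit{None}$\\[1.5ex]
		$\blexersimp \; r \; s$ & $=$ & 
		$\textit{if} \; \bnullable \; (\bderssimp{a}{s}) \;
		\textit{then} \; \decode \; r \; (\bmkeps \; (\bderssimp{a}{s})) \;
		\textit{else} \; \textit{None}$\\
	\end{tabular}
\end{center}
The two right-hand sides agree in the nullable case by the equation on decoded bits.
In the other case both are $\textit{None}$, provided that $\bnullable$
gives the same answer on $a \backslash s$ and $\bderssimp{a}{s}$.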
\noindent
We can link this result with the lemma we proved earlier, namely
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$
\end{center}
and obtain as a corollary that the bit-coded lexer with simplification
indeed outputs the POSIX lexing result, if such a result exists.
\begin{corollary}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = v$
\end{corollary}

\subsection{Comments on the Proof Techniques Used}
Straightforward and simple as the proof may seem,
the effort we spent obtaining it was far from trivial.\\
We initially attempted to re-use the argument 
in \cref{flex_retrieve}. 
The problem was that both functions $\inj$ and $\retrieve$ require 
that the annotated regular expressions stay unsimplified, 
so that one can 
correctly compare $v_{i+1}$, $r_i$ and $v_i$ 
in diagram \ref{graph:inj} and 
``fit the key into the lock hole''.

\noindent
We also tried to prove 
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) = 
\textit{bsimp} \;\;  (a\backslash s)$,
\end{center}
but this turns out not to be true.
A counterexample would be
\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\; 
	\text{and} \;\; s = bb.
\]
\noindent
Then we would have 
\begin{center}
	$\textit{bsimp}\;\; ( a \backslash s ) =
	\,_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
\end{center}
\noindent
whereas 
\begin{center}
	$\textit{bsimp} \;\;( \bderssimp{a}{s} ) =  
	\,_{Z}(_{Z} \ONE + _{S} c)$.
\end{center}
Unfortunately, 
if we apply $\textit{bsimp}$ at different points of the computation,
we will always have this kind of discrepancy. 
This is due to 
the $\map \; (\fuse\; bs) \; as$ operation 
happening at different locations in the regular expression.\\
The rewriting relation 
$\rightsquigarrow^*$ 
allows us to ignore this discrepancy
and view the expressions 
\begin{center}
	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
	and\\
	$_{Z}(_{Z} \ONE + _{S} c)$
\end{center}
as equal, because they were both rewritten
from the same expression.\\
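As a quick check that the two expressions carry the same lexing information,
we can extract their bits. This is a sketch, assuming the usual clauses
$\bmkeps \; (_{bs}\ONE) = bs$ and, for an alternative whose first component $a$ is nullable,
$\bmkeps \; (_{bs}\sum (a :: as)) = bs \,@\, \bmkeps \; a$:
\begin{center}
	\begin{tabular}{lclcl}
		$\bmkeps \; (_{[]}(_{ZZ}\ONE +  _{ZS}c ))$ & $=$ & $[] \,@\, ZZ$ & $=$ & $ZZ$\\
		$\bmkeps \; (_{Z}(_{Z} \ONE + _{S} c))$ & $=$ & $Z \,@\, Z$ & $=$ & $ZZ$\\
	\end{tabular}
\end{center}
The bits $Z$ are merely attached at different levels of the two expressions.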
Having a correctness property is good. 
But we would also like a guarantee that the lexer is not slow in 
some sense, for example, that it does not grind to a halt regardless of the input.
As we have already seen, Sulzmann and Lu's simplification function
$\simpsulz$ cannot achieve this, because their claim that
the regular expression size does not grow arbitrarily large
was not true. 
In the next chapter we shall prove that with our $\simp$, 
for a given $r$, the internal derivative size is always
finitely bounded by a constant.