% ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
% author Chengsong
% Wed, 12 Oct 2022 14:08:06 +0100
% changeset 614 d5e9bcb384ec
% Chapter Template

% Main chapter title

\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
%simplifications and therefore introduce our version of the bitcoded algorithm and 
%its correctness proof in 
%Chapter 3\ref{Chapter3}. 

In this chapter we introduce simplifications
on annotated regular expressions that can be applied to 
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already proposed some bit-coded simplifications,
but their simplification function is not effective enough
to keep the derivatives small.
We contrast our simplification function 
with Sulzmann and Lu's, highlighting the simplicity of ours.
This is another case for the usefulness 
and reliability of formal proofs about algorithms.
These ``aggressive'' simplifications would not be possible in the injection-based 
lexing we introduced in Chapter~\ref{Inj}.
We then prove the correctness of the improved version of 
$\blexer$, called $\blexersimp$, by establishing 
$\blexer \; r \; s = \blexersimp \; r \; s$ using a term rewriting system.

\section{Simplifications by Sulzmann and Lu}
Consider the derivatives of regular expressions such as $(a^*a^*)^*$
and $(a^* + (aa)^*)^*$. For $(a^*a^*)^*$ we obtain:
\begin{center}
	$(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} $\\
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} \ldots$
\end{center}
\noindent
As can be seen, there is a lot of duplication 
in this example, which we already encountered in 
(\ref{eqn:growth2}).
A simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivatives,
namely
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
\noindent
because the duplicates are
not next to each other, and therefore the rule
$r + r \rightarrow r$ does not fire.
One would expect a better simplification function to work in the 
following way:
\begin{gather*}
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}\\
	\bigg\downarrow \\
	(a^*a^* + a^* \color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
	\bigg\downarrow \\
	(a^*a^* + a^*)\cdot(a^*a^*)^*  
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
	\bigg\downarrow \\
	(a^*a^* + a^*)\cdot(a^*a^*)^*  
\end{gather*}
\noindent
In the first step, the nested alternative regular expression
$(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
Now the third term $a^*$ is clearly identified as a duplicate
and is therefore removed in the second step. This makes the two
top-level terms identical, so the second copy of $(a^*a^*+a^*)\cdot(a^*a^*)^*$
is removed in the final step.\\
This motivating example comes from testing Sulzmann and Lu's 
algorithm: their simplification fails 
to remove these duplicates.
Consider their simplification (in our notation):
\begin{center}
	\begin{tabular}{lcl}
		$\simpsulz \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
		$\simpsulz \; _{bs}(r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
		\textit{then} \;\; \ZERO$\\
					     & & $\textit{else}\;\;_{bs}((\simpsulz \;r_1)\cdot
					     (\simpsulz \; r_2))$\\
		$\simpsulz  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
		$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
		$\simpsulz  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simpsulz  \; r)$\\
		$\simpsulz  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
		(\nub \; (\filter \; (\neg \circ \zeroable)\;((\simpsulz  \; r) :: \map \; \simpsulz  \; rs)))$\\ 
	\end{tabular}
\end{center}
\noindent
where the $\textit{zeroable}$ predicate,
which tests whether a regular expression
is equivalent to $\ZERO$ (i.e.\ denotes the empty language),
can be defined as:
\begin{center}
	\begin{tabular}{lcl}
		$\zeroable \; _{bs}\sum []$ & $\dn$ & $\textit{true}$\\
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
		\zeroable \;_{[]}\sum\;rs $\\
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
	\end{tabular}
\end{center}
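To make the clauses above concrete, here is a minimal Python sketch of $\textit{zeroable}$ and of a single step of $\simpsulz$. It is not Sulzmann and Lu's code. The class names (\texttt{Zero}, \texttt{One}, \texttt{Chr}, \texttt{Alts}, \texttt{Seq}, \texttt{Star}), the encoding of bits as the strings \texttt{Z} and \texttt{S}, and the fall-through case that returns all other regular expressions unchanged are our own assumptions. The $\nub$ step is modelled by keeping the first occurrence of each element, where elements are compared together with their bits.

```python
from dataclasses import dataclass, replace

# Hypothetical encoding of annotated regular expressions: annotations are
# tuples of bits, and bits are the strings "Z" and "S".


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class One:
    bs: tuple


@dataclass(frozen=True)
class Chr:
    bs: tuple
    c: str


@dataclass(frozen=True)
class Alts:
    bs: tuple
    rs: tuple  # the alternatives, as a tuple


@dataclass(frozen=True)
class Seq:
    bs: tuple
    r1: object
    r2: object


@dataclass(frozen=True)
class Star:
    bs: tuple
    r: object


def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r (Zero has none)."""
    return r if isinstance(r, Zero) else replace(r, bs=bs + r.bs)


def zeroable(r):
    """True iff r denotes the empty language."""
    if isinstance(r, Zero):
        return True
    if isinstance(r, Alts):
        return all(zeroable(x) for x in r.rs)  # an empty sum is zeroable
    if isinstance(r, Seq):
        return zeroable(r.r1) or zeroable(r.r2)
    return False  # One, Chr and Star


def simp_sulz(r):
    """One step of Sulzmann and Lu's simplification; clauses tried in order."""
    if isinstance(r, Seq):
        if isinstance(r.r1, One):
            return Zero() if zeroable(r.r2) else fuse(r.bs + r.r1.bs, r.r2)
        if zeroable(r.r1) or zeroable(r.r2):
            return Zero()
        return Seq(r.bs, simp_sulz(r.r1), simp_sulz(r.r2))
    if isinstance(r, Alts):
        rs = r.rs
        if not rs:
            return Zero()
        if isinstance(rs[0], Alts):  # only the head of the list is flattened
            head = tuple(fuse(rs[0].bs, x) for x in rs[0].rs)
            return Alts(r.bs, head + rs[1:])
        if len(rs) == 1:
            return fuse(r.bs, simp_sulz(rs[0]))
        kept = [x for x in map(simp_sulz, rs) if not zeroable(x)]
        # nub: later copies are dropped only if they are equal *including* bits
        return Alts(r.bs, tuple(dict.fromkeys(kept)))
    return r  # all other cases are left unchanged (our assumption)
```

Take \texttt{a} and \texttt{c} to be single characters with empty annotations. Then \texttt{simp\_sulz} returns \texttt{Alts((), (a, Alts((), (c, a))))} unchanged, so this term is already a fixpoint even though \texttt{a} occurs twice. Likewise, the copies $_{SZZ}a^*$ and $_{SZS}a^*$, which differ only in their bits, both survive.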
\noindent
They suggested that the $\simpsulz$ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{sulzSimp}$:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{sulzSimp} \; r$ & $\dn$ & 
		$\textit{while}\;((\simpsulz  \; r)\; \neq \; r)$ \\
		& & $\quad r := \simpsulz  \; r$\\
		& & $\textit{return} \; r$
	\end{tabular}
\end{center}
\noindent
We call the operation of alternately 
applying derivatives and simplifications
(until the string is exhausted) Sulz-simp-derivative,
written $\backslash_{sulzSimp}$:
\begin{center}
\begin{tabular}{lcl}
	$r \backslash_{sulzSimp} (c\!::\!s) $ & $\dn$ & $(\textit{sulzSimp} \; (r \backslash c)) \backslash_{sulzSimp}\, s$ \\
$r \backslash_{sulzSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
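The two definitions above fix only the control structure, and the Python sketch below spells it out. Both functions are generic in the derivative and in the single simplification step. The toy instances \texttt{der\_words} (derivatives of a finite list of words) and \texttt{remove\_one\_duplicate} (one $r + r \rightarrow r$-style rewrite) are hypothetical stand-ins for the bitcoded derivative and $\simpsulz$. They were chosen only to keep the example short.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def sulz_simp(step: Callable[[T], T], r: T) -> T:
    """Apply one simplification step repeatedly until r stops changing."""
    while True:
        r_next = step(r)
        if r_next == r:
            return r
        r = r_next


def ders_sulz_simp(der: Callable[[T, str], T], step: Callable[[T], T],
                   r: T, s: str) -> T:
    """Take the derivative w.r.t. each character of s in turn, simplifying
    each intermediate result to a fixpoint; the empty string returns r."""
    for c in s:
        r = sulz_simp(step, der(r, c))
    return r


# Hypothetical toy stand-ins: a "regular expression" here is just a tuple of
# words (a finite sum of alternatives), not an annotated regular expression.
def der_words(rs: tuple, c: str) -> tuple:
    """Derivative of a finite language: keep words starting with c, drop c."""
    return tuple(w[1:] for w in rs if w[:1] == c)


def remove_one_duplicate(rs: tuple) -> tuple:
    """One rewrite step in the spirit of r + r -> r: drop the first repeat."""
    for i in range(1, len(rs)):
        if rs[i] in rs[:i]:
            return rs[:i] + rs[i + 1:]
    return rs
```

For example, the derivative of the words \texttt{ab}, \texttt{ab}, \texttt{ac}, \texttt{ab} with respect to \texttt{a} is \texttt{b}, \texttt{b}, \texttt{c}, \texttt{b}. The fixpoint loop reduces this to \texttt{b}, \texttt{c} after two rewrite steps, because each step removes only one duplicate.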
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as in $\blexer$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_sulzSimp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{sulzSimp}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala 
and found that the size of the final derivative
grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={size},
    ymode = log,
    legend entries={Final Derivative Size},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture} 
\caption{Size of the final derivative when lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n = 20$ we already get an out-of-memory error with the default 
JVM heap size settings.
In fact, their simplification does not improve over
the simple-minded simplifications we showed in Figure~\ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={time},
    %ymode = log,
    legend entries={time in secs},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture} 
\caption{Time taken for lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
which appears to be a counterexample to 
their linear-time complexity claim:
\begin{quote}\it
Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. 
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
stays below a finite constant does not hold.
There are two main reasons for this.
(i) The $\textit{nub}$
function only treats two annotated regular expressions as duplicates
if their annotations are also identical,
and therefore cannot simplify cases like $_{SZZ}a^* + _{SZS}a^*$,
even though both $a^*$ denote the same language.
(ii) The ``flattening'' only applies to the head of the list
in the clause
\begin{center}
	\begin{tabular}{lcl}
		$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
	\end{tabular}
\end{center}
\noindent
and is therefore not thorough enough to simplify all
the necessary parts of a regular expression.\\
Moreover, even if the size of the regular expressions
does stay bounded, one has to take into account that
the $\simpsulz$ function is applied many times
in each derivative step (until a fixpoint is reached), and that number is not necessarily
a constant with respect to the size of the regular expression.
To avoid being ``caught off guard'' by
such counterexamples,
one needs to be careful both when designing
simplification functions and when making claims about them.
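The following small Python sketch illustrates reason (i). Here \texttt{nub} compares annotated regular expressions including their bits, as Sulzmann and Lu's $\nub$ does. The hypothetical \texttt{dedup\_erased} instead compares them after all bits have been erased, keeping the first occurrence of each. The encoding (only the constructors needed here, with bits as the strings \texttt{Z} and \texttt{S}) and both helper names are our own assumptions.

```python
from dataclasses import dataclass

# Hypothetical encoding with only the constructors needed for this example;
# bits are the strings "Z" and "S".


@dataclass(frozen=True)
class Chr:
    bs: tuple
    c: str


@dataclass(frozen=True)
class Star:
    bs: tuple
    r: object


def erase(r):
    """Forget all bitcodes, giving a plain regular expression (as a tuple)."""
    if isinstance(r, Chr):
        return ("CHAR", r.c)
    if isinstance(r, Star):
        return ("STAR", erase(r.r))
    raise ValueError(f"constructor not covered by this sketch: {r!r}")


def nub(rs):
    """Duplicate removal as in nub: bits are part of the comparison."""
    return list(dict.fromkeys(rs))


def dedup_erased(rs):
    """Hypothetical alternative: compare with bits erased, keep the first copy."""
    seen, out = set(), []
    for r in rs:
        key = erase(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


a = Chr((), "a")
rs = [Star(("S", "Z", "Z"), a), Star(("S", "Z", "S"), a)]  # SZZ a* + SZS a*
```

On this list, \texttt{nub} keeps both copies of $a^*$, whereas \texttt{dedup\_erased} keeps only $_{SZZ}a^*$.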

\section{Our $\textit{Simp}$ Function}
We now introduce our simplification function
by contrasting it with $\simpsulz$.
For each component of their algorithm, we describe
the idea behind it
and why it fails to achieve the desired effect, followed
by our solution. Our solutions come with correctness
statements that are backed up by formal proofs.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
\noindent
is that it allows
duplicate removal of regular expressions at different
``levels'' of alternatives.
For example, this would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
\noindent
The problem with this clause is that only the head element
is ``spilled out'',
whereas we would want to flatten
the entire list to open up possibilities for further simplifications.
Not flattening the rest of the elements also means that
the later de-duplication process 
does not remove all duplicates.
For example,
using $\simpsulz$ we cannot 
simplify
\begin{center}
	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
\noindent
because the underlined part is not the first element
of the alternative.\\
We therefore define a flattening operation that flattens not only 
the first regular expression of an alternative
but the entire list: 
 \begin{center}
  \begin{tabular}{@{}lcl@{}}
  $\textit{flts} \; []$ & $\dn$ & $[]$ \\
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) 
\end{tabular}    
\end{center}  
\noindent
Our $\flts$ operation 
also throws away $\ZERO$s,
as they do not contribute to a lexing result.
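Below is a minimal Python sketch of $\flts$. It assumes a hypothetical encoding of annotated regular expressions as frozen dataclasses, with bits given as the strings \texttt{Z} and \texttt{S}. The list is processed with a loop rather than by recursion, which computes the same result. As in the definition, only one level of nesting is removed per call.

```python
from dataclasses import dataclass, replace

# Hypothetical encoding with only the constructors needed for this example.


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class Chr:
    bs: tuple
    c: str


@dataclass(frozen=True)
class Star:
    bs: tuple
    r: object


@dataclass(frozen=True)
class Alts:
    bs: tuple
    rs: tuple


def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r (Zero has none)."""
    return r if isinstance(r, Zero) else replace(r, bs=bs + r.bs)


def flts(rs):
    """Splice every nested alternative in rs (not just the head) into the
    list, fusing its bits into the children, and drop all Zeros."""
    out = []
    for r in rs:
        if isinstance(r, Alts):
            out.extend(fuse(r.bs, x) for x in r.rs)
        elif not isinstance(r, Zero):
            out.append(r)
    return out
```

For example, the list $[b,\; _{S}(a^* + a^*),\; \ZERO]$ is flattened to $[b,\; _{S}a^*,\; _{S}a^*]$. The nested alternative is spliced in even though it is not at the head, and the two copies of $a^*$ now sit at the same level.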
\subsection{Duplicate Removal}
After flattening is done, we are ready to remove duplicates.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's simplification.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow} 
rs_{flat} 
\xrightarrow{\distinctBy \; 
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \; 
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
\noindent
where the $\distinctBy$ function is defined as:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\distinctBy \; [] \; f\; acc $ & $\dn$ & $ []$\\
		$\distinctBy \; (x :: xs) \; f \; acc$ & $\dn$ & $\quad \textit{if}\; (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
						       & & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
	\end{tabular}
\end{center}
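As an illustration only, $\distinctBy$ can be sketched in Python as follows. The name \texttt{distinct\_by** and the use of a Python set for the accumulator $acc$ are assumptions of this sketch. The sketch keeps the first element for each key and drops later ones. This matters because, as explained below, the POSIX candidates appear further to the left.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def distinct_by(xs: Iterable[T], f: Callable[[T], Hashable],
                acc: Iterable[Hashable] = ()) -> List[T]:
    """Keep each x whose key f(x) has not been seen before (left to right)."""
    seen = set(acc)          # keys seen so far, seeded with acc
    out: List[T] = []
    for x in xs:
        key = f(x)
        if key in seen:      # f x in acc: drop this later duplicate
            continue
        seen.add(key)        # acc becomes {f x} union acc
        out.append(x)
    return out


# Pairs (key, tag): only the first element with each key survives.
print(distinct_by([("a", 1), ("b", 2), ("a", 3)], lambda p: p[0]))
# [('a', 1), ('b', 2)]
```

The loop replaces the recursion of the definition. The two versions agree because $acc$ only grows as the list is traversed.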
\noindent
We define $\distinctBy$ relative to a function $f$ because
we want to eliminate regular expressions that are syntactically the same
but carry different bitcodes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + {}_{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match 
(and is therefore definitely not POSIX),
and it would be discarded by $\bmkeps$ later anyway.
\begin{center}
	$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; time\quad}\underbrace{a^*}_{Z: \;match\; 1 \;time} + 
	{}_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
	$
\end{center}
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard 
are always placed further to the left. When we 
traverse the list from left to right,
any regular expression that is, up to bitcodes, the same as one we have already seen
will definitely not contribute to a POSIX value,
even if it carries different bitcodes.
Such duplicates can therefore be removed.
To achieve this, we use $\rerases$ as the function $f$ in $\distinctBy$.

$\rerases$ is very similar to $\erase$, except that it preserves the list structure
of alternatives. $\erase$ would turn an annotated alternative back into nested
binary alternatives. $\rerases$ instead keeps it as a single alternative over a list.
Not having to change this structure 
greatly simplifies the finiteness proof in Chapter~\ref{Finite}
(we give more details there).
We give the definition of $\rerases$ here, together with
the new datatype it produces. This datatype is needed because our plain
regular expression datatype does not allow non-binary alternatives.
For the moment, the reader can simply think of 
$\rerases$ as $\erase$ and of $\rrexp$ as plain regular expressions.
\begin{figure}[H]
\begin{center}	
	$\rrexp ::=   \RZERO \mid  \RONE
			 \mid  \RCHAR{c}  
			 \mid  \RSEQ{r_1}{r_2}
			 \mid  \RALTS{rs}
			 \mid \RSTAR{r}        $
\end{center}
\caption{$\rrexp$: plain regular expressions, but with a list-based $\sum$ alternative 
constructor}\label{rrexpDef}
\end{figure}
The notation for $\rerases$ follows that of $\erase$,
which is a postfix operator written as a subscript.
The only difference is an extra \emph{r}, which distinguishes it from $\erase$:
\begin{center}
\begin{tabular}{lcl}
$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
\end{tabular}
\end{center}
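The following Python sketch mirrors the clauses of $\rerases$ above. It is only an illustration, and its encoding is an assumption of this sketch. Annotated regular expressions are tuples \texttt{("ZERO",)}, \texttt{("ONE", bs)}, \texttt{("CHAR", bs, c)}, \texttt{("SEQ", bs, a1, a2)}, \texttt{("ALTS", bs, (a, ...))} and \texttt{("STAR", bs, a)}, with bitcodes as strings over \texttt{Z} and \texttt{S}. The result uses \texttt{R}-prefixed tags that stand for the $\rrexp$ constructors. Because the result is an ordinary hashable tuple, it can serve directly as the key function $f$ in $\distinctBy$.

```python
def rerase(a):
    """Erase all bitcodes, keeping alternatives as flat lists."""
    tag = a[0]
    if tag == "ZERO":
        return ("RZERO",)
    if tag == "ONE":
        return ("RONE",)
    if tag == "CHAR":
        return ("RCHAR", a[2])
    if tag == "SEQ":
        return ("RSEQ", rerase(a[2]), rerase(a[3]))
    if tag == "ALTS":
        # One RALTS node over the whole list: no re-nesting into binary alternatives.
        return ("RALTS", tuple(rerase(child) for child in a[2]))
    if tag == "STAR":
        return ("RSTAR", rerase(a[2]))
    raise ValueError(f"unknown constructor: {tag!r}")


# The two a*a* terms from the duplicate-removal example differ only in bitcodes.
a_star = ("STAR", "", ("CHAR", "", "a"))
first = ("SEQ", "ZSZ", a_star, a_star)
second = ("SEQ", "SZZ", a_star, a_star)
print(first == second, rerase(first) == rerase(second))
# False True
```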

\subsection{Putting Things Together}
A recursive definition of our simplification function 
is given below:
%that looks somewhat similar to our Scala code is 
\begin{center}
  \begin{tabular}{@{}lcl@{}}
   
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{AALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flts} \; ( \textit{map} \; bsimp \; as)) \; \rerases \; \varnothing) $ \\
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
\end{tabular}    
\end{center}    

\noindent
The simplification function (named $\textit{bsimp}$, where the \emph{b} stands for bit-coded) 
pattern-matches on the regular expression.
When the regular expression is a sequence or an alternative,
it first simplifies the child regular expressions
recursively. It then checks whether any of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
For sequences, the current-level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b)\; \textit{match}$\\
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \;  (_{bs_1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow {}_{bs}a_1' \cdot a_2'$ 
	\end{tabular}
\end{center}
\noindent
The most involved part is the $\sum$ clause. Here we first call $\flts$ on
the list of simplified children $\textit{map}\; \textit{bsimp}\; \textit{as}$.
We then call $\distinctBy$ on the resulting list, where two 
elements $a_1$ and $a_2$ count as the same if $\rerase{a_1} = \rerase{a_2}$.
After $\flts$ and $\distinctBy$, the resulting list $as'$ may be empty,
a singleton, or longer. Depending on which, $\textit{bsimp}_{AALTS}$
decides whether to keep the current-level constructor $\sum$ as it is. It
removes $\sum$ when there are fewer than two elements:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse} \; bs \; a$ \\
   &&$\quad\textit{case} \;  as' \Rightarrow {}_{bs}\sum \textit{as'}$\\ 
	\end{tabular}
	
\end{center}
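The following self-contained Python sketch makes the interplay of $\textit{bsimp}$, $\textit{bsimp}_{ASEQ}$, $\textit{bsimp}_{AALTS}$, $\flts$ and $\distinctBy$ concrete. It is not the thesis's Isabelle or Scala code. The tuple encoding of annotated regular expressions and all function names are assumptions of this sketch. The encoding is \texttt{("ZERO",)}, \texttt{("ONE", bs)}, \texttt{("CHAR", bs, c)}, \texttt{("SEQ", bs, a1, a2)}, \texttt{("ALTS", bs, (a, ...))} and \texttt{("STAR", bs, a)}, with bitcodes as strings over \texttt{Z} and \texttt{S}.

```python
# Hypothetical tuple encoding (bitcodes are strings over "Z"/"S"):
#   ("ZERO",) ("ONE", bs) ("CHAR", bs, c) ("SEQ", bs, a1, a2)
#   ("ALTS", bs, (a, ...)) ("STAR", bs, a)
ZERO = ("ZERO",)


def fuse(bs, a):
    """Prepend bs to the top-level bitcode of a (ZERO carries none)."""
    return a if a == ZERO else (a[0], bs + a[1]) + a[2:]


def flts(rs):
    """Splice in children of alternatives (fusing their bits); drop ZEROs."""
    out = []
    for a in rs:
        if a[0] == "ALTS":
            out.extend(fuse(a[1], child) for child in a[2])
        elif a != ZERO:
            out.append(a)
    return out


def rerase(a):
    """Erase bitcodes but keep alternatives as flat lists (hashable key)."""
    tag = a[0]
    if tag in ("ZERO", "ONE"):
        return ("R" + tag,)
    if tag == "CHAR":
        return ("RCHAR", a[2])
    if tag == "SEQ":
        return ("RSEQ", rerase(a[2]), rerase(a[3]))
    if tag == "ALTS":
        return ("RALTS", tuple(rerase(child) for child in a[2]))
    return ("RSTAR", rerase(a[2]))


def distinct_by(xs, f):
    """Keep the leftmost element for every key f(x)."""
    seen, out = set(), []
    for x in xs:
        if f(x) not in seen:
            seen.add(f(x))
            out.append(x)
    return out


def bsimp_aseq(bs, a1, a2):
    if a1 == ZERO or a2 == ZERO:
        return ZERO
    if a1[0] == "ONE":                   # (_{bs1}ONE, a2') => fuse (bs @ bs1) a2'
        return fuse(bs + a1[1], a2)
    return ("SEQ", bs, a1, a2)


def bsimp_aalts(bs, rs):
    if not rs:
        return ZERO
    if len(rs) == 1:
        return fuse(bs, rs[0])
    return ("ALTS", bs, tuple(rs))


def bsimp(a):
    if a[0] == "SEQ":
        return bsimp_aseq(a[1], bsimp(a[2]), bsimp(a[3]))
    if a[0] == "ALTS":
        children = flts([bsimp(child) for child in a[2]])
        return bsimp_aalts(a[1], distinct_by(children, rerase))
    return a                             # otherwise: unchanged


a_star = ("STAR", "", ("CHAR", "", "a"))
r = ("ALTS", "", (ZERO, ("SEQ", "ZSZ", a_star, a_star), ("SEQ", "SZZ", a_star, a_star)))
print(bsimp(r))
# ('SEQ', 'ZSZ', ('STAR', '', ('CHAR', '', 'a')), ('STAR', '', ('CHAR', '', 'a')))
```

Applied to the duplicate-removal example, the sketch drops the $\ZERO$ and keeps only the leftmost of the two $a^*a^*$ terms. The single remaining child then replaces the $\sum$ via $\textit{bsimp}_{AALTS}$.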
Having defined the $\bsimp$ function,
we apply it after every derivative step,
so that the intermediate regular expressions stay small:
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}\,(r \backslash c)$
	\end{tabular}
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications 
%with derivatives:%\comment{simp in  the [] case?}
We extend this from characters to strings:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
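Read operationally, $\backslash_{bsimps}$ is a left fold over the input string that simplifies after every single derivative. For the empty string it returns $r$ unchanged. A minimal Python sketch of just this recursion scheme follows. The names \texttt{ders\_simp}, \texttt{der} and \texttt{simp} are hypothetical: \texttt{der} and \texttt{simp} stand for the annotated derivative and $\textit{bsimp}$. The toy versions in the usage lines only build strings, so that the order of operations is visible.

```python
from functools import reduce


def ders_simp(r, s, der, simp):
    """Derivative of r w.r.t. the string s, simplifying after each character."""
    # For each character c of s (left to right): acc := simp(der(acc, c)).
    # For an empty s, reduce returns r itself, matching the [] clause.
    return reduce(lambda acc, c: simp(der(acc, c)), s, r)


# Toy stand-ins that only record the shape of the computation.
def toy_der(r, c):
    return "(" + r + "\\" + c + ")"


def toy_simp(r):
    return "bsimp" + r


print(ders_simp("r", "ab", toy_der, toy_simp))
# bsimp(bsimp(r\a)\b)
```

As in the definition, no simplification is applied when the string is empty.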

\noindent
The lexer that extracts bitcodes from the 
derivatives simplified by $\bsimp$
is called $\blexersimp$:
\begin{center}
\begin{tabular}{lcl}
  $\blexersimp\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}

\noindent
This algorithm keeps the size of the intermediate regular expressions small.


\subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
After Simplification}
Recall the earlier $(a^*a^*)^*$ example,
where $\simpsulz$ could not
prevent the derivatives from growing rapidly (to over
3 million nodes for input strings of length just below $20$).
With $\textit{bsimp}$, the derivative size is reduced to just 15 and stays constant, no matter how long the
input string is.
This is shown in the graphs below.
\begin{figure}[H]
\begin{center}
\begin{tabular}{ll}
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
        width=7cm,
    height=4cm, 
    legend entries={Lexer with $\textit{bsimp}$},  
    legend pos=  south east,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
\end{axis}
\end{tikzpicture} %\label{fig:BitcodedLexer}
&
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
    width = 7cm,
    height = 4cm,
    legend entries={Lexer with $\simpsulz$},  
    legend pos=  north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
\end{axis}
\end{tikzpicture} 
\end{tabular}
\end{center}
\caption{Our improvement over Sulzmann and Lu's simplification in terms of derivative size}
\end{figure}
\noindent
Given the size difference, it is not
surprising that our $\blexersimp$ significantly outperforms
$\textit{blexer\_sulzSimp}$.
In the next section we establish the
first important property of our lexer: its correctness.
%----------------------------------------------------------------------------------------
%	SECTION rewrite relation
%----------------------------------------------------------------------------------------
\section{Correctness of $\blexersimp$}
In this section we give the details
of the correctness proof of $\blexersimp$,
one of the contributions of this thesis.

We first introduce the rewriting relation \emph{rrewrite}
($\rrewrite$) between two regular expressions,
which expresses an atomic
simplification step.
We then prove properties of
this rewriting relation and its reflexive transitive closure.
Finally, we use these properties to show
an equivalence between the internal data structures of 
$\blexer$ and $\blexersimp$.

\subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
In the correctness proof of $\blexer$, we
did not directly derive the fact that $\blexer$ generates the POSIX value.
Instead, we first proved that $\blexer$ produces the same result as $\lexer$.
We then re-used
the correctness of $\lexer$
to obtain
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
\end{center}
Here we apply this
modular technique again.
We first prove that
$\blexersimp \; r \; s $ 
produces the same output as $\blexer \; r\; s$.
We then combine this with 
$\blexer$'s correctness to obtain our main
theorem:\footnote{The case when 
$s$ is not in $L \; r$ is routine to establish.}
\begin{center}
	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = v$
\end{center}
\noindent
The overall idea of the proof
of $\blexer \;r \;s = \blexersimp \; r \;s$ 
is that the transition from $r$ to $\textit{bsimp}\; r$ can be
broken down into finitely many rewrite steps:
\begin{center}
	$r \rightsquigarrow^* \textit{bsimp} \; r$
\end{center}
where each rewrite step, written $\rightsquigarrow$,
is an ``atomic'' simplification that
is similar to a small-step reduction in operational semantics:
\begin{figure}[H]
\begin{mathpar}
	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}

	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\

	
	
	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow {}_{bs} r_2 \cdot r_3\\}

	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow {}_{bs} r_1 \cdot r_4\\}\\

	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}

	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}

	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow {}_{bs}\sum rs_2}

	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}

	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }

	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}

	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}

	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} (\map \; (\fuse \; bs_1) \; rs_1) @ rs_b }

	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}

\end{mathpar}
\caption{
The rewrite rules that generate simplified regular expressions 
in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions 
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for 
lists of bitcoded regular expressions. 
Of particular interest is the LD rule, which removes a later copy of a regular expression
provided an earlier element of the list has the same erasure
and therefore matches the same strings.
}\label{rrewriteRules}
\end{figure}
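To make the root-level rules of Figure~\ref{rrewriteRules} concrete, the following Python sketch implements $S\ZERO_l$, $S\ZERO_r$, $S_1$, $A0$ and $A1$. It is a minimal illustration under assumptions of our own, not part of the formalisation: annotated regular expressions are encoded as tagged tuples (\texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS}), bitcodes are strings over \texttt{Z} and \texttt{S}, and the function names \texttt{fuse} and \texttt{step\_top} are hypothetical. Since $\rightsquigarrow$ is non-deterministic, the sketch simply returns one possible successor. The context rules and the list rules are not covered.

```python
# Hypothetical encoding of annotated regular expressions as tagged tuples:
#   ("ZERO",)  ("ONE", bs)  ("CHAR", bs, c)  ("SEQ", bs, r1, r2)  ("ALTS", bs, rs)
# Bitcodes bs are strings over "Z" and "S"; rs is a tuple.
ZERO = ("ZERO",)


def fuse(bs, r):
    """Prepend bitcode bs to the top-level bitcode of r (ZERO carries none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]


def step_top(r):
    """Apply one of the root-level rules S0_l, S0_r, S1, A0, A1 if possible.

    Returns the rewritten expression, or None if no such rule applies.
    """
    if r[0] == "SEQ":
        _, bs, r1, r2 = r
        if r1 == ZERO or r2 == ZERO:          # rules S0_l and S0_r
            return ZERO
        if r1[0] == "ONE":                    # rule S1: fuse (bs1 @ bs2) r
            return fuse(bs + r1[1], r2)
    elif r[0] == "ALTS":
        _, bs, rs = r
        if len(rs) == 0:                      # rule A0
            return ZERO
        if len(rs) == 1:                      # rule A1
            return fuse(bs, rs[0])
    return None


print(step_top(("SEQ", "Z", ("ONE", "S"), ("CHAR", "Z", "a"))))
# prints ('CHAR', 'ZSZ', 'a')
```

Repeatedly applying \texttt{step\_top} only ever rewrites at the root; reaching a fully simplified expression in general also needs the context rules $SL$, $SR$ and $AL$.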
\noindent
Rules such as $LH$ and $LT$ rewrite a list of regular expressions
one position at a time: $LH$ rewrites the head of the list in one step,
and $LT$ lets a list rewrite take place in the tail while the head stays fixed.
Together they make it possible to state ``context rules'' such as $AL$,
which rewrites inside the children of an alternative.\\
The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
are defined in the usual way:
\begin{figure}[H]
	\centering
\begin{mathpar}
	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}

	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}

	\inferrule{r_1 \rightsquigarrow^*  r_2 \land \; r_2 \rightsquigarrow r_3}{r_1 \rightsquigarrow^* r_3\\}

	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{s}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
\end{mathpar}
\caption{The Reflexive Transitive Closure of 
$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
\end{figure}
%Two rewritable terms will remain rewritable to each other
%even after a derivative is taken:
Rewriting is preserved under derivatives,
namely
\begin{center}
	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
\end{center}
And finally, if a nullable term can be rewritten in many steps to another term,
then both produce the same bitcodes:
\begin{center}
	$\textit{if} \;\; r \rightsquigarrow^* r' \;\; \textit{and} \;\; \bnullable \; r \;\; \textit{then} \; \; \bmkeps \; r = \bmkeps \; r'$
\end{center}
The decoding phase of $\blexer$ and $\blexersimp$
is the same, which means that if both obtain the same
bitcodes before decoding,
they produce the same value after decoding.
We prove the three properties 
mentioned above in the next subsection.
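As a small sanity check of the last property, the Python sketch below computes $\bmkeps$ on both sides of an $A1$ step and of an $S_1$ step. It assumes a hypothetical tagged-tuple encoding of annotated regular expressions (\texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS}, \texttt{STAR}) with bitcodes as strings over \texttt{Z} and \texttt{S}, and that $\bmkeps$ of a star returns its bitcode followed by a single \texttt{S}. It illustrates the property on two instances; it does not prove it.

```python
# Hypothetical tuple encoding: ("ZERO",) ("ONE", bs) ("CHAR", bs, c)
# ("SEQ", bs, r1, r2) ("ALTS", bs, rs) ("STAR", bs, r); bitcodes are strings.


def fuse(bs, r):
    """Prepend bs to the top-level bitcode of r (ZERO carries none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]


def bnullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "SEQ":
        return bnullable(r[2]) and bnullable(r[3])
    return any(bnullable(x) for x in r[2])        # ALTS


def bmkeps(r):
    """Bitcodes describing how a nullable r matches the empty string."""
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "STAR":
        return r[1] + "S"                          # assumed: stop iterating
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    if tag == "ALTS":
        first = next(x for x in r[2] if bnullable(x))   # leftmost nullable
        return r[1] + bmkeps(first)
    raise ValueError("bmkeps needs a nullable expression")


star = ("STAR", "", ("CHAR", "", "a"))
before_a1 = ("ALTS", "Z", (("ONE", "S"),))
after_a1 = fuse("Z", ("ONE", "S"))                 # rule A1
before_s1 = ("SEQ", "Z", ("ONE", "S"), star)
after_s1 = fuse("Z" + "S", star)                   # rule S1
print(bmkeps(before_a1), bmkeps(after_a1))         # prints ZS ZS
print(bmkeps(before_s1), bmkeps(after_s1))         # prints ZSS ZSS
```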
\subsection{Important Properties of $\rightsquigarrow$}
First we prove some basic facts 
about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
which will be needed later.\\
Several of the rules in Figure \ref{rrewriteRules}
also have many-step versions, and many-step rewriting is preserved by $\fuse$:
\begin{lemma}\label{squig1}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
		\item
			$r \rightsquigarrow^* r' \implies _{bs} \sum (r :: rs)\; \rightsquigarrow^*\;  _{bs} \sum (r' :: rs)$

		\item
			Rewriting in many steps is compositional 
			with respect to the sequence constructor:\\
			$r_1 \rightsquigarrow^* r_2 
			\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \;  
			_{bs} r_2 \cdot r_3 \quad $ 
			and 
			$\quad r_3 \rightsquigarrow^* r_4 
			\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* _{bs} \; r_1 \cdot r_4$
		\item
			The many-step relations 
			$\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
			are preserved under the function $\fuse$:\\
				$r_1 \rightsquigarrow^* r_2 
				\implies \fuse \; bs \; r_1 \rightsquigarrow^* \; 
				\fuse \; bs \; r_2 \quad  $ and 
				$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 
				\implies \map \; (\fuse \; bs) \; rs_1 
				\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first point is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$,
	and the second and third points by induction on the inductive cases of $\rightsquigarrow^*$.
	The fourth point relies on the single-step properties
	$r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow^* \fuse \; bs \; r_2$ and
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 
	\implies \map \; (\fuse \; bs) \; rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
	which can be proven by simultaneous rule induction on $\rightsquigarrow$ and 
	$\stackrel{s}{\rightsquigarrow}$.
\end{proof}
\noindent
The inference rules of $\stackrel{s}{\rightsquigarrow}$
are stated in terms of the list cons operation; here
we establish that the 
$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
relations are also preserved, possibly in many steps, when a list is prepended or appended.
In addition, we
prove some relations 
between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
\begin{lemma}\label{ssgqTossgs}
	\hspace{0em}
	\begin{itemize}
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$

		\item
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies 
			rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
			\textit{and} \; \;
			rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
		\item
			After appending a list, a $\stackrel{s}{\rightsquigarrow}$ step 
			becomes a $\stackrel{s*}{\rightsquigarrow}$ rewrite:\\
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
		\item
			$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
		\item
			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
			r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
		\item
			If a regular expression can be rewritten 
			in many steps to $\ZERO$, then 
			so can any sequence that has it as its first component:\\
			$r_1 \rightsquigarrow^* \ZERO 
			\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first part is by induction on the list $rs$.
	The second part is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$.
	The third part is 
	by rule induction on $\stackrel{s}{\rightsquigarrow}$.
	The fourth part is 
	by rule induction on 
	$\rightsquigarrow^*$, using parts one to three. 
	The fifth part follows from parts two and four.
	The last part is proven by rule induction on $\rightsquigarrow^*$ again.
\end{proof}
\noindent
Now we are ready to prove the following properties:
\begin{itemize}
	\item
		$(r \rightsquigarrow^* r'\land \bnullable \; r) 
		\implies \bmkeps \; r = \bmkeps \; r'$. \\
	\item
		$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
	\item
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
\end{itemize}
Together, these properties lead to the correctness theorem.
\subsubsection{Property 1: $(r \rightsquigarrow^* r'\land \bnullable \; r) 
		\implies \bmkeps \; r = \bmkeps \; r'$}
Intuitively, this property says that $\bmkeps$ extracts 
the same bitcodes from two regular expressions $r$ and $r'$
whenever $r$ is nullable and can be rewritten to $r'$ in finitely
many steps.\\
For convenience, 
we define a predicate stating that a list of regular expressions
contains at least one nullable regular expression:
\begin{center}
	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
\end{center}
\noindent
The rewriting relation $\rightsquigarrow$ preserves nullability:
\begin{lemma}\label{rewritesBnullable}
	\hspace{0em}
	\begin{itemize}
		\item
			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
			\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
		\item 	
			$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2, \;
			\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
		\item
			$r_1 \rightsquigarrow^* r_2 
			\implies \bnullable \; r_1 = \bnullable \; r_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	The first two points are by simultaneous rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
	The third point follows from the first by rule induction on $\rightsquigarrow^*$.
\end{proof}
\noindent
For convenience again,
we define $\bmkepss$ on a list $rs$,
which extracts the bitcodes from the first $\bnullable$ element of $rs$:
\begin{center}
	\begin{tabular}{lcl}
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
		$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\; 
		\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
	\end{tabular}
\end{center}
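A direct transcription of $\textit{bnullables}$ and $\bmkepss$ into Python might look as follows. This is only a sketch: the tagged-tuple encoding of annotated regular expressions (\texttt{ZERO}, \texttt{ONE}, \texttt{CHAR}, \texttt{SEQ}, \texttt{ALTS}, \texttt{STAR}) and the representation of bitcodes as strings over \texttt{Z} and \texttt{S} are assumptions made for illustration. In this sketch the $\bmkeps$ case for alternatives defers to $\bmkepss$ on the children, so that the first nullable alternative determines the bitcodes.

```python
# Hypothetical tuple encoding: ("ZERO",) ("ONE", bs) ("CHAR", bs, c)
# ("SEQ", bs, r1, r2) ("ALTS", bs, rs) ("STAR", bs, r); bitcodes are strings.


def bnullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "SEQ":
        return bnullable(r[2]) and bnullable(r[3])
    return bnullables(r[2])                       # ALTS


def bnullables(rs):
    """At least one element of rs is nullable."""
    return any(bnullable(r) for r in rs)


def bmkeps(r):
    """Bitcodes for the empty match of a nullable r."""
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "STAR":
        return r[1] + "S"
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    if tag == "ALTS":
        return r[1] + bmkepss(r[2])
    raise ValueError("bmkeps needs a nullable expression")


def bmkepss(rs):
    """Bitcodes of the first nullable element of rs (empty if there is none)."""
    for r in rs:
        if bnullable(r):
            return bmkeps(r)
    return ""


rs = (("CHAR", "Z", "a"), ("ONE", "S"), ("ONE", "ZZ"))
print(bnullables(rs), bmkepss(rs))                # prints True S
```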
\noindent
If both regular expressions in a rewrite step are nullable, then they 
produce the same bitcodes:
\begin{lemma}\label{rewriteBmkepsAux}
	\hspace{0em}
	\begin{itemize}
		\item
			$r_1 \rightsquigarrow r_2 \implies 
			(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
			\bmkeps \; r_2)$ 
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies (\textit{bnullables} \; rs_1 \land \textit{bnullables} \; rs_2 \implies 
			\bmkepss \; rs_1 = \bmkepss \; rs_2)$
	\end{itemize}
\end{lemma}
\begin{proof}
	By simultaneous rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
\end{proof}
\noindent
With Lemma \ref{rewriteBmkepsAux} we are ready to prove its
many-step version: 
\begin{lemma}
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
\end{lemma}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow}$.
	In the inductive case, Lemma \ref{rewritesBnullable} guarantees that
	the regular expressions on both sides of the single rewrite step are nullable,
	so Lemma \ref{rewriteBmkepsAux} applies to that step.
\end{proof}
\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
Now we get to the ``meaty'' part of the proof, 
which shows that the helper functions of our simplification, 
such as $\distinctBy$ and $\flts$, conform to 
the $\stackrel{s*}{\rightsquigarrow}$ and 
$\rightsquigarrow^* $ rewriting relations.\\
The first lemma to prove is a more general version of 
$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$:
\begin{lemma}
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
\end{lemma}
\noindent
It says that for a list made of two parts $rs_1 @ rs_2$, 
one can remove from $rs_2$ every element whose erasure 
already occurs earlier in $rs_2$ or in $rs_1$.
\begin{proof}
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
\end{proof}
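The behaviour of $\distinctBy$ used in this lemma can be sketched in Python as follows. The function name \texttt{distinct\_by}, the tagged-tuple encoding of annotated regular expressions and the representation of the accumulator as a set are assumptions for illustration. An element is kept only if its erasure has not been seen before, and the set of seen erasures is seeded with those of $rs_1$; every dropped element therefore has the same erasure as an earlier element, which is exactly the situation in which rule $LD$ allows it to be removed.

```python
# Hypothetical tuple encoding: ("ZERO",) ("ONE", bs) ("CHAR", bs, c)
# ("SEQ", bs, r1, r2) ("ALTS", bs, rs) ("STAR", bs, r); bitcodes are strings.


def erase(r):
    """Drop all bitcodes, leaving a plain (hashable) regular expression."""
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return (tag,)
    if tag == "CHAR":
        return ("CHAR", r[2])
    if tag == "SEQ":
        return ("SEQ", erase(r[2]), erase(r[3]))
    if tag == "ALTS":
        return ("ALTS", tuple(erase(x) for x in r[2]))
    return ("STAR", erase(r[2]))


def distinct_by(rs, f, acc):
    """Keep the first element of rs for each key f(x) not already in acc."""
    seen = set(acc)
    out = []
    for x in rs:
        key = f(x)
        if key not in seen:
            seen.add(key)
            out.append(x)
    return out


rs1 = [("CHAR", "Z", "a")]
rs2 = [("CHAR", "S", "a"), ("CHAR", "Z", "b"), ("CHAR", "SS", "b")]
print(rs1 + distinct_by(rs2, erase, [erase(r) for r in rs1]))
# prints [('CHAR', 'Z', 'a'), ('CHAR', 'Z', 'b')]
```

With an empty accumulator, this is the situation of Corollary \ref{dBPreserves}.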
\noindent
Setting $rs_1$ to be empty (and renaming $rs_2$ to $rs_1$),
we get the corollary
\begin{corollary}\label{dBPreserves}
	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$.
\end{corollary}
\noindent
The flatten function $\flts$ conforms to
$\stackrel{s*}{\rightsquigarrow}$ as well:

\begin{lemma}\label{fltsPreserves}
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
\end{lemma}
\begin{proof}
	By an induction on $rs$.
\end{proof}
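For reference, here is a Python sketch of $\flts$, assuming a hypothetical tagged-tuple encoding of annotated regular expressions with bitcodes as strings over \texttt{Z} and \texttt{S}; the names \texttt{fuse} and \texttt{flts} are illustrative. Each of its cases corresponds to a list rewrite rule: dropping $\ZERO$ is rule $L\ZERO$, and splicing in the children of a nested alternative, with the outer bitcode fused onto them, is rule $LS$. Only one level of nesting is flattened.

```python
# Hypothetical tuple encoding: ("ZERO",) ("ONE", bs) ("CHAR", bs, c)
# ("SEQ", bs, r1, r2) ("ALTS", bs, rs) ("STAR", bs, r); bitcodes are strings.


def fuse(bs, r):
    """Prepend bs to the top-level bitcode of r (ZERO carries none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]


def flts(rs):
    """Remove ZEROs and splice in the children of nested alternatives."""
    out = []
    for r in rs:
        if r[0] == "ZERO":                    # rule L0
            continue
        if r[0] == "ALTS":                    # rule LS
            out.extend(fuse(r[1], x) for x in r[2])
        else:
            out.append(r)
    return out


nested = ("ALTS", "Z", (("CHAR", "S", "a"), ("ONE", "")))
print(flts([("ZERO",), nested, ("CHAR", "", "b")]))
# prints [('CHAR', 'ZS', 'a'), ('ONE', 'Z'), ('CHAR', '', 'b')]
```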
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   862
The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   863
\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   864
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   865
\end{lemma}
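Assuming the usual definition, $\bsimpalts$ is a three-way case split on the list of alternatives: an empty list becomes the zero regular expression, a singleton is replaced by its only element with the bits $bs$ fused in, and longer lists remain an alternative annotated with $bs$. Under this assumption the first two cases correspond to rewriting an alternative and the third changes nothing, which is the intuition behind the lemma. The Python sketch below uses a hypothetical tagged-tuple encoding with bitcodes as strings over $Z$ and $S$.

```python
# Hypothetical tagged-tuple encoding: ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
# ("ALTS", bs, rs); bs is a string over "Z"/"S", rs a tuple.


def fuse(bs, r):
    """Prepend bs to the top-level annotation of r (ZERO carries none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]


def bsimp_alts(bs, rs):
    """Sketch of bsimpalts: build an alternative from an already simplified list."""
    rs = list(rs)
    if not rs:
        return ("ZERO",)          # no alternatives left
    if len(rs) == 1:
        return fuse(bs, rs[0])    # singleton: drop the wrapper, keep its bits
    return ("ALTS", bs, tuple(rs))


print(bsimp_alts("Z", [("CHAR", "S", "a")]))  # ('CHAR', 'ZS', 'a')
```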
\noindent
The simplification function
$\textit{bsimp}$ transforms a regular expression $r$ only by steps
permitted by $\rightsquigarrow^*$:
\begin{lemma}
	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
\end{lemma}
\begin{proof}
	By induction on $r$.
	The most involved case is the alternative $r = {}_{bs}\sum rs$,
	where we use the induction hypothesis, Lemmas \ref{fltsPreserves}
	and \ref{bsimpaltsPreserves}, and Corollary \ref{dBPreserves}
	to perform a series of rewrites:\\
	\begin{center}
		\begin{tabular}{lcl}
			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
		\end{tabular}
	\end{center}
	The first step applies the induction hypothesis to each element of $rs$,
	the second uses Lemma \ref{fltsPreserves} and the third Corollary \ref{dBPreserves}.
	From this we derive the following rewrite sequence:\\
	\begin{center}
		\begin{tabular}{lcl}
			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
			    & $\rightsquigarrow^*$ & $_{bs}\sum (\distinctBy \; 
				(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi)$\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; 
			    (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) 
			    \; \rerases \; \phi)$\\[1.5ex]
			    & $=$ & $\textit{bsimp} \; r$\\[1.5ex]
		\end{tabular}
	\end{center}
	Here the first $\rightsquigarrow^*$ lifts the previous chain to alternatives,
	the second uses Lemma \ref{bsimpaltsPreserves}, and the final equation
	is the definition of $\textit{bsimp}$ on alternatives.
\end{proof}
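As a small illustration of the alternative case, consider the following hypothetical example, where $x$ and $y$ are characters and bits are written as $Z$ and $S$: let $r = {}_{[]}\sum rs$ with $rs = [\,{}_{Z}x, \; {}_{S}\sum [\,{}_{Z}x, \; {}_{S}y\,]\,]$. Both elements of $rs$ are already simplified, so the stages of the first rewrite chain in the proof above are
\begin{center}
	\begin{tabular}{lcl}
		$\map \; \textit{bsimp} \; rs$ & $=$ & $[\,{}_{Z}x, \; {}_{S}\sum [\,{}_{Z}x, \; {}_{S}y\,]\,]$\\[1.5ex]
		$\flts \; (\map \; \textit{bsimp} \; rs)$ & $=$ & $[\,{}_{Z}x, \; {}_{SZ}x, \; {}_{SS}y\,]$\\[1.5ex]
		$\distinctBy \; (\ldots) \; \rerases \; \phi$ & $=$ & $[\,{}_{Z}x, \; {}_{SS}y\,]$\\[1.5ex]
		$\textit{bsimp} \; r$ & $=$ & ${}_{[]}\sum [\,{}_{Z}x, \; {}_{SS}y\,]$
	\end{tabular}
\end{center}
Here $\flts$ pushes the bit $S$ of the inner alternative into its children, and $\distinctBy$ drops ${}_{SZ}x$ because its erasure coincides with that of ${}_{Z}x$.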
\subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
The rewrite relation 
$\rightsquigarrow$ is preserved under derivatives,
although a single step before taking the derivative
may correspond to several steps afterwards:
\begin{lemma}\label{rewriteBder}
	\hspace{0em}
	\begin{itemize}
		\item
			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
			\rightsquigarrow^*  r_2 \backslash c$.
		\item	
			If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $ 
			\map \; (\_\backslash c) \; rs_1 
			\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$.
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on the definitions of $\rightsquigarrow$ 
	and $\stackrel{s}{\rightsquigarrow}$, using several of the previous lemmas.
\end{proof}
\noindent
Property 3 now follows as an immediate corollary:
\begin{corollary}\label{rewritesBder}
	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
	r_2 \backslash c$
\end{corollary}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow}$, using Lemma \ref{rewriteBder}.
\end{proof}
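To see where the extra steps can come from, it helps to have the derivative of annotated regular expressions at hand. The Python sketch below is an illustrative model under assumptions: it uses a hypothetical tagged-tuple encoding with bitcodes as strings over $Z$ and $S$, and assumes that for stars $Z$ marks another iteration and $S$ the end. In the sequence case with a nullable first component, $r_2$ occurs twice in the result, once unchanged and once derived, so a single rewrite inside $r_2$ has to be replayed in both copies after the derivative; this is one source of the $\rightsquigarrow^*$ in Lemma \ref{rewriteBder}.

```python
# Hypothetical tagged-tuple encoding of annotated regular expressions:
#   ("ZERO",) ("ONE", bs) ("CHAR", bs, c) ("ALTS", bs, rs) ("SEQ", bs, r1, r2) ("STAR", bs, r)
# bs is a string over "Z"/"S"; for stars we assume "Z" = another iteration, "S" = stop.


def fuse(bs, r):
    """Prepend bs to the top-level annotation of r (ZERO carries none)."""
    if r[0] == "ZERO":
        return r
    return (r[0], bs + r[1]) + r[2:]


def bnullable(r):
    tag = r[0]
    if tag in ("ZERO", "CHAR"):
        return False
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALTS":
        return any(bnullable(x) for x in r[2])
    return bnullable(r[2]) and bnullable(r[3])  # SEQ


def bmkeps(r):
    """Bits recording how a nullable r matches the empty string (leftmost choice)."""
    tag = r[0]
    if tag == "ONE":
        return r[1]
    if tag == "ALTS":
        return r[1] + bmkeps(next(x for x in r[2] if bnullable(x)))
    if tag == "SEQ":
        return r[1] + bmkeps(r[2]) + bmkeps(r[3])
    if tag == "STAR":
        return r[1] + "S"
    raise ValueError("bmkeps called on a non-nullable expression")


def bder(c, r):
    """Derivative of the annotated regular expression r with respect to character c."""
    tag = r[0]
    if tag in ("ZERO", "ONE"):
        return ("ZERO",)
    if tag == "CHAR":
        return ("ONE", r[1]) if r[2] == c else ("ZERO",)
    if tag == "ALTS":
        return ("ALTS", r[1], tuple(bder(c, x) for x in r[2]))
    if tag == "SEQ":
        bs, r1, r2 = r[1], r[2], r[3]
        if bnullable(r1):
            # r2 appears twice in the result: unchanged, and as its derivative
            return ("ALTS", bs, (("SEQ", "", bder(c, r1), r2),
                                 fuse(bmkeps(r1), bder(c, r2))))
        return ("SEQ", bs, bder(c, r1), r2)
    # STAR: unfold one iteration, marked with the bit "Z"
    return ("SEQ", r[1], fuse("Z", bder(c, r[2])), ("STAR", "", r[2]))


seq = ("SEQ", "", ("ONE", "S"), ("CHAR", "", "b"))
print(bder("b", seq))
# ('ALTS', '', (('SEQ', '', ('ZERO',), ('CHAR', '', 'b')), ('ONE', 'S')))
```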
\noindent
Extending this corollary from characters to strings and combining it
with $r \rightsquigarrow^* \textit{bsimp} \; r$,
we obtain a rewrite relation between the intermediate
derivatives of $\blexer$ and those of $\blexersimp$:
\begin{lemma}\label{bderBderssimp}
	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
\end{lemma}
\begin{proof}
	By induction on $s$.
\end{proof}
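For illustration, the inductive step can be spelt out as the following chain, using transitivity of $\rightsquigarrow^*$. This sketch assumes the equations $a \backslash (s @ [c]) = (a \backslash s) \backslash c$ and $\bderssimp{a}{s @ [c]} = \textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$, i.e.\ that the string is extended at the end; the formal proof may be organised differently.
\begin{center}
	\begin{tabular}{lcl}
		$a \backslash (s @ [c])$ & $=$ & $(a \backslash s) \backslash c$\\[1.5ex]
		& $\rightsquigarrow^*$ & $(\bderssimp{a}{s}) \backslash c$ \quad (induction hypothesis and Corollary \ref{rewritesBder})\\[1.5ex]
		& $\rightsquigarrow^*$ & $\textit{bsimp} \; ((\bderssimp{a}{s}) \backslash c)$ \quad (since $r \rightsquigarrow^* \textit{bsimp} \; r$)\\[1.5ex]
		& $=$ & $\bderssimp{a}{s @ [c]}$
	\end{tabular}
\end{center}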
\subsection{Main Theorem}
With Lemma \ref{bderBderssimp} we are now ready to prove the main theorem.
\begin{theorem}
	$\blexer \; r \; s = \blexersimp \; r \; s$
\end{theorem}
\noindent
\begin{proof}
	Writing $a$ for the annotated regular expression obtained from $r$,
	Lemma \ref{bderBderssimp} shows that the derivatives computed by the
	original lexer can be rewritten in many steps into those computed by
	the lexer with simplification:
	\begin{center}
		$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
	\end{center}
	By the properties of $\rightsquigarrow^*$ established earlier in this chapter,
	both produce the same bits whenever the lexer succeeds:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$.
	\end{center}
	Since the bits are the same, decoding them against $r$ yields the same value:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$.
	\end{center}
	Together with the fact that $\rightsquigarrow^*$ preserves nullability,
	so that both lexers fail on the same inputs, this is equivalent to our proof goal:
	\begin{center}
		$\blexer \; r \; s = \blexersimp \; r \; s$.
	\end{center}	
\end{proof}
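The final step of the proof relies only on $\decode$ being a function of the plain regular expression $r$ and a bit sequence: however the bits were produced, equal bits decode to equal values. The Python sketch below illustrates decoding under assumptions: the tagged-tuple encodings of regular expressions and values are hypothetical, alternatives are binary with $Z$ selecting the left branch, and for stars $Z$ marks another iteration and $S$ the end.

```python
# Hypothetical encodings (illustration only):
#   regexes: ("ONE",) ("CHAR", c) ("ALT", r1, r2) ("SEQ", r1, r2) ("STAR", r)
#   values:  ("Empty",) ("Chr", c) ("Left", v) ("Right", v) ("Seq", v1, v2) ("Stars", vs)
# Bits are a string over "Z"/"S".


def decode_aux(r, bs):
    """Consume a prefix of bs according to r; return (value, remaining bits)."""
    tag = r[0]
    if tag == "ONE":
        return ("Empty",), bs
    if tag == "CHAR":
        return ("Chr", r[1]), bs
    if tag == "ALT":
        if bs[0] == "Z":
            v, rest = decode_aux(r[1], bs[1:])
            return ("Left", v), rest
        v, rest = decode_aux(r[2], bs[1:])
        return ("Right", v), rest
    if tag == "SEQ":
        v1, bs1 = decode_aux(r[1], bs)
        v2, bs2 = decode_aux(r[2], bs1)
        return ("Seq", v1, v2), bs2
    if tag == "STAR":
        vs = []
        while bs[0] == "Z":                  # "Z": one more iteration
            v, bs = decode_aux(r[1], bs[1:])
            vs.append(v)
        return ("Stars", tuple(vs)), bs[1:]  # drop the terminating "S"
    raise ValueError(f"cannot decode {r!r}")


def decode(r, bs):
    v, rest = decode_aux(r, bs)
    if rest:
        raise ValueError("bits left over after decoding")
    return v


r = ("STAR", ("ALT", ("CHAR", "a"), ("CHAR", "b")))
print(decode(r, "ZSZZS"))
# ('Stars', (('Right', ('Chr', 'b')), ('Left', ('Chr', 'a'))))
```

Because \texttt{decode} never inspects the annotated expression that produced the bits, any two derivatives with equal $\bmkeps$ output necessarily yield the same value.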
\noindent
As a corollary,
we combine this result with the lemma proved earlier that 
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$
\end{center}
and conclude that the bit-coded lexer with simplification
correctly outputs the POSIX lexing result whenever such a result exists:
\begin{corollary}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = v$
\end{corollary}

\subsection{Comments on the Proof Techniques Used}
Straightforward and simple as the proof may seem,
the effort we spent obtaining it was far from trivial.\\
We initially attempted to reuse the argument 
in \cref{flex_retrieve}. 
The problem was that both functions $\inj$ and $\retrieve$ require
the annotated regular expressions to remain unsimplified,
so that one can 
correctly compare $v_{i+1}$, $r_i$ and $v_i$ 
in diagram \ref{graph:inj} and 
``fit the key into the lock hole''.

\noindent
We also tried to prove 
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) = 
\textit{bsimp} \;\;  (a\backslash s)$,
\end{center}
but this turns out not to hold.
A counterexample is
\[ a = [({}_{Z}\ONE + {}_{S}c)\cdot [bb \cdot ({}_{Z}\ONE + {}_{S}c)]] \;\; 
	\text{and} \;\; s = bb,
\]
\noindent
for which we have
\begin{center}
	$\textit{bsimp}\;\; ( a \backslash s ) = {}_{[]}({}_{ZZ}\ONE + {}_{ZS}c)$
\end{center}
\noindent
whereas 
\begin{center}
	$\textit{bsimp} \;\;( \bderssimp{a}{s} ) = {}_{Z}({}_{Z} \ONE + {}_{S} c)$.
\end{center}
Unfortunately, 
whenever $\textit{bsimp}$ is applied at different stages,
such discrepancies arise.
This is because the operation $\map \; (\fuse\; bs) \; as$
then happens at different locations in the regular expression.\\
The rewriting relation 
$\rightsquigarrow^*$ 
allows us to ignore this discrepancy
and to view the expressions 
\begin{center}
	${}_{[]}({}_{ZZ}\ONE + {}_{ZS}c)$\\
	and\\
	${}_{Z}({}_{Z} \ONE + {}_{S} c)$
\end{center}
as equal, because both were rewritten
from the same expression.\\
The simplification rules
given in \ref{rrewriteRules} are by no means
final;
one could come up with new rules
such as 
$\SEQ r_1 \cdot (\SEQ r_2 \cdot r_3) \rightarrow
\SEQs [r_1, r_2, r_3]$.
Such a rule does not fit the proof technique
of our main theorem, but it does not seem to violate the POSIX
property.\\
Having a correctness property is good. 
But we would also like a guarantee that the lexer is not slow in 
some sense, for example, that it does not grind to a halt on any input.
As we have already seen, Sulzmann and Lu's simplification function
$\simpsulz$ cannot achieve this, because their claim that
the size of the derivative regular expressions does not grow arbitrarily large
was not true. 
In the next chapter we shall prove that with our $\simp$, 
for a given $r$, the size of the internal derivatives is always
bounded by a constant.