ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
author Chengsong
Fri, 26 May 2023 08:09:30 +0100
changeset 646 56057198e4f5
parent 640 bd1354127574
child 649 ef2b8abcbc55
permissions -rwxr-xr-x
intro
% Chapter Template

% Main chapter title
\chapter{Correctness of Bit-coded Algorithm with Simplification}

\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
%simplifications and therefore introduce our version of the bitcoded algorithm and 
%its correctness proof in 
%Chapter 3\ref{Chapter3}. 

In this chapter we introduce simplifications
for annotated regular expressions that can be applied to 
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions could have been more efficient and in some cases needed fixing.
%We contrast our simplification function 
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness 
%and reliability of formal proofs on algorithms.
%These ``aggressive'' simplifications would not be possible in the injection-based 
%lexing we introduced in chapter \ref{Inj}.
%We then prove the correctness with the improved version of 
%$\blexer$, called $\blexersimp$, by establishing 
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
Consider the derivatives of the regular expression $(a^*a^*)^*$:
%and $(a^* + (aa)^*)^*$:
\begin{center}
	\begin{tabular}{lcl}
		$(a^*a^*)^*$ & $\stackrel{\backslash a}{\longrightarrow}$ &
		$(a^*a^* + a^*)\cdot(a^*a^*)^*$\\
		& $\stackrel{\backslash a}{\longrightarrow}$ &
		$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
		& $\stackrel{\backslash a}{\longrightarrow}$ & $\ldots$\\
	\end{tabular}
\end{center}
\noindent
As can be seen, there are several duplications.
A simple-minded simplification function cannot simplify
the third regular expression in the above chain of derivatives, namely
\begin{center}
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
\end{center}
because the duplicates are
not next to each other, and therefore the rule
$r+ r \rightarrow r$ from $\textit{simp}$ does not fire.
One would expect a better simplification function to work in the 
following way:
\begin{gather*}
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}\\
	\bigg\downarrow (1) \\
	(a^*a^* + a^* 
	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
	\bigg\downarrow (2) \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
	\bigg\downarrow (3) \\
	(a^*a^* + a^* 
	)\cdot(a^*a^*)^*  
\end{gather*}
\noindent
In the first step, the nested alternative regular expression
$(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
Now the third term $a^*$ can clearly be identified as a duplicate
and is therefore removed in the second step. 
This causes the two
top-level terms to become the same, so that the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$ 
is removed in the final step.
Sulzmann and Lu's simplification function (in our notation) is meant to achieve this
kind of simplification:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{simp}\_{SL} \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
		& & $\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
		$\textit{simp}\_{SL} \; _{bs}(r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
		\textit{then} \;\; \ZERO$\\
		& & $\textit{else}\;\;_{bs}((\textit{simp}\_{SL} \;r_1)\cdot
		(\textit{simp}\_{SL} \; r_2))$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\textit{simp}\_{SL}  \; r)$\\
		$\textit{simp}\_{SL}  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
		(\nub \; (\filter \; (\neg\zeroable)\;((\textit{simp}\_{SL}  \; r) :: \map \; \textit{simp}\_{SL}  \; rs)))$\\ 
	\end{tabular}
\end{center}
\noindent
The $\textit{zeroable}$ predicate 
tests whether the regular expression
is equivalent to $\ZERO$, and
can be defined as:
\begin{center}
	\begin{tabular}{lcl}
		$\zeroable \; _{bs}\sum [\,]$ & $\dn$ & $\textit{true}$\\
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
		\zeroable \;_{[]}\sum\;rs $\\
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
	\end{tabular}
\end{center}
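To make the clauses of $\textit{zeroable}$ concrete, the following Python sketch encodes annotated regular expressions as small data classes and implements the predicate by structural recursion. The class names (\texttt{AZero}, \texttt{AAlts} and so on) and the representation of bitcodes as tuples of characters are assumptions made for this illustration only; they are not taken from our Scala implementation. Treating the empty alternative as zeroable is likewise an assumption, chosen to agree with $\textit{simp}\_{SL}$ mapping $_{bs}\sum []$ to $\ZERO$.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative encoding of annotated regular expressions (names are
# hypothetical). Bitcodes are tuples of characters such as ("S", "Z").
Bits = Tuple[str, ...]

@dataclass(frozen=True)
class AZero:
    pass

@dataclass(frozen=True)
class AOne:
    bs: Bits = ()

@dataclass(frozen=True)
class AChar:
    c: str
    bs: Bits = ()

@dataclass(frozen=True)
class ASeq:
    r1: object
    r2: object
    bs: Bits = ()

@dataclass(frozen=True)
class AAlts:
    rs: tuple
    bs: Bits = ()

@dataclass(frozen=True)
class AStar:
    r: object
    bs: Bits = ()

def zeroable(r) -> bool:
    """Return True iff r matches no string at all."""
    if isinstance(r, AZero):
        return True
    if isinstance(r, (AOne, AChar, AStar)):
        return False
    if isinstance(r, ASeq):
        return zeroable(r.r1) or zeroable(r.r2)
    if isinstance(r, AAlts):
        # all() of an empty tuple is True: the empty sum is zeroable (assumption).
        return all(zeroable(x) for x in r.rs)
    raise TypeError(f"not an annotated regular expression: {r!r}")

a_star = AStar(AChar("a"))
print(zeroable(ASeq(a_star, AZero())))     # the sequence a* . 0
print(zeroable(AAlts((AZero(), a_star))))  # the alternative 0 + a*
```

The sequence is zeroable because one of its components is, whereas the alternative is not, since $a^*$ still matches the empty string.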
\noindent
The clause
\begin{center}
	\begin{tabular}{lcl}
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
	\end{tabular}
\end{center}
\noindent
does flatten the alternative as required in step (1),
but $\textit{simp}\_{SL}$ is insufficient if we want to do steps (2) and (3),
as these ``identical'' terms have different bit-annotations.
Sulzmann and Lu also suggested that the $\textit{simp}\_{SL} $ function should be
applied repeatedly until a fixpoint is reached.
We call this construction $\textit{SLSimp}$:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{SLSimp} \; r$ & $\dn$ & 
		$\textit{while}((\textit{simp}\_{SL}  \; r)\; \cancel{=} \; r)$ \\
		& & $\quad r := \textit{simp}\_{SL}  \; r$\\
		& & $\textit{return} \; r$
	\end{tabular}
\end{center}
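The loop defining $\textit{SLSimp}$ is a plain fixpoint iteration. As a minimal sketch in Python, with a toy rewriting step standing in for $\textit{simp}\_{SL}$ (the names \texttt{iterate\_to\_fixpoint} and \texttt{drop\_one\_adjacent\_duplicate} are hypothetical), it can be written as follows. The toy step removes at most one duplicate per pass, so the example also hints at why the number of passes is hard to predict: each pass may enable the next one.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def iterate_to_fixpoint(step: Callable[[T], T], r: T) -> T:
    """Apply step repeatedly until step(r) == r, then return r.

    Termination is not guaranteed in general; it depends on step.
    """
    while True:
        nxt = step(r)
        if nxt == r:
            return r
        r = nxt

def drop_one_adjacent_duplicate(xs: tuple) -> tuple:
    """Toy stand-in for a simplifier: remove at most one adjacent duplicate."""
    for i in range(len(xs) - 1):
        if xs[i] == xs[i + 1]:
            return xs[:i] + xs[i + 1:]
    return xs

# Two productive passes plus one confirming pass are needed here.
print(iterate_to_fixpoint(drop_one_adjacent_duplicate, ("a", "a", "a", "b")))
```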
We call the operation of alternately 
applying derivatives and simplifications
(until the string is exhausted) the Sulz-simp-derivative,
written $\backslash_{SLSimp}$:
\begin{center}
\begin{tabular}{lcl}
	$r \backslash_{SLSimp} (c\!::\!s) $ & $\dn$ & $(\textit{SLSimp} \; (r \backslash c)) \backslash_{SLSimp}\, s$ \\
	$r \backslash_{SLSimp} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
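Operationally, $\backslash_{SLSimp}$ is a left fold over the input string that takes one derivative and then simplifies. The Python sketch below shows only this control structure. The functions \texttt{der} and \texttt{simp} are passed in as parameters, and the toy instance (a finite language given as a tuple of literal words) is an assumption made to keep the example self-contained; it is not the regular-expression derivative itself.

```python
from functools import reduce
from typing import Callable, TypeVar

R = TypeVar("R")

def ders_simp(r: R, s: str,
              der: Callable[[R, str], R],
              simp: Callable[[R], R]) -> R:
    """Fold over s: take the derivative by each character, then simplify.

    For the empty string the input r is returned unchanged.
    """
    return reduce(lambda acc, c: simp(der(acc, c)), s, r)

# Toy instance: a finite language as a tuple of words, possibly with duplicates.
def word_der(words: tuple, c: str) -> tuple:
    """Keep the words starting with c and strip that first character."""
    return tuple(w[1:] for w in words if w.startswith(c))

def dedup(words: tuple) -> tuple:
    """Order-preserving duplicate removal."""
    return tuple(dict.fromkeys(words))

print(ders_simp(("ab", "ab", "ac"), "a", word_der, dedup))
```

Simplifying after every character keeps the intermediate results small in this toy setting; whether this also holds for annotated regular expressions depends entirely on how strong the simplification function is.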
\noindent
After the derivatives have been taken, the bitcodes
are extracted and decoded in the same manner
as in $\blexer$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_SLSimp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{SLSimp}\, s\;\textit{in}$\\
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
We implemented this lexing algorithm in Scala, 
and found that the size of the final derivative
still grows exponentially (note the logarithmic scale):
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={size},
    ymode = log,
    legend entries={Final Derivative Size},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
\end{axis}
\end{tikzpicture} 
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{$n$ \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
\end{figure}
\noindent
At $n= 20$ we already get an out-of-memory error with Scala's default 
JVM heap size settings.
In fact their simplification does not improve much over
the simple-minded simplifications we have shown in Figure~\ref{fig:BetterWaterloo}.
The time required also grows exponentially:
\begin{figure}[H]
	\centering
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={time},
    %ymode = log,
    legend entries={time in secs},  
    legend pos=north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexerTime.data};
\end{axis}
\end{tikzpicture} 
\caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
$\protect\underbrace{aa\ldots a}_\text{$n$ \textit{a}s}
$ using Sulzmann and Lu's lexer}\label{SulzmannLuLexerTime}
\end{figure}
\noindent
This seems to be a counterexample to 
Sulzmann and Lu's linear complexity claim
in their paper \cite{Sulzmann2014}:
\begin{quote}\it
``Linear-Time Complexity Claim \\It is easy to see that each call of one of the functions/operations:
simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input.'' 
\end{quote}
\noindent
The assumption that the size of the regular expressions
in the algorithm
would stay below a finite constant is not true, at least not in the
examples we considered.
The main reasons are that (i) Haskell's $\textit{nub}$
function requires identical annotations between two 
annotated regular expressions to qualify as duplicates,
and therefore cannot simplify cases like $_{SZZ}a^*+_{SZS}a^*$
even though both $a^*$ denote the same language, and
(ii) the ``flattening'' only applies to the head of the list
in the 
\begin{center}
	\begin{tabular}{lcl}
		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
	\end{tabular}
\end{center}
\noindent
clause, and therefore is not strong enough to simplify all
needed parts of the regular expression. Moreover,
the $\textit{simp}\_{SL}$ function is applied repeatedly
in each derivative step until a fixpoint is reached, 
which makes the algorithm even more
unpredictable and inefficient.
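
Reason (i) can be reproduced in a few lines. The Python sketch below models an annotated regular expression as a pair of a bitcode string and a plain regular expression (a deliberately crude, hypothetical encoding) and implements an order-preserving duplicate removal that compares whole pairs, in the way Haskell's \texttt{nub} compares elements by equality. For contrast, \texttt{nub\_erased} compares only the plain regular expressions; it is shown as one conceivable alternative and is not meant as a definition of the simplification function introduced in the next section.

```python
def nub(xs: list) -> list:
    """Order-preserving duplicate removal using ==, keeping first occurrences."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def nub_erased(xs: list) -> list:
    """Like nub, but two terms count as equal if their regexes agree,
    regardless of the bitcodes attached to them."""
    seen, out = [], []
    for bs, r in xs:
        if r not in seen:
            seen.append(r)
            out.append((bs, r))
    return out

# Stand-ins for the two alternatives  _{SZZ} a*  and  _{SZS} a*.
alts = [("SZZ", "a*"), ("SZS", "a*")]

print(nub(alts))         # both terms survive: the bitcodes differ
print(nub_erased(alts))  # only the first term is kept
```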
%To not get ``caught off guard'' by
%these counterexamples,
%one needs to be more careful when designing the
%simplification function and making claims about them.

\section{Our $\textit{Simp}$ Function}
We will now introduce our own simplification function.
%by making a contrast with $\textit{simp}\_{SL}$.
We also describe
the ideas behind Sulzmann and Lu's $\textit{simp}\_{SL}$
algorithm 
and why it fails to achieve the desired effect of keeping the sizes of derivatives finitely bounded. 
In addition, our simplification function will come with a formal
correctness proof.
\subsection{Flattening Nested Alternatives}
The idea behind the clause
\begin{center}
	$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
\end{center}
is that it allows
duplicate removal of regular expressions at different
``levels'' of alternatives.
For example, this would help with the
following simplification:

\begin{center}
$(a+r)+r \longrightarrow a+r$
\end{center}
The problem is that only the head element
is ``spilled out''.
It is more desirable
to flatten
the entire list, as this opens up possibilities for further simplifications
with later regular expressions.
Not flattening the remaining elements also means that
the later de-duplication process
does not remove all duplicates.
For example,
using $\textit{simp}\_{SL}$ we cannot
simplify
\begin{center}
	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
((a^*a^*)+a^*)\cdot (a^*a^*)^*$
\end{center}
because the underlined part is not the head
of the alternative.

We define our flatten operation so that it flattens
the entire list:
 \begin{center}
  \begin{tabular}{@{}lcl@{}}
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) \\
  $\textit{flts} \; []$ & $\dn$ & $[]$
\end{tabular}    
\end{center}  
\noindent
Our $\flts$ operation 
also throws away $\ZERO$s
as they do not contribute to a lexing result.
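As a small illustration (our own example, not taken from the formalisation), consider a list whose head is a nested alternative, followed by a $\ZERO$ and an annotated regular expression $a_3$ that is neither an alternative nor $\ZERO$. Using $\textit{flts} \; [] = []$ for the empty list, the clauses unfold as follows, and the bitcode $bs_1$ of the dissolved alternative is fused onto each of its children rather than lost:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{flts} \; [_{bs_1}\sum [a_1, a_2],\; \ZERO,\; a_3]$ & $=$ & $(\textit{map} \; (\textit{fuse}\; bs_1)\; [a_1, a_2]) \; @ \; \textit{flts} \; [\ZERO,\; a_3]$\\
& $=$ & $[\textit{fuse}\; bs_1\; a_1,\; \textit{fuse}\; bs_1\; a_2] \; @ \; \textit{flts} \; [a_3]$\\
& $=$ & $[\textit{fuse}\; bs_1\; a_1,\; \textit{fuse}\; bs_1\; a_2,\; a_3]$
\end{tabular}
\end{center}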
\subsection{Duplicate Removal}
After flattening, we can remove duplicates.
The de-duplication function is called $\distinctBy$,
and this is where we make our second improvement over
Sulzmann and Lu's simplification method.
The process goes as follows:
\begin{center}
$rs \stackrel{\textit{flts}}{\longrightarrow} 
rs_{flat} 
\xrightarrow{\distinctBy \; 
rs_{flat} \; \rerases\; \varnothing} rs_{distinct}$
%\stackrel{\distinctBy \; 
%rs_{flat} \; \erase\; \varnothing}{\longrightarrow} \; rs_{distinct}$
\end{center}
where the $\distinctBy$ function is defined as:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
		$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & $\quad \textit{if} (f \; x \in acc)\;\; \textit{then} \;\; \distinctBy \; xs \; f \; acc$\\
						       & & $\quad \textit{else}\;\; x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
	\end{tabular}
\end{center}
\noindent
The reason we define a distinct function under a mapping $f$ is that
we want to eliminate regular expressions that are syntactically the same,
but have different bit-codes.
For example, we can remove the second $a^*a^*$ from
$_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
represents a match with a shorter initial sub-match 
(and therefore is definitely not POSIX),
and would be discarded by $\bmkeps$ later anyway.
\begin{center}
	$_{ZSZ}\underbrace{a^*}_{ZS:\; match \; 1\; times\quad}\underbrace{a^*}_{Z: \;match\; 1 \;times} + 
	_{SZZ}\underbrace{a^*}_{S: \; match \; 0 \; times\quad}\underbrace{a^*}_{ZZ: \; match \; 2 \; times}
	$
\end{center}
%$_{bs1} r_1 + _{bs2} r_2 \text{where} (r_1)_{\downarrow} = (r_2)_{\downarrow}$
Due to the way our algorithm works,
the matches that conform to the POSIX standard 
are always placed further to the left. When we 
traverse the list from left to right,
a later copy of a regular expression we have already seen
cannot contribute to a POSIX value,
even if it carries a different bitcode.
Such duplicates can therefore be removed.
To achieve this, we use $\rerases$ as the function $f$ during the de-duplication
operation. The function
$\rerases$ is very similar to $\erase$, except that it preserves the list structure
when erasing an alternative regular expression,
whereas $\erase$ would turn it back into a binary tree structure.
Not having to change the structure 
greatly simplifies the finiteness proof in Chapter 
\ref{Finite}.
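To see how $\distinctBy$ and $\rerases$ interact, the following trace (a worked example of ours) runs the definition on the two-element list from the example above. We write $x_1$ for $_{ZSZ}a^*a^*$ and $x_2$ for $_{SZZ}a^*a^*$, and $r$ for their common image under $\rerases$, that is $\rerases \; x_1 = \rerases \; x_2 = r$. Only the leftmost copy survives, together with its bitcode $ZSZ$:
\begin{center}
\begin{tabular}{@{}lcll@{}}
$\distinctBy \; [x_1, x_2] \; \rerases \; \varnothing$ & $=$ & $x_1 :: (\distinctBy \; [x_2] \; \rerases \; \{r\})$ & (since $r \notin \varnothing$)\\
& $=$ & $x_1 :: (\distinctBy \; [] \; \rerases \; \{r\})$ & (since $r \in \{r\}$)\\
& $=$ & $[x_1]$ &
\end{tabular}
\end{center}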
We give the definition of $\rerases$ here together with
the new datatype used by $\rerases$ (as our plain
regular expression datatype does not allow non-binary alternatives).
For now we can think of 
$\rerases$ as the function $(\_)_\downarrow$ defined in Chapter \ref{Bitcoded1}
and of $\rrexp$ as plain regular expressions, but with a general list constructor
for alternatives:
\begin{figure}[H]
\begin{center}	
	$\rrexp ::=   \RZERO \mid  \RONE
			 \mid  \RCHAR{c}  
			 \mid  \RSEQ{r_1}{r_2}
			 \mid  \RALTS{rs}
			 \mid \RSTAR{r}        $
\end{center}
\caption{$\rrexp$: plain regular expressions, but with $\sum$ alternative 
constructor}\label{rrexpDef}
\end{figure}
We define the function $\rerases$ as follows:
\begin{center}
\begin{tabular}{lcl}
$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
$\rerase{_{bs}r_1\cdot r_2}$ & $\dn$ & $\RSEQ{\rerase{r_1}}{\rerase{r_2}}$\\
$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
$\rerase{_{bs} a ^*}$ & $\dn$ & $\RSTAR{\rerase{a}}$
\end{tabular}
\end{center}
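For instance (an illustrative example of ours, with arbitrary bitcodes $bs$, $bs_1$, $bs_2$ and $bs_3$), erasing a three-element alternative keeps the list of children intact and only drops the bitcodes. In particular, two alternatives that differ only in their bitcodes are mapped to the same $\rrexp$, which is what $\distinctBy$ compares:
\begin{center}
$\rerase{_{bs}\sum [_{bs_1}\mathbf{a},\; _{bs_2}\ONE,\; _{bs_3}\mathbf{b}]}
\;=\; \RALTS{[\RCHAR{a},\; \RONE,\; \RCHAR{b}]}$
\end{center}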

\subsection{Putting Things Together}
We can now give the definition of our simplification function:
%that looks somewhat similar to our Scala code is 
\begin{center}
  \begin{tabular}{@{}lcl@{}}
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flts} ( \textit{map} \; \textit{bsimp} \; as)) \; \rerases \; \varnothing) $ \\
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
\end{tabular}    
\end{center}    

\noindent
The simplification function (named $\textit{bsimp}$ for \emph{b}it-coded) 
pattern-matches on the regular expression.
When it detects that the regular expression is an alternative or a
sequence, it simplifies the children regular expressions
recursively and then checks whether one of the children has turned into $\ZERO$ or
$\ONE$, which might trigger further simplification at the current level.
For sequences, the current-level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
using rules such as  $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \;  (_{bs_1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ 
	\end{tabular}
\end{center}
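Two small instances (our own, for illustration) show the effect of these clauses. We assume the usual behaviour of $\textit{fuse}$, namely that it prepends the given bits to the bitcode at the top of its argument. In the second instance the sequence constructor disappears, while the bits $bs$ and $bs_1$ are kept in front of $bs_2$:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp}_{ASEQ} \; bs \; \ZERO \; (_{bs_2}\mathbf{c})$ & $=$ & $\ZERO$\\
$\textit{bsimp}_{ASEQ} \; bs \; (_{bs_1}\ONE) \; (_{bs_2}\mathbf{c})$ & $=$ & $\textit{fuse} \; (bs @ bs_1) \; (_{bs_2}\mathbf{c}) \;=\; _{bs @ bs_1 @ bs_2}\mathbf{c}$
\end{tabular}
\end{center}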
\noindent
The most involved part is the $\sum$ clause, where we first call $\flts$ on
the list of simplified children $\textit{map}\; \textit{bsimp}\; \textit{as}$,
and then call $\distinctBy$ on the resulting list. The predicate used in $\distinctBy$ for determining whether two 
elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
Finally, depending on whether the regular expression list $as'$ has turned into a
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{ALTS}$
decides whether to keep the current-level constructor $\sum$ as it is or to
remove it when there are fewer than two elements:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{bsimp}_{ALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse} \; bs \; a$ \\
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
	\end{tabular}
\end{center}
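The following worked example (ours, and assuming that $\textit{fuse}$ prepends bits to the bitcode of its argument) shows how the pieces interact on the nested alternative $_{bs}\sum [\ZERO,\; _{bs_1}\sum [_{bs_2}\mathbf{a},\; _{bs_3}\mathbf{a}]]$. The inner alternative is simplified first; its children are characters, which are left unchanged by both $\textit{bsimp}$ and $\flts$:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp} \; (_{bs_1}\sum [_{bs_2}\mathbf{a},\; _{bs_3}\mathbf{a}])$ & $=$ & $\textit{bsimp}_{ALTS} \; bs_1 \; (\distinctBy \; [_{bs_2}\mathbf{a},\; _{bs_3}\mathbf{a}] \; \rerases \; \varnothing)$\\
& $=$ & $\textit{bsimp}_{ALTS} \; bs_1 \; [_{bs_2}\mathbf{a}]$\\
& $=$ & $_{bs_1 @ bs_2}\mathbf{a}$
\end{tabular}
\end{center}
The outer alternative then sees the list $[\ZERO,\; _{bs_1 @ bs_2}\mathbf{a}]$, from which $\flts$ removes the $\ZERO$. The whole expression collapses to a single character whose bitcode records the path to the leftmost copy of $\mathbf{a}$:
\begin{center}
\begin{tabular}{@{}lcl@{}}
$\textit{bsimp} \; (_{bs}\sum [\ZERO,\; _{bs_1}\sum [_{bs_2}\mathbf{a},\; _{bs_3}\mathbf{a}]])$ & $=$ & $\textit{bsimp}_{ALTS} \; bs \; [_{bs_1 @ bs_2}\mathbf{a}]$\\
& $=$ & $_{bs @ bs_1 @ bs_2}\mathbf{a}$
\end{tabular}
\end{center}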
Having defined the $\textit{bsimp}$ function,
we apply it as an additional phase after each derivative step:
\begin{center}
	\begin{tabular}{lcl}
		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
	\end{tabular}
\end{center}
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications 
%with derivatives:%\comment{simp in  the [] case?}
We extend this from characters to strings:
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
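Unfolding this definition for a two-character string $[c_1, c_2]$ (a small example of ours) makes the interleaving of derivatives and simplification explicit: every intermediate derivative is simplified before the next character is consumed.
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} [c_1, c_2]$ & $=$ & $(r \backslash_{bsimp}\, c_1) \backslash_{bsimps}\, [c_2]$\\
& $=$ & $((r \backslash_{bsimp}\, c_1) \backslash_{bsimp}\, c_2) \backslash_{bsimps}\, [\,]$\\
& $=$ & $(r \backslash_{bsimp}\, c_1) \backslash_{bsimp}\, c_2$\\
& $=$ & $\textit{bsimp}\,((\textit{bsimp}\,(r \backslash c_1)) \backslash c_2)$
\end{tabular}
\end{center}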

\noindent
The lexer that extracts bitcodes from the 
derivatives simplified by our $\simp$ function
is called $\blexersimp$:
\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\                
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
  & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
\noindent
This algorithm keeps the regular expression size small, 
as we shall demonstrate with an example in the next subsection.


\subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
After Simplification}
Recall the
previous $(a^*a^*)^*$ example,
where $\textit{simp}\_{SL}$ could not
prevent the fast growth of the derivatives (over
3 million nodes for inputs of length just below $20$).
With our simplification, the derivative size
is reduced to just 15 and stays constant no matter how long the
input string is.
This is shown in the graphs below.
\begin{figure}[H]
\begin{center}
\begin{tabular}{ll}
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
        width=7cm,
    height=4cm, 
    legend entries={Lexer with $\textit{bsimp}$},  
    legend pos=  south east,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
\end{axis}
\end{tikzpicture} %\label{fig:BitcodedLexer}
&
\begin{tikzpicture}
\begin{axis}[
    xlabel={$n$},
    ylabel={derivative size},
    width = 7cm,
    height = 4cm,
    legend entries={Lexer with $\textit{simp}\_{SL}$},  
    legend pos=  north west,
    legend cell align=left]
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
\end{axis}
\end{tikzpicture} 
\end{tabular}
\end{center}
\caption{Our Improvement over Sulzmann and Lu's in terms of size of the derivatives.}
\end{figure}
\noindent
Given the size difference, it is not
surprising that our $\blexersimp$ significantly outperforms
$\textit{blexer\_SLSimp}$ by Sulzmann and Lu.
In the next section we establish that our
simplification preserves the correctness of the algorithm.
%----------------------------------------------------------------------------------------
%	SECTION rewrite relation
%----------------------------------------------------------------------------------------
\section{Correctness of $\blexersimp$}
We first introduce the rewriting relation \emph{rrewrite}
($\rrewrite$) between two regular expressions,
which stands for an atomic
simplification.
We then prove properties about
this rewriting relation and its reflexive transitive closure.
Finally, we leverage these properties to show
an equivalence between the results generated by
$\blexer$ and $\blexersimp$.
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   535
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   536
\subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   537
In the correctness proof of $\blexer$, we
did not directly show that $\blexer$ generates the POSIX value,
but first proved that $\blexer$ generates the same result as $\lexer$.
We then re-used
the correctness of $\lexer$
to obtain
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   543
\begin{center}
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   544
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = \Some \; v$\\
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   545
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   546
	r\;s = \None$.
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   547
\end{center}
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   548
%\begin{center}
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   549
%	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   550
%\end{center}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   551
Here we apply this
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   552
modularised technique again
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   553
by first proving that
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   554
$\blexersimp \; r \; s $ 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   555
produces the same output as $\blexer \; r\; s$,
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   556
and then piecing it together with 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   557
$\blexer$'s correctness to achieve our main
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   558
theorem:
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   559
\begin{center}
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   560
	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = \Some \;v$
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   561
	\\
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   562
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   563
	r\;s = \None$
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   564
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   565
\noindent
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   566
The overall idea for the proof
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   567
of $\blexer \;r \;s = \blexersimp \; r \;s$ 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   568
is that the transition from $r$ to $\textit{bsimp}\; r$ can be
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   569
broken down into smaller rewrite steps of the form:
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   570
\begin{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   571
	$r \rightsquigarrow^* \textit{bsimp} \; r$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   572
\end{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   573
where each rewrite step, written $\rightsquigarrow$,
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   574
is an ``atomic'' simplification that
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   575
is similar to a small-step reduction in operational semantics
(see Figure~\ref{rrewriteRules} for the rules):
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   577
\begin{figure}[H]
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   578
\begin{mathpar}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   579
	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   580
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   581
	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   582
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   583
	\inferrule * [Right = $S_1$]{\vspace{0em}}{_{bs_1} ((_{bs_2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   584
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   585
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   586
	
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   587
	\inferrule * [Right = $SL$] {\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   588
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   589
	\inferrule * [Right = $SR$] {\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   590
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   591
	\inferrule * [Right = $A0$] {\vspace{0em}}{ _{bs}\sum [] \rightsquigarrow \ZERO}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   592
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   593
	\inferrule * [Right = $A1$] {\vspace{0em}}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   594
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   595
	\inferrule * [Right = $AL$] {\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow _{bs}\sum rs_2}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   596
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   597
	\inferrule * [Right = $LE$] {\vspace{0em}}{ [] \stackrel{s}{\rightsquigarrow} []}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   598
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   599
	\inferrule * [Right = $LT$] {rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{ r :: rs_1 \stackrel{s}{\rightsquigarrow} r :: rs_2 }
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   600
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   601
	\inferrule * [Right = $LH$] {r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   602
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   603
	\inferrule * [Right = $L\ZERO$] {\vspace{0em}}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   604
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   605
	\inferrule * [Right = $LS$] {\vspace{0em}}{(_{bs_1} \sum rs_1) :: rs_b \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rs_b) }
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   606
591
b2d0de6aee18 more polishing integrated comments chap2
Chengsong
parents: 590
diff changeset
   607
	\inferrule * [Right = $LD$] {\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rs_c \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   608
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   609
\end{mathpar}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   610
\caption{
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   611
The rewrite rules that generate simplified regular expressions 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   612
in small steps: $r_1 \rightsquigarrow r_2$ is for bitcoded regular expressions 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   613
and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$ for 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   614
lists of bitcoded regular expressions. 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   615
Interesting is the LD rule that allows copies of regular expressions 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   616
to be removed provided a regular expression 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   617
earlier in the list can match the same strings.
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   618
}\label{rrewriteRules}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   619
\end{figure}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   620
\noindent
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   621
The rules $LT$ and $LH$ lift the rewriting to lists of regular expressions:
a regular expression
in the left-hand-side list is rewritten in one step
to the regular expression at the same position in the right-hand-side list.
This helps with defining the ``context rule'' $AL$.
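
As a small illustration of how these rules compose (a sketch, assuming
the usual definition $\fuse \; bs \; (_{bs'}\ONE) = {}_{bs @ bs'}\ONE$ from the
earlier chapter), consider the alternative
$_{bs}\sum [\ZERO, \; _{bs_1}\ONE]$. It can be simplified in two steps:
\begin{center}
\begin{tabular}{lcll}
	$_{bs}\sum [\ZERO, \; _{bs_1}\ONE]$ & $\rightsquigarrow$ & $_{bs}\sum [_{bs_1}\ONE]$ & by $AL$, using $[\ZERO, \; _{bs_1}\ONE] \stackrel{s}{\rightsquigarrow} [_{bs_1}\ONE]$ from $L\ZERO$\\
	 & $\rightsquigarrow$ & $\fuse \; bs \; (_{bs_1}\ONE)$ & by $A1$\\
	 & $=$ & $_{bs @ bs_1}\ONE$ & 
\end{tabular}
\end{center}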
80cc6dc4c98b until chap 7
Chengsong
parents: 624
diff changeset
   626
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   627
The reflexive transitive closures of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
are defined in the usual way:
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   629
\begin{figure}[H]
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   630
	\centering
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   631
\begin{mathpar}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   632
	\inferrule{\vspace{0em}}{ r \rightsquigarrow^* r \\}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   633
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   634
	\inferrule{\vspace{0em}}{rs \stackrel{s*}{\rightsquigarrow} rs \\}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   635
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   636
	\inferrule{r_1 \rightsquigarrow^*  r_2 \land \; r_2 \rightsquigarrow^* r_3}{r_1 \rightsquigarrow^* r_3\\}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   637
	
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   638
	\inferrule{rs_1 \stackrel{s*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{s*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{s*}{\rightsquigarrow} rs_3}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   639
\end{mathpar}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   640
\caption{The Reflexive Transitive Closure of 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   641
$\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   642
\end{figure}
600
fd068f39ac23 chap4 comments done
Chengsong
parents: 591
diff changeset
   643
%Two rewritable terms will remain rewritable to each other
fd068f39ac23 chap4 comments done
Chengsong
parents: 591
diff changeset
   644
%even after a derivative is taken:
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   645
The main point of our rewriting relation
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   646
is that it is preserved under derivatives,
600
fd068f39ac23 chap4 comments done
Chengsong
parents: 591
diff changeset
   647
namely
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   648
\begin{center}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   649
	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   650
\end{center}
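\noindent
The many-step closure on the right-hand side is needed because a single
rewrite step may have to be replayed at several places after the derivative
is taken. As a sketch, assume the derivative on annotated sequences is
defined as in the earlier chapter, so that for a nullable $r_1$
\begin{center}
	$(_{bs}\, r_1 \cdot r_3) \backslash c \;=\; {}_{bs}\sum [\,_{[]}(r_1 \backslash c) \cdot r_3, \;\; \fuse \; (\bmkeps \; r_1) \; (r_3 \backslash c)\,]$.
\end{center}
If $r_3 \rightsquigarrow r_4$, then by $SR$ we have
$_{bs}\, r_1 \cdot r_3 \rightsquigarrow {}_{bs}\, r_1 \cdot r_4$,
but relating the two derivatives requires rewriting both list elements:
the first by $SR$, and the second by
$r_3 \backslash c \rightsquigarrow^* r_4 \backslash c$ together with
the fact that $\fuse$ preserves rewriting.
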
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   651
Moreover, if one term rewrites to another in finitely many steps,
then both produce the same bitcodes (provided they are nullable):
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   653
\begin{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   654
	$r \rightsquigarrow^* r' \land \bnullable \; r \implies \bmkeps \; r = \bmkeps \; r'$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   655
\end{center}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   656
The decoding phase of $\blexer$ and $\blexersimp$
is the same, which means that if they receive the same
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   658
bitcodes before the decoding phase,
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   659
they generate the same value after decoding is done.
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   660
We will prove the three properties 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   661
we mentioned above in the next sub-section.
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   662
\subsection{Important Properties of $\rightsquigarrow$}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   663
First we prove some basic facts 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   664
about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   665
$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$,
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   666
which will be needed later.\\
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   667
The inference rules in Figure~\ref{rrewriteRules}
given in the previous subsection
have ``many-steps'' versions:
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   670
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   671
\begin{lemma}\label{squig1}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   672
	\hspace{0em}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   673
	\begin{itemize}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   674
		\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   675
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \rightsquigarrow^* \; _{bs} \sum rs_2$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   676
		\item
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   677
			$r \rightsquigarrow^* r' \implies _{bs} \sum (r :: rs)\; \rightsquigarrow^*\;  _{bs} \sum (r' :: rs)$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   678
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   679
		\item
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
   680
			Rewriting in many steps is compatible
			with the sequence constructor:\\
			$r_1 \rightsquigarrow^* r_2 
			\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \;  
			_{bs} r_2 \cdot r_3 \quad $ 
			and 
			$\quad r_3 \rightsquigarrow^* r_4 
			\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \; _{bs} r_1 \cdot r_4$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   688
		\item
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   689
			The many-step relations
			$\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$ 
			are preserved under the function $\fuse$:\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   692
				$r_1 \rightsquigarrow^* r_2 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   693
				\implies \fuse \; bs \; r_1 \rightsquigarrow^* \; 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   694
				\fuse \; bs \; r_2 \quad  $ and 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   695
				$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   696
				\implies \map \; (\fuse \; bs) \; rs_1 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   697
				\stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   698
	\end{itemize}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   699
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   700
\begin{proof}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   701
	By induction on 
	the rules of $\stackrel{s*}{\rightsquigarrow}$ and $\rightsquigarrow^*$ respectively.
	The third and fourth points follow from
	the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \rightsquigarrow \fuse \; bs \; r_2$ and
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 
	\implies \map \; (\fuse \; bs) \; rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
	which can be proven by rule induction on $\rightsquigarrow$ and 
	$\stackrel{s}{\rightsquigarrow}$.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   709
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   710
\noindent
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   711
The inference rules of $\stackrel{s}{\rightsquigarrow}$
are defined in terms of the list cons operation.
We now establish that the 
$\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
relations are also preserved w.r.t.\ appending and prepending of a list.
In addition, we
also prove some relations 
between $\rightsquigarrow^*$ and $\stackrel{s*}{\rightsquigarrow}$.
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   719
\begin{lemma}\label{ssgqTossgs}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   720
	\hspace{0em}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   721
	\begin{itemize}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   722
		\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   723
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   724
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   725
		\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   726
			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies 
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   727
			rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2 \; \;
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   728
			\textit{and} \; \;
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   729
			rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   730
			
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   731
		\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   732
			The $\stackrel{s}{\rightsquigarrow} $ relation after appending 
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   733
			a list becomes $\stackrel{s*}{\rightsquigarrow}$:\\
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   734
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   735
			\implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   736
		\item
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   737
		
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   738
			$r_1 \rightsquigarrow^* r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   739
		\item
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   740
		
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   741
			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
			r_1 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   743
		\item			
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   744
			If we can rewrite a regular expression 
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   745
			in many steps to $\ZERO$, then 
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   746
			we can also rewrite any sequence containing it to $\ZERO$:\\
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   747
			$r_1 \rightsquigarrow^* \ZERO 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   748
			\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   749
	\end{itemize}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   750
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   751
\begin{proof}
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   752
	The first part is by induction on the list $rs$.
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   753
	The second part is by induction on the inductive cases of $\stackrel{s*}{\rightsquigarrow}$.
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   754
	The third part is 
	by rule induction on $\stackrel{s}{\rightsquigarrow}$.
	The fourth part is 
	by rule induction on 
	$\rightsquigarrow^*$, using parts one to three. 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   759
	The fifth part is a corollary of part four.
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   760
	The last part is proven by rule induction again on $\rightsquigarrow^*$.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   761
\end{proof}
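\noindent
To spell out the fifth part (a sketch of one possible derivation, not
necessarily the one used in the formalisation): since
$r :: rs = [r] @ rs$, parts four and two give
\begin{center}
	$r_1 :: rs_3 \;=\; [r_1] @ rs_3 \;\stackrel{s*}{\rightsquigarrow}\; [r_2] @ rs_3
	\;\stackrel{s*}{\rightsquigarrow}\; [r_2] @ rs_4 \;=\; r_2 :: rs_4$,
\end{center}
where the first step appends $rs_3$ to $[r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
and the second step prepends $[r_2]$ to $rs_3 \stackrel{s*}{\rightsquigarrow} rs_4$;
the two steps are joined by transitivity.
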
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   762
\noindent
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   763
Now we are ready to give the proofs of the following properties:
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   764
\begin{itemize}
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   765
	\item
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   766
		$r \rightsquigarrow^* r'\land \bnullable \; r 
585
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   767
		\implies \bmkeps \; r = \bmkeps \; r'$. \\
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   768
	\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   769
		$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   770
	\item
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   771
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
4969ef817d92 chap4 more
Chengsong
parents: 584
diff changeset
   772
\end{itemize}
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   773
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   774
\subsubsection{Property 1: $r \rightsquigarrow^* r'\land \bnullable \; r 
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   775
		\implies \bmkeps \; r = \bmkeps \; r'$}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   776
Intuitively, this property says we can 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   777
extract the same bitcodes using $\bmkeps$ from the nullable
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   778
components of two regular expressions $r$ and $r'$,
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   779
if we can rewrite from one to the other in finitely
639
80cc6dc4c98b until chap 7
Chengsong
parents: 624
diff changeset
   780
many steps.
80cc6dc4c98b until chap 7
Chengsong
parents: 624
diff changeset
   781
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   782
For convenience, 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   783
we define a predicate for a list of regular expressions
640
bd1354127574 more proofreading done, last version before submission
Chengsong
parents: 639
diff changeset
   784
having at least one nullable regular expression:
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   785
\begin{center}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   786
	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   787
\end{center}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   788
\noindent
624
8ffa28fce271 all comments incorporated!!+related work
Chengsong
parents: 601
diff changeset
   789
The rewriting relation $\rightsquigarrow$ preserves (b)nullability:
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   790
\begin{lemma}\label{rewritesBnullable}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   791
	\hspace{0em}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   792
	\begin{itemize}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   793
		\item
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   794
			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   795
			\text{then} \; \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   796
		\item 	
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   797
			$\text{If} \; rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \;
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   798
			\text{then} \; \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   799
		\item
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   800
			$r_1 \rightsquigarrow^* r_2 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   801
			\implies \bnullable \; r_1 = \bnullable \; r_2$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   802
	\end{itemize}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   803
\end{lemma}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   804
\begin{proof}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   805
	By rule induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
	The third point follows from the first by induction on $\rightsquigarrow^*$.
586
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   807
\end{proof}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   808
\noindent
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   809
For convenience again,
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   810
we define $\bmkepss$ on a list $rs$,
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   811
which extracts the bitcodes of the first $\bnullable$ element in $rs$:
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   812
\begin{center}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   813
	\begin{tabular}{lcl}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   814
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   815
		$\bmkepss \; (r :: rs)$ & $\dn$ & $\textit{if} \;(\bnullable \; r) \;\; 
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   816
		\textit{then} \;\; \bmkeps \; r \; \textit{else} \;\; \bmkepss \; rs$
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   817
	\end{tabular}
826af400b068 more chap4
Chengsong
parents: 585
diff changeset
   818
\end{center}
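\noindent
As a small illustration (a sketch that assumes the clause
$\bmkeps \; (_{bs}\ONE) = bs$ from the definition of $\bmkeps$,
and that a single character is not nullable),
$\bmkepss$ skips the leading non-nullable element and
returns the bits of the first nullable one:
\begin{center}
	\begin{tabular}{lcl}
		$\bmkepss \; [_{Z}c, \; _{S}\ONE, \; _{ZS}\ONE]$ & $=$ & $\bmkepss \; [_{S}\ONE, \; _{ZS}\ONE]$\\
		& $=$ & $\bmkeps \; (_{S}\ONE)$\\
		& $=$ & $[S]$
	\end{tabular}
\end{center}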
\noindent
If two regular expressions related by a rewriting step are both nullable, then they 
produce the same bitcodes:
\begin{lemma}\label{rewriteBmkepsAux}
	\hspace{0em}
	\begin{itemize}
		\item
			$r_1 \rightsquigarrow r_2 \implies 
			(\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
			\bmkeps \; r_2)$ 
		\item
			$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 
			\implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
			\bmkepss \; rs_1 = \bmkepss \; rs_2)$
	\end{itemize}
\end{lemma}
\begin{proof}
	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$
	and $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$.
\end{proof}
\noindent
With lemma \ref{rewriteBmkepsAux} in place we are ready to prove its
many-step version: 
\begin{lemma}
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
\end{lemma}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow} $. Lemma 
	\ref{rewritesBnullable} gives us that both $r$ and $r'$ are nullable.
	Lemma \ref{rewriteBmkepsAux} then solves the inductive case.
\end{proof}

\subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$}
Now we get to the key part of the proof, 
which says that the helper functions of our simplification,
such as $\distinctBy$ and $\flts$, describe
reducts of $\stackrel{s*}{\rightsquigarrow}$ and 
$\rightsquigarrow^* $.

The first lemma to prove is a more general version of 
$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$:
\begin{lemma}
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
\end{lemma}
\noindent
It says that in a list made of two parts $rs_1 @ rs_2$, 
one can throw away the duplicate
elements within $rs_2$, as well as those elements of $rs_2$ that have already appeared in $rs_1$.
\begin{proof}
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
\end{proof}
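\noindent
To illustrate the statement with a hypothetical instance
(assuming that $\rerases$ only removes bitcodes, and that $c$ and $d$ are distinct characters),
take the first part to be $[_{Z}c]$ and the second part to be $[_{S}c, \; _{Z}d, \; _{S}d]$.
The element $_{S}c$ coincides with $_{Z}c$ from the first part after erasure,
and $_{S}d$ coincides with the earlier $_{Z}d$, so both can be removed:
\begin{center}
	$[_{Z}c, \; _{S}c, \; _{Z}d, \; _{S}d] \stackrel{s*}{\rightsquigarrow} [_{Z}c, \; _{Z}d]$
\end{center}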
\noindent
Setting $rs_1$ to be the empty list
(and then renaming $rs_2$ to $rs_1$),
we get the corollary
\begin{corollary}\label{dBPreserves}
	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \rerases \; \phi$.
\end{corollary}
\noindent
Similarly, the flatten function $\flts$ describes a reduct of
$\stackrel{s*}{\rightsquigarrow}$ as well:
\begin{lemma}\label{fltsPreserves}
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
\end{lemma}
\begin{proof}
	By an induction on $rs$.
\end{proof}
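\noindent
For instance (a sketch assuming that $\flts$ opens up a nested alternative
$_{bs}\sum as$ into $\map \; (\fuse \; bs) \; as$, and that $\fuse$ prepends bits),
the nested alternative below is spliced into the outer list:
\begin{center}
	$[_{Z}\sum [_{S}c, \; _{Z}d], \; _{S}e] \stackrel{s*}{\rightsquigarrow}
	[_{ZS}c, \; _{ZZ}d, \; _{S}e]$
\end{center}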
\noindent
The function $\bsimpalts$ preserves rewritability:
\begin{lemma}\label{bsimpaltsPreserves}
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; rs$
\end{lemma}
\noindent
The simplification function
$\textit{bsimp}$ only transforms the regular expression using steps specified by 
$\rightsquigarrow^*$ and nothing else:
\begin{lemma}
	$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
\end{lemma}
\begin{proof}
	By an induction on $r$.
	The most involved case is the alternative, 
	where we use the induction hypothesis for the first step below, and lemmas \ref{bsimpaltsPreserves},
	\ref{fltsPreserves} and \ref{dBPreserves} to perform a series of rewriting steps:
	\begin{center}
		\begin{tabular}{lcl}
			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
		\end{tabular}
	\end{center}
	Using this we can derive the following rewrite sequence:
	\begin{center}
		\begin{tabular}{lcl}
			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
			    & $\rightsquigarrow^*$ & $\ldots$ \\[1.5ex]
			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; 
			    (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) 
			    \; \rerases \; \phi)$\\[1.5ex]
			    %& $\rightsquigarrow^*$ & $ _{bs} \sum (\distinctBy \; 
				%(\flts \; (\map \; \textit{bsimp}\; rs)) \; \;
				%\rerases \; \;\phi) $\\[1.5ex]
			    & $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
		\end{tabular}
	\end{center}	
\end{proof}
\subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
A single rewrite step 
$\rightsquigarrow$ may turn into several steps $\stackrel{*}{\rightsquigarrow}$
once derivatives are taken on both sides:
\begin{lemma}\label{rewriteBder}
	\hspace{0em}
	\begin{itemize}
		\item
			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
			\rightsquigarrow^*  r_2 \backslash c$ 
		\item	
			If $rs_1 \stackrel{s}{\rightsquigarrow} rs_2$, then $ 
			\map \; (\_\backslash c) \; rs_1 
			\stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
	\end{itemize}
\end{lemma}
\begin{proof}
	By induction on $\rightsquigarrow$ 
	and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
\end{proof}
\noindent
Now we can prove property 3 as an immediate corollary:
\begin{corollary}\label{rewritesBder}
	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
	r_2 \backslash c$
\end{corollary}
\begin{proof}
	By rule induction on $\stackrel{*}{\rightsquigarrow} $ and lemma \ref{rewriteBder}.
\end{proof}
\noindent
This can be extended to strings and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
to obtain the correspondence between
the intermediate derivative regular expressions
of $\blexer$ and $\blexersimp$:
\begin{lemma}\label{bderBderssimp}
	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
\end{lemma}
\begin{proof}
	By an induction on $s$.
\end{proof}
\subsection{Main Theorem}
Now with lemma \ref{bderBderssimp} in place we are ready for the main theorem.
\begin{theorem}
	$\blexer \; r \; s = \blexersimp \; r \; s$
\end{theorem}
\noindent
\begin{proof}
	We can rewrite in many steps from the original lexer's 
	derivative regular expressions to those of the 
	lexer with simplification applied (by lemma \ref{bderBderssimp}):
	\begin{center}
		$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $.
	\end{center}
	By the many-step version of lemma \ref{rewriteBmkepsAux},
	they generate the same bits if the lexing result is a match:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
	\end{center}
	Since they generate the same bits, they also give the same value after decoding:
	\begin{center}
		$\bnullable \; (a \backslash s) 
		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
	\end{center}
	If the lexing result is not a match, then by lemma \ref{rewritesBnullable}
	neither $a \backslash s$ nor $\bderssimp{a}{s}$ is nullable, and both lexers return $\None$.
	Together this is what our proof goal requires:
	\begin{center}
		$\blexer \; r \; s = \blexersimp \; r \; s$.
	\end{center}	
\end{proof}
\noindent
As a corollary,
we can link this result with the lemma we proved earlier that 
\begin{center}
	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = \Some \;v$\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
	r\;s = \None$.
\end{center}
and obtain the property that the bit-coded lexer with simplification
indeed correctly generates a POSIX lexing result, if such a result exists.
\begin{corollary}
	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = \Some \; v$\\
	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
	r\;s = \None$.
\end{corollary}

\subsection{Comments on the Proof Techniques Used}
Straightforward as the proof may seem,
the effort we spent obtaining it was far from trivial.
We initially attempted to re-use the argument 
in \cref{flex_retrieve}. 
The problem is that both functions $\inj$ and $\retrieve$ require 
that the annotated regular expressions stay unsimplified, 
so that one can 
correctly compare $v_{i+1}$, $r_i$ and $v_i$ 
in diagram \ref{graph:inj}.

We also tried to prove 
\begin{center}
$\textit{bsimp} \;\; (\bderssimp{a}{s}) = 
\textit{bsimp} \;\;  (a\backslash s)$,
\end{center}
but this turns out not to be true.
A counterexample is
\[ a = [(_{Z}\ONE+_{S}c)\cdot [bb \cdot (_{Z}\ONE+_{S}c)]] \;\; 
	\text{and} \;\; s = bb.
\]
\noindent
Then we would have 
\begin{center}
	$\textit{bsimp}\;\; ( a \backslash s ) =
	_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
\end{center}
\noindent
whereas 
\begin{center}
	$\textit{bsimp} \;\;( \bderssimp{a}{s} ) =  
	_{Z}(_{Z} \ONE + _{S} c)$.
\end{center}
Unfortunately, 
no matter where we apply $\textit{bsimp}$,
this kind of discrepancy remains. 
This is due to 
the $\map \; (\fuse\; bs) \; as$ operation 
happening at different locations in the regular expression.

The rewriting relation 
$\rightsquigarrow^*$ 
allows us to ignore this discrepancy
and view the expressions 
\begin{center}
	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
	and\\
	$_{Z}(_{Z} \ONE + _{S} c)$
\end{center}
as equal because they were both rewritten
from the same expression.
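\noindent
Concretely, both expressions yield the same bits.
A sketch of the computation, assuming the clauses
$\bmkeps \; (_{bs}\sum rs) = bs \, @ \, \bmkepss \; rs$ and
$\bmkeps \; (_{bs}\ONE) = bs$ from the definition of $\bmkeps$:
\begin{center}
	\begin{tabular}{lclcl}
		$\bmkeps \; (_{[]}(_{ZZ}\ONE + _{ZS}c))$ & $=$ & $[] \, @ \, \bmkeps \; (_{ZZ}\ONE)$ & $=$ & $[Z, Z]$\\
		$\bmkeps \; (_{Z}(_{Z}\ONE + _{S}c))$ & $=$ & $[Z] \, @ \, \bmkeps \; (_{Z}\ONE)$ & $=$ & $[Z, Z]$
	\end{tabular}
\end{center}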

The simplification rewriting rules
given in \ref{rrewriteRules} are by no means
final;
one could come up with new rules
such as 
$\SEQ r_1 \cdot (\SEQ r_2 \cdot r_3) \rightarrow
\SEQs [r_1, r_2, r_3]$.
Such a rule does not fit with the proof technique
of our main theorem, although it does not seem to violate the POSIX
property.

Having established the correctness of our
$\blexersimp$, in the next chapter we shall prove that with our $\simp$ function,
for a given $r$, the derivative size is always
finitely bounded by a constant.