ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
author Chengsong
Tue, 23 Aug 2022 14:21:13 +0100
changeset 583 4aabb0629e4b
parent 582 3e19073e91f4
child 584 1734bd5975a3
permissions -rwxr-xr-x
chap4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
583
Chengsong
parents: 582
diff changeset
    14
In this chapter we introduce the simplifications
Chengsong
parents: 582
diff changeset
    15
on annotated regular expressions that can be applied to 
Chengsong
parents: 582
diff changeset
    16
each intermediate derivative result. This allows
Chengsong
parents: 582
diff changeset
    17
us to make $\blexer$ much more efficient.
Chengsong
parents: 582
diff changeset
    18
We contrast this simplification function 
Chengsong
parents: 582
diff changeset
    19
with Sulzmann and Lu's original
Chengsong
parents: 582
diff changeset
    20
simplifications, indicating the simplicity of our algorithm and
Chengsong
parents: 582
diff changeset
    21
improvements we made, demostrating
Chengsong
parents: 582
diff changeset
    22
the usefulness and reliability of formal proofs on algorithms.
Chengsong
parents: 582
diff changeset
    23
These ``aggressive'' simplifications would not be possible in the injection-based 
Chengsong
parents: 582
diff changeset
    24
lexing we introduced in chapter \ref{Inj}.
Chengsong
parents: 582
diff changeset
    25
We then go on to prove the correctness with the improved version of 
Chengsong
parents: 582
diff changeset
    26
$\blexer$, called $\blexersimp$, by establishing 
Chengsong
parents: 582
diff changeset
    27
$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    28
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    29
\section{Simplification for Annotated Regular Expressions}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    30
The first thing we notice in the fast growth of examples such as $(a^*a^*)^*$'s
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    31
and $(a^* + (aa)^*)^*$'s derivatives is that a lot of duplicated sub-patterns
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    32
are scattered around different levels, and therefore requires 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    33
de-duplication at different levels:
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    34
\begin{center}
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    35
	$(a^*a^*)^* \rightarrow (a^*a^* + a^*)\cdot(a^*a^*)^* \rightarrow $\\
583
Chengsong
parents: 582
diff changeset
    36
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \rightarrow \ldots$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    37
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    38
\noindent
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    39
As we have already mentioned in \ref{eqn:growth2},
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    40
a simple-minded simplification function cannot simplify
583
Chengsong
parents: 582
diff changeset
    41
\begin{center}
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    42
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
583
Chengsong
parents: 582
diff changeset
    43
\end{center}
Chengsong
parents: 582
diff changeset
    44
any further, one would expect a better simplification function to work in the 
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    45
following way:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    46
\begin{gather*}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    47
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    48
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    49
	\bigg\downarrow \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    50
	(a^*a^* + a^* 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    51
	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    52
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    53
	\bigg\downarrow \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    54
	(a^*a^* + a^* 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    55
	)\cdot(a^*a^*)^*  
583
Chengsong
parents: 582
diff changeset
    56
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
Chengsong
parents: 582
diff changeset
    57
	\bigg\downarrow \\
Chengsong
parents: 582
diff changeset
    58
	(a^*a^* + a^* 
Chengsong
parents: 582
diff changeset
    59
	)\cdot(a^*a^*)^*  
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    60
\end{gather*}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    61
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    62
This motivating example came from testing Sulzmann and Lu's 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    63
algorithm: their simplification does 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    64
not work!
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    65
We quote their $\textit{simp}$ function verbatim here:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    66
\begin{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    67
	\begin{tabular}{lcl}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    68
		$\simp\; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    69
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    70
						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    71
		$\simp\;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    72
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    73
		\textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    74
					     & & $\textit{else}\;\;_{bs}((\simp\;r_1)\cdot
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    75
					     (\simp\; r_2))$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    76
		$\simp \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    77
		$\simp \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    78
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    79
		$\simp \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simp \; r)$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    80
		$\simp \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    81
		(\nub \; (\filter \; (\not \circ \zeroable)\;((\simp \; r) :: \map \; \simp \; rs)))$\\ 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    82
		
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    83
	\end{tabular}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    84
\end{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    85
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    86
the $\textit{zeroable}$ predicate 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    87
which tests whether the regular expression
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    88
is equivalent to $\ZERO$,
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    89
is defined as:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    90
\begin{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    91
	\begin{tabular}{lcl}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    92
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    93
		\zeroable \;_{[]}\sum\;rs $\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    94
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    95
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    96
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    97
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    98
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    99
	\end{tabular}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   100
\end{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   101
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   102
Despite that we have already implemented the simple-minded simplifications 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   103
such as throwing away useless $\ONE$s and $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   104
The simplification rule $r + r \rightarrow $ cannot make a difference either
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   105
since it only removes duplicates on the same level, not something like 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   106
$(r+a)+r \rightarrow r+a$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   107
This requires us to break up nested alternatives into lists, for example 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   108
using the flatten operation similar to the one defined for any function language's
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   109
list datatype:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   110
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   111
 \begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   112
  \begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   113
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   114
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   115
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   116
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   117
\end{tabular}    
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   118
\end{center}  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   119
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   120
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   121
There is a minor difference though, in that our $\flts$ operation defined by us
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   122
also throws away $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   123
For a flattened list of regular expressions, a de-duplication can be done efficiently:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   124
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   125
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   126
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   127
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   128
		$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   129
		$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   130
						       & & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   131
						       & & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   132
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   133
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   134
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   135
The reason we define a distinct function under a mapping $f$ is because
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   136
we want to eliminate regular expressions that are the same syntactically,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   137
but having different bit-codes (more on the reason why we can do this later).
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   138
To achieve this, we call $\erase$ as the function $f$ during the distinction
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   139
operation.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   140
A recursive definition of our  simplification function 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   141
that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   142
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   143
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   144
  \begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   145
   
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   146
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   147
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   148
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   149
\end{tabular}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   150
\end{center}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   151
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   152
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   153
The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   154
When it detected that the regular expression is an alternative or
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   155
sequence, it will try to simplify its children regular expressions
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   156
recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   157
$\ONE$, which might trigger further simplification at the current level.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   158
Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   159
using rules such as  $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   160
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   161
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   162
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   163
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   164
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   165
   &&$\quad\textit{case} \;  (_{bs1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   166
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   167
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   168
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   169
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   170
The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   171
the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   172
and then call $\distinctBy$ on that list, the predicate determining whether two 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   173
elements are the same is $\erase \; r_1 = \erase\; r_2$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   174
Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   175
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   176
decides whether to keep the current level constructor $\sum$ as it is, and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   177
removes it when there are less than two elements:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   178
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   179
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   180
		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   181
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   182
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   183
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   184
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   185
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   186
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   187
Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   188
we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   189
so it stays small:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   190
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   191
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   192
		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   193
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   194
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   195
Following previous notation of  natural
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   196
extension from derivative w.r.t.~character to derivative
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   197
w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in  the [] case?}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   198
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   199
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   200
\begin{tabular}{lcl}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   201
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   202
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   203
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   204
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   205
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   206
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   207
Extracting bit-codes from the derivatives that had been simplified repeatedly after 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   208
each derivative run, the simplified $\blexer$, called $\blexersimp$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   209
is defined as 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   210
 \begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   211
\begin{tabular}{lcl}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   212
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   213
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   214
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   215
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   216
  & & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   217
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   218
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   219
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   220
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   221
This algorithm keeps the regular expression size small, for example,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   222
with this simplification our previous $(a + aa)^*$ example's fast growth (over
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   223
$10^5$ nodes at around $20$ input length)
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   224
will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   225
input string is.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   226
We show some graphs to better demonstrate this imrpovement.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   227
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   228
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   229
\section{$(a+aa)^*$ and $(a^*\cdot a^*)^*$  against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   230
For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   231
at only around $15$ input characters:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   232
\begin{center}
539
Chengsong
parents: 538
diff changeset
   233
\begin{tabular}{ll}
Chengsong
parents: 538
diff changeset
   234
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   235
\begin{axis}[
Chengsong
parents: 538
diff changeset
   236
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   237
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   238
        width=7cm,
Chengsong
parents: 538
diff changeset
   239
    height=4cm, 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   240
    legend entries={Lexer with $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   241
    legend pos=  south east,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   242
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   243
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   244
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   245
\end{tikzpicture} %\label{fig:BitcodedLexer}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   246
&
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   247
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   248
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   249
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   250
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   251
    width = 7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   252
    height = 4cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   253
    legend entries={Lexer without $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   254
    legend pos=  north west,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   255
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   256
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   257
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   258
\end{tikzpicture} 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   259
\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   260
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   261
And for $(a^*\cdot a^*)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   262
simplification rules (we put the graphs together to show the contrast)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   263
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   264
\begin{tabular}{ll}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   265
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   266
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   267
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   268
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   269
        width=7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   270
    height=4cm, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   271
    legend entries={Lexer with $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   272
    legend pos=  south east,
Chengsong
parents: 538
diff changeset
   273
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   274
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
Chengsong
parents: 538
diff changeset
   275
\end{axis}
Chengsong
parents: 538
diff changeset
   276
\end{tikzpicture} %\label{fig:BitcodedLexer}
Chengsong
parents: 538
diff changeset
   277
&
Chengsong
parents: 538
diff changeset
   278
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   279
\begin{axis}[
Chengsong
parents: 538
diff changeset
   280
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   281
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   282
    width = 7cm,
Chengsong
parents: 538
diff changeset
   283
    height = 4cm,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   284
    legend entries={Lexer without $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   285
    legend pos=  north west,
Chengsong
parents: 538
diff changeset
   286
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   287
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
Chengsong
parents: 538
diff changeset
   288
\end{axis}
Chengsong
parents: 538
diff changeset
   289
\end{tikzpicture} 
Chengsong
parents: 538
diff changeset
   290
\end{tabular}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   291
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   292
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   293
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   294
%	SECTION rewrite relation
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   295
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   296
\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   297
In the $\blexer$'s correctness proof, we
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   298
did not directly derive the fact that $\blexer$ gives out the POSIX value,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   299
but first prove that $\blexer$ is linked with $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   300
Then we re-use
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   301
the correctness of $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   302
Here we follow suit by first proving that
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   303
$\blexersimp \; r \; s $ 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   304
produces the same output as $\blexer \; r\; s$,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   305
and then putting it together with 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   306
$\blexer$'s correctness result
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   307
($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   308
to obtain half of the correctness property (the other half
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   309
being when $s$ is not $L \; r$ which is routine to establish)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   310
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   311
	$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   312
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   313
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   314
because it makes the proof simpler
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   315
The overall idea for the correctness of our simplified bitcoded lexer
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   316
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   317
is that the $\textit{bsimp}$ will not change the regular expressions so much that
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   318
it becomes impossible to extract the $\POSIX$ values.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   319
To capture this "similarity" between unsimplified regular expressions and simplified ones,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   320
we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   321
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   322
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   323
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   324
	\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   325
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   326
	\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   327
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   328
	\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   329
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   330
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   331
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   332
	\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   333
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   334
	\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   335
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   336
	\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   337
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   338
	\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   339
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   340
	\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   341
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   342
	\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   343
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   344
	\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   345
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   346
	\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   347
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   348
	\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   349
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   350
	\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   351
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   352
	\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   353
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   354
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   355
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   356
These "rewrite" steps define the atomic simplifications we could impose on regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   357
under our simplification algorithm.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   358
For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   359
a list of regular exression to another list.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   360
The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   361
which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   362
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   363
Usually more than one steps are taking place during the simplification of a regular expression,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   364
therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   365
relations:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   366
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   367
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   368
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   369
	\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   370
	\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   371
	\inferrule{\\r_1 \stackrel{*}{\rightsquigarrow}  r_2 \land \; r_2 \stackrel{*}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   372
	\inferrule{\\rs_1 \stackrel{*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   373
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   374
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   375
Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   376
three properties about how these relations connect to $\blexersimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   377
\begin{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   378
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   379
		$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   380
		The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   381
		$\stackrel{*}{\rightsquigarrow}$ and nothing else.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   382
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   383
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   384
		The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative. 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   385
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   386
		$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   387
		expression in finitely many atomic simplification steps, then these two regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   388
		will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   389
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   390
\end{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   391
\section{Three Important properties}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   392
These properties would work together towards the correctness theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   393
We start proving each of these lemmas below.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   394
\subsection{$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   395
The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   396
and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   397
$\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   398
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   399
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   400
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   401
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   402
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   403
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   404
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   405
	$r \stackrel{*}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{*}{\rightsquigarrow}  _{bs} \sum r' :: rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   406
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   407
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   408
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   409
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   410
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   411
Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a  list:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   412
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   413
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   414
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   415
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   416
	By induction on the list $rs$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   417
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   418
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   419
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   420
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   421
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   422
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   423
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   424
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   425
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   426
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   427
The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   428
\begin{lemma}\label{ssgqTossgs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   429
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   430
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   431
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   432
	By rule induction of $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   433
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   434
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   435
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   436
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   437
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   438
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   439
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   440
Here are two lemmas relating $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   441
\begin{lemma}\label{singleton}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   442
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   443
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   444
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   445
	By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   446
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   447
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   448
	$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   449
	r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   450
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   451
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   452
	By using \ref{singleton}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   453
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   454
Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   455
$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   456
The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   457
elements in $rs_2$, as well as those that have appeared in $rs_1$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   458
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   459
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\;  \; (\map\;\; \rerase{\_}\; \; rs_1)))$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   460
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   461
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   462
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   463
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   464
The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   465
\begin{lemma}\label{dBPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   466
	$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   467
\end{lemma}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   468
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   469
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   470
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   471
The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   472
\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   473
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   474
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   475
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   476
The rewriting in many steps property is composible in terms of regular expression constructors:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   477
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   478
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} \;  _{bs} r_2 \cdot r_3 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   479
$r_3 \stackrel{*}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   480
\end{lemma}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   481
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   482
The rewriting in many steps properties $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ is preserved under the function $\fuse$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   483
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   484
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{*}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   485
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   486
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   487
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   488
	By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   489
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   490
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   491
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   492
If we could rewrite a regular expression in many steps to $\ZERO$, then 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   493
we could also rewrite any sequence containing it to $\ZERO$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   494
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   495
	$r_1 \stackrel{*}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{*}{\rightsquigarrow} \ZERO$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   496
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   497
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   498
	$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   499
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   500
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   501
The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   502
\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   503
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   504
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   505
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   506
Before we give out the next lemmas, we define a predicate for a list of regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   507
having at least one nullable regular expressions:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   508
\begin{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   509
	$\textit{bnullables} \; rs \dn  \exists r \in rs. \bnullable r$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   510
\end{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   511
The rewriting relation $\rightsquigarrow$ preserves nullability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   512
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   513
	$r_1 \rightsquigarrow r_2 \implies  \bnullable \; r_1 = \bnullable \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   514
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   515
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   516
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   517
	By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   518
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   519
So does the many steps rewriting:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   520
\begin{lemma}\label{rewritesBnullable}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   521
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   522
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   523
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   524
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   525
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   526
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   527
And if both regular expressions in a rewriting relation are nullable, then they 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   528
produce the same bit-codes:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   529
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   530
\begin{lemma}\label{rewriteBmkepsAux}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   531
	$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   532
	\bmkeps \; r_2)$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   533
	$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   534
	\bmkepss \; rs_1 = \bmkepss \; rs2)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   535
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   536
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   537
The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   538
is $bnullable$:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   539
\begin{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   540
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   541
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   542
		$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   543
	\end{tabular}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   544
\end{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   545
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   546
And now we are ready to prove the key property that if you 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   547
have two regular expressions, one rewritable in many steps to the other,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   548
and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   549
\begin{lemma}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   550
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   551
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   552
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   553
	By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   554
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   555
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   556
the other property is also ready:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   557
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   558
	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   559
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   560
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   561
	By an induction on $r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   562
The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   563
	$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   564
	$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   565
	$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   566
	Then we could do the following regular expresssion many steps rewrite:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   567
	$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   568
	\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   569
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   570
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   571
\section{Proof for the Property: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   572
The first thing we prove is that if we could rewrite in one step, then after derivative
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   573
we could rewrite in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   574
\begin{lemma}\label{rewriteBder}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   575
	$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   576
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   577
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   578
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   579
	By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   580
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   581
Now we can prove that once we could rewrite from one expression to another in many steps,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   582
then after a derivative on both sides we could still rewrite one to another in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   583
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   584
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   585
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   586
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   587
	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   588
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   589
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   590
This can be extended and combined with the previous two important properties
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   591
so that a regular expression's successivve derivatives can be rewritten in many steps
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   592
to its simplified counterpart:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   593
\begin{lemma}\label{bderBderssimp}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   594
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   595
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   596
\subsection{Main Theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   597
Now with \ref{bdersBderssimp} we are ready for the main theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   598
To link $\blexersimp$ and $\blexer$, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   599
we first say that they give out the same bits, if the lexing result is a match:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   600
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   601
	$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   602
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   603
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   604
Now that they give out the same bits, we know that they give the same value after decoding,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   605
which we know is correct value as $\blexer$ is correct:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   606
\begin{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   607
	$\blexer \; r \; s = \blexersimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   608
\end{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   609
\noindent
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   610
\begin{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   611
	One can rewrite in many steps from the original lexer's derivative regular expressions to the 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   612
	lexer with simplification applied:
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   613
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   614
	If two regular expressions are rewritable, then they produce the same bits.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   615
	Under the condition that $ r_1$ is nullable, we have
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   616
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   617
	This proves the \emph{if} (interesting) branch of
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   618
	$\blexer \; r \; s = \blexersimp{r}{s}$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   619
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   620
\end{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   621
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   622
As a corollary,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   623
we link this result with the lemma we proved earlier that 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   624
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   625
	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   626
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   627
and obtain the corollary that the bit-coded lexer with simplification is
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   628
indeed correctly outputting POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   629
\begin{corollary}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   630
	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   631
\end{corollary}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   632
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   633
\subsection{Comments on the Proof Techniques Used}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   634
The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   635
compared with not having simplification is that we can no longer use the argument 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   636
in \cref{flex_retrieve}.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   637
The function \retrieve needs the cumbersome structure of the (umsimplified)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   638
annotated regular expression to 
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   639
agree with the structure of the value, but simplification will always mess with the 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   640
structure.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   641
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   642
We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   643
but this turns out to be not true, A counterexample of this being
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   644
\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   645
\]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   646
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   647
Then we would have $\bsimp{a \backslash s}$ being 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   648
$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   649
whereas $\bsimp{\bderssimp{a}{s}}$ would be 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   650
$_{Z}(_{Z} \ONE + _{S} c)$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   651
Unfortunately if we apply $\textit{bsimp}$ at different
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   652
stages we will always have this discrepancy, due to 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   653
whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   654
is taken at some points will be entirely dependant on when the simplification 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   655
take place whether there is a larger alternative structure surrounding the 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   656
alternative being simplified.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   657
The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   658
us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   659
are taken, but simply say that they can be taken to make two similar 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   660
regular expressions equal, and can be done after interleaving derivatives
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   661
and simplifications.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   662
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   663
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   664
Having correctness property is good. But we would also like the lexer to be efficient in 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   665
some sense, for exampe, not grinding to a halt at certain cases.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   666
In the next chapter we shall prove that for a given $r$, the internal derivative size is always
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   667
finitely bounded by a constant.
582
3e19073e91f4 chap3 done
Chengsong
parents: 579
diff changeset
   668
we would expect in the