ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
author Chengsong
Wed, 17 Aug 2022 01:09:13 +0100
changeset 579 35df9cdd36ca
parent 576 3e1b699696b6
child 582 3e19073e91f4
permissions -rwxr-xr-x
more chap3
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    14
Now we introduce the simplifications, which is why we introduce the 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    15
bitcodes in the first place.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    16
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    17
\section{Simplification for Annotated Regular Expressions}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    18
The first thing we notice in the fast growth of examples such as $(a^*a^*)^*$'s
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    19
and $(a^* + (aa)^*)^*$'s derivatives is that a lot of duplicated sub-patterns
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    20
are scattered around different levels, and therefore requires 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    21
de-duplication at different levels:
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    22
\begin{center}
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    23
	$(a^*a^*)^* \rightarrow (a^*a^* + a^*)\cdot(a^*a^*)^* \rightarrow $\\
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    24
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    25
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    26
\noindent
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    27
As we have already mentioned in \ref{eqn:growth2},
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    28
a simple-minded simplification function cannot simplify
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    29
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    30
any further.
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    31
we would expect a better simplification function to work in the 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    32
following way:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    33
\begin{gather*}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    34
	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    35
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    36
	\bigg\downarrow \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    37
	(a^*a^* + a^* 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    38
	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    39
	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    40
	\bigg\downarrow \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    41
	(a^*a^* + a^* 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    42
	)\cdot(a^*a^*)^*  
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    43
	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    44
\end{gather*}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    45
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    46
This motivating example came from testing Sulzmann and Lu's 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    47
algorithm: their simplification does 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    48
not work!
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    49
We quote their $\textit{simp}$ function verbatim here:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    50
\begin{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    51
	\begin{tabular}{lcl}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    52
		$\simp\; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    53
		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    54
						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    55
		$\simp\;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    56
		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    57
		\textit{then} \;\; \ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    58
					     & & $\textit{else}\;\;_{bs}((\simp\;r_1)\cdot
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    59
					     (\simp\; r_2))$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    60
		$\simp \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    61
		$\simp \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    62
		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    63
		$\simp \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simp \; r)$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    64
		$\simp \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    65
		(\nub \; (\filter \; (\not \circ \zeroable)\;((\simp \; r) :: \map \; \simp \; rs)))$\\ 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    66
		
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    67
	\end{tabular}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    68
\end{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    69
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    70
the $\textit{zeroable}$ predicate 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    71
which tests whether the regular expression
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    72
is equivalent to $\ZERO$,
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    73
is defined as:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    74
\begin{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    75
	\begin{tabular}{lcl}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    76
		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    77
		\zeroable \;_{[]}\sum\;rs $\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    78
		$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    79
		$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    80
		$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    81
		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    82
		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    83
	\end{tabular}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    84
\end{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    85
%\[
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    86
%\begin{aligned}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    87
%&\textit{isPhi} \left(b s @ r i^{*}\right)=\text { False } \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    88
%&\textit{isPhi}\left(b s @ r i_{1} r i_{2}\right)=\textit{isPhi}  r i_{1} \vee \textit{isPhi}  r i_{2} \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    89
%&\textit{isPhi} \left(b s @ r i_{1} \oplus r i_{2}\right)=\textit{isPhi}  r i_{1} \wedge \textit{isPhi}  r i_{2} \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    90
%&\textit{isPhi}(b s @ l)= False \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    91
%&\textit{isPhi}(b s @ \epsilon)= False \\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    92
%&\textit{isPhi} \; \ZERO = True
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    93
%\end{aligned}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    94
%\]
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    95
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    96
Despite that we have already implemented the simple-minded simplifications 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    97
such as throwing away useless $\ONE$s and $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    98
The simplification rule $r + r \rightarrow $ cannot make a difference either
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    99
since it only removes duplicates on the same level, not something like 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   100
$(r+a)+r \rightarrow r+a$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   101
This requires us to break up nested alternatives into lists, for example 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   102
using the flatten operation similar to the one defined for any function language's
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   103
list datatype:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   104
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   105
 \begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   106
  \begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   107
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   108
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   109
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   110
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   111
\end{tabular}    
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   112
\end{center}  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   113
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   114
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   115
There is a minor difference though, in that our $\flts$ operation defined by us
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   116
also throws away $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   117
For a flattened list of regular expressions, a de-duplication can be done efficiently:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   118
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   119
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   120
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   121
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   122
		$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   123
		$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   124
						       & & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   125
						       & & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   126
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   127
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   128
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   129
The reason we define a distinct function under a mapping $f$ is because
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   130
we want to eliminate regular expressions that are the same syntactically,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   131
but having different bit-codes (more on the reason why we can do this later).
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   132
To achieve this, we call $\erase$ as the function $f$ during the distinction
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   133
operation.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   134
A recursive definition of our  simplification function 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   135
that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   136
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   137
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   138
  \begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   139
   
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   140
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   141
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   142
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   143
\end{tabular}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   144
\end{center}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   145
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   146
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   147
The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   148
When it detected that the regular expression is an alternative or
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   149
sequence, it will try to simplify its children regular expressions
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   150
recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   151
$\ONE$, which might trigger further simplification at the current level.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   152
Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   153
using rules such as  $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   154
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   155
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   156
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   157
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   158
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   159
   &&$\quad\textit{case} \;  (_{bs1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   160
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   161
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   162
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   163
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   164
The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   165
the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   166
and then call $\distinctBy$ on that list, the predicate determining whether two 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   167
elements are the same is $\erase \; r_1 = \erase\; r_2$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   168
Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   169
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   170
decides whether to keep the current level constructor $\sum$ as it is, and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   171
removes it when there are less than two elements:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   172
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   173
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   174
		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   175
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   176
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   177
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   178
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   179
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   180
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   181
Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   182
we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   183
so it stays small:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   184
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   185
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   186
		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   187
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   188
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   189
Following previous notation of  natural
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   190
extension from derivative w.r.t.~character to derivative
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   191
w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in  the [] case?}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   192
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   193
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   194
\begin{tabular}{lcl}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   195
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   196
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   197
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   198
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   199
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   200
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   201
Extracting bit-codes from the derivatives that had been simplified repeatedly after 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   202
each derivative run, the simplified $\blexer$, called $\blexersimp$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   203
is defined as 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   204
 \begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   205
\begin{tabular}{lcl}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   206
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   207
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   208
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   209
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   210
  & & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   211
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   212
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   213
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   214
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   215
This algorithm keeps the regular expression size small, for example,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   216
with this simplification our previous $(a + aa)^*$ example's fast growth (over
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   217
$10^5$ nodes at around $20$ input length)
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   218
will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   219
input string is.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   220
We show some graphs to better demonstrate this imrpovement.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   221
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   222
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   223
\section{$(a+aa)^*$ and $(a^*\cdot a^*)^*$  against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   224
For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   225
at only around $15$ input characters:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   226
\begin{center}
539
Chengsong
parents: 538
diff changeset
   227
\begin{tabular}{ll}
Chengsong
parents: 538
diff changeset
   228
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   229
\begin{axis}[
Chengsong
parents: 538
diff changeset
   230
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   231
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   232
        width=7cm,
Chengsong
parents: 538
diff changeset
   233
    height=4cm, 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   234
    legend entries={Lexer with $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   235
    legend pos=  south east,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   236
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   237
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   238
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   239
\end{tikzpicture} %\label{fig:BitcodedLexer}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   240
&
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   241
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   242
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   243
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   244
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   245
    width = 7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   246
    height = 4cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   247
    legend entries={Lexer without $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   248
    legend pos=  north west,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   249
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   250
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   251
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   252
\end{tikzpicture} 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   253
\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   254
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   255
And for $(a^*\cdot a^*)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   256
simplification rules (we put the graphs together to show the contrast)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   257
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   258
\begin{tabular}{ll}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   259
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   260
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   261
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   262
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   263
        width=7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   264
    height=4cm, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   265
    legend entries={Lexer with $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   266
    legend pos=  south east,
Chengsong
parents: 538
diff changeset
   267
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   268
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
Chengsong
parents: 538
diff changeset
   269
\end{axis}
Chengsong
parents: 538
diff changeset
   270
\end{tikzpicture} %\label{fig:BitcodedLexer}
Chengsong
parents: 538
diff changeset
   271
&
Chengsong
parents: 538
diff changeset
   272
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   273
\begin{axis}[
Chengsong
parents: 538
diff changeset
   274
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   275
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   276
    width = 7cm,
Chengsong
parents: 538
diff changeset
   277
    height = 4cm,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   278
    legend entries={Lexer without $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   279
    legend pos=  north west,
Chengsong
parents: 538
diff changeset
   280
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   281
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
Chengsong
parents: 538
diff changeset
   282
\end{axis}
Chengsong
parents: 538
diff changeset
   283
\end{tikzpicture} 
Chengsong
parents: 538
diff changeset
   284
\end{tabular}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   285
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   286
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   287
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   288
%	SECTION rewrite relation
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   289
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   290
\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   291
In the $\blexer$'s correctness proof, we
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   292
did not directly derive the fact that $\blexer$ gives out the POSIX value,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   293
but first prove that $\blexer$ is linked with $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   294
Then we re-use
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   295
the correctness of $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   296
Here we follow suit by first proving that
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   297
$\blexersimp \; r \; s $ 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   298
produces the same output as $\blexer \; r\; s$,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   299
and then putting it together with 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   300
$\blexer$'s correctness result
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   301
($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   302
to obtain half of the correctness property (the other half
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   303
being when $s$ is not $L \; r$ which is routine to establish)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   304
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   305
	$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   306
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   307
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   308
because it makes the proof simpler
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   309
The overall idea for the correctness of our simplified bitcoded lexer
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   310
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   311
is that the $\textit{bsimp}$ will not change the regular expressions so much that
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   312
it becomes impossible to extract the $\POSIX$ values.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   313
To capture this "similarity" between unsimplified regular expressions and simplified ones,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   314
we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   315
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   316
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   317
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   318
	\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   319
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   320
	\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   321
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   322
	\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   323
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   324
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   325
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   326
	\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   327
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   328
	\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   329
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   330
	\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   331
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   332
	\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   333
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   334
	\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   335
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   336
	\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   337
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   338
	\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   339
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   340
	\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   341
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   342
	\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   343
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   344
	\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   345
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   346
	\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   347
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   348
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   349
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   350
These "rewrite" steps define the atomic simplifications we could impose on regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   351
under our simplification algorithm.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   352
For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   353
a list of regular exression to another list.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   354
The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   355
which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   356
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   357
Usually more than one steps are taking place during the simplification of a regular expression,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   358
therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   359
relations:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   360
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   361
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   362
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   363
	\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   364
	\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   365
	\inferrule{\\r_1 \stackrel{*}{\rightsquigarrow}  r_2 \land \; r_2 \stackrel{*}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   366
	\inferrule{\\rs_1 \stackrel{*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   367
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   368
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   369
Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   370
three properties about how these relations connect to $\blexersimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   371
\begin{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   372
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   373
		$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   374
		The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   375
		$\stackrel{*}{\rightsquigarrow}$ and nothing else.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   376
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   377
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   378
		The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative. 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   379
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   380
		$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   381
		expression in finitely many atomic simplification steps, then these two regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   382
		will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   383
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   384
\end{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   385
\section{Three Important properties}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   386
These properties would work together towards the correctness theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   387
We start proving each of these lemmas below.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   388
\subsection{$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   389
The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   390
and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   391
$\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   392
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   393
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   394
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   395
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   396
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   397
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   398
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   399
	$r \stackrel{*}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{*}{\rightsquigarrow}  _{bs} \sum r' :: rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   400
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   401
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   402
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   403
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   404
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   405
Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a  list:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   406
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   407
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   408
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   409
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   410
	By induction on the list $rs$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   411
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   412
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   413
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   414
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   415
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   416
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   417
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   418
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   419
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   420
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   421
The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   422
\begin{lemma}\label{ssgqTossgs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   423
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   424
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   425
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   426
	By rule induction of $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   427
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   428
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   429
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   430
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   431
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   432
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   433
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   434
Here are two lemmas relating $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   435
\begin{lemma}\label{singleton}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   436
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   437
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   438
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   439
	By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   440
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   441
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   442
	$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   443
	r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   444
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   445
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   446
	By using \ref{singleton}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   447
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   448
Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   449
$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   450
The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   451
elements in $rs_2$, as well as those that have appeared in $rs_1$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   452
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   453
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\;  \; (\map\;\; \rerase{\_}\; \; rs_1)))$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   454
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   455
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   456
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   457
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   458
The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   459
\begin{lemma}\label{dBPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   460
	$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   461
\end{lemma}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   462
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   463
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   464
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   465
The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   466
\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   467
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   468
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   469
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   470
The rewriting in many steps property is composible in terms of regular expression constructors:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   471
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   472
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} \;  _{bs} r_2 \cdot r_3 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   473
$r_3 \stackrel{*}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   474
\end{lemma}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   475
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   476
The rewriting in many steps properties $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ is preserved under the function $\fuse$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   477
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   478
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{*}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   479
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   480
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   481
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   482
	By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   483
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   484
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   485
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   486
If we could rewrite a regular expression in many steps to $\ZERO$, then 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   487
we could also rewrite any sequence containing it to $\ZERO$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   488
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   489
	$r_1 \stackrel{*}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{*}{\rightsquigarrow} \ZERO$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   490
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   491
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   492
	$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   493
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   494
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   495
The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   496
\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   497
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   498
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   499
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   500
Before we give out the next lemmas, we define a predicate for a list of regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   501
having at least one nullable regular expressions:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   502
\begin{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   503
	$\textit{bnullables} \; rs \dn  \exists r \in rs. \bnullable r$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   504
\end{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   505
The rewriting relation $\rightsquigarrow$ preserves nullability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   506
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   507
	$r_1 \rightsquigarrow r_2 \implies  \bnullable \; r_1 = \bnullable \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   508
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   509
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   510
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   511
	By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   512
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   513
So does the many steps rewriting:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   514
\begin{lemma}\label{rewritesBnullable}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   515
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   516
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   517
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   518
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   519
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   520
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   521
And if both regular expressions in a rewriting relation are nullable, then they 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   522
produce the same bit-codes:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   523
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   524
\begin{lemma}\label{rewriteBmkepsAux}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   525
	$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   526
	\bmkeps \; r_2)$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   527
	$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   528
	\bmkepss \; rs_1 = \bmkepss \; rs2)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   529
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   530
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   531
The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   532
is $bnullable$:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   533
\begin{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   534
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   535
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   536
		$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   537
	\end{tabular}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   538
\end{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   539
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   540
And now we are ready to prove the key property that if you 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   541
have two regular expressions, one rewritable in many steps to the other,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   542
and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   543
\begin{lemma}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   544
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   545
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   546
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   547
	By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   548
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   549
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   550
the other property is also ready:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   551
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   552
	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   553
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   554
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   555
	By an induction on $r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   556
The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   557
	$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   558
	$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   559
	$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   560
	Then we could do the following regular expresssion many steps rewrite:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   561
	$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   562
	\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   563
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   564
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   565
\section{Proof for the Property: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   566
The first thing we prove is that if we could rewrite in one step, then after derivative
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   567
we could rewrite in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   568
\begin{lemma}\label{rewriteBder}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   569
	$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   570
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   571
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   572
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   573
	By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   574
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   575
Now we can prove that once we could rewrite from one expression to another in many steps,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   576
then after a derivative on both sides we could still rewrite one to another in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   577
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   578
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   579
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   580
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   581
	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   582
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   583
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   584
This can be extended and combined with the previous two important properties
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   585
so that a regular expression's successivve derivatives can be rewritten in many steps
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   586
to its simplified counterpart:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   587
\begin{lemma}\label{bderBderssimp}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   588
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   589
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   590
\subsection{Main Theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   591
Now with \ref{bdersBderssimp} we are ready for the main theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   592
To link $\blexersimp$ and $\blexer$, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   593
we first say that they give out the same bits, if the lexing result is a match:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   594
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   595
	$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   596
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   597
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   598
Now that they give out the same bits, we know that they give the same value after decoding,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   599
which we know is correct value as $\blexer$ is correct:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   600
\begin{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   601
	$\blexer \; r \; s = \blexersimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   602
\end{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   603
\noindent
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   604
\begin{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   605
	One can rewrite in many steps from the original lexer's derivative regular expressions to the 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   606
	lexer with simplification applied:
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   607
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   608
	If two regular expressions are rewritable, then they produce the same bits.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   609
	Under the condition that $ r_1$ is nullable, we have
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   610
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   611
	This proves the \emph{if} (interesting) branch of
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   612
	$\blexer \; r \; s = \blexersimp{r}{s}$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   613
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   614
\end{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   615
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   616
As a corollary,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   617
we link this result with the lemma we proved earlier that 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   618
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   619
	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   620
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   621
and obtain the corollary that the bit-coded lexer with simplification is
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   622
indeed correctly outputting POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   623
\begin{corollary}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   624
	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   625
\end{corollary}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   626
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   627
\subsection{Comments on the Proof Techniques Used}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   628
The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   629
compared with not having simplification is that we can no longer use the argument 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   630
in \cref{flex_retrieve}.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   631
The function \retrieve needs the cumbersome structure of the (umsimplified)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   632
annotated regular expression to 
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   633
agree with the structure of the value, but simplification will always mess with the 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   634
structure.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   635
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   636
We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   637
but this turns out to be not true, A counterexample of this being
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   638
\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   639
\]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   640
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   641
Then we would have $\bsimp{a \backslash s}$ being 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   642
$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   643
whereas $\bsimp{\bderssimp{a}{s}}$ would be 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   644
$_{Z}(_{Z} \ONE + _{S} c)$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   645
Unfortunately if we apply $\textit{bsimp}$ at different
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   646
stages we will always have this discrepancy, due to 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   647
whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   648
is taken at some points will be entirely dependant on when the simplification 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   649
take place whether there is a larger alternative structure surrounding the 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   650
alternative being simplified.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   651
The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   652
us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   653
are taken, but simply say that they can be taken to make two similar 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   654
regular expressions equal, and can be done after interleaving derivatives
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   655
and simplifications.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   656
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   657
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   658
Having correctness property is good. But we would also like the lexer to be efficient in 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   659
some sense, for exampe, not grinding to a halt at certain cases.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   660
In the next chapter we shall prove that for a given $r$, the internal derivative size is always
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   661
finitely bounded by a constant.