ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
author Chengsong
Fri, 12 Aug 2022 00:39:23 +0100
changeset 576 3e1b699696b6
parent 543 b2bea5968b89
child 579 35df9cdd36ca
permissions -rwxr-xr-x
thesis chap5
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    14
Now we introduce the simplifications, which is why we introduce the 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    15
bitcodes in the first place.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    16
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    17
\section{Simplification for Annotated Regular Expressions}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    18
The first thing we notice in the fast growth of examples such as $(a^*a^*)^*$'s
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    19
and $(a^* + (aa)^*)^*$'s derivatives is that a lot of duplicated sub-patterns
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    20
are scattered around different levels:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    21
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    22
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    23
	$(a^*a^*)^* \rightarrow (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    24
	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    25
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    26
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    27
Despite that we have already implemented the simple-minded simplifications 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    28
such as throwing away useless $\ONE$s and $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    29
The simplification rule $r + r \rightarrow $ cannot make a difference either
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    30
since it only removes duplicates on the same level, not something like 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    31
$(r+a)+r \rightarrow r+a$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    32
This requires us to break up nested alternatives into lists, for example 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    33
using the flatten operation similar to the one defined for any function language's
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    34
list datatype:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    35
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    36
 \begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    37
  \begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    38
  $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    39
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    40
  $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    41
    $\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise) 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    42
\end{tabular}    
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    43
\end{center}  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    44
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    45
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    46
There is a minor difference though, in that our $\flts$ operation defined by us
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    47
also throws away $\ZERO$s.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    48
For a flattened list of regular expressions, a de-duplication can be done efficiently:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    49
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    50
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    51
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    52
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    53
		$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    54
		$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    55
						       & & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    56
						       & & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    57
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    58
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    59
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    60
The reason we define a distinct function under a mapping $f$ is because
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    61
we want to eliminate regular expressions that are the same syntactically,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    62
but having different bit-codes (more on the reason why we can do this later).
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    63
To achieve this, we call $\erase$ as the function $f$ during the distinction
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    64
operation.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    65
A recursive definition of our  simplification function 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    66
that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    67
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    68
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    69
  \begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    70
   
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    71
	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    72
	  $\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    73
   $\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    74
\end{tabular}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    75
\end{center}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    76
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    77
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    78
The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    79
When it detected that the regular expression is an alternative or
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    80
sequence, it will try to simplify its children regular expressions
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    81
recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    82
$\ONE$, which might trigger further simplification at the current level.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    83
Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    84
using rules such as  $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    85
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    86
	\begin{tabular}{@{}lcl@{}}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    87
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    88
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    89
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    90
   &&$\quad\textit{case} \;  (_{bs1}\ONE, a_2') \Rightarrow  \textit{fuse} \; (bs@bs_1) \;  a_2'$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    91
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    92
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    93
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    94
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    95
The most involved part is the $\sum$ clause, where we first call $\flts$ on
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    96
the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    97
and then call $\distinctBy$ on that list, the predicate determining whether two 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    98
elements are the same is $\erase \; r_1 = \erase\; r_2$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
    99
Finally, depending on whether the regular expression list $as'$ has turned into a
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   100
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   101
decides whether to keep the current level constructor $\sum$ as it is, and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   102
removes it when there are less than two elements:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   103
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   104
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   105
		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   106
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   107
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   108
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   109
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   110
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   111
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   112
Having defined the $\bsimp$ function,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   113
we add it as a phase after a derivative is taken,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   114
so it stays small:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   115
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   116
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   117
		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   118
	\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   119
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   120
Following previous notation of  natural
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   121
extension from derivative w.r.t.~character to derivative
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   122
w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in  the [] case?}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   123
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   124
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   125
\begin{tabular}{lcl}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   126
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   127
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   128
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   129
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   130
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   131
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   132
Extracting bit-codes from the derivatives that had been simplified repeatedly after 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   133
each derivative run, the simplified $\blexer$, called $\blexersimp$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   134
is defined as 
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   135
 \begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   136
\begin{tabular}{lcl}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   137
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   138
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   139
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   140
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   141
  & & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   142
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   143
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   144
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   145
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   146
This algorithm keeps the regular expression size small, for example,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   147
with this simplification our previous $(a + aa)^*$ example's fast growth (over
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   148
$10^5$ nodes at around $20$ input length)
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   149
will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   150
input string is.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   151
We show some graphs to better demonstrate this imrpovement.
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   152
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   153
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   154
\section{$(a+aa)^*$ and $(a^*\cdot a^*)^*$  against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   155
For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   156
at only around $15$ input characters:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   157
\begin{center}
539
Chengsong
parents: 538
diff changeset
   158
\begin{tabular}{ll}
Chengsong
parents: 538
diff changeset
   159
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   160
\begin{axis}[
Chengsong
parents: 538
diff changeset
   161
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   162
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   163
        width=7cm,
Chengsong
parents: 538
diff changeset
   164
    height=4cm, 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   165
    legend entries={Lexer with $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   166
    legend pos=  south east,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   167
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   168
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   169
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   170
\end{tikzpicture} %\label{fig:BitcodedLexer}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   171
&
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   172
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   173
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   174
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   175
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   176
    width = 7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   177
    height = 4cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   178
    legend entries={Lexer without $\textit{bsimp}$},  
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   179
    legend pos=  north west,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   180
    legend cell align=left]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   181
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   182
\end{axis}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   183
\end{tikzpicture} 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   184
\end{tabular}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   185
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   186
And for $(a^*\cdot a^*)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   187
simplification rules (we put the graphs together to show the contrast)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   188
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   189
\begin{tabular}{ll}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   190
\begin{tikzpicture}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   191
\begin{axis}[
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   192
    xlabel={$n$},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   193
    ylabel={derivative size},
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   194
        width=7cm,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   195
    height=4cm, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   196
    legend entries={Lexer with $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   197
    legend pos=  south east,
Chengsong
parents: 538
diff changeset
   198
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   199
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
Chengsong
parents: 538
diff changeset
   200
\end{axis}
Chengsong
parents: 538
diff changeset
   201
\end{tikzpicture} %\label{fig:BitcodedLexer}
Chengsong
parents: 538
diff changeset
   202
&
Chengsong
parents: 538
diff changeset
   203
\begin{tikzpicture}
Chengsong
parents: 538
diff changeset
   204
\begin{axis}[
Chengsong
parents: 538
diff changeset
   205
    xlabel={$n$},
Chengsong
parents: 538
diff changeset
   206
    ylabel={derivative size},
Chengsong
parents: 538
diff changeset
   207
    width = 7cm,
Chengsong
parents: 538
diff changeset
   208
    height = 4cm,
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   209
    legend entries={Lexer without $\textit{bsimp}$},  
539
Chengsong
parents: 538
diff changeset
   210
    legend pos=  north west,
Chengsong
parents: 538
diff changeset
   211
    legend cell align=left]
Chengsong
parents: 538
diff changeset
   212
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
Chengsong
parents: 538
diff changeset
   213
\end{axis}
Chengsong
parents: 538
diff changeset
   214
\end{tikzpicture} 
Chengsong
parents: 538
diff changeset
   215
\end{tabular}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   216
\end{center}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   217
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   218
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   219
%	SECTION rewrite relation
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   220
%----------------------------------------------------------------------------------------
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   221
\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   222
In the $\blexer$'s correctness proof, we
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   223
did not directly derive the fact that $\blexer$ gives out the POSIX value,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   224
but first prove that $\blexer$ is linked with $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   225
Then we re-use
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   226
the correctness of $\lexer$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   227
Here we follow suit by first proving that
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   228
$\blexersimp \; r \; s $ 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   229
produces the same output as $\blexer \; r\; s$,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   230
and then putting it together with 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   231
$\blexer$'s correctness result
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   232
($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   233
to obtain half of the correctness property (the other half
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   234
being when $s$ is not $L \; r$ which is routine to establish)
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   235
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   236
	$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   237
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   238
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   239
because it makes the proof simpler
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   240
The overall idea for the correctness of our simplified bitcoded lexer
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   241
\noindent
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   242
is that the $\textit{bsimp}$ will not change the regular expressions so much that
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   243
it becomes impossible to extract the $\POSIX$ values.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   244
To capture this "similarity" between unsimplified regular expressions and simplified ones,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   245
we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   246
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   247
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   248
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   249
	\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   250
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   251
	\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   252
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   253
	\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   254
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   255
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   256
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   257
	\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   258
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   259
	\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   260
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   261
	\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   262
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   263
	\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   264
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   265
	\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   266
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   267
	\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   268
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   269
	\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   270
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   271
	\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   272
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   273
	\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   274
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   275
	\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   276
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   277
	\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   278
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   279
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   280
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   281
These "rewrite" steps define the atomic simplifications we could impose on regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   282
under our simplification algorithm.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   283
For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   284
a list of regular exression to another list.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   285
The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   286
which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   287
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   288
Usually more than one steps are taking place during the simplification of a regular expression,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   289
therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   290
relations:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   291
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   292
\begin{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   293
\begin{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   294
	\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   295
	\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   296
	\inferrule{\\r_1 \stackrel{*}{\rightsquigarrow}  r_2 \land \; r_2 \stackrel{*}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   297
	\inferrule{\\rs_1 \stackrel{*}{\rightsquigarrow}  rs_2 \land \; rs_2 \stackrel{*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   298
\end{mathpar}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   299
\end{center}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   300
Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   301
three properties about how these relations connect to $\blexersimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   302
\begin{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   303
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   304
		$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   305
		The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   306
		$\stackrel{*}{\rightsquigarrow}$ and nothing else.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   307
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   308
		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   309
		The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative. 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   310
	\item
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   311
		$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   312
		expression in finitely many atomic simplification steps, then these two regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   313
		will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   314
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   315
\end{itemize}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   316
\section{Three Important properties}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   317
These properties would work together towards the correctness theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   318
We start proving each of these lemmas below.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   319
\subsection{$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   320
The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   321
and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   322
$\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   323
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   324
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   325
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   326
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   327
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   328
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   329
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   330
	$r \stackrel{*}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{*}{\rightsquigarrow}  _{bs} \sum r' :: rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   331
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   332
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   333
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   334
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   335
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   336
Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a  list:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   337
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   338
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   339
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   340
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   341
	By induction on the list $rs$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   342
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   343
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   344
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   345
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   346
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   347
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   348
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   349
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   350
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   351
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   352
The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   353
\begin{lemma}\label{ssgqTossgs}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   354
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   355
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   356
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   357
	By rule induction of $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   358
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   359
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   360
	$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   361
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   362
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   363
	By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   364
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   365
Here are two lemmas relating $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   366
\begin{lemma}\label{singleton}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   367
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   368
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   369
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   370
	By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   371
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   372
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   373
	$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   374
	r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   375
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   376
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   377
	By using \ref{singleton}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   378
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   379
Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   380
$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   381
The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   382
elements in $rs_2$, as well as those that have appeared in $rs_1$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   383
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   384
	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\;  \; (\map\;\; \rerase{\_}\; \; rs_1)))$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   385
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   386
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   387
	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   388
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   389
The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   390
\begin{lemma}\label{dBPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   391
	$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   392
\end{lemma}
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   393
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   394
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   395
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   396
The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   397
\begin{lemma}\label{fltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   398
	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   399
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   400
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   401
The rewriting in many steps property is composible in terms of regular expression constructors:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   402
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   403
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} \;  _{bs} r_2 \cdot r_3 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   404
$r_3 \stackrel{*}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   405
\end{lemma}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   406
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   407
The rewriting in many steps properties $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ is preserved under the function $\fuse$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   408
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   409
	$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{*}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   410
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   411
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   412
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   413
	By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   414
	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   415
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   416
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   417
If we could rewrite a regular expression in many steps to $\ZERO$, then 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   418
we could also rewrite any sequence containing it to $\ZERO$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   419
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   420
	$r_1 \stackrel{*}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{*}{\rightsquigarrow} \ZERO$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   421
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   422
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   423
	$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   424
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   425
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   426
The function $\bsimpalts$ preserves rewritability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   427
\begin{lemma}\label{bsimpaltsPreserves}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   428
	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   429
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   430
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   431
Before we give out the next lemmas, we define a predicate for a list of regular expressions
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   432
having at least one nullable regular expressions:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   433
\begin{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   434
	$\textit{bnullables} \; rs \dn  \exists r \in rs. \bnullable r$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   435
\end{definition}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   436
The rewriting relation $\rightsquigarrow$ preserves nullability:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   437
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   438
	$r_1 \rightsquigarrow r_2 \implies  \bnullable \; r_1 = \bnullable \; r_2$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   439
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   440
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   441
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   442
	By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   443
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   444
So does the many steps rewriting:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   445
\begin{lemma}\label{rewritesBnullable}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   446
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   447
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   448
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   449
	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   450
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   451
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   452
And if both regular expressions in a rewriting relation are nullable, then they 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   453
produce the same bit-codes:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   454
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   455
\begin{lemma}\label{rewriteBmkepsAux}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   456
	$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 = 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   457
	\bmkeps \; r_2)$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   458
	$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   459
	\bmkepss \; rs_1 = \bmkepss \; rs2)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   460
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   461
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   462
The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   463
is $bnullable$:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   464
\begin{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   465
	\begin{tabular}{lcl}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   466
		$\bmkepss \; [] $ & $\dn$ & $[]$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   467
		$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   468
	\end{tabular}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   469
\end{center}
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   470
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   471
And now we are ready to prove the key property that if you 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   472
have two regular expressions, one rewritable in many steps to the other,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   473
and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   474
\begin{lemma}
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   475
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   476
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   477
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   478
	By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   479
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   480
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   481
the other property is also ready:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   482
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   483
	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   484
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   485
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   486
	By an induction on $r$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   487
The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   488
	$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   489
	$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   490
	$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   491
	Then we could do the following regular expresssion many steps rewrite:\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   492
	$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   493
	\\
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   494
	
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   495
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   496
\section{Proof for the Property: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   497
The first thing we prove is that if we could rewrite in one step, then after derivative
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   498
we could rewrite in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   499
\begin{lemma}\label{rewriteBder}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   500
	$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$ and
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   501
	$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   502
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   503
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   504
	By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   505
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   506
Now we can prove that once we could rewrite from one expression to another in many steps,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   507
then after a derivative on both sides we could still rewrite one to another in many steps:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   508
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   509
	$r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow}  r_2 \backslash c$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   510
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   511
\begin{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   512
	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   513
\end{proof}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   514
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   515
This can be extended and combined with the previous two important properties
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   516
so that a regular expression's successivve derivatives can be rewritten in many steps
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   517
to its simplified counterpart:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   518
\begin{lemma}\label{bderBderssimp}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   519
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   520
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   521
\subsection{Main Theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   522
Now with \ref{bdersBderssimp} we are ready for the main theorem.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   523
To link $\blexersimp$ and $\blexer$, 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   524
we first say that they give out the same bits, if the lexing result is a match:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   525
\begin{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   526
	$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   527
\end{lemma}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   528
\noindent
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   529
Now that they give out the same bits, we know that they give the same value after decoding,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   530
which we know is correct value as $\blexer$ is correct:
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   531
\begin{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   532
	$\blexer \; r \; s = \blexersimp{r}{s}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   533
\end{theorem}
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   534
\noindent
576
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   535
\begin{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   536
	One can rewrite in many steps from the original lexer's derivative regular expressions to the 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   537
	lexer with simplification applied:
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   538
	$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   539
	If two regular expressions are rewritable, then they produce the same bits.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   540
	Under the condition that $ r_1$ is nullable, we have
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   541
	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   542
	This proves the \emph{if} (interesting) branch of
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   543
	$\blexer \; r \; s = \blexersimp{r}{s}$.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   544
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   545
\end{proof}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   546
\noindent
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   547
As a corollary,
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   548
we link this result with the lemma we proved earlier that 
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   549
\begin{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   550
	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   551
\end{center}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   552
and obtain the corollary that the bit-coded lexer with simplification is
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   553
indeed correctly outputting POSIX lexing result, if such a result exists.
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   554
\begin{corollary}
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   555
	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
3e1b699696b6 thesis chap5
Chengsong
parents: 543
diff changeset
   556
\end{corollary}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   557
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   558
\subsection{Comments on the Proof Techniques Used}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   559
The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   560
compared with not having simplification is that we can no longer use the argument 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   561
in \cref{flex_retrieve}.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   562
The function \retrieve needs the cumbersome structure of the (umsimplified)
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   563
annotated regular expression to 
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   564
agree with the structure of the value, but simplification will always mess with the 
543
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   565
structure.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   566
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   567
We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   568
but this turns out to be not true, A counterexample of this being
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   569
\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   570
\]
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   571
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   572
Then we would have $\bsimp{a \backslash s}$ being 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   573
$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   574
whereas $\bsimp{\bderssimp{a}{s}}$ would be 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   575
$_{Z}(_{Z} \ONE + _{S} c)$.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   576
Unfortunately if we apply $\textit{bsimp}$ at different
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   577
stages we will always have this discrepancy, due to 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   578
whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   579
is taken at some points will be entirely dependant on when the simplification 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   580
take place whether there is a larger alternative structure surrounding the 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   581
alternative being simplified.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   582
The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   583
us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   584
are taken, but simply say that they can be taken to make two similar 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   585
regular expressions equal, and can be done after interleaving derivatives
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   586
and simplifications.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   587
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   588
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   589
Having correctness property is good. But we would also like the lexer to be efficient in 
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   590
some sense, for exampe, not grinding to a halt at certain cases.
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   591
In the next chapter we shall prove that for a given $r$, the internal derivative size is always
b2bea5968b89 thesis_thys
Chengsong
parents: 539
diff changeset
   592
finitely bounded by a constant.