ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
author Chengsong
Thu, 09 Jun 2022 12:57:53 +0100
changeset 538 8016a2480704
parent 532 cc54ce075db5
child 539 7cf9f17aa179
permissions -rwxr-xr-x
intro and chap2
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    12
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    13
538
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    14
Now we introduce the simplifications, which is why we introduce the 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    15
bitcodes in the first place.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    16
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    17
\subsection*{Simplification Rules}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    18
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    19
This section introduces aggressive (in terms of size) simplification rules
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    20
on annotated regular expressions
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    21
to keep derivatives small. Such simplifications are promising
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    22
as we have
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    23
generated test data that show
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    24
that a good tight bound can be achieved. We could only
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    25
partially cover the search space as there are infinitely many regular
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    26
expressions and strings. 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    27
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    28
One modification we introduced is to allow a list of annotated regular
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    29
expressions in the $\sum$ constructor. This allows us to not just
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    30
delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    31
also unnecessary ``copies'' of regular expressions (very similar to
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    32
simplifying $r + r$ to just $r$, but in a more general setting). Another
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    33
modification is that we use simplification rules inspired by Antimirov's
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    34
work on partial derivatives. They maintain the idea that only the first
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    35
``copy'' of a regular expression in an alternative contributes to the
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    36
calculation of a POSIX value. All subsequent copies can be pruned away from
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    37
the regular expression. A recursive definition of our  simplification function 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    38
that looks somewhat similar to our Scala code is given below:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    39
%\comment{Use $\ZERO$, $\ONE$ and so on. 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    40
%Is it $ALTS$ or $ALTS$?}\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    41
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    42
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    43
  \begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    44
   
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    45
  $\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    46
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    47
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    48
   &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    49
   &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    50
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    51
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    52
  $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    53
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    54
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    55
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    56
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    57
   $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    58
\end{tabular}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    59
\end{center}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    60
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    61
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    62
The simplification does a pattern matching on the regular expression.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    63
When it detected that the regular expression is an alternative or
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    64
sequence, it will try to simplify its child regular expressions
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    65
recursively and then see if one of the children turns into $\ZERO$ or
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    66
$\ONE$, which might trigger further simplification at the current level.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    67
The most involved part is the $\sum$ clause, where we use two
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    68
auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    69
alternatives and reduce as many duplicates as possible. Function
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    70
$\textit{distinct}$  keeps the first occurring copy only and removes all later ones
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    71
when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    72
Its recursive definition is given below:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    73
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    74
 \begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    75
  \begin{tabular}{@{}lcl@{}}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    76
  $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    77
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    78
  $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \;  \textit{as'} $ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    79
    $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) 
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    80
\end{tabular}    
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    81
\end{center}  
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    82
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    83
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    84
Here $\textit{flatten}$ behaves like the traditional functional programming flatten
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    85
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    86
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    87
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    88
Having defined the $\simp$ function,
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    89
we can use the previous notation of  natural
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    90
extension from derivative w.r.t.~character to derivative
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    91
w.r.t.~string:%\comment{simp in  the [] case?}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    92
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    93
\begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    94
\begin{tabular}{lcl}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    95
$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    96
$r \backslash_{simp} [\,] $ & $\dn$ & $r$
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    97
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    98
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
    99
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   100
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   101
to obtain an optimised version of the algorithm:
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   102
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   103
 \begin{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   104
\begin{tabular}{lcl}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   105
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   106
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   107
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   108
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   109
  & & $\;\;\textit{else}\;\textit{None}$
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   110
\end{tabular}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   111
\end{center}
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   112
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   113
\noindent
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   114
This algorithm keeps the regular expression size small, for example,
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   115
with this simplification our previous $(a + aa)^*$ example's 8000 nodes
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   116
will be reduced to just 17 and stays constant, no matter how long the
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   117
input string is.
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   118
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   119
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   120
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   121
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   122
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   123
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   124
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   125
8016a2480704 intro and chap2
Chengsong
parents: 532
diff changeset
   126
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   127
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   128
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   129
%	SECTION common identities
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   130
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   131
\section{Common Identities In Simplification-Related Functions} 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   132
Need to prove these before starting on the big correctness proof.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   133
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   134
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   135
%	SUBSECTION 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   136
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   137
\subsection{Idempotency of $\simp$}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   138
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   139
\begin{equation}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   140
	\simp \;r = \simp\; \simp \; r 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   141
\end{equation}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   142
This property means we do not have to repeatedly
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   143
apply simplification in each step, which justifies
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   144
our definition of $\blexersimp$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   145
It will also be useful in future proofs where properties such as 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   146
closed forms are needed.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   147
The proof is by structural induction on $r$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   148
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   149
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   150
%	SUBSECTION 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   151
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   152
\subsection{Syntactic Equivalence Under $\simp$}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   153
We prove that minor differences can be annhilated
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   154
by $\simp$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   155
For example,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   156
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   157
$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) = 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   158
 \simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   159
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   160
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   161
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   162
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   163
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   164
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   165
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   166
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   167
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   168
%	SECTION corretness proof
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   169
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   170
\section{Proof Technique of Correctness of Bit-coded Algorithm with Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   171
The non-trivial part of proving the correctness of the algorithm with simplification
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   172
compared with not having simplification is that we can no longer use the argument 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   173
in \cref{flex_retrieve}.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   174
The function \retrieve needs the structure of the annotated regular expression to 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   175
agree with the structure of the value, but simplification will always mess with the 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   176
structure:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   177
%TODO: after simp does not agree with each other: (a + 0) --> a v.s. Left(Char(a))