ChengsongTanPhdThesis/Chapters/ChapterBitcoded1.tex
author Chengsong
Thu, 23 Jun 2022 18:32:14 +0100
changeset 547 feae84f66472
parent 530 823d9b19d21c
permissions -rwxr-xr-x
before alternating rewriting relation
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     1
% Chapter Template
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     2
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     3
% Main chapter title
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm without Simplification}
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     5
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     6
\label{ChapterBitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
    11
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    12
\section*{Bit-coded Algorithm}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    13
Bits and bitcodes (lists of bits) are defined as:
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
    14
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    15
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    16
		$b ::=   1 \mid  0 \qquad
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    17
bs ::= [] \mid b::bs    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    18
$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    19
\end{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    20
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    21
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    22
The $1$ and $0$ are not in bold in order to avoid 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    23
confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    24
bit-lists) can be used to encode values (or potentially incomplete values) in a
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    25
compact form. This can be straightforwardly seen in the following
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    26
coding function from values to bitcodes: 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    27
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    28
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    29
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    30
  $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    31
  $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    32
  $\textit{code}(\Left\,v)$ & $\dn$ & $0 :: code(v)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    33
  $\textit{code}(\Right\,v)$ & $\dn$ & $1 :: code(v)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    34
  $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    35
  $\textit{code}(\Stars\,[])$ & $\dn$ & $[0]$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    36
  $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $1 :: code(v) \;@\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    37
                                                 code(\Stars\,vs)$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    38
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    39
\end{center} 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    40
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    41
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    42
Here $\textit{code}$ encodes a value into a bitcodes by converting
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    43
$\Left$ into $0$, $\Right$ into $1$, and marks the start of a non-empty
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    44
star iteration by $1$. The border where a local star terminates
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    45
is marked by $0$. This coding is lossy, as it throws away the information about
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    46
characters, and also does not encode the ``boundary'' between two
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    47
sequence values. Moreover, with only the bitcode we cannot even tell
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    48
whether the $1$s and $0$s are for $\Left/\Right$ or $\Stars$. The
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    49
reason for choosing this compact way of storing information is that the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    50
relatively small size of bits can be easily manipulated and ``moved
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    51
around'' in a regular expression. In order to recover values, we will 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    52
need the corresponding regular expression as an extra information. This
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    53
means the decoding function is defined as:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    54
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    55
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    56
%\begin{definition}[Bitdecoding of Values]\mbox{}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    57
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    58
\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    59
  $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    60
  $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    61
  $\textit{decode}'\,(0\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    62
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    63
       (\Left\,v, bs_1)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    64
  $\textit{decode}'\,(1\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    65
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    66
       (\Right\,v, bs_1)$\\                           
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    67
  $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    68
        $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    69
  & &   $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    70
  & &   \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    71
  $\textit{decode}'\,(0\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    72
  $\textit{decode}'\,(1\!::\!bs)\,(r^*)$ & $\dn$ & 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    73
         $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    74
  & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    75
  & &   \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    76
  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    77
  $\textit{decode}\,bs\,r$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    78
     $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    79
  & & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    80
       \textit{else}\;\textit{None}$                       
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    81
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    82
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    83
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    84
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    85
Sulzmann and Lu's integrated the bitcodes into regular expressions to
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    86
create annotated regular expressions \cite{Sulzmann2014}.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    87
\emph{Annotated regular expressions} are defined by the following
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    88
grammar:%\comment{ALTS should have  an $as$ in  the definitions, not  just $a_1$ and $a_2$}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    89
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    90
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    91
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    92
  $\textit{a}$ & $::=$  & $\ZERO$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    93
                  & $\mid$ & $_{bs}\ONE$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    94
                  & $\mid$ & $_{bs}{\bf c}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    95
                  & $\mid$ & $_{bs}\sum\,as$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    96
                  & $\mid$ & $_{bs}a_1\cdot a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    97
                  & $\mid$ & $_{bs}a^*$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    98
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
    99
\end{center}  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   100
%(in \textit{ALTS})
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   101
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   102
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   103
where $bs$ stands for bitcodes, $a$  for $\mathbf{a}$nnotated regular
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   104
expressions and $as$ for a list of annotated regular expressions.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   105
The alternative constructor($\sum$) has been generalized to 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   106
accept a list of annotated regular expressions rather than just 2.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   107
We will show that these bitcodes encode information about
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   108
the (POSIX) value that should be generated by the Sulzmann and Lu
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   109
algorithm.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   110
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   111
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   112
To do lexing using annotated regular expressions, we shall first
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   113
transform the usual (un-annotated) regular expressions into annotated
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   114
regular expressions. This operation is called \emph{internalisation} and
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   115
defined as follows:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   116
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   117
%\begin{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   118
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   119
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   120
  $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   121
  $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   122
  $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   123
  $(r_1 + r_2)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   124
  $_{[]}\sum[\textit{fuse}\,[0]\,r_1^\uparrow,\,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   125
  \textit{fuse}\,[1]\,r_2^\uparrow]$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   126
  $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   127
         $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   128
  $(r^*)^\uparrow$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   129
         $_{[]}(r^\uparrow)^*$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   130
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   131
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   132
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   133
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   134
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   135
We use up arrows here to indicate that the basic un-annotated regular
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   136
expressions are ``lifted up'' into something slightly more complex. In the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   137
fourth clause, $\textit{fuse}$ is an auxiliary function that helps to
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   138
attach bits to the front of an annotated regular expression. Its
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   139
definition is as follows:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   140
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   141
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   142
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   143
  $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   144
  $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   145
     $_{bs @ bs'}\ONE$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   146
  $\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   147
     $_{bs@bs'}{\bf c}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   148
  $\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   149
     $_{bs@bs'}\sum\textit{as}$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   150
  $\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   151
     $_{bs@bs'}a_1 \cdot a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   152
  $\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   153
     $_{bs @ bs'}a^*$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   154
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   155
\end{center}  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   156
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   157
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   158
After internalising the regular expression, we perform successive
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   159
derivative operations on the annotated regular expressions. This
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   160
derivative operation is the same as what we had previously for the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   161
basic regular expressions, except that we beed to take care of
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   162
the bitcodes:
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   163
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   164
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   165
\iffalse
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   166
 %\begin{definition}{bder}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   167
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   168
  \begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   169
  $(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   170
  $(\textit{ONE}\;bs)\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   171
  $(\textit{CHAR}\;bs\,d)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   172
        $\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   173
         \textit{ONE}\;bs\;\textit{else}\;\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   174
  $(\textit{ALTS}\;bs\,as)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   175
  $\textit{ALTS}\;bs\,(map (\backslash c) as)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   176
  $(\textit{SEQ}\;bs\,a_1\,a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   177
     $\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   178
					       & &$\textit{then}\;\textit{ALTS}\,bs\,List((\textit{SEQ}\,[]\,(a_1\,\backslash c)\,a_2),$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   179
					       & &$\phantom{\textit{then}\;\textit{ALTS}\,bs\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   180
  & &$\textit{else}\;\textit{SEQ}\,bs\,(a_1\,\backslash c)\,a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   181
  $(\textit{STAR}\,bs\,a)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   182
      $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\,\backslash c))\,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   183
       (\textit{STAR}\,[]\,r)$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   184
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   185
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   186
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   187
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   188
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   189
  \begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   190
  $(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   191
  $(_{bs}\textit{ONE})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   192
  $(_{bs}\textit{CHAR}\;d)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   193
        $\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   194
         _{bs}\textit{ONE}\;\textit{else}\;\textit{ZERO}$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   195
  $(_{bs}\textit{ALTS}\;\textit{as})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   196
  $_{bs}\textit{ALTS}\;(\textit{as}.\textit{map}(\backslash c))$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   197
  $(_{bs}\textit{SEQ}\;a_1\,a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   198
     $\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   199
					       & &$\textit{then}\;_{bs}\textit{ALTS}\,List((_{[]}\textit{SEQ}\,(a_1\,\backslash c)\,a_2),$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   200
					       & &$\phantom{\textit{then}\;_{bs}\textit{ALTS}\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   201
  & &$\textit{else}\;_{bs}\textit{SEQ}\,(a_1\,\backslash c)\,a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   202
  $(_{bs}\textit{STAR}\,a)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   203
      $_{bs}\textit{SEQ}\;(\textit{fuse}\, [0] \; r\,\backslash c )\,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   204
       (_{bs}\textit{STAR}\,[]\,r)$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   205
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   206
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   207
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   208
\fi
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   209
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   210
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   211
  \begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   212
  $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   213
  $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   214
  $(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   215
        $\textit{if}\;c=d\; \;\textit{then}\;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   216
         _{bs}\ONE\;\textit{else}\;\ZERO$\\  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   217
  $(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   218
  $_{bs}\sum\;(\textit{map} (\_\backslash c) as )$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   219
  $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   220
     $\textit{if}\;\textit{bnullable}\,a_1$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   221
					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   222
					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   223
  & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   224
  $(_{bs}a^*)\,\backslash c$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   225
      $_{bs}(\textit{fuse}\, [0] \; r\,\backslash c)\cdot
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   226
       (_{[]}r^*))$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   227
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   228
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   229
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   230
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   231
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   232
For instance, when we do derivative of  $_{bs}a^*$ with respect to c,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   233
we need to unfold it into a sequence,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   234
and attach an additional bit $0$ to the front of $r \backslash c$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   235
to indicate one more star iteration. Also the sequence clause
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   236
is more subtle---when $a_1$ is $\textit{bnullable}$ (here
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   237
\textit{bnullable} is exactly the same as $\textit{nullable}$, except
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   238
that it is for annotated regular expressions, therefore we omit the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   239
definition). Assume that $\textit{bmkeps}$ correctly extracts the bitcode for how
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   240
$a_1$ matches the string prior to character $c$ (more on this later),
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   241
then the right branch of alternative, which is $\textit{fuse} \; \bmkeps \;  a_1 (a_2
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   242
\backslash c)$ will collapse the regular expression $a_1$(as it has
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   243
already been fully matched) and store the parsing information at the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   244
head of the regular expression $a_2 \backslash c$ by fusing to it. The
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   245
bitsequence $\textit{bs}$, which was initially attached to the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   246
first element of the sequence $a_1 \cdot a_2$, has
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   247
now been elevated to the top-level of $\sum$, as this information will be
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   248
needed whichever way the sequence is matched---no matter whether $c$ belongs
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   249
to $a_1$ or $ a_2$. After building these derivatives and maintaining all
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   250
the lexing information, we complete the lexing by collecting the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   251
bitcodes using a generalised version of the $\textit{mkeps}$ function
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   252
for annotated regular expressions, called $\textit{bmkeps}$:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   253
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   254
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   255
%\begin{definition}[\textit{bmkeps}]\mbox{}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   256
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   257
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   258
  $\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   259
  $\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   260
     $\textit{if}\;\textit{bnullable}\,a$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   261
  & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   262
  & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{bs}\sum \textit{as})$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   263
  $\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   264
     $bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   265
  $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   266
     $bs \,@\, [0]$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   267
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   268
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   269
%\end{definition}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   270
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   271
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   272
This function completes the value information by travelling along the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   273
path of the regular expression that corresponds to a POSIX value and
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   274
collecting all the bitcodes, and using $S$ to indicate the end of star
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   275
iterations. If we take the bitcodes produced by $\textit{bmkeps}$ and
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   276
decode them, we get the value we expect. The corresponding lexing
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   277
algorithm looks as follows:
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   278
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   279
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   280
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   281
  $\textit{blexer}\;r\,s$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   282
      $\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\                
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   283
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   284
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   285
  & & $\;\;\textit{else}\;\textit{None}$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   286
\end{tabular}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   287
\end{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   288
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   289
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   290
In this definition $\_\backslash s$ is the  generalisation  of the derivative
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   291
operation from characters to strings (just like the derivatives for un-annotated
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   292
regular expressions).
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   293
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   294
Now we introduce the simplifications, which is why we introduce the 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   295
bitcodes in the first place.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   296
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   297
\subsection*{Simplification Rules}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   298
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   299
This section introduces aggressive (in terms of size) simplification rules
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   300
on annotated regular expressions
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   301
to keep derivatives small. Such simplifications are promising
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   302
as we have
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   303
generated test data that show
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   304
that a good tight bound can be achieved. We could only
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   305
partially cover the search space as there are infinitely many regular
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   306
expressions and strings. 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   307
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   308
One modification we introduced is to allow a list of annotated regular
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   309
expressions in the $\sum$ constructor. This allows us to not just
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   310
delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   311
also unnecessary ``copies'' of regular expressions (very similar to
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   312
simplifying $r + r$ to just $r$, but in a more general setting). Another
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   313
modification is that we use simplification rules inspired by Antimirov's
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   314
work on partial derivatives. They maintain the idea that only the first
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   315
``copy'' of a regular expression in an alternative contributes to the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   316
calculation of a POSIX value. All subsequent copies can be pruned away from
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   317
the regular expression. A recursive definition of our  simplification function 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   318
that looks somewhat similar to our Scala code is given below:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   319
%\comment{Use $\ZERO$, $\ONE$ and so on. 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   320
%Is it $ALTS$ or $ALTS$?}\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   321
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   322
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   323
  \begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   324
   
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   325
  $\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   326
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   327
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   328
   &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   329
   &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   330
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   331
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   332
  $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   333
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   334
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   335
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   336
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   337
   $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   338
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   339
\end{center}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   340
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   341
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   342
The simplification does a pattern matching on the regular expression.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   343
When it detected that the regular expression is an alternative or
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   344
sequence, it will try to simplify its child regular expressions
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   345
recursively and then see if one of the children turns into $\ZERO$ or
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   346
$\ONE$, which might trigger further simplification at the current level.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   347
The most involved part is the $\sum$ clause, where we use two
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   348
auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   349
alternatives and reduce as many duplicates as possible. Function
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   350
$\textit{distinct}$  keeps the first occurring copy only and removes all later ones
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   351
when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   352
Its recursive definition is given below:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   353
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   354
 \begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   355
  \begin{tabular}{@{}lcl@{}}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   356
  $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   357
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   358
  $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \;  \textit{as'} $ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   359
    $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   360
\end{tabular}    
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   361
\end{center}  
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   362
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   363
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   364
Here $\textit{flatten}$ behaves like the traditional functional programming flatten
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   365
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   366
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   367
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   368
Having defined the $\simp$ function,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   369
we can use the previous notation of  natural
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   370
extension from derivative w.r.t.~character to derivative
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   371
w.r.t.~string:%\comment{simp in  the [] case?}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   372
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   373
\begin{center}
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   374
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   375
$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   376
$r \backslash_{simp} [\,] $ & $\dn$ & $r$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   377
\end{tabular}
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   378
\end{center}
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   379
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   380
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   381
to obtain an optimised version of the algorithm:
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   382
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   383
 \begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   384
\begin{tabular}{lcl}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   385
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   386
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   387
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   388
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   389
  & & $\;\;\textit{else}\;\textit{None}$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   390
\end{tabular}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   391
\end{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   392
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   393
\noindent
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   394
This algorithm keeps the regular expression size small, for example,
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   395
with this simplification our previous $(a + aa)^*$ example's 8000 nodes
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   396
will be reduced to just 6 and stays constant, no matter how long the
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   397
input string is.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   398
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   399
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   400
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   401
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   402
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   403
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   404
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   405
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   406
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   407
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   408
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   409
%-----------------------------------
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   410
%	SUBSECTION 1
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   411
%-----------------------------------
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   412
\section{Specifications of Some Helper Functions}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   413
Here we give some functions' definitions, 
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   414
which we will use later.
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   415
\begin{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   416
\begin{tabular}{ccc}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   417
$\retrieve \; \ACHAR \, \textit{bs} \, c \; \Char(c) = \textit{bs}$
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   418
\end{tabular}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   419
\end{center}
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   420
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   421
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   422
%----------------------------------------------------------------------------------------
528
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   423
%	SECTION  correctness proof
527
2c907b118f78 all chapters put in
Chengsong
parents:
diff changeset
   424
%----------------------------------------------------------------------------------------
528
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   425
\section{Correctness of Bit-coded Algorithm (Without Simplification)}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   426
We now give the proof the correctness of the algorithm with bit-codes.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   427
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   428
Ausaf and Urban cleverly defined an auxiliary function called $\flex$,
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   429
defined as
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   430
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   431
\flex \; r \; f \; [] \; v \; = \; f\; v
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   432
\flex \; r \; f \; c :: s \; v =  \flex r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   433
\]
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   434
which accumulates the characters that needs to be injected back, 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   435
and does the injection in a stack-like manner (last taken derivative first injected).
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   436
$\flex$ is connected to the $\lexer$:
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   437
\begin{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   438
$\flex \; r \; \textit{id}\; s \; \mkeps (r\backslash s) = \lexer \; r \; s$
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   439
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   440
$\flex$ provides us a bridge between $\lexer$ and $\blexer$.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   441
What is even better about $\flex$ is that it allows us to 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   442
directly operate on the value $\mkeps (r\backslash v)$,
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   443
which is pivotal in the definition of  $\lexer $ and $\blexer$, but not visible as an argument.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   444
When the value created by $\mkeps$ becomes available, one can 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   445
prove some stepwise properties of lexing nicely:
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   446
\begin{lemma}\label{flexStepwise}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   447
$\textit{flex} \; r \; f \; s@[c] \; v= \flex \; r \; f\; s \; (\inj \; (r\backslash s) \; c \; v) $
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   448
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   449
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   450
And for $\blexer$ we have a function with stepwise properties like $\flex$ as well,
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   451
called $\retrieve$\ref{retrieveDef}.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   452
$\retrieve$ takes bit-codes from annotated regular expressions
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   453
guided by a value.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   454
$\retrieve$ is connected to the $\blexer$ in the following way:
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   455
\begin{lemma}\label{blexer_retrieve}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   456
$\blexer \; r \; s = \decode  \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   457
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   458
If you take derivative of an annotated regular expression, 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   459
you can $\retrieve$ the same bit-codes as before the derivative took place,
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   460
provided that you use the corresponding value:
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   461
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   462
\begin{lemma}\label{retrieveStepwise}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   463
$\retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   464
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   465
The other good thing about $\retrieve$ is that it can be connected to $\flex$:
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   466
%centralLemma1
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   467
\begin{lemma}\label{flex_retrieve}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   468
$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   469
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   470
\begin{proof}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   471
By induction on $s$. The induction tactic is reverse induction on strings.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   472
$v$ is allowed to be arbitrary.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   473
The crucial point is to rewrite 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   474
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   475
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   476
\]
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   477
as
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   478
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   479
\retrieve \; (r \backslash s) \; (\inj \; (r \backslash s) \; c\;  \mkeps (r \backslash s@[c]))
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   480
\].
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   481
This enables us to equate 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   482
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   483
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   484
\] 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   485
with 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   486
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   487
\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   488
\],
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   489
which in turn can be rewritten as
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   490
\[
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   491
\flex \; r \; \textit{id} \; s@[c] \;  (\mkeps (r\backslash s@[c]))
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   492
\].
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   493
\end{proof}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   494
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   495
With the above lemma we can now link $\flex$ and $\blexer$.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   496
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   497
\begin{lemma}\label{flex_blexer}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   498
$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s)  = \blexer \; r \; s$
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   499
\end{lemma}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   500
\begin{proof}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   501
Using two of the above lemmas: \ref{flex_retrieve} and \ref{blexer_retrieve}.
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   502
\end{proof}
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   503
Finally 
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   504
28751de4b4ba revised according to comments
Chengsong
parents: 527
diff changeset
   505
530
823d9b19d21c all comments addressed
Chengsong
parents: 528
diff changeset
   506