ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
author Chengsong
Mon, 06 Jun 2022 23:17:45 +0100
changeset 537 50e590823220
parent 536 aff7bf93b9c7
child 538 8016a2480704
permissions -rwxr-xr-x
more
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     4
\chapter{Correctness of Bit-coded Algorithm without Simplification}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    11
537
Chengsong
parents: 536
diff changeset
    12
\section{Bit-coded Algorithm}
Chengsong
parents: 536
diff changeset
    13
Chengsong
parents: 536
diff changeset
    14
Chengsong
parents: 536
diff changeset
    15
The lexer algorithm in Chapter \ref{Inj}, as shown in \ref{InjFigure},
Chengsong
parents: 536
diff changeset
    16
stores information of previous lexing steps
Chengsong
parents: 536
diff changeset
    17
on a stack, in the form of regular expressions
Chengsong
parents: 536
diff changeset
    18
and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
Chengsong
parents: 536
diff changeset
    19
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
    20
\begin{ceqn}
Chengsong
parents: 536
diff changeset
    21
\begin{equation}%\label{graph:injLexer}
Chengsong
parents: 536
diff changeset
    22
\begin{tikzcd}
Chengsong
parents: 536
diff changeset
    23
r_0 \arrow[r, "\backslash c_0"]  \arrow[d] & r_1 \arrow[r, "\backslash c_1"] \arrow[d] & r_2 \arrow[r, dashed] \arrow[d] & r_n \arrow[d, "mkeps" description] \\
Chengsong
parents: 536
diff changeset
    24
v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]         
Chengsong
parents: 536
diff changeset
    25
\end{tikzcd}
Chengsong
parents: 536
diff changeset
    26
\end{equation}
Chengsong
parents: 536
diff changeset
    27
\end{ceqn}
Chengsong
parents: 536
diff changeset
    28
\caption{Injection-based Lexing from Chapter\ref{Inj}}\label{InjFigure}
Chengsong
parents: 536
diff changeset
    29
\end{envForCaption}
Chengsong
parents: 536
diff changeset
    30
\noindent
Chengsong
parents: 536
diff changeset
    31
This is both inefficient and prone to stack overflow.
Chengsong
parents: 536
diff changeset
    32
A natural question arises as to whether we can store lexing
Chengsong
parents: 536
diff changeset
    33
information on the fly, while still using regular expression 
Chengsong
parents: 536
diff changeset
    34
derivatives?
Chengsong
parents: 536
diff changeset
    35
Chengsong
parents: 536
diff changeset
    36
In a lexing algorithm's run, split by the current input position,
Chengsong
parents: 536
diff changeset
    37
we have a sub-string that has been consumed,
Chengsong
parents: 536
diff changeset
    38
and the sub-string that has yet to come.
Chengsong
parents: 536
diff changeset
    39
We already know what was before, and this should be reflected in the value 
Chengsong
parents: 536
diff changeset
    40
and the regular expression at that step as well. But this is not the 
Chengsong
parents: 536
diff changeset
    41
case for injection-based regular expression derivatives.
Chengsong
parents: 536
diff changeset
    42
Take the regex $(aa)^* \cdot bc$ matching the string $aabc$
Chengsong
parents: 536
diff changeset
    43
as an example, if we have just read the two former characters $aa$:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    44
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    45
\begin{center}
537
Chengsong
parents: 536
diff changeset
    46
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
    47
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
    48
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
    49
        {Consumed: $aa$
Chengsong
parents: 536
diff changeset
    50
         \nodepart{two} Not Yet Reached: $bc$ };
Chengsong
parents: 536
diff changeset
    51
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
    52
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
    53
\caption{Partially matched String} 
Chengsong
parents: 536
diff changeset
    54
\end{envForCaption}
Chengsong
parents: 536
diff changeset
    55
\end{center}
Chengsong
parents: 536
diff changeset
    56
%\caption{Input String}\label{StringPartial}
Chengsong
parents: 536
diff changeset
    57
%\end{figure}
Chengsong
parents: 536
diff changeset
    58
Chengsong
parents: 536
diff changeset
    59
\noindent
Chengsong
parents: 536
diff changeset
    60
We have the value that has already been partially calculated,
Chengsong
parents: 536
diff changeset
    61
and the part that has yet to come:
Chengsong
parents: 536
diff changeset
    62
\begin{center}
Chengsong
parents: 536
diff changeset
    63
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
    64
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
    65
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
    66
        {$\Seq(\Stars[\Char(a), \Char(a)], ???)$
Chengsong
parents: 536
diff changeset
    67
         \nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$};
Chengsong
parents: 536
diff changeset
    68
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
    69
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
    70
\caption{Partially constructed Value} 
Chengsong
parents: 536
diff changeset
    71
\end{envForCaption}
Chengsong
parents: 536
diff changeset
    72
\end{center}
Chengsong
parents: 536
diff changeset
    73
Chengsong
parents: 536
diff changeset
    74
In the regex derivative part , (after simplification)
Chengsong
parents: 536
diff changeset
    75
all we have is just what is about to come:
Chengsong
parents: 536
diff changeset
    76
\begin{center}
Chengsong
parents: 536
diff changeset
    77
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
    78
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
    79
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={white!30,blue!20},]
Chengsong
parents: 536
diff changeset
    80
        {$???$
Chengsong
parents: 536
diff changeset
    81
         \nodepart{two} To Come: $b c$};
Chengsong
parents: 536
diff changeset
    82
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
    83
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
    84
\caption{Derivative} 
Chengsong
parents: 536
diff changeset
    85
\end{envForCaption}
Chengsong
parents: 536
diff changeset
    86
\end{center}
Chengsong
parents: 536
diff changeset
    87
\noindent
Chengsong
parents: 536
diff changeset
    88
The previous part is missing.
Chengsong
parents: 536
diff changeset
    89
How about keeping the partially constructed value 
Chengsong
parents: 536
diff changeset
    90
attached to the front of the regular expression?
Chengsong
parents: 536
diff changeset
    91
\begin{center}
Chengsong
parents: 536
diff changeset
    92
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
    93
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
    94
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
    95
        {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
Chengsong
parents: 536
diff changeset
    96
         \nodepart{two} To Come: $b c$};
Chengsong
parents: 536
diff changeset
    97
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
    98
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
    99
\caption{Derivative} 
Chengsong
parents: 536
diff changeset
   100
\end{envForCaption}
Chengsong
parents: 536
diff changeset
   101
\end{center}
Chengsong
parents: 536
diff changeset
   102
\noindent
Chengsong
parents: 536
diff changeset
   103
If we do this kind of "attachment"
Chengsong
parents: 536
diff changeset
   104
and each time augment the attached partially
Chengsong
parents: 536
diff changeset
   105
constructed value when taking off a 
Chengsong
parents: 536
diff changeset
   106
character:
Chengsong
parents: 536
diff changeset
   107
\begin{center}
Chengsong
parents: 536
diff changeset
   108
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
   109
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   110
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
   111
        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
Chengsong
parents: 536
diff changeset
   112
         \nodepart{two} To Come: $c$};
Chengsong
parents: 536
diff changeset
   113
\end{tikzpicture}\\
Chengsong
parents: 536
diff changeset
   114
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   115
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
   116
        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
Chengsong
parents: 536
diff changeset
   117
         \nodepart{two} EOF};
Chengsong
parents: 536
diff changeset
   118
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
   119
\caption{After $\backslash b$ and $\backslash c$} 
Chengsong
parents: 536
diff changeset
   120
\end{envForCaption}
Chengsong
parents: 536
diff changeset
   121
\end{center}
Chengsong
parents: 536
diff changeset
   122
\noindent
Chengsong
parents: 536
diff changeset
   123
In the end we could recover the value without a backward phase.
Chengsong
parents: 536
diff changeset
   124
But (partial) values are a bit clumsy to stick together with a regular expression, so 
Chengsong
parents: 536
diff changeset
   125
we instead use bit-codes to encode them.
Chengsong
parents: 536
diff changeset
   126
Chengsong
parents: 536
diff changeset
   127
Bits and bitcodes (lists of bits) are defined as:
Chengsong
parents: 536
diff changeset
   128
\begin{envForCaption}
Chengsong
parents: 536
diff changeset
   129
\begin{center}
Chengsong
parents: 536
diff changeset
   130
		$b ::=   S \mid  Z \qquad
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   131
bs ::= [] \mid b::bs    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   132
$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   133
\end{center}
537
Chengsong
parents: 536
diff changeset
   134
\caption{Bit-codes datatype}
Chengsong
parents: 536
diff changeset
   135
\end{envForCaption}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   136
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   137
\noindent
537
Chengsong
parents: 536
diff changeset
   138
The $S$ and $Z$ are not in bold in order to avoid 
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   139
confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   140
bit-lists) can be used to encode values (or potentially incomplete values) in a
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   141
compact form. This can be straightforwardly seen in the following
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   142
coding function from values to bitcodes: 
537
Chengsong
parents: 536
diff changeset
   143
\begin{envForCaption}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   144
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   145
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   146
  $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   147
  $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
537
Chengsong
parents: 536
diff changeset
   148
  $\textit{code}(\Left\,v)$ & $\dn$ & $Z :: code(v)$\\
Chengsong
parents: 536
diff changeset
   149
  $\textit{code}(\Right\,v)$ & $\dn$ & $S :: code(v)$\\
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   150
  $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
537
Chengsong
parents: 536
diff changeset
   151
  $\textit{code}(\Stars\,[])$ & $\dn$ & $[Z]$\\
Chengsong
parents: 536
diff changeset
   152
  $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $S :: code(v) \;@\;
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   153
                                                 code(\Stars\,vs)$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   154
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   155
\end{center} 
537
Chengsong
parents: 536
diff changeset
   156
\caption{Coding Function for Values}
Chengsong
parents: 536
diff changeset
   157
\end{envForCaption}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   158
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   159
\noindent
537
Chengsong
parents: 536
diff changeset
   160
Here $\textit{code}$ encodes a value into a bit-code by converting
Chengsong
parents: 536
diff changeset
   161
$\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
Chengsong
parents: 536
diff changeset
   162
star iteration by $S$. The border where a local star terminates
Chengsong
parents: 536
diff changeset
   163
is marked by $Z$. 
Chengsong
parents: 536
diff changeset
   164
This coding is lossy, as it throws away the information about
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   165
characters, and also does not encode the ``boundary'' between two
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   166
sequence values. Moreover, with only the bitcode we cannot even tell
537
Chengsong
parents: 536
diff changeset
   167
whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$. The
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   168
reason for choosing this compact way of storing information is that the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   169
relatively small size of bits can be easily manipulated and ``moved
537
Chengsong
parents: 536
diff changeset
   170
around'' in a regular expression. 
Chengsong
parents: 536
diff changeset
   171
Chengsong
parents: 536
diff changeset
   172
Chengsong
parents: 536
diff changeset
   173
We define the reverse operation of $\code$, which is $\decode$.
Chengsong
parents: 536
diff changeset
   174
As expected, $\decode$ not only requires the bit-codes,
Chengsong
parents: 536
diff changeset
   175
but also a regular expression to guide the decoding and 
Chengsong
parents: 536
diff changeset
   176
fill the gaps of characters:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   177
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   178
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   179
%\begin{definition}[Bitdecoding of Values]\mbox{}
537
Chengsong
parents: 536
diff changeset
   180
\begin{envForCaption}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   181
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   182
\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   183
  $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   184
  $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
537
Chengsong
parents: 536
diff changeset
   185
  $\textit{decode}'\,(Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   186
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   187
       (\Left\,v, bs_1)$\\
537
Chengsong
parents: 536
diff changeset
   188
  $\textit{decode}'\,(S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   189
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   190
       (\Right\,v, bs_1)$\\                           
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   191
  $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   192
        $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   193
  & &   $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   194
  & &   \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
537
Chengsong
parents: 536
diff changeset
   195
  $\textit{decode}'\,(Z\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
Chengsong
parents: 536
diff changeset
   196
  $\textit{decode}'\,(S\!::\!bs)\,(r^*)$ & $\dn$ & 
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   197
         $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   198
  & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   199
  & &   \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   200
  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   201
  $\textit{decode}\,bs\,r$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   202
     $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   203
  & & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   204
       \textit{else}\;\textit{None}$                       
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   205
\end{tabular}    
537
Chengsong
parents: 536
diff changeset
   206
\end{center}
Chengsong
parents: 536
diff changeset
   207
\caption{Bit-decoding of values}    
Chengsong
parents: 536
diff changeset
   208
\end{envForCaption}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   209
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   210
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   211
Sulzmann and Lu's integrated the bitcodes into regular expressions to
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   212
create annotated regular expressions \cite{Sulzmann2014}.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   213
\emph{Annotated regular expressions} are defined by the following
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   214
grammar:%\comment{ALTS should have  an $as$ in  the definitions, not  just $a_1$ and $a_2$}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   215
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   216
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   217
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   218
  $\textit{a}$ & $::=$  & $\ZERO$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   219
                  & $\mid$ & $_{bs}\ONE$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   220
                  & $\mid$ & $_{bs}{\bf c}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   221
                  & $\mid$ & $_{bs}\sum\,as$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   222
                  & $\mid$ & $_{bs}a_1\cdot a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   223
                  & $\mid$ & $_{bs}a^*$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   224
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   225
\end{center}  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   226
%(in \textit{ALTS})
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   227
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   228
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   229
where $bs$ stands for bitcodes, $a$  for $\mathbf{a}$nnotated regular
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   230
expressions and $as$ for a list of annotated regular expressions.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   231
The alternative constructor($\sum$) has been generalized to 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   232
accept a list of annotated regular expressions rather than just 2.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   233
We will show that these bitcodes encode information about
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   234
the (POSIX) value that should be generated by the Sulzmann and Lu
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   235
algorithm.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   236
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   237
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   238
To do lexing using annotated regular expressions, we shall first
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   239
transform the usual (un-annotated) regular expressions into annotated
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   240
regular expressions. This operation is called \emph{internalisation} and
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   241
defined as follows:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   242
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   243
%\begin{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   244
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   245
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   246
  $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   247
  $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   248
  $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   249
  $(r_1 + r_2)^\uparrow$ & $\dn$ &
537
Chengsong
parents: 536
diff changeset
   250
  $_{[]}\sum[\textit{fuse}\,[Z]\,r_1^\uparrow,\,
Chengsong
parents: 536
diff changeset
   251
  \textit{fuse}\,[S]\,r_2^\uparrow]$\\
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   252
  $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   253
         $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   254
  $(r^*)^\uparrow$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   255
         $_{[]}(r^\uparrow)^*$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   256
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   257
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   258
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   259
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   260
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   261
We use up arrows here to indicate that the basic un-annotated regular
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   262
expressions are ``lifted up'' into something slightly more complex. In the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   263
fourth clause, $\textit{fuse}$ is an auxiliary function that helps to
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   264
attach bits to the front of an annotated regular expression. Its
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   265
definition is as follows:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   266
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   267
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   268
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   269
  $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   270
  $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   271
     $_{bs @ bs'}\ONE$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   272
  $\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   273
     $_{bs@bs'}{\bf c}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   274
  $\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   275
     $_{bs@bs'}\sum\textit{as}$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   276
  $\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   277
     $_{bs@bs'}a_1 \cdot a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   278
  $\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   279
     $_{bs @ bs'}a^*$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   280
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   281
\end{center}  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   282
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   283
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   284
After internalising the regular expression, we perform successive
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   285
derivative operations on the annotated regular expressions. This
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   286
derivative operation is the same as what we had previously for the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   287
basic regular expressions, except that we beed to take care of
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   288
the bitcodes:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   289
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   290
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   291
\iffalse
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   292
 %\begin{definition}{bder}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   293
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   294
  \begin{tabular}{@{}lcl@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   295
  $(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   296
  $(\textit{ONE}\;bs)\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   297
  $(\textit{CHAR}\;bs\,d)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   298
        $\textit{if}\;c=d\; \;\textit{then}\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   299
         \textit{ONE}\;bs\;\textit{else}\;\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   300
  $(\textit{ALTS}\;bs\,as)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   301
  $\textit{ALTS}\;bs\,(map (\backslash c) as)$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   302
  $(\textit{SEQ}\;bs\,a_1\,a_2)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   303
     $\textit{if}\;\textit{bnullable}\,a_1$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   304
					       & &$\textit{then}\;\textit{ALTS}\,bs\,List((\textit{SEQ}\,[]\,(a_1\,\backslash c)\,a_2),$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   305
					       & &$\phantom{\textit{then}\;\textit{ALTS}\,bs\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   306
  & &$\textit{else}\;\textit{SEQ}\,bs\,(a_1\,\backslash c)\,a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   307
  $(\textit{STAR}\,bs\,a)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   308
      $\textit{SEQ}\;bs\,(\textit{fuse}\, [\Z] (r\,\backslash c))\,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   309
       (\textit{STAR}\,[]\,r)$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   310
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   311
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   312
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   313
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   314
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   315
  \begin{tabular}{@{}lcl@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   316
  $(\textit{ZERO})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   317
  $(_{bs}\textit{ONE})\,\backslash c$ & $\dn$ & $\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   318
  $(_{bs}\textit{CHAR}\;d)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   319
        $\textit{if}\;c=d\; \;\textit{then}\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   320
         _{bs}\textit{ONE}\;\textit{else}\;\textit{ZERO}$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   321
  $(_{bs}\textit{ALTS}\;\textit{as})\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   322
  $_{bs}\textit{ALTS}\;(\textit{as}.\textit{map}(\backslash c))$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   323
  $(_{bs}\textit{SEQ}\;a_1\,a_2)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   324
     $\textit{if}\;\textit{bnullable}\,a_1$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   325
					       & &$\textit{then}\;_{bs}\textit{ALTS}\,List((_{[]}\textit{SEQ}\,(a_1\,\backslash c)\,a_2),$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   326
					       & &$\phantom{\textit{then}\;_{bs}\textit{ALTS}\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c)))$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   327
  & &$\textit{else}\;_{bs}\textit{SEQ}\,(a_1\,\backslash c)\,a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   328
  $(_{bs}\textit{STAR}\,a)\,\backslash c$ & $\dn$ &
537
Chengsong
parents: 536
diff changeset
   329
      $_{bs}\textit{SEQ}\;(\textit{fuse}\, [Z] \; r\,\backslash c )\,
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   330
       (_{bs}\textit{STAR}\,[]\,r)$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   331
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   332
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   333
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   334
\fi
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   335
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   336
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   337
  \begin{tabular}{@{}lcl@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   338
  $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   339
  $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   340
  $(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   341
        $\textit{if}\;c=d\; \;\textit{then}\;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   342
         _{bs}\ONE\;\textit{else}\;\ZERO$\\  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   343
  $(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   344
  $_{bs}\sum\;(\textit{map} (\_\backslash c) as )$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   345
  $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   346
     $\textit{if}\;\textit{bnullable}\,a_1$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   347
					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   348
					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   349
  & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   350
  $(_{bs}a^*)\,\backslash c$ & $\dn$ &
537
Chengsong
parents: 536
diff changeset
   351
      $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   352
       (_{[]}r^*))$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   353
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   354
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   355
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   356
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   357
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   358
For instance, when we do derivative of  $_{bs}a^*$ with respect to c,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   359
we need to unfold it into a sequence,
537
Chengsong
parents: 536
diff changeset
   360
and attach an additional bit $Z$ to the front of $r \backslash c$
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   361
to indicate one more star iteration. Also the sequence clause
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   362
is more subtle---when $a_1$ is $\textit{bnullable}$ (here
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   363
\textit{bnullable} is exactly the same as $\textit{nullable}$, except
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   364
that it is for annotated regular expressions, therefore we omit the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   365
definition). Assume that $\textit{bmkeps}$ correctly extracts the bitcode for how
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   366
$a_1$ matches the string prior to character $c$ (more on this later),
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   367
then the right branch of alternative, which is $\textit{fuse} \; \bmkeps \;  a_1 (a_2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   368
\backslash c)$ will collapse the regular expression $a_1$(as it has
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   369
already been fully matched) and store the parsing information at the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   370
head of the regular expression $a_2 \backslash c$ by fusing to it. The
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   371
bitsequence $\textit{bs}$, which was initially attached to the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   372
first element of the sequence $a_1 \cdot a_2$, has
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   373
now been elevated to the top-level of $\sum$, as this information will be
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   374
needed whichever way the sequence is matched---no matter whether $c$ belongs
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   375
to $a_1$ or $ a_2$. After building these derivatives and maintaining all
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   376
the lexing information, we complete the lexing by collecting the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   377
bitcodes using a generalised version of the $\textit{mkeps}$ function
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   378
for annotated regular expressions, called $\textit{bmkeps}$:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   379
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   380
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   381
%\begin{definition}[\textit{bmkeps}]\mbox{}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   382
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   383
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   384
  $\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   385
  $\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   386
     $\textit{if}\;\textit{bnullable}\,a$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   387
  & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   388
  & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{bs}\sum \textit{as})$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   389
  $\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   390
     $bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   391
  $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
537
Chengsong
parents: 536
diff changeset
   392
     $bs \,@\, [Z]$
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   393
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   394
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   395
%\end{definition}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   396
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   397
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   398
This function completes the value information by travelling along the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   399
path of the regular expression that corresponds to a POSIX value and
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   400
collecting all the bitcodes, and using $S$ to indicate the end of star
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   401
iterations. If we take the bitcodes produced by $\textit{bmkeps}$ and
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   402
decode them, we get the value we expect. The corresponding lexing
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   403
algorithm looks as follows:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   404
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   405
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   406
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   407
  $\textit{blexer}\;r\,s$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   408
      $\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\                
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   409
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   410
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   411
  & & $\;\;\textit{else}\;\textit{None}$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   412
\end{tabular}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   413
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   414
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   415
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   416
In this definition $\_\backslash s$ is the  generalisation  of the derivative
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   417
operation from characters to strings (just like the derivatives for un-annotated
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   418
regular expressions).
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   419
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   420
Now we introduce the simplifications, which is why we introduce the 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   421
bitcodes in the first place.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   422
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   423
\subsection*{Simplification Rules}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   424
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   425
This section introduces aggressive (in terms of size) simplification rules
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   426
on annotated regular expressions
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   427
to keep derivatives small. Such simplifications are promising
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   428
as we have
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   429
generated test data that show
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   430
that a good tight bound can be achieved. We could only
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   431
partially cover the search space as there are infinitely many regular
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   432
expressions and strings. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   433
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   434
One modification we introduced is to allow a list of annotated regular
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   435
expressions in the $\sum$ constructor. This allows us to not just
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   436
delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   437
also unnecessary ``copies'' of regular expressions (very similar to
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   438
simplifying $r + r$ to just $r$, but in a more general setting). Another
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   439
modification is that we use simplification rules inspired by Antimirov's
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   440
work on partial derivatives. They maintain the idea that only the first
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   441
``copy'' of a regular expression in an alternative contributes to the
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   442
calculation of a POSIX value. All subsequent copies can be pruned away from
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   443
the regular expression. A recursive definition of our  simplification function 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   444
that looks somewhat similar to our Scala code is given below:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   445
%\comment{Use $\ZERO$, $\ONE$ and so on. 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   446
%Is it $ALTS$ or $ALTS$?}\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   447
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   448
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   449
  \begin{tabular}{@{}lcl@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   450
   
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   451
  $\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   452
   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   453
   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   454
   &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   455
   &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   456
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   457
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   458
  $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   459
  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   460
   &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   461
   &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   462
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   463
   $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   464
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   465
\end{center}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   466
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   467
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   468
The simplification does a pattern matching on the regular expression.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   469
When it detected that the regular expression is an alternative or
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   470
sequence, it will try to simplify its child regular expressions
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   471
recursively and then see if one of the children turns into $\ZERO$ or
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   472
$\ONE$, which might trigger further simplification at the current level.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   473
The most involved part is the $\sum$ clause, where we use two
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   474
auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   475
alternatives and reduce as many duplicates as possible. Function
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   476
$\textit{distinct}$  keeps the first occurring copy only and removes all later ones
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   477
when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   478
Its recursive definition is given below:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   479
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   480
 \begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   481
  \begin{tabular}{@{}lcl@{}}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   482
  $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   483
     (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   484
  $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \;  \textit{as'} $ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   485
    $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   486
\end{tabular}    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   487
\end{center}  
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   488
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   489
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   490
Here $\textit{flatten}$ behaves like the traditional functional programming flatten
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   491
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   492
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   493
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   494
Having defined the $\simp$ function,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   495
we can use the previous notation of  natural
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   496
extension from derivative w.r.t.~character to derivative
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   497
w.r.t.~string:%\comment{simp in  the [] case?}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   498
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   499
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   500
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   501
$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   502
$r \backslash_{simp} [\,] $ & $\dn$ & $r$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   503
\end{tabular}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   504
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   505
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   506
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   507
to obtain an optimised version of the algorithm:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   508
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   509
 \begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   510
\begin{tabular}{lcl}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   511
  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   512
      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   513
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   514
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   515
  & & $\;\;\textit{else}\;\textit{None}$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   516
\end{tabular}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   517
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   518
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   519
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   520
This algorithm keeps the regular expression size small, for example,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   521
with this simplification our previous $(a + aa)^*$ example's 8000 nodes
536
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   522
will be reduced to just 17 and stays constant, no matter how long the
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   523
input string is.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   524
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   525
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   526
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   527
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   528
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   529
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   530
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   531
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   533
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   534
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   535
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   536
%	SUBSECTION 1
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   537
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   538
\section{Specifications of Some Helper Functions}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   539
Here we give some functions' definitions, 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   540
which we will use later.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   541
\begin{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   542
\begin{tabular}{ccc}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   543
$\retrieve \; \ACHAR \, \textit{bs} \, c \; \Char(c) = \textit{bs}$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   544
\end{tabular}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   545
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   546
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   547
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   548
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   549
%	SECTION  correctness proof
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   550
%----------------------------------------------------------------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   551
\section{Correctness of Bit-coded Algorithm (Without Simplification)}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   552
We now give the proof the correctness of the algorithm with bit-codes.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   553
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   554
Ausaf and Urban cleverly defined an auxiliary function called $\flex$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   555
defined as
536
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   556
\begin{center}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   557
\begin{tabular}{lcr}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   558
$\flex \; r \; f \; [] \; v$       &  $=$ &   $f\; v$\\
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   559
$\flex \; r \; f \; c :: s \; v$ &  $=$ &   $\flex \; r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v$
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   560
\end{tabular}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
   561
\end{center}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   562
which accumulates the characters that needs to be injected back, 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   563
and does the injection in a stack-like manner (last taken derivative first injected).
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   564
$\flex$ is connected to the $\lexer$:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   565
\begin{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   566
$\flex \; r \; \textit{id}\; s \; \mkeps (r\backslash s) = \lexer \; r \; s$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   567
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   568
$\flex$ provides us a bridge between $\lexer$ and $\blexer$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   569
What is even better about $\flex$ is that it allows us to 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   570
directly operate on the value $\mkeps (r\backslash v)$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   571
which is pivotal in the definition of  $\lexer $ and $\blexer$, but not visible as an argument.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   572
When the value created by $\mkeps$ becomes available, one can 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   573
prove some stepwise properties of lexing nicely:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   574
\begin{lemma}\label{flexStepwise}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   575
$\textit{flex} \; r \; f \; s@[c] \; v= \flex \; r \; f\; s \; (\inj \; (r\backslash s) \; c \; v) $
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   576
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   577
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   578
And for $\blexer$ we have a function with stepwise properties like $\flex$ as well,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   579
called $\retrieve$\ref{retrieveDef}.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   580
$\retrieve$ takes bit-codes from annotated regular expressions
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   581
guided by a value.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   582
$\retrieve$ is connected to the $\blexer$ in the following way:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   583
\begin{lemma}\label{blexer_retrieve}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   584
$\blexer \; r \; s = \decode  \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   585
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   586
If you take derivative of an annotated regular expression, 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   587
you can $\retrieve$ the same bit-codes as before the derivative took place,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   588
provided that you use the corresponding value:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   589
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   590
\begin{lemma}\label{retrieveStepwise}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   591
$\retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   592
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   593
The other good thing about $\retrieve$ is that it can be connected to $\flex$:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   594
%centralLemma1
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   595
\begin{lemma}\label{flex_retrieve}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   596
$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   597
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   598
\begin{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   599
By induction on $s$. The induction tactic is reverse induction on strings.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   600
$v$ is allowed to be arbitrary.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   601
The crucial point is to rewrite 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   602
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   603
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   604
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   605
as
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   606
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   607
\retrieve \; (r \backslash s) \; (\inj \; (r \backslash s) \; c\;  \mkeps (r \backslash s@[c]))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   608
\].
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   609
This enables us to equate 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   610
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   611
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   612
\] 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   613
with 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   614
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   615
\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   616
\],
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   617
which in turn can be rewritten as
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   618
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   619
\flex \; r \; \textit{id} \; s@[c] \;  (\mkeps (r\backslash s@[c]))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   620
\].
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   621
\end{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   622
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   623
With the above lemma we can now link $\flex$ and $\blexer$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   624
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   625
\begin{lemma}\label{flex_blexer}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   626
$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s)  = \blexer \; r \; s$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   627
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   628
\begin{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   629
Using two of the above lemmas: \ref{flex_retrieve} and \ref{blexer_retrieve}.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   630
\end{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   631
Finally 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   632
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   633
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   634