ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
author Chengsong
Wed, 17 Aug 2022 22:59:15 +0100
changeset 580 e0f0a81f907b
parent 579 35df9cdd36ca
child 581 9db2500629be
permissions -rwxr-xr-x
halfway chap3
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     1
% Chapter Template
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     2
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     3
% Main chapter title
538
8016a2480704 intro and chap2
Chengsong
parents: 537
diff changeset
     4
\chapter{Bit-coded Algorithm of Sulzmann and Lu}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     5
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     6
\label{Bitcoded1} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     7
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     8
%simplifications and therefore introduce our version of the bitcoded algorithm and 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
     9
%its correctness proof in 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
    10
%Chapter 3\ref{Chapter3}. 
564
Chengsong
parents: 543
diff changeset
    11
In this chapter, we are going to describe the bit-coded algorithm
Chengsong
parents: 543
diff changeset
    12
introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of 
Chengsong
parents: 543
diff changeset
    13
regular expressions. 
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    14
We have implemented their algorithm in Scala, and found out inefficiencies
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    15
in their algorithm such as de-duplication not working properly and redundant
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    16
fixpoint construction. Their algorithm is improved and verified with the help of
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    17
formal proofs.
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
    18
\section{The Motivation Behind Using Bitcodes}
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    19
We first do a recap of what was going on 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    20
in the lexer algorithm in Chapter \ref{Inj},
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    21
\begin{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    22
\begin{tabular}{lcl}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    23
	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    24
	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    25
	& & $\quad \phantom{\mid}\; \None \implies \None$\\
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    26
	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    27
\end{tabular}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    28
\end{center}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    29
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    30
The algorithm recursively calls $\lexer$ on 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    31
each new character input,
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    32
and before starting a child call
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    33
it stores information of previous lexing steps
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    34
on a stack, in the form of regular expressions
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    35
and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    36
Each descent into deeper recursive calls in $\lexer$
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    37
causes a new pair of $r_i, c_i$ to be pushed to the call stack.
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
    38
\begin{figure}[H]
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    39
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    40
%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    41
\node  [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    42
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    43
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    44
\node  [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    45
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    46
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    47
\node  [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    48
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    49
\node [minimum width = 2cm, rectangle, draw] (stack) at (0, 3) {Stack};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    50
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    51
\path
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    52
	(r)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    53
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    54
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    55
\path   (r1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    56
	edge [bend right, dashed] node {saved} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    57
\path   (c1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    58
	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    59
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    60
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    61
\end{tikzpicture}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    62
\caption{First Derivative Taken}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    63
\end{figure}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    64
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    65
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    66
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
    67
\begin{figure}[H]
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    68
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    69
%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    70
\node  [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    71
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    72
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    73
\node  [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    74
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    75
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    76
\node  [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    77
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    78
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    79
\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    80
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    81
%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    82
\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    83
\node [minimum width = 3cm, minimum height = 1cm, rectangle, draw] (stack) at (0, 2) {\large Stack};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    84
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    85
\path
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    86
	(r)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    87
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    88
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    89
\path   (r2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    90
	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    91
\path   (c2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    92
	edge [bend right, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    93
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    94
\path   (r1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    95
	edge [] node {} (r2);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    96
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    97
\end{tikzpicture}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    98
\caption{Second Derivative Taken}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
    99
\end{figure}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   100
\noindent
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   101
As the number of derivative steps increase,
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   102
the stack would increase:
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   103
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   104
\begin{figure}[H]
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   105
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   106
%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   107
\node  [ circle ] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   108
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   109
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   110
\node  [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   111
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   112
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   113
\node  [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   114
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   115
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   116
\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   117
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   118
%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   119
\node [circle, ] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   120
\node [minimum width = 4cm, minimum height = 2.5cm, rectangle, draw] (stack) at (0, 1) { \large Stack};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   121
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   122
\node [] (ldots) at (3.5, 5) {};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   123
%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   124
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   125
\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   126
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   127
\node (rldots) at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   128
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   129
\path
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   130
	(r)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   131
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   132
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   133
\path   (rldots)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   134
	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   135
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   136
\path   (r1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   137
	edge [] node {} (r2);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   138
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   139
\path   (r2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   140
	edge [] node {} (ldots);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   141
\path   (ldots)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   142
	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   143
\path   (5.03, 4.9)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   144
	edge [bend left, dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   145
\end{tikzpicture}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   146
\caption{More Derivatives Taken}
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   147
\end{figure}
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   148
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   149
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   150
\begin{figure}[H]
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   151
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   152
%\draw (-6,-6) grid (6,6);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   153
\node  [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   154
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   155
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   156
\node  [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   157
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   158
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   159
\node  [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   160
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   161
%\node (0, 6)  (c2) circle [radius = 0.3] {$c_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   162
\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   163
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   164
%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   165
\node [circle, draw] (r2) at (2, 5) {$r_2$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   166
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   167
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   168
\node [] (ldots) at (4.5, 5) {};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   169
%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   170
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   171
\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   172
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   173
\node at ($(ldots)!.4!(rn)$) {\ldots};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   174
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   175
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   176
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   177
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   178
\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   179
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   180
\path
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   181
	(r)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   182
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   183
\path
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   184
	(r1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   185
        edge [] node {} (r2);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   186
\path   (r2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   187
	edge [] node {} (ldots);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   188
\path   (r)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   189
	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   190
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   191
\path   (r1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   192
	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   193
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   194
\path   (c1)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   195
	edge [dashed, bend right] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   196
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   197
\path   (c2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   198
	edge [dashed] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   199
\path   (4.5, 5)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   200
	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   201
\path   (4.9, 5)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   202
	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   203
\path   (5.3, 5)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   204
	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   205
\path (r2)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   206
	edge [dashed, ] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   207
\path (rn)
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   208
	edge [dashed, bend left] node {} (stack);
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   209
\end{tikzpicture}
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   210
\caption{Before Injection Phase Starts}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   211
\end{figure}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   212
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   213
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   214
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   215
After all derivatives have been taken, the stack grows to a maximum size
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   216
and the pair of regular expressions and characters $r_i, c_{i+1}$ 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   217
are then popped out and used in the injection phase.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   218
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   219
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   220
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   221
\begin{figure}[H]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   222
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   223
%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   224
\node  [radius = 0.5, circle, ] (r) at (-6, 5) {$r$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   225
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   226
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   227
\node  [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   228
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   229
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   230
\node  [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   231
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   232
%\node (0, 6)  (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   233
\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   234
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   235
%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   236
\node [circle, ] (r2) at (2, 5) {$r_2$};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   237
%
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   238
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   239
\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   240
%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   241
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   242
\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {$r_n$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   243
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   244
\node at ($(ldots)!.4!(rn)$) {\ldots};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   245
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   246
\node [minimum size = 0.5, circle, ] (vn) at (6, -5) {$v_n$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   247
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   248
\node [] (ldots2) at (3.5, -5) {};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   249
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   250
\node  [minimum size = 0.5, circle, ] (v2) at (2, -5) {$v_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   251
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   252
\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   253
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   254
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   255
\node [circle, ] (v1) at (-2, -5) {$v_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   256
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   257
\node  [radius = 0.5, circle, ] (v) at (-6, -5) {$v$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   258
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   259
\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   260
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   261
\path
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   262
	(r)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   263
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   264
\path
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   265
	(r1)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   266
        edge [] node {} (r2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   267
\path   (r2)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   268
	edge [] node {} (ldots);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   269
\path   (rn)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   270
	edge [] node {$\mkeps$} (vn);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   271
\path   (vn) 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   272
	edge [] node {} (ldots2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   273
\path   (v2)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   274
	edge [] node {$\inj \; r_1 \; c_2\;v_2$} (v1);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   275
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   276
\path   (v1)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   277
	edge [] node {$\inj \; r \; c_1 \; v_1$} (v);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   278
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   279
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   280
	edge [dashed] node {} (-4.2, -5.2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   281
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   282
	edge [dashed] node {} (-4, -5.2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   283
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   284
	edge [dashed] node {} (-0.1, -5.2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   285
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   286
	edge [dashed] node {} (0.2, -5.26);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   287
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   288
	edge [dashed] node {} (3.2, -5);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   289
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   290
	edge [dashed] node {} (2.7, -5);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   291
\path (stack)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   292
	edge [dashed] node {} (3.7, -5);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   293
\end{tikzpicture}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   294
\caption{Stored $r_i, c_{i+1}$ Used by $\inj$}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   295
\end{figure}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   296
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   297
Storing all $r_i, c_{i+1}$ pairs recursively
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   298
allows the algorithm to work in an elegant way, at the expense of 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   299
storing quite a bit of verbose information.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   300
The stack seems to grow at least quadratically fast with respect
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   301
to the input (not taking into account the size bloat of $r_i$),
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   302
which can be inefficient and prone to stack overflow.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   303
\section{Bitcoded Algorithm}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   304
To address this,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   305
Sulzmann and Lu chose to  define a new datatype 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   306
called \emph{annotated regular expression},
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   307
which condenses all the partial lexing information
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   308
(that was originally stored in $r_i, c_{i+1}$ pairs)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   309
into bitcodes.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   310
Annotated regular expressions 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   311
are defined as the following 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   312
Isabelle datatype \footnote{ We use subscript notation to indicate
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   313
	that the bitcodes are auxiliary information that do not
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   314
interfere with the structure of the regular expressions }:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   315
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   316
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   317
  $\textit{a}$ & $::=$  & $\ZERO$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   318
                  & $\mid$ & $_{bs}\ONE$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   319
                  & $\mid$ & $_{bs}{\bf c}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   320
                  & $\mid$ & $_{bs}\sum\,as$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   321
                  & $\mid$ & $_{bs}a_1\cdot a_2$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   322
                  & $\mid$ & $_{bs}a^*$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   323
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   324
\end{center}  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   325
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   326
where $bs$ stands for bit-codes, $a$  for $\mathbf{a}$nnotated regular
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   327
expressions and $as$ for lists of annotated regular expressions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   328
The alternative constructor, written, $\sum$, has been generalised to 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   329
accept a list of annotated regular expressions rather than just two.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   330
Why is it generalised? This is because when we open up nested 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   331
alternatives, there could be more than two elements at the same level
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   332
after de-duplication, which can no longer be stored in a binary
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   333
constructor.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   334
Bits and bitcodes (lists of bits) are defined as:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   335
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   336
		$b ::=   S \mid  Z \qquad
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   337
bs ::= [] \mid b::bs    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   338
$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   339
\end{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   340
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   341
We use $S$ and $Z$ rather than $1$ and $0$ is to avoid
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   342
confusion with the regular expressions $\ZERO$ and $\ONE$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   343
The idea is to use the attached bitcodes
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   344
to indicate which choice was made at a certain point
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   345
during lexing.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   346
For example,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   347
$(_{Z}a+_{S}b) \backslash a$ gives us
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   348
$_{Z}\ONE + \ZERO$, this $Z$ bitcode will
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   349
later be used to decode that a left branch was
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   350
selected at the time when the part $a+b$ is being taken
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   351
derivative of.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   352
\subsection{A Bird's Eye View of the Bit-coded Lexer}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   353
Before we give out the rest of the functions and definitions 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   354
related to our
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   355
$\blexer$ (\emph{b}-itcoded lexer), we first provide a high-level
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   356
view of the algorithm, so the reader does not get lost in
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   357
the details.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   358
\begin{figure}[H]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   359
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   360
%\draw (-6,-6) grid (6,6);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   361
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   362
	\node [circle, draw] (r0) at (-6, 2) {$r$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   363
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   364
	\node  [radius = 0.5, circle, draw] (r) at (-6, 5) {$_{bs}a$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   365
	\path (r0)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   366
		edge [] node {$\internalise$} (r);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   367
%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   368
\node  [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   369
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   370
%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   371
\node  [minimum size = 1.0cm, circle, draw] (r1) at (-2, 5) {$_{bs_1}a_1$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   372
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   373
%\node (0, 6)  (c2) circle [radius = 0.3] {$c_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   374
\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   375
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   376
%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   377
\node [circle, draw, minimum size = 1.4cm] (r2) at (2, 5) {$_{bs_2}a_2$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   378
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   379
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   380
\node [] (ldots) at (4.5, 5) {};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   381
%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   382
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   383
\node [minimum size = 2.2cm, circle, draw] (rn) at (6, 5) {$_{bs_n}a_n$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   384
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   385
\node at ($(ldots)!.1!(rn)$) {\ldots};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   386
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   387
\node [minimum size = 0.5, circle, ] (v) at (6, 2) {$v$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   388
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   389
%\node [] (v2) at (4, -5) {};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   390
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   391
%\node [draw, cross out] (ldots2) at (5, -5) {};
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   392
%
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   393
%\node at ($(ldots2)!.4!(v2)$) {\ldots};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   394
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   395
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   396
\node [align = center] (decode) at (6.6, 3.2) {$\bmkeps$\\$\decode$};
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   397
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   398
\path (c1)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   399
	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   400
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   401
\path (c2)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   402
	edge [dashed, bend left] node {} (r0);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   403
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   404
\path (r1)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   405
	edge [dashed, bend right] node {} (r2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   406
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   407
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   408
\path
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   409
	(r)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   410
        edge [dashed, bend right] node[left] {} (r1);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   411
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   412
\path
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   413
	(r)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   414
        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   415
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   416
\path
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   417
	(r1)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   418
        edge [] node {} (r2);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   419
\path   (r2)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   420
	edge [] node {} (ldots);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   421
\path   (rn)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   422
	edge [] node {} (v);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   423
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   424
\path	(r0)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   425
	edge [dashed, bend right] node {} (decode);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   426
%\path	(v)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   427
	%edge [] node {} (ldots2);
579
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   428
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   429
35df9cdd36ca more chap3
Chengsong
parents: 576
diff changeset
   430
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   431
\end{tikzpicture}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   432
\caption{Bird's Eye View of $\blexer$}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   433
\end{figure}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   434
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   435
The plain regular expressions
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   436
are first ``lifted'' to an annotated regular expression,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   437
with the function called \emph{internalise}.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   438
Then the annotated regular expression $_{bs}a$ will
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   439
go through successive derivatives with respect 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   440
to the input stream of characters $c_1, c_2$ etc.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   441
Each time a derivative is taken, the bitcodes
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   442
are moved around, discarded or augmented together 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   443
with the regular expression they are attached to.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   444
After all input has been consumed, the 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   445
bitcodes are collected by $\bmkeps$,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   446
which traverses the nullable part of the regular expression
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   447
and collect the bitcodes along the way.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   448
The collected bitcodes are then $\decode$d with the guidance
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   449
of the input regular expression $r$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   450
The most notable improvements of $\blexer$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   451
over $\lexer$ are 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   452
\begin{itemize}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   453
	\item
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   454
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   455
		An absence of the second injection phase.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   456
	\item
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   457
		One need not store each pair of the 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   458
		intermediate regular expressions $_{bs_i}a_i$ and  $c_{i+1}$. 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   459
		The previous annotated regular expression $_{bs_i}a_i$'s information is passed
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   460
		on without loss to its successor $_{bs_{i+1}}a_{i+1}$,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   461
		and $c_{i+1}$'s already contained in the strings in $L\;r$ \footnote{
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   462
		which can be easily recovered if we decode in the correct order}.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   463
	\item
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   464
		The simplification works much better--one can maintain correctness
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   465
		while applying quite strong simplifications, as we shall wee.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   466
\end{itemize}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   467
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   468
In the next section we will 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   469
give the operations needed in $\blexer$,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   470
with some remarks on the idea behind their definitions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   471
\subsection{Operations in $\textit{Blexer}$}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   472
The first operation we define related to bit-coded regular expressions
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   473
is how we move bits to the inside of regular expressions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   474
Called $\fuse$, this operation attaches bit-codes 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   475
to the front of an annotated regular expression:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   476
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   477
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   478
  $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   479
  $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   480
     $_{bs @ bs'}\ONE$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   481
  $\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   482
     $_{bs@bs'}{\bf c}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   483
  $\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   484
     $_{bs@bs'}\sum\textit{as}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   485
  $\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   486
     $_{bs@bs'}a_1 \cdot a_2$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   487
  $\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   488
     $_{bs @ bs'}a^*$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   489
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   490
\end{center} 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   491
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   492
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   493
With \emph{fuse} we are able to define the $\internalise$ function
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   494
that translates a ``standard'' regular expression into an
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   495
annotated regular expression.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   496
This function will be applied before we start
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   497
with the derivative phase of the algorithm.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   498
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   499
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   500
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   501
  $(\ZERO)^\uparrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   502
  $(\ONE)^\uparrow$ & $\dn$ & $_{[]}\ONE$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   503
  $(c)^\uparrow$ & $\dn$ & $_{[]}{\bf c}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   504
  $(r_1 + r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   505
  $_{[]}\sum[\textit{fuse}\,[Z]\,r_1^\uparrow,\,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   506
  \textit{fuse}\,[S]\,r_2^\uparrow]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   507
  $(r_1\cdot r_2)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   508
         $_{[]}r_1^\uparrow \cdot r_2^\uparrow$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   509
  $(r^*)^\uparrow$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   510
         $_{[]}(r^\uparrow)^*$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   511
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   512
\end{center}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   513
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   514
We use an up arrow with postfix notation
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   515
to denote this operation
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   516
for convenience. 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   517
The opposite of $\textit{internalise}$ is
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   518
$\erase$, where all the bit-codes are removed,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   519
and the alternative operator $\sum$ for annotated
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   520
regular expressions is transformed to the binary version 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   521
in plain regular expressions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   522
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   523
	\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   524
		$\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   525
		$( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   526
		$( _{bs}\mathbf{c} )_\downarrow$ & $\dn$ & $\mathbf{c}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   527
		$( _{bs} a_1 \cdot a_2 )_\downarrow$ & $\dn$ & 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   528
		$ (a_1) _\downarrow \cdot  (a_2) _\downarrow$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   529
		$( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   530
		$( _{bs} [a]  )_\downarrow$ & $\dn$ & $a_\downarrow$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   531
		$_{bs} \sum [a_1, \; a_2]$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   532
		$(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   533
		$( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   534
	\end{tabular}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   535
\end{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   536
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   537
We also abbreviate the $\erase\; a$ operation
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   538
as $a_\downarrow$, for conciseness.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   539
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   540
Testing whether an annotated regular expression
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   541
contains the empty string in its lauguage requires
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   542
a dedicated function $\bnullable$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   543
$\bnullable$ simply calls $\erase$ before $\nullable$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   544
\begin{definition}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   545
		$\bnullable \; a \dn  \nullable \; (a_\downarrow)$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   546
\end{definition}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   547
The function for collecting 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   548
bitcodes from a 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   549
(b)nullable regular expression is called $\bmkeps$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   550
$\bmkeps$ is a variation of the $\textit{mkeps}$ function,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   551
which follows the path $\mkeps$ takes to traverse the
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   552
$\nullable$ branches when visiting a regular expression,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   553
but gives back bitcodes instead of a value.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   554
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   555
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   556
  $\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   557
  $\textit{bmkeps}\,(_{bs}\sum a::\textit{as})$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   558
     $\textit{if}\;\textit{bnullable}\,a$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   559
  & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,a$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   560
  & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,(_{[]}\sum \textit{as})$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   561
  $\textit{bmkeps}\,(_{bs} a_1 \cdot a_2)$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   562
     $bs \,@\,\textit{bmkeps}\,a_1\,@\, \textit{bmkeps}\,a_2$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   563
  $\textit{bmkeps}\,(_{bs}a^*)$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   564
     $bs \,@\, [Z]$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   565
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   566
\end{center}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   567
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   568
$\bmkeps$ retrieves the value $v$'s
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   569
information in the format
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   570
of bitcodes, by travelling along the
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   571
path of the regular expression that corresponds to a POSIX match,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   572
collecting all the bitcodes, and attaching $S$ to indicate the end of star
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   573
iterations. \\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   574
The bitcodes extracted by $\bmkeps$ need to be 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   575
$\decode$d (with the guidance of a plain regular expression):
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   576
%\begin{definition}[Bitdecoding of Values]\mbox{}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   577
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   578
\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   579
  $\textit{decode}'\,bs\,(\ONE)$ & $\dn$ & $(\Empty, bs)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   580
  $\textit{decode}'\,bs\,(c)$ & $\dn$ & $(\Char\,c, bs)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   581
  $\textit{decode}'\,(Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   582
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   583
       (\Left\,v, bs_1)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   584
  $\textit{decode}'\,(S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   585
     $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   586
       (\Right\,v, bs_1)$\\                           
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   587
  $\textit{decode}'\,bs\;(r_1\cdot r_2)$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   588
        $\textit{let}\,(v_1, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   589
  & &   $\textit{let}\,(v_2, bs_2) = \textit{decode}'\,bs_1\,r_2$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   590
  & &   \hspace{35mm}$\textit{in}\;(\Seq\,v_1\,v_2, bs_2)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   591
  $\textit{decode}'\,(Z\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   592
  $\textit{decode}'\,(S\!::\!bs)\,(r^*)$ & $\dn$ & 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   593
         $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   594
  & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   595
  & &   \hspace{35mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\bigskip\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   596
  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   597
  $\textit{decode}\,bs\,r$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   598
     $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   599
  & & $\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   600
       \textit{else}\;\textit{None}$                       
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   601
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   602
\end{center} 
537
Chengsong
parents: 536
diff changeset
   603
\noindent
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   604
The function $\decode'$ returns a pair consisting of 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   605
a partially decoded value and some leftover bit list that cannot
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   606
be decide yet.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   607
The function $\decode'$ succeeds if the left-over 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   608
bit-sequence is empty.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   609
$\decode$ is terminating as $\decode'$ is terminating.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   610
$\decode'$ is terminating 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   611
because at least one of $\decode'$'s parameters will go down in terms
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   612
of size.\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   613
The reverse operation of $\decode$ is $\code$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   614
$\textit{code}$ encodes a value into a bitcode by converting
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   615
$\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   616
star iteration by $S$. The border where a star iteration
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   617
terminates is marked by $Z$. 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   618
This coding is lossy, as it throws away the information about
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   619
characters, and does not encode the ``boundary'' between two
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   620
sequence values. Moreover, with only the bitcode we cannot even tell
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   621
whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$.  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   622
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   623
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   624
  $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   625
  $\textit{code}(\Char\,c)$ & $\dn$ & $[]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   626
  $\textit{code}(\Left\,v)$ & $\dn$ & $Z :: code(v)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   627
  $\textit{code}(\Right\,v)$ & $\dn$ & $S :: code(v)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   628
  $\textit{code}(\Seq\,v_1\,v_2)$ & $\dn$ & $code(v_1) \,@\, code(v_2)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   629
  $\textit{code}(\Stars\,[])$ & $\dn$ & $[Z]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   630
  $\textit{code}(\Stars\,(v\!::\!vs))$ & $\dn$ & $S :: code(v) \;@\;
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   631
                                                 code(\Stars\,vs)$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   632
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   633
\end{center} 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   634
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   635
Assume we have a value $v$ and regular expression $r$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   636
with $\vdash v:r$,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   637
then we have the property that $\decode$ and $\code$ are
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   638
reverse operations of one another:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   639
\begin{lemma}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   640
\[If \vdash v : r \; then \;\decode \; (\code \; v) \; r = \textit{Some}(v) \]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   641
\end{lemma}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   642
\begin{proof}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   643
By proving a more general version of the lemma, on $\decode'$:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   644
\[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   645
Then setting $ds$ to be $[]$ and unfolding $\decode$ definition,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   646
we obtain the property.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   647
\end{proof}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   648
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   649
Now we give out the central part of this lexing algorithm,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   650
the $\bder$ function (stands for \emph{b}itcoded-derivative):
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   651
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   652
  \begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   653
  $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   654
  $(_{bs}\ONE)\,\backslash c$ & $\dn$ & $\ZERO$\\  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   655
  $(_{bs}{\bf d})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   656
        $\textit{if}\;c=d\; \;\textit{then}\;
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   657
         _{bs}\ONE\;\textit{else}\;\ZERO$\\  
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   658
  $(_{bs}\sum \;\textit{as})\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   659
  $_{bs}\sum\;(\textit{map} \; (\_\backslash c) \; as )$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   660
  $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   661
     $\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   662
					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   663
					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   664
  & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   665
  $(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   666
      $_{bs}(\textit{fuse}\, [Z] \; r\,\backslash c)\cdot
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   667
       (_{[]}r^*))$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   668
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   669
\end{center}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   670
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   671
For most time we use the infix notation $(\_\backslash\_)$ 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   672
to mean $\bder$ for brevity when
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   673
there is no danger of confusion with derivatives on plain regular expressions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   674
For example, we write $( _{[]}r^* ) \backslash c$ instead of $\bder \;c \; _{[]}r^*$,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   675
as the bitcodes at the front of $r^*$ indicates that it is 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   676
a bit-coded regular expression, not a plain one.\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   677
$\bder$ tells us how regular expressions can be recursively traversed,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   678
where the bitcodes are augmented and carried around 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   679
when a derivative is taken.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   680
We give the intuition behind some of the more involved cases in 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   681
$\bder$. \\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   682
For example,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   683
in the \emph{star} case,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   684
a derivative on $_{bs}a^*$ means 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   685
that one more star iteratoin needs to be taken.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   686
we need to unfold it into a sequence,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   687
and attach an additional bit $Z$ to the front of $r \backslash c$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   688
as a record to indicate one new star iteration is unfolded.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   689
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   690
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   691
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   692
  \begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   693
  $(_{bs}a^*)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   694
  $_{bs}(\underbrace{\textit{fuse}\, [Z] \; a\,\backslash c}_{\text{One more iteration}})\cdot
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   695
       (_{[]}a^*))$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   696
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   697
\end{center}   
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   698
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   699
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   700
This information will be recovered later by the $\decode$ function.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   701
The intuition is that the bit $Z$ will be decoded at the right location,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   702
because we accumulate bits from left to right (a rigorous proof will be given
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   703
later).
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   704
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   705
%\begin{tikzpicture}[ > = stealth, % arrow head style
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   706
%        shorten > = 1pt, % don't touch arrow head to node
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   707
%        semithick % line style
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   708
%    ]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   709
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   710
%    \tikzstyle{every state}=[
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   711
%        draw = black,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   712
%        thin,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   713
%        fill = cyan!29,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   714
%        minimum size = 7mm
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   715
%    ]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   716
%    \begin{scope}[node distance=1cm and 0cm, every node/.style=state]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   717
%		\node (k) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   718
%        {$bs$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   719
%         \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   720
%	 \node (l) [below =of k, rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   721
%        { $bs$ + [Z]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   722
%         \nodepart{two}  $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   723
%    \end{scope}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   724
%    \path[->] 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   725
%	      (k) edge (l);
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   726
%\end{tikzpicture}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   727
%
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   728
%Pictorially the process looks like below.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   729
%Like before, the red region denotes
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   730
%previous lexing information (stored as bitcodes in $bs$).
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   731
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   732
%\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   733
%	\begin{scope}[node distance=1cm]   
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   734
%		\node (a) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   735
%        {$bs$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   736
%         \nodepart{two} $a^*$ };
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   737
%	 \node (b) [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   738
%        { $bs$ + [Z]
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   739
%         \nodepart{two}  $(a\backslash c )\cdot a^*$ };
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   740
%%\caption{term 1 \ref{term:1}'s matching configuration}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   741
% 	\end{scope}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   742
%\end{tikzpicture}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   743
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   744
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   745
Another place the $\bder$ function differs
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   746
from derivatives on plain regular expressions
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   747
is the sequence case:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   748
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   749
  \begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   750
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   751
  $(_{bs}\;a_1\cdot a_2)\,\backslash c$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   752
     $\textit{if}\;\textit{bnullable}\,a_1$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   753
					       & &$\textit{then}\;_{bs}\sum\,[(_{[]}\,(a_1\,\backslash c)\cdot\,a_2),$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   754
					       & &$\phantom{\textit{then},\;_{bs}\sum\,}(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   755
  & &$\textit{else}\;_{bs}\,(a_1\,\backslash c)\cdot a_2$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   756
\end{tabular}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   757
\end{center}    
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   758
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   759
The difference is that (when $a_1$ is $\bnullable$)
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   760
we use $\bmkeps$ to store the lexing information
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   761
in $a_1$ before collapsing 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   762
it (as it has been fully matched by string prior to $c$),
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   763
and attach the collected bit-codes to the front of $a_2$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   764
before throwing away $a_1$. We assume that $\bmkeps$ 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   765
correctly extracts the bitcode for how $a_1$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   766
matches the string prior to $c$ (more on this later).
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   767
The bitsequence $\textit{bs}$ which was initially 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   768
attached to the first element of the sequence
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   769
$a_1 \cdot a_2$, has now been 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   770
elevated to the top level of the $\sum$ constructor. 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   771
This is because this piece of information will be needed 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   772
whichever way the sequence is matched,
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   773
regardless of whether $c$ belongs to $a_1$ or $a_2$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   774
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   775
In the injection-based lexer, $r_1$ is immediately thrown away in 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   776
subsequent derivatives on the right branch (when $r_1$ is $\nullable$),
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   777
depite $r_1$ potentially storing information of previous matches:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   778
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   779
	$(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   780
\end{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   781
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   782
this loss of information makes it necessary
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   783
to store on stack all the regular expressions'
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   784
``snapshots'' before they were
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   785
taken derivative of.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   786
So that the related information will be available 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   787
once the child recursive 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   788
call finishes.\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   789
The rest of the clauses of $\bder$ is rather similar to
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   790
$\der$. \\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   791
Generalising the derivative operation with bitcodes to strings, we have 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   792
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   793
	\begin{tabular}{@{}lcl@{}}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   794
		$a\backslash_s [] $ & $\dn$ & $a$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   795
		$a\backslash (c :: s) $ & $\dn$ & $(a \backslash c) \backslash_s s$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   796
	\end{tabular}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   797
\end{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   798
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   799
As we did earlier, we omit the $s$ subscript at $\backslash_s$ when there is no danger
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   800
of confusion.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   801
\subsection{Putting Things Together}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   802
Putting these operations altogether, we obtain a lexer with bit-coded regular expressions
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   803
as its internal data structures, which we call $\blexer$:
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   804
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   805
\begin{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   806
\begin{tabular}{lcl}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   807
  $\textit{blexer}\;r\,s$ & $\dn$ &
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   808
      $\textit{let}\;a = (r^\uparrow)\backslash s\;\textit{in}$\\                
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   809
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   810
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   811
  & & $\;\;\textit{else}\;\textit{None}$
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   812
\end{tabular}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   813
\end{center}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   814
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   815
\noindent
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   816
Ausaf and Urban formally proved the correctness of the $\blexer$, namely
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   817
\begin{property}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   818
$\blexer \;r \; s = \lexer \; r \; s$.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   819
\end{property}
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   820
This was claimed but not formalised in Sulzmann and Lu's work.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   821
We introduce the proof later, after we give all
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   822
the needed auxiliary functions and definitions.
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   823
But before this we shall first walk the reader 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   824
through a concrete example of our $\blexer$ calculating the right 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   825
lexical information through bit-coded regular expressions.\\
564
Chengsong
parents: 543
diff changeset
   826
Consider the regular expression $(aa)^* \cdot bc$ matching the string $aabc$
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   827
and assume we have just read the first character $a$:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   828
\begin{center}
537
Chengsong
parents: 536
diff changeset
   829
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   830
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   831
	    {$\ONE \cdot a \cdot (aa)^* \cdot bc$ 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   832
	    \nodepart{two} $[] \iff \Seq \; (\Stars \; [\Seq\; a \; ??, \;]), ??$};
537
Chengsong
parents: 536
diff changeset
   833
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
   834
\end{center}
564
Chengsong
parents: 543
diff changeset
   835
\noindent
580
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   836
We use the red area for (annotated) regular expressions and the blue 
e0f0a81f907b halfway chap3
Chengsong
parents: 579
diff changeset
   837
area the (partial) bitcodes and (partial) values.
564
Chengsong
parents: 543
diff changeset
   838
In the injection-based lexing algorithm, we ``neglect" the red area
Chengsong
parents: 543
diff changeset
   839
by putting all the characters we have consumed and
Chengsong
parents: 543
diff changeset
   840
intermediate regular expressions on the stack when 
Chengsong
parents: 543
diff changeset
   841
we go from left to right in the derivative phase.
Chengsong
parents: 543
diff changeset
   842
The red area grows till the string is exhausted.
Chengsong
parents: 543
diff changeset
   843
During the injection phase, the value in the blue area
Chengsong
parents: 543
diff changeset
   844
is built up incrementally, while the red area shrinks.
Chengsong
parents: 543
diff changeset
   845
Before we have recovered all characters and intermediate
Chengsong
parents: 543
diff changeset
   846
derivative regular expressions from the stack,
Chengsong
parents: 543
diff changeset
   847
what values these characters and regular expressions correspond 
Chengsong
parents: 543
diff changeset
   848
to are unknown: 
537
Chengsong
parents: 536
diff changeset
   849
\begin{center}
Chengsong
parents: 536
diff changeset
   850
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   851
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={white!30,blue!20},]
564
Chengsong
parents: 543
diff changeset
   852
	    {$(\ONE \cdot \ONE) \cdot (aa)^* \cdot bc $ correspond to:$???$
Chengsong
parents: 543
diff changeset
   853
         \nodepart{two}  $b c$ corresponds to  $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$};
537
Chengsong
parents: 536
diff changeset
   854
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
   855
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
   856
\end{center}
Chengsong
parents: 536
diff changeset
   857
\noindent
564
Chengsong
parents: 543
diff changeset
   858
However, they should be calculable,
Chengsong
parents: 543
diff changeset
   859
as characters and regular expression shapes
Chengsong
parents: 543
diff changeset
   860
after taking derivative w.r.t those characters
Chengsong
parents: 543
diff changeset
   861
have already been known, therefore in our example,
Chengsong
parents: 543
diff changeset
   862
we know that the value starts with two $a$s,
Chengsong
parents: 543
diff changeset
   863
and makes up to an iteration in a Kleene star:
Chengsong
parents: 543
diff changeset
   864
(We have put the injection-based lexing's partial 
Chengsong
parents: 543
diff changeset
   865
result in the right part of the split rectangle
Chengsong
parents: 543
diff changeset
   866
to contrast it with the partial valued produced
Chengsong
parents: 543
diff changeset
   867
in a forward manner)
537
Chengsong
parents: 536
diff changeset
   868
\begin{center}
Chengsong
parents: 536
diff changeset
   869
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   870
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
564
Chengsong
parents: 543
diff changeset
   871
	    {$\stackrel{Bitcoded}{\longrightarrow} \Seq(\Stars[\Char(a), \Char(a)], ???)$
Chengsong
parents: 543
diff changeset
   872
	\nodepart{two} $\Seq(\ldots, \Seq(\Char(b), \Char(c)))$  $\stackrel{Inj}{\longleftarrow}$};
537
Chengsong
parents: 536
diff changeset
   873
%\caption{term 1 \ref{term:1}'s matching configuration}
Chengsong
parents: 536
diff changeset
   874
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
   875
\end{center}
Chengsong
parents: 536
diff changeset
   876
\noindent
Chengsong
parents: 536
diff changeset
   877
If we do this kind of "attachment"
Chengsong
parents: 536
diff changeset
   878
and each time augment the attached partially
Chengsong
parents: 536
diff changeset
   879
constructed value when taking off a 
Chengsong
parents: 536
diff changeset
   880
character:
Chengsong
parents: 536
diff changeset
   881
\begin{center}
564
Chengsong
parents: 543
diff changeset
   882
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 543
diff changeset
   883
	\node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},] (spPoint)
Chengsong
parents: 543
diff changeset
   884
        {$\Seq(\Stars[\Char(a), \Char(a)], \ldots)$
Chengsong
parents: 543
diff changeset
   885
         \nodepart{two} Remaining: $b c$};
Chengsong
parents: 543
diff changeset
   886
\end{tikzpicture}\\
Chengsong
parents: 543
diff changeset
   887
$\downarrow$\\
537
Chengsong
parents: 536
diff changeset
   888
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   889
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
   890
        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \ldots))$
564
Chengsong
parents: 543
diff changeset
   891
         \nodepart{two} Remaining: $c$};
537
Chengsong
parents: 536
diff changeset
   892
\end{tikzpicture}\\
564
Chengsong
parents: 543
diff changeset
   893
$\downarrow$\\
537
Chengsong
parents: 536
diff changeset
   894
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
Chengsong
parents: 536
diff changeset
   895
    \node [rectangle split, rectangle split horizontal, rectangle split parts=2, rectangle split part fill={red!30,blue!20},]
Chengsong
parents: 536
diff changeset
   896
        {$\Seq(\Stars[\Char(a), \Char(a)], \Seq(\Char(b), \Char(c)))$
Chengsong
parents: 536
diff changeset
   897
         \nodepart{two} EOF};
Chengsong
parents: 536
diff changeset
   898
\end{tikzpicture}
Chengsong
parents: 536
diff changeset
   899
\end{center}
Chengsong
parents: 536
diff changeset
   900
\noindent
Chengsong
parents: 536
diff changeset
   901
In the end we could recover the value without a backward phase.
Chengsong
parents: 536
diff changeset
   902
But (partial) values are a bit clumsy to stick together with a regular expression, so 
Chengsong
parents: 536
diff changeset
   903
we instead use bit-codes to encode them.
Chengsong
parents: 536
diff changeset
   904
Chengsong
parents: 536
diff changeset
   905
Bits and bitcodes (lists of bits) are defined as:
Chengsong
parents: 536
diff changeset
   906
\begin{center}
Chengsong
parents: 536
diff changeset
   907
		$b ::=   S \mid  Z \qquad
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   908
bs ::= [] \mid b::bs    
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   909
$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   910
\end{center}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   911
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   912
\noindent
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   913
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   914
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   915
%	SUBSECTION 1
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   916
%-----------------------------------
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   917
\section{Specifications of Some Helper Functions}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   918
The functions we introduce will give a more detailed glimpse into 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   919
the lexing process, which might not be possible
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   920
using $\lexer$ or $\blexer$ themselves.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   921
The first function we shall look at is $\retrieve$.
543
b2bea5968b89 thesis_thys
Chengsong
parents: 542
diff changeset
   922
\subsection{$\textit{Retrieve}$}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   923
Our bit-coded lexer "retrieve"s the bitcodes using $\bmkeps$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   924
after we finished doing all the derivatives:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   925
\begin{center}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   926
\begin{tabular}{lcl}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   927
	& & $\ldots$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   928
  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   929
  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   930
  & & $\ldots$
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   931
\end{tabular}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   932
\end{center}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   933
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   934
Recall that $\bmkeps$ looks for the leftmost branch of an alternative
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   935
and completes a star's iterations by attaching a $Z$ at the end of the bitcodes
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   936
extracted. It "retrieves" a sequence by visiting both children and then stitch together 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   937
two bitcodes using concatenation. After the entire tree structure of the regular 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   938
expression has been traversed using the above manner, we get a bitcode encoding the 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   939
lexing result.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   940
We know that this "retrieved" bitcode leads to the correct value after decoding,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   941
which is $v_0$ in the bird's eye view of the injection-based lexing diagram.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   942
Now assume we keep every other data structure in the diagram \ref{InjFigure},
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   943
and only replace all the plain regular expression by their annotated counterparts,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   944
computed during a $\blexer$ run.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   945
Then we obtain a diagram for the annotated regular expression derivatives and
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   946
their corresponding values, though the values are never calculated in $\blexer$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   947
We have that $a_n$ contains all the lexing result information.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   948
\vspace{20mm}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   949
\begin{center}%\label{graph:injLexer}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   950
\begin{tikzcd}[
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   951
	every matrix/.append style = {name=p},
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   952
	remember picture, overlay,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   953
	]
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   954
	a_0 \arrow[r, "\backslash c_0"]  \arrow[d] & a_1 \arrow[r, "\backslash c_1"] \arrow[d] & a_2 \arrow[r, dashed] \arrow[d] & a_n \arrow[d] \\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   955
v_0           & v_1 \arrow[l,"inj_{r_0} c_0"]                & v_2 \arrow[l, "inj_{r_1} c_1"]              & v_n \arrow[l, dashed]         
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   956
\end{tikzcd}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   957
\begin{tikzpicture}[
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   958
	remember picture, overlay,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   959
E/.style = {ellipse, draw=blue, dashed,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   960
            inner xsep=4mm,inner ysep=-4mm, rotate=90, fit=#1}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   961
                        ]
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   962
\node[E = (p-1-1) (p-2-1)] {};
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   963
\node[E = (p-1-4) (p-2-4)] {};
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   964
\end{tikzpicture}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   965
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   966
\end{center}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   967
\vspace{20mm}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   968
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   969
On the other hand, $v_0$ also encodes the correct lexing result, as we have proven for $\lexer$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   970
Encircled in the diagram  are the two pairs $v_0, a_0$ and $v_n, a_n$, which both 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   971
encode the correct lexical result. Though for the leftmost pair, we have
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   972
the information condensed in $v_0$ the value part, whereas for the rightmost pair,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   973
the information is concentrated on $a_n$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   974
We know that in the intermediate steps the pairs $v_i, a_i$, must in some way encode the complete
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   975
lexing information as well. Therefore, we need a unified approach to extract such lexing result
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   976
from a value $v_i$ and its annotated regular expression $a_i$. 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   977
And the function $f$ must satisfy these requirements:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   978
\begin{itemize}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   979
	\item
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   980
		$f \; a_i\;v_i = f \; a_n \; v_n = \decode \; (\bmkeps \; a_n) \; (\erase \; a_0)$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   981
	\item
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   982
		$f \; a_i\;v_i = f \; a_0 \; v_0 = v_0 = \decode \;(\code \; v_0) \; (\erase \; a_0)$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   983
\end{itemize}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   984
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   985
If we factor out the common part $\decode \; \_ \; (\erase \; a_0)$,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   986
The core of the function $f$ is something that produces the bitcodes
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   987
$\code \; v_0$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   988
It is unclear how, but Sulzmann and Lu came up with a function satisfying all the above
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   989
requirements, named \emph{retrieve}:
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   990
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   991
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
   992
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   993
\begin{center}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   994
\begin{tabular}{lcr}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   995
	$\retrieve \; \, (_{bs} \ONE) \; \, (\Empty)$ & $\dn$ & $\textit{bs}$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   996
	$\retrieve \; \, (_{bs} \mathbf{c} ) \; \Char(c)$ & $\dn$ & $ \textit{bs}$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   997
	$\retrieve \; \, (_{bs} a_1 \cdot a_2) \; \Seq(v_1, v_2)$ & $\dn$ &  $\textit{bs} @ (\retrieve \; a_1\; v_1) @ (\retrieve \; a_2 \; v_2)$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   998
	$\retrieve \; \, (_{bs} \Sigma (a :: \textit{as}) \; \,\Left(v)$ & $\dn$ & $\textit{bs} @ (\retrieve \; a \; v)$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
   999
	$\retrieve \; \, (_{bs} \Sigma (a :: \textit{as} \; \, \Right(v)$ & $\dn$ & $\textit{bs} @ (\retrieve \; (_{[]}\Sigma \textit{as}) \; v)$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1000
	$\retrieve \; \, (_{bs} a^*) \; \, (\Stars(v :: vs)) $ & $\dn$ & $\textit{bs} @ (\retrieve \; a \; v) @ (\retrieve \; (_{[]} a^*) \; (\Stars(vs)))$\\
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1001
	$\retrieve \; \, (_{bs} a^*) \; \, (\Stars([]) $ & $\dn$ & $\textit{bs} @ [Z]$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1002
\end{tabular}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1003
\end{center}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1004
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1005
As promised, $\retrieve$ collects the right bit-codes from the 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1006
final derivative $a_n$:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1007
\begin{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1008
	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (\erase \; a))$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1009
\end{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1010
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1011
	By a routine induction on $a$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1012
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1013
The design of $\retrieve$ enables extraction of bit-codes
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1014
from not only $\bnullable$ (annotated) regular expressions,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1015
but also those that are not $\bnullable$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1016
For example, if we have the regular expression just internalised
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1017
and the lexing result value, we could $\retrieve$ the bitcdoes
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1018
that look as if we have en$\code$-ed the value:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1019
\begin{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1020
	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1021
\end{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1022
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1023
	By induction on $r$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1024
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1025
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1026
The following property is more interesting, as
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1027
it provides a "bridge" between $a_0, v_0$ and $a_n, v_n$ in the
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1028
lexing diagram.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1029
If you take derivative of an annotated regular expression, 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1030
you can $\retrieve$ the same bit-codes as before the derivative took place,
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1031
provided that you use the corresponding value:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1032
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1033
\begin{lemma}\label{retrieveStepwise}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1034
	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1035
\end{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1036
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1037
	By induction on $r$, where $v$ is allowed to be arbitrary.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1038
	The induction principle is function $\erase$'s cases.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1039
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1040
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1041
$\retrieve$ is connected to the $\blexer$ in the following way:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1042
\begin{lemma}\label{blexer_retrieve}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1043
$\blexer \; r \; s = \decode  \; (\retrieve \; (\internalise \; r) \; (\mkeps \; (r \backslash s) )) \; r$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1044
\end{lemma}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1045
\noindent
564
Chengsong
parents: 543
diff changeset
  1046
$\retrieve$ allows free navigation on the diagram \ref{InjFigure} for annotated regular expressiones of $\blexer$.
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1047
For plain regular expressions something similar is required as well.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1048
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1049
\subsection{$\flex$}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1050
Ausaf and Urban cleverly defined an auxiliary function called $\flex$ for $\lexer$,
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1051
defined as
536
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1052
\begin{center}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1053
\begin{tabular}{lcr}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1054
$\flex \; r \; f \; [] \; v$       &  $=$ &   $f\; v$\\
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1055
$\flex \; r \; f \; c :: s \; v$ &  $=$ &   $\flex \; r \; \lambda v. \, f (\inj \; r\; c\; v)\; s \; v$
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1056
\end{tabular}
aff7bf93b9c7 comments addressed all
Chengsong
parents: 532
diff changeset
  1057
\end{center}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1058
which accumulates the characters that needs to be injected back, 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1059
and does the injection in a stack-like manner (last taken derivative first injected).
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1060
$\flex$ is connected to the $\lexer$:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1061
\begin{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1062
$\flex \; r \; \textit{id}\; s \; \mkeps (r\backslash s) = \lexer \; r \; s$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1063
\end{lemma}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1064
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1065
	By reverse induction on $s$.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1066
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1067
$\flex$ provides us a bridge between $\lexer$'s intermediate steps.
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1068
What is even better about $\flex$ is that it allows us to 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1069
directly operate on the value $\mkeps (r\backslash v)$,
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1070
which is pivotal in the definition of  $\lexer $ and $\blexer$, but not visible as an argument.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1071
When the value created by $\mkeps$ becomes available, one can 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1072
prove some stepwise properties of lexing nicely:
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1073
\begin{lemma}\label{flexStepwise}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1074
$\textit{flex} \; r \; f \; s@[c] \; v= \flex \; r \; f\; s \; (\inj \; (r\backslash s) \; c \; v) $
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1075
\end{lemma}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1076
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1077
	By induction on the shape of $r\backslash s$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1078
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1079
\noindent
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1080
With $\flex$ and $\retrieve$ ready, we are ready to connect $\lexer$ and $\blexer$ .
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1081
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1082
\subsection{Correctness Proof of Bit-coded Algorithm}
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1083
\begin{lemma}\label{flex_retrieve}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1084
$\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1085
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1086
\begin{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1087
By induction on $s$. The induction tactic is reverse induction on strings.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1088
$v$ is allowed to be arbitrary.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1089
The crucial point is to rewrite 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1090
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1091
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1092
\]
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1093
as
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1094
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1095
\retrieve \; (r \backslash s) \; (\inj \; (r \backslash s) \; c\;  \mkeps (r \backslash s@[c]))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1096
\].
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1097
This enables us to equate 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1098
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1099
\retrieve \; (r \backslash s@[c]) \; \mkeps (r \backslash s@[c]) 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1100
\] 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1101
with 
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1102
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1103
\flex \; r \; \textit{id} \; s \; (\inj \; (r\backslash s) \; c\; (\mkeps (r\backslash s@[c])))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1104
\],
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1105
which in turn can be rewritten as
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1106
\[
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1107
\flex \; r \; \textit{id} \; s@[c] \;  (\mkeps (r\backslash s@[c]))
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1108
\].
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1109
\end{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1110
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1111
With the above lemma we can now link $\flex$ and $\blexer$.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1112
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1113
%----------------------------------------------------------------------------------------
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1114
%	SECTION  correctness proof
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1115
%----------------------------------------------------------------------------------------
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1116
\section{Correctness of Bit-coded Algorithm (Without Simplification)}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1117
We now give the proof the correctness of the algorithm with bit-codes.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1118
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1119
\begin{lemma}\label{flex_blexer}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1120
$\textit{flex} \; r \; \textit{id} \; s \; \mkeps(r \backslash s)  = \blexer \; r \; s$
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1121
\end{lemma}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1122
\begin{proof}
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1123
Using two of the above lemmas: \ref{flex_retrieve} and \ref{blexer_retrieve}.
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1124
\end{proof}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1125
Finally the correctness of $\blexer$ is given as it outputs the same result as $\lexer$:
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1126
\begin{theorem}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1127
	$\blexer\; r \; s = \lexer \; r \; s$
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1128
\end{theorem}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1129
\begin{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1130
	Straightforward corollary of \ref{flex_blexer}.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1131
\end{proof}
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1132
\noindent
576
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1133
To piece things together and spell out the exact correctness
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1134
of the bitcoded lexer
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1135
in terms of producing POSIX values,
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1136
we use the fact from the previous chapter that
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1137
\[
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1138
	If \; (r, s) \rightarrow v \; then \; \lexer \; r \; s = v
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1139
\]
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1140
to obtain this corollary:
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1141
\begin{corollary}\label{blexerPOSIX}
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1142
	$If \; (r, s) \rightarrow v \; then \blexer \; r \; s = v$
3e1b699696b6 thesis chap5
Chengsong
parents: 575
diff changeset
  1143
\end{corollary}
542
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1144
Our main reason for wanting a bit-coded algorithm over 
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1145
the injection-based one is for its capabilities of allowing
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1146
more aggressive simplifications.
a7344c9afbaf chapter3 finished
Chengsong
parents: 538
diff changeset
  1147
We will elaborate on this in the next chapter.
532
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1148
cc54ce075db5 restructured
Chengsong
parents:
diff changeset
  1149