532
|
1 |
% Chapter Template
|
|
2 |
|
|
3 |
% Main chapter title
|
|
4 |
\chapter{Correctness of Bit-coded Algorithm with Simplification}
|
|
5 |
|
|
6 |
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
|
|
7 |
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
|
|
8 |
%simplifications and therefore introduce our version of the bitcoded algorithm and
|
|
9 |
%its correctness proof in
|
|
10 |
%Chapter 3\ref{Chapter3}.
|
|
11 |
|
|
12 |
|
|
13 |
|
538
|
14 |
Now we introduce the simplifications, which is why we introduce the
|
|
15 |
bitcodes in the first place.
|
|
16 |
|
|
17 |
\subsection*{Simplification Rules}
|
|
18 |
|
|
19 |
This section introduces aggressive (in terms of size) simplification rules
|
|
20 |
on annotated regular expressions
|
|
21 |
to keep derivatives small. Such simplifications are promising
|
|
22 |
as we have
|
|
23 |
generated test data that show
|
|
24 |
that a good tight bound can be achieved. We could only
|
|
25 |
partially cover the search space as there are infinitely many regular
|
|
26 |
expressions and strings.
|
|
27 |
|
|
28 |
One modification we introduced is to allow a list of annotated regular
|
|
29 |
expressions in the $\sum$ constructor. This allows us to not just
|
|
30 |
delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
|
|
31 |
also unnecessary ``copies'' of regular expressions (very similar to
|
|
32 |
simplifying $r + r$ to just $r$, but in a more general setting). Another
|
|
33 |
modification is that we use simplification rules inspired by Antimirov's
|
|
34 |
work on partial derivatives. They maintain the idea that only the first
|
|
35 |
``copy'' of a regular expression in an alternative contributes to the
|
|
36 |
calculation of a POSIX value. All subsequent copies can be pruned away from
|
|
37 |
the regular expression. A recursive definition of our simplification function
|
|
38 |
that looks somewhat similar to our Scala code is given below:
|
|
39 |
%\comment{Use $\ZERO$, $\ONE$ and so on.
|
|
40 |
%Is it $ALTS$ or $ALTS$?}\\
|
|
41 |
|
|
42 |
\begin{center}
|
|
43 |
\begin{tabular}{@{}lcl@{}}
|
|
44 |
|
|
45 |
$\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\
|
|
46 |
&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
|
|
47 |
&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
|
|
48 |
&&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\
|
|
49 |
&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\
|
|
50 |
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$ \\
|
|
51 |
|
|
52 |
$\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
|
|
53 |
&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
|
|
54 |
&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
|
|
55 |
&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
|
|
56 |
|
|
57 |
$\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
|
|
58 |
\end{tabular}
|
|
59 |
\end{center}
|
|
60 |
|
|
61 |
\noindent
|
|
62 |
The simplification does a pattern matching on the regular expression.
|
|
63 |
When it detected that the regular expression is an alternative or
|
|
64 |
sequence, it will try to simplify its child regular expressions
|
|
65 |
recursively and then see if one of the children turns into $\ZERO$ or
|
|
66 |
$\ONE$, which might trigger further simplification at the current level.
|
|
67 |
The most involved part is the $\sum$ clause, where we use two
|
|
68 |
auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
|
|
69 |
alternatives and reduce as many duplicates as possible. Function
|
|
70 |
$\textit{distinct}$ keeps the first occurring copy only and removes all later ones
|
|
71 |
when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
|
|
72 |
Its recursive definition is given below:
|
|
73 |
|
|
74 |
\begin{center}
|
|
75 |
\begin{tabular}{@{}lcl@{}}
|
|
76 |
$\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
|
|
77 |
(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
|
|
78 |
$\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \; \textit{as'} $ \\
|
|
79 |
$\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise)
|
|
80 |
\end{tabular}
|
|
81 |
\end{center}
|
|
82 |
|
|
83 |
\noindent
|
|
84 |
Here $\textit{flatten}$ behaves like the traditional functional programming flatten
|
|
85 |
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
|
|
86 |
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
|
|
87 |
|
|
88 |
Having defined the $\simp$ function,
|
|
89 |
we can use the previous notation of natural
|
|
90 |
extension from derivative w.r.t.~character to derivative
|
|
91 |
w.r.t.~string:%\comment{simp in the [] case?}
|
|
92 |
|
|
93 |
\begin{center}
|
|
94 |
\begin{tabular}{lcl}
|
|
95 |
$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
|
|
96 |
$r \backslash_{simp} [\,] $ & $\dn$ & $r$
|
|
97 |
\end{tabular}
|
|
98 |
\end{center}
|
|
99 |
|
|
100 |
\noindent
|
|
101 |
to obtain an optimised version of the algorithm:
|
|
102 |
|
|
103 |
\begin{center}
|
|
104 |
\begin{tabular}{lcl}
|
|
105 |
$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
|
|
106 |
$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
|
|
107 |
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
|
|
108 |
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
|
|
109 |
& & $\;\;\textit{else}\;\textit{None}$
|
|
110 |
\end{tabular}
|
|
111 |
\end{center}
|
|
112 |
|
|
113 |
\noindent
|
|
114 |
This algorithm keeps the regular expression size small, for example,
|
|
115 |
with this simplification our previous $(a + aa)^*$ example's 8000 nodes
|
|
116 |
will be reduced to just 17 and stays constant, no matter how long the
|
|
117 |
input string is.
|
|
118 |
|
|
119 |
|
|
120 |
|
|
121 |
|
|
122 |
|
|
123 |
|
|
124 |
|
|
125 |
|
|
126 |
|
532
|
127 |
|
|
128 |
%----------------------------------------------------------------------------------------
|
|
129 |
% SECTION common identities
|
|
130 |
%----------------------------------------------------------------------------------------
|
|
131 |
\section{Common Identities In Simplification-Related Functions}
|
|
132 |
Need to prove these before starting on the big correctness proof.
|
|
133 |
|
|
134 |
%-----------------------------------
|
|
135 |
% SUBSECTION
|
|
136 |
%-----------------------------------
|
|
137 |
\subsection{Idempotency of $\simp$}
|
|
138 |
|
|
139 |
\begin{equation}
|
|
140 |
\simp \;r = \simp\; \simp \; r
|
|
141 |
\end{equation}
|
|
142 |
This property means we do not have to repeatedly
|
|
143 |
apply simplification in each step, which justifies
|
|
144 |
our definition of $\blexersimp$.
|
|
145 |
It will also be useful in future proofs where properties such as
|
|
146 |
closed forms are needed.
|
|
147 |
The proof is by structural induction on $r$.
|
|
148 |
|
|
149 |
%-----------------------------------
|
|
150 |
% SUBSECTION
|
|
151 |
%-----------------------------------
|
|
152 |
\subsection{Syntactic Equivalence Under $\simp$}
|
|
153 |
We prove that minor differences can be annhilated
|
|
154 |
by $\simp$.
|
|
155 |
For example,
|
|
156 |
\begin{center}
|
|
157 |
$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) =
|
|
158 |
\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$
|
|
159 |
\end{center}
|
|
160 |
|
|
161 |
|
|
162 |
|
|
163 |
|
|
164 |
|
|
165 |
|
|
166 |
|
|
167 |
%----------------------------------------------------------------------------------------
|
|
168 |
% SECTION corretness proof
|
|
169 |
%----------------------------------------------------------------------------------------
|
|
170 |
\section{Proof Technique of Correctness of Bit-coded Algorithm with Simplification}
|
|
171 |
The non-trivial part of proving the correctness of the algorithm with simplification
|
|
172 |
compared with not having simplification is that we can no longer use the argument
|
|
173 |
in \cref{flex_retrieve}.
|
|
174 |
The function \retrieve needs the structure of the annotated regular expression to
|
|
175 |
agree with the structure of the value, but simplification will always mess with the
|
|
176 |
structure:
|
|
177 |
%TODO: after simp does not agree with each other: (a + 0) --> a v.s. Left(Char(a)) |