532
|
1 |
% Chapter Template
|
|
2 |
|
|
3 |
% Main chapter title
|
|
4 |
\chapter{Correctness of Bit-coded Algorithm with Simplification}
|
|
5 |
|
|
6 |
\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
|
|
7 |
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
|
|
8 |
%simplifications and therefore introduce our version of the bitcoded algorithm and
|
|
9 |
%its correctness proof in
|
|
10 |
%Chapter 3\ref{Chapter3}.
|
|
11 |
|
|
12 |
|
|
13 |
|
583
|
14 |
In this chapter we introduce the simplifications
|
|
15 |
on annotated regular expressions that can be applied to
|
|
16 |
each intermediate derivative result. This allows
|
|
17 |
us to make $\blexer$ much more efficient.
|
|
18 |
We contrast this simplification function
|
|
19 |
with Sulzmann and Lu's original
|
|
20 |
simplifications, indicating the simplicity of our algorithm and
|
|
21 |
improvements we made, demostrating
|
|
22 |
the usefulness and reliability of formal proofs on algorithms.
|
|
23 |
These ``aggressive'' simplifications would not be possible in the injection-based
|
|
24 |
lexing we introduced in chapter \ref{Inj}.
|
|
25 |
We then go on to prove the correctness with the improved version of
|
|
26 |
$\blexer$, called $\blexersimp$, by establishing
|
|
27 |
$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
|
538
|
28 |
|
543
|
29 |
\section{Simplification for Annotated Regular Expressions}
|
|
30 |
The first thing we notice in the fast growth of examples such as $(a^*a^*)^*$'s
|
|
31 |
and $(a^* + (aa)^*)^*$'s derivatives is that a lot of duplicated sub-patterns
|
579
|
32 |
are scattered around different levels, and therefore requires
|
|
33 |
de-duplication at different levels:
|
543
|
34 |
\begin{center}
|
579
|
35 |
$(a^*a^*)^* \rightarrow (a^*a^* + a^*)\cdot(a^*a^*)^* \rightarrow $\\
|
583
|
36 |
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \rightarrow \ldots$
|
543
|
37 |
\end{center}
|
|
38 |
\noindent
|
579
|
39 |
As we have already mentioned in \ref{eqn:growth2},
|
|
40 |
a simple-minded simplification function cannot simplify
|
583
|
41 |
\begin{center}
|
579
|
42 |
$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
|
583
|
43 |
\end{center}
|
|
44 |
any further, one would expect a better simplification function to work in the
|
579
|
45 |
following way:
|
|
46 |
\begin{gather*}
|
|
47 |
((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* +
|
|
48 |
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
|
|
49 |
\bigg\downarrow \\
|
|
50 |
(a^*a^* + a^*
|
|
51 |
\color{gray} + a^* \color{black})\cdot(a^*a^*)^* +
|
|
52 |
\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
|
|
53 |
\bigg\downarrow \\
|
|
54 |
(a^*a^* + a^*
|
|
55 |
)\cdot(a^*a^*)^*
|
583
|
56 |
\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
|
|
57 |
\bigg\downarrow \\
|
|
58 |
(a^*a^* + a^*
|
|
59 |
)\cdot(a^*a^*)^*
|
579
|
60 |
\end{gather*}
|
|
61 |
\noindent
|
|
62 |
This motivating example came from testing Sulzmann and Lu's
|
|
63 |
algorithm: their simplification does
|
|
64 |
not work!
|
|
65 |
We quote their $\textit{simp}$ function verbatim here:
|
|
66 |
\begin{center}
|
|
67 |
\begin{tabular}{lcl}
|
|
68 |
$\simp\; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ &
|
|
69 |
$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
|
|
70 |
& &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
|
|
71 |
$\simp\;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if}
|
|
72 |
\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
|
|
73 |
\textit{then} \;\; \ZERO$\\
|
|
74 |
& & $\textit{else}\;\;_{bs}((\simp\;r_1)\cdot
|
|
75 |
(\simp\; r_2))$\\
|
|
76 |
$\simp \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
|
|
77 |
$\simp \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
|
|
78 |
$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
|
|
79 |
$\simp \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simp \; r)$\\
|
|
80 |
$\simp \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum
|
|
81 |
(\nub \; (\filter \; (\not \circ \zeroable)\;((\simp \; r) :: \map \; \simp \; rs)))$\\
|
|
82 |
|
|
83 |
\end{tabular}
|
|
84 |
\end{center}
|
|
85 |
\noindent
|
|
86 |
the $\textit{zeroable}$ predicate
|
|
87 |
which tests whether the regular expression
|
|
88 |
is equivalent to $\ZERO$,
|
|
89 |
is defined as:
|
|
90 |
\begin{center}
|
|
91 |
\begin{tabular}{lcl}
|
|
92 |
$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
|
|
93 |
\zeroable \;_{[]}\sum\;rs $\\
|
|
94 |
$\zeroable\;_{bs}(r_1 \cdot r_2)$ & $\dn$ & $\zeroable\; r_1 \;\; \lor \;\; \zeroable \; r_2$\\
|
|
95 |
$\zeroable\;_{bs}r^*$ & $\dn$ & $\textit{false}$ \\
|
|
96 |
$\zeroable\;_{bs}c$ & $\dn$ & $\textit{false}$\\
|
|
97 |
$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
|
|
98 |
$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
|
|
99 |
\end{tabular}
|
|
100 |
\end{center}
|
|
101 |
\noindent
|
584
|
102 |
They suggested that the $\simp$ function should be
|
|
103 |
applied repeatedly until a fixpoint is reached.
|
|
104 |
We implemented the algorithm as required in Scala,
|
|
105 |
and found that final derivative regular expression
|
|
106 |
size grows exponentially fast:
|
|
107 |
\begin{center}
|
|
108 |
\begin{figure}[H]
|
|
109 |
\begin{tikzpicture}
|
|
110 |
\begin{axis}[
|
|
111 |
xlabel={$n$},
|
|
112 |
ylabel={time in secs},
|
|
113 |
ymode = log,
|
|
114 |
legend entries={Sulzmann and Lu's $\simp$},
|
|
115 |
legend pos=north west,
|
|
116 |
legend cell align=left]
|
|
117 |
\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
|
|
118 |
\end{axis}
|
|
119 |
\end{tikzpicture}
|
|
120 |
\caption{Matching the regular expression $(a^*a^*)^*$ against strings of the form
|
|
121 |
$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
|
|
122 |
$ using Sulzmann and Lu's lexer algorithm}\label{SulzmannLuLexer}
|
|
123 |
\end{figure}
|
|
124 |
\end{center}
|
|
125 |
\noindent
|
|
126 |
At $n= 20$ we already get an out of memory error with Scala's normal
|
|
127 |
JVM heap size settings,
|
|
128 |
which seems a counterexample for their linear complexity claim:
|
|
129 |
\begin{quote}\it
|
|
130 |
Linear-Time Complexity Claim It is easy to see that each call of one of the functions/operations\b,simp,fuse,mkEpsBCandisPhi leadstosubcallswhose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. We yet need to work out detailed estimates regarding the space complexity of our algorithm.
|
|
131 |
\end{quote}
|
|
132 |
|
|
133 |
|
|
134 |
|
543
|
135 |
Despite that we have already implemented the simple-minded simplifications
|
|
136 |
such as throwing away useless $\ONE$s and $\ZERO$s.
|
|
137 |
The simplification rule $r + r \rightarrow $ cannot make a difference either
|
|
138 |
since it only removes duplicates on the same level, not something like
|
|
139 |
$(r+a)+r \rightarrow r+a$.
|
|
140 |
This requires us to break up nested alternatives into lists, for example
|
|
141 |
using the flatten operation similar to the one defined for any function language's
|
|
142 |
list datatype:
|
538
|
143 |
|
543
|
144 |
\begin{center}
|
|
145 |
\begin{tabular}{@{}lcl@{}}
|
|
146 |
$\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
|
|
147 |
(\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
|
|
148 |
$\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \; \textit{as'} $ \\
|
|
149 |
$\textit{flts} \; a :: as'$ & $\dn$ & $a :: \textit{flts} \; \textit{as'}$ \quad(otherwise)
|
|
150 |
\end{tabular}
|
|
151 |
\end{center}
|
|
152 |
|
|
153 |
\noindent
|
|
154 |
There is a minor difference though, in that our $\flts$ operation defined by us
|
|
155 |
also throws away $\ZERO$s.
|
|
156 |
For a flattened list of regular expressions, a de-duplication can be done efficiently:
|
|
157 |
|
|
158 |
|
|
159 |
\begin{center}
|
|
160 |
\begin{tabular}{@{}lcl@{}}
|
|
161 |
$\distinctBy \; [] \; f\; acc $ & $ =$ & $ []$\\
|
|
162 |
$\distinctBy \; (x :: xs) \; f \; acc$ & $=$ & \\
|
|
163 |
& & $\quad \textit{if} (f \; x) \in acc \textit{then} \distinctBy \; xs \; f \; acc$\\
|
|
164 |
& & $\quad \textit{else} x :: (\distinctBy \; xs \; f \; (\{f \; x\} \cup acc))$
|
|
165 |
\end{tabular}
|
|
166 |
\end{center}
|
|
167 |
\noindent
|
|
168 |
The reason we define a distinct function under a mapping $f$ is because
|
|
169 |
we want to eliminate regular expressions that are the same syntactically,
|
|
170 |
but having different bit-codes (more on the reason why we can do this later).
|
|
171 |
To achieve this, we call $\erase$ as the function $f$ during the distinction
|
|
172 |
operation.
|
|
173 |
A recursive definition of our simplification function
|
538
|
174 |
that looks somewhat similar to our Scala code is given below:
|
|
175 |
|
|
176 |
\begin{center}
|
|
177 |
\begin{tabular}{@{}lcl@{}}
|
|
178 |
|
543
|
179 |
$\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp} \; a_2) $ \\
|
|
180 |
$\textit{bsimp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{bsimp}_{ALTS} \; \textit{bs} \; (\textit{distinctBy} \; ( \textit{flatten} ( \textit{map} \; bsimp \; as)) \; \erase \; \phi) $ \\
|
|
181 |
$\textit{bsimp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$
|
538
|
182 |
\end{tabular}
|
|
183 |
\end{center}
|
|
184 |
|
|
185 |
\noindent
|
543
|
186 |
The simplification (named $\bsimp$ for \emph{b}it-coded) does a pattern matching on the regular expression.
|
538
|
187 |
When it detected that the regular expression is an alternative or
|
543
|
188 |
sequence, it will try to simplify its children regular expressions
|
538
|
189 |
recursively and then see if one of the children turns into $\ZERO$ or
|
|
190 |
$\ONE$, which might trigger further simplification at the current level.
|
543
|
191 |
Current level simplifications are handled by the function $\textit{bsimp}_{ASEQ}$,
|
|
192 |
using rules such as $\ZERO \cdot r \rightarrow \ZERO$ and $\ONE \cdot r \rightarrow r$.
|
|
193 |
\begin{center}
|
|
194 |
\begin{tabular}{@{}lcl@{}}
|
|
195 |
$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
|
|
196 |
&&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\
|
|
197 |
&&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\
|
|
198 |
&&$\quad\textit{case} \; (_{bs1}\ONE, a_2') \Rightarrow \textit{fuse} \; (bs@bs_1) \; a_2'$ \\
|
|
199 |
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow _{bs}a_1' \cdot a_2'$
|
|
200 |
\end{tabular}
|
|
201 |
\end{center}
|
538
|
202 |
\noindent
|
543
|
203 |
The most involved part is the $\sum$ clause, where we first call $\flts$ on
|
|
204 |
the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
|
|
205 |
and then call $\distinctBy$ on that list, the predicate determining whether two
|
|
206 |
elements are the same is $\erase \; r_1 = \erase\; r_2$.
|
|
207 |
Finally, depending on whether the regular expression list $as'$ has turned into a
|
|
208 |
singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
|
|
209 |
decides whether to keep the current level constructor $\sum$ as it is, and
|
|
210 |
removes it when there are less than two elements:
|
|
211 |
\begin{center}
|
|
212 |
\begin{tabular}{lcl}
|
|
213 |
$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
|
|
214 |
&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
|
|
215 |
&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
|
|
216 |
&&$\quad\textit{case} \; as' \Rightarrow _{bs}\sum \textit{as'}$\\
|
|
217 |
\end{tabular}
|
|
218 |
|
|
219 |
\end{center}
|
|
220 |
Having defined the $\bsimp$ function,
|
|
221 |
we add it as a phase after a derivative is taken,
|
|
222 |
so it stays small:
|
|
223 |
\begin{center}
|
|
224 |
\begin{tabular}{lcl}
|
|
225 |
$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
|
|
226 |
\end{tabular}
|
|
227 |
\end{center}
|
|
228 |
Following previous notation of natural
|
538
|
229 |
extension from derivative w.r.t.~character to derivative
|
543
|
230 |
w.r.t.~string, we define the derivative that nests simplifications with derivatives:%\comment{simp in the [] case?}
|
538
|
231 |
|
|
232 |
\begin{center}
|
|
233 |
\begin{tabular}{lcl}
|
543
|
234 |
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
|
|
235 |
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
|
538
|
236 |
\end{tabular}
|
|
237 |
\end{center}
|
|
238 |
|
|
239 |
\noindent
|
543
|
240 |
Extracting bit-codes from the derivatives that had been simplified repeatedly after
|
|
241 |
each derivative run, the simplified $\blexer$, called $\blexersimp$,
|
|
242 |
is defined as
|
538
|
243 |
\begin{center}
|
|
244 |
\begin{tabular}{lcl}
|
|
245 |
$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
|
|
246 |
$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
|
|
247 |
& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
|
|
248 |
& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
|
|
249 |
& & $\;\;\textit{else}\;\textit{None}$
|
|
250 |
\end{tabular}
|
|
251 |
\end{center}
|
|
252 |
|
|
253 |
\noindent
|
|
254 |
This algorithm keeps the regular expression size small, for example,
|
543
|
255 |
with this simplification our previous $(a + aa)^*$ example's fast growth (over
|
|
256 |
$10^5$ nodes at around $20$ input length)
|
538
|
257 |
will be reduced to just 17 and stays constant, no matter how long the
|
|
258 |
input string is.
|
543
|
259 |
We show some graphs to better demonstrate this imrpovement.
|
538
|
260 |
|
|
261 |
|
543
|
262 |
\section{$(a+aa)^*$ and $(a^*\cdot a^*)^*$ against $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}$ After Simplification}
|
|
263 |
For $(a+aa)^*$, it used to grow to over $9000$ nodes with simple-minded simplification
|
|
264 |
at only around $15$ input characters:
|
|
265 |
\begin{center}
|
539
|
266 |
\begin{tabular}{ll}
|
|
267 |
\begin{tikzpicture}
|
|
268 |
\begin{axis}[
|
|
269 |
xlabel={$n$},
|
|
270 |
ylabel={derivative size},
|
|
271 |
width=7cm,
|
|
272 |
height=4cm,
|
543
|
273 |
legend entries={Lexer with $\textit{bsimp}$},
|
|
274 |
legend pos= south east,
|
|
275 |
legend cell align=left]
|
|
276 |
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_bsimp.data};
|
|
277 |
\end{axis}
|
|
278 |
\end{tikzpicture} %\label{fig:BitcodedLexer}
|
|
279 |
&
|
|
280 |
\begin{tikzpicture}
|
|
281 |
\begin{axis}[
|
|
282 |
xlabel={$n$},
|
|
283 |
ylabel={derivative size},
|
|
284 |
width = 7cm,
|
|
285 |
height = 4cm,
|
|
286 |
legend entries={Lexer without $\textit{bsimp}$},
|
|
287 |
legend pos= north west,
|
|
288 |
legend cell align=left]
|
|
289 |
\addplot[red,mark=*, mark options={fill=white}] table {a_aa_star_easysimp.data};
|
|
290 |
\end{axis}
|
|
291 |
\end{tikzpicture}
|
|
292 |
\end{tabular}
|
|
293 |
\end{center}
|
|
294 |
And for $(a^*\cdot a^*)^*$, unlike in \ref{fig:BetterWaterloo}, the size with simplification now stay nicely constant with current
|
|
295 |
simplification rules (we put the graphs together to show the contrast)
|
|
296 |
\begin{center}
|
|
297 |
\begin{tabular}{ll}
|
|
298 |
\begin{tikzpicture}
|
|
299 |
\begin{axis}[
|
|
300 |
xlabel={$n$},
|
|
301 |
ylabel={derivative size},
|
|
302 |
width=7cm,
|
|
303 |
height=4cm,
|
|
304 |
legend entries={Lexer with $\textit{bsimp}$},
|
539
|
305 |
legend pos= south east,
|
|
306 |
legend cell align=left]
|
|
307 |
\addplot[red,mark=*, mark options={fill=white}] table {BitcodedLexer.data};
|
|
308 |
\end{axis}
|
|
309 |
\end{tikzpicture} %\label{fig:BitcodedLexer}
|
|
310 |
&
|
|
311 |
\begin{tikzpicture}
|
|
312 |
\begin{axis}[
|
|
313 |
xlabel={$n$},
|
|
314 |
ylabel={derivative size},
|
|
315 |
width = 7cm,
|
|
316 |
height = 4cm,
|
543
|
317 |
legend entries={Lexer without $\textit{bsimp}$},
|
539
|
318 |
legend pos= north west,
|
|
319 |
legend cell align=left]
|
|
320 |
\addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
|
|
321 |
\end{axis}
|
|
322 |
\end{tikzpicture}
|
|
323 |
\end{tabular}
|
543
|
324 |
\end{center}
|
538
|
325 |
|
543
|
326 |
%----------------------------------------------------------------------------------------
|
|
327 |
% SECTION rewrite relation
|
|
328 |
%----------------------------------------------------------------------------------------
|
|
329 |
\section{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
|
576
|
330 |
In the $\blexer$'s correctness proof, we
|
|
331 |
did not directly derive the fact that $\blexer$ gives out the POSIX value,
|
|
332 |
but first prove that $\blexer$ is linked with $\lexer$.
|
|
333 |
Then we re-use
|
|
334 |
the correctness of $\lexer$.
|
|
335 |
Here we follow suit by first proving that
|
|
336 |
$\blexersimp \; r \; s $
|
|
337 |
produces the same output as $\blexer \; r\; s$,
|
|
338 |
and then putting it together with
|
|
339 |
$\blexer$'s correctness result
|
|
340 |
($(r, s) \rightarrow v \implies \blexer \; r \;s = v$)
|
|
341 |
to obtain half of the correctness property (the other half
|
|
342 |
being when $s$ is not $L \; r$ which is routine to establish)
|
|
343 |
\begin{center}
|
|
344 |
$(r, s) \rightarrow v \implies \blexersimp \; r \; s = v$
|
|
345 |
\end{center}
|
|
346 |
\noindent
|
|
347 |
because it makes the proof simpler
|
|
348 |
The overall idea for the correctness of our simplified bitcoded lexer
|
|
349 |
\noindent
|
543
|
350 |
is that the $\textit{bsimp}$ will not change the regular expressions so much that
|
|
351 |
it becomes impossible to extract the $\POSIX$ values.
|
|
352 |
To capture this "similarity" between unsimplified regular expressions and simplified ones,
|
|
353 |
we devise the rewriting relation $\rrewrite$, written infix as $\rightsquigarrow$:
|
538
|
354 |
|
543
|
355 |
\begin{center}
|
|
356 |
\begin{mathpar}
|
|
357 |
\inferrule{}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
|
|
358 |
|
|
359 |
\inferrule{}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
|
|
360 |
|
|
361 |
\inferrule{}{_{bs1} ((_{bs2} \ONE) \cdot r) \rightsquigarrow \fuse \; (bs_1 @ bs_2) \; r\\}\\
|
|
362 |
|
|
363 |
|
|
364 |
|
|
365 |
\inferrule{\\ r_1 \rightsquigarrow r_2}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_2 \cdot r_3\\}
|
|
366 |
|
|
367 |
\inferrule{\\ r_3 \rightsquigarrow r_4}{_{bs} r_1 \cdot r_3 \rightsquigarrow _{bs} r_1 \cdot r_4\\}\\
|
|
368 |
|
|
369 |
\inferrule{}{ _{bs}\sum [] \rightsquigarrow \ZERO\\}
|
|
370 |
|
|
371 |
\inferrule{}{ _{bs}\sum [a] \rightsquigarrow \fuse \; bs \; a\\}
|
|
372 |
|
|
373 |
\inferrule{\\ rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{_{bs}\sum rs_1 \rightsquigarrow rs_2\\}
|
|
374 |
|
|
375 |
\inferrule{}{\\ [] \stackrel{s}{\rightsquigarrow} []}
|
|
376 |
|
|
377 |
\inferrule{rs_1 \stackrel{s}{\rightsquigarrow} rs_2}{\\ r :: rs_1 \rightsquigarrow r :: rs_2 \rightsquigarrow}
|
|
378 |
|
|
379 |
\inferrule{r_1 \rightsquigarrow r_2}{ r_1 :: rs \stackrel{s}{\rightsquigarrow} r_2 :: rs\\}
|
|
380 |
|
|
381 |
\inferrule{}{\ZERO :: rs \stackrel{s}{\rightsquigarrow} rs}
|
|
382 |
|
|
383 |
\inferrule{}{_{bs} \sum (rs_1 :: rs_b) \stackrel{s}{\rightsquigarrow} ((\map \; (\fuse \; bs_1) \; rs_1) @ rsb) }
|
|
384 |
|
|
385 |
\inferrule{\\ \rerase{a_1} = \rerase{a_2}}{rs_a @ [a_1] @ rs_b @ [a_2] @ rsc \stackrel{s}{\rightsquigarrow} rs_a @ [a_1] @ rs_b @ rs_c}
|
|
386 |
|
|
387 |
\end{mathpar}
|
|
388 |
\end{center}
|
|
389 |
These "rewrite" steps define the atomic simplifications we could impose on regular expressions
|
|
390 |
under our simplification algorithm.
|
|
391 |
For convenience, we define a relation $\stackrel{s}{\rightsquigarrow}$ for rewriting
|
|
392 |
a list of regular exression to another list.
|
|
393 |
The $\rerase{}$ function is used instead of $_\downarrow$ for the finiteness bound proof of next chapter,
|
|
394 |
which we will discuss later. For the moment the reader can assume they basically do the same thing as $\erase$.
|
|
395 |
|
|
396 |
Usually more than one steps are taking place during the simplification of a regular expression,
|
|
397 |
therefore we define the reflexive transitive closure of the $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
|
|
398 |
relations:
|
|
399 |
|
|
400 |
\begin{center}
|
|
401 |
\begin{mathpar}
|
|
402 |
\inferrule{}{ r \stackrel{*}{\rightsquigarrow} r \\}
|
|
403 |
\inferrule{}{\\ rs \stackrel{s*}{\rightsquigarrow} rs \\}
|
|
404 |
\inferrule{\\r_1 \stackrel{*}{\rightsquigarrow} r_2 \land \; r_2 \stackrel{*}{\rightsquigarrow} r_3}{r_1 \stackrel{*}{\rightsquigarrow} r_3\\}
|
|
405 |
\inferrule{\\rs_1 \stackrel{*}{\rightsquigarrow} rs_2 \land \; rs_2 \stackrel{*}{\rightsquigarrow} rs_3}{rs_1 \stackrel{*}{\rightsquigarrow} rs_3}
|
|
406 |
\end{mathpar}
|
|
407 |
\end{center}
|
|
408 |
Now that we have modelled the rewriting behaviour of our simplifier $\bsimp$, we prove mainly
|
|
409 |
three properties about how these relations connect to $\blexersimp$:
|
|
410 |
\begin{itemize}
|
|
411 |
\item
|
|
412 |
$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
|
|
413 |
The algorithm $\bsimp$ only transforms the regex $r$ using steps specified by
|
|
414 |
$\stackrel{*}{\rightsquigarrow}$ and nothing else.
|
|
415 |
\item
|
|
416 |
$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow r'\backslash c$.
|
|
417 |
The relation $\stackrel{*}{rightsquigarrow}$ is preserved under derivative.
|
|
418 |
\item
|
|
419 |
$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$. If we reach another
|
|
420 |
expression in finitely many atomic simplification steps, then these two regular expressions
|
|
421 |
will produce the same bit-codes under the bit collection function $\bmkeps$ used by our $\blexer$.
|
|
422 |
|
|
423 |
\end{itemize}
|
|
424 |
\section{Three Important properties}
|
|
425 |
These properties would work together towards the correctness theorem.
|
|
426 |
We start proving each of these lemmas below.
|
|
427 |
\subsection{$(r \stackrel{*}{\rightsquigarrow} r'\land \bnullable \; r_1) \implies \bmkeps \; r = \bmkeps \; r'$ and $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
|
|
428 |
The first few properties we establish is that the inference rules we gave for $\rightsquigarrow$
|
|
429 |
and $\stackrel{s}{\rightsquigarrow}$ also hold as implications for $\stackrel{*}{\rightsquigarrow}$ and
|
|
430 |
$\stackrel{s*}{\rightsquigarrow}$.
|
|
431 |
\begin{lemma}
|
|
432 |
$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
|
|
433 |
\end{lemma}
|
|
434 |
\begin{proof}
|
|
435 |
By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
|
|
436 |
\end{proof}
|
|
437 |
\begin{lemma}
|
|
438 |
$r \stackrel{*}{\rightsquigarrow} r' \implies _{bs} \sum r :: rs \stackrel{*}{\rightsquigarrow} _{bs} \sum r' :: rs$
|
|
439 |
\end{lemma}
|
|
440 |
\begin{proof}
|
|
441 |
By rule induction of $\stackrel{*}{\rightsquigarrow} $.
|
|
442 |
\end{proof}
|
|
443 |
\noindent
|
|
444 |
Then we establish that the $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ relation is preserved w.r.t appending and prepending of a list:
|
|
445 |
\begin{lemma}
|
|
446 |
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s}{\rightsquigarrow} rs @ rs_2$
|
|
447 |
\end{lemma}
|
|
448 |
\begin{proof}
|
|
449 |
By induction on the list $rs$.
|
|
450 |
\end{proof}
|
|
451 |
|
|
452 |
\begin{lemma}
|
|
453 |
$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs @ rs_1 \stackrel{s*}{\rightsquigarrow} rs @ rs_2$
|
|
454 |
\end{lemma}
|
|
455 |
\begin{proof}
|
|
456 |
By rule induction of $\stackrel{s*}{\rightsquigarrow}$.
|
|
457 |
\end{proof}
|
|
458 |
|
|
459 |
\noindent
|
|
460 |
The $\stackrel{s}{\rightsquigarrow} $ relation after appending a list becomes $\stackrel{s*}{\rightsquigarrow}$:
|
|
461 |
\begin{lemma}\label{ssgqTossgs}
|
|
462 |
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
|
|
463 |
\end{lemma}
|
|
464 |
\begin{proof}
|
|
465 |
By rule induction of $\stackrel{s}{\rightsquigarrow}$.
|
|
466 |
\end{proof}
|
|
467 |
\begin{lemma}
|
|
468 |
$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies rs_1 @ rs \stackrel{s*}{\rightsquigarrow} rs_2 @ rs$
|
|
469 |
\end{lemma}
|
|
470 |
\begin{proof}
|
|
471 |
By rule induction of $\stackrel{s*}{\rightsquigarrow}$ and using \ref{ssgqTossgs}.
|
|
472 |
\end{proof}
|
|
473 |
Here are two lemmas relating $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$:
|
|
474 |
\begin{lemma}\label{singleton}
|
|
475 |
$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies [r_1] \stackrel{s*}{\rightsquigarrow} [r_2]$
|
|
476 |
\end{lemma}
|
|
477 |
\begin{proof}
|
|
478 |
By rule induction of $ \stackrel{*}{\rightsquigarrow} $.
|
|
479 |
\end{proof}
|
|
480 |
\begin{lemma}
|
|
481 |
$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies
|
|
482 |
r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
|
|
483 |
\end{lemma}
|
|
484 |
\begin{proof}
|
|
485 |
By using \ref{singleton}.
|
|
486 |
\end{proof}
|
|
487 |
Now we get to the "meaty" part of the proof, which relates the relations $\stackrel{s*}{\rightsquigarrow}$ and
|
|
488 |
$\stackrel{*}{\rightsquigarrow} $ with our simplification components such $\distinctBy$ and $\flts$.
|
|
489 |
The first lemma below says that for a list made of two parts $rs_1 @ rs_2$, one can throw away the duplicate
|
|
490 |
elements in $rs_2$, as well as those that have appeared in $rs_1$:
|
|
491 |
\begin{lemma}
|
|
492 |
$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} (rs_1 @ (\distinctBy \; rs_2 \; \; \rerase{\_}\; \; (\map\;\; \rerase{\_}\; \; rs_1)))$
|
|
493 |
\end{lemma}
|
|
494 |
\begin{proof}
|
|
495 |
By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
|
|
496 |
\end{proof}
|
|
497 |
The above h as the corollary that is suitable for the actual way $\distinctBy$ is called in $\bsimp$:
|
|
498 |
\begin{lemma}\label{dBPreserves}
|
|
499 |
$rs_ 1 \rightarrow \distinctBy \; rs_1 \; \phi$
|
|
500 |
\end{lemma}
|
538
|
501 |
|
|
502 |
|
532
|
503 |
|
543
|
504 |
The flatten function $\flts$ works within the $\rightsquigarrow$ relation:
|
|
505 |
\begin{lemma}\label{fltsPreserves}
|
|
506 |
$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
|
|
507 |
\end{lemma}
|
|
508 |
|
|
509 |
The rewriting in many steps property is composible in terms of regular expression constructors:
|
|
510 |
\begin{lemma}
|
|
511 |
$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} \; _{bs} r_2 \cdot r_3 \quad $ and
|
|
512 |
$r_3 \stackrel{*}{\rightsquigarrow} r_4 \implies _{bs} r_1 \cdot r_3 \stackrel{*}{\rightsquigarrow} _{bs} \; r_1 \cdot r_4$
|
|
513 |
\end{lemma}
|
532
|
514 |
|
543
|
515 |
The rewriting in many steps properties $\stackrel{*}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ is preserved under the function $\fuse$:
|
|
516 |
\begin{lemma}
|
|
517 |
$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \fuse \; bs \; r_1 \stackrel{*}{\rightsquigarrow} \; \fuse \; bs \; r_2 \quad $ and
|
|
518 |
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\fuse \; bs) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs) \; rs_2$
|
|
519 |
\end{lemma}
|
|
520 |
\begin{proof}
|
|
521 |
By the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
|
|
522 |
$rs_2 \stackrel{s}{\rightsquigarrow} rs_3 \implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$.
|
|
523 |
\end{proof}
|
|
524 |
\noindent
|
|
525 |
If we could rewrite a regular expression in many steps to $\ZERO$, then
|
|
526 |
we could also rewrite any sequence containing it to $\ZERO$:
|
|
527 |
\begin{lemma}
|
|
528 |
$r_1 \stackrel{*}{\rightsquigarrow} \ZERO \implies _{bs}r_1\cdot r_2 \stackrel{*}{\rightsquigarrow} \ZERO$
|
|
529 |
\end{lemma}
|
|
530 |
\begin{lemma}
|
|
531 |
$\bmkeps \; (r \backslash s) = \bmkeps \; \bderssimp{r}{s}$
|
|
532 |
\end{lemma}
|
|
533 |
\noindent
|
|
534 |
The function $\bsimpalts$ preserves rewritability:
|
|
535 |
\begin{lemma}\label{bsimpaltsPreserves}
|
|
536 |
$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
|
|
537 |
\end{lemma}
|
|
538 |
\noindent
|
|
539 |
Before we give out the next lemmas, we define a predicate for a list of regular expressions
|
|
540 |
having at least one nullable regular expressions:
|
|
541 |
\begin{definition}
|
|
542 |
$\textit{bnullables} \; rs \dn \exists r \in rs. \bnullable r$
|
|
543 |
\end{definition}
|
|
544 |
The rewriting relation $\rightsquigarrow$ preserves nullability:
|
|
545 |
\begin{lemma}
|
|
546 |
$r_1 \rightsquigarrow r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$ and
|
|
547 |
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \textit{bnullables} \; rs_1 = \textit{bnullables} \; rs_2$
|
|
548 |
\end{lemma}
|
|
549 |
\begin{proof}
|
|
550 |
By rule induction of $\rightarrow$ and $\stackrel{s}{\rightsquigarrow}$.
|
|
551 |
\end{proof}
|
|
552 |
So does the many steps rewriting:
|
|
553 |
\begin{lemma}\label{rewritesBnullable}
|
|
554 |
$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies \bnullable \; r_1 = \bnullable \; r_2$
|
|
555 |
\end{lemma}
|
|
556 |
\begin{proof}
|
|
557 |
By rule induction of $\stackrel{*}{\rightsquigarrow} $.
|
|
558 |
\end{proof}
|
|
559 |
\noindent
|
|
560 |
And if both regular expressions in a rewriting relation are nullable, then they
|
|
561 |
produce the same bit-codes:
|
532
|
562 |
|
543
|
563 |
\begin{lemma}\label{rewriteBmkepsAux}
|
|
564 |
$r_1 \rightsquigarrow r_2 \implies (\bnullable \; r_1 \land \bnullable \; r_2 \implies \bmkeps \; r_1 =
|
|
565 |
\bmkeps \; r_2)$ and
|
|
566 |
$rs_ 1 \stackrel{s}{\rightsquigarrow} rs_2 \implies (\bnullables \; rs_1 \land \bnullables \; rs_2 \implies
|
|
567 |
\bmkepss \; rs_1 = \bmkepss \; rs2)$
|
|
568 |
\end{lemma}
|
|
569 |
\noindent
|
|
570 |
The definition of $\bmkepss$ on list $rs$ is just to extract the bit-codes on the first element in $rs$ that
|
|
571 |
is $bnullable$:
|
532
|
572 |
\begin{center}
|
543
|
573 |
\begin{tabular}{lcl}
|
|
574 |
$\bmkepss \; [] $ & $\dn$ & $[]$\\
|
|
575 |
$\bmkepss \; r :: rs$ & $\dn$ & $\textit{if} \; \bnullable \; r then \; (\bmkeps \; r) \; \textit{else} \; \bmkepss \; rs$
|
|
576 |
\end{tabular}
|
532
|
577 |
\end{center}
|
543
|
578 |
\noindent
|
|
579 |
And now we are ready to prove the key property that if you
|
|
580 |
have two regular expressions, one rewritable in many steps to the other,
|
|
581 |
and one of them is $\bnullable$, then they will both yield the same bits under $\bmkeps$:
|
|
582 |
\begin{lemma}
|
576
|
583 |
$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r_1 \;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
|
543
|
584 |
\end{lemma}
|
|
585 |
\begin{proof}
|
|
586 |
By rule induction of $\stackrel{*}{\rightsquigarrow} $, using \ref{rewriteBmkepsAux} and $\ref{rewritesBnullable}$.
|
|
587 |
\end{proof}
|
|
588 |
\noindent
|
|
589 |
the other property is also ready:
|
|
590 |
\begin{lemma}
|
|
591 |
$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
|
|
592 |
\end{lemma}
|
|
593 |
\begin{proof}
|
|
594 |
By an induction on $r$.
|
|
595 |
The most difficult case would be the alternative case, where we using properties such as \ref{bsimpaltsPreserves} and \ref{fltsPreserves} and \ref{dBPreserves}, we could continuously rewrite a list like:\\
|
|
596 |
$rs \stackrel{s*}{\rightsquigarrow} \map \; \textit{bsimp} \; rs$\\
|
|
597 |
$\ldots \stackrel{s*}{\rightsquigarrow} \flts \; (\map \; \textit{bsimp} \; rs)$\\
|
|
598 |
$\ldots \;\stackrel{s*}{\rightsquigarrow} \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi$\\
|
|
599 |
Then we could do the following regular expresssion many steps rewrite:\\
|
|
600 |
$ _{bs} \sum \distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi \stackrel{*}{\rightsquigarrow} \bsimpalts \; bs \; (\distinctBy \; (\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerase \; \phi)$
|
|
601 |
\\
|
|
602 |
|
|
603 |
\end{proof}
|
|
604 |
\section{Proof for the Property: $r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
|
|
605 |
The first thing we prove is that if we could rewrite in one step, then after derivative
|
|
606 |
we could rewrite in many steps:
|
|
607 |
\begin{lemma}\label{rewriteBder}
|
|
608 |
$r_1 \rightsquigarrow r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$ and
|
|
609 |
$rs_1 \stackrel{s}{\rightsquigarrow} rs_2 \implies \map \; (\_\backslash c) \; rs_1 \stackrel{s*}{\rightsquigarrow} \map \; (\_ \backslash c) \; rs_2$
|
|
610 |
\end{lemma}
|
|
611 |
\begin{proof}
|
|
612 |
By induction on $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
|
|
613 |
\end{proof}
|
|
614 |
Now we can prove that once we could rewrite from one expression to another in many steps,
|
|
615 |
then after a derivative on both sides we could still rewrite one to another in many steps:
|
|
616 |
\begin{lemma}
|
|
617 |
$r_1 \stackrel{*}{\rightsquigarrow} r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$
|
|
618 |
\end{lemma}
|
|
619 |
\begin{proof}
|
|
620 |
By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma :\ref{rewriteBder}.
|
|
621 |
\end{proof}
|
|
622 |
\noindent
|
|
623 |
This can be extended and combined with the previous two important properties
|
|
624 |
so that a regular expression's successivve derivatives can be rewritten in many steps
|
|
625 |
to its simplified counterpart:
|
|
626 |
\begin{lemma}\label{bderBderssimp}
|
|
627 |
$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $
|
|
628 |
\end{lemma}
|
|
629 |
\subsection{Main Theorem}
|
|
630 |
Now with \ref{bdersBderssimp} we are ready for the main theorem.
|
|
631 |
To link $\blexersimp$ and $\blexer$,
|
|
632 |
we first say that they give out the same bits, if the lexing result is a match:
|
|
633 |
\begin{lemma}
|
|
634 |
$\bnullable \; (a \backslash s) \implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
|
|
635 |
\end{lemma}
|
|
636 |
\noindent
|
|
637 |
Now that they give out the same bits, we know that they give the same value after decoding,
|
|
638 |
which we know is correct value as $\blexer$ is correct:
|
|
639 |
\begin{theorem}
|
|
640 |
$\blexer \; r \; s = \blexersimp{r}{s}$
|
|
641 |
\end{theorem}
|
|
642 |
\noindent
|
576
|
643 |
\begin{proof}
|
|
644 |
One can rewrite in many steps from the original lexer's derivative regular expressions to the
|
|
645 |
lexer with simplification applied:
|
|
646 |
$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
|
|
647 |
If two regular expressions are rewritable, then they produce the same bits.
|
|
648 |
Under the condition that $ r_1$ is nullable, we have
|
|
649 |
$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r', \;\;\text{then} \;\; \bmkeps \; r = \bmkeps \; r'$.
|
|
650 |
This proves the \emph{if} (interesting) branch of
|
|
651 |
$\blexer \; r \; s = \blexersimp{r}{s}$.
|
|
652 |
|
|
653 |
\end{proof}
|
|
654 |
\noindent
|
|
655 |
As a corollary,
|
|
656 |
we link this result with the lemma we proved earlier that
|
|
657 |
\begin{center}
|
|
658 |
$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
|
|
659 |
\end{center}
|
|
660 |
and obtain the corollary that the bit-coded lexer with simplification is
|
|
661 |
indeed correctly outputting POSIX lexing result, if such a result exists.
|
|
662 |
\begin{corollary}
|
|
663 |
$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
|
|
664 |
\end{corollary}
|
532
|
665 |
|
543
|
666 |
\subsection{Comments on the Proof Techniques Used}
|
532
|
667 |
The non-trivial part of proving the correctness of the algorithm with simplification
|
|
668 |
compared with not having simplification is that we can no longer use the argument
|
|
669 |
in \cref{flex_retrieve}.
|
543
|
670 |
The function \retrieve needs the cumbersome structure of the (umsimplified)
|
|
671 |
annotated regular expression to
|
532
|
672 |
agree with the structure of the value, but simplification will always mess with the
|
543
|
673 |
structure.
|
|
674 |
|
|
675 |
We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
|
|
676 |
but this turns out to be not true, A counterexample of this being
|
|
677 |
\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
|
|
678 |
\]
|
|
679 |
|
|
680 |
Then we would have $\bsimp{a \backslash s}$ being
|
|
681 |
$_{[]}(_{ZZ}\ONE + _{ZS}c ) $
|
|
682 |
whereas $\bsimp{\bderssimp{a}{s}}$ would be
|
|
683 |
$_{Z}(_{Z} \ONE + _{S} c)$.
|
|
684 |
Unfortunately if we apply $\textit{bsimp}$ at different
|
|
685 |
stages we will always have this discrepancy, due to
|
|
686 |
whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
|
|
687 |
is taken at some points will be entirely dependant on when the simplification
|
|
688 |
take place whether there is a larger alternative structure surrounding the
|
|
689 |
alternative being simplified.
|
|
690 |
The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
|
|
691 |
us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
|
|
692 |
are taken, but simply say that they can be taken to make two similar
|
|
693 |
regular expressions equal, and can be done after interleaving derivatives
|
|
694 |
and simplifications.
|
|
695 |
|
|
696 |
|
|
697 |
Having correctness property is good. But we would also like the lexer to be efficient in
|
|
698 |
some sense, for exampe, not grinding to a halt at certain cases.
|
|
699 |
In the next chapter we shall prove that for a given $r$, the internal derivative size is always
|
|
700 |
finitely bounded by a constant.
|
582
|
701 |
we would expect in the
|