\chapter{A Better Bound and Other Extensions} % Main chapter title

\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.

\lstset{style=myScalastyle}

This chapter is a ``miscellaneous''
chapter which records various
extensions to our formalisation of $\blexersimp$.\\
Firstly, we present further improvements
made to our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same
alternative level,
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
We conjecture that both
\begin{center}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{center}
and
\begin{center}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
hold, but formalising
them is still work in progress.
We give reasons why the correctness and cubic size bound proofs
can be achieved,
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Antimirov's partial derivatives.\\
Secondly, we extend our $\blexersimp$
to support bounded repetitions ($r^{\{n\}}$).
We update our formalisation of
the correctness and finiteness properties to
include this new construct. With bounded repetitions
we are able to out-compete other verified lexers such as
Verbatim++ on regular expressions which involve a lot of
repetitions. We also present the idempotency property proof
of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
This reinforces our claim that the fixpoint construction
originally required by Sulzmann and Lu can be removed in $\blexersimp$.
\\
Last but not least, we present our efforts in further improving
the algorithm with data structures such as zippers, and the
challenges we met along the way.


%----------------------------------------------------------------------------------------
%	SECTION strongsimp
%----------------------------------------------------------------------------------------
\section{A Stronger Version of Simplification}
%TODO: search for isabelle proofs of algorithms that check equivalence
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
which express the two possibilities of how it will match future input.
It will either match 2 $a$'s, then 0 or more $a$'s; in other words, at least 2 more $a$'s
\begin{figure}\label{string_3parts1}
\begin{center}
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
	\node [rectangle split, rectangle split horizontal, rectangle split parts=3]
To implement this idea into an algorithm, we define a ``pruning function''
$\textit{prune}$, that takes away the parts of each term in $r$ that belong to
$\textit{acc}$, which acts as an accumulator.
The resulting function, $\textit{distinctByPlus}$,
is a stronger version of $\textit{distinctBy}$.
Here is a concise version of it (in the style of Scala):
\begin{figure}[H]
\begin{lstlisting}
def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
  List[ARexp] = rs match {
    case Nil => Nil
    case r :: rs =>
      if(acc.contains(erase(r)))
        distinctByPlus(rs, acc)
      else
        // keep the pruned term and remember its erased shape, so
        // that later terms are pruned against it as well
        prune(r, acc) :: distinctByPlus(rs, acc + erase(prune(r, acc)))
}
\end{lstlisting}
\caption{The distinctByPlus function}
\end{figure}
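To give a first intuition for what $\textit{prune}$ does before discussing
its difficulties, here is a deliberately simplified sketch in the same Scala
style. This is our own illustration, not the actual definition: it only prunes
at the top alternative level, and it assumes the constructors $\textit{AALTS}$
and $\textit{AZERO}$ and the helpers $\textit{fuse}$ and $\textit{erase}$ from
the bitcoded lexer implementation.
\begin{lstlisting}
// Simplified sketch of prune (for intuition only): children of an
// annotated alternative whose erased form already appears in the
// accumulator are dropped.
def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match {
  case AALTS(bs, rs) =>
    rs.filter(r1 => !acc.contains(erase(r1))) match {
      case Nil       => AZERO          // every child was a duplicate
      case r1 :: Nil => fuse(bs, r1)   // singleton alternatives collapse
      case rs1       => AALTS(bs, rs1)
    }
  case r => if (acc.contains(erase(r))) AZERO else r
}
\end{lstlisting}
The real $\textit{prune}$ has to descend into sequences as well, rather than
only comparing whole terms for membership in $\textit{acc}$.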
But for the function $\textit{prune}$ there is a difficulty:
how could one define the $L$ function in a ``computable'' way,
so that it generates a (lazily evaluated, possibly infinite) set of strings in $L(r)$?
We would also need a function that tests whether $L(r_1) \subseteq L(r_2)$
is true.
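For experimentation outside the verified development, the language of a
regular expression can at least be enumerated lazily. The sketch below is our
own illustration, assuming the plain $\textit{Rexp}$ datatype of earlier
chapters; it is not a decision procedure for $L(r_1) \subseteq L(r_2)$, only a
way of testing the inclusion on finitely many sample strings.
\begin{lstlisting}
// Enumerate strings of L(r) as a lazy stream (illustration only).
// interleave gives a fair merge so ALT does not starve one branch.
def interleave[A](xs: LazyList[A], ys: LazyList[A]): LazyList[A] =
  if (xs.isEmpty) ys else xs.head #:: interleave(ys, xs.tail)

def lang(r: Rexp): LazyList[String] = r match {
  case ZERO        => LazyList.empty
  case ONE         => LazyList("")
  case CHAR(c)     => LazyList(c.toString)
  case ALT(r1, r2) => interleave(lang(r1), lang(r2))
  // caveat: unfair when lang(r1) is infinite -- acceptable for a sketch
  case SEQ(r1, r2) => lang(r1).flatMap(s1 => lang(r2).map(s1 + _))
  case STAR(r1)    =>
    "" #:: lang(r1).filter(_.nonEmpty).flatMap(s => lang(STAR(r1)).map(s + _))
}

// crude inclusion test on the first n strings of L(r1); a proper
// membership test would use a derivative-based matcher instead
def subsetUpto(r1: Rexp, r2: Rexp, n: Int): Boolean =
  lang(r1).take(n).forall(s => lang(r2).take(10000).contains(s))
\end{lstlisting}
Such an enumeration is of course far too expensive (and only a
semi-decision procedure) to be used inside the simplification itself.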
to a bound better than Antimirov's cubic bound).
Rather, we do not try to eliminate every instance of regular expressions
that have ``language duplicates'', but only eliminate the ``exact duplicates''.
For this we need to break up terms $(a+b)\cdot c$ into two
terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
\begin{figure}[H]
\begin{lstlisting}
def distinctWith(rs: List[ARexp],
                 pruneFunction: (ARexp, Set[Rexp]) => ARexp,
                 acc: Set[Rexp] = Set()) : List[ARexp] =
  rs match {
    case Nil => Nil