\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.
\section{Overview}

This chapter is the point at which the novel contributions of this PhD project
are introduced in detail; the previous chapters provide the essential
background for the formal proof we are about to describe.
In particular, the correctness theorem
of the un-optimised bit-coded lexer $\blexer$ in
chapter \ref{Bitcoded1}, formalised by Ausaf et al.,
relies on lemma \ref{retrieveStepwise}, which says that
any value can be retrieved in a stepwise manner:
|
\begin{center}
$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v = \retrieve \; r \; (\inj \; r\; c\; v)$
\end{center}
|
This no longer holds once we introduce simplifications.
To control the size of regular expressions during the derivative steps,
one has to eliminate redundant sub-expressions with a simplification
procedure, which we call $\textit{bsimp}$.
Given such a $\textit{bsimp}$ function, we add it as a phase after each
derivative step:
|
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
\end{tabular}
\end{center}
|
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications
%with derivatives:%\comment{simp in the [] case?}
We extend this from characters to strings:
|
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
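
\noindent
To illustrate how these two definitions fit together, here is a minimal,
self-contained Scala sketch of derivatives with interleaved simplification.
For brevity it operates on plain (un-annotated) regular expressions rather
than bitcoded ones, and its \texttt{simp} function only performs a few
typical size-reducing rewrites; it is an illustration of the idea, not the
$\textit{bsimp}$ function formalised in this chapter. The function
\texttt{derSimp} plays the role of $\backslash_{bsimp}$ for a single
character, and \texttt{dersSimp} that of $\backslash_{bsimps}$ for strings.
\begin{verbatim}
// Plain regular expressions (no bitcodes), kept deliberately small.
sealed trait Rexp
case object ZERO extends Rexp                      // matches nothing
case object ONE extends Rexp                       // matches the empty string
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO        => false
  case ONE         => true
  case CHAR(_)     => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_)     => true
}

// Brzozowski derivative with respect to a single character.
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO        => ZERO
  case ONE         => ZERO
  case CHAR(d)     => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1)    => SEQ(der(c, r1), STAR(r1))
}

// A few typical size-reducing rewrites: drop ZEROs, drop a leading ONE in
// a sequence, and collapse syntactically identical alternatives.
def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, s2) => s2
    case (s1, ZERO) => s1
    case (s1, s2)   => if (s1 == s2) s1 else ALT(s1, s2)
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _)  => ZERO
    case (_, ZERO)  => ZERO
    case (ONE, s2)  => s2
    case (s1, s2)   => SEQ(s1, s2)
  }
  case other => other
}

// r \_bsimp c : simplify after each single-character derivative ...
def derSimp(r: Rexp, c: Char): Rexp = simp(der(c, r))

// ... and r \_bsimps s : fold that step over the characters of a string.
def dersSimp(r: Rexp, s: List[Char]): Rexp = s match {
  case Nil     => r
  case c :: cs => dersSimp(derSimp(r, c), cs)
}
\end{verbatim}
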
|
\noindent
The lexer that extracts bitcodes from the
derivatives simplified by our $\textit{bsimp}$ function
is called $\blexersimp$:
|
\begin{center}
\begin{tabular}{lcl}
$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
 & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
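
\noindent
The Scala sketch below mirrors only the structure of $\blexersimp$. The
bitcoded primitives ($r^\uparrow$, $\backslash_{bsimp}$, $\textit{bnullable}$,
$\textit{bmkeps}$ and $\textit{decode}$) are deliberately left abstract,
because their concrete definitions belong to chapter \ref{Bitcoded1} and to
the simplification developed in this chapter; the Scala names used for them
here are placeholders of our own choosing.
\begin{verbatim}
// A structural sketch of blexer_simp only: the bitcoded primitives are
// left abstract here, because their concrete definitions are the subject
// of the earlier chapter on bitcoded lexing and of this chapter.
trait BitcodedPrimitives {
  type Rexp; type ARexp; type Bit; type Val
  def internalise(r: Rexp): ARexp          // r with empty bitcodes (r^up)
  def bderSimp(a: ARexp, c: Char): ARexp   // a \_bsimp c  =  bsimp(a \ c)
  def bnullable(a: ARexp): Boolean
  def bmkeps(a: ARexp): List[Bit]
  def decode(bs: List[Bit], r: Rexp): Val
}

trait BlexerSimp extends BitcodedPrimitives {
  // a \_bsimps s : fold the simplifying derivative over the string
  def bdersSimp(a: ARexp, s: List[Char]): ARexp =
    s.foldLeft(a)(bderSimp)

  // blexer_simp r s : derive with simplification, then, if the result is
  // (bit-)nullable, extract the bitcodes and decode them into a value
  def blexerSimp(r: Rexp, s: String): Option[Val] = {
    val a = bdersSimp(internalise(r), s.toList)
    if (bnullable(a)) Some(decode(bmkeps(a), r)) else None
  }
}
\end{verbatim}
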
|
\noindent
In $\blexer$, keeping the redundant sub-expressions after each derivative operation
allows the exact structure of each intermediate result to be preserved,
so that pairs of inhabitation relations of the form $\vdash v : r_{c}$ and
$\vdash v^{c} : r$ hold (if we allow the abbreviations $r_{c} \dn r\backslash c$
and $v^{c} \dn \inj \;r \; c \; v$).
$\blexersimp$, however, applies simplification after each derivative
to reduce this redundancy, yielding $\textit{bsimp}(r \backslash c)$
rather than $r \backslash c$, and this stepwise correspondence is lost.
|
These proof details matter for this thesis because they provide the context
needed to explain why a new framework is required for proving $\blexersimp$
correct: its simplifications cause structural changes to the regular
expressions, so the correctness proof of $\blexer$ is not applicable.
In particular, we cannot straightforwardly extend the
proof of theorem \ref{blexerCorrect}, because lemma \ref{flex_retrieve} no
longer holds.
The structural induction on the stepwise correctness
of $\inj$ breaks down, since the pairs $r_i$ and $v_i$ described
in chapters \ref{Inj} and \ref{Bitcoded1} no longer correspond to
each other.

In this chapter we introduce simplifications
for annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions could have been more efficient and in some cases needed fixing.
|
In particular, the $\blexer$ proof relies on a lockstep POSIX
correspondence between the lexical value and the
regular expression in each derivative and injection step.
The material in chapter \ref{Bitcoded1} is therefore necessary for
understanding why this correspondence, and with it the original proof,
breaks down once simplifications are applied.
%We contrast our simplification function
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness
%and reliability of formal proofs on algorithms.
%These ``aggressive'' simplifications would not be possible in the injection-based
%We then prove the correctness with the improved version of
%$\blexer$, called $\blexersimp$, by establishing
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
|
The algorithms $\lexer$ and $\blexer$ work beautifully as functional
programs, but not as practical code. One main reason for the slowness is
the size of the intermediate representations: the derivative regular expressions
tend to grow without bound if the matching involves a large number of possible matches.
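
To make this growth concrete, the following Scala snippet (which assumes the
\texttt{Rexp}, \texttt{nullable} and \texttt{der} definitions from the sketch
in the Overview section are in scope) measures the size of successive
derivatives of $(a^*a^*)^*$ taken without any simplification; the node counts
grow very rapidly with the number of characters consumed.
\begin{verbatim}
// Reuses Rexp, nullable and der from the sketch in the Overview section.

// Count the nodes of a regular expression to measure its size.
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CHAR(_) => 1
  case ALT(r1, r2)          => 1 + size(r1) + size(r2)
  case SEQ(r1, r2)          => 1 + size(r1) + size(r2)
  case STAR(r1)             => 1 + size(r1)
}

// Plain derivative over a string, with no simplification in between.
def ders(r: Rexp, s: List[Char]): Rexp = s match {
  case Nil     => r
  case c :: cs => ders(der(c, r), cs)
}

@main def derivativeBlowUp(): Unit = {
  val r = STAR(SEQ(STAR(CHAR('a')), STAR(CHAR('a'))))   // (a*a*)*
  for (n <- 0 to 12)
    println(s"n = $n: derivative size = ${size(ders(r, List.fill(n)('a')))}")
}
\end{verbatim}
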
Consider the derivatives of the following example $(a^*a^*)^*$:
%and $(a^* + (aa)^*)^*$:
\begin{center}
\begin{tabular}{lcl}
$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ &