14 is the point from which novel contributions of this PhD project are introduced |
14 is the point from which novel contributions of this PhD project are introduced |
15 in detail, |
15 in detail, |
16 and previous |
16 and previous |
17 chapters are essential background work for setting the scene of the formal proof we |
17 chapters are essential background work for setting the scene of the formal proof we |
18 are about to describe. |
18 are about to describe. |
|
19 The proof details are necessary materials for this thesis |
|
20 because it provides necessary context to explain why we need a |
|
21 new framework for the proof of $\blexersimp$, which involves |
|
22 simplifications that cause structural changes to the regular expression. |
|
23 a new formal proof of the correctness of $\blexersimp$, where the |
|
24 proof of $\blexer$ |
|
25 is not applicatble in the sense that we cannot straightforwardly extend the |
|
26 proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does |
|
27 not hold anymore. |
|
28 %This is because the structural induction on the stepwise correctness |
|
29 %of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described |
|
30 %in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to |
|
31 %each other. |
|
32 %In this chapter we introduce simplifications |
|
33 %for annotated regular expressions that can be applied to |
|
34 %each intermediate derivative result. This allows |
|
35 %us to make $\blexer$ much more efficient. |
|
36 %Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, |
|
37 %but their simplification functions could have been more efficient and in some cases needed fixing. |
|
38 |
|
39 |
19 In particular, the correctness theorem |
40 In particular, the correctness theorem |
20 of the un-optimised bit-coded lexer $\blexer$ in |
41 of the un-optimised bit-coded lexer $\blexer$ in |
21 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
42 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
22 relies on lemma \ref{retrieveStepwise} that says |
43 relies on lemma \ref{retrieveStepwise} that says |
23 any value can be retrieved in a stepwise manner: |
44 any value can be retrieved in a stepwise manner: |
24 \begin{center} |
45 \begin{center} |
25 $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ |
46 $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ |
26 \end{center} |
47 \end{center} |
27 This no longer holds once we introduce simplifications. |
48 This no longer holds once we introduce simplifications. |
28 To control the size of regular expressions during derivatives, |
49 Simplifications are necessary to control the size of regular expressions |
29 one has to eliminate redundant sub-expression with some |
50 during derivatives by eliminating redundant |
30 procedure we call $\textit{simp}$, |
51 sub-expression with some procedure we call $\textit{bsimp}$. |
31 and $\textit{simp}$ is defined as |
52 We want to prove the correctness of $\blexersimp$ which integrates |
32 : |
53 $\textit{bsimp}$ by applying it after each call to the derivative: |
33 |
|
34 |
|
35 |
|
36 Having defined the $\textit{bsimp}$ function, |
|
37 we add it as a phase after a derivative is taken. |
|
38 \begin{center} |
|
39 \begin{tabular}{lcl} |
|
40 $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$ |
|
41 \end{tabular} |
|
42 \end{center} |
|
43 %Following previous notations |
|
44 %when extending from derivatives w.r.t.~character to derivative |
|
45 %w.r.t.~string, we define the derivative that nests simplifications |
|
46 %with derivatives:%\comment{simp in the [] case?} |
|
47 We extend this from characters to strings: |
|
48 \begin{center} |
54 \begin{center} |
49 \begin{tabular}{lcl} |
55 \begin{tabular}{lcl} |
50 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\ |
56 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ |
51 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ |
57 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ |
52 \end{tabular} |
58 \end{tabular} |
53 \end{center} |
|
54 |
|
55 \noindent |
|
56 The lexer that extracts bitcodes from the |
|
57 derivatives with simplifications from our $\simp$ function |
|
58 is called $\blexersimp$: |
|
59 |
|
60 \begin{center} |
|
61 |
|
62 \begin{center} |
|
63 \begin{tabular}{lcl} |
|
64 $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$ |
|
65 \end{tabular} |
|
66 \end{center} |
|
67 \begin{tabular}{lcl} |
59 \begin{tabular}{lcl} |
68 $\textit{blexer\_simp}\;r\,s$ & $\dn$ & |
60 $\textit{blexer\_simp}\;r\,s$ & $\dn$ & |
69 $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ |
61 $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ |
70 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
62 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
71 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
63 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
72 & & $\;\;\textit{else}\;\textit{None}$ |
64 & & $\;\;\textit{else}\;\textit{None}$ |
73 \end{tabular} |
65 \end{tabular} |
74 \end{center} |
66 \end{center} |
75 \noindent |
67 \noindent |
76 the redundant sub-expressions after each derivative operation |
68 Previously without $\textit{bsimp}$ the exact structure of each intermediate |
77 allows the exact structure of each intermediate result to be preserved, |
69 regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and |
78 so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and |
70 $\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if |
79 $\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$ |
71 we use the convenient notation $r_{c} \dn r\backslash c$ |
80 and $v^{c} \dn \inj \;r \; c \; v$). |
72 and $v_{r}^{c} \dn \inj \;r \; c \; v$), |
81 |
73 but $\blexersimp$ introduces simplification after the derivative, |
82 |
74 getting us trouble in aligning the pairs: |
83 Define the |
75 \begin{center} |
84 But $\blexersimp$ introduces simplification after the derivative |
76 $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ |
85 to reduce redundancy, |
77 \end{center} |
86 yielding $r \backslash c$ |
78 \noindent |
87 This allows |
79 It is quite clear that once we made |
88 |
80 $v$ to align with $\textit{bsimp} \; r_{c}$ |
89 The proof details are necessary materials for this thesis |
81 in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged |
90 because it provides necessary context to explain why we need a |
82 in for the above statement to hold. |
91 new framework for the proof of $\blexersimp$, which involves |
83 Ausaf et al. \cite{AusafUrbanDyckhoff2016} |
92 simplifications that cause structural changes to the regular expression. |
84 made some initial attempts on the un-annotated lexer (to be continued) |
93 a new formal proof of the correctness of $\blexersimp$, where the |
85 |
94 proof of $\blexer$ |
|
95 is not applicatble in the sense that we cannot straightforwardly extend the |
|
96 proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does |
|
97 not hold anymore. |
|
98 This is because the structural induction on the stepwise correctness |
|
99 of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described |
|
100 in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to |
|
101 each other. |
|
102 In this chapter we introduce simplifications |
|
103 for annotated regular expressions that can be applied to |
|
104 each intermediate derivative result. This allows |
|
105 us to make $\blexer$ much more efficient. |
|
106 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, |
|
107 but their simplification functions could have been more efficient and in some cases needed fixing. |
|
108 |
86 |
109 |
87 |
110 |
88 |
111 |
89 |
112 From this chapter we start with the main contribution of this thesis, which |
90 From this chapter we start with the main contribution of this thesis, which |