6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX} |
6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX} |
7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive |
7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive |
8 %simplifications and therefore introduce our version of the bitcoded algorithm and |
8 %simplifications and therefore introduce our version of the bitcoded algorithm and |
9 %its correctness proof in |
9 %its correctness proof in |
10 %Chapter 3\ref{Chapter3}. |
10 %Chapter 3\ref{Chapter3}. |
11 \section{Overview} |
11 %\section{Overview} |
12 \marginpar{\em Added a completely new \\overview section, \\highlighting contributions.} |
12 \marginpar{\em Added a completely new \\overview section, \\highlighting\\ contributions.} |
13 |
13 |
14 This chapter |
14 This chapter |
15 is the point from which novel contributions of this PhD project are introduced |
15 is the point from which novel contributions of this PhD project are introduced |
16 in detail. |
16 in detail. |
17 The material in the |
17 The material in the |
18 previous |
18 previous |
19 chapters is necessary for this thesis, |
19 chapters is necessary for this thesis, |
20 because it provides the context for why we need a new framework for |
20 because it provides the context for why we need a new framework for |
21 the proof of $\blexersimp$. |
21 the proof of $\blexersimp$. |
|
22 |
|
23 We will first introduce why aggressive simplifications are needed, after which we |
|
24 provide our algorithm, contrasting with Sulzmann and Lu's simplifications. |
|
25 We then explain how our simplifications make |
|
26 reusing $\blexer$'s correctness proof impossible. |
|
27 %with some minor modifications |
|
28 We discuss possible fixes such as rectification functions and then introduce our proof, |
|
29 which involves a weaker inductive |
|
30 invariant than that used in the correctness proof of $\blexer$. |
|
31 |
|
32 \marginpar{Shortened overview.} |
22 %material for setting the scene of the formal proof we |
33 %material for setting the scene of the formal proof we |
23 %are about to describe. |
34 %are about to describe. |
24 The fundamental reason is we cannot extend the correctness proof of theorem 4 |
|
25 because lemma 13 does not hold anymore when simplifications are involved. |
|
26 \marginpar{\em rephrased things \\so why new \\proof makes sense.} |
|
27 %The proof details are necessary materials for this thesis |
|
28 %because it provides necessary context to explain why we need a |
|
29 %new framework for the proof of $\blexersimp$, which involves |
|
30 %simplifications that cause structural changes to the regular expression. |
|
31 %A new formal proof of the correctness of $\blexersimp$, where the |
|
32 %proof of $\blexer$ |
|
33 %is not applicatble in the sense that we cannot straightforwardly extend the |
|
34 %proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does |
|
35 %not hold anymore. |
|
36 %This is because the structural induction on the stepwise correctness |
|
37 %of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described |
|
38 %in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to |
|
39 %each other. |
|
40 %In this chapter we introduce simplifications |
|
41 %for annotated regular expressions that can be applied to |
|
42 %each intermediate derivative result. This allows |
|
43 %us to make $\blexer$ much more efficient. |
|
44 %Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, |
|
45 %but their simplification functions could have been more efficient and in some cases needed fixing. |
|
46 |
|
47 |
|
48 In particular, the correctness theorem |
|
49 of the un-optimised bit-coded lexer $\blexer$ in |
|
50 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
|
51 relies crucially on lemma \ref{retrieveStepwise} that says |
|
52 any value can be retrieved in a stepwise manner, namely: |
|
53 \begin{equation}\label{eq:stepwise}%eqref: this proposition needs to be referred |
|
54 \vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v) |
|
55 \end{equation} |
|
56 %This no longer holds once we introduce simplifications. |
|
57 Simplifications are necessary to control the size of derivatives, |
|
58 but they also destroy the structures of the regular expressions |
|
59 such that \ref{eq:stepwise} does not hold. |
|
60 |
|
61 |
|
62 We want to prove the correctness of $\blexersimp$ which integrates |
|
63 $\textit{bsimp}$ by applying it after each call to the derivative: |
|
64 \begin{center} |
|
65 \begin{tabular}{lcl} |
|
66 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ |
|
67 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ |
|
68 \end{tabular} |
|
69 \begin{tabular}{lcl} |
|
70 $\textit{blexer\_simp}\;r\,s$ & $\dn$ & |
|
71 $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ |
|
72 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
|
73 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
|
74 & & $\;\;\textit{else}\;\textit{None}$ |
|
75 \end{tabular} |
|
76 \end{center} |
|
77 \noindent |
|
78 Previously without $\textit{bsimp}$ the exact structure of each intermediate |
|
79 regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and |
|
80 $\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if |
|
81 we use the convenient notation $r_{c} \dn r\backslash c$ |
|
82 and $v_{r}^{c} \dn \inj \;r \; c \; v$), |
|
83 but $\blexersimp$ introduces simplification after the derivative, |
|
84 making it difficult to align the pairs: |
|
85 \begin{center} |
|
86 $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; (\textit{bsimp} \; r_c) \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ |
|
87 \end{center} |
|
88 \noindent |
|
89 It is clear that once we made |
|
90 $v$ to align with $\textit{bsimp} \; r_{c}$ |
|
91 in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged |
|
92 in for the above statement to hold. |
|
93 Ausaf et al. \cite{AusafUrbanDyckhoff2016} |
|
94 made some initial attempts with this idea, see \cite{FahadThesis} |
|
95 for details. |
|
96 |
|
97 They added |
|
98 and then rectify it to |
|
99 |
|
100 |
|
101 this works fine, however that limits the kind of simplifications you can introduce. |
|
102 We cannot use their idea for our very strong simplification rules. |
|
103 Therefore we take our route |
|
104 a wea |
|
105 |
|
106 The other route is to dispose of lemma \ref{retrieveStepwise}, |
|
107 and prove a weakened inductive invariant instead. |
|
108 We adopt this approach in this thesis. |
|
109 |
|
110 Let us first explain why the requirement in $\blexer$'s proof |
|
111 is too strong, and suggest a few possible fixes, which leads to |
|
112 our proof which we believe was the most natural and effective method. |
|
113 |
|
114 |
|
115 |
|
116 \section{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong} |
|
117 |
|
118 %From this chapter we start with the main contribution of this thesis, which |
|
119 |
|
120 The $\blexer$ proof relies on a lockstep POSIX |
|
121 correspondence between the lexical value and the |
|
122 regular expression in each derivative and injection. |
|
123 If we zoom into the diagram \ref{graph:inj} and look specifically at |
|
124 the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating |
|
125 the invariant that the same bitcodes can be extracted from the pairs: |
|
126 \tikzset{three sided/.style={ |
|
127 draw=none, |
|
128 append after command={ |
|
129 [-,shorten <= -0.5\pgflinewidth] |
|
130 ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
131 edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
132 ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
133 edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
134 ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
135 edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
136 } |
|
137 } |
|
138 } |
|
139 |
|
140 \tikzset{three sided1/.style={ |
|
141 draw=none, |
|
142 append after command={ |
|
143 [-,shorten <= -0.5\pgflinewidth] |
|
144 ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
145 edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
146 ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
147 edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
148 ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
149 edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
150 } |
|
151 } |
|
152 } |
|
153 |
|
154 \begin{center} |
|
155 \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] |
|
156 \node [rectangle ] (1) at (-7, 2) {$\ldots$}; |
|
157 \node [rectangle, draw] (2) at (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$}; |
|
158 \node [rectangle, draw] (3) at (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_0a+_1aa)^*$}; |
|
159 \node [rectangle] (4) at (9, 2) {$\ldots$}; |
|
160 \node [rectangle] (5) at (-7, -2) {$\ldots$}; |
|
161 \node [rectangle, draw] (6) at (-4, -2) {$v_i = \Stars \; [\Left (a)]$}; |
|
162 \node [rectangle, draw] (7) at ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; |
|
163 \node [rectangle] (8) at ( 9, -2) {$\ldots$}; |
|
164 \node [rectangle] (9) at (-7, -6) {$\ldots$}; |
|
165 \node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = \retrieve \; r_i\;v_i$}; |
|
166 \node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = \retrieve \; r_{i+1}\;v_{i+1}$}; |
|
167 \node [rectangle] (12) at (9, -6) {$\ldots$}; |
|
168 |
|
169 |
|
170 \path (1) edge [] node {} (2); |
|
171 \path (5) edge [] node {} (6); |
|
172 \path (9) edge [] node {} (10); |
|
173 |
|
174 \path (11) edge [] node {} (12); |
|
175 \path (7) edge [] node {} (8); |
|
176 \path (3) edge [] node {} (4); |
|
177 |
|
178 |
|
179 \path (2) edge [] node {$\backslash a$} (3); |
|
180 \path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6); |
|
181 \path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7); |
|
182 %\path (6) edge [] node {$\vdash v_i : r_i$} (10); |
|
183 %\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11); |
|
184 |
|
185 \path (10) edge [dashed, <->] node {$=$} (11); |
|
186 |
|
187 \path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6); |
|
188 |
|
189 % \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; |
|
190 % \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; |
|
191 % \path (r) |
|
192 % edge [] node {$\internalise$} (a); |
|
193 % \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; |
|
194 % \path (a) |
|
195 % edge [] node {$\backslash a$} (a1); |
|
196 % |
|
197 % \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; |
|
198 % \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; |
|
199 % \path (a1) |
|
200 % edge [] node {$\backslash a$} (a21); |
|
201 % \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; |
|
202 % \path (a22) |
|
203 % edge [] node {$\backslash b$} (a3); |
|
204 % \path (a21) |
|
205 % edge [dashed, bend right] node {} (a3); |
|
206 % \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; |
|
207 % \path (a3) |
|
208 % edge [below] node {$\bmkeps$} (bs); |
|
209 % \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; |
|
210 % \path (bs) |
|
211 % edge [] node {$\decode$} (v); |
|
212 |
|
213 |
|
214 \end{tikzpicture} |
|
215 %\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$} |
|
216 \end{center} |
|
217 |
|
218 When simplifications are added, the inhabitation relation no longer holds, |
|
219 causing the above diagram to break. |
|
220 |
|
221 Ausaf addressed this with an augmented lexer he called $\textit{slexer}$. |
|
222 |
|
223 |
|
224 |
|
225 we note that the invariant |
|
226 $\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} $ is too strong |
|
227 to maintain because the precondition $\vdash v_i : r_i$ is too weak. |
|
228 It does not require $v_i$ to be a POSIX value |
|
229 |
|
230 |
|
231 {\color{red} \rule{\linewidth}{0.5mm}} |
|
232 New content ends |
|
233 {\color{red} \rule{\linewidth}{0.5mm}} |
|
234 |
|
235 |
|
236 |
|
237 % |
|
238 % |
|
239 %which is essential for getting an understanding this thesis |
|
240 %in chapter \ref{Bitcoded1}, which is necessary for understanding why |
|
241 %the proof |
|
242 % |
|
243 %In this chapter, |
|
244 |
|
245 %We contrast our simplification function |
|
246 %with Sulzmann and Lu's, indicating the simplicity of our algorithm. |
|
247 %This is another case for the usefulness |
|
248 %and reliability of formal proofs on algorithms. |
|
249 %These ``aggressive'' simplifications would not be possible in the injection-based |
|
250 %lexing we introduced in chapter \ref{Inj}. |
|
251 %We then prove the correctness with the improved version of |
|
252 %$\blexer$, called $\blexersimp$, by establishing |
|
253 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system. |
|
254 % |
|
255 \section{Simplifications by Sulzmann and Lu} |
35 \section{Simplifications by Sulzmann and Lu} |
|
36 \marginpar{moved \\simplification \\section to front \\to make coherent\\ sense.} |
256 The algorithms $\lexer$ and $\blexer$ work beautifully as functional |
37 The algorithms $\lexer$ and $\blexer$ work beautifully as functional |
257 programs, but not as practical code. One main reason for the slowness is due |
38 programs, but not as practical code. One main reason for the slowness is due |
258 to the size of intermediate representations--the derivative regular expressions |
39 to the size of intermediate representations--the derivative regular expressions |
259 tend to grow unbounded if the matching involved a large number of possible matches. |
40 tend to grow unbounded if the matching involved a large number of possible matches. |
260 Consider the derivatives of the following example $(a^*a^*)^*$: |
41 Consider the derivatives of the following example $(a^*a^*)^*$: |
746 Given the size difference, it is not |
532 Given the size difference, it is not |
747 surprising that our $\blexersimp$ significantly outperforms |
533 surprising that our $\blexersimp$ significantly outperforms |
748 $\textit{blexer\_SLSimp}$ by Sulzmann and Lu. |
534 $\textit{blexer\_SLSimp}$ by Sulzmann and Lu. |
749 In the next section we are going to establish that our |
535 In the next section we are going to establish that our |
750 simplification preserves the correctness of the algorithm. |
536 simplification preserves the correctness of the algorithm. |
|
537 |
|
538 |
|
539 |
|
540 |
|
541 \section{Why $\textit{Blexer}$'s Proof Does Not Work} |
|
542 The fundamental reason is we cannot extend the correctness proof of theorem 4 |
|
543 because lemma 13 does not hold anymore when simplifications are involved. |
|
544 \marginpar{\em rephrased things \\so why new \\proof makes sense.} |
|
545 %The proof details are necessary materials for this thesis |
|
546 %because it provides necessary context to explain why we need a |
|
547 %new framework for the proof of $\blexersimp$, which involves |
|
548 %simplifications that cause structural changes to the regular expression. |
|
549 %A new formal proof of the correctness of $\blexersimp$, where the |
|
550 %proof of $\blexer$ |
|
551 %is not applicatble in the sense that we cannot straightforwardly extend the |
|
552 %proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does |
|
553 %not hold anymore. |
|
554 %This is because the structural induction on the stepwise correctness |
|
555 %of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described |
|
556 %in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to |
|
557 %each other. |
|
558 %In this chapter we introduce simplifications |
|
559 %for annotated regular expressions that can be applied to |
|
560 %each intermediate derivative result. This allows |
|
561 %us to make $\blexer$ much more efficient. |
|
562 %Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, |
|
563 %but their simplification functions could have been more efficient and in some cases needed fixing. |
|
564 |
|
565 |
|
566 In particular, the correctness theorem |
|
567 of the un-optimised bit-coded lexer $\blexer$ in |
|
568 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
|
569 relies crucially on lemma \ref{retrieveStepwise} that says |
|
570 any value can be retrieved in a stepwise manner, namely: |
|
571 \begin{equation}\label{eq:stepwise}%eqref: this proposition needs to be referred |
|
572 \vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v) |
|
573 \end{equation} |
|
574 %This no longer holds once we introduce simplifications. |
|
575 Simplifications are necessary to control the size of derivatives, |
|
576 but they also destroy the structures of the regular expressions |
|
577 such that \ref{eq:stepwise} does not hold. |
|
578 |
|
579 |
|
580 We want to prove the correctness of $\blexersimp$ which integrates |
|
581 $\textit{bsimp}$ by applying it after each call to the derivative: |
|
582 \begin{center} |
|
583 \begin{tabular}{lcl} |
|
584 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ |
|
585 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ |
|
586 \end{tabular} |
|
587 \begin{tabular}{lcl} |
|
588 $\textit{blexer\_simp}\;r\,s$ & $\dn$ & |
|
589 $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ |
|
590 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ |
|
591 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ |
|
592 & & $\;\;\textit{else}\;\textit{None}$ |
|
593 \end{tabular} |
|
594 \end{center} |
|
595 \noindent |
|
596 Previously without $\textit{bsimp}$ the exact structure of each intermediate |
|
597 regular expression is preserved, allowing pairs of inhabitation relations |
|
598 in the form $\vdash v : r \backslash c $ and |
|
599 $\vdash \inj \; r\; c \; v : r $ to hold in \ref{eq:stepwise}. |
|
600 But $\blexersimp$ introduces simplification after the derivative, |
|
601 making it difficult to align the structures of values and regular expressions. |
|
602 If we change the form of property \ref{eq:stepwise} to |
|
603 adapt to the needs of $\blexersimp$ the precondition of becomes |
|
604 \[ |
|
605 \vdash v : (\textit{bsimp} \; (r\backslash c)) |
|
606 \] |
|
607 The inhabitation relation of the other pair no longer holds, |
|
608 because $\inj$ does not work on the simplified value $v$ |
|
609 and the unsimplified regular expression $r$. |
|
610 The retrieve function will not work either. |
|
611 \[ |
|
612 \vdash \inj \; r \; c \; v : r |
|
613 \] |
|
614 It seems unclear what procedures needs to be |
|
615 used to create a new value $v_?$ such that |
|
616 \[ |
|
617 \vdash v_? : r \; \text{and} \; \retrieve \; r \; v_? = \retrieve \; (\textit{bsimp} \; (r\backslash c)) \; v |
|
618 \] |
|
619 hold. |
|
620 %It is clear that once we made |
|
621 %$v$ to align with $\textit{bsimp} \; r_{c}$ |
|
622 %in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged |
|
623 %in for the above statement to hold. |
|
624 Ausaf et al. \cite{AusafDyckhoffUrban2016} |
|
625 used something they call rectification functions to restore the original value from the simplified value. |
|
626 The idea is that simplification functions not only returns a regular expression, |
|
627 but also a rectification function |
|
628 \[ |
|
629 \textit{simp}^{rect} : Regex \Rightarrow (Value \Rightarrow Value, Regex) |
|
630 %\textit{frect} : Value \Rightarrow Value |
|
631 \] |
|
632 that is recorded recursively, |
|
633 and then applied to the previous value |
|
634 to obtain the correct value for $\inj$ to work on. |
|
635 The recursive case of the lexer is defined as something like |
|
636 \[ |
|
637 \textit{slexer} \; r \; (c\!::\!s) \dn let \;(\textit{frect}, r_c) = \textit{simp}^{rect} \;(r \backslash c) \;\; \textit{in}\;\; |
|
638 \inj \; r \; c \; (\textit{frect} \; (\textit{slexer} \; r_c\; s)) |
|
639 %\textit{match} \; s \; \textit{case} [ |
|
640 \] |
|
641 However this approach (including $\textit{slexer}$'s correctness proof) only |
|
642 works without bitcodes, and it limits the kind of simplifications one can introduce. |
|
643 %and they have not yet extended their relatively simple simplifications |
|
644 %to more aggressive ones. |
|
645 See the thesis by Ausaf \cite{Ausaf} |
|
646 for details. |
|
647 |
|
648 %\begin{center} |
|
649 % $\vdash v: (r\backslash c) \implies \retrieve \; (\mathord{?}(\textit{bsimp} \; r_c)) \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ |
|
650 %\end{center} |
|
651 %\noindent |
|
652 |
|
653 We were not able to use their idea for our very strong simplification rules. |
|
654 Therefore we are taking another route that completely |
|
655 disposes of lemma \ref{retrieveStepwise}, |
|
656 and prove a weakened inductive invariant instead. |
|
657 |
|
658 Let us first explain why lemma \ref{retrieveStepwise}'s requirement |
|
659 is too strong, and suggest a few possible fixes, which leads to |
|
660 our proof which we believe was the most natural and effective method. |
|
661 |
|
662 |
|
663 |
|
664 \section{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong} |
|
665 |
|
666 %From this chapter we start with the main contribution of this thesis, which |
|
667 |
|
668 The $\blexer$ proof relies on a lockstep POSIX |
|
669 correspondence between the lexical value and the |
|
670 regular expression in each derivative and injection. |
|
671 If we zoom into the diagram \ref{graph:inj} and look specifically at |
|
672 the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating |
|
673 the invariant that the same bitcodes can be extracted from the pairs: |
|
674 \tikzset{three sided/.style={ |
|
675 draw=none, |
|
676 append after command={ |
|
677 [-,shorten <= -0.5\pgflinewidth] |
|
678 ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
679 edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
680 ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
681 edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
682 ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
683 edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
684 } |
|
685 } |
|
686 } |
|
687 |
|
688 \tikzset{three sided1/.style={ |
|
689 draw=none, |
|
690 append after command={ |
|
691 [-,shorten <= -0.5\pgflinewidth] |
|
692 ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
693 edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
694 ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
695 edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
696 ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
697 edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
698 } |
|
699 } |
|
700 } |
|
701 |
|
702 \begin{center} |
|
703 \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] |
|
704 %\node [rectangle ] (1) at (-7, 2) {$\ldots$}; |
|
705 %\node [rectangle, draw] (2) at (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$}; |
|
706 %\node [rectangle, draw] (3) at (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$}; |
|
707 %\node [rectangle] (4) at (9, 2) {$\ldots$}; |
|
708 %\node [rectangle] (5) at (-7, -2) {$\ldots$}; |
|
709 %\node [rectangle, draw] (6) at (-4, -2) {$v_i = \Stars \; [\Left (a)]$}; |
|
710 %\node [rectangle, draw] (7) at ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; |
|
711 %\node [rectangle] (8) at ( 9, -2) {$\ldots$}; |
|
712 %\node [rectangle] (9) at (-7, -6) {$\ldots$}; |
|
713 %\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = bs' @ ZZS$}; |
|
714 %\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$}; |
|
715 %\node [rectangle] (12) at (9, -6) {$\ldots$}; |
|
716 |
|
717 \node [rectangle ] (1) at (-8, 2) {$\ldots$}; |
|
718 \node [rectangle, draw] (2) at (-5, 2) {$r_i = _{bs'}(_Za+_Saa)^*$}; |
|
719 \node [rectangle, draw] (3) at (3, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$}; |
|
720 \node [rectangle] (4) at (8, 2) {$\ldots$}; |
|
721 \node [rectangle] (5) at (-8, -2) {$\ldots$}; |
|
722 \node [rectangle, draw] (6) at (-5, -2) {$v_i = \Stars \; [\Left (a)]$}; |
|
723 \node [rectangle, draw] (7) at ( 3, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; |
|
724 \node [rectangle] (8) at ( 8, -2) {$\ldots$}; |
|
725 \node [rectangle] (9) at (-8, -6) {$\ldots$}; |
|
726 \node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$}; |
|
727 \node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$}; |
|
728 \node [rectangle] (12) at (8, -6) {$\ldots$}; |
|
729 |
|
730 |
|
731 |
|
732 \path (1) edge [] node {} (2); |
|
733 \path (5) edge [] node {} (6); |
|
734 \path (9) edge [] node {} (10); |
|
735 |
|
736 \path (11) edge [] node {} (12); |
|
737 \path (7) edge [] node {} (8); |
|
738 \path (3) edge [] node {} (4); |
|
739 |
|
740 |
|
741 \path (6) edge [dashed,bend right = 30] node {$\retrieve \; r_i \; v_i$} (10); |
|
742 \path (2) edge [dashed,bend left = 48] node {} (10); |
|
743 |
|
744 \path (7) edge [dashed,bend right = 30] node {$\retrieve \; r_{i+1} \; v_{i+1}$} (11); |
|
745 \path (3) edge [dashed,bend left = 45] node {} (11); |
|
746 |
|
747 \path (2) edge [] node {$\backslash a$} (3); |
|
748 \path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6); |
|
749 \path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7); |
|
750 %\path (6) edge [] node {$\vdash v_i : r_i$} (10); |
|
751 %\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11); |
|
752 |
|
753 \path (10) edge [dashed, <->] node {$=$} (11); |
|
754 |
|
755 \path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6); |
|
756 |
|
757 % \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; |
|
758 % \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; |
|
759 % \path (r) |
|
760 % edge [] node {$\internalise$} (a); |
|
761 % \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; |
|
762 % \path (a) |
|
763 % edge [] node {$\backslash a$} (a1); |
|
764 % |
|
765 % \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; |
|
766 % \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; |
|
767 % \path (a1) |
|
768 % edge [] node {$\backslash a$} (a21); |
|
769 % \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; |
|
770 % \path (a22) |
|
771 % edge [] node {$\backslash b$} (a3); |
|
772 % \path (a21) |
|
773 % edge [dashed, bend right] node {} (a3); |
|
774 % \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; |
|
775 % \path (a3) |
|
776 % edge [below] node {$\bmkeps$} (bs); |
|
777 % \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; |
|
778 % \path (bs) |
|
779 % edge [] node {$\decode$} (v); |
|
780 |
|
781 |
|
782 \end{tikzpicture} |
|
783 %\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$} |
|
784 \end{center} |
|
785 |
|
786 When simplifications are added, the inhabitation relation no longer holds, |
|
787 causing the above diagram to break. |
|
788 |
|
789 Ausaf addressed this with an augmented lexer he called $\textit{slexer}$. |
|
790 |
|
791 |
|
792 |
|
793 we note that the invariant |
|
794 $\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} $ is too strong |
|
795 to maintain because the precondition $\vdash v_i : r_i$ is too weak. |
|
796 It does not require $v_i$ to be a POSIX value |
|
797 |
|
798 |
|
799 {\color{red} \rule{\linewidth}{0.5mm}} |
|
800 New content ends |
|
801 {\color{red} \rule{\linewidth}{0.5mm}} |
|
802 |
|
803 |
|
804 |
|
805 % |
|
806 % |
|
807 %which is essential for getting an understanding this thesis |
|
808 %in chapter \ref{Bitcoded1}, which is necessary for understanding why |
|
809 %the proof |
|
810 % |
|
811 %In this chapter, |
|
812 |
|
813 %We contrast our simplification function |
|
814 %with Sulzmann and Lu's, indicating the simplicity of our algorithm. |
|
815 %This is another case for the usefulness |
|
816 %and reliability of formal proofs on algorithms. |
|
817 %These ``aggressive'' simplifications would not be possible in the injection-based |
|
818 %lexing we introduced in chapter \ref{Inj}. |
|
819 %We then prove the correctness with the improved version of |
|
820 %$\blexer$, called $\blexersimp$, by establishing |
|
821 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system. |
|
822 % |
|
823 |
|
824 |
751 %---------------------------------------------------------------------------------------- |
825 %---------------------------------------------------------------------------------------- |
752 % SECTION rewrite relation |
826 % SECTION rewrite relation |
753 %---------------------------------------------------------------------------------------- |
827 %---------------------------------------------------------------------------------------- |
754 \section{Correctness of $\blexersimp$} |
828 \section{Correctness of $\blexersimp$} |
755 We first introduce the rewriting relation \emph{rrewrite} |
829 We first introduce the rewriting relation \emph{rrewrite} |