9 %its correctness proof in |
9 %its correctness proof in |
10 %Chapter 3\ref{Chapter3}. |
10 %Chapter 3\ref{Chapter3}. |
11 {\color{red} \rule{\linewidth}{0.5mm}} |
11 {\color{red} \rule{\linewidth}{0.5mm}} |
12 New content starts. |
12 New content starts. |
13 {\color{red} \rule{\linewidth}{0.5mm}} |
13 {\color{red} \rule{\linewidth}{0.5mm}} |
|
14 \marginpar{\em Added a completely new overview section, highlighting the contributions.} |
14 |
15 |
15 \section{Overview} |
16 \section{Overview} |
16 |
17 |
17 This chapter |
18 This chapter |
18 is the point from which novel contributions of this PhD project are introduced |
19 is the point from which novel contributions of this PhD project are introduced |
19 in detail, |
20 in detail. |
20 and previous |
21 The material in the |
21 chapters are essential background work for setting the scene of the formal proof we |
22 previous |
|
23 chapters is necessary |
|
24 material for setting the scene of the formal proof we |
22 are about to describe. |
25 are about to describe. |
|
26 |
|
27 The fundamental reason is we cannot extend the correctness proof of theorem 4, |
|
28 because lemma 13 does not hold anymore when simplifications are involved. |
|
29 |
23 The proof details are necessary materials for this thesis |
30 The proof details are necessary materials for this thesis |
24 because it provides necessary context to explain why we need a |
31 because it provides necessary context to explain why we need a |
25 new framework for the proof of $\blexersimp$, which involves |
32 new framework for the proof of $\blexersimp$, which involves |
26 simplifications that cause structural changes to the regular expression. |
33 simplifications that cause structural changes to the regular expression. |
27 a new formal proof of the correctness of $\blexersimp$, where the |
34 A new formal proof of the correctness of $\blexersimp$, where the |
28 proof of $\blexer$ |
35 proof of $\blexer$ |
29 is not applicatble in the sense that we cannot straightforwardly extend the |
36 is not applicatble in the sense that we cannot straightforwardly extend the |
30 proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does |
37 proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does |
31 not hold anymore. |
38 not hold anymore. |
32 %This is because the structural induction on the stepwise correctness |
39 %This is because the structural induction on the stepwise correctness |
42 |
49 |
43 |
50 |
44 In particular, the correctness theorem |
51 In particular, the correctness theorem |
45 of the un-optimised bit-coded lexer $\blexer$ in |
52 of the un-optimised bit-coded lexer $\blexer$ in |
46 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
53 chapter \ref{Bitcoded1} formalised by Ausaf et al. |
47 relies on lemma \ref{retrieveStepwise} that says |
54 relies crucially on lemma \ref{retrieveStepwise} that says |
48 any value can be retrieved in a stepwise manner: |
55 any value can be retrieved in a stepwise manner, namely: |
49 \begin{center} |
56 \begin{center}%eqref: this proposition needs to be referred |
50 $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ |
57 $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ |
51 \end{center} |
58 \end{center} |
52 This no longer holds once we introduce simplifications. |
59 This no longer holds once we introduce simplifications. |
53 Simplifications are necessary to control the size of regular expressions |
60 Simplifications are necessary to control the size of derivatives |
54 during derivatives by eliminating redundant |
61 |
55 sub-expression with some procedure we call $\textit{bsimp}$. |
62 |
56 We want to prove the correctness of $\blexersimp$ which integrates |
63 We want to prove the correctness of $\blexersimp$ which integrates |
57 $\textit{bsimp}$ by applying it after each call to the derivative: |
64 $\textit{bsimp}$ by applying it after each call to the derivative: |
58 \begin{center} |
65 \begin{center} |
59 \begin{tabular}{lcl} |
66 \begin{tabular}{lcl} |
60 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ |
67 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ |
73 regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and |
80 regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and |
74 $\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if |
81 $\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if |
75 we use the convenient notation $r_{c} \dn r\backslash c$ |
82 we use the convenient notation $r_{c} \dn r\backslash c$ |
76 and $v_{r}^{c} \dn \inj \;r \; c \; v$), |
83 and $v_{r}^{c} \dn \inj \;r \; c \; v$), |
77 but $\blexersimp$ introduces simplification after the derivative, |
84 but $\blexersimp$ introduces simplification after the derivative, |
78 getting us trouble in aligning the pairs: |
85 making it difficult to align the pairs: |
79 \begin{center} |
86 \begin{center} |
80 $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ |
87 $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; (\textit{bsimp} \; r_c) \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ |
81 \end{center} |
88 \end{center} |
82 \noindent |
89 \noindent |
83 It is quite clear that once we made |
90 It is clear that once we made |
84 $v$ to align with $\textit{bsimp} \; r_{c}$ |
91 $v$ to align with $\textit{bsimp} \; r_{c}$ |
85 in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged |
92 in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged |
86 in for the above statement to hold. |
93 in for the above statement to hold. |
87 Ausaf et al. \cite{AusafUrbanDyckhoff2016} |
94 Ausaf et al. \cite{AusafUrbanDyckhoff2016} |
88 made some initial attempts with this idea, see \cite{FahadThesis} |
95 made some initial attempts with this idea, see \cite{FahadThesis} |
89 for details. |
96 for details. |
|
97 |
|
98 They added |
|
99 and then rectify it to |
|
100 |
|
101 |
|
102 this works fine, however that limits the kind of simplifications you can introduce. |
|
103 We cannot use their idea for our very strong simplification rules. |
|
104 Therefore we take our route |
|
105 a wea |
|
106 |
90 The other route is to dispose of lemma \ref{retrieveStepwise}, |
107 The other route is to dispose of lemma \ref{retrieveStepwise}, |
91 and prove a slightly weakened inductive invariant instead. |
108 and prove a weakened inductive invariant instead. |
92 We adopt this approach in this thesis. |
109 We adopt this approach in this thesis. |
93 |
110 |
94 We first introduce why the inductive invariant in $\blexer$'s proof |
111 Let us first explain why the requirement in $\blexer$'s proof |
95 is too strong, and suggest a few possible fixes, which leads to |
112 is too strong, and suggest a few possible fixes, which leads to |
96 our proof which we believe was the most natural and effective method. |
113 our proof which we believe was the most natural and effective method. |
97 |
114 |
98 |
115 |
99 |
116 |
102 %From this chapter we start with the main contribution of this thesis, which |
119 %From this chapter we start with the main contribution of this thesis, which |
103 |
120 |
104 The $\blexer$ proof relies on a lockstep POSIX |
121 The $\blexer$ proof relies on a lockstep POSIX |
105 correspondence between the lexical value and the |
122 correspondence between the lexical value and the |
106 regular expression in each derivative and injection. |
123 regular expression in each derivative and injection. |
107 If we zoom into the diagram \ref{graph:in} and look specifically at |
124 If we zoom into the diagram \ref{graph:inj} and look specifically at |
108 the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating |
125 the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating |
109 the invariant that the same bitcodes can be extracted from the pairs: |
126 the invariant that the same bitcodes can be extracted from the pairs: |
|
127 \tikzset{three sided/.style={ |
|
128 draw=none, |
|
129 append after command={ |
|
130 [-,shorten <= -0.5\pgflinewidth] |
|
131 ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
132 edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
133 ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
134 edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
135 ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
136 edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
137 } |
|
138 } |
|
139 } |
|
140 |
|
141 \tikzset{three sided1/.style={ |
|
142 draw=none, |
|
143 append after command={ |
|
144 [-,shorten <= -0.5\pgflinewidth] |
|
145 ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) |
|
146 edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
147 ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) |
|
148 edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
149 ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) |
|
150 edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) |
|
151 } |
|
152 } |
|
153 } |
|
154 |
|
155 \begin{figure}[H] |
|
156 \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] |
|
157 \node [rectangle, draw] (1) at (-7, 2) {$\ldots$}; |
|
158 \node [rectangle, draw] (2) at (-4, 2) {$_{bs'}(_Za+_Saa)^*$}; |
|
159 \node [rectangle, draw] (3) at (4, 2) {$_{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_0a+_1aa)^*$}; |
|
160 \node [rectangle, draw] (4) at (7, 2) {$\ldots$}; |
|
161 \node [rectangle, draw] (5) at (-7, -2) {$\ldots$}; |
|
162 \node [rectangle, draw] (6) at (-4, -2) {$\Stars \; [\Left (a)]$}; |
|
163 \node [rectangle, draw] (7) at ( 4, -2) {$\Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; |
|
164 \node [rectangle, draw] (8) at ( 7, -2) {$\ldots$}; |
|
165 \node [rectangle, draw] (9) at (-7, -6) {$\ldots$}; |
|
166 \node [rectangle, draw] (10) at (-4, -6) {$\textit{bits} = \retrieve \; r_i\;v_i$}; |
|
167 \node [rectangle, draw] (11) at (4, -6) {$\textit{bits} = \retrieve \; r_{i+1}\;v_{i+1}$}; |
|
168 \node [rectangle, draw] (12) at (7, -6) {$\ldots$}; |
|
169 |
|
170 |
|
171 \path (1) edge [] node {} (2); |
|
172 \path (2) edge [] node {$\backslash a$} (3); |
|
173 |
|
174 % \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; |
|
175 % \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; |
|
176 % \path (r) |
|
177 % edge [] node {$\internalise$} (a); |
|
178 % \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; |
|
179 % \path (a) |
|
180 % edge [] node {$\backslash a$} (a1); |
|
181 % |
|
182 % \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; |
|
183 % \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; |
|
184 % \path (a1) |
|
185 % edge [] node {$\backslash a$} (a21); |
|
186 % \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; |
|
187 % \path (a22) |
|
188 % edge [] node {$\backslash b$} (a3); |
|
189 % \path (a21) |
|
190 % edge [dashed, bend right] node {} (a3); |
|
191 % \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; |
|
192 % \path (a3) |
|
193 % edge [below] node {$\bmkeps$} (bs); |
|
194 % \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; |
|
195 % \path (bs) |
|
196 % edge [] node {$\decode$} (v); |
|
197 |
|
198 |
|
199 \end{tikzpicture} |
|
200 \caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$} |
|
201 \end{figure} |
110 |
202 |
111 When simplifications are added, the inhabitation relation no longer holds, |
203 When simplifications are added, the inhabitation relation no longer holds, |
112 causing the above diagram to break. |
204 causing the above diagram to break. |
113 |
205 |
114 Ausaf addressed this with an augmented lexer he called $\textit{slexer}$. |
206 Ausaf addressed this with an augmented lexer he called $\textit{slexer}$. |