8 %simplifications and therefore introduce our version of the bitcoded algorithm and |
8 %simplifications and therefore introduce our version of the bitcoded algorithm and |
9 %its correctness proof in |
9 %its correctness proof in |
10 %Chapter 3\ref{Chapter3}. |
10 %Chapter 3\ref{Chapter3}. |
11 In this chapter, we are going to describe the bit-coded algorithm |
11 In this chapter, we are going to describe the bit-coded algorithm |
12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof. |
12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof. |
|
13 Just like in chapter \ref{Inj}, the algorithms and proofs have been included |
|
14 for self-containedness reasons, |
|
15 even though they have been originally found and described by |
|
16 Sulzmann and Lu (\cite{Sulzmann2014}) and |
|
17 Ausaf et al. (\cite{AusafDyckhoffUrban2016} and \cite{Ausaf}). |
|
18 %In addition to this, the |
|
19 The details of the proofs in this thesis |
|
20 also follow more closely the actual Isabelle formalisation. |
|
21 For example, lemma \ref{flexStepwise} and \ref{retrieveStepwise} |
|
22 are not included in the publications by Ausaf et al., despite them being |
|
23 some of the key lemmas leading to the correctness result. |
|
24 |
|
25 We will first motivate the bit-coded algorithm in section \ref{bitMotivate}, |
|
26 and then introduce their formal definitions in section \ref{bitDef}, |
|
27 followed by a description of the correctness proof of $\blexer$ in section \ref{blexerProof}. |
|
28 |
13 %to address the growth problem of |
29 %to address the growth problem of |
14 %derivatives of |
30 %derivatives of |
15 %regular expressions. |
31 %regular expressions. |
16 We have implemented their algorithm in Scala and Isabelle, |
32 %We have implemented their algorithm in Scala and Isabelle, |
17 and found problems |
33 %and found problems |
18 in their algorithm, such as de-duplication not working properly and redundant |
34 %in their algorithm, such as de-duplication not working properly and redundant |
19 fixpoint construction. |
35 %fixpoint construction. |
20 \section{The Motivation Behind Using Bitcodes} |
36 \section{The Motivation Behind Using Bitcodes}\label{bitMotivate} |
21 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}: |
37 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}: |
22 \begin{center} |
38 \begin{center} |
23 \begin{tabular}{lcl} |
39 \begin{tabular}{lcl} |
24 $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\ |
40 $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\ |
25 $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\ |
41 $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\ |
314 allows the algorithm to work in an elegant way, at the expense of |
330 allows the algorithm to work in an elegant way, at the expense of |
315 storing quite a bit of verbose information on the stack. |
331 storing quite a bit of verbose information on the stack. |
316 The stack seems to grow at least quadratically with respect |
332 The stack seems to grow at least quadratically with respect |
317 to the input (not taking into account the size bloat of $r_i$), |
333 to the input (not taking into account the size bloat of $r_i$), |
318 which can be inefficient and prone to stack overflows. |
334 which can be inefficient and prone to stack overflows. |
319 \section{Bitcoded Algorithm} |
335 \section{Bitcoded Algorithm}\label{bitDef} |
320 To address this, |
336 To address this, |
321 Sulzmann and Lu defined a new datatype |
337 Sulzmann and Lu defined a new datatype |
322 called \emph{annotated regular expression}, |
338 called \emph{annotated regular expression}, |
323 which condenses all the partial lexing information |
339 which condenses all the partial lexing information |
324 (that was originally stored in $r_i, c_{i+1}$ pairs) |
340 (that was originally stored in $r_i, c_{i+1}$ pairs) |
1040 found by Ausaf and Urban |
1056 found by Ausaf and Urban |
1041 of the bitcoded lexer. |
1057 of the bitcoded lexer. |
1042 %----------------------------------- |
1058 %----------------------------------- |
1043 % SUBSECTION 1 |
1059 % SUBSECTION 1 |
1044 %----------------------------------- |
1060 %----------------------------------- |
1045 \section{Correctness Proof of $\textit{Blexer}$} |
1061 \section{Correctness Proof of $\textit{Blexer}$}\label{blexerProof} |
1046 Why is $\blexer$ correct? |
1062 Why is $\blexer$ correct? |
1047 In other words, why is it the case that |
1063 In other words, why is it the case that |
1048 $\blexer$ outputs the same value as $\lexer$? |
1064 $\blexer$ outputs the same value as $\lexer$? |
1049 Intuitively, |
1065 Intuitively, |
1050 that is because |
1066 that is because |