ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 667 660cf698eb26
parent 657 00171b627b8d
child 668 3831621d7b14
equal deleted inserted replaced
666:6da4516ea87d 667:660cf698eb26
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     9 %its correctness proof in 
     9 %its correctness proof in 
    10 %Chapter 3\ref{Chapter3}. 
    10 %Chapter 3\ref{Chapter3}. 
    11 In this chapter, we are going to describe the bit-coded algorithm
    11 In this chapter, we are going to describe the bit-coded algorithm
    12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof.
    12 introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof.
       
    13 Just like in chapter \ref{Inj}, the algorithms and proofs have been included
       
    14 for self-containedness reasons,
       
    15 even though they have been originally found and described by
       
    16 Sulzmann and Lu (\cite{Sulzmann2014}) and 
       
    17 Ausaf et al. (\cite{AusafDyckhoffUrban2016} and \cite{Ausaf}).
       
    18 %In addition to this, the 
       
    19 The details of the proofs in this thesis 
       
    20 also follow more closely the actual Isabelle formalisation.
       
    21 For example, lemma \ref{flexStepwise} and \ref{retrieveStepwise}
       
    22 are not included in the publications by Ausaf et al., despite them being
       
    23 some of the key lemmas leading to the correctness result.
       
    24 
       
    25 We will first motivate the bit-coded algorithm in section \ref{bitMotivate},
       
    26 and then introduce their formal definitions in section \ref{bitDef},
       
    27 followed by a description of the correctness proof of $\blexer$ in section \ref{blexerProof}.
       
    28 
    13 %to address the growth problem of 
    29 %to address the growth problem of 
    14 %derivatives of
    30 %derivatives of
    15 %regular expressions. 
    31 %regular expressions. 
    16 We have implemented their algorithm in Scala and Isabelle, 
    32 %We have implemented their algorithm in Scala and Isabelle, 
    17 and found problems 
    33 %and found problems 
    18 in their algorithm, such as de-duplication not working properly and redundant
    34 %in their algorithm, such as de-duplication not working properly and redundant
    19 fixpoint construction. 
    35 %fixpoint construction. 
    20 \section{The Motivation Behind Using Bitcodes}
    36 \section{The Motivation Behind Using Bitcodes}\label{bitMotivate}
    21 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
    37 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
    22 \begin{center}
    38 \begin{center}
    23 \begin{tabular}{lcl}
    39 \begin{tabular}{lcl}
    24 	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
    40 	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
    25 	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
    41 	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
   314 allows the algorithm to work in an elegant way, at the expense of 
   330 allows the algorithm to work in an elegant way, at the expense of 
   315 storing quite a bit of verbose information on the stack.
   331 storing quite a bit of verbose information on the stack.
   316 The stack seems to grow at least quadratically with respect
   332 The stack seems to grow at least quadratically with respect
   317 to the input (not taking into account the size bloat of $r_i$),
   333 to the input (not taking into account the size bloat of $r_i$),
   318 which can be inefficient and prone to stack overflows.
   334 which can be inefficient and prone to stack overflows.
   319 \section{Bitcoded Algorithm}
   335 \section{Bitcoded Algorithm}\label{bitDef}
   320 To address this,
   336 To address this,
   321 Sulzmann and Lu defined a new datatype 
   337 Sulzmann and Lu defined a new datatype 
   322 called \emph{annotated regular expression},
   338 called \emph{annotated regular expression},
   323 which condenses all the partial lexing information
   339 which condenses all the partial lexing information
   324 (that was originally stored in $r_i, c_{i+1}$ pairs)
   340 (that was originally stored in $r_i, c_{i+1}$ pairs)
  1040 found by Ausaf and Urban
  1056 found by Ausaf and Urban
  1041 of the bitcoded lexer.
  1057 of the bitcoded lexer.
  1042 %-----------------------------------
  1058 %-----------------------------------
  1043 %	SUBSECTION 1
  1059 %	SUBSECTION 1
  1044 %-----------------------------------
  1060 %-----------------------------------
  1045 \section{Correctness Proof of $\textit{Blexer}$}
  1061 \section{Correctness Proof of $\textit{Blexer}$}\label{blexerProof}
  1046 Why is $\blexer$ correct?
  1062 Why is $\blexer$ correct?
  1047 In other words, why is it the case that 
  1063 In other words, why is it the case that 
  1048 $\blexer$ outputs the same value as $\lexer$?
  1064 $\blexer$ outputs the same value as $\lexer$?
  1049 Intuitively,
  1065 Intuitively,
  1050 that is because 
  1066 that is because