ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 650 a365d1364640
parent 649 ef2b8abcbc55
child 651 deb35fd780fe
equal deleted inserted replaced
649:ef2b8abcbc55 650:a365d1364640
    14 is the point from which novel contributions of this PhD project are introduced
    14 is the point from which novel contributions of this PhD project are introduced
    15 in detail, 
    15 in detail, 
    16 and previous
    16 and previous
    17 chapters are essential background work for setting the scene of the formal proof we
    17 chapters are essential background work for setting the scene of the formal proof we
    18 are about to describe.
    18 are about to describe.
       
    19 The proof details are necessary materials for this thesis
       
    20 because it provides necessary context to explain why we need a
       
    21 new framework for the proof of $\blexersimp$, which involves
       
    22 simplifications that cause structural changes to the regular expression.
       
    23 a new formal proof of the correctness of $\blexersimp$, where the 
       
    24 proof of $\blexer$
       
    25 is not applicatble in the sense that we cannot straightforwardly extend the
       
    26 proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does
       
    27 not hold anymore.
       
    28 %This is because the structural induction on the stepwise correctness
       
    29 %of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
       
    30 %in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
       
    31 %each other.
       
    32 %In this chapter we introduce simplifications
       
    33 %for annotated regular expressions that can be applied to 
       
    34 %each intermediate derivative result. This allows
       
    35 %us to make $\blexer$ much more efficient.
       
    36 %Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
       
    37 %but their simplification functions could have been more efficient and in some cases needed fixing.
       
    38 
       
    39 
    19 In particular, the correctness theorem 
    40 In particular, the correctness theorem 
    20 of the un-optimised bit-coded lexer $\blexer$ in 
    41 of the un-optimised bit-coded lexer $\blexer$ in 
    21 chapter \ref{Bitcoded1} formalised by Ausaf et al.
    42 chapter \ref{Bitcoded1} formalised by Ausaf et al.
    22 relies on lemma \ref{retrieveStepwise} that says
    43 relies on lemma \ref{retrieveStepwise} that says
    23 any value can be retrieved in a stepwise manner:
    44 any value can be retrieved in a stepwise manner:
    24 \begin{center}	
    45 \begin{center}	
    25 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
    46 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
    26 \end{center}
    47 \end{center}
    27 This no longer holds once we introduce simplifications.
    48 This no longer holds once we introduce simplifications.
    28 To control the size of regular expressions during derivatives, 
    49 Simplifications are necessary to control the size of regular expressions 
    29 one has to eliminate redundant sub-expression with some
    50 during derivatives by eliminating redundant 
    30 procedure we call $\textit{simp}$, 
    51 sub-expression with some procedure we call $\textit{bsimp}$.
    31 and $\textit{simp}$ is defined as
    52 We want to prove the correctness of $\blexersimp$ which integrates
    32 :
    53 $\textit{bsimp}$ by applying it after each call to the derivative:
    33 
       
    34 
       
    35 
       
    36 Having defined the $\textit{bsimp}$ function,
       
    37 we add it as a phase after a derivative is taken.
       
    38 \begin{center}
       
    39 	\begin{tabular}{lcl}
       
    40 		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
       
    41 	\end{tabular}
       
    42 \end{center}
       
    43 %Following previous notations
       
    44 %when extending from derivatives w.r.t.~character to derivative
       
    45 %w.r.t.~string, we define the derivative that nests simplifications 
       
    46 %with derivatives:%\comment{simp in  the [] case?}
       
    47 We extend this from characters to strings:
       
    48 \begin{center}
    54 \begin{center}
    49 \begin{tabular}{lcl}
    55 \begin{tabular}{lcl}
    50 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
    56 	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\
    51 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
    57 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
    52 \end{tabular}
    58 \end{tabular}
    53 \end{center}
       
    54 
       
    55 \noindent
       
    56 The lexer that extracts bitcodes from the 
       
    57 derivatives with simplifications from our $\simp$ function
       
    58 is called $\blexersimp$:
       
    59 
       
    60 \begin{center}
       
    61 
       
    62 \begin{center}
       
    63 	\begin{tabular}{lcl}
       
    64 		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
       
    65 	\end{tabular}
       
    66 \end{center}
       
    67 \begin{tabular}{lcl}
    59 \begin{tabular}{lcl}
    68   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
    60   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
    69       $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
    61       $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
    70   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
    62   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
    71   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
    63   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
    72   & & $\;\;\textit{else}\;\textit{None}$
    64   & & $\;\;\textit{else}\;\textit{None}$
    73 \end{tabular}
    65 \end{tabular}
    74 \end{center}
    66 \end{center}
    75 \noindent
    67 \noindent
    76 the redundant sub-expressions after each derivative operation
    68 Previously without $\textit{bsimp}$ the exact structure of each intermediate 
    77 allows the exact structure of each intermediate result to be preserved,
    69 regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
    78 so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
    70 $\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if 
    79 $\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$
    71 we use the convenient notation $r_{c} \dn r\backslash c$
    80 and $v^{c} \dn \inj \;r \; c \; v$).
    72 and $v_{r}^{c} \dn \inj \;r \; c \; v$),
    81 
    73 but $\blexersimp$ introduces simplification after the derivative,
    82 
    74 getting us trouble in aligning the pairs:
    83 Define the 
    75 \begin{center}
    84 But $\blexersimp$ introduces simplification after the derivative
    76 	$\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r  \;(\mathord{?} v_{r}^{c}) $
    85 to reduce redundancy,
    77 \end{center}
    86 yielding $r \backslash c$ 
    78 \noindent
    87 This allows 
    79 It is quite clear that once we made 
    88 
    80 $v$ to align with $\textit{bsimp} \; r_{c}$
    89 The proof details are necessary materials for this thesis
    81 in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged
    90 because it provides necessary context to explain why we need a
    82 in for the above statement to hold.
    91 new framework for the proof of $\blexersimp$, which involves
    83 Ausaf et al. \cite{AusafUrbanDyckhoff2016}
    92 simplifications that cause structural changes to the regular expression.
    84 made some initial attempts on the un-annotated lexer (to be continued)
    93 a new formal proof of the correctness of $\blexersimp$, where the 
    85 
    94 proof of $\blexer$
       
    95 is not applicatble in the sense that we cannot straightforwardly extend the
       
    96 proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does
       
    97 not hold anymore.
       
    98 This is because the structural induction on the stepwise correctness
       
    99 of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
       
   100 in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
       
   101 each other.
       
   102 In this chapter we introduce simplifications
       
   103 for annotated regular expressions that can be applied to 
       
   104 each intermediate derivative result. This allows
       
   105 us to make $\blexer$ much more efficient.
       
   106 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
       
   107 but their simplification functions could have been more efficient and in some cases needed fixing.
       
   108 
    86 
   109 
    87 
   110 
    88 
   111 
    89 
   112 From this chapter we start with the main contribution of this thesis, which
    90 From this chapter we start with the main contribution of this thesis, which