ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 649 ef2b8abcbc55
parent 640 bd1354127574
child 650 a365d1364640
equal deleted inserted replaced
648:d15a0b7d6d90 649:ef2b8abcbc55
     6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
     6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
     7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
     7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     9 %its correctness proof in 
     9 %its correctness proof in 
    10 %Chapter 3\ref{Chapter3}. 
    10 %Chapter 3\ref{Chapter3}. 
    11 
    11 \section{Overview}
    12 
    12 
    13 
    13 This chapter
       
    14 is the point from which novel contributions of this PhD project are introduced
       
    15 in detail, 
       
    16 and previous
       
    17 chapters are essential background work for setting the scene of the formal proof we
       
    18 are about to describe.
       
    19 In particular, the correctness theorem 
       
    20 of the un-optimised bit-coded lexer $\blexer$ in 
       
    21 chapter \ref{Bitcoded1} formalised by Ausaf et al.
       
    22 relies on lemma \ref{retrieveStepwise} that says
       
    23 any value can be retrieved in a stepwise manner:
       
    24 \begin{center}	
       
    25 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
       
    26 \end{center}
       
    27 This no longer holds once we introduce simplifications.
       
    28 To control the size of regular expressions during derivatives, 
       
    29 one has to eliminate redundant sub-expression with some
       
    30 procedure we call $\textit{simp}$, 
       
    31 and $\textit{simp}$ is defined as
       
    32 :
       
    33 
       
    34 
       
    35 
       
    36 Having defined the $\textit{bsimp}$ function,
       
    37 we add it as a phase after a derivative is taken.
       
    38 \begin{center}
       
    39 	\begin{tabular}{lcl}
       
    40 		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
       
    41 	\end{tabular}
       
    42 \end{center}
       
    43 %Following previous notations
       
    44 %when extending from derivatives w.r.t.~character to derivative
       
    45 %w.r.t.~string, we define the derivative that nests simplifications 
       
    46 %with derivatives:%\comment{simp in  the [] case?}
       
    47 We extend this from characters to strings:
       
    48 \begin{center}
       
    49 \begin{tabular}{lcl}
       
    50 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
       
    51 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
       
    52 \end{tabular}
       
    53 \end{center}
       
    54 
       
    55 \noindent
       
    56 The lexer that extracts bitcodes from the 
       
    57 derivatives with simplifications from our $\simp$ function
       
    58 is called $\blexersimp$:
       
    59 
       
    60 \begin{center}
       
    61 
       
    62 \begin{center}
       
    63 	\begin{tabular}{lcl}
       
    64 		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
       
    65 	\end{tabular}
       
    66 \end{center}
       
    67 \begin{tabular}{lcl}
       
    68   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       
    69       $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
       
    70   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
       
    71   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
       
    72   & & $\;\;\textit{else}\;\textit{None}$
       
    73 \end{tabular}
       
    74 \end{center}
       
    75 \noindent
       
    76 the redundant sub-expressions after each derivative operation
       
    77 allows the exact structure of each intermediate result to be preserved,
       
    78 so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
       
    79 $\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$
       
    80 and $v^{c} \dn \inj \;r \; c \; v$).
       
    81 
       
    82 
       
    83 Define the 
       
    84 But $\blexersimp$ introduces simplification after the derivative
       
    85 to reduce redundancy,
       
    86 yielding $r \backslash c$ 
       
    87 This allows 
       
    88 
       
    89 The proof details are necessary materials for this thesis
       
    90 because it provides necessary context to explain why we need a
       
    91 new framework for the proof of $\blexersimp$, which involves
       
    92 simplifications that cause structural changes to the regular expression.
       
    93 a new formal proof of the correctness of $\blexersimp$, where the 
       
    94 proof of $\blexer$
       
    95 is not applicatble in the sense that we cannot straightforwardly extend the
       
    96 proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does
       
    97 not hold anymore.
       
    98 This is because the structural induction on the stepwise correctness
       
    99 of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
       
   100 in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
       
   101 each other.
    14 In this chapter we introduce simplifications
   102 In this chapter we introduce simplifications
    15 for annotated regular expressions that can be applied to 
   103 for annotated regular expressions that can be applied to 
    16 each intermediate derivative result. This allows
   104 each intermediate derivative result. This allows
    17 us to make $\blexer$ much more efficient.
   105 us to make $\blexer$ much more efficient.
    18 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
   106 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
    19 but their simplification functions could have been more efficient and in some cases needed fixing.
   107 but their simplification functions could have been more efficient and in some cases needed fixing.
       
   108 
       
   109 
       
   110 
       
   111 
       
   112 From this chapter we start with the main contribution of this thesis, which
       
   113 
       
   114 o
       
   115 In particular, the $\blexer$ proof relies on a lockstep POSIX
       
   116 correspondence between the lexical value and the
       
   117 regular expression in each derivative and injection.
       
   118 
       
   119 
       
   120 which is essential for getting an understanding this thesis
       
   121 in chapter \ref{Bitcoded1}, which is necessary for understanding why
       
   122 the proof 
       
   123 
       
   124 In this chapter,
       
   125 
    20 %We contrast our simplification function 
   126 %We contrast our simplification function 
    21 %with Sulzmann and Lu's, indicating the simplicity of our algorithm.
   127 %with Sulzmann and Lu's, indicating the simplicity of our algorithm.
    22 %This is another case for the usefulness 
   128 %This is another case for the usefulness 
    23 %and reliability of formal proofs on algorithms.
   129 %and reliability of formal proofs on algorithms.
    24 %These ``aggressive'' simplifications would not be possible in the injection-based 
   130 %These ``aggressive'' simplifications would not be possible in the injection-based 
    26 %We then prove the correctness with the improved version of 
   132 %We then prove the correctness with the improved version of 
    27 %$\blexer$, called $\blexersimp$, by establishing 
   133 %$\blexer$, called $\blexersimp$, by establishing 
    28 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
   134 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
    29 %
   135 %
    30 \section{Simplifications by Sulzmann and Lu}
   136 \section{Simplifications by Sulzmann and Lu}
       
   137 The algorithms $\lexer$ and $\blexer$ work beautifully as functional 
       
   138 programs, but not as practical code. One main reason for the slowness is due
       
   139 to the size of intermediate representations--the derivative regular expressions
       
   140 tend to grow unbounded if the matching involved a large number of possible matches.
    31 Consider the derivatives of the following example $(a^*a^*)^*$:
   141 Consider the derivatives of the following example $(a^*a^*)^*$:
    32 %and $(a^* + (aa)^*)^*$:
   142 %and $(a^* + (aa)^*)^*$:
    33 \begin{center}
   143 \begin{center}
    34 	\begin{tabular}{lcl}
   144 	\begin{tabular}{lcl}
    35 		$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ & 
   145 		$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ & 
   436 \end{center}
   546 \end{center}
   437 Having defined the $\textit{bsimp}$ function,
   547 Having defined the $\textit{bsimp}$ function,
   438 we add it as a phase after a derivative is taken.
   548 we add it as a phase after a derivative is taken.
   439 \begin{center}
   549 \begin{center}
   440 	\begin{tabular}{lcl}
   550 	\begin{tabular}{lcl}
   441 		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
   551 		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
   442 	\end{tabular}
   552 	\end{tabular}
   443 \end{center}
   553 \end{center}
   444 %Following previous notations
   554 %Following previous notations
   445 %when extending from derivatives w.r.t.~character to derivative
   555 %when extending from derivatives w.r.t.~character to derivative
   446 %w.r.t.~string, we define the derivative that nests simplifications 
   556 %w.r.t.~string, we define the derivative that nests simplifications