diff -r ef2b8abcbc55 -r a365d1364640 ChengsongTanPhdThesis/Chapters/Bitcoded2.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sun Jun 18 17:54:52 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Wed Jun 21 22:43:04 2023 +0100 @@ -16,6 +16,27 @@ and previous chapters are essential background work for setting the scene of the formal proof we are about to describe. +The proof details are necessary materials for this thesis +because it provides necessary context to explain why we need a +new framework for the proof of $\blexersimp$, which involves +simplifications that cause structural changes to the regular expression. +a new formal proof of the correctness of $\blexersimp$, where the +proof of $\blexer$ +is not applicatble in the sense that we cannot straightforwardly extend the +proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does +not hold anymore. +%This is because the structural induction on the stepwise correctness +%of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described +%in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to +%each other. +%In this chapter we introduce simplifications +%for annotated regular expressions that can be applied to +%each intermediate derivative result. This allows +%us to make $\blexer$ much more efficient. +%Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, +%but their simplification functions could have been more efficient and in some cases needed fixing. + + In particular, the correctness theorem of the un-optimised bit-coded lexer $\blexer$ in chapter \ref{Bitcoded1} formalised by Ausaf et al. @@ -25,45 +46,16 @@ $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ \end{center} This no longer holds once we introduce simplifications. -To control the size of regular expressions during derivatives, -one has to eliminate redundant sub-expression with some -procedure we call $\textit{simp}$, -and $\textit{simp}$ is defined as -: - - - -Having defined the $\textit{bsimp}$ function, -we add it as a phase after a derivative is taken. -\begin{center} - \begin{tabular}{lcl} - $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$ - \end{tabular} -\end{center} -%Following previous notations -%when extending from derivatives w.r.t.~character to derivative -%w.r.t.~string, we define the derivative that nests simplifications -%with derivatives:%\comment{simp in the [] case?} -We extend this from characters to strings: +Simplifications are necessary to control the size of regular expressions +during derivatives by eliminating redundant +sub-expression with some procedure we call $\textit{bsimp}$. +We want to prove the correctness of $\blexersimp$ which integrates +$\textit{bsimp}$ by applying it after each call to the derivative: \begin{center} \begin{tabular}{lcl} -$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\ + $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ \end{tabular} -\end{center} - -\noindent -The lexer that extracts bitcodes from the -derivatives with simplifications from our $\simp$ function -is called $\blexersimp$: - -\begin{center} - -\begin{center} - \begin{tabular}{lcl} - $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$ - \end{tabular} -\end{center} \begin{tabular}{lcl} $\textit{blexer\_simp}\;r\,s$ & $\dn$ & $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ @@ -73,38 +65,24 @@ \end{tabular} \end{center} \noindent -the redundant sub-expressions after each derivative operation -allows the exact structure of each intermediate result to be preserved, -so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and -$\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$ -and $v^{c} \dn \inj \;r \; c \; v$). - - -Define the -But $\blexersimp$ introduces simplification after the derivative -to reduce redundancy, -yielding $r \backslash c$ -This allows +Previously without $\textit{bsimp}$ the exact structure of each intermediate +regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and +$\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if +we use the convenient notation $r_{c} \dn r\backslash c$ +and $v_{r}^{c} \dn \inj \;r \; c \; v$), +but $\blexersimp$ introduces simplification after the derivative, +getting us trouble in aligning the pairs: +\begin{center} + $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ +\end{center} +\noindent +It is quite clear that once we made +$v$ to align with $\textit{bsimp} \; r_{c}$ +in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged +in for the above statement to hold. +Ausaf et al. \cite{AusafUrbanDyckhoff2016} +made some initial attempts on the un-annotated lexer (to be continued) -The proof details are necessary materials for this thesis -because it provides necessary context to explain why we need a -new framework for the proof of $\blexersimp$, which involves -simplifications that cause structural changes to the regular expression. -a new formal proof of the correctness of $\blexersimp$, where the -proof of $\blexer$ -is not applicatble in the sense that we cannot straightforwardly extend the -proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does -not hold anymore. -This is because the structural induction on the stepwise correctness -of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described -in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to -each other. -In this chapter we introduce simplifications -for annotated regular expressions that can be applied to -each intermediate derivative result. This allows -us to make $\blexer$ much more efficient. -Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, -but their simplification functions could have been more efficient and in some cases needed fixing.