diff -r d15a0b7d6d90 -r ef2b8abcbc55 ChengsongTanPhdThesis/Chapters/Bitcoded2.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Fri May 26 23:42:15 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sun Jun 18 17:54:52 2023 +0100 @@ -8,15 +8,121 @@ %simplifications and therefore introduce our version of the bitcoded algorithm and %its correctness proof in %Chapter 3\ref{Chapter3}. +\section{Overview} + +This chapter +is the point from which novel contributions of this PhD project are introduced +in detail, +and previous +chapters are essential background work for setting the scene of the formal proof we +are about to describe. +In particular, the correctness theorem +of the un-optimised bit-coded lexer $\blexer$ in +chapter \ref{Bitcoded1} formalised by Ausaf et al. +relies on lemma \ref{retrieveStepwise} that says +any value can be retrieved in a stepwise manner: +\begin{center} + $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ +\end{center} +This no longer holds once we introduce simplifications. +To control the size of regular expressions during derivatives, +one has to eliminate redundant sub-expression with some +procedure we call $\textit{simp}$, +and $\textit{simp}$ is defined as +: +Having defined the $\textit{bsimp}$ function, +we add it as a phase after a derivative is taken. +\begin{center} + \begin{tabular}{lcl} + $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$ + \end{tabular} +\end{center} +%Following previous notations +%when extending from derivatives w.r.t.~character to derivative +%w.r.t.~string, we define the derivative that nests simplifications +%with derivatives:%\comment{simp in the [] case?} +We extend this from characters to strings: +\begin{center} +\begin{tabular}{lcl} +$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\ +$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ +\end{tabular} +\end{center} + +\noindent +The lexer that extracts bitcodes from the +derivatives with simplifications from our $\simp$ function +is called $\blexersimp$: + +\begin{center} + +\begin{center} + \begin{tabular}{lcl} + $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$ + \end{tabular} +\end{center} +\begin{tabular}{lcl} + $\textit{blexer\_simp}\;r\,s$ & $\dn$ & + $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ + & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ + & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ + & & $\;\;\textit{else}\;\textit{None}$ +\end{tabular} +\end{center} +\noindent +the redundant sub-expressions after each derivative operation +allows the exact structure of each intermediate result to be preserved, +so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and +$\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$ +and $v^{c} \dn \inj \;r \; c \; v$). + + +Define the +But $\blexersimp$ introduces simplification after the derivative +to reduce redundancy, +yielding $r \backslash c$ +This allows + +The proof details are necessary materials for this thesis +because it provides necessary context to explain why we need a +new framework for the proof of $\blexersimp$, which involves +simplifications that cause structural changes to the regular expression. +a new formal proof of the correctness of $\blexersimp$, where the +proof of $\blexer$ +is not applicatble in the sense that we cannot straightforwardly extend the +proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does +not hold anymore. +This is because the structural induction on the stepwise correctness +of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described +in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to +each other. In this chapter we introduce simplifications for annotated regular expressions that can be applied to each intermediate derivative result. This allows us to make $\blexer$ much more efficient. Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions, but their simplification functions could have been more efficient and in some cases needed fixing. + + + + +From this chapter we start with the main contribution of this thesis, which + +o +In particular, the $\blexer$ proof relies on a lockstep POSIX +correspondence between the lexical value and the +regular expression in each derivative and injection. + + +which is essential for getting an understanding this thesis +in chapter \ref{Bitcoded1}, which is necessary for understanding why +the proof + +In this chapter, + %We contrast our simplification function %with Sulzmann and Lu's, indicating the simplicity of our algorithm. %This is another case for the usefulness @@ -28,6 +134,10 @@ %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system. % \section{Simplifications by Sulzmann and Lu} +The algorithms $\lexer$ and $\blexer$ work beautifully as functional +programs, but not as practical code. One main reason for the slowness is due +to the size of intermediate representations--the derivative regular expressions +tend to grow unbounded if the matching involved a large number of possible matches. Consider the derivatives of the following example $(a^*a^*)^*$: %and $(a^* + (aa)^*)^*$: \begin{center} @@ -438,7 +548,7 @@ we add it as a phase after a derivative is taken. \begin{center} \begin{tabular}{lcl} - $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$ + $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$ \end{tabular} \end{center} %Following previous notations