--- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Fri May 26 23:42:15 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sun Jun 18 17:54:52 2023 +0100
@@ -8,15 +8,121 @@
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.
+\section{Overview}
+
+This chapter
+is the point from which novel contributions of this PhD project are introduced
+in detail,
+and previous
+chapters are essential background work for setting the scene of the formal proof we
+are about to describe.
+In particular, the correctness theorem
+of the un-optimised bit-coded lexer $\blexer$ in
+chapter \ref{Bitcoded1} formalised by Ausaf et al.
+relies on lemma \ref{retrieveStepwise} that says
+any value can be retrieved in a stepwise manner:
+\begin{center}
+ $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$
+\end{center}
+This no longer holds once we introduce simplifications.
+To control the size of regular expressions during derivatives,
+one has to eliminate redundant sub-expression with some
+procedure we call $\textit{simp}$,
+and $\textit{simp}$ is defined as
+:
+Having defined the $\textit{bsimp}$ function,
+we add it as a phase after a derivative is taken.
+\begin{center}
+ \begin{tabular}{lcl}
+ $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
+ \end{tabular}
+\end{center}
+%Following previous notations
+%when extending from derivatives w.r.t.~character to derivative
+%w.r.t.~string, we define the derivative that nests simplifications
+%with derivatives:%\comment{simp in the [] case?}
+We extend this from characters to strings:
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
+$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent
+The lexer that extracts bitcodes from the
+derivatives with simplifications from our $\simp$ function
+is called $\blexersimp$:
+
+\begin{center}
+
+\begin{center}
+ \begin{tabular}{lcl}
+ $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
+ \end{tabular}
+\end{center}
+\begin{tabular}{lcl}
+ $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+ $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\
+ & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+ & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+ & & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+\noindent
+the redundant sub-expressions after each derivative operation
+allows the exact structure of each intermediate result to be preserved,
+so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
+$\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$
+and $v^{c} \dn \inj \;r \; c \; v$).
+
+
+Define the
+But $\blexersimp$ introduces simplification after the derivative
+to reduce redundancy,
+yielding $r \backslash c$
+This allows
+
+The proof details are necessary materials for this thesis
+because it provides necessary context to explain why we need a
+new framework for the proof of $\blexersimp$, which involves
+simplifications that cause structural changes to the regular expression.
+a new formal proof of the correctness of $\blexersimp$, where the
+proof of $\blexer$
+is not applicatble in the sense that we cannot straightforwardly extend the
+proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does
+not hold anymore.
+This is because the structural induction on the stepwise correctness
+of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
+in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
+each other.
In this chapter we introduce simplifications
for annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions could have been more efficient and in some cases needed fixing.
+
+
+
+
+From this chapter we start with the main contribution of this thesis, which
+
+o
+In particular, the $\blexer$ proof relies on a lockstep POSIX
+correspondence between the lexical value and the
+regular expression in each derivative and injection.
+
+
+which is essential for getting an understanding this thesis
+in chapter \ref{Bitcoded1}, which is necessary for understanding why
+the proof
+
+In this chapter,
+
%We contrast our simplification function
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness
@@ -28,6 +134,10 @@
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
+The algorithms $\lexer$ and $\blexer$ work beautifully as functional
+programs, but not as practical code. One main reason for the slowness is due
+to the size of intermediate representations--the derivative regular expressions
+tend to grow unbounded if the matching involved a large number of possible matches.
Consider the derivatives of the following example $(a^*a^*)^*$:
%and $(a^* + (aa)^*)^*$:
\begin{center}
@@ -438,7 +548,7 @@
we add it as a phase after a derivative is taken.
\begin{center}
\begin{tabular}{lcl}
- $r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
+ $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
\end{tabular}
\end{center}
%Following previous notations