ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 649 ef2b8abcbc55
parent 640 bd1354127574
child 650 a365d1364640
--- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Fri May 26 23:42:15 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Sun Jun 18 17:54:52 2023 +0100
@@ -8,15 +8,121 @@
 %simplifications and therefore introduce our version of the bitcoded algorithm and 
 %its correctness proof in 
 %Chapter 3\ref{Chapter3}. 
+\section{Overview}
+
+This chapter
+is the point from which novel contributions of this PhD project are introduced
+in detail, 
+and previous
+chapters are essential background work for setting the scene of the formal proof we
+are about to describe.
+In particular, the correctness theorem 
+of the un-optimised bit-coded lexer $\blexer$ in 
+chapter \ref{Bitcoded1} formalised by Ausaf et al.
+relies on lemma \ref{retrieveStepwise} that says
+any value can be retrieved in a stepwise manner:
+\begin{center}	
+	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
+\end{center}
+This no longer holds once we introduce simplifications.
+To control the size of regular expressions during derivatives, 
+one has to eliminate redundant sub-expression with some
+procedure we call $\textit{simp}$, 
+and $\textit{simp}$ is defined as
+:
 
 
 
+Having defined the $\textit{bsimp}$ function,
+we add it as a phase after a derivative is taken.
+\begin{center}
+	\begin{tabular}{lcl}
+		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
+	\end{tabular}
+\end{center}
+%Following previous notations
+%when extending from derivatives w.r.t.~character to derivative
+%w.r.t.~string, we define the derivative that nests simplifications 
+%with derivatives:%\comment{simp in  the [] case?}
+We extend this from characters to strings:
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
+$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent
+The lexer that extracts bitcodes from the 
+derivatives with simplifications from our $\simp$ function
+is called $\blexersimp$:
+
+\begin{center}
+
+\begin{center}
+	\begin{tabular}{lcl}
+		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
+	\end{tabular}
+\end{center}
+\begin{tabular}{lcl}
+  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
+  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+  & & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+\noindent
+the redundant sub-expressions after each derivative operation
+allows the exact structure of each intermediate result to be preserved,
+so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
+$\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$
+and $v^{c} \dn \inj \;r \; c \; v$).
+
+
+Define the 
+But $\blexersimp$ introduces simplification after the derivative
+to reduce redundancy,
+yielding $r \backslash c$ 
+This allows 
+
+The proof details are necessary materials for this thesis
+because it provides necessary context to explain why we need a
+new framework for the proof of $\blexersimp$, which involves
+simplifications that cause structural changes to the regular expression.
+a new formal proof of the correctness of $\blexersimp$, where the 
+proof of $\blexer$
+is not applicatble in the sense that we cannot straightforwardly extend the
+proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does
+not hold anymore.
+This is because the structural induction on the stepwise correctness
+of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
+in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
+each other.
 In this chapter we introduce simplifications
 for annotated regular expressions that can be applied to 
 each intermediate derivative result. This allows
 us to make $\blexer$ much more efficient.
 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
 but their simplification functions could have been more efficient and in some cases needed fixing.
+
+
+
+
+From this chapter we start with the main contribution of this thesis, which
+
+o
+In particular, the $\blexer$ proof relies on a lockstep POSIX
+correspondence between the lexical value and the
+regular expression in each derivative and injection.
+
+
+which is essential for getting an understanding this thesis
+in chapter \ref{Bitcoded1}, which is necessary for understanding why
+the proof 
+
+In this chapter,
+
 %We contrast our simplification function 
 %with Sulzmann and Lu's, indicating the simplicity of our algorithm.
 %This is another case for the usefulness 
@@ -28,6 +134,10 @@
 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
 %
 \section{Simplifications by Sulzmann and Lu}
+The algorithms $\lexer$ and $\blexer$ work beautifully as functional 
+programs, but not as practical code. One main reason for the slowness is due
+to the size of intermediate representations--the derivative regular expressions
+tend to grow unbounded if the matching involved a large number of possible matches.
 Consider the derivatives of the following example $(a^*a^*)^*$:
 %and $(a^* + (aa)^*)^*$:
 \begin{center}
@@ -438,7 +548,7 @@
 we add it as a phase after a derivative is taken.
 \begin{center}
 	\begin{tabular}{lcl}
-		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
+		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
 	\end{tabular}
 \end{center}
 %Following previous notations