lexing: changeset 650:a365d1364640

--- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Sun Jun 18 17:54:52 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Wed Jun 21 22:43:04 2023 +0100
@@ -16,6 +16,27 @@
 and previous
 chapters are essential background work for setting the scene of the formal proof we
 are about to describe.
+The proof details are necessary materials for this thesis
+because it provides necessary context to explain why we need a
+new framework for the proof of $\blexersimp$, which involves
+simplifications that cause structural changes to the regular expression.
+a new formal proof of the correctness of $\blexersimp$, where the 
+proof of $\blexer$
+is not applicatble in the sense that we cannot straightforwardly extend the
+proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does
+not hold anymore.
+%This is because the structural induction on the stepwise correctness
+%of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
+%in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
+%each other.
+%In this chapter we introduce simplifications
+%for annotated regular expressions that can be applied to 
+%each intermediate derivative result. This allows
+%us to make $\blexer$ much more efficient.
+%Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
+%but their simplification functions could have been more efficient and in some cases needed fixing.
+
+
 In particular, the correctness theorem 
 of the un-optimised bit-coded lexer $\blexer$ in 
 chapter \ref{Bitcoded1} formalised by Ausaf et al.
@@ -25,45 +46,16 @@
 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
 \end{center}
 This no longer holds once we introduce simplifications.
-To control the size of regular expressions during derivatives, 
-one has to eliminate redundant sub-expression with some
-procedure we call $\textit{simp}$, 
-and $\textit{simp}$ is defined as
-:
-
-
-
-Having defined the $\textit{bsimp}$ function,
-we add it as a phase after a derivative is taken.
-\begin{center}
-	\begin{tabular}{lcl}
-		$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
-	\end{tabular}
-\end{center}
-%Following previous notations
-%when extending from derivatives w.r.t.~character to derivative
-%w.r.t.~string, we define the derivative that nests simplifications 
-%with derivatives:%\comment{simp in  the [] case?}
-We extend this from characters to strings:
+Simplifications are necessary to control the size of regular expressions 
+during derivatives by eliminating redundant 
+sub-expression with some procedure we call $\textit{bsimp}$.
+We want to prove the correctness of $\blexersimp$ which integrates
+$\textit{bsimp}$ by applying it after each call to the derivative:
 \begin{center}
 \begin{tabular}{lcl}
-$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
+	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\
 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
 \end{tabular}
-\end{center}
-
-\noindent
-The lexer that extracts bitcodes from the 
-derivatives with simplifications from our $\simp$ function
-is called $\blexersimp$:
-
-\begin{center}
-
-\begin{center}
-	\begin{tabular}{lcl}
-		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
-	\end{tabular}
-\end{center}
 \begin{tabular}{lcl}
   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
@@ -73,38 +65,24 @@
 \end{tabular}
 \end{center}
 \noindent
-the redundant sub-expressions after each derivative operation
-allows the exact structure of each intermediate result to be preserved,
-so that pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
-$\vdash v^{c} : r $ hold (if we allow the abbreviation $r_{c} \dn r\backslash c$
-and $v^{c} \dn \inj \;r \; c \; v$).
-
-
-Define the 
-But $\blexersimp$ introduces simplification after the derivative
-to reduce redundancy,
-yielding $r \backslash c$ 
-This allows 
+Previously without $\textit{bsimp}$ the exact structure of each intermediate 
+regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
+$\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if 
+we use the convenient notation $r_{c} \dn r\backslash c$
+and $v_{r}^{c} \dn \inj \;r \; c \; v$),
+but $\blexersimp$ introduces simplification after the derivative,
+getting us trouble in aligning the pairs:
+\begin{center}
+	$\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r  \;(\mathord{?} v_{r}^{c}) $
+\end{center}
+\noindent
+It is quite clear that once we made 
+$v$ to align with $\textit{bsimp} \; r_{c}$
+in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged
+in for the above statement to hold.
+Ausaf et al. \cite{AusafUrbanDyckhoff2016}
+made some initial attempts on the un-annotated lexer (to be continued)
 
-The proof details are necessary materials for this thesis
-because it provides necessary context to explain why we need a
-new framework for the proof of $\blexersimp$, which involves
-simplifications that cause structural changes to the regular expression.
-a new formal proof of the correctness of $\blexersimp$, where the 
-proof of $\blexer$
-is not applicatble in the sense that we cannot straightforwardly extend the
-proof of theorem \ref{blexerCorrect} because lemma \ref{flex_retrieve} does
-not hold anymore.
-This is because the structural induction on the stepwise correctness
-of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
-in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
-each other.
-In this chapter we introduce simplifications
-for annotated regular expressions that can be applied to 
-each intermediate derivative result. This allows
-us to make $\blexer$ much more efficient.
-Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
-but their simplification functions could have been more efficient and in some cases needed fixing.
author	Chengsong
	Wed, 21 Jun 2023 22:43:04 +0100
changeset 650	a365d1364640
parent 649	ef2b8abcbc55
child 651	deb35fd780fe