\label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
%Then we illustrate how the algorithm without bitcodes falls short for such aggressive
%simplifications and therefore introduce our version of the bitcoded algorithm and
%its correctness proof in
%Chapter 3\ref{Chapter3}.
\section{Overview}

This chapter is the point at which the novel contributions of this PhD project
are introduced in detail; the previous chapters provide the essential
background for the formal proof we are about to describe.
In particular, the correctness theorem
of the un-optimised bit-coded lexer $\blexer$ in
chapter \ref{Bitcoded1}, formalised by Ausaf et al.,
relies on lemma \ref{retrieveStepwise}, which says that
any value can be retrieved in a stepwise manner:
|
\begin{center}
$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v = \retrieve \; r \; (\inj \; r\; c\; v)$
\end{center}
|
This no longer holds once we introduce simplifications.
To control the size of regular expressions during the derivative steps,
one has to eliminate redundant sub-expressions with a simplification
procedure, which we call $\textit{bsimp}$.
Given such a $\textit{bsimp}$ function, we add it as a phase after each
derivative step:
|
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$
\end{tabular}
\end{center}
|
%Following previous notations
%when extending from derivatives w.r.t.~character to derivative
%w.r.t.~string, we define the derivative that nests simplifications
%with derivatives:%\comment{simp in the [] case?}
We extend this from characters to strings:
|
\begin{center}
\begin{tabular}{lcl}
$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
\end{tabular}
\end{center}
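
\noindent
To illustrate how these two definitions fit together, here is a minimal,
self-contained Scala sketch of derivatives with interleaved simplification.
For brevity it operates on plain (un-annotated) regular expressions rather
than bitcoded ones, and its \texttt{simp} function only performs a few
typical size-reducing rewrites; it is an illustration of the idea, not the
$\textit{bsimp}$ function formalised in this chapter. The function
\texttt{derSimp} plays the role of $\backslash_{bsimp}$ for a single
character, and \texttt{dersSimp} that of $\backslash_{bsimps}$ for strings.
\begin{verbatim}
// Plain regular expressions (no bitcodes), kept deliberately small.
sealed trait Rexp
case object ZERO extends Rexp                      // matches nothing
case object ONE extends Rexp                       // matches the empty string
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO        => false
  case ONE         => true
  case CHAR(_)     => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_)     => true
}

// Brzozowski derivative with respect to a single character.
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO        => ZERO
  case ONE         => ZERO
  case CHAR(d)     => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1)    => SEQ(der(c, r1), STAR(r1))
}

// A few typical size-reducing rewrites: drop ZEROs, drop a leading ONE in
// a sequence, and collapse syntactically identical alternatives.
def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, s2) => s2
    case (s1, ZERO) => s1
    case (s1, s2)   => if (s1 == s2) s1 else ALT(s1, s2)
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _)  => ZERO
    case (_, ZERO)  => ZERO
    case (ONE, s2)  => s2
    case (s1, s2)   => SEQ(s1, s2)
  }
  case other => other
}

// r \_bsimp c : simplify after each single-character derivative ...
def derSimp(r: Rexp, c: Char): Rexp = simp(der(c, r))

// ... and r \_bsimps s : fold that step over the characters of a string.
def dersSimp(r: Rexp, s: List[Char]): Rexp = s match {
  case Nil     => r
  case c :: cs => dersSimp(derSimp(r, c), cs)
}
\end{verbatim}
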
|
\noindent
The lexer that extracts bitcodes from the
derivatives simplified by our $\textit{bsimp}$ function
is called $\blexersimp$:
|
\begin{center}
\begin{tabular}{lcl}
$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
      $\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
 & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
 & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
 & & $\;\;\textit{else}\;\textit{None}$
\end{tabular}
\end{center}
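
\noindent
The Scala sketch below mirrors only the structure of $\blexersimp$. The
bitcoded primitives ($r^\uparrow$, $\backslash_{bsimp}$, $\textit{bnullable}$,
$\textit{bmkeps}$ and $\textit{decode}$) are deliberately left abstract,
because their concrete definitions belong to chapter \ref{Bitcoded1} and to
the simplification developed in this chapter; the Scala names used for them
here are placeholders of our own choosing.
\begin{verbatim}
// A structural sketch of blexer_simp only: the bitcoded primitives are
// left abstract here, because their concrete definitions are the subject
// of the earlier chapter on bitcoded lexing and of this chapter.
trait BitcodedPrimitives {
  type Rexp; type ARexp; type Bit; type Val
  def internalise(r: Rexp): ARexp          // r with empty bitcodes (r^up)
  def bderSimp(a: ARexp, c: Char): ARexp   // a \_bsimp c  =  bsimp(a \ c)
  def bnullable(a: ARexp): Boolean
  def bmkeps(a: ARexp): List[Bit]
  def decode(bs: List[Bit], r: Rexp): Val
}

trait BlexerSimp extends BitcodedPrimitives {
  // a \_bsimps s : fold the simplifying derivative over the string
  def bdersSimp(a: ARexp, s: List[Char]): ARexp =
    s.foldLeft(a)(bderSimp)

  // blexer_simp r s : derive with simplification, then, if the result is
  // (bit-)nullable, extract the bitcodes and decode them into a value
  def blexerSimp(r: Rexp, s: String): Option[Val] = {
    val a = bdersSimp(internalise(r), s.toList)
    if (bnullable(a)) Some(decode(bmkeps(a), r)) else None
  }
}
\end{verbatim}
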
|
\noindent
In $\blexer$, keeping the redundant sub-expressions after each derivative operation
allows the exact structure of each intermediate result to be preserved,
so that pairs of inhabitation relations of the form $\vdash v : r_{c}$ and
$\vdash v^{c} : r$ hold (if we allow the abbreviations $r_{c} \dn r\backslash c$
and $v^{c} \dn \inj \;r \; c \; v$).
$\blexersimp$, however, applies simplification after each derivative
to reduce this redundancy, yielding $\textit{bsimp}(r \backslash c)$
rather than $r \backslash c$, and this stepwise correspondence is lost.
|
These proof details matter for this thesis because they provide the context
needed to explain why a new framework is required for proving $\blexersimp$
correct: its simplifications cause structural changes to the regular
expressions, so the correctness proof of $\blexer$ is not applicable.
In particular, we cannot straightforwardly extend the
proof of theorem \ref{blexerCorrect}, because lemma \ref{flex_retrieve} no
longer holds.
The structural induction on the stepwise correctness
of $\inj$ breaks down, since the pairs $r_i$ and $v_i$ described
in chapters \ref{Inj} and \ref{Bitcoded1} no longer correspond to
each other.

In this chapter we introduce simplifications
for annotated regular expressions that can be applied to
each intermediate derivative result. This allows
us to make $\blexer$ much more efficient.
Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
but their simplification functions could have been more efficient and in some cases needed fixing.
|
In particular, the $\blexer$ proof relies on a lockstep POSIX
correspondence between the lexical value and the
regular expression in each derivative and injection step.
The material in chapter \ref{Bitcoded1} is therefore necessary for
understanding why this correspondence, and with it the original proof,
breaks down once simplifications are applied.
%We contrast our simplification function
%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
%This is another case for the usefulness
%and reliability of formal proofs on algorithms.
%These ``aggressive'' simplifications would not be possible in the injection-based
%We then prove the correctness with the improved version of
%$\blexer$, called $\blexersimp$, by establishing
%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
%
\section{Simplifications by Sulzmann and Lu}
|
The algorithms $\lexer$ and $\blexer$ work beautifully as functional
programs, but not as practical code. One main reason for the slowness is
the size of the intermediate representations: the derivative regular expressions
tend to grow without bound if the matching involves a large number of possible matches.
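
To make this growth concrete, the following Scala snippet (which assumes the
\texttt{Rexp}, \texttt{nullable} and \texttt{der} definitions from the sketch
in the Overview section are in scope) measures the size of successive
derivatives of $(a^*a^*)^*$ taken without any simplification; the node counts
grow very rapidly with the number of characters consumed.
\begin{verbatim}
// Reuses Rexp, nullable and der from the sketch in the Overview section.

// Count the nodes of a regular expression to measure its size.
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CHAR(_) => 1
  case ALT(r1, r2)          => 1 + size(r1) + size(r2)
  case SEQ(r1, r2)          => 1 + size(r1) + size(r2)
  case STAR(r1)             => 1 + size(r1)
}

// Plain derivative over a string, with no simplification in between.
def ders(r: Rexp, s: List[Char]): Rexp = s match {
  case Nil     => r
  case c :: cs => ders(der(c, r), cs)
}

@main def derivativeBlowUp(): Unit = {
  val r = STAR(SEQ(STAR(CHAR('a')), STAR(CHAR('a'))))   // (a*a*)*
  for (n <- 0 to 12)
    println(s"n = $n: derivative size = ${size(ders(r, List.fill(n)('a')))}")
}
\end{verbatim}
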
Consider the derivatives of the following example $(a^*a^*)^*$:
%and $(a^* + (aa)^*)^*$:
\begin{center}
\begin{tabular}{lcl}
$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ &