lexing: comparison ChengsongTanPhdThesis/Chapters/Bitcoded2.tex


-:d8f82c690b32
+:753a3b0ee02b
 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive
 %simplifications and therefore introduce our version of the bitcoded algorithm and
 %its correctness proof in
 %Chapter 3\ref{Chapter3}.
-\section{Overview}
+%\section{Overview}
-\marginpar{\em Added a completely new \\overview section, \\highlighting contributions.}
+\marginpar{\em Added a completely new \\overview section, \\highlighting\\ contributions.}
 This chapter
 is the point from which novel contributions of this PhD project are introduced
 in detail.
 The material in the
 previous
 chapters is necessary for this thesis,
 because it provides the context for why we need a new framework for
 the proof of $\blexersimp$.
+We will first explain why aggressive simplifications are needed, after which we
+present our algorithm, contrasting it with Sulzmann and Lu's simplifications.
+We then explain how our simplifications make
+reusing $\blexer$'s correctness proof impossible.
+%with some minor modifications
+We discuss possible fixes such as rectification functions and then introduce our proof,
+which involves a weaker inductive
+invariant than that used in the correctness proof of $\blexer$.
+\marginpar{Shortened overview.}
 %material for setting the scene of the formal proof we
 %are about to describe.
-The fundamental reason is we cannot extend the correctness proof of theorem 4
-because lemma 13 does not hold anymore when simplifications are involved.
-\marginpar{\em rephrased things \\so why new \\proof makes sense.}
-%The proof details are necessary materials for this thesis
-%because it provides necessary context to explain why we need a
-%new framework for the proof of $\blexersimp$, which involves
-%simplifications that cause structural changes to the regular expression.
-%A new formal proof of the correctness of $\blexersimp$, where the
-%proof of $\blexer$
-%is not applicatble in the sense that we cannot straightforwardly extend the
-%proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does
-%not hold anymore.
-%This is because the structural induction on the stepwise correctness
-%of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
-%in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
-%each other.
-%In this chapter we introduce simplifications
-%for annotated regular expressions that can be applied to
-%each intermediate derivative result. This allows
-%us to make $\blexer$ much more efficient.
-%Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
-%but their simplification functions could have been more efficient and in some cases needed fixing.
-In particular, the correctness theorem
-of the un-optimised bit-coded lexer $\blexer$ in
-chapter \ref{Bitcoded1} formalised by Ausaf et al.
-relies crucially on lemma \ref{retrieveStepwise} that says
-any value can be retrieved in a stepwise manner, namely:
-\begin{equation}\label{eq:stepwise}%eqref: this proposition needs to be referred
-	\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)
-\end{equation}
-%This no longer holds once we introduce simplifications.
-Simplifications are necessary to control the size of derivatives,
-but they also destroy the structures of the regular expressions
-such that \ref{eq:stepwise} does not hold.
-We want to prove the correctness of $\blexersimp$ which integrates
-$\textit{bsimp}$ by applying it after each call to the derivative:
-\begin{center}
-\begin{tabular}{lcl}
-	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\
-$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\begin{tabular}{lcl}
-$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
-$\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\
-& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
-& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
-& & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-\noindent
-Previously without $\textit{bsimp}$ the exact structure of each intermediate
-regular expression is preserved, allowing pairs of inhabitation relations in the form $\vdash v : r_{c} $ and
-$\vdash v^{c} : r $ to hold in lemma \ref{retrieveStepwise}(if
-we use the convenient notation $r_{c} \dn r\backslash c$
-and $v_{r}^{c} \dn \inj \;r \; c \; v$),
-but $\blexersimp$ introduces simplification after the derivative,
-making it difficult to align the pairs:
-\begin{center}
-	$\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; (\textit{bsimp} \; r_c) \; v =\retrieve \; r  \;(\mathord{?} v_{r}^{c}) $
-\end{center}
-\noindent
-It is clear that once we made
-$v$ to align with $\textit{bsimp} \; r_{c}$
-in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged
-in for the above statement to hold.
-Ausaf et al. \cite{AusafUrbanDyckhoff2016}
-made some initial attempts with this idea, see \cite{FahadThesis}
-for details.
-They added
-and then rectify it to
-this works fine, however that limits the kind of simplifications you can introduce.
-We cannot use their idea for our very strong simplification rules.
-Therefore we take our route
-a wea
-The other route is to dispose of lemma \ref{retrieveStepwise},
-and prove a weakened inductive invariant instead.
-We adopt this approach in this thesis.
-Let us first explain why the requirement in $\blexer$'s proof
-is too strong, and suggest a few possible fixes, which leads to
-our proof which we believe was the most natural and effective method.
-\section{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong}
-%From this chapter we start with the main contribution of this thesis, which
-The $\blexer$ proof relies on a lockstep POSIX
-correspondence between the lexical value and the
-regular expression in each derivative and injection.
-If we zoom into the diagram \ref{graph:inj} and look specifically at
-the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating
-the invariant that the same bitcodes can be extracted from the pairs:
-\tikzset{three sided/.style={
-draw=none,
-append after command={
-[-,shorten <= -0.5\pgflinewidth]
-([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
-edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
-([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
-edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
-}
-}
-}
-\tikzset{three sided1/.style={
-draw=none,
-append after command={
-[-,shorten <= -0.5\pgflinewidth]
-([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
-edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
-([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
-edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
-([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
-edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-}
-}
-}
-\begin{center}
-	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
-		\node [rectangle ] (1)  at (-7, 2) {$\ldots$};
-		\node [rectangle, draw] (2) at  (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
-		\node [rectangle, draw] (3) at  (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_0a+_1aa)^*$};
-		\node [rectangle] (4) at  (9, 2) {$\ldots$};
-		\node [rectangle] (5) at  (-7, -2) {$\ldots$};
-		\node [rectangle, draw] (6) at  (-4, -2) {$v_i = \Stars \; [\Left (a)]$};
-		\node [rectangle, draw] (7) at  ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
-		\node [rectangle] (8) at  ( 9, -2) {$\ldots$};
-		\node [rectangle] (9) at  (-7, -6) {$\ldots$};
-		\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = \retrieve \; r_i\;v_i$};
-		\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = \retrieve \; r_{i+1}\;v_{i+1}$};
-		\node [rectangle] (12) at  (9, -6) {$\ldots$};
-		\path (1) edge [] node {} (2);
-		\path (5) edge [] node {} (6);
-		\path (9) edge [] node {} (10);
-		\path (11) edge [] node {} (12);
-		\path (7) edge [] node {} (8);
-		\path (3) edge [] node {} (4);
-		\path (2) edge [] node {$\backslash a$} (3);
-		\path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6);
-		\path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7);
-		%\path (6) edge [] node {$\vdash v_i : r_i$} (10);
-		%\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11);
-		\path (10) edge [dashed, <->] node {$=$} (11);
-		\path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6);
-%		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
-%		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
-%		\path	(r)
-%			edge [] node {$\internalise$} (a);
-%		\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
-%		\path	(a)
-%			edge [] node {$\backslash a$} (a1);
-%
-%		\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
-%		\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
-%		\path	(a1)
-%			edge [] node {$\backslash a$} (a21);
-%		\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
-%		\path	(a22)
-%			edge [] node {$\backslash b$} (a3);
-%		\path	(a21)
-%			edge [dashed, bend right] node {} (a3);
-%		\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
-%		\path	(a3)
-%			edge [below] node {$\bmkeps$} (bs);
-%		\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
-%		\path 	(bs)
-%			edge [] node {$\decode$} (v);
-	\end{tikzpicture}
-	%\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
-\end{center}
-When simplifications are added, the inhabitation relation no longer holds,
-causing the above diagram to break.
-Ausaf addressed this with an augmented lexer he called $\textit{slexer}$.
-we note that the invariant
-$\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} $ is too strong
-to maintain because the precondition $\vdash v_i : r_i$ is too weak.
-It does not require $v_i$ to be a POSIX value
-{\color{red} \rule{\linewidth}{0.5mm}}
-New content ends
-{\color{red} \rule{\linewidth}{0.5mm}}
-%
-%
-%which is essential for getting an understanding this thesis
-%in chapter \ref{Bitcoded1}, which is necessary for understanding why
-%the proof
-%
-%In this chapter,
-%We contrast our simplification function
-%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
-%This is another case for the usefulness
-%and reliability of formal proofs on algorithms.
-%These ``aggressive'' simplifications would not be possible in the injection-based
-%lexing we introduced in chapter \ref{Inj}.
-%We then prove the correctness with the improved version of
-%$\blexer$, called $\blexersimp$, by establishing
-%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
-%
 \section{Simplifications by Sulzmann and Lu}
+\marginpar{moved \\simplification \\section to front \\to make coherent\\ sense.}
 The algorithms $\lexer$ and $\blexer$ work beautifully as functional
 programs, but not as practical code. One main reason for the slowness is
 the size of intermediate representations: the derivative regular expressions
 tend to grow without bound if the matching involves a large number of possible matches.
 Consider the derivatives of the following example $(a^*a^*)^*$:
 							     & $\stackrel{\backslash a}{
 	\longrightarrow} $ & $\ldots$\\
 	\end{tabular}
 \end{center}
 \noindent
-As can be seen, there are several duplications.
+From the second derivative onwards, several duplicate sub-expressions
+already need to be eliminated (possible
+bitcodes are omitted to make the presentation more concise
+because they are not the key part of the simplifications).
 A simple-minded simplification function cannot simplify
 the third regular expression in the above chain of derivative
 regular expressions, namely
 \begin{center}
 $((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
 unpredictable and inefficient.
 %To not get ``caught off guard'' by
 %these counterexamples,
 %one needs to be more careful when designing the
 %simplification function and making claims about them.
 \section{Our $\textit{Simp}$ Function}
 We will now introduce our own simplification function.
 %by making a contrast with $\textit{simp}\_{SL}$.
 We also describe
 Given the size difference, it is not
 surprising that our $\blexersimp$ significantly outperforms
 $\textit{blexer\_SLSimp}$ by Sulzmann and Lu.
 In the next section we are going to establish that our
 simplification preserves the correctness of the algorithm.
+\section{Why $\textit{Blexer}$'s Proof Does Not Work}
+The fundamental reason is that we cannot extend the correctness proof of theorem 4,
+because lemma 13 no longer holds when simplifications are involved.
+\marginpar{\em rephrased things \\so why new \\proof makes sense.}
+%The proof details are necessary materials for this thesis
+%because it provides necessary context to explain why we need a
+%new framework for the proof of $\blexersimp$, which involves
+%simplifications that cause structural changes to the regular expression.
+%A new formal proof of the correctness of $\blexersimp$, where the
+%proof of $\blexer$
+%is not applicatble in the sense that we cannot straightforwardly extend the
+%proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does
+%not hold anymore.
+%This is because the structural induction on the stepwise correctness
+%of $\inj$ breaks due to each pair of $r_i$ and $v_i$ described
+%in chapter \ref{Inj} and \ref{Bitcoded1} no longer correspond to
+%each other.
+%In this chapter we introduce simplifications
+%for annotated regular expressions that can be applied to
+%each intermediate derivative result. This allows
+%us to make $\blexer$ much more efficient.
+%Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
+%but their simplification functions could have been more efficient and in some cases needed fixing.
+In particular, the correctness theorem
+of the un-optimised bit-coded lexer $\blexer$ in
+chapter \ref{Bitcoded1}, formalised by Ausaf et al.,
+relies crucially on lemma \ref{retrieveStepwise}, which says that
+any value can be retrieved in a stepwise manner, namely:
+\begin{equation}\label{eq:stepwise}%eqref: this proposition needs to be referred
+	\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)
+\end{equation}
+%This no longer holds once we introduce simplifications.
+Simplifications are necessary to control the size of derivatives,
+but they also destroy the structure of the regular expressions,
+so that property \ref{eq:stepwise} no longer holds.
+We want to prove the correctness of $\blexersimp$, which integrates
+$\textit{bsimp}$ by applying it after each call to the derivative:
+\begin{center}
+\begin{tabular}{lcl}
+	$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\
+$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\begin{tabular}{lcl}
+$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+$\textit{let}\;a = (r^\uparrow)\backslash_{bsimps}\, s\;\textit{in}$\\
+& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+& & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
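+\noindent
+As a small worked instance of the first definition (a sketch only; the
+character names $c_1$ and $c_2$ are chosen here purely for illustration),
+unfolding it on the two-character string $[c_1, c_2]$ gives
+\[
+	r \backslash_{bsimps} [c_1, c_2]
+	\;=\; (\textit{bsimp} \; (r \backslash c_1)) \backslash_{bsimps} [c_2]
+	\;=\; \textit{bsimp} \; ((\textit{bsimp} \; (r \backslash c_1)) \backslash c_2)
+\]
+so that every single derivative step is immediately followed by a call to $\textit{bsimp}$.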
+\noindent
+Previously, without $\textit{bsimp}$, the exact structure of each intermediate
+regular expression was preserved, allowing the pair of inhabitation relations
+$\vdash v : r \backslash c $ and
+$\vdash \inj \; r\; c \; v : r $ to hold in \ref{eq:stepwise}.
+But $\blexersimp$ introduces simplification after the derivative,
+making it difficult to align the structures of values and regular expressions.
+If we change the form of property \ref{eq:stepwise} to
+adapt to the needs of $\blexersimp$, the precondition becomes
+\[
+	\vdash v : (\textit{bsimp} \; (r\backslash c))
+\]
+The inhabitation relation of the other pair,
+\[
+	\vdash \inj \; r \; c \; v : r
+\]
+no longer holds,
+because $\inj$ does not work on the simplified value $v$
+and the unsimplified regular expression $r$.
+The $\retrieve$ function will not work either.
+It seems unclear what procedure needs to be
+used to create a new value $v_?$ such that
+\[
+	\vdash v_? : r \; \text{and} \; \retrieve \; r \; v_?   = \retrieve \; (\textit{bsimp} \; (r\backslash c)) \; v
+\]
+hold.
+%It is clear that once we made
+%$v$ to align with $\textit{bsimp} \; r_{c}$
+%in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged
+%in for the above statement to hold.
+Ausaf et al. \cite{AusafDyckhoffUrban2016}
+used what they call rectification functions to restore the original value from the simplified value.
+The idea is that a simplification function not only returns a regular expression,
+but also a rectification function
+\[
+	\textit{simp}^{rect} : \textit{Regex} \Rightarrow (\textit{Value} \Rightarrow \textit{Value}) \times \textit{Regex}
+%\textit{frect} : Value \Rightarrow Value
+\]
+that is recorded recursively,
+and then applied to the previous value
+to obtain the correct value for $\inj$ to work on.
+The recursive case of the lexer is defined as something like
+\[
+	\textit{slexer} \; r \; (c\!::\!s) \dn \textit{let} \;(\textit{frect}, r_c) = \textit{simp}^{rect} \;(r \backslash c) \;\; \textit{in}\;\;
+	\inj \; r \; c \; (\textit{frect} \; (\textit{slexer} \; r_c\; s))
+	%\textit{match} \; s \; \textit{case} [
+\]
+However, this approach (including $\textit{slexer}$'s correctness proof) only
+works without bitcodes, and it limits the kind of simplifications one can introduce.
+%and they have not yet extended their relatively simple simplifications
+%to more aggressive ones.
+See the thesis by Ausaf \cite{Ausaf}
+for details.
+%\begin{center}
+%	$\vdash v:  (r\backslash c) \implies \retrieve \; (\mathord{?}(\textit{bsimp} \; r_c)) \; v =\retrieve \; r  \;(\mathord{?} v_{r}^{c}) $
+%\end{center}
+%\noindent
+We were not able to use their idea for our very strong simplification rules.
+Therefore we take another route that completely
+disposes of lemma \ref{retrieveStepwise}
+and proves a weakened inductive invariant instead.
+Let us first explain why lemma \ref{retrieveStepwise}'s requirement
+is too strong and suggest a few possible fixes, which lead to
+our proof, which we believe is the most natural and effective method.
+\section{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong}
+%From this chapter we start with the main contribution of this thesis, which
+The $\blexer$ proof relies on a lockstep POSIX
+correspondence between the lexical value and the
+regular expression in each derivative and injection.
+If we zoom into the diagram \ref{graph:inj} and look specifically at
+the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating
+the invariant that the same bitcodes can be extracted from the pairs:
+\tikzset{three sided/.style={
+draw=none,
+append after command={
+[-,shorten <= -0.5\pgflinewidth]
+([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+}
+}
+}
+\tikzset{three sided1/.style={
+draw=none,
+append after command={
+[-,shorten <= -0.5\pgflinewidth]
+([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+}
+}
+}
+\begin{center}
+	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
+		%\node [rectangle ] (1)  at (-7, 2) {$\ldots$};
+		%\node [rectangle, draw] (2) at  (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
+		%\node [rectangle, draw] (3) at  (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
+		%\node [rectangle] (4) at  (9, 2) {$\ldots$};
+		%\node [rectangle] (5) at  (-7, -2) {$\ldots$};
+		%\node [rectangle, draw] (6) at  (-4, -2) {$v_i = \Stars \; [\Left (a)]$};
+		%\node [rectangle, draw] (7) at  ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
+		%\node [rectangle] (8) at  ( 9, -2) {$\ldots$};
+		%\node [rectangle] (9) at  (-7, -6) {$\ldots$};
+		%\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
+		%\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
+		%\node [rectangle] (12) at  (9, -6) {$\ldots$};
+		\node [rectangle ] (1)  at (-8, 2) {$\ldots$};
+		\node [rectangle, draw] (2) at  (-5, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
+		\node [rectangle, draw] (3) at  (3, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
+		\node [rectangle] (4) at  (8, 2) {$\ldots$};
+		\node [rectangle] (5) at  (-8, -2) {$\ldots$};
+		\node [rectangle, draw] (6) at  (-5, -2) {$v_i = \Stars \; [\Left (a)]$};
+		\node [rectangle, draw] (7) at  ( 3, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
+		\node [rectangle] (8) at  ( 8, -2) {$\ldots$};
+		\node [rectangle] (9) at  (-8, -6) {$\ldots$};
+		\node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
+		\node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
+		\node [rectangle] (12) at  (8, -6) {$\ldots$};
+		\path (1) edge [] node {} (2);
+		\path (5) edge [] node {} (6);
+		\path (9) edge [] node {} (10);
+		\path (11) edge [] node {} (12);
+		\path (7) edge [] node {} (8);
+		\path (3) edge [] node {} (4);
+		\path (6) edge [dashed,bend right = 30] node {$\retrieve \; r_i \; v_i$} (10);
+		\path (2) edge [dashed,bend left = 48] node {} (10);
+		\path (7) edge [dashed,bend right = 30] node {$\retrieve \; r_{i+1} \; v_{i+1}$} (11);
+		\path (3) edge [dashed,bend left = 45] node {} (11);
+		\path (2) edge [] node {$\backslash a$} (3);
+		\path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6);
+		\path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7);
+		%\path (6) edge [] node {$\vdash v_i : r_i$} (10);
+		%\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11);
+		\path (10) edge [dashed, <->] node {$=$} (11);
+		\path (7) edge [] node {$\inj \; r_i \; a \; v_{i+1}$} (6);
+%		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
+%		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
+%		\path	(r)
+%			edge [] node {$\internalise$} (a);
+%		\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
+%		\path	(a)
+%			edge [] node {$\backslash a$} (a1);
+%
+%		\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
+%		\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
+%		\path	(a1)
+%			edge [] node {$\backslash a$} (a21);
+%		\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
+%		\path	(a22)
+%			edge [] node {$\backslash b$} (a3);
+%		\path	(a21)
+%			edge [dashed, bend right] node {} (a3);
+%		\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
+%		\path	(a3)
+%			edge [below] node {$\bmkeps$} (bs);
+%		\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
+%		\path 	(bs)
+%			edge [] node {$\decode$} (v);
+	\end{tikzpicture}
+	%\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
+\end{center}
+When simplifications are added, the inhabitation relation no longer holds,
+causing the above diagram to break.
+Ausaf addressed this with an augmented lexer he called $\textit{slexer}$.
+We note that the invariant
+$\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} = \retrieve \; r_i \; v_i$ is too strong
+to maintain because the precondition $\vdash v_i : r_i$ is too weak.
+It does not require $v_i$ to be a POSIX value.
+{\color{red} \rule{\linewidth}{0.5mm}}
+New content ends
+{\color{red} \rule{\linewidth}{0.5mm}}
+%
+%
+%which is essential for getting an understanding this thesis
+%in chapter \ref{Bitcoded1}, which is necessary for understanding why
+%the proof
+%
+%In this chapter,
+%We contrast our simplification function
+%with Sulzmann and Lu's, indicating the simplicity of our algorithm.
+%This is another case for the usefulness
+%and reliability of formal proofs on algorithms.
+%These ``aggressive'' simplifications would not be possible in the injection-based
+%lexing we introduced in chapter \ref{Inj}.
+%We then prove the correctness with the improved version of
+%$\blexer$, called $\blexersimp$, by establishing
+%$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
+%
 %----------------------------------------------------------------------------------------
 %	SECTION rewrite relation
 %----------------------------------------------------------------------------------------
 \section{Correctness of $\blexersimp$}
 We first introduce the rewriting relation \emph{rrewrite}

changeset 656	753a3b0ee02b
parent 655	d8f82c690b32
child 657	00171b627b8d