lexing: comparison ChengsongTanPhdThesis/Chapters/Bitcoded2.tex

equal deleted inserted replaced

-:00171b627b8d
+:273c176d9027
 %$\vdash \inj \; r\; c \; v : r $ to hold in \ref{eq:stepwise}.
 We can illustrate this using the diagram \ref{fig:inj} in chapter \ref{Inj},
 by zooming in to the middle bit involving $r_i$, $r_{i+1}$, $v_i$ and $v_{i+1}$,
 and adding the bottom row to show how bitcodes encoding the lexing information
 can be extracted from every pair $(r_i, \; v_i)$:
-\begin{center}
+\begin{center}\label{graph:injZoom}
 	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
 		%\node [rectangle ] (1)  at (-7, 2) {$\ldots$};
 		%\node [rectangle, draw] (2) at  (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
 		%\node [rectangle, draw] (3) at  (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
 		%\node [rectangle] (4) at  (9, 2) {$\ldots$};
 		\node [rectangle] (5) at  (-8, -2) {$\ldots$};
 		\node [rectangle, draw] (6) at  (-5, -2) {$v_i = \inj\; r \; c \; v$};
 		\node [rectangle, draw] (7) at  ( 3, -2) {$v_{i+1} = v$};
 		\node [rectangle] (8) at  ( 8, -2) {$\ldots$};
 		\node [rectangle] (9) at  (-8, -6) {$\ldots$};
-		\node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
+		\node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} $};
-		\node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
+		\node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1}$};
 		\node [rectangle] (12) at  (8, -6) {$\ldots$};
 		\path (1) edge [] node {} (2);
 		\path (6) edge [] node {} (5);
 		\path (9) edge [] node {} (10);
-		\path (11) edge [] node {} (12);
+		\path (11) edge [<-] node {} (12);
 		\path (8) edge [] node {} (7);
 		\path (3) edge [] node {} (4);
 		\path (6) edge [dashed,bend right = 30] node {$\retrieve \; a_i \; v_i$} (10);
 		%\path (6) edge [] node {$\vdash v_i : r_i$} (10);
 		%\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11);
 		\path (10) edge [dashed, <->] node {$=$} (11);
-		\path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6);
+		\path (7) edge [] node {$\inj \; r_{i+1} \; c \; v_i$} (6);
 %		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
 %		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
 %		\path	(r)
 %			edge [] node {$\internalise$} (a);
 \subsection{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong}
 %From this chapter we start with the main contribution of this thesis, which
-The $\blexer$ proof relies on a lockstep POSIX
+The $\blexer$ proof relies on $r_i, \; v_i$ to match each other in lockstep
-correspondence between the lexical value and the
+for each derivative step $i$, however only $v_0$ is needed and intermediate
-regular expression in each derivative and injection.
+$v_i$s are purely proof scaffolding.
-If we zoom into the diagram \ref{graph:inj} and look specifically at
+Moreover property \ref{eq:stepwise}
-the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating
+is stronger than needed for POSIX lexing: the precondition
-the invariant that the same bitcodes can be extracted from the pairs:
+$\vdash v_{i+1}:r_{i+1}$ is too general in the sense that it allows arbitrary
-\tikzset{three sided/.style={
+values inhabiting in $r_i$ to retrieve bitcodes.
-draw=none,
+%correspondence between the lexical value and the
-append after command={
+%regular expression in derivative and injection operations at the same step $i$.
-[-,shorten <= -0.5\pgflinewidth]
+%If we revisit the diagram \ref{graph:injZoom} with an example
-([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+Consider a concrete example where $a_i = (_{ZZ}x + _{ZS}y + _{S}x)$ and
-edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+$a_{i+1} = (_{ZZ}\ONE + \ZERO + _{S}\ONE)$.
-([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+What is required in the proof of $\blexer$ is that for the POSIX value
-edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+$v_i = \Left  \; (\Left \; Empty)$,
-([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+the property
-edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+\[
-}
+	%\vdash \Left  \; (\Left \; Empty) : (\ONE+\ZERO+\ONE) \implies
-}
+	\retrieve \; (_{ZZ}\ONE + \ZERO + _{S}\ONE) \; (\Left  \; (\Left \; \Empty) ) =
-}
+	\retrieve \; (_{ZZ}x + _{ZS}y + _{S}x ) \; (\Left  \; (\Left \; \Char\; x) )
+\]
-\tikzset{three sided1/.style={
+holds, and for $\blexersimp$ a property of similar shape to
-draw=none,
+\[
-append after command={
+	\retrieve \; _{ZZ}\ONE \; \; Empty =
-[-,shorten <= -0.5\pgflinewidth]
+	\retrieve \; _{ZZ}x  \; (\Char\; x)
-([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+\]
-edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+needs to hold as well.
-([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+However for the definitely non-POSIX value $v_i' = \Right \; \Empty$
-edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+the precondition $\vdash \Right \; \Empty : x+y+x$ holds as well, and therefore
-([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+the following equality
-edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+\[
-}
+	\retrieve \; (_{ZZ}\ONE + \ZERO + _{S}\ONE) \;  (\Right \; \Empty) =
-}
+	\retrieve \; (_{ZZ}x + _{ZS}y + _{S}x ) \;  (\Right \; (\Char\; x))
-}
+\]
+by lemma \ref{retrieveStepwise} holds for $\blexer$ as well.
-\begin{center}
+This cannot hold or be proven anymore with $\blexersimp$ as the corresponding
-	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
+sub-regular expressions and values have been eliminated during the
-		%\node [rectangle ] (1)  at (-7, 2) {$\ldots$};
+de-duplication procedure of our smplification.
-		%\node [rectangle, draw] (2) at  (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
+We are stuck with a property that holds in $\blexer$ but does not have a counterpart
-		%\node [rectangle, draw] (3) at  (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
+in $\blexersimp$.
-		%\node [rectangle] (4) at  (9, 2) {$\ldots$};
+This needs not hold for the purpose of POSIX lexing though--we know the rightmost
-		%\node [rectangle] (5) at  (-7, -2) {$\ldots$};
+subexpression $x$ is not POSIX by the left priority rule.
-		%\node [rectangle, draw] (6) at  (-4, -2) {$v_i = \Stars \; [\Left (a)]$};
+The inductive invariant \ref{eq:stepwise} can be weakened by restricting the precondition
-		%\node [rectangle, draw] (7) at  ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
+$\vdash v_i:r_i$ to $\exists s_i. \; (s_i, r_i) \rightarrow v_i$.
-		%\node [rectangle] (8) at  ( 9, -2) {$\ldots$};
+We tried this route but it did not work well since we need to
-		%\node [rectangle] (9) at  (-7, -6) {$\ldots$};
+use a similar technique as the rectification functions by Ausaf et al,
-		%\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
+and they can get very complicated with our simplifications.
-		%\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
+After some trial-and-error we found a property of the form
-		%\node [rectangle] (12) at  (9, -6) {$\ldots$};
+\begin{property}
+	If a POSIX value can be extracted from $a \backslash s$, then
-		\node [rectangle ] (1)  at (-8, 2) {$\ldots$};
+	it can be extracted from $a \backslash_{bsimps} s$ as well.
-		\node [rectangle, draw] (2) at  (-5, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
+\end{property}
-		\node [rectangle, draw] (3) at  (3, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
+\noindent
-		\node [rectangle] (4) at  (8, 2) {$\ldots$};
+most natural to work with, and we defined a binary relation to capture
-		\node [rectangle] (5) at  (-8, -2) {$\ldots$};
+the connection between $a\backslash s$ and $a \backslash_{bsimps} s$.
-		\node [rectangle, draw] (6) at  (-5, -2) {$v_i = \Stars \; [\Left (a)]$};
+%and look specifically at
-		\node [rectangle, draw] (7) at  ( 3, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
+%the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating
-		\node [rectangle] (8) at  ( 8, -2) {$\ldots$};
+%the invariant that the same bitcodes can be extracted from the pairs:
-		\node [rectangle] (9) at  (-8, -6) {$\ldots$};
+%\tikzset{three sided/.style={
-		\node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
+%        draw=none,
-		\node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
+%        append after command={
-		\node [rectangle] (12) at  (8, -6) {$\ldots$};
+%            [-,shorten <= -0.5\pgflinewidth]
+%            ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+%        edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+%            ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
-		\path (1) edge [] node {} (2);
+%        edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-		\path (5) edge [] node {} (6);
+%            ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-		\path (9) edge [] node {} (10);
+%        edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+%        }
-		\path (11) edge [] node {} (12);
+%    }
-		\path (7) edge [] node {} (8);
+%}
-		\path (3) edge [] node {} (4);
+%
+%\tikzset{three sided1/.style={
+%        draw=none,
-		\path (6) edge [dashed,bend right = 30] node {$\retrieve \; r_i \; v_i$} (10);
+%        append after command={
-		\path (2) edge [dashed,bend left = 48] node {} (10);
+%            [-,shorten <= -0.5\pgflinewidth]
+%            ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
-		\path (7) edge [dashed,bend right = 30] node {$\retrieve \; r_{i+1} \; v_{i+1}$} (11);
+%        edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
-		\path (3) edge [dashed,bend left = 45] node {} (11);
+%            ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+%        edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
-		\path (2) edge [] node {$\backslash a$} (3);
+%            ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
-		\path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6);
+%        edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
-		\path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7);
+%        }
-		%\path (6) edge [] node {$\vdash v_i : r_i$} (10);
+%    }
-		%\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11);
+%}
+%
-		\path (10) edge [dashed, <->] node {$=$} (11);
+%\begin{center}
+%	\begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
-		\path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6);
+%		%\node [rectangle ] (1)  at (-7, 2) {$\ldots$};
+%		%\node [rectangle, draw] (2) at  (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
-%		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
+%		%\node [rectangle, draw] (3) at  (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
-%		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
+%		%\node [rectangle] (4) at  (9, 2) {$\ldots$};
-%		\path	(r)
+%		%\node [rectangle] (5) at  (-7, -2) {$\ldots$};
-%			edge [] node {$\internalise$} (a);
+%		%\node [rectangle, draw] (6) at  (-4, -2) {$v_i = \Stars \; [\Left (a)]$};
-%		\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
+%		%\node [rectangle, draw] (7) at  ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
-%		\path	(a)
+%		%\node [rectangle] (8) at  ( 9, -2) {$\ldots$};
-%			edge [] node {$\backslash a$} (a1);
+%		%\node [rectangle] (9) at  (-7, -6) {$\ldots$};
-%
+%		%\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
-%		\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
+%		%\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
-%		\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
+%		%\node [rectangle] (12) at  (9, -6) {$\ldots$};
-%		\path	(a1)
+%
-%			edge [] node {$\backslash a$} (a21);
+%		\node [rectangle ] (1)  at (-8, 2) {$\ldots$};
-%		\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
+%		\node [rectangle, draw] (2) at  (-5, 2) {$r_i = _{bs'}(_Za+_Saa)^*$};
-%		\path	(a22)
+%		\node [rectangle, draw] (3) at  (3, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$};
-%			edge [] node {$\backslash b$} (a3);
+%		\node [rectangle] (4) at  (8, 2) {$\ldots$};
-%		\path	(a21)
+%		\node [rectangle] (5) at  (-8, -2) {$\ldots$};
-%			edge [dashed, bend right] node {} (a3);
+%		\node [rectangle, draw] (6) at  (-5, -2) {$v_i = \Stars \; [\Left (a)]$};
-%		\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
+%		\node [rectangle, draw] (7) at  ( 3, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
-%		\path	(a3)
+%		\node [rectangle] (8) at  ( 8, -2) {$\ldots$};
-%			edge [below] node {$\bmkeps$} (bs);
+%		\node [rectangle] (9) at  (-8, -6) {$\ldots$};
-%		\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
+%		\node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$};
-%		\path 	(bs)
+%		\node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$};
-%			edge [] node {$\decode$} (v);
+%		\node [rectangle] (12) at  (8, -6) {$\ldots$};
+%
+%
-	\end{tikzpicture}
+%
-	%\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
+%		\path (1) edge [] node {} (2);
-\end{center}
+%		\path (5) edge [] node {} (6);
+%		\path (9) edge [] node {} (10);
-When simplifications are added, the inhabitation relation no longer holds,
+%
-causing the above diagram to break.
+%		\path (11) edge [] node {} (12);
+%		\path (7) edge [] node {} (8);
-Ausaf addressed this with an augmented lexer he called $\textit{slexer}$.
+%		\path (3) edge [] node {} (4);
+%
+%
+%		\path (6) edge [dashed,bend right = 30] node {$\retrieve \; r_i \; v_i$} (10);
-we note that the invariant
+%		\path (2) edge [dashed,bend left = 48] node {} (10);
-$\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} $ is too strong
+%
-to maintain because the precondition $\vdash v_i : r_i$ is too weak.
+%		\path (7) edge [dashed,bend right = 30] node {$\retrieve \; r_{i+1} \; v_{i+1}$} (11);
-It does not require $v_i$ to be a POSIX value
+%		\path (3) edge [dashed,bend left = 45] node {} (11);
+%
+%		\path (2) edge [] node {$\backslash a$} (3);
+%		\path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6);
+%		\path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7);
+%		%\path (6) edge [] node {$\vdash v_i : r_i$} (10);
+%		%\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11);
+%
+%		\path (10) edge [dashed, <->] node {$=$} (11);
+%
+%		\path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6);
+%
+%%		\node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
+%%		\node [rectangle, draw] (a) at (-6, 4)	  {$(aa)^*(_{Z}b + _{S}c)$};
+%%		\path	(r)
+%%			edge [] node {$\internalise$} (a);
+%%		\node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
+%%		\path	(a)
+%%			edge [] node {$\backslash a$} (a1);
+%%
+%%		\node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
+%%		\node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
+%%		\path	(a1)
+%%			edge [] node {$\backslash a$} (a21);
+%%		\node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
+%%		\path	(a22)
+%%			edge [] node {$\backslash b$} (a3);
+%%		\path	(a21)
+%%			edge [dashed, bend right] node {} (a3);
+%%		\node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
+%%		\path	(a3)
+%%			edge [below] node {$\bmkeps$} (bs);
+%%		\node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
+%%		\path 	(bs)
+%%			edge [] node {$\decode$} (v);
+%
+%
+%	\end{tikzpicture}
+%	%\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
+%\end{center}
+%When simplifications are added, the inhabitation relation no longer holds,
+%causing the above diagram to break.
+%
+%Ausaf addressed this with an augmented lexer he called $\textit{slexer}$.
+%
+%
+%
+%we note that the invariant
+%$\vdash v_{i+1}: r_{i+1} \implies \retrieve \; r_{i+1} \; v_{i+1} $ is too strong
+%to maintain because the precondition $\vdash v_i : r_i$ is too weak.
+%It does not require $v_i$ to be a POSIX value
 %
 %
 %which is essential for getting an understanding this thesis
 %in chapter \ref{Bitcoded1}, which is necessary for understanding why
 %the proof
 %lexing we introduced in chapter \ref{Inj}.
 %We then prove the correctness with the improved version of
 %$\blexer$, called $\blexersimp$, by establishing
 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
 %
 %----------------------------------------------------------------------------------------
 %	SECTION rewrite relation
 %----------------------------------------------------------------------------------------
-We first introduce the rewriting relation \emph{rrewrite}
+In the next section we first introduce the rewriting relation \emph{rrewrite}
 ($\rrewrite$) between two regular expressions,
 which stands for an atomic
 simplification.
 We then prove properties about
 this rewriting relation and its reflexive transitive closure.

changeset 658	273c176d9027
parent 657	00171b627b8d
child 659	2e05f04ed6b3