--- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Tue Jun 27 16:26:48 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Thu Jun 29 04:17:48 2023 +0100
@@ -11,20 +11,27 @@
{\color{red} \rule{\linewidth}{0.5mm}}
New content starts.
{\color{red} \rule{\linewidth}{0.5mm}}
+\marginpar{\em Added a completely new overview section, highlighting the contributions.}
\section{Overview}
This chapter
is the point from which novel contributions of this PhD project are introduced
-in detail,
-and previous
-chapters are essential background work for setting the scene of the formal proof we
+in detail.
+The material in the
+previous
+chapters is necessary
+material for setting the scene of the formal proof we
are about to describe.
+
+The fundamental reason is we cannot extend the correctness proof of theorem 4,
+because lemma 13 does not hold anymore when simplifications are involved.
+
The proof details are necessary materials for this thesis
because it provides necessary context to explain why we need a
new framework for the proof of $\blexersimp$, which involves
simplifications that cause structural changes to the regular expression.
-a new formal proof of the correctness of $\blexersimp$, where the
+A new formal proof of the correctness of $\blexersimp$, where the
proof of $\blexer$
is not applicatble in the sense that we cannot straightforwardly extend the
proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does
@@ -44,15 +51,15 @@
In particular, the correctness theorem
of the un-optimised bit-coded lexer $\blexer$ in
chapter \ref{Bitcoded1} formalised by Ausaf et al.
-relies on lemma \ref{retrieveStepwise} that says
-any value can be retrieved in a stepwise manner:
-\begin{center}
+relies crucially on lemma \ref{retrieveStepwise} that says
+any value can be retrieved in a stepwise manner, namely:
+\begin{center}%eqref: this proposition needs to be referred
$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$
\end{center}
This no longer holds once we introduce simplifications.
-Simplifications are necessary to control the size of regular expressions
-during derivatives by eliminating redundant
-sub-expression with some procedure we call $\textit{bsimp}$.
+Simplifications are necessary to control the size of derivatives
+
+
We want to prove the correctness of $\blexersimp$ which integrates
$\textit{bsimp}$ by applying it after each call to the derivative:
\begin{center}
@@ -75,23 +82,33 @@
we use the convenient notation $r_{c} \dn r\backslash c$
and $v_{r}^{c} \dn \inj \;r \; c \; v$),
but $\blexersimp$ introduces simplification after the derivative,
-getting us trouble in aligning the pairs:
+making it difficult to align the pairs:
\begin{center}
- $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $
+ $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; (\textit{bsimp} \; r_c) \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $
\end{center}
\noindent
-It is quite clear that once we made
+It is clear that once we made
$v$ to align with $\textit{bsimp} \; r_{c}$
in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged
in for the above statement to hold.
Ausaf et al. \cite{AusafUrbanDyckhoff2016}
made some initial attempts with this idea, see \cite{FahadThesis}
for details.
+
+They added
+and then rectify it to
+
+
+this works fine, however that limits the kind of simplifications you can introduce.
+We cannot use their idea for our very strong simplification rules.
+Therefore we take our route
+a wea
+
The other route is to dispose of lemma \ref{retrieveStepwise},
-and prove a slightly weakened inductive invariant instead.
+and prove a weakened inductive invariant instead.
We adopt this approach in this thesis.
-We first introduce why the inductive invariant in $\blexer$'s proof
+Let us first explain why the requirement in $\blexer$'s proof
is too strong, and suggest a few possible fixes, which leads to
our proof which we believe was the most natural and effective method.
@@ -104,9 +121,84 @@
The $\blexer$ proof relies on a lockstep POSIX
correspondence between the lexical value and the
regular expression in each derivative and injection.
-If we zoom into the diagram \ref{graph:in} and look specifically at
+If we zoom into the diagram \ref{graph:inj} and look specifically at
the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating
the invariant that the same bitcodes can be extracted from the pairs:
+\tikzset{three sided/.style={
+ draw=none,
+ append after command={
+ [-,shorten <= -0.5\pgflinewidth]
+ ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+ edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+ ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+ edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+ ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+ edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+ }
+ }
+}
+
+\tikzset{three sided1/.style={
+ draw=none,
+ append after command={
+ [-,shorten <= -0.5\pgflinewidth]
+ ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west)
+ edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+ ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east)
+ edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+ ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east)
+ edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west)
+ }
+ }
+}
+
+\begin{figure}[H]
+ \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick]
+ \node [rectangle, draw] (1) at (-7, 2) {$\ldots$};
+ \node [rectangle, draw] (2) at (-4, 2) {$_{bs'}(_Za+_Saa)^*$};
+ \node [rectangle, draw] (3) at (4, 2) {$_{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_0a+_1aa)^*$};
+ \node [rectangle, draw] (4) at (7, 2) {$\ldots$};
+ \node [rectangle, draw] (5) at (-7, -2) {$\ldots$};
+ \node [rectangle, draw] (6) at (-4, -2) {$\Stars \; [\Left (a)]$};
+ \node [rectangle, draw] (7) at ( 4, -2) {$\Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$};
+ \node [rectangle, draw] (8) at ( 7, -2) {$\ldots$};
+ \node [rectangle, draw] (9) at (-7, -6) {$\ldots$};
+ \node [rectangle, draw] (10) at (-4, -6) {$\textit{bits} = \retrieve \; r_i\;v_i$};
+ \node [rectangle, draw] (11) at (4, -6) {$\textit{bits} = \retrieve \; r_{i+1}\;v_{i+1}$};
+ \node [rectangle, draw] (12) at (7, -6) {$\ldots$};
+
+
+ \path (1) edge [] node {} (2);
+ \path (2) edge [] node {$\backslash a$} (3);
+
+% \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$};
+% \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$};
+% \path (r)
+% edge [] node {$\internalise$} (a);
+% \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$};
+% \path (a)
+% edge [] node {$\backslash a$} (a1);
+%
+% \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$};
+% \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$};
+% \path (a1)
+% edge [] node {$\backslash a$} (a21);
+% \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$};
+% \path (a22)
+% edge [] node {$\backslash b$} (a3);
+% \path (a21)
+% edge [dashed, bend right] node {} (a3);
+% \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$};
+% \path (a3)
+% edge [below] node {$\bmkeps$} (bs);
+% \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$};
+% \path (bs)
+% edge [] node {$\decode$} (v);
+
+
+ \end{tikzpicture}
+ \caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$}
+\end{figure}
When simplifications are added, the inhabitation relation no longer holds,
causing the above diagram to break.