diff -r a4d692a9a289 -r bc5571c38d1f ChengsongTanPhdThesis/Chapters/Bitcoded2.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Tue Jun 27 16:26:48 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Thu Jun 29 04:17:48 2023 +0100 @@ -11,20 +11,27 @@ {\color{red} \rule{\linewidth}{0.5mm}} New content starts. {\color{red} \rule{\linewidth}{0.5mm}} +\marginpar{\em Added a completely new overview section, highlighting the contributions.} \section{Overview} This chapter is the point from which novel contributions of this PhD project are introduced -in detail, -and previous -chapters are essential background work for setting the scene of the formal proof we +in detail. +The material in the +previous +chapters is necessary +material for setting the scene of the formal proof we are about to describe. + +The fundamental reason is we cannot extend the correctness proof of theorem 4, +because lemma 13 does not hold anymore when simplifications are involved. + The proof details are necessary materials for this thesis because it provides necessary context to explain why we need a new framework for the proof of $\blexersimp$, which involves simplifications that cause structural changes to the regular expression. -a new formal proof of the correctness of $\blexersimp$, where the +A new formal proof of the correctness of $\blexersimp$, where the proof of $\blexer$ is not applicatble in the sense that we cannot straightforwardly extend the proof of theorem \ref{blexerCorrect} because lemma \ref{retrieveStepwise} does @@ -44,15 +51,15 @@ In particular, the correctness theorem of the un-optimised bit-coded lexer $\blexer$ in chapter \ref{Bitcoded1} formalised by Ausaf et al. -relies on lemma \ref{retrieveStepwise} that says -any value can be retrieved in a stepwise manner: -\begin{center} +relies crucially on lemma \ref{retrieveStepwise} that says +any value can be retrieved in a stepwise manner, namely: +\begin{center}%eqref: this proposition needs to be referred $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ \end{center} This no longer holds once we introduce simplifications. -Simplifications are necessary to control the size of regular expressions -during derivatives by eliminating redundant -sub-expression with some procedure we call $\textit{bsimp}$. +Simplifications are necessary to control the size of derivatives + + We want to prove the correctness of $\blexersimp$ which integrates $\textit{bsimp}$ by applying it after each call to the derivative: \begin{center} @@ -75,23 +82,33 @@ we use the convenient notation $r_{c} \dn r\backslash c$ and $v_{r}^{c} \dn \inj \;r \; c \; v$), but $\blexersimp$ introduces simplification after the derivative, -getting us trouble in aligning the pairs: +making it difficult to align the pairs: \begin{center} - $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; \textit{bsimp} \; r_c \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ + $\vdash v: \textit{bsimp} \; r_{c} \implies \retrieve \; (\textit{bsimp} \; r_c) \; v =\retrieve \; r \;(\mathord{?} v_{r}^{c}) $ \end{center} \noindent -It is quite clear that once we made +It is clear that once we made $v$ to align with $\textit{bsimp} \; r_{c}$ in the inhabitation relation, something different than $v_{r}^{c}$ needs to be plugged in for the above statement to hold. Ausaf et al. \cite{AusafUrbanDyckhoff2016} made some initial attempts with this idea, see \cite{FahadThesis} for details. + +They added +and then rectify it to + + +this works fine, however that limits the kind of simplifications you can introduce. +We cannot use their idea for our very strong simplification rules. +Therefore we take our route +a wea + The other route is to dispose of lemma \ref{retrieveStepwise}, -and prove a slightly weakened inductive invariant instead. +and prove a weakened inductive invariant instead. We adopt this approach in this thesis. -We first introduce why the inductive invariant in $\blexer$'s proof +Let us first explain why the requirement in $\blexer$'s proof is too strong, and suggest a few possible fixes, which leads to our proof which we believe was the most natural and effective method. @@ -104,9 +121,84 @@ The $\blexer$ proof relies on a lockstep POSIX correspondence between the lexical value and the regular expression in each derivative and injection. -If we zoom into the diagram \ref{graph:in} and look specifically at +If we zoom into the diagram \ref{graph:inj} and look specifically at the pairs $v_i, r_i$ and $v_{i+1},\, r_{i+1}$, we get the diagram demonstrating the invariant that the same bitcodes can be extracted from the pairs: +\tikzset{three sided/.style={ + draw=none, + append after command={ + [-,shorten <= -0.5\pgflinewidth] + ([shift={(-1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) + edge([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) + ([shift={( 0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) + edge([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) + ([shift={( 0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) + edge([shift={(-1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) + } + } +} + +\tikzset{three sided1/.style={ + draw=none, + append after command={ + [-,shorten <= -0.5\pgflinewidth] + ([shift={(1.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north west) + edge([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) + ([shift={(-0.5\pgflinewidth,-0.5\pgflinewidth)}]\tikzlastnode.north east) + edge([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) + ([shift={(-0.5\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south east) + edge([shift={(1.0\pgflinewidth,+0.5\pgflinewidth)}]\tikzlastnode.south west) + } + } +} + +\begin{figure}[H] + \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] + \node [rectangle, draw] (1) at (-7, 2) {$\ldots$}; + \node [rectangle, draw] (2) at (-4, 2) {$_{bs'}(_Za+_Saa)^*$}; + \node [rectangle, draw] (3) at (4, 2) {$_{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_0a+_1aa)^*$}; + \node [rectangle, draw] (4) at (7, 2) {$\ldots$}; + \node [rectangle, draw] (5) at (-7, -2) {$\ldots$}; + \node [rectangle, draw] (6) at (-4, -2) {$\Stars \; [\Left (a)]$}; + \node [rectangle, draw] (7) at ( 4, -2) {$\Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; + \node [rectangle, draw] (8) at ( 7, -2) {$\ldots$}; + \node [rectangle, draw] (9) at (-7, -6) {$\ldots$}; + \node [rectangle, draw] (10) at (-4, -6) {$\textit{bits} = \retrieve \; r_i\;v_i$}; + \node [rectangle, draw] (11) at (4, -6) {$\textit{bits} = \retrieve \; r_{i+1}\;v_{i+1}$}; + \node [rectangle, draw] (12) at (7, -6) {$\ldots$}; + + + \path (1) edge [] node {} (2); + \path (2) edge [] node {$\backslash a$} (3); + +% \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; +% \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; +% \path (r) +% edge [] node {$\internalise$} (a); +% \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; +% \path (a) +% edge [] node {$\backslash a$} (a1); +% +% \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; +% \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; +% \path (a1) +% edge [] node {$\backslash a$} (a21); +% \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; +% \path (a22) +% edge [] node {$\backslash b$} (a3); +% \path (a21) +% edge [dashed, bend right] node {} (a3); +% \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; +% \path (a3) +% edge [below] node {$\bmkeps$} (bs); +% \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; +% \path (bs) +% edge [] node {$\decode$} (v); + + + \end{tikzpicture} + \caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$} +\end{figure} When simplifications are added, the inhabitation relation no longer holds, causing the above diagram to break.