diff -r 753a3b0ee02b -r 00171b627b8d ChengsongTanPhdThesis/Chapters/Bitcoded2.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sat Jul 08 22:18:22 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sun Jul 09 00:29:02 2023 +0100 @@ -453,7 +453,7 @@ we add it as a phase after a derivative is taken. \begin{center} \begin{tabular}{lcl} - $r \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(r \backslash c)$ + $a \backslash_{bsimp} c$ & $\dn$ & $\textit{bsimp}(a \backslash c)$ \end{tabular} \end{center} %Following previous notations @@ -463,8 +463,8 @@ We extend this from characters to strings: \begin{center} \begin{tabular}{lcl} -$r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\ -$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ +$a \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(a \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\ +$a \backslash_{bsimps} [\,] $ & $\dn$ & $a$ \end{tabular} \end{center} @@ -532,15 +532,27 @@ Given the size difference, it is not surprising that our $\blexersimp$ significantly outperforms $\textit{blexer\_SLSimp}$ by Sulzmann and Lu. +Indeed $\blexersimp$ seems to be a correct algorithm that effectively +bounds the size of intermediate representations. +\marginpar{\em more connecting material to make narration more coherent.} +As promised we will use formal proofs to show that our speculation +based on these experimental results indeed hold. +%intuitions indeed hold. In the next section we are going to establish that our simplification preserves the correctness of the algorithm. - +\section{Correctness of $\blexersimp$} +A natural thought would be to directly re-use the formal +proof of $\blexer$'s correctness, with some minor modifications +but keeping the way the induction is done. +However we were not able to find a simple way to re-factor +proof of \ref{blexerCorrect} in chapter \ref{Bitcoded1}. -\section{Why $\textit{Blexer}$'s Proof Does Not Work} -The fundamental reason is we cannot extend the correctness proof of theorem 4 -because lemma 13 does not hold anymore when simplifications are involved. +\subsection{Why $\textit{Blexer}$'s Proof Does Not Work} +The fundamental reason is %we cannot extend the correctness proof of theorem 4 +because lemma \ref{retrieveStepwise} does not hold +anymore when simplifications are involved. \marginpar{\em rephrased things \\so why new \\proof makes sense.} %The proof details are necessary materials for this thesis %because it provides necessary context to explain why we need a @@ -569,34 +581,132 @@ relies crucially on lemma \ref{retrieveStepwise} that says any value can be retrieved in a stepwise manner, namely: \begin{equation}\label{eq:stepwise}%eqref: this proposition needs to be referred - \vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v) + \vdash v : ((a_\downarrow) \backslash c) \implies \retrieve \; (a \backslash c) \; v= \retrieve \; a \; (\inj \; (a_\downarrow) \; c\; v) \end{equation} %This no longer holds once we introduce simplifications. -Simplifications are necessary to control the size of derivatives, -but they also destroy the structures of the regular expressions -such that \ref{eq:stepwise} does not hold. +The regular expressions $a$ and $a\backslash c$ correspond to the intermediate +result before and after the derivative with respect to $c$, and similarly +$\inj\; a_\downarrow \; c \; v$ and $v$ correspond to the value before and after the derivative. +They go in lockstep pairs +\[ + (a, \; \inj\; a_\downarrow \; c \; v)\; \text{and} \; (a\backslash c,\; v) +\] +and the structure of annotated regular expression and +value within a pair always align with each other. - -We want to prove the correctness of $\blexersimp$ which integrates -$\textit{bsimp}$ by applying it after each call to the derivative: +As $\blexersimp$ integrates +$\textit{bsimp}$ by applying it after each call to the derivatives function, +%Simplifications are necessary to control the size of derivatives, +%but they also destroy the structures of the regular expressions +%such that \ref{eq:stepwise} does not hold. \begin{center} \begin{tabular}{lcl} - $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (r \backslash\, c)) \backslash_{bsimps}\, s$ \\ -$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ + $a \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(\textit{bsimp} \; (a \backslash\, c)) \backslash_{bsimps}\, s$ \\ +%$r \backslash_{bsimps} [\,] $ & $\dn$ & $r$ \end{tabular} -\begin{tabular}{lcl} - $\textit{blexer\_simp}\;r\,s$ & $\dn$ & - $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ - & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ - & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ - & & $\;\;\textit{else}\;\textit{None}$ -\end{tabular} +%\begin{tabular}{lcl} +% $\textit{blexer\_simp}\;r\,s$ & $\dn$ & +% $\textit{let}\;a = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\ +% & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ +% & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ +% & & $\;\;\textit{else}\;\textit{None}$ +%\end{tabular} \end{center} \noindent +it becomes a problem to maintain a similar property as \ref{retrieveStepwise}. Previously without $\textit{bsimp}$ the exact structure of each intermediate -regular expression is preserved, allowing pairs of inhabitation relations -in the form $\vdash v : r \backslash c $ and -$\vdash \inj \; r\; c \; v : r $ to hold in \ref{eq:stepwise}. +regular expression is preserved, +%allowing pairs of inhabitation relations +%in the form $\vdash v : r \backslash c $ and +%$\vdash \inj \; r\; c \; v : r $ to hold in \ref{eq:stepwise}. +We can illustrate this using the diagram \ref{fig:inj} in chapter \ref{Inj}, +by zooming in to the middle bit involving $r_i$, $r_{i+1}$, $v_i$ and $v_{i+1}$, +and adding the bottom row to show how bitcodes encoding the lexing information +can be extracted from every pair $(r_i, \; v_i)$: +\begin{center} + \begin{tikzpicture}[->, >=stealth', shorten >= 1pt, auto, thick] + %\node [rectangle ] (1) at (-7, 2) {$\ldots$}; + %\node [rectangle, draw] (2) at (-4, 2) {$r_i = _{bs'}(_Za+_Saa)^*$}; + %\node [rectangle, draw] (3) at (4, 2) {$r_{i+1} = _{bs'}(_Z(_Z\ONE + _S(\ONE \cdot a)))\cdot(_Za+_Saa)^*$}; + %\node [rectangle] (4) at (9, 2) {$\ldots$}; + %\node [rectangle] (5) at (-7, -2) {$\ldots$}; + %\node [rectangle, draw] (6) at (-4, -2) {$v_i = \Stars \; [\Left (a)]$}; + %\node [rectangle, draw] (7) at ( 4, -2) {$v_{i+1} = \Seq (\Alt (\Left \; \Empty)) \; \Stars \, []$}; + %\node [rectangle] (8) at ( 9, -2) {$\ldots$}; + %\node [rectangle] (9) at (-7, -6) {$\ldots$}; + %\node [rectangle, draw] (10) at (-4, -6) {$\textit{bits}_{i} = bs' @ ZZS$}; + %\node [rectangle, draw] (11) at (4, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$}; + %\node [rectangle] (12) at (9, -6) {$\ldots$}; + + \node [rectangle ] (1) at (-8, 2) {$\ldots$}; + \node [rectangle, draw] (2) at (-5, 2) {$r_i = a_\downarrow$}; + \node [rectangle, draw] (3) at (3, 2) {$r_{i+1} = (a\backslash c)_\downarrow$}; + \node [rectangle] (4) at (8, 2) {$\ldots$}; + \node [rectangle] (5) at (-8, -2) {$\ldots$}; + \node [rectangle, draw] (6) at (-5, -2) {$v_i = \inj\; r \; c \; v$}; + \node [rectangle, draw] (7) at ( 3, -2) {$v_{i+1} = v$}; + \node [rectangle] (8) at ( 8, -2) {$\ldots$}; + \node [rectangle] (9) at (-8, -6) {$\ldots$}; + \node [rectangle, draw] (10) at (-5, -6) {$\textit{bits}_{i} = bs' @ ZZS$}; + \node [rectangle, draw] (11) at (3, -6) {$\textit{bits}_{i+1} = bs'@ ZZS$}; + \node [rectangle] (12) at (8, -6) {$\ldots$}; + + + + \path (1) edge [] node {} (2); + \path (6) edge [] node {} (5); + \path (9) edge [] node {} (10); + + \path (11) edge [] node {} (12); + \path (8) edge [] node {} (7); + \path (3) edge [] node {} (4); + + + \path (6) edge [dashed,bend right = 30] node {$\retrieve \; a_i \; v_i$} (10); + \path (2) edge [dashed,bend left = 48] node {} (10); + + \path (7) edge [dashed,bend right = 30] node {$\retrieve \; a_{i+1} \; v_{i+1}$} (11); + \path (3) edge [dashed,bend left = 45] node {} (11); + + \path (2) edge [] node {$\backslash c$} (3); + \path (2) edge [dashed, <->] node {$\vdash v_i : r_i$} (6); + \path (3) edge [dashed, <->] node {$\vdash v_{i+1} : r_{i+1}$} (7); + %\path (6) edge [] node {$\vdash v_i : r_i$} (10); + %\path (7) edge [dashed, <->] node {$\vdash v_i : r_i$} (11); + + \path (10) edge [dashed, <->] node {$=$} (11); + + \path (7) edge [] node {$\inj \; r_{i+1} \; a \; v_i$} (6); + +% \node [rectangle, draw] (r) at (-6, -1) {$(aa)^*(b+c)$}; +% \node [rectangle, draw] (a) at (-6, 4) {$(aa)^*(_{Z}b + _{S}c)$}; +% \path (r) +% edge [] node {$\internalise$} (a); +% \node [rectangle, draw] (a1) at (-3, 1) {$(_{Z}(\ONE \cdot a) \cdot (aa)^*) (_{Z}b + _Sc)$}; +% \path (a) +% edge [] node {$\backslash a$} (a1); +% +% \node [rectangle, draw, three sided] (a21) at (-2.5, 4) {$(_{Z}\ONE \cdot (aa)^*)$}; +% \node [rectangle, draw, three sided1] (a22) at (-0.8, 4) {$(_{Z}b + _{S}c)$}; +% \path (a1) +% edge [] node {$\backslash a$} (a21); +% \node [rectangle, draw] (a3) at (0.5, 2) {$_{ZS}(_{Z}\ONE + \ZERO)$}; +% \path (a22) +% edge [] node {$\backslash b$} (a3); +% \path (a21) +% edge [dashed, bend right] node {} (a3); +% \node [rectangle, draw] (bs) at (2, 4) {$ZSZ$}; +% \path (a3) +% edge [below] node {$\bmkeps$} (bs); +% \node [rectangle, draw] (v) at (3, 0) {$\Seq \; (\Stars\; [\Seq \; a \; a]) \; (\Left \; b)$}; +% \path (bs) +% edge [] node {$\decode$} (v); + + + \end{tikzpicture} + %\caption{$\blexer$ with the regular expression $(aa)^*(b+c)$ and $aab$} +\end{center} + But $\blexersimp$ introduces simplification after the derivative, making it difficult to align the structures of values and regular expressions. If we change the form of property \ref{eq:stepwise} to @@ -661,7 +771,7 @@ -\section{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong} +\subsection{Why Lemma \ref{retrieveStepwise}'s Requirement is too Strong} %From this chapter we start with the main contribution of this thesis, which @@ -796,9 +906,6 @@ It does not require $v_i$ to be a POSIX value -{\color{red} \rule{\linewidth}{0.5mm}} -New content ends -{\color{red} \rule{\linewidth}{0.5mm}} @@ -825,7 +932,6 @@ %---------------------------------------------------------------------------------------- % SECTION rewrite relation %---------------------------------------------------------------------------------------- -\section{Correctness of $\blexersimp$} We first introduce the rewriting relation \emph{rrewrite} ($\rrewrite$) between two regular expressions, which stands for an atomic