# HG changeset patch # User Chengsong # Date 1688946285 -3600 # Node ID 2e05f04ed6b34548dcc3b594fedabc6fad81c946 # Parent 273c176d9027403828cb5a89c44d0378e8cf23d9 Addressed Gerog "can't understand 'erase messes with structure'" comment diff -r 273c176d9027 -r 2e05f04ed6b3 ChengsongTanPhdThesis/Chapters/Bitcoded2.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Sun Jul 09 02:08:12 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex Mon Jul 10 00:44:45 2023 +0100 @@ -360,6 +360,8 @@ will definitely not contribute to a POSIX value, even if they are attached with different bitcodes. These duplicates therefore need to be removed. + + To achieve this, we call $\rerases$ as the function $f$ during the distinction operation. The function $\rerases$ is very similar to $\erase$, except that it preserves the structure @@ -375,7 +377,7 @@ the new datatype used by $\rerases$ (as our plain regular expression datatype does not allow non-binary alternatives). For now we can think of -$\rerases$ as the function $(\_)_\downarrow$ defined in chapter \ref{Bitcoded1} +$\rerases$ as the function erase ($(\_)_\downarrow$) defined in chapter \ref{Bitcoded1} and $\rrexp$ as plain regular expressions, but having a general list constructor for alternatives: \begin{figure}[H] @@ -400,6 +402,10 @@ $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$ \end{tabular} \end{center} +We will provide more details in \ref{whyRerase} for why a new +erase function and new datatype is needed. But briefly speaking +it is for backward-compatibility with $\blexer$'s correctness proof and +the path we (naturally) took during our proof engineering of the finiteness property. \subsection{Putting Things Together} We can now give the definition of our simplification function: diff -r 273c176d9027 -r 2e05f04ed6b3 ChengsongTanPhdThesis/Chapters/Finite.tex --- a/ChengsongTanPhdThesis/Chapters/Finite.tex Sun Jul 09 02:08:12 2023 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Mon Jul 10 00:44:45 2023 +0100 @@ -208,7 +208,8 @@ We first introduce the operations such as derivatives, simplification, size calculation, etc. associated with $\rrexp$s, which we have introduced - in chapter \ref{Bitcoded2}. + in chapter \ref{Bitcoded2}. As promised we will discuss + why they are needed in \ref{whyRerase}. The operations on $\rrexp$s are identical to those on annotated regular expressions except that they dispense with bitcodes. This means that all proofs about size of $\rrexp$s will apply to @@ -281,46 +282,79 @@ \end{tabular} \end{center} -\subsection{Why a New Datatype?} -The reason we -define a new datatype is that -the $\erase$ function -does not preserve the structure of annotated -regular expressions. -We initially started by using -plain regular expressions and tried to prove -lemma \ref{rsizeAsize}, -however the $\erase$ function messes with the structure of the -annotated regular expression. -The $+$ constructor -of basic regular expressions is only binary, whereas $\sum$ -takes a list. Therefore we need to convert between -annotated and normal regular expressions as follows: +\subsection{Why a New Datatype?}\label{whyRerase} +\marginpar{\em added label so this section can be referenced by other parts of the thesis +so that interested readers can jump to/be reassured that there will explanations.} +Originally the erase operation $(\_)_\downarrow$ was +used by Ausaf et al. in their proofs related to $\blexer$. +This function was not part of the lexing algorithm, and the sole purpose was to +bridge the gap between the $r$ +%$\textit{rexp}$ +(un-annotated) and $\textit{arexp}$ (annotated) +regular expression datatypes so as to leverage the correctness +theorem of $\lexer$.%to establish the correctness of $\blexer$. +For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve} +uses $\erase$ to convert an annotated regular expression $a$ into +a plain one so that it can be used by $\inj$ to create the desired value +$\inj\; (a)_\downarrow \; c \; v$. + +Ideally $\erase$ should only remove the auxiliary information not related to the +structure--the +bitcodes. However there exists a complication +where the alternative constructors have different arity for $\textit{arexp}$ +and $\textit{r}$: \begin{center} \begin{tabular}{lcl} - $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\ - $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\ - $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$ + $\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\ + $\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$ \end{tabular} \end{center} \noindent -As can be seen, alternative regular expressions with an empty argument list -will be turned into a $\ZERO$. -The singleton alternative $\sum [r]$ becomes $r$ during the -$\erase$ function. -The annotated regular expression $\sum[a, b, c]$ would turn into -$(a+(b+c))$. -All these operations change the size and structure of -an annotated regular expression, adding unnecessary -complexities to the size bound proof. - +To convert between the two +$\erase$ has to recursively disassemble a list into nested binary applications of the +$(\_ + \_)$ operator, +handling corner cases like empty or +singleton alternative lists: +%becomes $r$ during the +%$\erase$ function. +%The annotated regular expression $\sum[a, b, c]$ would turn into +%$(a+(b+c))$. +\begin{center} + \begin{tabular}{lcl} + $ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\ + $ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a$\\ + $ (_{bs}\sum a_1 :: a_2)_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow)$\\ + $ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (\erase \; _{[]} \sum as)$ + \end{tabular} +\end{center} +\noindent +These operations inevitably change the structure and size of +an annotated regular expression. For example, +$a_1 = \sum _{Z}[x]$ has size 2, but $(a_1)_\downarrow = x$ +only has size 1. +%adding unnecessary +%complexities to the size bound proof. +%The reason we +%define a new datatype is that +%the $\erase$ function +%does not preserve the structure of annotated +%regular expressions. +%We initially started by using +%plain regular expressions and tried to prove +%lemma \ref{rsizeAsize}, +%however the $\erase$ function messes with the structure of the +%annotated regular expression. +%The $+$ constructor +%of basic regular expressions is only binary, whereas $\sum$ +%takes a list. Therefore we need to convert between +%annotated and normal regular expressions as follows: For example, if we define the size of a basic plain regular expression in the usual way, \begin{center} \begin{tabular}{lcl} $\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\ $\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\ - $\llbracket r_1 \cdot r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\ + $\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\ $\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\ $\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\ $\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$ @@ -332,17 +366,22 @@ $\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$ \end{center} does not hold. -With $\textit{rerase}$, however, -only the bitcodes are thrown away. -Everything about the structure remains intact. -Therefore it does not change the size -of an annotated regular expression and we have: -\begin{lemma}\label{rsizeAsize} - $\rsize{\rerase a} = \asize a$ -\end{lemma} -\begin{proof} - By routine structural induction on $a$. -\end{proof} +%With $\textit{rerase}$, however, +%only the bitcodes are thrown away. +That leads to us defining the new regular expression datatype without +bitcodes but with a list alternative constructor, and defining a new erase function +in a strictly structure-preserving manner: +\begin{center} + \begin{tabular}{lcl} + $\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\ + $\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\ + \end{tabular} +\end{center} +\noindent +%But +%Everything about the structure remains intact. +%Therefore it does not change the size +%of an annotated regular expression and we have: \noindent One might be able to prove an inequality such as $\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $ @@ -475,18 +514,21 @@ can be calculated via the size of r-regular expressions after the application of $\rsimp$ and $\backslash_{rsimps}$: \begin{lemma}\label{sizeRelations} - The following two equalities hold: + The following equalities hold: \begin{itemize} \item + $\rsize{\rerase a} = \asize a$ + \item $\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$ \item $\asize{\bderssimp{a}{s}} = \rsize{\rderssimp{\rerase{a}}{s}}$ \end{itemize} \end{lemma} \begin{proof} - The first part is by induction on the inductive cases + First part follows from the definition of $(\_)_{\downarrow_r}$. + The second part is by induction on the inductive cases of $\textit{bsimp}$. - The second part is by induction on the string $s$, + The third part is by induction on the string $s$, where the inductive step follows from part one. \end{proof} \noindent