lexing: changeset 659:2e05f04ed6b3

--- a/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Sun Jul 09 02:08:12 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded2.tex	Mon Jul 10 00:44:45 2023 +0100
@@ -360,6 +360,8 @@
 will definitely not contribute to a POSIX value,
 even if they are attached with different bitcodes.
 These duplicates therefore need to be removed.
+
+
 To achieve this, we call $\rerases$ as the function $f$ during the distinction
 operation. The function
 $\rerases$ is very similar to $\erase$, except that it preserves the structure
@@ -375,7 +377,7 @@
 the new datatype used by $\rerases$ (as our plain
 regular expression datatype does not allow non-binary alternatives).
 For now we can think of 
-$\rerases$ as the function $(\_)_\downarrow$ defined in chapter \ref{Bitcoded1}
+$\rerases$ as the function erase ($(\_)_\downarrow$) defined in chapter \ref{Bitcoded1}
 and $\rrexp$ as plain regular expressions, but having a general list constructor
 for alternatives:
 \begin{figure}[H]
@@ -400,6 +402,10 @@
 $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
 \end{tabular}
 \end{center}
+We will provide more details in \ref{whyRerase} for why a new
+erase function and new datatype is needed. But briefly speaking
+it is for backward-compatibility with $\blexer$'s correctness proof and 
+the path we (naturally) took during our proof engineering of the finiteness property.
 
 \subsection{Putting Things Together}
 We can now give the definition of our  simplification function:

--- a/ChengsongTanPhdThesis/Chapters/Finite.tex	Sun Jul 09 02:08:12 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex	Mon Jul 10 00:44:45 2023 +0100
@@ -208,7 +208,8 @@
 		We first introduce the operations such as 
 		derivatives, simplification, size calculation, etc.
 		associated with $\rrexp$s, which we have introduced
-		in chapter \ref{Bitcoded2}.
+		in chapter \ref{Bitcoded2}. As promised we will discuss
+		why they are needed in \ref{whyRerase}.
 		The operations on $\rrexp$s are identical to those on
 		annotated regular expressions except that they dispense with
 		bitcodes. This means that all proofs about size of $\rrexp$s will apply to
@@ -281,46 +282,79 @@
 	\end{tabular}
 \end{center}
 
-\subsection{Why a New Datatype?}
-The reason we
-define a new datatype is that 
-the $\erase$ function 
-does not preserve the structure of annotated
-regular expressions.
-We initially started by using 
-plain regular expressions and tried to prove
-lemma \ref{rsizeAsize},
-however the $\erase$ function messes with the structure of the 
-annotated regular expression.
-The $+$ constructor
-of basic regular expressions is only binary, whereas $\sum$ 
-takes a list. Therefore we need to convert between
-annotated and normal regular expressions as follows:
+\subsection{Why a New Datatype?}\label{whyRerase}
+\marginpar{\em added label so this section can be referenced by other parts of the thesis
+so that interested readers can jump to/be reassured that there will explanations.}
+Originally the erase operation $(\_)_\downarrow$ was
+used by Ausaf et al. in their proofs related to $\blexer$.
+This function was not part of the lexing algorithm, and the sole purpose was to
+bridge the gap between the $r$
+%$\textit{rexp}$ 
+(un-annotated) and $\textit{arexp}$ (annotated)
+regular expression datatypes so as to leverage the correctness
+theorem of $\lexer$.%to establish the correctness of $\blexer$.
+For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve} 
+uses $\erase$ to convert an annotated regular expression $a$ into
+a plain one so that it can be used by $\inj$ to create the desired value
+$\inj\; (a)_\downarrow \; c \; v$.
+
+Ideally $\erase$ should only remove the auxiliary information not related to the
+structure--the 
+bitcodes. However there exists a complication
+where the alternative constructors have different arity for $\textit{arexp}$
+and $\textit{r}$:
 \begin{center}
 	\begin{tabular}{lcl}
-		$\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
-		$\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
-		$\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
+		$\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\
+		$\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$
 	\end{tabular}
 \end{center}
 \noindent
-As can be seen, alternative regular expressions with an empty argument list
-will be turned into a $\ZERO$.
-The singleton alternative $\sum [r]$ becomes $r$ during the
-$\erase$ function.
-The  annotated regular expression $\sum[a, b, c]$ would turn into
-$(a+(b+c))$.
-All these operations change the size and structure of
-an annotated regular expression, adding unnecessary 
-complexities to the size bound proof.
-
+To convert between the two
+$\erase$ has to recursively disassemble a list into nested binary applications of the 
+$(\_ + \_)$ operator,
+handling corner cases like empty or
+singleton alternative lists:
+%becomes $r$ during the
+%$\erase$ function.
+%The  annotated regular expression $\sum[a, b, c]$ would turn into
+%$(a+(b+c))$.
+\begin{center}
+	\begin{tabular}{lcl}
+		$ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\
+		$ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a$\\
+		$ (_{bs}\sum a_1 :: a_2)_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow)$\\
+		$ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (\erase \; _{[]} \sum as)$
+	\end{tabular}
+\end{center}
+\noindent
+These operations inevitably change the structure and size of
+an annotated regular expression. For example,
+$a_1 = \sum _{Z}[x]$ has size 2, but $(a_1)_\downarrow = x$ 
+only has size 1.
+%adding unnecessary 
+%complexities to the size bound proof.
+%The reason we
+%define a new datatype is that 
+%the $\erase$ function 
+%does not preserve the structure of annotated
+%regular expressions.
+%We initially started by using 
+%plain regular expressions and tried to prove
+%lemma \ref{rsizeAsize},
+%however the $\erase$ function messes with the structure of the 
+%annotated regular expression.
+%The $+$ constructor
+%of basic regular expressions is only binary, whereas $\sum$ 
+%takes a list. Therefore we need to convert between
+%annotated and normal regular expressions as follows:
 For example, if we define the size of a basic plain regular expression 
 in the usual way,
 \begin{center}
 	\begin{tabular}{lcl}
 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
-		$\llbracket r_1 \cdot r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
+		$\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
 		$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
 		$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
 		$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
@@ -332,17 +366,22 @@
 	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
 \end{center}
 does not hold.
-With $\textit{rerase}$, however, 
-only the bitcodes are thrown away.
-Everything about the structure remains intact.
-Therefore it does not change the size
-of an annotated regular expression and we have:
-\begin{lemma}\label{rsizeAsize}
-	$\rsize{\rerase a} = \asize a$
-\end{lemma}
-\begin{proof}
-	By routine structural induction on $a$.
-\end{proof}
+%With $\textit{rerase}$, however, 
+%only the bitcodes are thrown away.
+That leads to us defining the new regular expression datatype without
+bitcodes but with a list alternative constructor, and defining a new erase function
+in a strictly structure-preserving manner:
+\begin{center}
+	\begin{tabular}{lcl}
+		$\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\
+		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
+	\end{tabular}
+\end{center}
+\noindent
+%But
+%Everything about the structure remains intact.
+%Therefore it does not change the size
+%of an annotated regular expression and we have:
 \noindent
 One might be able to prove an inequality such as
 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
@@ -475,18 +514,21 @@
 can be calculated via the size of r-regular expressions after the application
 of $\rsimp$ and $\backslash_{rsimps}$:
 \begin{lemma}\label{sizeRelations}
-	The following two equalities hold:
+	The following equalities hold:
 	\begin{itemize}
 		\item
+			$\rsize{\rerase a} = \asize a$
+		\item
 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
 		\item
 			$\asize{\bderssimp{a}{s}} =  \rsize{\rderssimp{\rerase{a}}{s}}$
 	\end{itemize}
 \end{lemma}
 \begin{proof}
-	The first part is by induction on the inductive cases
+	First part follows from the definition of $(\_)_{\downarrow_r}$.
+	The second part is by induction on the inductive cases
 	of $\textit{bsimp}$.
-	The second part is by induction on the string $s$,
+	The third part is by induction on the string $s$,
 	where the inductive step follows from part one.
 \end{proof}
 \noindent

author	Chengsong
	Mon, 10 Jul 2023 00:44:45 +0100
changeset 659	2e05f04ed6b3
parent 658	273c176d9027
child 660	eddc4eaba7c4

ChengsongTanPhdThesis/Chapters/Bitcoded2.tex		file \| annotate \| diff \| comparison \| revisions
ChengsongTanPhdThesis/Chapters/Finite.tex		file \| annotate \| diff \| comparison \| revisions