lexing: changeset 594:62f8fa03863e

--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Fri Sep 02 11:06:26 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Fri Sep 02 19:18:50 2022 +0100
@@ -497,7 +497,7 @@
 
 
 
- \section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it}
+\section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it}
 The NTIMES construct has the following closed form:
 \begin{verbatim}
 "rders_simp (RNTIMES r0 (Suc n)) (c#s) =

--- a/ChengsongTanPhdThesis/Chapters/Finite.tex	Fri Sep 02 11:06:26 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex	Fri Sep 02 19:18:50 2022 +0100
@@ -185,20 +185,25 @@
 		The key observation is that $\distinctBy$'s output is
 		a list with a constant length bound.
 \end{itemize}
-We give details of these steps in the next sections.
-The first step is to use 
-$\textit{rrexp}$s,
-something simpler than
-annotated regular expressions. 
+We will expand on these steps in the next sections.\\
 
 \section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions}
-We first recap the definition of the new datatype $\rrexp$, called
+The first step is to define 
+$\textit{rrexp}$s.
+They are without bitcodes,
+allowing a much simpler size bound proof.
+Of course, the bits which encode the lexing information 
+would grow linearly with respect 
+to the input, which should be taken into account when we wish to tackle the runtime comlexity.
+But for the sake of the structural size 
+we can safely ignore them.\\
+To recapitulate, the datatype 
+definition of the $\rrexp$, called
 \emph{r-regular expressions},
-which we first defined in \ref{rrexpDef}.
-R-regular expressions are similar to basic regular expressions.
-We call them \emph{r}-regular expressions
+was initially defined in \ref{rrexpDef}.
+The reason for the prefix $r$ is
 to make a distinction  
-with plain regular expressions.
+with basic regular expressions.
 \[			\rrexp ::=   \RZERO \mid  \RONE
 	\mid  \RCHAR{c}  
 	\mid  \RSEQ{r_1}{r_2}
@@ -222,8 +227,7 @@
 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to 
 differentiate with the same operation for annotated regular expressions.
 Adding $r$ as subscript will be used in 
-other operations as well.
-
+other operations as well.\\
 The transformation from an annotated regular expression
 to an r-regular expression is straightforward.
 \begin{center}
@@ -237,54 +241,49 @@
 	\end{tabular}
 \end{center}
 \noindent
-$\textit{Rerase}$ throws away the bitcodes on the annotated regular expressions, 
+$\textit{Rerase}$ throws away the bitcodes 
+on the annotated regular expressions, 
 but keeps everything else intact.
-
-Before we introduce more functions related to r-regular expressions,
-we first give out the reason why we take all the trouble 
-defining a new datatype in the first place.
-We could calculate the size of annotated regular expressions in terms of
-their un-annotated $\rrexp$ counterpart: 
-\begin{lemma}
+Therefore it does not change the size
+of an annotated regular expression:
+\begin{lemma}\label{rsizeAsize}
 	$\rsize{\rerase a} = \asize a$
 \end{lemma}
 \begin{proof}
 	By routine structural induction on $a$.
 \end{proof}
 \noindent
+
 \subsection{Motivation Behind a New Datatype}
-The main difference between a plain regular expression
-and an r-regular expression is that it can take
-non-binary arguments for its alternative constructor.
-This turned out to be necessary if we want our proofs to be
-simple.
+The reason we take all the trouble 
+defining a new datatype is that $\erase$ makes things harder.
 We initially started by using 
 plain regular expressions and tried to prove
-equalities like 
+the lemma \ref{rsizeAsize},
+however the $\erase$ function unavoidbly messes with the structure of the 
+annotated regular expression.
+The $+$ constructor
+of basic regular expressions is binary whereas $\sum$ 
+takes a list, and one has to convert between them:
 \begin{center}
-	$\llbracket a \rrbracket = \llbracket a\downarrow \rrbracket_p$
+	\begin{tabular}{ccc}
+		$\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
+		$\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
+		$\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
+	\end{tabular}
 \end{center}
-and 
-\[
-	\llbracket a \backslash_{bsimps} s \rrbracket = 
-	\llbracket a \downarrow \backslash_s s
-\]
-One might be able to prove that 
-$\llbracket a \downarrow \rrbracket_p \leq \llbracket a \rrbracket$.
-$\rrexp$ give the exact correspondence between an annotated regular expression
-and its (r-)erased version:
-
-This does not hold for plain $\rexp$s. 
-
-
-These operations are 
-almost identical to those of the annotated regular expressions,
-except that no bitcodes are attached.
-Of course, the bits which encode the lexing information would grow linearly with respect 
-to the input, which should be taken into account when we wish to tackle the runtime comlexity.
-But at the current stage 
-we can safely ignore them.
-Similarly there is a size function for plain regular expressions:
+\noindent
+An alternative regular expression with an empty argument list
+will be turned into a $\ZERO$.
+The singleton alternative $\sum [r]$ would have $r$ during the
+$\erase$ function.
+The  annotated regular expression $\sum[a, b, c]$ would turn into
+$(a+(b+c))$.
+All these operations change the size and structure of
+an annotated regular expressions, adding unnecessary 
+complexities to the size bound proof.\\
+For example, if we define the size of a basic regular expression 
+in the usual way,
 \begin{center}
 	\begin{tabular}{ccc}
 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
@@ -295,76 +294,22 @@
 		$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
 	\end{tabular}
 \end{center}
-
 \noindent
-The idea of obatining a bound for $\llbracket \bderssimp{a}{s} \rrbracket$
-is to get an equivalent form
-of something like $\llbracket \bderssimp{a}{s}\rrbracket = f(a, s)$, where $f(a, s)$ 
-is easier to estimate than $\llbracket \bderssimp{a}{s}\rrbracket$.
-We notice that while it is not so clear how to obtain
-a metamorphic representation of $\bderssimp{a}{s}$ (as we argued in chapter \ref{Bitcoded2},
-not interleaving the application of the functions $\backslash$ and $\bsimp{\_}$ 
-in the order as our lexer will result in the bit-codes dispensed differently),
-it is possible to get an slightly different representation of the unlifted versions:
-$ (\bderssimp{a}{s})_\downarrow = (\erase \; \bsimp{a \backslash s})_\downarrow$.
-This suggest setting the bounding function $f(a, s)$ as 
-$\llbracket  (a \backslash s)_\downarrow \rrbracket_p$, the plain size
-of the erased annotated regular expression.
-This requires the the regular expression accompanied by bitcodes
-to have the same size as its plain counterpart after erasure:
+Then the property
 \begin{center}
-	$\asize{a} \stackrel{?}{=} \llbracket \erase(a)\rrbracket_p$. 
-\end{center}
-\noindent
-But there is a minor nuisance: 
-the erase function unavoidbly messes with the structure of the regular expression,
-due to the discrepancy between annotated regular expression's $\sum$ constructor
-and plain regular expression's $+$ constructor having different arity.
-\begin{center}
-	\begin{tabular}{ccc}
-		$\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
-		$\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
-		$\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
-	\end{tabular}
+	$\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$
 \end{center}
-\noindent
-An alternative regular expression with an empty list of children
-is turned into a $\ZERO$ during the
-$\erase$ function, thereby changing the size and structure of the regex.
-Therefore the equality in question does not hold.
-
-These will likely be fixable if we really want to use plain $\rexp$s for dealing
-with size, but we choose a more straightforward (or stupid) method by 
-
-Similarly we could define the derivative  and simplification on 
-$\rrexp$, which would be identical to those we defined for plain $\rexp$s in chapter1, 
-except that now they can operate on alternatives taking multiple arguments.
+does not hold.
+One might be able to prove an inequality such as
+$\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
+and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
+but we found our approach more straightforward.\\
 
-\begin{center}
-	\begin{tabular}{lcr}
-		$(\RALTS{rs})\; \backslash c$ & $\dn$ &  $\RALTS{\map\; (\_ \backslash c) \;rs}$\\
-		(other clauses omitted)
-		With the new $\rrexp$ datatype in place, one can define its size function,
-		which precisely mirrors that of the annotated regular expressions:
-	\end{tabular}
-\end{center}
-\noindent
-\begin{center}
-	\begin{tabular}{ccc}
-		$\llbracket _{bs}\ONE \rrbracket_r$ & $\dn$ & $1$\\
-		$\llbracket \ZERO \rrbracket_r$ & $\dn$ & $1$ \\
-		$\llbracket _{bs} r_1 \cdot r_2 \rrbracket_r$ & $\dn$ & $\llbracket r_1 \rrbracket_r + \llbracket r_2 \rrbracket_r + 1$\\
-		$\llbracket _{bs}\mathbf{c} \rrbracket_r $ & $\dn$ & $1$\\
-		$\llbracket _{bs}\sum as \rrbracket_r $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket_r)\; as   + 1$\\
-		$\llbracket _{bs} a^* \rrbracket_r $ & $\dn$ & $\llbracket a \rrbracket_r + 1$
-	\end{tabular}
-\end{center}
-\noindent
-
-\subsection{Lexing Related Functions for $\rrexp$}
-Everything else for $\rrexp$ will be precisely the same for annotated expressions,
-except that they do not involve rectifying and augmenting bit-encoded tokenization information.
-As expected, most functions are simpler, such as the derivative:
+\subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$}
+The operations on r-regular expressions are 
+almost identical to those of the annotated regular expressions,
+except that no bitcodes are used. For example,
+the derivative operation becomes simpler:\\
 \begin{center}
 	\begin{tabular}{@{}lcl@{}}
 		$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\  
@@ -374,7 +319,7 @@
 		$(\sum \;\textit{rs})\,\backslash_r c$ & $\dn$ &
 		$\sum\;(\textit{map} \; (\_\backslash_r c) \; rs )$\\
 		$(r_1\cdot r_2)\,\backslash_r c$ & $\dn$ &
-		$\textit{if}\;\textit{rnullable}\,r_1$\\
+		$\textit{if}\;(\textit{rnullable}\,r_1)$\\
 						 & &$\textit{then}\;\sum\,[(r_1\,\backslash_r c)\cdot\,r_2,$\\
 						 & &$\phantom{\textit{then},\;\sum\,}((r_2\,\backslash_r c))]$\\
 						 & &$\textit{else}\;\,(r_1\,\backslash_r c)\cdot r_2$\\
@@ -384,20 +329,25 @@
 	\end{tabular}    
 \end{center}  
 \noindent
-The simplification function is simplified without annotation causing superficial differences.
-Duplicate removal without  an equivalence relation:
+Similarly, $\distinctBy$ does not need 
+a function checking equivalence because
+there are no bit annotations causing superficial differences
+between syntactically equal terms.
 \begin{center}
 	\begin{tabular}{lcl}
 		$\rdistinct{[]}{rset} $ & $\dn$ & $[]$\\
-		$\rdistinct{r :: rs}{rset}$ & $\dn$ & $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
-					    &        & $\textit{else}\; r::\rdistinct{rs}{(rset \cup \{r\})}$
+		$\rdistinct{r :: rs}{rset}$ & $\dn$ & 
+		$\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
+					    &        & $\textit{else}\; \;
+					    r::\rdistinct{rs}{(rset \cup \{r\})}$
 	\end{tabular}
 \end{center}
 %TODO: definition of rsimp (maybe only the alternative clause)
 \noindent
-The prefix $r$ in front of $\rdistinct{}{}$ is used mainly to 
-differentiate with $\textit{distinct}$, which is a built-in predicate
-in Isabelle that says all the elements of a list are unique.
+Notice there is a difference between our $\rdistincts$ and
+the Isabelle $\textit {distinct}$ function.
+In Isabelle $\textit{distinct}$ is a predicate
+that tests if all the elements of a list are unique.\\
 With $\textit{rdistinct}$ one can chain together all the other modules
 of $\bsimp{\_}$ (removing the functionalities related to bit-sequences)
 and get $\textit{rsimp}$ and $\rderssimp{\_}{\_}$.

--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Fri Sep 02 11:06:26 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Fri Sep 02 19:18:50 2022 +0100
@@ -34,6 +34,7 @@
 \newcommand{\ONE}{\mbox{\bf 1}}
 \newcommand{\AALTS}[2]{\oplus {\scriptstyle #1}\, #2}
 \newcommand{\rdistinct}[2]{\textit{rdistinct} \;\; #1 \;\; #2}
+\def\rdistincts{\textit{rdistinct}}
 \def\rDistinct{\textit{rdistinct}}
 \newcommand\hflat[1]{\llparenthesis  #1 \rrparenthesis_*}
 \newcommand\hflataux[1]{\llparenthesis #1 \rrparenthesis_*'}

author	Chengsong
	Fri, 02 Sep 2022 19:18:50 +0100
changeset 594	62f8fa03863e
parent 593	83fab852d72d
child 595	fa92124d1fb7

ChengsongTanPhdThesis/Chapters/Cubic.tex		file \| annotate \| diff \| comparison \| revisions
ChengsongTanPhdThesis/Chapters/Finite.tex		file \| annotate \| diff \| comparison \| revisions
ChengsongTanPhdThesis/Chapters/Introduction.tex		file \| annotate \| diff \| comparison \| revisions