--- a/ChengsongTanPhdThesis/Chapters/Finite.tex Fri Sep 02 11:06:26 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Fri Sep 02 19:18:50 2022 +0100
@@ -185,20 +185,25 @@
The key observation is that $\distinctBy$'s output is
a list with a constant length bound.
\end{itemize}
-We give details of these steps in the next sections.
-The first step is to use
-$\textit{rrexp}$s,
-something simpler than
-annotated regular expressions.
+We will expand on these steps in the next sections.\\
\section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions}
-We first recap the definition of the new datatype $\rrexp$, called
+The first step is to define
+$\textit{rrexp}$s.
+They are without bitcodes,
+allowing a much simpler size bound proof.
+Of course, the bits which encode the lexing information
+would grow linearly with respect
+to the input, which should be taken into account when we wish to tackle the runtime comlexity.
+But for the sake of the structural size
+we can safely ignore them.\\
+To recapitulate, the datatype
+definition of the $\rrexp$, called
\emph{r-regular expressions},
-which we first defined in \ref{rrexpDef}.
-R-regular expressions are similar to basic regular expressions.
-We call them \emph{r}-regular expressions
+was initially defined in \ref{rrexpDef}.
+The reason for the prefix $r$ is
to make a distinction
-with plain regular expressions.
+with basic regular expressions.
\[ \rrexp ::= \RZERO \mid \RONE
\mid \RCHAR{c}
\mid \RSEQ{r_1}{r_2}
@@ -222,8 +227,7 @@
The $r$ in the subscript of $\llbracket \rrbracket_r$ is to
differentiate with the same operation for annotated regular expressions.
Adding $r$ as subscript will be used in
-other operations as well.
-
+other operations as well.\\
The transformation from an annotated regular expression
to an r-regular expression is straightforward.
\begin{center}
@@ -237,54 +241,49 @@
\end{tabular}
\end{center}
\noindent
-$\textit{Rerase}$ throws away the bitcodes on the annotated regular expressions,
+$\textit{Rerase}$ throws away the bitcodes
+on the annotated regular expressions,
but keeps everything else intact.
-
-Before we introduce more functions related to r-regular expressions,
-we first give out the reason why we take all the trouble
-defining a new datatype in the first place.
-We could calculate the size of annotated regular expressions in terms of
-their un-annotated $\rrexp$ counterpart:
-\begin{lemma}
+Therefore it does not change the size
+of an annotated regular expression:
+\begin{lemma}\label{rsizeAsize}
$\rsize{\rerase a} = \asize a$
\end{lemma}
\begin{proof}
By routine structural induction on $a$.
\end{proof}
\noindent
+
\subsection{Motivation Behind a New Datatype}
-The main difference between a plain regular expression
-and an r-regular expression is that it can take
-non-binary arguments for its alternative constructor.
-This turned out to be necessary if we want our proofs to be
-simple.
+The reason we take all the trouble
+defining a new datatype is that $\erase$ makes things harder.
We initially started by using
plain regular expressions and tried to prove
-equalities like
+the lemma \ref{rsizeAsize},
+however the $\erase$ function unavoidbly messes with the structure of the
+annotated regular expression.
+The $+$ constructor
+of basic regular expressions is binary whereas $\sum$
+takes a list, and one has to convert between them:
\begin{center}
- $\llbracket a \rrbracket = \llbracket a\downarrow \rrbracket_p$
+ \begin{tabular}{ccc}
+ $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
+ $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
+ $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
+ \end{tabular}
\end{center}
-and
-\[
- \llbracket a \backslash_{bsimps} s \rrbracket =
- \llbracket a \downarrow \backslash_s s
-\]
-One might be able to prove that
-$\llbracket a \downarrow \rrbracket_p \leq \llbracket a \rrbracket$.
-$\rrexp$ give the exact correspondence between an annotated regular expression
-and its (r-)erased version:
-
-This does not hold for plain $\rexp$s.
-
-
-These operations are
-almost identical to those of the annotated regular expressions,
-except that no bitcodes are attached.
-Of course, the bits which encode the lexing information would grow linearly with respect
-to the input, which should be taken into account when we wish to tackle the runtime comlexity.
-But at the current stage
-we can safely ignore them.
-Similarly there is a size function for plain regular expressions:
+\noindent
+An alternative regular expression with an empty argument list
+will be turned into a $\ZERO$.
+The singleton alternative $\sum [r]$ would have $r$ during the
+$\erase$ function.
+The annotated regular expression $\sum[a, b, c]$ would turn into
+$(a+(b+c))$.
+All these operations change the size and structure of
+an annotated regular expressions, adding unnecessary
+complexities to the size bound proof.\\
+For example, if we define the size of a basic regular expression
+in the usual way,
\begin{center}
\begin{tabular}{ccc}
$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
@@ -295,76 +294,22 @@
$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
\end{tabular}
\end{center}
-
\noindent
-The idea of obatining a bound for $\llbracket \bderssimp{a}{s} \rrbracket$
-is to get an equivalent form
-of something like $\llbracket \bderssimp{a}{s}\rrbracket = f(a, s)$, where $f(a, s)$
-is easier to estimate than $\llbracket \bderssimp{a}{s}\rrbracket$.
-We notice that while it is not so clear how to obtain
-a metamorphic representation of $\bderssimp{a}{s}$ (as we argued in chapter \ref{Bitcoded2},
-not interleaving the application of the functions $\backslash$ and $\bsimp{\_}$
-in the order as our lexer will result in the bit-codes dispensed differently),
-it is possible to get an slightly different representation of the unlifted versions:
-$ (\bderssimp{a}{s})_\downarrow = (\erase \; \bsimp{a \backslash s})_\downarrow$.
-This suggest setting the bounding function $f(a, s)$ as
-$\llbracket (a \backslash s)_\downarrow \rrbracket_p$, the plain size
-of the erased annotated regular expression.
-This requires the the regular expression accompanied by bitcodes
-to have the same size as its plain counterpart after erasure:
+Then the property
\begin{center}
- $\asize{a} \stackrel{?}{=} \llbracket \erase(a)\rrbracket_p$.
-\end{center}
-\noindent
-But there is a minor nuisance:
-the erase function unavoidbly messes with the structure of the regular expression,
-due to the discrepancy between annotated regular expression's $\sum$ constructor
-and plain regular expression's $+$ constructor having different arity.
-\begin{center}
- \begin{tabular}{ccc}
- $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
- $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
- $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
- \end{tabular}
+ $\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$
\end{center}
-\noindent
-An alternative regular expression with an empty list of children
-is turned into a $\ZERO$ during the
-$\erase$ function, thereby changing the size and structure of the regex.
-Therefore the equality in question does not hold.
-
-These will likely be fixable if we really want to use plain $\rexp$s for dealing
-with size, but we choose a more straightforward (or stupid) method by
-
-Similarly we could define the derivative and simplification on
-$\rrexp$, which would be identical to those we defined for plain $\rexp$s in chapter1,
-except that now they can operate on alternatives taking multiple arguments.
+does not hold.
+One might be able to prove an inequality such as
+$\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $
+and then estimate $\llbracket a_\downarrow \rrbracket_p$,
+but we found our approach more straightforward.\\
-\begin{center}
- \begin{tabular}{lcr}
- $(\RALTS{rs})\; \backslash c$ & $\dn$ & $\RALTS{\map\; (\_ \backslash c) \;rs}$\\
- (other clauses omitted)
- With the new $\rrexp$ datatype in place, one can define its size function,
- which precisely mirrors that of the annotated regular expressions:
- \end{tabular}
-\end{center}
-\noindent
-\begin{center}
- \begin{tabular}{ccc}
- $\llbracket _{bs}\ONE \rrbracket_r$ & $\dn$ & $1$\\
- $\llbracket \ZERO \rrbracket_r$ & $\dn$ & $1$ \\
- $\llbracket _{bs} r_1 \cdot r_2 \rrbracket_r$ & $\dn$ & $\llbracket r_1 \rrbracket_r + \llbracket r_2 \rrbracket_r + 1$\\
- $\llbracket _{bs}\mathbf{c} \rrbracket_r $ & $\dn$ & $1$\\
- $\llbracket _{bs}\sum as \rrbracket_r $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket_r)\; as + 1$\\
- $\llbracket _{bs} a^* \rrbracket_r $ & $\dn$ & $\llbracket a \rrbracket_r + 1$
- \end{tabular}
-\end{center}
-\noindent
-
-\subsection{Lexing Related Functions for $\rrexp$}
-Everything else for $\rrexp$ will be precisely the same for annotated expressions,
-except that they do not involve rectifying and augmenting bit-encoded tokenization information.
-As expected, most functions are simpler, such as the derivative:
+\subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$}
+The operations on r-regular expressions are
+almost identical to those of the annotated regular expressions,
+except that no bitcodes are used. For example,
+the derivative operation becomes simpler:\\
\begin{center}
\begin{tabular}{@{}lcl@{}}
$(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\
@@ -374,7 +319,7 @@
$(\sum \;\textit{rs})\,\backslash_r c$ & $\dn$ &
$\sum\;(\textit{map} \; (\_\backslash_r c) \; rs )$\\
$(r_1\cdot r_2)\,\backslash_r c$ & $\dn$ &
- $\textit{if}\;\textit{rnullable}\,r_1$\\
+ $\textit{if}\;(\textit{rnullable}\,r_1)$\\
& &$\textit{then}\;\sum\,[(r_1\,\backslash_r c)\cdot\,r_2,$\\
& &$\phantom{\textit{then},\;\sum\,}((r_2\,\backslash_r c))]$\\
& &$\textit{else}\;\,(r_1\,\backslash_r c)\cdot r_2$\\
@@ -384,20 +329,25 @@
\end{tabular}
\end{center}
\noindent
-The simplification function is simplified without annotation causing superficial differences.
-Duplicate removal without an equivalence relation:
+Similarly, $\distinctBy$ does not need
+a function checking equivalence because
+there are no bit annotations causing superficial differences
+between syntactically equal terms.
\begin{center}
\begin{tabular}{lcl}
$\rdistinct{[]}{rset} $ & $\dn$ & $[]$\\
- $\rdistinct{r :: rs}{rset}$ & $\dn$ & $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
- & & $\textit{else}\; r::\rdistinct{rs}{(rset \cup \{r\})}$
+ $\rdistinct{r :: rs}{rset}$ & $\dn$ &
+ $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\
+ & & $\textit{else}\; \;
+ r::\rdistinct{rs}{(rset \cup \{r\})}$
\end{tabular}
\end{center}
%TODO: definition of rsimp (maybe only the alternative clause)
\noindent
-The prefix $r$ in front of $\rdistinct{}{}$ is used mainly to
-differentiate with $\textit{distinct}$, which is a built-in predicate
-in Isabelle that says all the elements of a list are unique.
+Notice there is a difference between our $\rdistincts$ and
+the Isabelle $\textit {distinct}$ function.
+In Isabelle $\textit{distinct}$ is a predicate
+that tests if all the elements of a list are unique.\\
With $\textit{rdistinct}$ one can chain together all the other modules
of $\bsimp{\_}$ (removing the functionalities related to bit-sequences)
and get $\textit{rsimp}$ and $\rderssimp{\_}{\_}$.