# HG changeset patch # User Chengsong # Date 1662142730 -3600 # Node ID 62f8fa03863e58c60dd0af8ee0070ded4930f619 # Parent 83fab852d72d9b654b768eb300db7fe42eed3e28 more diff -r 83fab852d72d -r 62f8fa03863e ChengsongTanPhdThesis/Chapters/Cubic.tex --- a/ChengsongTanPhdThesis/Chapters/Cubic.tex Fri Sep 02 11:06:26 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex Fri Sep 02 19:18:50 2022 +0100 @@ -497,7 +497,7 @@ - \section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it} +\section{The NTIMES Constructor, and the Size Bound and Correctness Proof for it} The NTIMES construct has the following closed form: \begin{verbatim} "rders_simp (RNTIMES r0 (Suc n)) (c#s) = diff -r 83fab852d72d -r 62f8fa03863e ChengsongTanPhdThesis/Chapters/Finite.tex --- a/ChengsongTanPhdThesis/Chapters/Finite.tex Fri Sep 02 11:06:26 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Fri Sep 02 19:18:50 2022 +0100 @@ -185,20 +185,25 @@ The key observation is that $\distinctBy$'s output is a list with a constant length bound. \end{itemize} -We give details of these steps in the next sections. -The first step is to use -$\textit{rrexp}$s, -something simpler than -annotated regular expressions. +We will expand on these steps in the next sections.\\ \section{The $\textit{Rrexp}$ Datatype and Its Lexing-Related Functions} -We first recap the definition of the new datatype $\rrexp$, called +The first step is to define +$\textit{rrexp}$s. +They are without bitcodes, +allowing a much simpler size bound proof. +Of course, the bits which encode the lexing information +would grow linearly with respect +to the input, which should be taken into account when we wish to tackle the runtime comlexity. +But for the sake of the structural size +we can safely ignore them.\\ +To recapitulate, the datatype +definition of the $\rrexp$, called \emph{r-regular expressions}, -which we first defined in \ref{rrexpDef}. -R-regular expressions are similar to basic regular expressions. -We call them \emph{r}-regular expressions +was initially defined in \ref{rrexpDef}. +The reason for the prefix $r$ is to make a distinction -with plain regular expressions. +with basic regular expressions. \[ \rrexp ::= \RZERO \mid \RONE \mid \RCHAR{c} \mid \RSEQ{r_1}{r_2} @@ -222,8 +227,7 @@ The $r$ in the subscript of $\llbracket \rrbracket_r$ is to differentiate with the same operation for annotated regular expressions. Adding $r$ as subscript will be used in -other operations as well. - +other operations as well.\\ The transformation from an annotated regular expression to an r-regular expression is straightforward. \begin{center} @@ -237,54 +241,49 @@ \end{tabular} \end{center} \noindent -$\textit{Rerase}$ throws away the bitcodes on the annotated regular expressions, +$\textit{Rerase}$ throws away the bitcodes +on the annotated regular expressions, but keeps everything else intact. - -Before we introduce more functions related to r-regular expressions, -we first give out the reason why we take all the trouble -defining a new datatype in the first place. -We could calculate the size of annotated regular expressions in terms of -their un-annotated $\rrexp$ counterpart: -\begin{lemma} +Therefore it does not change the size +of an annotated regular expression: +\begin{lemma}\label{rsizeAsize} $\rsize{\rerase a} = \asize a$ \end{lemma} \begin{proof} By routine structural induction on $a$. \end{proof} \noindent + \subsection{Motivation Behind a New Datatype} -The main difference between a plain regular expression -and an r-regular expression is that it can take -non-binary arguments for its alternative constructor. -This turned out to be necessary if we want our proofs to be -simple. +The reason we take all the trouble +defining a new datatype is that $\erase$ makes things harder. We initially started by using plain regular expressions and tried to prove -equalities like +the lemma \ref{rsizeAsize}, +however the $\erase$ function unavoidbly messes with the structure of the +annotated regular expression. +The $+$ constructor +of basic regular expressions is binary whereas $\sum$ +takes a list, and one has to convert between them: \begin{center} - $\llbracket a \rrbracket = \llbracket a\downarrow \rrbracket_p$ + \begin{tabular}{ccc} + $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\ + $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\ + $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$ + \end{tabular} \end{center} -and -\[ - \llbracket a \backslash_{bsimps} s \rrbracket = - \llbracket a \downarrow \backslash_s s -\] -One might be able to prove that -$\llbracket a \downarrow \rrbracket_p \leq \llbracket a \rrbracket$. -$\rrexp$ give the exact correspondence between an annotated regular expression -and its (r-)erased version: - -This does not hold for plain $\rexp$s. - - -These operations are -almost identical to those of the annotated regular expressions, -except that no bitcodes are attached. -Of course, the bits which encode the lexing information would grow linearly with respect -to the input, which should be taken into account when we wish to tackle the runtime comlexity. -But at the current stage -we can safely ignore them. -Similarly there is a size function for plain regular expressions: +\noindent +An alternative regular expression with an empty argument list +will be turned into a $\ZERO$. +The singleton alternative $\sum [r]$ would have $r$ during the +$\erase$ function. +The annotated regular expression $\sum[a, b, c]$ would turn into +$(a+(b+c))$. +All these operations change the size and structure of +an annotated regular expressions, adding unnecessary +complexities to the size bound proof.\\ +For example, if we define the size of a basic regular expression +in the usual way, \begin{center} \begin{tabular}{ccc} $\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\ @@ -295,76 +294,22 @@ $\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$ \end{tabular} \end{center} - \noindent -The idea of obatining a bound for $\llbracket \bderssimp{a}{s} \rrbracket$ -is to get an equivalent form -of something like $\llbracket \bderssimp{a}{s}\rrbracket = f(a, s)$, where $f(a, s)$ -is easier to estimate than $\llbracket \bderssimp{a}{s}\rrbracket$. -We notice that while it is not so clear how to obtain -a metamorphic representation of $\bderssimp{a}{s}$ (as we argued in chapter \ref{Bitcoded2}, -not interleaving the application of the functions $\backslash$ and $\bsimp{\_}$ -in the order as our lexer will result in the bit-codes dispensed differently), -it is possible to get an slightly different representation of the unlifted versions: -$ (\bderssimp{a}{s})_\downarrow = (\erase \; \bsimp{a \backslash s})_\downarrow$. -This suggest setting the bounding function $f(a, s)$ as -$\llbracket (a \backslash s)_\downarrow \rrbracket_p$, the plain size -of the erased annotated regular expression. -This requires the the regular expression accompanied by bitcodes -to have the same size as its plain counterpart after erasure: +Then the property \begin{center} - $\asize{a} \stackrel{?}{=} \llbracket \erase(a)\rrbracket_p$. -\end{center} -\noindent -But there is a minor nuisance: -the erase function unavoidbly messes with the structure of the regular expression, -due to the discrepancy between annotated regular expression's $\sum$ constructor -and plain regular expression's $+$ constructor having different arity. -\begin{center} - \begin{tabular}{ccc} - $\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\ - $\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\ - $\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$ - \end{tabular} + $\llbracket a \rrbracket = \llbracket a_\downarrow \rrbracket_p$ \end{center} -\noindent -An alternative regular expression with an empty list of children -is turned into a $\ZERO$ during the -$\erase$ function, thereby changing the size and structure of the regex. -Therefore the equality in question does not hold. - -These will likely be fixable if we really want to use plain $\rexp$s for dealing -with size, but we choose a more straightforward (or stupid) method by - -Similarly we could define the derivative and simplification on -$\rrexp$, which would be identical to those we defined for plain $\rexp$s in chapter1, -except that now they can operate on alternatives taking multiple arguments. +does not hold. +One might be able to prove an inequality such as +$\llbracket a \rrbracket \leq \llbracket a_\downarrow \rrbracket_p $ +and then estimate $\llbracket a_\downarrow \rrbracket_p$, +but we found our approach more straightforward.\\ -\begin{center} - \begin{tabular}{lcr} - $(\RALTS{rs})\; \backslash c$ & $\dn$ & $\RALTS{\map\; (\_ \backslash c) \;rs}$\\ - (other clauses omitted) - With the new $\rrexp$ datatype in place, one can define its size function, - which precisely mirrors that of the annotated regular expressions: - \end{tabular} -\end{center} -\noindent -\begin{center} - \begin{tabular}{ccc} - $\llbracket _{bs}\ONE \rrbracket_r$ & $\dn$ & $1$\\ - $\llbracket \ZERO \rrbracket_r$ & $\dn$ & $1$ \\ - $\llbracket _{bs} r_1 \cdot r_2 \rrbracket_r$ & $\dn$ & $\llbracket r_1 \rrbracket_r + \llbracket r_2 \rrbracket_r + 1$\\ - $\llbracket _{bs}\mathbf{c} \rrbracket_r $ & $\dn$ & $1$\\ - $\llbracket _{bs}\sum as \rrbracket_r $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket_r)\; as + 1$\\ - $\llbracket _{bs} a^* \rrbracket_r $ & $\dn$ & $\llbracket a \rrbracket_r + 1$ - \end{tabular} -\end{center} -\noindent - -\subsection{Lexing Related Functions for $\rrexp$} -Everything else for $\rrexp$ will be precisely the same for annotated expressions, -except that they do not involve rectifying and augmenting bit-encoded tokenization information. -As expected, most functions are simpler, such as the derivative: +\subsection{Lexing Related Functions for $\rrexp$ such as $\backslash_r$, $\rdistincts$, and $\rsimp$} +The operations on r-regular expressions are +almost identical to those of the annotated regular expressions, +except that no bitcodes are used. For example, +the derivative operation becomes simpler:\\ \begin{center} \begin{tabular}{@{}lcl@{}} $(\ZERO)\,\backslash_r c$ & $\dn$ & $\ZERO$\\ @@ -374,7 +319,7 @@ $(\sum \;\textit{rs})\,\backslash_r c$ & $\dn$ & $\sum\;(\textit{map} \; (\_\backslash_r c) \; rs )$\\ $(r_1\cdot r_2)\,\backslash_r c$ & $\dn$ & - $\textit{if}\;\textit{rnullable}\,r_1$\\ + $\textit{if}\;(\textit{rnullable}\,r_1)$\\ & &$\textit{then}\;\sum\,[(r_1\,\backslash_r c)\cdot\,r_2,$\\ & &$\phantom{\textit{then},\;\sum\,}((r_2\,\backslash_r c))]$\\ & &$\textit{else}\;\,(r_1\,\backslash_r c)\cdot r_2$\\ @@ -384,20 +329,25 @@ \end{tabular} \end{center} \noindent -The simplification function is simplified without annotation causing superficial differences. -Duplicate removal without an equivalence relation: +Similarly, $\distinctBy$ does not need +a function checking equivalence because +there are no bit annotations causing superficial differences +between syntactically equal terms. \begin{center} \begin{tabular}{lcl} $\rdistinct{[]}{rset} $ & $\dn$ & $[]$\\ - $\rdistinct{r :: rs}{rset}$ & $\dn$ & $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\ - & & $\textit{else}\; r::\rdistinct{rs}{(rset \cup \{r\})}$ + $\rdistinct{r :: rs}{rset}$ & $\dn$ & + $\textit{if}(r \in \textit{rset}) \; \textit{then} \; \rdistinct{rs}{rset}$\\ + & & $\textit{else}\; \; + r::\rdistinct{rs}{(rset \cup \{r\})}$ \end{tabular} \end{center} %TODO: definition of rsimp (maybe only the alternative clause) \noindent -The prefix $r$ in front of $\rdistinct{}{}$ is used mainly to -differentiate with $\textit{distinct}$, which is a built-in predicate -in Isabelle that says all the elements of a list are unique. +Notice there is a difference between our $\rdistincts$ and +the Isabelle $\textit {distinct}$ function. +In Isabelle $\textit{distinct}$ is a predicate +that tests if all the elements of a list are unique.\\ With $\textit{rdistinct}$ one can chain together all the other modules of $\bsimp{\_}$ (removing the functionalities related to bit-sequences) and get $\textit{rsimp}$ and $\rderssimp{\_}{\_}$. diff -r 83fab852d72d -r 62f8fa03863e ChengsongTanPhdThesis/Chapters/Introduction.tex --- a/ChengsongTanPhdThesis/Chapters/Introduction.tex Fri Sep 02 11:06:26 2022 +0100 +++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex Fri Sep 02 19:18:50 2022 +0100 @@ -34,6 +34,7 @@ \newcommand{\ONE}{\mbox{\bf 1}} \newcommand{\AALTS}[2]{\oplus {\scriptstyle #1}\, #2} \newcommand{\rdistinct}[2]{\textit{rdistinct} \;\; #1 \;\; #2} +\def\rdistincts{\textit{rdistinct}} \def\rDistinct{\textit{rdistinct}} \newcommand\hflat[1]{\llparenthesis #1 \rrparenthesis_*} \newcommand\hflataux[1]{\llparenthesis #1 \rrparenthesis_*'}