% Chapter Template+ −
+ −
\chapter{A Better Bound and Other Extensions} % Main chapter title+ −
+ −
\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound+ −
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the+ −
%algorithm to include constructs such as bounded repetitions and negations.+ −
\lstset{style=myScalastyle}+ −
+ −
+ −
This chapter is a ``miscellaneous''+ −
chapter which records various+ −
extensions to our $\blexersimp$'s formalisations.\\+ −
Firstly we present further improvements+ −
made to our lexer algorithm $\blexersimp$.+ −
We devise a stronger simplification algorithm,+ −
called $\bsimpStrong$, which can prune away+ −
similar components in two regular expressions at the same + −
alternative level,+ −
even if these regular expressions are not exactly the same.+ −
We call the lexer that uses this stronger simplification function+ −
$\blexerStrong$.+ −
We conjecture that both+ −
\begin{center}+ −
$\blexerStrong \;r \; s = \blexer\; r\;s$+ −
\end{center}+ −
and+ −
\begin{center}+ −
$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$+ −
\end{center}+ −
hold, but formalising+ −
them is still work in progress.+ −
We give reasons why the correctness and cubic size bound proofs+ −
can be achieved,+ −
by exploring the connection between the internal+ −
data structure of our $\blexerStrong$ and+ −
Animirov's partial derivatives.\\+ −
Secondly, we extend our $\blexersimp$+ −
to support bounded repetitions ($r^{\{n\}}$).+ −
We update our formalisation of + −
the correctness and finiteness properties to+ −
include this new construct. With bounded repetitions+ −
we are able to out-compete other verified lexers such as+ −
Verbatim++ on regular expressions which involve a lot of+ −
repetitions. %We also present the idempotency property proof+ −
%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.+ −
%This reinforces our claim that the fixpoint construction+ −
%originally required by Sulzmann and Lu can be removed in $\blexersimp$.+ −
\\+ −
Last but not least, we present our efforts and challenges we met+ −
in further improving the algorithm by data+ −
structures such as zippers.+ −
+ −
+ −
+ −
%----------------------------------------------------------------------------------------+ −
% SECTION strongsimp+ −
%----------------------------------------------------------------------------------------+ −
\section{A Stronger Version of Simplification}+ −
%TODO: search for isabelle proofs of algorithms that check equivalence + −
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.+ −
For example, the regular expression + −
\[+ −
aa \cdot a^*+ a \cdot a^* + aa\cdot a^*+ −
\]+ −
contains three terms, + −
expressing three possibilities it will match future input.+ −
The first and the third terms are identical, which means we can eliminate+ −
the latter as we know it will not be picked up by $\bmkeps$.+ −
In $\bsimps$, the $\distinctBy$ function takes care of this.+ −
The criteria $\distinctBy$ uses for removing a duplicate+ −
$a_2$ in the list+ −
\begin{center}+ −
$rs_a@[a_1]@rs_b@[a_2]@rs_c$+ −
\end{center}+ −
is that + −
\begin{center}+ −
$\rerase{a_1} = \rerase{a_2}$.+ −
\end{center}+ −
It can be characterised as the $LD$ + −
rewrite rule in \ref{rrewriteRules}.\\+ −
The problem , however, is that identical components+ −
in two slightly different regular expressions cannot be removed:+ −
\begin{figure}[H]+ −
\[+ −
(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1+ −
\]+ −
\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}+ −
\end{figure}+ −
\noindent+ −
A simplification like this actually+ −
cannot be omitted,+ −
as without it the size could blow up even with our $\simp$ function:+ −
\begin{figure}[H]+ −
\centering+ −
\begin{tikzpicture}+ −
\begin{axis}[+ −
%xlabel={$n$},+ −
myplotstyle,+ −
xlabel={input length},+ −
ylabel={size},+ −
]+ −
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};+ −
\end{axis}+ −
\end{tikzpicture}+ −
\caption{Runtime of $\blexersimp$ for matching + −
$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ + −
with strings + −
of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}+ −
\end{figure}+ −
\noindent+ −
We would like to apply the rewriting at some stage+ −
\begin{figure}[H]+ −
\[+ −
(a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1+ −
\]+ −
\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}+ −
\end{figure}+ −
\noindent+ −
in our $\simp$ function,+ −
so that it makes the simplification in \ref{partialDedup} possible.+ −
For example,+ −
can a function like the below one be used?+ −
%TODO: simp' that spills things+ −
+ −
Unfortunately,+ −
if we introduce them in our+ −
setting we would lose the POSIX property of our calculated values. + −
For example given the regular expression + −
\begin{center}+ −
$(a + ab)(bc + c)$+ −
\end{center}+ −
and the string+ −
\begin{center}+ −
$ab$,+ −
\end{center}+ −
then our algorithm generates the following+ −
correct POSIX value+ −
\begin{center}+ −
$\Seq \; (\Right \; ab) \; (\Right \; c)$.+ −
\end{center}+ −
Essentially it matches the string with the longer Right-alternative+ −
in the first sequence (and+ −
then the 'rest' with the character regular expression $c$ from the second sequence).+ −
If we add the simplification above, then we obtain the following value+ −
\begin{center}+ −
$\Left \; (\Seq \; a \; (\Left \; bc))$+ −
\end{center}+ −
where the $\Left$-alternatives get priority. + −
However this violates the POSIX rules.+ −
The reason for getting this undesired value+ −
is that the new rule splits this regular expression up into + −
\begin{center}+ −
$a\cdot(b c + c) + ab \cdot (bc + c)$,+ −
\end{center}+ −
which becomes a regular expression with a + −
totally different structure--the original+ −
was a sequence, and now it becomes an alternative.+ −
With an alternative the maximum munch rule no longer works.\\+ −
A method to reconcile this is to do the + −
transformation in \ref{desiredSimp} ``non-invasively'',+ −
meaning that we traverse the list of regular expressions+ −
\begin{center}+ −
$rs_a@[a]@rs_c$+ −
\end{center}+ −
in the alternative+ −
\begin{center}+ −
$\sum ( rs_a@[a]@rs_c)$+ −
\end{center}+ −
using a function similar to $\distinctBy$,+ −
but this time + −
we allow a more general list rewrite:+ −
\begin{mathpar}\label{cubicRule}+ −
\inferrule{\vspace{0mm} }{rs_a@[a]@rs_c + −
\stackrel{s}{\rightsquigarrow }+ −
rs_a@[\textit{prune} \; a \; rs_a]@rs_c }+ −
\end{mathpar}+ −
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)+ −
where $\textit{prune} \;a \; acc$ traverses $a$+ −
without altering the structure of $a$, removing components in $a$+ −
that have appeared in the accumulator $acc$.+ −
For example+ −
\begin{center}+ −
$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $+ −
\end{center}+ −
should be equal to + −
\begin{center}+ −
$(r_g+r_h)r_d$+ −
\end{center}+ −
because $r_gr_d$ and + −
$r_hr_d$ are the only terms+ −
that have not appeared in the accumulator list + −
\begin{center}+ −
$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.+ −
\end{center}+ −
We implemented + −
function $\textit{prune}$ in Scala,+ −
and incorporated into our lexer,+ −
by replacing the $\simp$ function + −
with a stronger version called $\bsimpStrong$+ −
that prunes regular expressions.+ −
We call this lexer $\blexerStrong$.+ −
$\blexerStrong$ is able to drastically reduce the+ −
internal data structure size which could+ −
trigger exponential behaviours in+ −
$\blexersimp$.+ −
\begin{figure}[H]+ −
\centering+ −
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}+ −
\begin{tikzpicture}+ −
\begin{axis}[+ −
%xlabel={$n$},+ −
myplotstyle,+ −
xlabel={input length},+ −
ylabel={size},+ −
]+ −
\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};+ −
\end{axis}+ −
\end{tikzpicture}+ −
&+ −
\begin{tikzpicture}+ −
\begin{axis}[+ −
%xlabel={$n$},+ −
myplotstyle,+ −
xlabel={input length},+ −
ylabel={size},+ −
]+ −
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};+ −
\end{axis}+ −
\end{tikzpicture}\\+ −
\multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings + −
of the form $\underbrace{aa..a}_{n}$.}+ −
\end{tabular} + −
\caption{aaaaaStarStar} \label{fig:aaaaaStarStar}+ −
\end{figure}+ −
\begin{figure}[H]+ −
\begin{lstlisting}+ −
def atMostEmpty(r: Rexp) : Boolean = r match {+ −
case ZERO => true+ −
case ONE => true+ −
case STAR(r) => atMostEmpty(r)+ −
case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)+ −
case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)+ −
case CHAR(_) => false+ −
}+ −
+ −
+ −
def isOne(r: Rexp) : Boolean = r match {+ −
case ONE => true+ −
case SEQ(r1, r2) => isOne(r1) && isOne(r2)+ −
case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)+ −
case STAR(r0) => atMostEmpty(r0)+ −
case CHAR(c) => false+ −
case ZERO => false+ −
}+ −
+ −
//r = r' ~ tail' : If tail' matches tail => returns r'+ −
def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {+ −
case SEQ(r1, r2) => + −
if(r2 == tail) + −
r1+ −
else+ −
ZERO+ −
case r => ZERO+ −
}+ −
+ −
def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{+ −
case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match+ −
{+ −
//all components have been removed, meaning this is effectively a duplicate+ −
//flats will take care of removing this AZERO + −
case Nil => AZERO+ −
case r::Nil => fuse(bs, r)+ −
case rs1 => AALTS(bs, rs1)+ −
}+ −
case ASEQ(bs, r1, r2) => + −
//remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1+ −
prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {+ −
//after pruning, returns 0+ −
case AZERO => AZERO+ −
//after pruning, got r1'.r2, where r1' is equal to 1+ −
case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)+ −
//assemble the pruned head r1p with r2+ −
case r1p => ASEQ(bs, r1p, r2)+ −
}+ −
//this does the duplicate component removal task+ −
case r => if(acc(erase(r))) AZERO else r+ −
}+ −
\end{lstlisting}+ −
\caption{pruning function together with its helper functions}+ −
\end{figure}+ −
\noindent+ −
The benefits of using + −
$\textit{prune}$ such as refining the finiteness bound+ −
to a cubic bound has not been formalised yet.+ −
Therefore we choose to use Scala code rather than an Isabelle-style formal + −
definition like we did for $\simp$, as the definitions might change+ −
to suit proof needs.+ −
In the rest of the chapter we will use this convention consistently.+ −
\begin{figure}[H]+ −
\begin{lstlisting}+ −
def distinctWith(rs: List[ARexp], + −
pruneFunction: (ARexp, Set[Rexp]) => ARexp, + −
acc: Set[Rexp] = Set()) : List[ARexp] = + −
rs match{+ −
case Nil => Nil+ −
case r :: rs => + −
if(acc(erase(r)))+ −
distinctWith(rs, pruneFunction, acc)+ −
else {+ −
val pruned_r = pruneFunction(r, acc)+ −
pruned_r :: + −
distinctWith(rs, + −
pruneFunction, + −
turnIntoTerms(erase(pruned_r)) ++: acc+ −
)+ −
}+ −
}+ −
\end{lstlisting}+ −
\caption{A Stronger Version of $\textit{distinctBy}$}+ −
\end{figure}+ −
\noindent+ −
The function $\textit{prune}$ is used in $\distinctWith$.+ −
$\distinctWith$ is a stronger version of $\distinctBy$+ −
which not only removes duplicates as $\distinctBy$ would+ −
do, but also uses the $\textit{pruneFunction}$+ −
argument to prune away verbose components in a regular expression.\\+ −
\begin{figure}[H]+ −
\begin{lstlisting}+ −
//a stronger version of simp+ −
def bsimpStrong(r: ARexp): ARexp =+ −
{+ −
r match {+ −
case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {+ −
//normal clauses same as simp+ −
case (AZERO, _) => AZERO+ −
case (_, AZERO) => AZERO+ −
case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)+ −
//bs2 can be discarded+ −
case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil+ −
case (r1s, r2s) => ASEQ(bs1, r1s, r2s)+ −
}+ −
case AALTS(bs1, rs) => {+ −
//distinctBy(flat_res, erase)+ −
distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {+ −
case Nil => AZERO+ −
case s :: Nil => fuse(bs1, s)+ −
case rs => AALTS(bs1, rs)+ −
}+ −
}+ −
//stars that can be treated as 1+ −
case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)+ −
case r => r+ −
}+ −
}+ −
\end{lstlisting}+ −
\caption{The function $\bsimpStrong$ and $\bdersStrongs$}+ −
\end{figure}+ −
\noindent+ −
$\distinctWith$, is in turn used in $\bsimpStrong$:+ −
\begin{figure}[H]+ −
\begin{lstlisting}+ −
//Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)+ −
def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {+ −
case Nil => r+ −
case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))+ −
}+ −
\end{lstlisting}+ −
\caption{The function $\bsimpStrong$ and $\bdersStrongs$}+ −
\end{figure}+ −
\noindent+ −
We conjecture that the above Scala function $\bdersStrongs$,+ −
written $\bdersStrong{\_}{\_}$ as an infix notation, + −
satisfies the following property:+ −
\begin{conjecture}+ −
$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$+ −
\end{conjecture}+ −
The stronger version of $\blexersimp$'s+ −
code in Scala looks like: + −
\begin{figure}[H]+ −
\begin{lstlisting}+ −
def strongBlexer(r: Rexp, s: String) : Option[Val] = {+ −
Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)+ −
}+ −
def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {+ −
case Nil => {+ −
if (bnullable(r)) {+ −
mkepsBC(r)+ −
}+ −
else+ −
throw new Exception("Not matched")+ −
}+ −
case c::cs => {+ −
strong_blex_simp(strongBsimp(bder(c, r)), cs)+ −
}+ −
}+ −
\end{lstlisting}+ −
\end{figure}+ −
\noindent+ −
We would like to preserve the correctness like the one + −
we had for $\blexersimp$:+ −
\begin{conjecture}\label{cubicConjecture}+ −
$\blexerStrong \;r \; s = \blexer\; r\;s$+ −
\end{conjecture}+ −
\noindent+ −
We introduce the new rule \ref{cubicRule}+ −
and augment our proofs in chapter \ref{Bitcoded2}.+ −
The idea is to maintain the properties like+ −
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$ etc.+ −
+ −
In the next section,+ −
we will describe why we + −
believe a cubic bound can be achieved.+ −
We give an introduction to the+ −
partial derivatives,+ −
which was invented by Antimirov \cite{Antimirov95},+ −
and then link it with the result of the function+ −
$\bdersStrongs$.+ −
+ −
\section{Antimirov's partial derivatives}+ −
This suggests a link towrads "partial derivatives"+ −
introduced The idea behind Antimirov's partial derivatives+ −
is to do derivatives in a similar way as suggested by Brzozowski, + −
but maintain a set of regular expressions instead of a single one:+ −
+ −
%TODO: antimirov proposition 3.1, needs completion+ −
\begin{center} + −
\begin{tabular}{ccc}+ −
$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\+ −
$\partial_x(\ONE)$ & $=$ & $\phi$+ −
\end{tabular}+ −
\end{center}+ −
+ −
Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together+ −
using the alternatives constructor, Antimirov cleverly chose to put them into+ −
a set instead. This breaks the terms in a derivative regular expression up, + −
allowing us to understand what are the "atomic" components of it.+ −
For example, To compute what regular expression $x^*(xx + y)^*$'s + −
derivative against $x$ is made of, one can do a partial derivative+ −
of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$+ −
from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.+ −
To get all the "atomic" components of a regular expression's possible derivatives,+ −
there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes+ −
whatever character is available at the head of the string inside the language of a+ −
regular expression, and gives back the character and the derivative regular expression+ −
as a pair (which he called "monomial"):+ −
\begin{center} + −
\begin{tabular}{ccc}+ −
$\lf(\ONE)$ & $=$ & $\phi$\\+ −
$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\+ −
$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\+ −
$\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\+ −
\end{tabular}+ −
\end{center}+ −
%TODO: completion+ −
+ −
There is a slight difference in the last three clauses compared+ −
with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular + −
expression $r$ with every element inside $\textit{rset}$ to create a set of + −
sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates + −
on a set of monomials (which Antimirov called "linear form") and a regular + −
expression, and returns a linear form:+ −
\begin{center} + −
\begin{tabular}{ccc}+ −
$l \bigodot (\ZERO)$ & $=$ & $\phi$\\+ −
$l \bigodot (\ONE)$ & $=$ & $l$\\+ −
$\phi \bigodot t$ & $=$ & $\phi$\\+ −
$\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\+ −
$\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\+ −
$\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\+ −
$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\+ −
$\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\+ −
\end{tabular}+ −
\end{center}+ −
%TODO: completion+ −
+ −
Some degree of simplification is applied when doing $\bigodot$, for example,+ −
$l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,+ −
and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and+ −
$\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,+ −
and so on.+ −
+ −
With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with + −
an iterative procedure:+ −
\begin{center} + −
\begin{tabular}{llll}+ −
$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\+ −
& $\Delta_{i+1}$ & $ =$ & $\lf(\Delta_i) - \PD_i$ \\+ −
& $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\+ −
$\partial_{UNIV}(r)$ & $=$ & $\PD$ & + −
\end{tabular}+ −
\end{center}+ −
+ −
+ −
$(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,+ −
+ −
+ −
+ −
+ −
%----------------------------------------------------------------------------------------+ −
% SECTION 2+ −
%----------------------------------------------------------------------------------------+ −
+ −
\section{Bounded Repetitions}+ −
We have promised in chapter \ref{Introduction}+ −
that our lexing algorithm can potentially be extended+ −
to handle bounded repetitions+ −
in natural and elegant ways.+ −
Now we fulfill our promise by adding support for + −
the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.+ −
We add clauses in our derivatives-based lexing algorithms (with simplifications)+ −
introduced in chapter \ref{Bitcoded2}.+ −
+ −
\subsection{Augmented Definitions}+ −
There are a number of definitions that needs to be augmented.+ −
The most notable one would be the POSIX rules for $r^{\{n\}}$:+ −
\begin{mathpar}+ −
\inferrule{v \in vs}{\textit{Stars} place holder}+ −
\end{mathpar}+ −
+ −
+ −
\subsection{Proofs for the Augmented Lexing Algorithm}+ −
We need to maintain two proofs with the additional $r^{\{n\}}$+ −
construct: the + −
correctness proof in chapter \ref{Bitcoded2},+ −
and the finiteness proof in chapter \ref{Finite}.+ −
+ −
\subsubsection{Correctness Proof Augmentation}+ −
The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions+ −
have been proven by Ausaf and Urban\cite{AusafDyckhoffUrban2016}.+ −
As they have commented, once the definitions are in place,+ −
the proofs given for the basic regular expressions will extend to+ −
bounded regular expressions, and there are no ``surprises''.+ −
We confirm this point because the correctness theorem would also+ −
extend without surprise to $\blexersimp$.+ −
The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on+ −
do not need to be changed,+ −
and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to + −
add one more line which can be solved by sledgehammer + −
to solve the $r^{\{n\}}$ inductive case.+ −
+ −
+ −
\subsubsection{Finiteness Proof Augmentation}+ −
The bounded repetitions are+ −
very similar to stars, and therefore the treatment+ −
is similar, with minor changes to handle some slight complications+ −
when the counter reaches 0.+ −
The exponential growth is similar:+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\+ −
$(r\backslash c) \cdot + −
r^{\{n - 1\}}*$ & $\longrightarrow_{\backslash c'}$\\+ −
\\+ −
$r \backslash cc' \cdot r^{\{n - 2\}}* + + −
r \backslash c' \cdot r^{\{n - 1\}}*$ &+ −
$\longrightarrow_{\backslash c''}$\\+ −
\\+ −
$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + + −
r \backslash c''\cdot r^{\{n-1\}}) + + −
(r \backslash c'c'' \cdot r^{\{n-2\}}* + + −
r \backslash c'' \cdot r^{\{n-1\}}*)$ & + −
$\longrightarrow_{\backslash c'''}$ \\+ −
\\+ −
$\ldots$\\+ −
\end{tabular}+ −
\end{center}+ −
Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on+ −
are all nullable.+ −
The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$+ −
\begin{center}+ −
$[r_1 \backslash cc'c'' \cdot r^{\{n-3\}}*,\;+ −
r \backslash c''\cdot r^{\{n-1\}}, \; + −
r \backslash c'c'' \cdot r^{\{n-2\}}*, \;+ −
r \backslash c'' \cdot r^{\{n-1\}}*,\; \ldots ]$ + −
\end{center}+ −
that comes from + −
\begin{center}+ −
$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + + −
r \backslash c''\cdot r^{\{n-1\}}) + + −
(r \backslash c'c'' \cdot r^{\{n-2\}}* + + −
r \backslash c'' \cdot r^{\{n-1\}}*)+ \ldots$ + −
\end{center}+ −
are made of sequences with different tails, where the counters+ −
might differ.+ −
The observation for maintaining the bound is that+ −
these counters never exceed $n$, the original+ −
counter. With the number of counters staying finite,+ −
$\rDistinct$ will deduplicate and keep the list finite.+ −
We introduce this idea as a lemma once we describe all+ −
the necessary helper functions.+ −
+ −
Similar to the star case, we want+ −
\begin{center}+ −
$\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$.+ −
\end{center}+ −
where $rs$+ −
shall be in the form of + −
$\map \; f \; Ss$, where $f$ is a function and+ −
$Ss$ a list of objects to act on.+ −
For star, the object's datatype is string.+ −
The list of strings $Ss$+ −
is generated using functions + −
$\starupdate$ and $\starupdates$.+ −
The function that takes a string and returns a regular expression+ −
is the anonymous function $+ −
(\lambda s'. \; r\backslash s' \cdot r^{\{m\}})$.+ −
In the NTIMES setting,+ −
the $\starupdate$ and $\starupdates$ functions are replaced by + −
$\textit{nupdate}$ and $\textit{nupdates}$:+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\+ −
$\nupdate \; c \; r \; + −
(\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\+ −
$\textit{if} \; + −
(\rnullable \; (r \backslash_{rs} s))$ \\+ −
& & $\;\;\textit{then} + −
\;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: (+ −
\nupdate \; c \; r \; Ss)$ \\+ −
& & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: (+ −
\nupdate \; c \; r \; Ss)$\\+ −
$\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ & + −
$(\None :: \nupdate \; c \; r \; Ss)$\\+ −
& & \\+ −
%\end{tabular}+ −
%\end{center}+ −
%\begin{center}+ −
%\begin{tabular}{lcl}+ −
$\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\+ −
$\nupdates \; (c :: cs) \; r \; Ss$ & $\dn$ & $\nupdates \; cs \; r \; (+ −
\nupdate \; c \; r \; Ss)$+ −
\end{tabular}+ −
\end{center}+ −
\noindent+ −
which take into account when a subterm+ −
\begin{center}+ −
$r \backslash_s s \cdot r^{\{n\}}$+ −
\end{center}+ −
counter $n$+ −
is 0, and therefore expands to + −
\begin{center}+ −
$r \backslash_s (s@[c]) \cdot r^{\{n\}} \;++ −
\; \ZERO$ + −
\end{center}+ −
after taking a derivative.+ −
The object now has type + −
\begin{center}+ −
$\textit{option} \;(\textit{string}, \textit{nat})$+ −
\end{center}+ −
and therefore the function for converting such an option into+ −
a regular expression term is called $\opterm$:+ −
+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\+ −
& & $\;\;\Some \; (s, n) \Rightarrow + −
(r\backslash_{rs} s)\cdot r^{\{n\}}$\\+ −
& & $\;\;\None \Rightarrow + −
\ZERO$\\+ −
\end{tabular}+ −
\end{center}+ −
\noindent+ −
Put together, the list $\map \; f \; Ss$ is instantiated as+ −
\begin{center}+ −
$\map \; (\opterm \; r) \; (\nupdates \; s \; r \;+ −
[\Some \; ([c], n)])$.+ −
\end{center}+ −
For the closed form to be bounded, we would like+ −
simplification to be applied to each term in the list.+ −
Therefore we introduce some variants of $\opterm$,+ −
which help conveniently express the rewriting steps + −
needed in the closed form proof.+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\+ −
& & $\;\;\Some \; (s, n) \Rightarrow + −
\textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\+ −
& & $\;\;\None \Rightarrow + −
\ZERO$\\+ −
\\+ −
$\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\+ −
& & $\;\;\Some \; (s, n) \Rightarrow + −
(\textit{rsimp} \; (r\backslash_{rs} s)) + −
\cdot r^{\{n\}}$\\+ −
& & $\;\;\None \Rightarrow + −
\ZERO$\\+ −
\\+ −
$\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\+ −
& & $\;\;\Some \; (s, n) \Rightarrow + −
(r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\+ −
& & $\;\;\None \Rightarrow + −
\ZERO$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
+ −
For a list of + −
$\textit{option} \;(\textit{string}, \textit{nat})$ elements,+ −
we define the highest power for it recursively:+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\hpa \; [] \; n $ & $\dn$ & $n$\\+ −
$\hpa \; (\None :: os) \; n $ & $\dn$ & $\hpa \; os \; n$\\+ −
$\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & + −
$\hpa \;os \; (\textit{max} \; n\; m)$\\+ −
\\+ −
$\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
Now the intuition that an NTIMES regular expression's power+ −
does not increase can be easily expressed as+ −
\begin{lemma}\label{nupdatesMono2}+ −
$\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$+ −
\end{lemma}+ −
\begin{proof}+ −
Note that the power is non-increasing after a $\nupdate$ application:+ −
\begin{center}+ −
$\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq + −
\hpa\; \; Ss \; m$.+ −
\end{center}+ −
This is also the case for $\nupdates$:+ −
\begin{center}+ −
$\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq + −
\hpa\; \; Ss \; m$.+ −
\end{center}+ −
Therefore we have that+ −
\begin{center}+ −
$\hpower \;\; (\nupdates \; s \; r \; Ss) \leq+ −
\hpower \;\; Ss$+ −
\end{center}+ −
which leads to the lemma being proven.+ −
+ −
\end{proof}+ −
+ −
+ −
We also define the inductive rules for+ −
the shape of derivatives of the NTIMES regular expressions:\\[-3em]+ −
\begin{center}+ −
\begin{mathpar}+ −
\inferrule{\mbox{}}{\cbn \;\ZERO}+ −
+ −
\inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})}+ −
+ −
\inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2}+ −
+ −
\inferrule{\cbn \; r}{\cbn \; r + \ZERO}+ −
\end{mathpar}+ −
\end{center}+ −
\noindent+ −
A derivative of NTIMES fits into the shape described by $\cbn$:+ −
\begin{lemma}\label{ntimesDersCbn}+ −
$\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds.+ −
\end{lemma}+ −
\begin{proof}+ −
By a reverse induction on $s$.+ −
For the inductive case, note that if $\cbn \; r$ holds,+ −
then $\cbn \; (r\backslash_r c)$ holds.+ −
\end{proof}+ −
In addition, for $\cbn$-shaped regular expressioins, one can flatten+ −
them:+ −
\begin{lemma}\label{ntimesHfauPushin}+ −
If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = + −
\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;+ −
(\hflataux{r})})$+ −
\end{lemma}+ −
\begin{proof}+ −
By an induction on the inductive cases of $\cbn$.+ −
\end{proof}+ −
+ −
+ −
+ −
This time we do not need to define the flattening functions for NTIMES only,+ −
because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already:+ −
\begin{lemma}\label{ntimesHfauInduct}+ −
$\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = + −
\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$+ −
\end{lemma}+ −
\begin{proof}+ −
By a reverse induction on $s$.+ −
The lemma \ref{ntimesDersCbn} is used.+ −
\end{proof}+ −
\noindent+ −
We have a recursive property for NTIMES with $\nupdate$ + −
similar to that for STAR,+ −
and one for $\nupdates $ as well:+ −
\begin{lemma}\label{nupdateInduct1}+ −
(i) $\textit{concat} \; (\map \; (\hflataux{\_} \circ (+ −
\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;+ −
c \; r \; Ss)$\\+ −
(ii) $\textit{concat} \; (\map \; \hflataux{\_}\; + −
\map \; (\_\backslash_r x) \;+ −
(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss))) =+ −
\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ holds.+ −
\end{lemma}+ −
\begin{proof}+ −
(i) is by an induction on $Ss$.+ −
(ii) is by an induction on $xs$.+ −
\end{proof}+ −
+ −
The $\nString$ predicate is defined for conveniently+ −
expressing that there are no empty strings in the+ −
$\Some \;(s, n)$ elements generated by $\nupdate$:+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\nString \; \None$ & $\dn$ & $ \textit{true}$\\+ −
$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\+ −
$\nString \; (\Some \; (c::s, n))$ & $\dn$ & $ \textit{true}$\\+ −
\end{tabular}+ −
\end{center}+ −
\begin{lemma}\label{nupdatesNonempty}+ −
If for all elements $opt$ in $\textit{set} \; Ss$,+ −
$\nString \; opt$ holds, the we have that+ −
for all elements $opt$ in $\textit{set} \; (\nupdates \; s \; r \; Ss)$,+ −
$\nString \; opt$ holds.+ −
\end{lemma}+ −
\begin{proof}+ −
By an induction on $s$, where $Ss$ is set to vary over all possible values.+ −
\end{proof}+ −
+ −
\noindent+ −
+ −
\begin{lemma}\label{ntimesClosedFormsSteps}+ −
The following list of equalities or rewriting relations hold:\\+ −
(i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = + −
\textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \;+ −
s \; r \; [\Some \; ([c], n)])))$\\+ −
(ii)+ −
\begin{center}+ −
$\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [+ −
\Some \; ([c], n)]))$ \\ $ \sequal$\\+ −
$\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \;+ −
s\;r \; [\Some \; ([c], n)]))$\\+ −
\end{center}+ −
(iii)+ −
\begin{center}+ −
$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;+ −
([c], n)]))$\\ + −
$\sequal$\\+ −
$\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \;+ −
([c], n)])) $\\+ −
\end{center}+ −
(iv)+ −
\begin{center}+ −
$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;+ −
([c], n)])) $ \\ $\sequal$\\+ −
$\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;+ −
([c], n)])) $\\+ −
\end{center}+ −
(v)+ −
\begin{center}+ −
$\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;+ −
([c], n)])) $ \\ $\sequal$\\+ −
$\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \;+ −
(\nupdates \; s \; r \; [\Some \; ([c], n)]))$+ −
\end{center}+ −
\end{lemma}+ −
\begin{proof}+ −
Routine.+ −
(iii) and (iv) make use of the fact that all the strings $s$+ −
inside $\Some \; (s, m)$ which are elements of the list+ −
$\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty,+ −
which is from lemma \ref{nupdatesNonempty}.+ −
Once the string in $o = \Some \; (s, n)$ is + −
nonempty, $\optermsimp \; r \;o$,+ −
$\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed+ −
to be equal.+ −
\end{proof}+ −
+ −
\begin{theorem}\label{ntimesClosedForm}+ −
The derivative of $r^{\{n+1\}}$ can be described as an alternative+ −
containing a list+ −
of terms:\\+ −
$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (+ −
\sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \;+ −
[\Some \; ([c], n)])))$+ −
\end{theorem}+ −
\begin{proof}+ −
By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.+ −
\end{proof}+ −
+ −
\begin{lemma}\label{nupdatesNLeqN}+ −
For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;+ −
[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some+ −
\; (s', m)$ for some string $s'$ and number $m \leq n$.+ −
\end{lemma}+ −
+ −
+ −
\begin{lemma}\label{ntimesClosedFormListElemShape}+ −
For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (+ −
\nupdates \; s \; r \; [\Some \; ([c], n)]))$,+ −
we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot+ −
r^{\{m\}}$ for some string $s'$ and number $m \leq n$.+ −
\end{lemma}+ −
\begin{proof}+ −
Using lemma \ref{nupdatesNLeqN}.+ −
\end{proof}+ −
+ −
\begin{theorem}\label{ntimesClosedFormBounded}+ −
Assuming that for any string $s$, $\llbracket r \backslash_{rsimps} s+ −
\rrbracket_r \leq N$ holds, then we have that\\+ −
$\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq+ −
\textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket+1)$,+ −
where $c_N = \textit{card} \; (\textit{sizeNregex} \; (+ −
N + \llbracket r^{\{n\}} \rrbracket_r+1))$.+ −
\end{theorem}+ −
\begin{proof}+ −
We have that for all regular expressions $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (+ −
\nupdates \; s \; r \; [\Some \; ([c], n)]))$,+ −
$r'$'s size is less than or equal to $N + \llbracket r^{\{n\}} + −
\rrbracket_r + 1$+ −
because $r'$ can only be either a $\ZERO$ or $r \backslash_{rsimps} s' \cdot+ −
r^{\{m\}}$ for some string $s'$ and number + −
$m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}).+ −
In addition, we know that the list + −
$\map \; (\optermsimp \; r) \; (+ −
\nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most+ −
$c_N = \textit{card} \; + −
(\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$.+ −
This gives us $\llbracket \llbracket r \backslash_{rsimps} \;s \llbracket_r+ −
\leq N * c_N$.+ −
\end{proof}+ −
+ −
+ −
Assuming we have that+ −
\begin{center}+ −
$\rderssimp{r^*}{s} = \rsimp{(\sum \map \; (\lambda s'. r\backslash s' \cdot r^*) \; Ss)}$.+ −
\end{center}+ −
holds.+ −
The idea of $\starupdate$ and $\starupdates$+ −
is to update $Ss$ when another+ −
derivative is taken on $\rderssimp{r^*}{s}$+ −
w.r.t a character $c$ and a string $s'$+ −
respectively.+ −
Both $\starupdate$ and $\starupdates$ take three arguments as input:+ −
the new character $c$ or string $s$ to take derivative with, + −
the regular expression+ −
$r$ under the star $r^*$, and the+ −
list of strings $Ss$ for the derivative $r^* \backslash s$ + −
up until this point + −
such that + −
\begin{center}+ −
$(r^*) \backslash s = \sum_{s' \in sSet} (r\backslash s') \cdot r^*$ + −
\end{center}+ −
is satisfied.+ −
+ −
Functions $\starupdate$ and $\starupdates$ characterise what the + −
star derivatives will look like once ``straightened out'' into lists.+ −
The helper functions for such operations will be similar to+ −
$\sflat{\_}$, $\sflataux{\_}$ and $\sflataux{\_}$, which we defined for sequence.+ −
We use similar symbols to denote them, with a $*$ subscript to mark the difference.+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\hflataux{r_1 + r_2}$ & $\dn$ & $\hflataux{r_1} @ \hflataux{r_2}$\\+ −
$\hflataux{r}$ & $\dn$ & $[r]$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\hflat{r_1 + r_2}$ & $\dn$ & $\sum (\hflataux {r_1} @ \hflataux {r_2}) $\\+ −
$\hflat{r}$ & $\dn$ & $r$+ −
\end{tabular}+ −
\end{center}+ −
\noindent+ −
These definitions are tailor-made for dealing with alternatives that have+ −
originated from a star's derivatives.+ −
A typical star derivative always has the structure of a balanced binary tree:+ −
\begin{center}+ −
$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') + + −
(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $ + −
\end{center}+ −
All of the nested structures of alternatives+ −
generated from derivatives are binary, and therefore+ −
$\hflat{\_}$ and $\hflataux{\_}$ only deal with binary alternatives.+ −
$\hflat{\_}$ ``untangles'' like the following:+ −
\[+ −
\sum ((r_1 + r_2) + (r_3 + r_4)) + \ldots \;+ −
\stackrel{\hflat{\_}}{\longrightarrow} \;+ −
\RALTS{[r_1, r_2, \ldots, r_n]} + −
\]+ −
Here is a lemma stating the recursive property of $\starupdate$ and $\starupdates$,+ −
with the helpers $\hflat{\_}$ and $\hflataux{\_}$\footnote{The function $\textit{concat}$ takes a list of lists + −
and merges each of the element lists to form a flattened list.}:+ −
\begin{lemma}\label{stupdateInduct1}+ −
\mbox+ −
For a list of strings $Ss$, the following hold.+ −
\begin{itemize}+ −
\item+ −
If we do a derivative on the terms + −
$r\backslash_r s \cdot r^*$ (where $s$ is taken from the list $Ss$),+ −
the result will be the same as if we apply $\starupdate$ to $Ss$.+ −
\begin{center}+ −
\begin{tabular}{c}+ −
$\textit{concat} \; (\map \; (\hflataux{\_} \circ ( (\_\backslash_r x)+ −
\circ (\lambda s.\;\; (r \backslash_r s) \cdot r^*)))\; Ss )\;+ −
$\\+ −
\\+ −
$=$ \\+ −
\\+ −
$\map \; (\lambda s. (r \backslash_r s) \cdot (r^*)) \; + −
(\starupdate \; x \; r \; Ss)$+ −
\end{tabular}+ −
\end{center}+ −
\item+ −
$\starupdates$ is ``composable'' w.r.t a derivative.+ −
It piggybacks the character $x$ to the tail of the string+ −
$xs$.+ −
\begin{center}+ −
\begin{tabular}{c}+ −
$\textit{concat} \; (\map \; \hflataux{\_} \; + −
(\map \; (\_\backslash_r x) \; + −
(\map \; (\lambda s.\;\; (r \backslash_r s) \cdot + −
(r^*) ) \; (\starupdates \; xs \; r \; Ss))))$\\+ −
\\+ −
$=$\\+ −
\\+ −
$\map \; (\lambda s.\;\; (r\backslash_r s) \cdot (r^*)) \;+ −
(\starupdates \; (xs @ [x]) \; r \; Ss)$+ −
\end{tabular}+ −
\end{center}+ −
\end{itemize}+ −
\end{lemma}+ −
+ −
\begin{proof}+ −
Part 1 is by induction on $Ss$.+ −
Part 2 is by induction on $xs$, where $Ss$ is left to take arbitrary values.+ −
\end{proof}+ −
+ −
+ −
Like $\textit{createdBySequence}$, we need+ −
a predicate for ``star-created'' regular expressions:+ −
\begin{center}+ −
\begin{mathpar}+ −
\inferrule{\mbox{}}{ \textit{createdByStar}\; \RSEQ{ra}{\RSTAR{rb}} }+ −
+ −
\inferrule{ \textit{createdByStar} \; r_1\; \land \; \textit{createdByStar} \; r_2 }{\textit{createdByStar} \; (r_1 + r_2) } + −
\end{mathpar}+ −
\end{center}+ −
\noindent+ −
All regular expressions created by taking derivatives of+ −
$r_1 \cdot (r_2)^*$ satisfy the $\textit{createdByStar}$ predicate:+ −
\begin{lemma}\label{starDersCbs}+ −
$\textit{createdByStar} \; ((r_1 \cdot r_2^*) \backslash_r s) $ holds.+ −
\end{lemma}+ −
\begin{proof}+ −
By a reverse induction on $s$.+ −
\end{proof}+ −
If a regular expression conforms to the shape of a star's derivative,+ −
then we can push an application of $\hflataux{\_}$ inside a derivative of it:+ −
\begin{lemma}\label{hfauPushin}+ −
If $\textit{createdByStar} \; r$ holds, then+ −
\begin{center}+ −
$\hflataux{r \backslash_r c} = \textit{concat} \; (+ −
\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$+ −
\end{center}+ −
holds.+ −
\end{lemma}+ −
\begin{proof}+ −
By an induction on the inductivev cases of $\textit{createdByStar}$.+ −
\end{proof}+ −
%This is not entirely true for annotated regular expressions: + −
%%TODO: bsimp bders \neq bderssimp+ −
%\begin{center}+ −
% $(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$+ −
%\end{center}+ −
%For bit-codes, the order in which simplification is applied+ −
%might cause a difference in the location they are placed.+ −
%If we want something like+ −
%\begin{center}+ −
% $\bderssimp{r}{s} \myequiv \bsimp{\bders{r}{s}}$+ −
%\end{center}+ −
%Some "canonicalization" procedure is required,+ −
%which either pushes all the common bitcodes to nodes+ −
%as senior as possible:+ −
%\begin{center}+ −
% $_{bs}(_{bs_1 @ bs'}r_1 + _{bs_1 @ bs''}r_2) \rightarrow _{bs @ bs_1}(_{bs'}r_1 + _{bs''}r_2) $+ −
%\end{center}+ −
%or does the reverse. However bitcodes are not of interest if we are talking about+ −
%the $\llbracket r \rrbracket$ size of a regex.+ −
%Therefore for the ease and simplicity of producing a+ −
%proof for a size bound, we are happy to restrict ourselves to + −
%unannotated regular expressions, and obtain such equalities as+ −
%TODO: rsimp sflat+ −
% The simplification of a flattened out regular expression, provided it comes+ −
%from the derivative of a star, is the same as the one nested.+ −
+ −
+ −
+ −
Now we introduce an inductive property+ −
for $\starupdate$ and $\hflataux{\_}$.+ −
\begin{lemma}\label{starHfauInduct}+ −
If we do derivatives of $r^*$+ −
with a string that starts with $c$,+ −
then flatten it out,+ −
we obtain a list+ −
of the shape $\sum_{s' \in sS} (r\backslash_r s') \cdot r^*$,+ −
where $sS = \starupdates \; s \; r \; [[c]]$. Namely,+ −
\begin{center}+ −
$\hflataux{(\rders{( (\rder{c}{r_0})\cdot(r_0^*))}{s})} = + −
\map \; (\lambda s_1. (r_0 \backslash_r s_1) \cdot (r_0^*)) \; + −
(\starupdates \; s \; r_0 \; [[c]])$+ −
\end{center}+ −
holds.+ −
\end{lemma}+ −
\begin{proof}+ −
By an induction on $s$, the inductive cases+ −
being $[]$ and $s@[c]$. The lemmas \ref{hfauPushin} and \ref{starDersCbs} are used.+ −
\end{proof}+ −
\noindent+ −
+ −
$\hflataux{\_}$ has a similar effect as $\textit{flatten}$:+ −
\begin{lemma}\label{hflatauxGrewrites}+ −
$a :: rs \grewrites \hflataux{a} @ rs$+ −
\end{lemma}+ −
\begin{proof}+ −
By induction on $a$. $rs$ is set to take arbitrary values.+ −
\end{proof}+ −
It is also not surprising that $\textit{rsimp}$ will cover+ −
the effect of $\hflataux{\_}$:+ −
\begin{lemma}\label{cbsHfauRsimpeq1}+ −
$\rsimp{(r_1 + r_2)} = \rsimp{(\RALTS{\hflataux{r_1} @ \hflataux{r_2}})}$+ −
\end{lemma}+ −
+ −
\begin{proof}+ −
By using the rewriting relation $\rightsquigarrow$+ −
\end{proof}+ −
And from this we obtain a proof that a star's derivative will be the same+ −
as if it had all its nested alternatives created during deriving being flattened out:+ −
For example,+ −
\begin{lemma}\label{hfauRsimpeq2}+ −
$\createdByStar{r} \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$+ −
\end{lemma}+ −
\begin{proof}+ −
By structural induction on $r$, where the induction rules + −
are these of $\createdByStar{\_}$.+ −
Lemma \ref{cbsHfauRsimpeq1} is used in the inductive case.+ −
\end{proof}+ −
+ −
+ −
%Here is a corollary that states the lemma in+ −
%a more intuitive way:+ −
%\begin{corollary}+ −
% $\hflataux{r^* \backslash_r (c::xs)} = \map \; (\lambda s. (r \backslash_r s) \cdot+ −
% (r^*))\; (\starupdates \; c\; r\; [[c]])$+ −
%\end{corollary}+ −
%\noindent+ −
%Note that this is also agnostic of the simplification+ −
%function we defined, and is therefore of more general interest.+ −
+ −
Together with the rewriting relation+ −
\begin{lemma}\label{starClosedForm6Hrewrites}+ −
We have the following set of rewriting relations or equalities:+ −
\begin{itemize}+ −
\item+ −
$\textit{rsimp} \; (r^* \backslash_r (c::s)) + −
\sequal+ −
\sum \; ( ( \sum (\lambda s. (r\backslash_r s) \cdot r^*) \; (+ −
\starupdates \; s \; r \; [ c::[]] ) ) )$+ −
\item+ −
$r \backslash_{rsimp} (c::s) = \textit{rsimp} \; ( (+ −
\sum ( (\map \; (\lambda s_1. (r\backslash s_1) \; r^*) \;+ −
(\starupdates \;s \; r \; [ c::[] ])))))$+ −
\item+ −
$\sum ( (\map \; (\lambda s. (r\backslash s) \; r^*) \; Ss))+ −
\sequal+ −
\sum ( (\map \; (\lambda s. \textit{rsimp} \; (r\backslash s) \;+ −
r^*) \; Ss) )$+ −
\item+ −
$\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss+ −
\scfrewrites+ −
\map \; (\lambda s. (\rsimp{r \backslash_r s}) \cdot (r^*)) \; Ss$+ −
\item+ −
$( ( \sum ( ( \map \ (\lambda s. \;\;+ −
(\textit{rsimp} \; (r \backslash_r s)) \cdot r^*) \; (\starupdates \;+ −
s \; r \; [ c::[] ])))))$\\+ −
$\sequal$\\+ −
$( ( \sum ( ( \map \ (\lambda s. \;\;+ −
( r \backslash_{rsimp} s)) \cdot r^*) \; (\starupdates \;+ −
s \; r \; [ c::[] ]))))$\\+ −
\end{itemize}+ −
\end{lemma}+ −
\begin{proof}+ −
Part 1 leads to part 2.+ −
The rest of them are routine.+ −
\end{proof}+ −
\noindent+ −
Next the closed form for star regular expressions can be derived:+ −
\begin{theorem}\label{starClosedForm}+ −
$\rderssimp{r^*}{c::s} = + −
\rsimp{+ −
(\sum (\map \; (\lambda s. (\rderssimp{r}{s})\cdot r^*) \; + −
(\starupdates \; s\; r \; [[c]])+ −
)+ −
)+ −
}+ −
$+ −
\end{theorem}+ −
\begin{proof}+ −
By an induction on $s$.+ −
The lemmas \ref{rsimpIdem}, \ref{starHfauInduct}, \ref{starClosedForm6Hrewrites} + −
and \ref{hfauRsimpeq2}+ −
are used. + −
In \ref{starClosedForm6Hrewrites}, the equalities are+ −
used to link the LHS and RHS.+ −
\end{proof}+ −
+ −
+ −
+ −
+ −
The closed form for them looks like:+ −
%\begin{center}+ −
% \begin{tabular}{llrclll}+ −
% $r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\+ −
% $\textit{rsimp}$ & $($ & $+ −
% \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\+ −
% & & & & $\textit{nupdates} \;$ & + −
% $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\+ −
% & & & & $)$ &\\+ −
% & & $)$ & & &\\+ −
% & $)$ & & & &\\+ −
% \end{tabular}+ −
%\end{center}+ −
\begin{center}+ −
\begin{tabular}{llrcllrllll}+ −
$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\+ −
&&&&$\textit{rsimp}$ & $($ & $+ −
\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\+ −
&&&& & & & & $\;\; \textit{nupdates} \;$ & + −
$ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\+ −
&&&& & & & & $)$ &\\+ −
&&&& & & $)$ & & &\\+ −
&&&& & $)$ & & & &\\+ −
\end{tabular}+ −
\end{center}+ −
The $\textit{optermsimp}$ function with the argument $r$ + −
chooses from two options: $\ZERO$ or + −
We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$+ −
and $\starupdates$:+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\+ −
$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\+ −
& & $\textit{if} \; + −
(\rnullable \; (\rders \; r \; s))$ \\+ −
& & $\textit{then} \;\; (s @ [c]) :: [c] :: (+ −
\starupdate \; c \; r \; Ss)$ \\+ −
& & $\textit{else} \;\; (s @ [c]) :: (+ −
\starupdate \; c \; r \; Ss)$+ −
\end{tabular}+ −
\end{center}+ −
\noindent+ −
As a generalisation from characters to strings,+ −
$\starupdates$ takes a string instead of a character+ −
as the first input argument, and is otherwise the same+ −
as $\starupdate$.+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\+ −
$\starupdates \; (c :: cs) \; r \; Ss$ & $=$ & $\starupdates \; cs \; r \; (+ −
\starupdate \; c \; r \; Ss)$+ −
\end{tabular}+ −
\end{center}+ −
\noindent+ −
+ −
+ −
+ −
+ −
%----------------------------------------------------------------------------------------+ −
% SECTION 1+ −
%----------------------------------------------------------------------------------------+ −
+ −
%\section{Adding Support for the Negation Construct, and its Correctness Proof}+ −
%We now add support for the negation regular expression:+ −
%\[ r ::= \ZERO \mid \ONE+ −
% \mid c + −
% \mid r_1 \cdot r_2+ −
% \mid r_1 + r_2 + −
% \mid r^* + −
% \mid \sim r+ −
%\]+ −
%The $\textit{nullable}$ function's clause for it would be + −
%\[+ −
%\textit{nullable}(~r) = \neg \nullable(r)+ −
%\]+ −
%The derivative would be+ −
%\[+ −
%~r \backslash c = ~ (r \backslash c)+ −
%\]+ −
% + −
%The most tricky part of lexing for the $~r$ regular expression+ −
% is creating a value for it.+ −
% For other regular expressions, the value aligns with the + −
% structure of the regular expression:+ −
% \[+ −
% \vdash \Seq(\Char(a), \Char(b)) : a \cdot b+ −
% \]+ −
%But for the $~r$ regular expression, $s$ is a member of it if and only if+ −
%$s$ does not belong to $L(r)$. + −
%That means when there+ −
%is a match for the not regular expression, it is not possible to generate how the string $s$ matched+ −
%with $r$. + −
%What we can do is preserve the information of how $s$ was not matched by $r$,+ −
%and there are a number of options to do this.+ −
%+ −
%We could give a partial value when there is a partial match for the regular expression inside+ −
%the $\mathbf{not}$ construct.+ −
%For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,+ −
%A value for it could be + −
% \[+ −
% \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)+ −
% \]+ −
% The above example demonstrates what value to construct + −
% when the string $s$ is at most a real prefix+ −
% of the strings in $L(r)$. When $s$ instead is not a prefix of any strings+ −
% in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$+ −
% constructor.+ −
% + −
% Another option would be to either store the string $s$ that resulted in + −
% a mis-match for $r$ or a dummy value as a placeholder:+ −
% \[+ −
% \vdash \textit{Not}(abcd) : ~( r_1 )+ −
% \]+ −
%or+ −
% \[+ −
% \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )+ −
% \] + −
% We choose to implement this as it is most straightforward:+ −
% \[+ −
% \mkeps(~(r)) = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})+ −
% \]+ −
% + −
%+ −
%\begin{center}+ −
% \begin{tabular}{lcl}+ −
% $\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;+ −
% cs \; r \; [\Some \; ([c], n)]$\\+ −
% $\ntset \; r\; 0 \; \_$ & $\dn$ & $\None$\\+ −
% $\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\+ −
% \end{tabular}+ −
%\end{center}+ −