lexing: ChengsongTanPhdThesis/Chapters/Cubic.tex@17c7611fb0a9


% Chapter Template

\chapter{A Better Bound and Other Extensions} % Main chapter title

\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.
\lstset{style=myScalastyle}


This chapter is a ``miscellaneous''
chapter which records various
extensions to our $\blexersimp$'s formalisations.\\
Firstly we present further improvements
made to our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same 
alternative level,
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
We conjecture that both
\begin{center}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{center}
and
\begin{center}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
hold, but formalising
them is still work in progress.
We give reasons why the correctness and cubic size bound proofs
can be achieved,
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Animirov's partial derivatives.\\
Secondly, we extend our $\blexersimp$
to support bounded repetitions ($r^{\{n\}}$).
We update our formalisation of 
the correctness and finiteness properties to
include this new construct. 
we can out-compete other verified lexers such as
Verbatim++ on bounded regular expressions.
%We also present the idempotency property proof
%of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
%This reinforces our claim that the fixpoint construction
%originally required by Sulzmann and Lu can be removed in $\blexersimp$.

%Last but not least, we present our efforts and challenges we met
%in further improving the algorithm by data
%structures such as zippers.



%----------------------------------------------------------------------------------------
%	SECTION strongsimp
%----------------------------------------------------------------------------------------
\section{A Stronger Version of Simplification}
%TODO: search for isabelle proofs of algorithms that check equivalence 
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
For example, the regular expression 
\[
	aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
\]
contains three terms, 
expressing three possibilities it will match future input.
The first and the third terms are identical, which means we can eliminate
the latter as we know it will not be picked up by $\bmkeps$.
In $\bsimps$, the $\distinctBy$ function takes care of this.
The criteria $\distinctBy$ uses for removing a duplicate
$a_2$ in the list
\begin{center}
	$rs_a@[a_1]@rs_b@[a_2]@rs_c$
\end{center}
is that 
\begin{center}
	$\rerase{a_1} = \rerase{a_2}$.
\end{center}
It can be characterised as the $LD$ 
rewrite rule in \ref{rrewriteRules}.\\
The problem , however, is that identical components
in two slightly different regular expressions cannot be removed:
\begin{figure}[H]
\[
	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
\]
\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
\end{figure}
\noindent
A simplification like this actually
cannot be omitted,
as without it the size could blow up even with our $\textit{bsimp}$ 
function: for the chapter \ref{Finite} example
$\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
by just setting n to a small number,
we get exponential growth that does not stop before it becomes huge: 
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
    %xlabel={$n$},
    myplotstyle,
    xlabel={input length},
    ylabel={size},
    ]
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
\end{axis}
\end{tikzpicture}
\caption{Runtime of $\blexersimp$ for matching 
	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ 
	with strings 
	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
\end{figure}
\noindent
We would like to apply the rewriting at some stage
\begin{figure}[H]
\[
	(a+b+d) \cdot r_1  \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
\]
\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
\end{figure}
\noindent
in our $\simp$ function,
so that it makes the simplification in \ref{partialDedup} possible.
Translating the rule into our $\textit{bsimp}$ function simply
involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
\begin{center}
	\begin{tabular}{@{}lcl@{}}
		$\textit{bsimp}_{ASEQ} \; bs\; a \; b$ & $\dn$ & $ (a,\; b) \textit{match}$\\
						       && $\ldots$\\
   &&$\quad\textit{case} \; (_{bs1}\sum as, a_2') \Rightarrow _{bs1}\sum (
   \map \; (_{[]}\textit{ASEQ} \; \_ \; a_2') \; as)$\\
   &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
	\end{tabular}
\end{center}



Unfortunately,
if we introduce them in our
setting we would lose the POSIX property of our calculated values. 
For example given the regular expression 
\begin{center}
	$(a + ab)(bc + c)$
\end{center}
and the string
\begin{center}
	$ab$,
\end{center}
then our algorithm generates the following
correct POSIX value
\begin{center}
	$\Seq \; (\Right \; ab) \; (\Right \; c)$.
\end{center}
Essentially it matches the string with the longer Right-alternative
in the first sequence (and
then the 'rest' with the character regular expression $c$ from the second sequence).
If we add the simplification above, then we obtain the following value
\begin{center}
	$\Left \; (\Seq \; a \; (\Left \; bc))$
\end{center}
where the $\Left$-alternatives get priority. 
However this violates the POSIX rules.
The reason for getting this undesired value
is that the new rule splits this regular expression up into 
\begin{center}
	$a\cdot(b c + c) + ab \cdot (bc + c)$,
\end{center}
which becomes a regular expression with a 
totally different structure--the original
was a sequence, and now it becomes an alternative.
With an alternative the maximum munch rule no longer works.\\
A method to reconcile this is to do the 
transformation in \ref{desiredSimp} ``non-invasively'',
meaning that we traverse the list of regular expressions
\begin{center}
	$rs_a@[a]@rs_c$
\end{center}
in the alternative
\begin{center}
	$\sum ( rs_a@[a]@rs_c)$
\end{center}
using  a function similar to $\distinctBy$,
but this time 
we allow a more general list rewrite:
\begin{mathpar}\label{cubicRule}
	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c 
			\stackrel{s}{\rightsquigarrow }
		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
\end{mathpar}
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
where $\textit{prune} \;a \; acc$ traverses $a$
without altering the structure of $a$, removing components in $a$
that have appeared in the accumulator $acc$.
For example
\begin{center}
	$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
\end{center}
should be equal to 
\begin{center}
	$(r_g+r_h)r_d$
\end{center}
because $r_gr_d$ and 
$r_hr_d$ are the only terms
that have not appeared in the accumulator list 
\begin{center}
$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
\end{center}
We implemented 
function $\textit{prune}$ in Scala,
and incorporated into our lexer,
by replacing the $\simp$ function 
with a stronger version called $\bsimpStrong$
that prunes regular expressions.
\begin{figure}[H]

\begin{lstlisting}
    def atMostEmpty(r: Rexp) : Boolean = r match {
      case ZERO => true
      case ONE => true
      case STAR(r) => atMostEmpty(r)
      case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
      case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
      case CHAR(_) => false
    }


    def isOne(r: Rexp) : Boolean = r match {
      case ONE => true
      case SEQ(r1, r2) => isOne(r1) && isOne(r2)
      case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)
      case STAR(r0) => atMostEmpty(r0)
      case CHAR(c) => false
      case ZERO => false
    }

    //r = r' ~ tail' : If tail' matches tail => returns r'
    def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
      case SEQ(r1, r2) => 
        if(r2 == tail) 
          r1
        else
          ZERO
      case r => ZERO
    }

    def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
      case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match
      {
        //all components have been removed, meaning this is effectively a duplicate
        //flats will take care of removing this AZERO 
        case Nil => AZERO
        case r::Nil => fuse(bs, r)
        case rs1 => AALTS(bs, rs1)
      }
      case ASEQ(bs, r1, r2) => 
        //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
        prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
          //after pruning, returns 0
          case AZERO => AZERO
          //after pruning, got r1'.r2, where r1' is equal to 1
          case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
          //assemble the pruned head r1p with r2
          case r1p => ASEQ(bs, r1p, r2)
        }
        //this does the duplicate component removal task
      case r => if(acc(erase(r))) AZERO else r
    }
\end{lstlisting}
\caption{pruning function together with its helper functions}
\end{figure}
\noindent
The benefits of using 
$\textit{prune}$ such as refining the finiteness bound
to a cubic bound has not been formalised yet.
Therefore we choose to use Scala code rather than an Isabelle-style formal 
definition like we did for $\simp$, as the definitions might change
to suit proof needs.
In the rest of the chapter we will use this convention consistently.
\begin{figure}[H]
\begin{lstlisting}
  def distinctWith(rs: List[ARexp], 
    pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
    acc: Set[Rexp] = Set()) : List[ARexp] = 
      rs match{
        case Nil => Nil
        case r :: rs => 
          if(acc(erase(r)))
            distinctWith(rs, pruneFunction, acc)
          else {
            val pruned_r = pruneFunction(r, acc)
            pruned_r :: 
            distinctWith(rs, 
              pruneFunction, 
              turnIntoTerms(erase(pruned_r)) ++: acc
              )
          }
      }
\end{lstlisting}
\caption{A Stronger Version of $\textit{distinctBy}$}
\end{figure}
\noindent
The function $\textit{prune}$ is used in $\distinctWith$.
$\distinctWith$ is a stronger version of $\distinctBy$
which not only removes duplicates as $\distinctBy$ would
do, but also uses the $\textit{pruneFunction}$
argument to prune away verbose components in a regular expression.\\
\begin{figure}[H]
\begin{lstlisting}
   //a stronger version of simp
    def bsimpStrong(r: ARexp): ARexp =
    {
      r match {
        case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
          //normal clauses same as simp
        case (AZERO, _) => AZERO
        case (_, AZERO) => AZERO
        case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
        //bs2 can be discarded
        case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
        case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
        }
        case AALTS(bs1, rs) => {
          //distinctBy(flat_res, erase)
          distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
            case Nil => AZERO
            case s :: Nil => fuse(bs1, s)
            case rs => AALTS(bs1, rs)
          }
        }
        //stars that can be treated as 1
        case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
        case r => r
      }
    }
\end{lstlisting}
\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
\end{figure}
\noindent
$\distinctWith$, is in turn used in $\bsimpStrong$:
\begin{figure}[H]
\begin{lstlisting}
      //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
      def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
        case Nil => r
        case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
      }
\end{lstlisting}
\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
\end{figure}
\noindent
We conjecture that the above Scala function $\bdersStrongs$,
written $\bdersStrong{\_}{\_}$ as an infix notation, 
satisfies the following property:
\begin{conjecture}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{conjecture}
The stronger version of $\blexersimp$'s
code in Scala looks like: 
\begin{figure}[H]
\begin{lstlisting}
      def strongBlexer(r: Rexp, s: String) : Option[Val] = {
        Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
      }
      def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
        case Nil => {
          if (bnullable(r)) {
            mkepsBC(r)
          }
          else
            throw new Exception("Not matched")
        }
        case c::cs => {
          strong_blex_simp(strongBsimp(bder(c, r)), cs)
        }
      }
\end{lstlisting}
\end{figure}
\noindent
We call this lexer $\blexerStrong$.
$\blexerStrong$ is able to drastically reduce the
internal data structure size which could
trigger exponential behaviours in
$\blexersimp$.
\begin{figure}[H]
\centering
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
\begin{tikzpicture}
\begin{axis}[
    %xlabel={$n$},
    myplotstyle,
    xlabel={input length},
    ylabel={size},
    width = 7cm,
    height = 5cm,
    ]
\addplot[red,mark=*, mark options={fill=white}] table {strongSimpCurve.data};
\end{axis}
\end{tikzpicture}
  &
\begin{tikzpicture}
\begin{axis}[
    %xlabel={$n$},
    myplotstyle,
    xlabel={input length},
    ylabel={size},
    width = 7cm,
    height = 5cm,
    ]
\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
\end{axis}
\end{tikzpicture}\\
\multicolumn{2}{l}{}
\end{tabular}    
\caption{Runtime for matching 
	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
	   of the form $\protect\underbrace{aa..a}_{n}$.}\label{fig:aaaaaStarStar}
\end{figure}
\noindent
We would like to preserve the correctness like the one 
we had for $\blexersimp$:
\begin{conjecture}\label{cubicConjecture}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{conjecture}
\noindent
The idea is to maintain key lemmas in
chapter \ref{Bitcoded2} like
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
with the new rewriting rule \ref{cubicRule} .

In the next sub-section,
we will describe why we 
believe a cubic bound can be achieved.
We give an introduction to the
partial derivatives,
which was invented by Antimirov \cite{Antimirov95},
and then link it with the result of the function
$\bdersStrongs$.
 
\subsection{Antimirov's partial derivatives}
Partial derivatives were first introduced by 
Antimirov \cite{Antimirov95}.
It does derivatives in a similar way as suggested by Brzozowski, 
but splits children of alternative regular expressions into 
multiple independent terms, causing the output to become a 
set of regular expressions:
\begin{center}  
 	\begin{tabular}{lcl}
		$\partial_x \; (a \cdot b)$ & 
		$\dn$ & $\partial_x \; a\cdot b \cup
		\partial_x \; b \; \textit{if} \; \; \nullable\; a$\\
		      & & $\partial_x \; a\cdot b \quad\quad 
		      \textit{otherwise}$\\ 
		$\partial_x \; r^*$ & $\dn$ & $\partial_x \; r \cdot r^*$\\
		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \; 
		\textit{then} \;
		\{ \ONE\} \;\;\textit{else} \; \varnothing$\\
		$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
		$\partial_x(\ONE)$ & $=$ & $\varnothing$\\
		$\partial_x(\ZERO)$ & $\dn$ & $\varnothing$\\
	\end{tabular}
\end{center}
\noindent
The $\cdot$ between for example 
$\partial_x \; a\cdot b $ 
is a shorthand notation for the cartesian product 
$\partial_x \; a \times \{ b\}$.
%Each element in the set generated by a partial derivative
%corresponds to a (potentially partial) match
%TODO: define derivatives w.r.t string s
Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
using the $\sum$ constructor, Antimirov put them into
a set.  This causes maximum de-duplication to happen, 
allowing us to understand what are the "atomic" components of it.
For example, To compute what regular expression $x^*(xx + y)^*$'s 
derivative against $x$ is made of, one can do a partial derivative
of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.

The set of all possible partial derivatives is defined
as the union of derivatives w.r.t all the strings in the universe:
\begin{center}
	\begin{tabular}{lcl}
		$\textit{PDER}_{UNIV} \; r $ & $\dn $ & $\bigcup_{w \in A^*}\partial_w \; r$
	\end{tabular}
\end{center}
\noindent

Back to our 
\begin{center}
	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
\end{center}
example, if we denote this regular expression as $A$,
we have that
\begin{center}
$\textit{PDER}_{UNIV} \; A = 
\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{ 
	(\underbrace{a \ldots a}_{\text{j a's}}\cdot
(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot A \}$, 
\end{center}
with exactly $n * (n + 1) / 2$ terms.
This is in line with our speculation that only $n*(n+1)/2$ terms are
needed. We conjecture that $\bsimpStrong$ is also able to achieve this
upper limit in general
\begin{conjecture}\label{bsimpStrongInclusionPder}
	Using a suitable transformation $f$, we have 
	\begin{center}
		$\forall s.\; f \; (r \bdersStrong  \; s) \subseteq
		 \textit{PDER}_{UNIV} \; r$
	\end{center}
\end{conjecture}
\noindent
because our \ref{cubicRule} will keep only one copy of each term,
where the function $\textit{prune}$ takes care of maintaining
a set like structure similar to partial derivatives.
It is anticipated we might need to adjust $\textit{prune}$
slightly to make sure all duplicate terms are eliminated,
which should be doable.

Antimirov had proven that the sum of all the partial derivative 
terms' sizes is bounded by the cubic of the size of that regular
expression:
\begin{property}\label{pderBound}
	$\llbracket \textit{PDER}_{UNIV} \; r \rrbracket \leq O((\llbracket r \rrbracket)^3)$
\end{property}
This property was formalised by Urban, and the details are in the PDERIVS.thy file
in our repository.
Once conjecture \ref{bsimpStrongInclusionPder} is proven, then property \ref{pderBound}
would yield us a cubic bound for our $\blexerStrong$ algorithm: 
\begin{conjecture}\label{strongCubic}
	$\llbracket r \bdersStrong\; s \rrbracket \leq \llbracket r \rrbracket^3$
\end{conjecture}


%To get all the "atomic" components of a regular expression's possible derivatives,
%there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
%whatever character is available at the head of the string inside the language of a
%regular expression, and gives back the character and the derivative regular expression
%as a pair (which he called "monomial"):
% \begin{center}  
% \begin{tabular}{ccc}
% $\lf(\ONE)$ & $=$ & $\phi$\\
%$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
% $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
%\end{tabular}
%\end{center}
%%TODO: completion
%
%There is a slight difference in the last three clauses compared
%with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
%expression $r$ with every element inside $\textit{rset}$ to create a set of 
%sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
%on a set of monomials (which Antimirov called "linear form") and a regular 
%expression, and returns a linear form:
% \begin{center}  
% \begin{tabular}{ccc}
% $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
% $l \bigodot (\ONE)$ & $=$ & $l$\\
% $\phi \bigodot t$ & $=$ & $\phi$\\
% $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
% $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
%  $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
% $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
% $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
%\end{tabular}
%\end{center}
%%TODO: completion
%
% Some degree of simplification is applied when doing $\bigodot$, for example,
% $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
% and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
%  $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
%  and so on.
%  
%  With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with 
%  an iterative procedure:
%   \begin{center}  
% \begin{tabular}{llll}
%$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
% 		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
%		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
%$\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
%\end{tabular}
%\end{center}
%
%
% $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,




%----------------------------------------------------------------------------------------
%	SECTION 2
%----------------------------------------------------------------------------------------

\section{Bounded Repetitions}
We have promised in chapter \ref{Introduction}
that our lexing algorithm can potentially be extended
to handle bounded repetitions
in natural and elegant ways.
Now we fulfill our promise by adding support for 
the ``exactly-$n$-times'' bounded regular expression $r^{\{n\}}$.
We add clauses in our derivatives-based lexing algorithms (with simplifications)
introduced in chapter \ref{Bitcoded2}.

\subsection{Augmented Definitions}
There are a number of definitions that need to be augmented.
The most notable one would be the POSIX rules for $r^{\{n\}}$:
\begin{center}
	\begin{mathpar}
		\inferrule{\forall v \in vs_1. \vdash v:r \land 
		|v| \neq []\\ \forall v \in vs_2. \vdash v:r \land |v| = []\\
		\textit{length} \; (vs_1 @ vs_2) = n}{\textit{Stars} \;
		(vs_1 @ vs_2) : r^{\{n\}} }
	\end{mathpar}
\end{center}
As Ausaf had pointed out \cite{Ausaf},
sometimes empty iterations have to be taken to get
a match with exactly $n$ repetitions,
and hence the $vs_2$ part.

Another important definition would be the size:
\begin{center}
	\begin{tabular}{lcl}
		$\llbracket r^{\{n\}} \rrbracket_r$ & $\dn$ & 
		$\llbracket r \rrbracket_r + n$\\
	\end{tabular}
\end{center}
\noindent
Arguably we should use $\log \; n$ for the size because
the number of digits increase logarithmically w.r.t $n$.
For simplicity we choose to add the counter directly to the size.

The derivative w.r.t a bounded regular expression
is given as 
\begin{center}
	\begin{tabular}{lcl}
		$r^{\{n\}} \backslash_r c$ & $\dn$ & 
		$r\backslash_r c \cdot r^{\{n-1\}} \;\; \textit{if} \; n \geq 1$\\
					   & & $\RZERO \;\quad \quad\quad \quad
					   \textit{otherwise}$\\
	\end{tabular}
\end{center}
\noindent
For brevity, we sometimes use NTIMES to refer to bounded 
regular expressions.
The $\mkeps$ function clause for NTIMES would be
\begin{center}
	\begin{tabular}{lcl}
		$\mkeps \; r^{\{n\}} $ & $\dn$ & $\Stars \;
		(\textit{replicate} \; n\; (\mkeps \; r))$
	\end{tabular}
\end{center}
\noindent
The injection looks like
\begin{center}
	\begin{tabular}{lcl}
		$\inj \; r^{\{n\}} \; c\; (\Seq \;v \; (\Stars \; vs)) $ & 
		$\dn$ & $\Stars \;
		((\inj \; r \;c \;v ) :: vs)$
	\end{tabular}
\end{center}
\noindent


\subsection{Proofs for the Augmented Lexing Algorithm}
We need to maintain two proofs with the additional $r^{\{n\}}$
construct: the 
correctness proof in chapter \ref{Bitcoded2},
and the finiteness proof in chapter \ref{Finite}.

\subsubsection{Correctness Proof Augmentation}
The correctness of $\textit{lexer}$ and $\textit{blexer}$ with bounded repetitions
have been proven by Ausaf and Urban\cite{AusafDyckhoffUrban2016}.
As they have commented, once the definitions are in place,
the proofs given for the basic regular expressions will extend to
bounded regular expressions, and there are no ``surprises''.
We confirm this point because the correctness theorem would also
extend without surprise to $\blexersimp$.
The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
do not need to be changed,
and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to 
add one more line which can be solved by sledgehammer 
to solve the $r^{\{n\}}$ inductive case.


\subsubsection{Finiteness Proof Augmentation}
The bounded repetitions are
very similar to stars, and therefore the treatment
is similar, with minor changes to handle some slight complications
when the counter reaches 0.
The exponential growth is similar:
\begin{center}
	\begin{tabular}{ll}
		$r^{\{n\}} $ & $\longrightarrow_{\backslash c}$\\
		$(r\backslash c)  \cdot  
		r^{\{n - 1\}}*$ & $\longrightarrow_{\backslash c'}$\\
		\\
		$r \backslash cc'  \cdot r^{\{n - 2\}}* + 
		r \backslash c' \cdot r^{\{n - 1\}}*$ &
		$\longrightarrow_{\backslash c''}$\\
		\\
		$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + 
		r \backslash c''\cdot r^{\{n-1\}}) + 
		(r \backslash c'c'' \cdot r^{\{n-2\}}* + 
		r \backslash c'' \cdot r^{\{n-1\}}*)$ & 
		$\longrightarrow_{\backslash c'''}$ \\
		\\
		$\ldots$\\
	\end{tabular}
\end{center}
Again, we assume that $r\backslash c$, $r \backslash cc'$ and so on
are all nullable.
The flattened list of terms for $r^{\{n\}} \backslash_{rs} s$
\begin{center}
	$[r_1 \backslash cc'c'' \cdot r^{\{n-3\}}*,\;
	r \backslash c''\cdot r^{\{n-1\}}, \; 
	r \backslash c'c'' \cdot r^{\{n-2\}}*, \;
	r \backslash c'' \cdot r^{\{n-1\}}*,\; \ldots ]$  
\end{center}
that comes from 
\begin{center}
		$(r_1 \backslash cc'c'' \cdot r^{\{n-3\}}* + 
		r \backslash c''\cdot r^{\{n-1\}}) + 
		(r \backslash c'c'' \cdot r^{\{n-2\}}* + 
		r \backslash c'' \cdot r^{\{n-1\}}*)+ \ldots$ 
\end{center}
are made of sequences with different tails, where the counters
might differ.
The observation for maintaining the bound is that
these counters never exceed $n$, the original
counter. With the number of counters staying finite,
$\rDistinct$ will deduplicate and keep the list finite.
We introduce this idea as a lemma once we describe all
the necessary helper functions.

Similar to the star case, we want
\begin{center}
	$\rderssimp{r^{\{n\}}}{s} = \rsimp{\sum rs}$.
\end{center}
where $rs$
shall be in the form of 
$\map \; f \; Ss$, where $f$ is a function and
$Ss$ a list of objects to act on.
For star, the object's datatype is string.
The list of strings $Ss$
is generated using functions 
$\starupdate$ and $\starupdates$.
The function that takes a string and returns a regular expression
is the anonymous function $
(\lambda s'. \; r\backslash s' \cdot r^{\{m\}})$.
In the NTIMES setting,
the $\starupdate$ and $\starupdates$ functions are replaced by 
$\textit{nupdate}$ and $\textit{nupdates}$:
\begin{center}
	\begin{tabular}{lcl}
		$\nupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
		$\nupdate \; c \; r \; 
		(\Some \; (s, \; n + 1) \; :: \; Ss)$ & $\dn$ & %\\
						     $\textit{if} \; 
						     (\rnullable \; (r \backslash_{rs} s))$ \\
						     & & $\;\;\textit{then} 
						     \;\; \Some \; (s @ [c], n + 1) :: \Some \; ([c], n) :: (
						     \nupdate \; c \; r \; Ss)$ \\
						     & & $\textit{else} \;\; \Some \; (s @ [c], n+1) :: (
						     \nupdate \; c \; r \; Ss)$\\
		$\nupdate \; c \; r \; (\textit{None} :: Ss)$ & $\dn$ & 
		$(\None :: \nupdate  \; c \; r \; Ss)$\\
							      & & \\
	%\end{tabular}
%\end{center}
%\begin{center}
	%\begin{tabular}{lcl}
		$\nupdates \; [] \; r \; Ss$ & $\dn$ & $Ss$\\
		$\nupdates \; (c :: cs) \; r \; Ss$ &  $\dn$ &  $\nupdates \; cs \; r \; (
		\nupdate \; c \; r \; Ss)$
	\end{tabular}
\end{center}
\noindent
which take into account when a subterm
\begin{center}
	$r \backslash_s s \cdot r^{\{n\}}$
\end{center}
counter $n$
is 0, and therefore expands to 
\begin{center}
$r \backslash_s (s@[c]) \cdot r^{\{n\}} \;+
\; \ZERO$ 
\end{center}
after taking a derivative.
The object now has type 
\begin{center}
$\textit{option} \;(\textit{string}, \textit{nat})$
\end{center}
and therefore the function for converting such an option into
a regular expression term is called $\opterm$:

\begin{center}
	\begin{tabular}{lcl}
	$\opterm \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (r\backslash_{rs} s)\cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
	\end{tabular}
\end{center}
\noindent
Put together, the list $\map \; f \; Ss$ is instantiated as
\begin{center}
	$\map \; (\opterm \; r) \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])$.
\end{center}
For the closed form to be bounded, we would like
simplification to be applied to each term in the list.
Therefore we introduce some variants of $\opterm$,
which help conveniently express the rewriting steps 
needed in the closed form proof.
\begin{center}
	\begin{tabular}{lcl}
	$\optermOsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 \textit{rsimp} \; ((r\backslash_{rs} s)\cdot r^{\{n\}})$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
				 \\
	$\optermosimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (\textit{rsimp} \; (r\backslash_{rs} s)) 
				 \cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
				 \\
	$\optermsimp \; r \; SN$ & $\dn$ & $\textit{case} \; SN\; of$\\
				 & & $\;\;\Some \; (s, n) \Rightarrow 
				 (r\backslash_{rsimps} s)\cdot r^{\{n\}}$\\
				 & & $\;\;\None  \Rightarrow 
				 \ZERO$\\
	\end{tabular}
\end{center}


For a list of 
$\textit{option} \;(\textit{string}, \textit{nat})$ elements,
we define the highest power for it recursively:
\begin{center}
	\begin{tabular}{lcl}
		$\hpa \; [] \; n $ & $\dn$ & $n$\\
		$\hpa \; (\None :: os) \; n $ &  $\dn$ &  $\hpa \; os \; n$\\
		$\hpa \; (\Some \; (s, n) :: os) \; m$ & $\dn$ & 
		$\hpa \;os \; (\textit{max} \; n\; m)$\\
		\\
		$\hpower \; rs $ & $\dn$ & $\hpa \; rs \; 0$\\
	\end{tabular}
\end{center}

Now the intuition that an NTIMES regular expression's power
does not increase can be easily expressed as
\begin{lemma}\label{nupdatesMono2}
	$\hpower \; (\nupdates \;s \; r \; [\Some \; ([c], n)]) \leq n$
\end{lemma}
\begin{proof}
	Note that the power is non-increasing after a $\nupdate$ application:
	\begin{center}
		$\hpa \;\; (\nupdate \; c \; r \; Ss)\;\; m \leq 
		 \hpa\; \; Ss \; m$.
	 \end{center}
	 This is also the case for $\nupdates$:
	\begin{center}
		$\hpa \;\; (\nupdates \; s \; r \; Ss)\;\; m \leq 
		 \hpa\; \; Ss \; m$.
	 \end{center}
	 Therefore we have that
	 \begin{center}
		 $\hpower \;\; (\nupdates \; s \; r \; Ss) \leq
		  \hpower \;\; Ss$
	 \end{center}
	 which leads to the lemma being proven.

 \end{proof}


We also define the inductive rules for
the shape of derivatives of the NTIMES regular expressions:\\[-3em]
\begin{center}
	\begin{mathpar}
		\inferrule{\mbox{}}{\cbn \;\ZERO}

		\inferrule{\mbox{}}{\cbn \; \; r_a \cdot (r^{\{n\}})}

		\inferrule{\cbn \; r_1 \;\; \; \cbn \; r_2}{\cbn \; r_1 + r_2}

		\inferrule{\cbn \; r}{\cbn \; r + \ZERO}
	\end{mathpar}
\end{center}
\noindent
A derivative of NTIMES fits into the shape described by $\cbn$:
\begin{lemma}\label{ntimesDersCbn}
	$\cbn \; ((r' \cdot r^{\{n\}}) \backslash_{rs} s)$ holds.
\end{lemma}
\begin{proof}
	By a reverse induction on $s$.
	For the inductive case, note that if $\cbn \; r$ holds,
	then $\cbn \; (r\backslash_r c)$ holds.
\end{proof}
\noindent
In addition, for $\cbn$-shaped regular expressioins, one can flatten
them:
\begin{lemma}\label{ntimesHfauPushin}
	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} = 
	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
	(\hflataux{r})})$
\end{lemma}
\begin{proof}
	By an induction on the inductive cases of $\cbn$.
\end{proof}
\noindent
This time we do not need to define the flattening functions for NTIMES only,
because $\hflat{\_}$ and $\hflataux{\_}$ work on NTIMES already.
\begin{lemma}\label{ntimesHfauInduct}
$\hflataux{( (r\backslash_r c) \cdot r^{\{n\}}) \backslash_{rsimps} s} = 
 \map \; (\opterm \; r) \; (\nupdates \; s \; r \; [\Some \; ([c], n)])$
\end{lemma}
\begin{proof}
	By a reverse induction on $s$.
	The lemmas \ref{ntimesHfauPushin} and \ref{ntimesDersCbn} are used.
\end{proof}
\noindent
We have a recursive property for NTIMES with $\nupdate$ 
similar to that for STAR,
and one for $\nupdates $ as well:
\begin{lemma}\label{nupdateInduct1}
	\mbox{}
	\begin{itemize}
		\item
			\begin{center}
	 $\textit{concat} \; (\map \; (\hflataux{\_} \circ (
	\opterm \; r)) \; Ss) = \map \; (\opterm \; r) \; (\nupdate \;
	c \; r \; Ss)$\\
	\end{center}
	holds.
\item
	\begin{center}
	 $\textit{concat} \; (\map \; \hflataux{\_}\; 
	\map \; (\_\backslash_r x) \;
		(\map \; (\opterm \; r) \; (\nupdates \; xs \; r \; Ss)))$\\
		$=$\\
	$\map \; (\opterm \; r) \; (\nupdates \;(xs@[x]) \; r\;Ss)$ 
	\end{center}
	holds.
	\end{itemize}
\end{lemma}
\begin{proof}
	(i) is by an induction on $Ss$.
	(ii) is by an induction on $xs$.
\end{proof}
\noindent
The $\nString$ predicate is defined for conveniently
expressing that there are no empty strings in the
$\Some \;(s, n)$ elements generated by $\nupdate$:
\begin{center}
	\begin{tabular}{lcl}
		$\nString \; \None$  & $\dn$ & $ \textit{true}$\\
		$\nString \; (\Some \; ([], n))$ & $\dn$ & $ \textit{false}$\\
		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
	\end{tabular}
\end{center}
\begin{lemma}\label{nupdatesNonempty}
	If for all elements $o \in \textit{set} \; Ss$,
	$\nString \; o$ holds, the we have that
	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
	$\nString \; o'$ holds.
\end{lemma}
\begin{proof}
	By an induction on $s$, where $Ss$ is set to vary over all possible values.
\end{proof}

\noindent

\begin{lemma}\label{ntimesClosedFormsSteps}
	The following list of equalities or rewriting relations hold:\\
	(i) $r^{\{n+1\}} \backslash_{rsimps} (c::s) = 
	\textit{rsimp} \; (\sum (\map \; (\opterm \;r \;\_) \; (\nupdates \;
	s \; r \; [\Some \; ([c], n)])))$\\
	(ii)
	\begin{center}
	$\sum (\map \; (\opterm \; r) \; (\nupdates \; s \; r \; [
	\Some \; ([c], n)]))$ \\ $ \sequal$\\
	 $\sum (\map \; (\textit{rsimp} \circ (\opterm \; r))\; (\nupdates \;
	 s\;r \; [\Some \; ([c], n)]))$\\
 	\end{center}
	(iii)
	\begin{center}
	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
	([c], n)]))$\\ 
	$\sequal$\\
	 $\sum \;(\map \; (\optermsimp r) \; (\nupdates \; s \; r \; [\Some \;
	([c], n)])) $\\
	\end{center}
	(iv)
	\begin{center}
	$\sum \;(\map \; (\optermosimp \; r) \; (\nupdates \; s \; r\; [\Some \;
	([c], n)])) $ \\ $\sequal$\\
	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
	([c], n)])) $\\
	\end{center}
	(v)
	\begin{center}
	 $\sum \;(\map \; (\optermOsimp r) \; (\nupdates \; s \; r \; [\Some \;
	 ([c], n)])) $ \\ $\sequal$\\
	  $\sum \; (\map \; (\textit{rsimp} \circ (\opterm \; r)) \;
	  (\nupdates \; s \; r \; [\Some \; ([c], n)]))$
  	\end{center}
\end{lemma}
\begin{proof}
	Routine.
	(iii) and (iv) make use of the fact that all the strings $s$
	inside $\Some \; (s, m)$ which are elements of the list
	$\nupdates \; s\;r\;[\Some\; ([c], n)]$ are non-empty,
	which is from lemma \ref{nupdatesNonempty}.
	Once the string in $o = \Some \; (s, n)$ is 
	nonempty, $\optermsimp \; r \;o$,
	$\optermosimp \; r \; o$ and $\optermosimp \; \; o$ are guaranteed
	to be equal.
	(v) uses \ref{nupdateInduct1}.
\end{proof}
\noindent
Now we are ready to present the closed form for NTIMES:
\begin{theorem}\label{ntimesClosedForm}
	The derivative of $r^{\{n+1\}}$ can be described as an alternative
	containing a list
	of terms:\\
	$r^{\{n+1\}} \backslash_{rsimps} (c::s) = \textit{rsimp} \; (
	\sum (\map \; (\optermsimp \; r) \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])))$
\end{theorem}
\begin{proof}
	By the rewriting steps described in lemma \ref{ntimesClosedFormsSteps}.
\end{proof}
\noindent
The key observation for bounding this closed form
is that the counter on $r^{\{n\}}$ will 
only decrement during derivatives:
\begin{lemma}\label{nupdatesNLeqN}
	For an element $o$ in $\textit{set} \; (\nupdates \; s \; r \;
	[\Some \; ([c], n)])$, either $o = \None$, or $o = \Some
	\; (s', m)$ for some string $s'$ and number $m \leq n$.
\end{lemma}
\noindent
The proof is routine and therefore omitted.
This allows us to say what kind of terms
are in the list $\textit{set} \; (\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)]))$:
only $\ZERO_r$s or a sequence with the tail an $r^{\{m\}}$
with a small $m$:
\begin{lemma}\label{ntimesClosedFormListElemShape}
	For any element $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
	we have that $r'$ is either $\ZERO$ or $r \backslash_{rsimps} s' \cdot
	r^{\{m\}}$ for some string $s'$ and number $m \leq n$.
\end{lemma}
\begin{proof}
	Using lemma \ref{nupdatesNLeqN}.
\end{proof}

\begin{theorem}\label{ntimesClosedFormBounded}
	Assuming that for any string $s$, $\llbracket r \backslash_{rsimps} s
	\rrbracket_r \leq N$ holds, then we have that\\
	$\llbracket r^{\{n+1\}} \backslash_{rsimps} s \rrbracket_r \leq
	\textit{max} \; (c_N+1)* (N + \llbracket r^{\{n\}} \rrbracket+1)$,
	where $c_N = \textit{card} \; (\textit{sizeNregex} \; (
	N + \llbracket r^{\{n\}} \rrbracket_r+1))$.
\end{theorem}
\begin{proof}
We have that for all regular expressions $r'$ in $\textit{set} \; (\map \; (\optermsimp \; r) \; (
	\nupdates \; s \; r \; [\Some \; ([c], n)]))$,
	$r'$'s size is less than or equal to $N + \llbracket r^{\{n\}} 
	\rrbracket_r + 1$
because $r'$ can only be either a $\ZERO$ or $r \backslash_{rsimps} s' \cdot
r^{\{m\}}$ for some string $s'$ and number 
$m \leq n$ (lemma \ref{ntimesClosedFormListElemShape}).
In addition, we know that the list 
$\map \; (\optermsimp \; r) \; (
\nupdates \; s \; r \; [\Some \; ([c], n)])$'s size is at most
$c_N = \textit{card} \; 
(\sizeNregex \; ((N + \llbracket r^{\{n\}} \rrbracket) + 1))$.
This gives us $\llbracket r \backslash_{rsimps} \;s \rrbracket_r
\leq N * c_N$.
\end{proof}

We aim to formalise the correctness and size bound
for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
and so on, which is still work in progress.
They should more or less follow the same recipe described in this section.
Once we know about how to deal with them recursively using suitable auxiliary
definitions, we are able to routinely establish the proofs.


%The closed form for them looks like:
%%\begin{center}
%%	\begin{tabular}{llrclll}
%%		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & \\
%%		$\textit{rsimp}$ & $($ & $
%%		\sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
%%			 & & & & $\textit{nupdates} \;$ & 
%%			 $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
%%			 & & & & $)$ &\\
%%			 & &  $)$ & & &\\
%%			 & $)$ & & & &\\
%%	\end{tabular}
%%\end{center}
%\begin{center}
%	\begin{tabular}{llrcllrllll}
%		$r^{\{n+1\}}$ & $ \backslash_{rsimps}$ & $(c::s)$ & $=$ & & &&&&\\
%			      &&&&$\textit{rsimp}$ & $($ & $
%			      \sum \; ( $ & $\map$ & $(\textit{optermsimp}\;r)$ & $($\\
%					  &&&& & & & & $\;\; \textit{nupdates} \;$ & 
%			 		  $ s \; r_0 \; [ \textit{Some} \; ([c], n)]$\\
%					  &&&& & & & & $)$ &\\
%					  &&&& & &  $)$ & & &\\
%					  &&&& & $)$ & & & &\\
%	\end{tabular}
%\end{center}
%The $\textit{optermsimp}$ function with the argument $r$ 
%chooses from two options: $\ZERO$ or 
%We define for the $r^{\{n\}}$ constructor something similar to $\starupdate$
%and $\starupdates$:
%\begin{center}
%	\begin{tabular}{lcl}
%		$\starupdate \; c \; r \; [] $ & $\dn$ & $[]$\\
%		$\starupdate \; c \; r \; (s :: Ss)$ & $\dn$ & \\
%						     & & $\textit{if} \; 
%						     (\rnullable \; (\rders \; r \; s))$ \\
%						     & & $\textit{then} \;\; (s @ [c]) :: [c] :: (
%						     \starupdate \; c \; r \; Ss)$ \\
%						     & & $\textit{else} \;\; (s @ [c]) :: (
%						     \starupdate \; c \; r \; Ss)$
%	\end{tabular}
%\end{center}
%\noindent
%As a generalisation from characters to strings,
%$\starupdates$ takes a string instead of a character
%as the first input argument, and is otherwise the same
%as $\starupdate$.
%\begin{center}
%	\begin{tabular}{lcl}
%		$\starupdates \; [] \; r \; Ss$ & $=$ & $Ss$\\
%		$\starupdates \; (c :: cs) \; r \; Ss$ &  $=$ &  $\starupdates \; cs \; r \; (
%		\starupdate \; c \; r \; Ss)$
%	\end{tabular}
%\end{center}
%\noindent



%\section{Zippers}
%Zipper is a data structure designed to operate on 
%and navigate between local parts of a tree.
%It was first formally described by Huet \cite{HuetZipper}.
%Typical applications of zippers involve text editor buffers
%and proof system databases.
%In our setting, the idea is to compactify the representation
%of derivatives with zippers, thereby making our algorithm faster.
%Some initial results 
%We first give a brief introduction to what zippers are,
%and other works
%that apply zippers to derivatives
%When dealing with large trees, it would be a waste to 
%traverse the entire tree if
%the operation only
%involves a small fraction of it.
%The idea is to put the focus on that subtree, turning other parts
%of the tree into a context
%
%
%One observation about our derivative-based lexing algorithm is that
%the derivative operation sometimes traverses the entire regular expression
%unnecessarily:


%----------------------------------------------------------------------------------------
%	SECTION 1
%----------------------------------------------------------------------------------------

%\section{Adding Support for the Negation Construct, and its Correctness Proof}
%We now add support for the negation regular expression:
%\[			r ::=   \ZERO \mid  \ONE
%			 \mid  c  
%			 \mid  r_1 \cdot r_2
%			 \mid  r_1 + r_2   
%			 \mid r^*    
%			 \mid \sim r
%\]
%The $\textit{nullable}$ function's clause for it would be 
%\[
%\textit{nullable}(~r) = \neg \nullable(r)
%\]
%The derivative would be
%\[
%~r \backslash c = ~ (r \backslash c)
%\]
% 
%The most tricky part of lexing for the $~r$ regular expression
% is creating a value for it.
% For other regular expressions, the value aligns with the 
% structure of the regular expression:
% \[
% \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
% \]
%But for the $~r$ regular expression, $s$ is a member of it if and only if
%$s$ does not belong to $L(r)$. 
%That means when there
%is a match for the not regular expression, it is not possible to generate how the string $s$ matched
%with $r$. 
%What we can do is preserve the information of how $s$ was not matched by $r$,
%and there are a number of options to do this.
%
%We could give a partial value when there is a partial match for the regular expression inside
%the $\mathbf{not}$ construct.
%For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
%A value for it could be 
% \[
% \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)
% \]
% The above example demonstrates what value to construct 
% when the string $s$ is at most a real prefix
% of the strings in $L(r)$. When $s$ instead is not a prefix of any strings
% in $L(r)$, it becomes unclear what to return as a value inside the $\textit{Not}$
% constructor.
% 
% Another option would be to either store the string $s$ that resulted in 
% a mis-match for $r$ or a dummy value as a placeholder:
% \[
% \vdash \textit{Not}(abcd) : ~( r_1 )
% \]
%or
% \[
% \vdash \textit{Not}(\textit{Dummy}) : ~( r_1 )
% \] 
% We choose to implement this as it is most straightforward:
% \[
% \mkeps(~(r))  = \textit{if}(\nullable(r)) \; \textit{Error} \; \textit{else} \; \textit{Not}(\textit{Dummy})
% \]
% 
%
%\begin{center}
%	\begin{tabular}{lcl}
%		$\ntset \; r \; (n+1) \; c::cs $ & $\dn$ & $\nupdates \;
%		cs \; r \; [\Some \; ([c], n)]$\\
%		$\ntset \; r\; 0 \; \_$ &  $\dn$ &  $\None$\\
%		$\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
%	\end{tabular}
%\end{center}
author	Chengsong
	Tue, 08 Nov 2022 23:24:54 +0000
changeset 621	17c7611fb0a9
parent 620	ae6010c14e49
child 625	b797c9a709d9
permissions	-rwxr-xr-x