diff -r 94db2636a296 -r 7af4e2420a8c ChengsongTanPhdThesis/Chapters/Cubic.tex
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Tue Nov 22 12:50:04 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Sat Nov 26 16:18:10 2022 +0000
@@ -16,7 +16,7 @@
Nevertheless, we outline the ideas we intend to use for the proof.

We present further improvements
-made to our lexer algorithm $\blexersimp$.
+for our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same
@@ -24,6 +24,9 @@
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
+Unfortunately we did not have time to
+work out the proofs, as we did in
+the previous chapters.
We conjecture that both
\begin{center}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
@@ -32,10 +35,11 @@
\begin{center}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
-hold, but formalising
-them is still work in progress.
-We give reasons why the correctness and cubic size bound proofs
-can be achieved,
+hold, but a formalisation
+is still future work.
+We give an informal justification
+why the correctness and cubic size bound proofs
+can be achieved
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Antimirov's partial derivatives.\\
@@ -61,10 +65,11 @@
	aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
\]
contains three terms,
-expressing three possibilities it will match future input.
+expressing three possibilities for how it can match some input.
The first and the third terms are identical, which means we can eliminate
-the latter as we know it will not be picked up by $\bmkeps$.
-In $\bsimps$, the $\distinctBy$ function takes care of this.
+the latter as it will not contribute to a POSIX value.
+In $\bsimps$, the $\distinctBy$ function takes care of
+such instances.
The criterion $\distinctBy$ uses for removing a duplicate
$a_2$ in the list
\begin{center}
@@ -74,24 +79,29 @@
\begin{center}
	$\rerase{a_1} = \rerase{a_2}$.
\end{center}
-It can be characterised as the $LD$
-rewrite rule in \ref{rrewriteRules}.\\
+It is characterised as the $LD$
+rewrite rule in figure \ref{rrewriteRules}.\\
The problem, however, is that identical components
-in two slightly different regular expressions cannot be removed:
-\begin{figure}[H]
-\[
+in two slightly different regular expressions cannot be removed.
+Consider the simplification
+\begin{equation}
+	\label{eqn:partialDedup}
	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
-\]
-\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
-\end{figure}
-\noindent
-A simplification like this actually
+\end{equation}
+where the term $(a+\ldots)\cdot r_1$ is deleted in the right alternative.
+This is permissible because the left alternative
+already contains $(a+\ldots)\cdot r_1$.
+The difficulty is that such alternatives ``buried''
+inside a sequence are not easily recognised.
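+\noindent
+To see why $\distinctBy$ cannot achieve the simplification in
+\eqref{eqn:partialDedup}, note that its criterion compares whole
+(erased) list elements: the two terms erase to
+\[
+	(a+b+d) \cdot r_1
+	\qquad\text{and}\qquad
+	(a+c+e) \cdot r_1,
+\]
+which are different regular expressions, so neither is removed, even
+though they share the component $a \cdot r_1$.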
+But a simplification like this
cannot be omitted,
-as without it the size could blow up even with our $\textit{bsimp}$
-function: for the chapter \ref{Finite} example
+as without it the size of derivatives can still
+blow up even with our $\textit{bsimp}$
+function:
+consider again the example
$\protect((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$,
-by just setting n to a small number,
-we get exponential growth that does not stop before it becomes huge:
+where even a relatively small $n$
+leads to exponential growth:
\begin{figure}[H]
\centering
\begin{tikzpicture}
@@ -104,22 +114,20 @@
		\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
	\end{axis}
\end{tikzpicture}
-\caption{Runtime of $\blexersimp$ for matching
+\caption{Size of derivatives of $\blexersimp$ for matching
	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
\end{figure}
\noindent
-We would like to apply the rewriting at some stage
-\begin{figure}[H]
+One possible approach would be to apply the rewriting
+rule
\[
	(a+b+d) \cdot r_1
	\longrightarrow
	a \cdot r_1 + b \cdot r_1 + d \cdot r_1
\]
-\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
-\end{figure}
\noindent
in our $\simp$ function,
so that it makes the simplification in \eqref{eqn:partialDedup} possible.
Translating the rule into our $\textit{bsimp}$ function simply
involves adding a new clause to the $\textit{bsimp}_{ASEQ}$ function:
\begin{center}
@@ -131,9 +139,7 @@
			&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
	\end{tabular}
\end{center}
-
-
-
+\noindent
Unfortunately, if we introduce this rule in our setting
we would lose the POSIX property of our calculated values.
@@ -141,10 +147,7 @@
\begin{center}
	$(a + ab)(bc + c)$
\end{center}
-and the string
-\begin{center}
-	$ab$,
-\end{center}
+and the string $ab$,
then our algorithm generates the following
correct POSIX value
\begin{center}
@@ -165,11 +168,11 @@
	$a\cdot(b c + c) + ab \cdot (bc + c)$,
\end{center}
which becomes a regular expression with a
-totally different structure--the original
+quite different structure--the original
was a sequence, and now it becomes an alternative.
-With an alternative the maximum munch rule no longer works.\\
+With an alternative the maximal munch rule no longer works.\\
A method to reconcile this is to do the
-transformation in \ref{desiredSimp} ``non-invasively'',
+transformation in \eqref{eqn:partialDedup} ``non-invasively'',
meaning that we traverse the list of regular expressions
\begin{center}
	$rs_a@[a]@rs_c$
@@ -181,11 +184,16 @@
using a function similar to $\distinctBy$,
but this time
we allow a more general list rewrite:
-\begin{mathpar}\label{cubicRule}
+\begin{figure}[H]
+\begin{mathpar}
	\inferrule * [Right = cubicRule]{\vspace{0mm} }{rs_a@[a]@rs_c
			\stackrel{s}{\rightsquigarrow }
		rs_a@[\textit{prune} \; a \; rs_a]@rs_c
	}
\end{mathpar}
+\caption{The rule capturing the pruning simplification needed to achieve
+a cubic bound}
+\label{fig:cubicRule}
+\end{figure}
%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
where $\textit{prune} \;a \; acc$ traverses $a$
without altering the structure of $a$, removing components in $a$
@@ -205,45 +213,11 @@
	$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
\end{center}
We implemented
-function $\textit{prune}$ in Scala,
-and incorporated into our lexer,
-by replacing the $\simp$ function
-with a stronger version called $\bsimpStrong$
-that prunes regular expressions.
+function $\textit{prune}$ in Scala:
\begin{figure}[H]
-	\begin{lstlisting}
-		def atMostEmpty(r: Rexp) : Boolean = r match {
-			case ZERO => true
-			case ONE => true
-			case STAR(r) => atMostEmpty(r)
-			case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
-			case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
-			case CHAR(_) => false
-		}
-
-		def isOne(r: Rexp) : Boolean = r match {
-			case ONE => true
-			case SEQ(r1, r2) => isOne(r1) && isOne(r2)
-			case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)
-			case STAR(r0) => atMostEmpty(r0)
-			case CHAR(c) => false
-			case ZERO => false
-		}
-
-		//r = r' ~ tail' : If tail' matches tail => returns r'
-		def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
-			case SEQ(r1, r2) =>
-				if(r2 == tail)
-					r1
-				else
-					ZERO
-			case r => ZERO
-		}
-
-	def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
-	  case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match
+\begin{lstlisting}
+	def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
+	  case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
	  {
	    //all components have been removed, meaning this is effectively a duplicate
	    //flats will take care of removing this AZERO
@@ -265,94 +239,250 @@
	    case r => if(acc(erase(r))) AZERO else r
	  }
\end{lstlisting}
-\caption{pruning function together with its helper functions}
+\caption{The function $\textit{prune}$}
\end{figure}
\noindent
+The function $\textit{prune}$
+is a stronger version of $\textit{distinctBy}$.
+It does not just walk through a list looking for exact duplicates,
+but prunes sub-expressions recursively.
+It maintains the proper context for such comparisons
+with the helper functions
+$\textit{removeSeqTail}$, $\textit{isOne}$ and $\textit{atMostEmpty}$,
+shown below.
+Informally, $\textit{atMostEmpty}\;r$ holds exactly when $L(r) \subseteq \{[]\}$,
+$\textit{isOne}\;r$ holds when $L(r) = \{[]\}$, and
+$\textit{removeSeqTail}\;r\;t$ strips the tail $t$ from the sequence $r$,
+returning $\ZERO$ if $r$ is not a sequence ending in $t$.
\begin{figure}[H]
\begin{lstlisting}
-   def distinctWith(rs: List[ARexp],
-     pruneFunction: (ARexp, Set[Rexp]) => ARexp,
-     acc: Set[Rexp] = Set()) : List[ARexp] =
-       rs match{
-         case Nil => Nil
-         case r :: rs =>
-           if(acc(erase(r)))
-             distinctWith(rs, pruneFunction, acc)
-           else {
-             val pruned_r = pruneFunction(r, acc)
-             pruned_r ::
-               distinctWith(rs,
-                 pruneFunction,
-                 turnIntoTerms(erase(pruned_r)) ++: acc
-               )
-           }
-       }
+    //atMostEmpty r holds iff r matches at most the empty string
+    def atMostEmpty(r: Rexp) : Boolean = r match {
+      case ZERO => true
+      case ONE => true
+      case STAR(r) => atMostEmpty(r)
+      case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+      case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+      case CHAR(_) => false
+    }
+
+    //isOne r holds iff r matches exactly the empty string
+    def isOne(r: Rexp) : Boolean = r match {
+      case ONE => true
+      case SEQ(r1, r2) => isOne(r1) && isOne(r2)
+      case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))
+      case STAR(r0) => atMostEmpty(r0)
+      case CHAR(c) => false
+      case ZERO => false
+    }
+
+    //removeSeqTail r tail returns r' if r = r' ~ tail (or ONE if
+    //r == tail), and ZERO otherwise
+    def removeSeqTail(r: Rexp, tail: Rexp) : Rexp =
+      if (r == tail)
+        ONE
+      else {
+        r match {
+          case SEQ(r1, r2) =>
+            if(r2 == tail)
+              r1
+            else
+              ZERO
+          case r => ZERO
+        }
+      }
+
\end{lstlisting}
+\caption{The helper functions of $\textit{prune}$}
\end{figure}
\noindent
+Suppose we feed
+\begin{center}
+	$r= (\underline{\ONE}+(\underline{f}+b)\cdot g)\cdot (a\cdot(d\cdot e))$
+\end{center}
+and
+\begin{center}
+	$acc = \{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \}$
+\end{center}
+as input for $\textit{prune}$.
+The end result will be
+\[
+	b\cdot(g\cdot(a\cdot(d\cdot e)))
+\]
+where the underlined components in $r$ are eliminated.
+Looking more closely at the topmost call
+\[
+	\textit{prune} \quad (\ONE+
+	(f+b)\cdot g)\cdot (a\cdot(d\cdot e)) \quad
+	\{a\cdot(d\cdot e),f\cdot (g \cdot (a \cdot (d \cdot e))) \},
+\]
+the sequence clause applies,
+in which a sub-call
+\[
+	\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}
+\]
+is made. The terms in the new accumulator $\{\ONE,\; f\cdot g \}$ come from
+the two calls to $\textit{removeSeqTail}$:
+\[
+	\textit{removeSeqTail} \;\; (a \cdot(d\cdot e)) \;\; (a \cdot(d\cdot e))
+\]
+and
+\[
+	\textit{removeSeqTail} \;\;
+	(f\cdot(g\cdot (a \cdot(d\cdot e)))) \;\; (a \cdot(d\cdot e)).
+\]
+The idea behind $\textit{removeSeqTail}$ is that
+when pruning recursively, we need to ``zoom in''
+to sub-expressions, and this ``zoom in'' needs to be performed
+on the
+accumulators as well, otherwise we would be comparing
+apples with oranges.
+The sub-call
+$\textit{prune} \;\; (\ONE+(f+b)\cdot g)\;\; \{\ONE, f\cdot g \}$
+is simpler and triggers the alternative clause, which prunes
+each element of $(\ONE+(f+b)\cdot g)$ against the accumulator,
+leaving us with only $b\cdot g$.
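+\noindent
+The sequence clause of $\textit{prune}$ is not shown in the listing
+above. The following sketch (our reconstruction for illustration,
+not necessarily the exact clause in our implementation) shows how the
+``zoom in'' just described can be realised with $\textit{removeSeqTail}$:
+\begin{figure}[H]
+\begin{lstlisting}
+    //A sketch of a possible sequence clause for prune.  Before the
+    //left component r1 is pruned, every term in the accumulator is
+    //stripped of the common tail r2, so that like is compared with
+    //like in the recursive call.
+    case ASEQ(bs, r1, r2) =>
+      prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
+        case AZERO => AZERO
+        case AONE(bs1) => fuse(bs ++ bs1, r2)
+        case r1p => ASEQ(bs, r1p, r2)
+      }
+\end{lstlisting}
+\caption{A sketch of the sequence clause of $\textit{prune}$}
+\end{figure}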
+Our new lexer with stronger simplification
+uses $\textit{prune}$ by making it
+the core component of the deduplicating function
+called $\textit{distinctWith}$.
+$\textit{DistinctWith}$ ensures that all verbose
+parts of a regular expression are pruned away.
+
+\begin{figure}[H]
+\begin{lstlisting}
+    //decomposes a regular expression into a list of its terms;
+    //furtherSEQ (omitted here) builds the terms of the sequence r11 ~ r2
+    def turnIntoTerms(r: Rexp): List[Rexp] = r match {
+      case SEQ(r1, r2)  =>
+        turnIntoTerms(r1).flatMap(r11 => furtherSEQ(r11, r2))
+      case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
+      case ZERO => Nil
+      case _ => r :: Nil
+    }
+
+    def distinctWith(rs: List[ARexp],
+      pruneFunction: (ARexp, Set[Rexp]) => ARexp,
+      acc: Set[Rexp] = Set()) : List[ARexp] =
+        rs match{
+          case Nil => Nil
+          case r :: rs =>
+            if(acc(erase(r)))
+              distinctWith(rs, pruneFunction, acc)
+            else {
+              val pruned_r = pruneFunction(r, acc)
+              pruned_r ::
+                distinctWith(rs,
+                  pruneFunction,
+                  turnIntoTerms(erase(pruned_r)) ++: acc
+                )
+            }
+        }
+
\end{lstlisting}
\caption{A Stronger Version of $\textit{distinctBy}$}
\end{figure}
\noindent
-The function $\textit{prune}$ is used in $\distinctWith$.
-$\distinctWith$ is a stronger version of $\distinctBy$
-which not only removes duplicates as $\distinctBy$ would
-do, but also uses the $\textit{pruneFunction}$
-argument to prune away verbose components in a regular expression.\\
+Once a regular expression has been pruned,
+all its components are added to the accumulator,
+so that duplicate components of regular expressions
+further down the list can be removed.
+
+The function $\textit{bsimpStrong}$
+is very much the same as $\textit{bsimp}$, just with
+$\textit{distinctBy}$ replaced
+by $\textit{distinctWith}$.
\begin{figure}[H]
\begin{lstlisting}
-   //a stronger version of simp
+
   def bsimpStrong(r: ARexp): ARexp =
   {
     r match {
       case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
-         //normal clauses same as simp
         case (AZERO, _) => AZERO
         case (_, AZERO) => AZERO
         case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
-         //bs2 can be discarded
         case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
         case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
       }
       case AALTS(bs1, rs) => {
-         //distinctBy(flat_res, erase)
         distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
           case Nil => AZERO
           case s :: Nil => fuse(bs1, s)
           case rs => AALTS(bs1, rs)
         }
       }
-       //stars that can be treated as 1
       case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
       case r => r
     }
   }
-\end{lstlisting}
-\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
-\end{figure}
-\noindent
-$\distinctWith$, is in turn used in $\bsimpStrong$:
-\begin{figure}[H]
-\begin{lstlisting}
-   //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
-   def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
+  def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
     case Nil => r
     case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
   }
+
\end{lstlisting}
-\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
+\caption{The function
+$\textit{bsimpStrong}$: a stronger version of $\textit{bsimp}$}
\end{figure}
\noindent
+The benefit of using
+$\textit{prune}$, namely refining the finiteness bound
+to a cubic bound, has not been formalised yet.
+Therefore we choose to use Scala code rather than an Isabelle-style formal
+definition like we did for $\simp$, as the definitions might change
+to suit our proof needs.
+In the rest of the chapter we will use this convention consistently.
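+\noindent
+To see $\textit{distinctWith}$ in action, consider the following
+hypothetical invocation, which replays the rewrite
+\eqref{eqn:partialDedup} (a sketch only; $\textit{internalise}$ is the
+function from earlier chapters and $r_1$ is instantiated arbitrarily):
+\begin{figure}[H]
+\begin{lstlisting}
+    //distinctBy would keep both terms below, because (a+b+d)~r1 and
+    //(a+c+e)~r1 erase to different regular expressions; distinctWith
+    //prunes the second term down to (c+e)~r1.
+    val r1 = STAR(CHAR('f'))
+    val rs = List(
+      internalise(SEQ(ALTS(ALTS(CHAR('a'), CHAR('b')), CHAR('d')), r1)),
+      internalise(SEQ(ALTS(ALTS(CHAR('a'), CHAR('c')), CHAR('e')), r1)))
+    distinctWith(rs, prune)
+    //expected result, after erasing the bitcodes:
+    //  List((a+b+d)~r1, (c+e)~r1)
+\end{lstlisting}
+\caption{A hypothetical invocation of $\textit{distinctWith}$}
+\end{figure}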
We conjecture that the above Scala function $\bdersStrongs$,
written $\bdersStrong{\_}{\_}$ as an infix notation,
satisfies the following property:
\begin{conjecture}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{conjecture}
+\noindent
The stronger version of $\blexersimp$'s
code in Scala looks as follows:
\begin{figure}[H]
@@ -376,8 +506,9 @@
\end{figure}
\noindent
We call this lexer $\blexerStrong$.
-$\blexerStrong$ is able to drastically reduce the
-internal data structure size which could
+This version is able to drastically reduce the
+internal data structure size which
+otherwise could
trigger exponential behaviours in
$\blexersimp$.
\begin{figure}[H]
@@ -424,77 +555,88 @@
The idea is to maintain key lemmas in
chapter \ref{Bitcoded2} like
$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
-with the new rewriting rule \ref{cubicRule} .
+with the new rewriting rule
+shown in figure \ref{fig:cubicRule}.
In the next sub-section,
we will describe why we
-believe a cubic bound can be achieved.
-We give an introduction to the
+believe a cubic size bound can be achieved with
+the stronger simplification.
+For this we give a short introduction to
partial derivatives,
-which was invented by Antimirov \cite{Antimirov95},
-and then link it with the result of the function
+which were invented by Antimirov \cite{Antimirov95},
+and then link them with the result of the function
$\bdersStrongs$.

\subsection{Antimirov's partial derivatives}
Partial derivatives were first introduced by
Antimirov \cite{Antimirov95}.
-It does derivatives in a similar way as suggested by Brzozowski,
+Partial derivatives are very similar
+to Brzozowski derivatives,
but split the children of alternative regular expressions into
multiple independent terms.
This means that the output of
+a partial derivative becomes a
+set of regular expressions:
\begin{center}
	\begin{tabular}{lcl}
+		$\partial_x \; (r_1 \cdot r_2)$ &
+		$\dn$	& 	$(\partial_x \; r_1) \cdot r_2 \cup
+		\partial_x \; r_2 \; \textit{if} \; \; \nullable\; r_1$\\
+			      & & $(\partial_x \; r_1)\cdot r_2 \quad\quad
+			      \textit{otherwise}$\\
+		$\partial_x \; r^*$ & $\dn$ & $(\partial_x \; r) \cdot r^*$\\
		$\partial_x \; c $ & $\dn$ & $\textit{if} \; x = c \;
		\textit{then} \;
		\{ \ONE\} \;\;\textit{else} \; \varnothing$\\
+		$\partial_x(r_1+r_2)$ & $\dn$ & $\partial_x(r_1) \cup \partial_x(r_2)$\\
		$\partial_x(\ONE)$ & $\dn$ & $\varnothing$\\
		$\partial_x(\ZERO)$ & $\dn$ & $\varnothing$\\
	\end{tabular}
\end{center}
\noindent
The $\cdot$ in, for example,
+$(\partial_x \; r_1) \cdot r_2 $
is a shorthand notation for the Cartesian product
+$(\partial_x \; r_1) \times \{ r_2\}$.
%Each element in the set generated by a partial derivative
%corresponds to a (potentially partial) match
Rather than joining the calculated derivatives $\partial_x r_1$ and $\partial_x r_2$ together
using the $\sum$ constructor, Antimirov put them into
-a set.  This causes maximum de-duplication to happen,
-allowing us to understand what are the "atomic" components of it.
+a set. This means that many subterms are de-duplicated
+automatically, since sets do not contain duplicates.
+For example, to compute the partial derivative of $x^*(xx + y)^*$
+w.r.t.\ $x$, one takes the union of the two singleton sets
+$\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+The partial derivative w.r.t.\ a string is defined recursively:
+\[
+	\partial_{[]} \; r \dn \{r\}
+	\qquad
+	\partial_{c::cs} \; r \dn \bigcup_{r'\in (\partial_c \; r)}
+	\partial_{cs} \; r'
+\]
+Given an alphabet $\Sigma$, we denote the set of all possible strings
+from this alphabet as $\Sigma^*$.
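+\noindent
+For instance, for the regular expression $x \cdot y$ this definition
+unfolds as
+\[
+	\partial_{xy} \; (x \cdot y)
+	\;=\; \bigcup_{r' \in \partial_x (x \cdot y)} \partial_y \; r'
+	\;=\; \partial_y \; (\ONE \cdot y)
+	\;=\; \{\ONE\},
+\]
+using $\partial_x \; (x \cdot y) = (\partial_x \; x) \cdot y = \{\ONE \cdot y\}$
+in the first step, and the fact that $\ONE$ is nullable in the last.
+The definitions above can also be transcribed almost literally into
+Scala.  The following sketch is for comparison only and is not part of
+our lexer; $\textit{nullable}$ and the $\textit{Rexp}$ constructors are
+the ones from earlier chapters:
+\begin{figure}[H]
+\begin{lstlisting}
+    //partial derivative w.r.t. a character: a direct transcription
+    //of Antimirov's definition
+    def pder(x: Char, r: Rexp) : Set[Rexp] = r match {
+      case ZERO => Set()
+      case ONE => Set()
+      case CHAR(c) => if (x == c) Set(ONE) else Set()
+      case ALTS(r1, r2) => pder(x, r1) ++ pder(x, r2)
+      case SEQ(r1, r2) =>
+        if (nullable(r1)) pder(x, r1).map(SEQ(_, r2)) ++ pder(x, r2)
+        else pder(x, r1).map(SEQ(_, r2))
+      case STAR(r0) => pder(x, r0).map(SEQ(_, STAR(r0)))
+    }
+
+    //partial derivative w.r.t. a string
+    def pders(s: List[Char], r: Rexp) : Set[Rexp] = s match {
+      case Nil => Set(r)
+      case c::cs => pder(c, r).flatMap(pders(cs, _))
+    }
+\end{lstlisting}
+\caption{A sketch of Antimirov's partial derivatives in Scala}
+\end{figure}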
The set of all possible partial derivatives is defined
as the union of derivatives w.r.t.\ all the strings in the universe:
\begin{center}
	\begin{tabular}{lcl}
-		$\textit{PDER}_{UNIV} \; r $ & $\dn $ & $\bigcup_{w \in A^*}\partial_w \; r$
+		$\textit{PDER}_{\Sigma^*} \; r $ & $\dn $ & $\bigcup_{w \in \Sigma^*}\partial_w \; r$
	\end{tabular}
\end{center}
\noindent
-
-Back to our
+Consider now again our pathological example
+where the derivatives can grow exponentially
+even with aggressive simplification:
\begin{center}
	$((a^* + (aa)^* + \ldots + (\underbrace{a\ldots a}_{n a's})^* )^*)^*$
\end{center}
-example, if we denote this regular expression as $A$,
+If we denote this regular expression as $r$,
we have that
\begin{center}
-$\textit{PDER}_{UNIV} \; A =
+$\textit{PDER}_{\Sigma^*} \; r =
\bigcup_{i=1}^{n}\bigcup_{j=0}^{i-1} \{
	(\underbrace{a \ldots a}_{\text{j a's}}\cdot
-(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot A \}$,
+(\underbrace{a \ldots a}_{\text{i a's}})^*)\cdot r \}$,
\end{center}
with exactly $n * (n + 1) / 2$ terms.
This is in line with our speculation that only $n*(n+1)/2$ terms are
@@ -504,14 +646,14 @@
	Using a suitable transformation $f$, we have
	\begin{center}
		$\forall s.\; f \; (r \bdersStrong \; s) \subseteq
-		 \textit{PDER}_{UNIV} \; r$
+		 \textit{PDER}_{\Sigma^*} \; r$
	\end{center}
\end{conjecture}
\noindent
This is because our rule in figure \ref{fig:cubicRule} keeps only one copy of each term,
where the function $\textit{prune}$ takes care of maintaining
a set-like structure similar to partial derivatives.
-It is anticipated we might need to adjust $\textit{prune}$
+We might need to adjust $\textit{prune}$
slightly to make sure all duplicate terms are eliminated,
which should be doable.

@@ -519,15 +661,17 @@
terms' sizes is bounded by the cube of the size of that regular
expression:
\begin{property}\label{pderBound}
-	$\llbracket \textit{PDER}_{UNIV} \; r \rrbracket \leq O((\llbracket r \rrbracket)^3)$
+	$\llbracket \textit{PDER}_{\Sigma^*} \; r \rrbracket \leq O(\llbracket r \rrbracket^3)$
\end{property}
-This property was formalised by Urban, and the details are in the PDERIVS.thy file
-in our repository.
+This property was formalised by Wu et al. \cite{Wu2014}, and the
+details can be found in the Archive of Formal Proofs.%
\footnote{https://www.isa-afp.org/entries/Myhill-Nerode.html}
Once conjecture \ref{bsimpStrongInclusionPder} is proven, property \ref{pderBound}
-would yield us a cubic bound for our $\blexerStrong$ algorithm:
+would also yield a cubic bound for our $\blexerStrong$ algorithm:
\begin{conjecture}\label{strongCubic}
	$\llbracket r \bdersStrong\; s \rrbracket \leq O(\llbracket r \rrbracket^3)$
\end{conjecture}
+\noindent
+We leave this as future work.

%To get all the "atomic" components of a regular expression's possible derivatives,
@@ -748,3 +892,4 @@
%		$\ntset \; r \; \_ \; [] $ & $ \dn$ & $[]$\\
%	\end{tabular}
%\end{center}
+