--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex Wed Aug 31 23:57:42 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex Thu Sep 01 23:47:37 2022 +0100
@@ -59,239 +59,152 @@
\section{A Stronger Version of Simplification}
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
-For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
-which expresses two possibilities it will match future input.
-It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
-\begin{figure}\label{string_3parts1}
+For example, the regular expression
+\[
+ aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
+\]
+contains three terms,
+expressing three possibilities for how it will match future input.
+The first and the third terms are identical, which means we can eliminate
+the third one as we know it will not be picked up by $\bmkeps$.
+In $\bsimps$, the $\distinctBy$ function takes care of this.
+The criterion that $\distinctBy$ uses for removing a duplicate
+$a_2$ in the list
+\begin{center}
+ $rs_a@[a_1]@rs_b@[a_2]@rs_c$
+\end{center}
+is that
\begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
- {Consumed Input
- \nodepart{two} Expects $aa$
- \nodepart{three} Then expects $a^*$};
-
+ $\rerase{a_1} = \rerase{a_2}$.
+\end{center}
+It can be characterised as the $LD$
+rewrite rule in \ref{rrewriteRules}.\\
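+\noindent
+To recap, a minimal sketch of $\distinctBy$ in the style of the Scala
+snippets below, specialised to annotated regular expressions
+($\textit{erase}$ converts an annotated regular expression into a plain one):
+\begin{figure}[H]
+\begin{lstlisting}
+  //a minimal sketch of distinctBy: keep the first occurrence
+  //of each regular expression (up to erasure), drop the rest
+  def distinctBy(rs: List[ARexp], acc: Set[Rexp] = Set()) : List[ARexp] =
+    rs match {
+      case Nil => Nil
+      case r :: rest =>
+        if(acc.contains(erase(r)))
+          distinctBy(rest, acc)                 //duplicate: drop it
+        else
+          r :: distinctBy(rest, acc + erase(r)) //first occurrence: keep it
+    }
+\end{lstlisting}
+\caption{A sketch of $\distinctBy$, specialised to annotated regular expressions}
+\end{figure}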
+The problem, however, is that identical components
+in two slightly different regular expressions cannot be removed
+by $\distinctBy$:
+\begin{figure}[H]
+\[
+ (a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
+\]
+\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
+\end{figure}
+\noindent
+A simplification like this cannot be omitted entirely,
+because without it the size of derivatives can blow up
+even with our $\simp$ function:
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}
+\begin{axis}[
+ %xlabel={$n$},
+ myplotstyle,
+ xlabel={input length},
+ ylabel={size},
+ ]
+\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
+\end{axis}
\end{tikzpicture}
-\end{center}
+\caption{Size of $\blexersimp$'s internal data structures for matching
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
+	with strings
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
+\end{figure}
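+\noindent
+The sizes in such a plot can be collected by measuring the annotated
+regular expression after each simplified derivative step. A sketch of such
+a measurement, where $\textit{asize}$ is an assumed size measure on
+annotated regular expressions and $\textit{internalise}$, $\textit{bder}$
+and $\textit{bsimp}$ are the functions from the earlier chapters:
+\begin{figure}[H]
+\begin{lstlisting}
+  //sizes of the annotated regular expression after each
+  //simplified derivative step (a sketch)
+  def sizesAlongDerivatives(r: Rexp, s: String) : List[Int] =
+    s.toList
+     .scanLeft(internalise(r))((a, c) => bsimp(bder(c, a)))
+     .map(asize)
+\end{lstlisting}
+\caption{Sketch of a size-measuring experiment}
+\end{figure}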
+\noindent
+We would like to apply the following rewrite rule at some stage
+\begin{figure}[H]
+\[
+ (a+b+d) \cdot r_1 \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
+\]
+\caption{The desired distributivity rewrite, also not applied by $\blexersimp$}\label{desiredSimp}
\end{figure}
\noindent
-Or it will match at least 1 more $a$:
-\begin{figure}\label{string_3parts2}
-\begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
- {Consumed
- \nodepart{two} Expects $a$
- \nodepart{three} Then expects $a^*$};
-
-\end{tikzpicture}
-\end{center}
-\end{figure}
-If these two possibilities are identical, we can eliminate
-the second one as we know the second one corresponds to
-a match that is not $\POSIX$.
-
-
-If two identical regexes
-happen to be grouped into different sequences, one cannot use
-the $a + a \rightsquigarrow a$ rule to eliminate them, even if they
-are "parallel" to each other:
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-and that's because they are followed by possibly
-different "suffixes" $r_1$ and $r_2$, and if
-they do turn out to be different then doing
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-might cause a possible match where the string is in $L(a \cdot r_2)$ to be lost.
-
-Here is an example for this.
-Assume we have the derivative regex
-\[( r_1 + r_2 + r_3)\cdot r+
-( r_1 + r_5 + r_6)\cdot( r_1 + r_2 + r_3)^*
-\]
-
-occurring in an intermediate step in our bit-coded lexing algorithm.
+in our $\simp$ function,
+so that it makes the simplification in \ref{partialDedup} possible.
+For example, could a function like the $\textit{spill}$ sketch below be used?
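+The sketch, on plain regular expressions, distributes each alternative
+over the sequence that follows it (the name $\textit{spill}$ and the
+function itself are hypothetical):
+\begin{figure}[H]
+\begin{lstlisting}
+  //hypothetical sketch: "spill out" the alternatives under a
+  //sequence, implementing the rewrite in the figure above
+  def spill(r: Rexp) : List[Rexp] = r match {
+    case SEQ(ALTS(r1, r2), r3) =>
+      spill(SEQ(r1, r3)) ::: spill(SEQ(r2, r3))
+    case ALTS(r1, r2) => spill(r1) ::: spill(r2)
+    case r => List(r)
+  }
+\end{lstlisting}
+\caption{A hypothetical $\textit{spill}$ function}
+\end{figure}
+\noindent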
-In this expression, there will be 6 "parallel" terms if we break up regexes
-of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
-\begin{align}
-(\nonumber \\
- r_1 + & \label{term:1} \\
- r_2 + & \label{term:2} \\
- r_3 & \label{term:3} \\
-& )\cdot r\; + \; (\nonumber \\
- r_1 + & \label{term:4} \\
- r_5 + & \label{term:5} \\
- r_6 \label{term:6}\\
-& )\cdot r\nonumber
-\end{align}
-
-They have been labelled, and each label denotes
-one term, for example, \ref{term:1} denotes
-\[
- r_1 \cdot r
-\]
-\noindent
-and \ref{term:3} denotes
-\[
- r_3\cdot r.
-\]
-From a lexer's point of view, \ref{term:4} will never
-be picked up in later phases of matching because there
-is a term \ref{term:1} giving identical matching information.
-The first term \ref{term:1} will match a string in $L(r_1 \cdot r)$,
-and so on for term \ref{term:2}, \ref{term:3}, etc.
-
-\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
-\begin{center}\label{string_2parts}
-
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
- {$r_{x1}$
- \nodepart{two} $r_1\cdot r$ };
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\end{tikzpicture}
-
+Unfortunately,
+if we introduce such a spilling rule in our
+setting we would lose the POSIX property of our calculated values.
+For example, given the regular expression
+\begin{center}
+ $(a + ab)(bc + c)$
+\end{center}
+and the string
+\begin{center}
+	$abc$,
\end{center}
-For term 1 \ref{term:1}, whatever was before the current
-input position was fully matched and the regex corresponding
-to it reduced to $\ONE$,
-and in the next input token, it will start on $r_1\cdot r$.
-The resulting value will be something of a similar configuration:
-\begin{center}\label{value_2parts}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
- {$v_{x1}$
- \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+then our algorithm generates the following
+correct POSIX value
+\begin{center}
+ $\Seq \; (\Right \; ab) \; (\Right \; c)$.
\end{center}
-For term 2 \ref{term:2} we have a similar value partition:
-\begin{center}\label{value_2parts2}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
- {$v_{x2}$
- \nodepart{two} $v_{r_2\cdot r}$ };
-\end{tikzpicture}
+Essentially it matches the string with the longer Right-alternative
+in the first part of the sequence (and
+then the ``rest'' with the character regular expression $c$ from the second part).
+If we add the simplification rule in \ref{desiredSimp}, then we obtain the following value
+\begin{center}
+ $\Left \; (\Seq \; a \; (\Left \; bc))$
+\end{center}
+where the $\Left$-alternatives get priority.
+However, this violates the POSIX rules.
+The reason for getting this undesired value
+is that the new rule splits this regular expression up into
+\begin{center}
+ $a\cdot(b c + c) + ab \cdot (bc + c)$,
\end{center}
-\noindent
-and so on.
-We note that for term 4 \ref{term:4} its result value
-after any position beyond the current one will always
-be the same:
-\begin{center}\label{value_2parts4}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
- \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
- {$v_{x4}$
- \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+which becomes a regular expression with a
+totally different structure: the original
+was a sequence, and now it is an alternative.
+With an alternative the maximum munch rule no longer works.\\
+A method to reconcile this is to do the
+transformation in \ref{desiredSimp} ``non-invasively'',
+meaning that we traverse the list of regular expressions
+\begin{center}
+ $rs_a@[a]@rs_c$
+\end{center}
+in the alternative
+\begin{center}
+ $\sum ( rs_a@[a]@rs_c)$
\end{center}
-And $\POSIX$ rules says that we can eliminate at least one of them.
-Our algorithm always puts the regex with the longest initial sub-match at the left of the
-alternatives, so we safely throw away \ref{term:4}.
-The fact that term 1 and 4 are not immediately in the same alternative
-expression does not prevent them from being two duplicate matches at
-the current point of view.
-To implement this idea into an algorithm, we define a "pruning function"
-$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
-$\textit{acc}$, which acts as an element
-is a stronger version of $\textit{distinctBy}$.
-Here is a concise version of it (in the style of Scala):
-\begin{figure}[H]
-\begin{lstlisting}
-def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
-List[ARexp] = rs match {
- case Nil => Nil
- case r :: rs =>
- if(acc.contains(erase(r)))
- distinctByPlus(rs, acc)
- else
- prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
-}
-
-\end{lstlisting}
-\caption{the distinctByPlus function}
-\end{figure}
-But for the function $\textit{prune}$ there is a difficulty:
-how could one define the $L$ function in a "computable" way,
-so that they generate a (lazy infinite) set of strings in $L(r)$.
-We also need a function that tests whether $L(r_1) \subseteq L(r_2)$
-is true.
-For the moment we cut corners and do not define these two functions
-as they are not used by Antimirov (and will probably not contribute
-to a bound better than Antimirov's cubic bound).
-Rather, we do not try to eliminate in every instance of regular expressions
-that have "language duplicates", but only eliminate the "exact duplicates".
-For this we need to break up terms $(a+b)\cdot c$ into two
-terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
+using a function similar to $\distinctBy$,
+but this time
+we allow a more general list rewrite:
+ \begin{mathpar}
+ \inferrule{\vspace{0mm} }{rs_a@[a]@rs_c
+ \stackrel{s}{\rightsquigarrow }
+ rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
+ \end{mathpar}
+%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
+where $\textit{prune} \;a \; acc$ traverses $a$
+without altering the structure of $a$, removing components in $a$
+that have appeared in the accumulator $acc$.
+For example
+\begin{center}
+	$\textit{prune} \;\; (r_a+r_f+r_g+r_h)r_d \;\; [(r_a+r_b+r_c)r_d, \; (r_e+r_f)r_d]$
+\end{center}
+should be equal to
+\begin{center}
+ $(r_g+r_h)r_d$
+\end{center}
+because $r_gr_d$ and
+$r_hr_d$ are the only terms
+that have not appeared in the accumulator list
+\begin{center}
+$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
+\end{center}
+We implemented the
+function $\textit{prune}$ in Scala
+and incorporated it into our lexer
+by replacing the $\simp$ function
+with a stronger version called $\bsimpStrong$
+that prunes regular expressions.
+We call the resulting lexer $\blexerStrong$.
+$\blexerStrong$ is able to drastically reduce the size of the
+internal data structures that could otherwise
+trigger exponential behaviour in
+$\blexersimp$.
\begin{figure}[H]
-\begin{lstlisting}
-def distinctWith(rs: List[ARexp],
- pruneFunction: (ARexp, Set[Rexp]) => ARexp,
- acc: Set[Rexp] = Set()) : List[ARexp] =
- rs match{
- case Nil => Nil
- case r :: rs =>
- if(acc(erase(r)))
- distinctWith(rs, pruneFunction, acc)
- else {
- val pruned_r = pruneFunction(r, acc)
- pruned_r ::
- distinctWith(rs,
- pruneFunction,
- turnIntoTerms(erase(pruned_r)) ++: acc
- )
- }
- }
-\end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
-\end{figure}
-\noindent
-This process is done by the function $\textit{turnIntoTerms}$:
-\begin{figure}
-\begin{verbatim}
-def turnIntoTerms(r: Rexp): List[Rexp] = r match {
- case SEQ(r1, r2) => if(isOne(r1))
- turnIntoTerms(r2)
- else
- turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
- case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
- case ZERO => Nil
- case _ => r :: Nil
-}
-\end{verbatim}
-\caption{function that breaks up regular expression into "atomic" terms}
-\end{figure}
-
-\noindent
-The pruning function can be defined recursively:
-\begin{figure}
-\begin{verbatim}
-def prune7(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
- case AALTS(bs, rs) => rs.map(r => prune7(r, acc)).filter(_ != ZERO) match
- {
- case Nil => AZERO
- case r::Nil => fuse(bs, r)
- case rs1 => AALTS(bs, rs1)
- }
- case ASEQ(bs, r1, r2) => prune7(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
- case AZERO => AZERO
- case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
- case r1p => ASEQ(bs, r1p, r2)
- }
- case r => if(acc(erase(r))) AZERO else r
-}
-\end{verbatim}
-\caption{pruning function}
-\end{figure}
-
-
-
-\begin{figure}
\centering
\begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
\begin{tikzpicture}
@@ -316,10 +229,275 @@
\end{axis}
\end{tikzpicture}\\
\multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
- of the form $\underbrace{aa..a}_{n}$.}
+ of the form $\underbrace{aa..a}_{n}$.}
\end{tabular}
\caption{Runtime for matching $\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings of the form $\protect\underbrace{aa..a}_{n}$.} \label{fig:aaaaaStarStar}
\end{figure}
+\begin{figure}[H]
+\begin{lstlisting}
+ def atMostEmpty(r: Rexp) : Boolean = r match {
+ case ZERO => true
+ case ONE => true
+ case STAR(r) => atMostEmpty(r)
+ case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+ case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+ case CHAR(_) => false
+ }
+
+
+ def isOne(r: Rexp) : Boolean = r match {
+ case ONE => true
+ case SEQ(r1, r2) => isOne(r1) && isOne(r2)
+    case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))
+ case STAR(r0) => atMostEmpty(r0)
+ case CHAR(c) => false
+ case ZERO => false
+ }
+
+ //r = r' ~ tail' : If tail' matches tail => returns r'
+ def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
+ case SEQ(r1, r2) =>
+ if(r2 == tail)
+ r1
+ else
+ ZERO
+ case r => ZERO
+ }
+
+ def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
+  case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
+ {
+ //all components have been removed, meaning this is effectively a duplicate
+ //flats will take care of removing this AZERO
+ case Nil => AZERO
+ case r::Nil => fuse(bs, r)
+ case rs1 => AALTS(bs, rs1)
+ }
+ case ASEQ(bs, r1, r2) =>
+ //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
+ prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
+ //after pruning, returns 0
+ case AZERO => AZERO
+ //after pruning, got r1'.r2, where r1' is equal to 1
+ case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
+ //assemble the pruned head r1p with r2
+ case r1p => ASEQ(bs, r1p, r2)
+ }
+ //this does the duplicate component removal task
+ case r => if(acc(erase(r))) AZERO else r
+ }
+\end{lstlisting}
+\caption{pruning function together with its helper functions}
+\end{figure}
+\noindent
+The benefits of using
+$\textit{prune}$, such as refining the finiteness bound
+to a cubic bound, have not been formalised yet.
+Therefore we choose to use Scala code rather than an Isabelle-style formal
+definition like we did for $\simp$, as the definitions might change
+to suit proof needs.
+In the rest of the chapter we will use this convention consistently.
+\begin{figure}[H]
+\begin{lstlisting}
+ def distinctWith(rs: List[ARexp],
+ pruneFunction: (ARexp, Set[Rexp]) => ARexp,
+ acc: Set[Rexp] = Set()) : List[ARexp] =
+ rs match{
+ case Nil => Nil
+ case r :: rs =>
+ if(acc(erase(r)))
+ distinctWith(rs, pruneFunction, acc)
+ else {
+ val pruned_r = pruneFunction(r, acc)
+ pruned_r ::
+ distinctWith(rs,
+ pruneFunction,
+ turnIntoTerms(erase(pruned_r)) ++: acc
+ )
+ }
+ }
+\end{lstlisting}
+\caption{A Stronger Version of $\textit{distinctBy}$}
+\end{figure}
+\noindent
+The function $\textit{prune}$ is used in $\distinctWith$.
+$\distinctWith$ is a stronger version of $\distinctBy$:
+it not only removes duplicates as $\distinctBy$ would
+do, but also uses the $\textit{pruneFunction}$
+argument to prune away verbose components in a regular expression.\\
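+The accumulator in $\distinctWith$ is extended with the ``atomic'' terms
+of each pruned regular expression. This is done by the helper
+$\textit{turnIntoTerms}$, which, much like the $\textit{spill}$ sketch
+from above, breaks a regular expression up using the distributivity law:
+\begin{figure}[H]
+\begin{lstlisting}
+  def turnIntoTerms(r: Rexp): List[Rexp] = r match {
+    case SEQ(r1, r2) =>
+      if(isOne(r1))
+        turnIntoTerms(r2)
+      else
+        turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
+    case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
+    case ZERO => Nil
+    case _ => r :: Nil
+  }
+\end{lstlisting}
+\caption{The function $\textit{turnIntoTerms}$ breaks up a regular expression into ``atomic'' terms}
+\end{figure}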
+\begin{figure}[H]
+\begin{lstlisting}
+ //a stronger version of simp
+ def bsimpStrong(r: ARexp): ARexp =
+ {
+ r match {
+ case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
+ //normal clauses same as simp
+ case (AZERO, _) => AZERO
+ case (_, AZERO) => AZERO
+ case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
+ //bs2 can be discarded
+ case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
+ case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
+ }
+ case AALTS(bs1, rs) => {
+ //distinctBy(flat_res, erase)
+ distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
+ case Nil => AZERO
+ case s :: Nil => fuse(bs1, s)
+ case rs => AALTS(bs1, rs)
+ }
+ }
+ //stars that can be treated as 1
+ case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
+ case r => r
+ }
+ }
+\end{lstlisting}
+\caption{The function $\bsimpStrong$}
+\end{figure}
+\noindent
+$\distinctWith$ is in turn used in the function $\bsimpStrong$ shown above,
+which is applied after each derivative step in $\bdersStrong$:
+\begin{figure}[H]
+\begin{lstlisting}
+ //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
+ def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
+ case Nil => r
+ case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
+ }
+\end{lstlisting}
+\caption{The function $\bdersStrong$}
+\end{figure}
+\noindent
+We conjecture that the above Scala function $\bdersStrongs$,
+written $\bdersStrong{\_}{\_}$ in infix notation,
+satisfies the following property:
+\begin{conjecture}
+ $\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{conjecture}
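+\noindent
+Asymptotic claims cannot be verified by testing alone, but the growth can
+be sanity-checked empirically. A sketch of such a check for a fixed
+constant $c$, where $\textit{asize}$ is again an assumed size measure on
+annotated regular expressions:
+\begin{figure}[H]
+\begin{lstlisting}
+  //sanity check (not a proof): the size of the derivative
+  //should stay below c * asize(r)^3 for some fixed constant c
+  def withinCubic(r: Rexp, s: String, c: Double) : Boolean = {
+    val r0 = internalise(r)
+    asize(bdersStrong(s.toList, r0)) <= c * math.pow(asize(r0), 3)
+  }
+\end{lstlisting}
+\caption{Sketch of an empirical probe of the conjecture}
+\end{figure}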
+The stronger version of $\blexersimp$'s
+code in Scala looks as follows:
+\begin{figure}[H]
+\begin{lstlisting}
+ def strongBlexer(r: Rexp, s: String) : Option[Val] = {
+ Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
+ }
+ def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
+ case Nil => {
+ if (bnullable(r)) {
+ mkepsBC(r)
+ }
+ else
+ throw new Exception("Not matched")
+ }
+ case c::cs => {
+      strong_blex_simp(bsimpStrong(bder(c, r)), cs)
+ }
+ }
+\end{lstlisting}
+\end{figure}
+\noindent
+We would like to preserve the same correctness property as the one
+we had for $\blexersimp$:
+\begin{conjecture}\label{cubicConjecture}
+ $\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{conjecture}
+
+The idea of the $\textit{prune}$ function, breaking up regular expressions
+and removing already-seen components, suggests a link towards ``partial
+derivatives'' introduced by Antimirov \cite{Antimirov95}.
+
+\section{Antimirov's Partial Derivatives}
+The idea behind Antimirov's partial derivatives
+is to compute derivatives in a similar way as suggested by Brzozowski,
+but to maintain a set of regular expressions instead of a single one:
+
+\begin{center}
+	\begin{tabular}{ccc}
+		$\partial_x(\ZERO)$ & $=$ & $\phi$\\
+		$\partial_x(\ONE)$ & $=$ & $\phi$\\
+		$\partial_x(c)$ & $=$ & $\textit{if} \; c = x \; \textit{then} \; \{ \ONE \} \; \textit{else} \; \phi$\\
+		$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
+		$\partial_x(a \cdot b)$ & $=$ & $\textit{if} \; \nullable(a) \; \textit{then} \; \partial_x(a) \cdot b \,\cup\, \partial_x(b) \; \textit{else} \; \partial_x(a) \cdot b$\\
+		$\partial_x(a^*)$ & $=$ & $\partial_x(a) \cdot a^*$
+\end{tabular}
+\end{center}
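+A sketch of this definition in Scala, assuming the plain $\textit{Rexp}$
+datatype and the $\textit{nullable}$ function from earlier chapters:
+\begin{figure}[H]
+\begin{lstlisting}
+  //partial derivative w.r.t. the character x (a sketch)
+  def pder(x: Char, r: Rexp) : Set[Rexp] = r match {
+    case ZERO | ONE   => Set()
+    case CHAR(c)      => if(c == x) Set(ONE) else Set()
+    case ALTS(r1, r2) => pder(x, r1) ++ pder(x, r2)
+    case SEQ(r1, r2)  =>
+      //the set-dot: attach r2 to every partial derivative of r1
+      val left: Set[Rexp] = pder(x, r1).map(r1p => SEQ(r1p, r2))
+      if(nullable(r1)) left ++ pder(x, r2) else left
+    case STAR(r0)     =>
+      pder(x, r0).map(r0p => SEQ(r0p, STAR(r0)) : Rexp)
+  }
+\end{lstlisting}
+\caption{A sketch of Antimirov's partial derivatives in Scala}
+\end{figure}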
+
+Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
+using the alternatives constructor, Antimirov cleverly chose to put them into
+a set instead. This breaks the terms of a derivative regular expression up,
+allowing us to understand what its ``atomic'' components are.
+For example, to compute what the derivative of $x^*(xx + y)^*$
+against $x$ is made of, one can take a partial derivative
+and obtain the two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
+from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+To get all the ``atomic'' components of a regular expression's possible derivatives,
+there is a procedure Antimirov called $\textit{lf}$, short for ``linear forms'', that takes
+whatever character is available at the head of a string inside the language of a
+regular expression, and gives back the character and the derivative regular expression
+as a pair (which he called a ``monomial''):
+\begin{center}
+	\begin{tabular}{ccc}
+		$\lf(\ZERO)$ & $=$ & $\phi$\\
+		$\lf(\ONE)$ & $=$ & $\phi$\\
+		$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
+		$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+		$\lf(r^*)$ & $=$ & $\lf(r) \bigodot r^*$\\
+		$\lf(a \cdot b)$ & $=$ & $\textit{if} \; \nullable(a) \; \textit{then} \; (\lf(a) \bigodot b) \cup \lf(b) \; \textit{else} \; \lf(a) \bigodot b$
+\end{tabular}
+\end{center}
+
+The clauses for $*$ and $\cdot$ differ slightly from those of
+$\partial$: instead of a dot operator $\textit{rset} \cdot r$ that attaches the regular
+expression $r$ to every element inside $\textit{rset}$ to create a set of
+sequence derivatives, they use the ``circle dot'' operator $\bigodot$, which operates
+on a set of monomials (which Antimirov called a ``linear form'') and a regular
+expression, and returns a linear form:
+\begin{center}
+	\begin{tabular}{ccc}
+		$l \bigodot (\ZERO)$ & $=$ & $\phi$\\
+		$l \bigodot (\ONE)$ & $=$ & $l$\\
+		$\phi \bigodot t$ & $=$ & $\phi$\\
+		$\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
+		$\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
+		$\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
+		$(\{ m \} \cup l) \bigodot t$ & $=$ & $(\{ m \} \bigodot t) \cup (l \bigodot t)$
+\end{tabular}
+\end{center}
+
+Some degree of simplification is applied within $\bigodot$: for example,
+$l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
+and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
+$\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot t \rightsquigarrow \ZERO$,
+and so on.
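+These definitions can be sketched in Scala as follows; the type
+abbreviation $\textit{Mon}$ for monomials is ours, and $\textit{nullable}$
+is again assumed from earlier chapters:
+\begin{figure}[H]
+\begin{lstlisting}
+  type Mon = (Char, Rexp)
+
+  //the "circle dot": attach t to the tail of every monomial,
+  //with the simplifications described above
+  def odot(l: Set[Mon], t: Rexp) : Set[Mon] = t match {
+    case ZERO => Set()
+    case ONE  => l
+    case _    => l.map{ case (x, ZERO) => (x, ZERO)
+                        case (x, ONE)  => (x, t)
+                        case (x, p)    => (x, SEQ(p, t)) }
+  }
+
+  //linear forms: the "atomic" monomials of r
+  def lf(r: Rexp) : Set[Mon] = r match {
+    case ZERO | ONE   => Set()
+    case CHAR(c)      => Set((c, ONE))
+    case ALTS(r1, r2) => lf(r1) ++ lf(r2)
+    case STAR(r0)     => odot(lf(r0), STAR(r0))
+    case SEQ(r1, r2)  =>
+      if(nullable(r1)) odot(lf(r1), r2) ++ lf(r2)
+      else odot(lf(r1), r2)
+  }
+\end{lstlisting}
+\caption{A sketch of linear forms and the $\bigodot$ operator in Scala}
+\end{figure}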
+
+With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with
+an iterative procedure, starting from $\Delta_0 = \PD_0 = \{r\}$
+(here $\lf(\Delta_i)$ abbreviates the set of regular expression components
+of the monomials in the linear forms of the elements of $\Delta_i$):
+\begin{center}
+	\begin{tabular}{llll}
+$\textit{while}$ & $(\Delta_i \neq \phi)$  & &  \\
+		 &  $\Delta_{i+1}$ &  $ =$ & $\lf(\Delta_i) - \PD_i$ \\
+		 &  $\PD_{i+1}$ & $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
+$\partial_{UNIV}(r)$ & $=$ & $\PD$ &
+\end{tabular}
+\end{center}
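+\noindent
+A sketch of this loop in Scala, reusing the $\lf$ sketch from above and
+projecting out the regular expression components of the monomials:
+\begin{figure}[H]
+\begin{lstlisting}
+  //all partial derivatives reachable from r (a sketch)
+  def pderUNIV(r: Rexp) : Set[Rexp] = {
+    var pd    = Set[Rexp](r)
+    var delta = Set[Rexp](r)
+    while(delta.nonEmpty) {
+      //new regular expressions not yet in pd
+      val next = delta.flatMap(lf).map(_._2) -- pd
+      pd    = pd ++ next
+      delta = next
+    }
+    pd
+  }
+\end{lstlisting}
+\caption{Sketch of the iterative computation of $\partial_{UNIV}(r)$}
+\end{figure}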
+
+
+Note that, like our $\textit{prune}$ function, partial derivatives
+implicitly break up terms using the distributivity law
+\[
+	(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3).
+\]
+
+
@@ -345,22 +523,22 @@
~r \backslash c = ~ (r \backslash c)
\]
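+A sketch of how this might look in the Scala setting, with a hypothetical
+$\textit{NOT}$ constructor added to $\textit{Rexp}$ and the corresponding
+clauses for the derivative and for $\textit{nullable}$ (which is flipped,
+since $~r$ accepts a string exactly when $r$ does not):
+\begin{figure}[H]
+\begin{lstlisting}
+  //hypothetical negation constructor
+  case class NOT(r: Rexp) extends Rexp
+
+  //additional clause for der, mirroring the equation above:
+  //  case NOT(r0) => NOT(der(c, r0))
+
+  //additional clause for nullable:
+  //  case NOT(r0) => !nullable(r0)
+\end{lstlisting}
+\caption{Sketch of negation clauses (hypothetical)}
+\end{figure}
+\noindent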
-The most tricky part of lexing for the $~r$ regex
+The most tricky part of lexing for the $~r$ regular expression
is creating a value for it.
For other regular expressions, the value aligns with the
- structure of the regex:
+ structure of the regular expression:
\[
\vdash \Seq(\Char(a), \Char(b)) : a \cdot b
\]
-But for the $~r$ regex, $s$ is a member of it if and only if
+But for the $~r$ regular expression, $s$ is a member of it if and only if
$s$ does not belong to $L(r)$.
That means when there
-is a match for the not regex, it is not possible to generate how the string $s$ matched
+is a match for the not regular expression, it is not possible to generate a value for how the string $s$ matched
with $r$.
What we can do is preserve the information of how $s$ was not matched by $r$,
and there are a number of options to do this.
-We could give a partial value when there is a partial match for the regex inside
+We could give a partial value when there is a partial match for the regular expression inside
the $\mathbf{not}$ construct.
For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$.
A value for it could be