lexing: comparison ChengsongTanPhdThesis/Chapters/Cubic.tex

equal deleted inserted replaced

-:988e92a70704
+:b2d0de6aee18
 %	SECTION strongsimp
 %----------------------------------------------------------------------------------------
 \section{A Stronger Version of Simplification}
 %TODO: search for isabelle proofs of algorithms that check equivalence
 In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
-For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
+For example, the regular expression
-which expresses two possibilities it will match future input.
+\[
-It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
+	aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
-\begin{figure}\label{string_3parts1}
+\]
-\begin{center}
+contains three terms,
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+expressing three possibilities it will match future input.
-\node [rectangle split, rectangle split horizontal, rectangle split parts=3]
+The first and the third terms are identical, which means we can eliminate
-{Consumed Input
+the latter as we know it will not be picked up by $\bmkeps$.
-\nodepart{two} Expects $aa$
+In $\bsimps$, the $\distinctBy$ function takes care of this.
-\nodepart{three} Then expects $a^*$};
+The criteria $\distinctBy$ uses for removing a duplicate
+$a_2$ in the list
+\begin{center}
+	$rs_a@[a_1]@rs_b@[a_2]@rs_c$
+\end{center}
+is that
+\begin{center}
+	$\rerase{a_1} = \rerase{a_2}$.
+\end{center}
+It can be characterised as the $LD$
+rewrite rule in \ref{rrewriteRules}.\\
+The problem , however, is that identical components
+in two slightly different regular expressions cannot be removed:
+\begin{figure}[H]
+\[
+	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
+\]
+\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
+\end{figure}
+\noindent
+A simplification like this actually
+cannot be omitted,
+as without it the size could blow up even with our $\simp$ function:
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}
+\begin{axis}[
+%xlabel={$n$},
+myplotstyle,
+xlabel={input length},
+ylabel={size},
+]
+\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
+\end{axis}
 \end{tikzpicture}
-\end{center}
+\caption{Runtime of $\blexersimp$ for matching
-\end{figure}
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
-\noindent
+	with strings
-Or it will match at least 1 more $a$:
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
-\begin{figure}\label{string_3parts2}
+\end{figure}
-\begin{center}
+\noindent
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+We would like to apply the rewriting at some stage
-\node [rectangle split, rectangle split horizontal, rectangle split parts=3]
+\begin{figure}[H]
-{Consumed
-\nodepart{two} Expects $a$
-\nodepart{three} Then expects $a^*$};
-\end{tikzpicture}
-\end{center}
-\end{figure}
-If these two possibilities are identical, we can eliminate
-the second one as we know the second one corresponds to
-a match that is not $\POSIX$.
-If two identical regexes
-happen to be grouped into different sequences, one cannot use
-the $a + a \rightsquigarrow a$ rule to eliminate them, even if they
-are "parallel" to each other:
 \[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
+	(a+b+d) \cdot r_1  \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
 \]
-and that's because they are followed by possibly
+\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
-different "suffixes" $r_1$ and $r_2$, and if
+\end{figure}
-they do turn out to be different then doing
+\noindent
-\[
+in our $\simp$ function,
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
+so that it makes the simplification in \ref{partialDedup} possible.
-\]
+For example,
-might cause a possible match where the string is in $L(a \cdot r_2)$ to be lost.
+can a function like the below one be used?
+%TODO: simp' that spills things
-Here is an example for this.
-Assume we have the derivative regex
+Unfortunately,
-\[( r_1  +  r_2  +  r_3)\cdot r+
+if we introduce them in our
-( r_1  +  r_5  +  r_6)\cdot( r_1  +  r_2  +  r_3)^*
+setting we would lose the POSIX property of our calculated values.
-\]
+For example given the regular expression
+\begin{center}
-occurring in an intermediate step in our bit-coded lexing algorithm.
+	$(a + ab)(bc + c)$
+\end{center}
-In this expression, there will be 6 "parallel" terms if we break up regexes
+and the string
-of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
+\begin{center}
-\begin{align}
+	$ab$,
-(\nonumber \\
+\end{center}
-r_1  + &    \label{term:1} \\
+then our algorithm generates the following
-r_2  + &  \label{term:2} \\
+correct POSIX value
-r_3 &  \label{term:3} \\
+\begin{center}
-&	)\cdot r\; + \; (\nonumber \\
+	$\Seq \; (\Right \; ab) \; (\Right \; c)$.
-r_1  + & \label{term:4} \\
+\end{center}
-r_5  + & \label{term:5} \\
+Essentially it matches the string with the longer Right-alternative
-r_6 \label{term:6}\\
+in the first sequence (and
-&      )\cdot r\nonumber
+then the 'rest' with the character regular expression $c$ from the second sequence).
-\end{align}
+If we add the simplification above, then we obtain the following value
+\begin{center}
-They have been labelled, and each label denotes
+	$\Left \; (\Seq \; a \; (\Left \; bc))$
-one term, for example, \ref{term:1} denotes
+\end{center}
-\[
+where the $\Left$-alternatives get priority.
-r_1 \cdot r
+However this violates the POSIX rules.
-\]
+The reason for getting this undesired value
-\noindent
+is that the new rule splits this regular expression up into
-and \ref{term:3} denotes
+\begin{center}
-\[
+	$a\cdot(b c + c) + ab \cdot (bc + c)$,
-r_3\cdot r.
+\end{center}
-\]
+which becomes a regular expression with a
-From a lexer's point of view, \ref{term:4} will never
+totally different structure--the original
-be picked up in later phases of matching because there
+was a sequence, and now it becomes an alternative.
-is a term \ref{term:1} giving identical matching information.
+With an alternative the maximum munch rule no longer works.\\
-The first term \ref{term:1} will match a string in $L(r_1 \cdot  r)$,
+A method to reconcile this is to do the
-and so on for term \ref{term:2}, \ref{term:3}, etc.
+transformation in \ref{desiredSimp} ``non-invasively'',
+meaning that we traverse the list of regular expressions
-\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
+\begin{center}
-\begin{center}\label{string_2parts}
+	$rs_a@[a]@rs_c$
+\end{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+in the alternative
-\node [rectangle split, rectangle split horizontal, rectangle split parts=2]
+\begin{center}
-{$r_{x1}$
+	$\sum ( rs_a@[a]@rs_c)$
-\nodepart{two} $r_1\cdot r$ };
+\end{center}
-%\caption{term 1 \ref{term:1}'s matching configuration}
+using  a function similar to $\distinctBy$,
-\end{tikzpicture}
+but this time
+we allow a more general list rewrite:
-\end{center}
+	\begin{mathpar}
-For term 1 \ref{term:1}, whatever was before the current
+		\inferrule{\vspace{0mm} }{rs_a@[a]@rs_c
-input position was fully matched and the regex corresponding
+			\stackrel{s}{\rightsquigarrow }
-to it reduced to $\ONE$,
+		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
-and in the next input token, it will start on $r_1\cdot r$.
+	\end{mathpar}
-The resulting value will be something of a similar configuration:
+%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
-\begin{center}\label{value_2parts}
+where $\textit{prune} \;a \; acc$ traverses $a$
-%\caption{term 1 \ref{term:1}'s matching configuration}
+without altering the structure of $a$, removing components in $a$
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+that have appeared in the accumulator $acc$.
-\node [rectangle split, rectangle split horizontal, rectangle split parts=2]
+For example
-{$v_{x1}$
+\begin{center}
-\nodepart{two} $v_{r_1\cdot r}$ };
+	$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
-\end{tikzpicture}
+\end{center}
-\end{center}
+should be equal to
-For term 2 \ref{term:2} we have a similar value partition:
+\begin{center}
-\begin{center}\label{value_2parts2}
+	$(r_g+r_h)r_d$
-%\caption{term 1 \ref{term:1}'s matching configuration}
+\end{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+because $r_gr_d$ and
-\node [rectangle split, rectangle split horizontal, rectangle split parts=2]
+$r_hr_d$ are the only terms
-{$v_{x2}$
+that have not appeared in the accumulator list
-\nodepart{two} $v_{r_2\cdot r}$ };
+\begin{center}
-\end{tikzpicture}
+$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
 \end{center}
-\noindent
+We implemented
-and so on.
+function $\textit{prune}$ in Scala,
-We note that for term 4 \ref{term:4} its result value
+and incorporated into our lexer,
-after any position beyond  the current one will always
+by replacing the $\simp$ function
-be the same:
+with a stronger version called $\bsimpStrong$
-\begin{center}\label{value_2parts4}
+that prunes regular expressions.
-%\caption{term 1 \ref{term:1}'s matching configuration}
+We call this lexer $\blexerStrong$.
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
+$\blexerStrong$ is able to drastically reduce the
-\node [rectangle split, rectangle split horizontal, rectangle split parts=2]
+internal data structure size which could
-{$v_{x4}$
+trigger exponential behaviours in
-\nodepart{two} $v_{r_1\cdot r}$ };
+$\blexersimp$.
-\end{tikzpicture}
+\begin{figure}[H]
-\end{center}
-And $\POSIX$ rules says that we can eliminate at least one of them.
-Our algorithm always puts the regex with the longest initial sub-match at the left of the
-alternatives, so we safely throw away \ref{term:4}.
-The fact that term 1 and 4 are not immediately in the same alternative
-expression does not prevent them from being two duplicate matches at
-the current point of view.
-To implement this idea into an algorithm, we define a "pruning function"
-$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
-$\textit{acc}$, which acts as an element
-is a stronger version of $\textit{distinctBy}$.
-Here is a concise version of it (in the style of Scala):
-\begin{figure}[H]
-\begin{lstlisting}
-def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
-List[ARexp] =  rs match {
-case Nil => Nil
-case r :: rs =>
-if(acc.contains(erase(r)))
-distinctByPlus(rs, acc)
-else
-prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
-}
-\end{lstlisting}
-\caption{the distinctByPlus function}
-\end{figure}
-But for the function $\textit{prune}$ there is a difficulty:
-how could one define the $L$ function in a "computable" way,
-so that they generate a (lazy infinite) set of strings in $L(r)$.
-We also need a function that tests whether $L(r_1) \subseteq L(r_2)$
-is true.
-For the moment we cut corners and do not define these two functions
-as they are not used by Antimirov (and will probably not contribute
-to a bound better than Antimirov's cubic bound).
-Rather, we do not try to eliminate in every instance of regular expressions
-that have "language duplicates", but only eliminate the "exact duplicates".
-For this we need to break up terms $(a+b)\cdot c$ into two
-terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
-\begin{figure}[H]
-\begin{lstlisting}
-def distinctWith(rs: List[ARexp],
-pruneFunction: (ARexp, Set[Rexp]) => ARexp,
-acc: Set[Rexp] = Set()) : List[ARexp] =
-rs match{
-case Nil => Nil
-case r :: rs =>
-if(acc(erase(r)))
-distinctWith(rs, pruneFunction, acc)
-else {
-val pruned_r = pruneFunction(r, acc)
-pruned_r ::
-distinctWith(rs,
-pruneFunction,
-turnIntoTerms(erase(pruned_r)) ++: acc
-)
-}
-}
-\end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
-\end{figure}
-\noindent
-This process is done by the function $\textit{turnIntoTerms}$:
-\begin{figure}
-\begin{verbatim}
-def turnIntoTerms(r: Rexp): List[Rexp] = r match {
-case SEQ(r1, r2)  => if(isOne(r1))
-turnIntoTerms(r2)
-else
-turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
-case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
-case ZERO => Nil
-case _ => r :: Nil
-}
-\end{verbatim}
-\caption{function that breaks up regular expression into "atomic" terms}
-\end{figure}
-\noindent
-The pruning function can be defined recursively:
-\begin{figure}
-\begin{verbatim}
-def prune7(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
-case AALTS(bs, rs) => rs.map(r => prune7(r, acc)).filter(_ != ZERO) match
-{
-case Nil => AZERO
-case r::Nil => fuse(bs, r)
-case rs1 => AALTS(bs, rs1)
-}
-case ASEQ(bs, r1, r2) => prune7(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
-case AZERO => AZERO
-case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
-case r1p => ASEQ(bs, r1p, r2)
-}
-case r => if(acc(erase(r))) AZERO else r
-}
-\end{verbatim}
-\caption{pruning function}
-\end{figure}
-\begin{figure}
 \centering
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
 \begin{tikzpicture}
 \begin{axis}[
 %xlabel={$n$},
 ]
 \addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
 \end{axis}
 \end{tikzpicture}\\
 \multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
-of the form $\underbrace{aa..a}_{n}$.}
+	   of the form $\underbrace{aa..a}_{n}$.}
 \end{tabular}
 \caption{aaaaaStarStar} \label{fig:aaaaaStarStar}
 \end{figure}
+\begin{figure}[H]
+\begin{lstlisting}
+def atMostEmpty(r: Rexp) : Boolean = r match {
+case ZERO => true
+case ONE => true
+case STAR(r) => atMostEmpty(r)
+case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+case CHAR(_) => false
+}
+def isOne(r: Rexp) : Boolean = r match {
+case ONE => true
+case SEQ(r1, r2) => isOne(r1) && isOne(r2)
+case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)
+case STAR(r0) => atMostEmpty(r0)
+case CHAR(c) => false
+case ZERO => false
+}
+//r = r' ~ tail' : If tail' matches tail => returns r'
+def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
+case SEQ(r1, r2) =>
+if(r2 == tail)
+r1
+else
+ZERO
+case r => ZERO
+}
+def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
+case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match
+{
+//all components have been removed, meaning this is effectively a duplicate
+//flats will take care of removing this AZERO
+case Nil => AZERO
+case r::Nil => fuse(bs, r)
+case rs1 => AALTS(bs, rs1)
+}
+case ASEQ(bs, r1, r2) =>
+//remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
+prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
+//after pruning, returns 0
+case AZERO => AZERO
+//after pruning, got r1'.r2, where r1' is equal to 1
+case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
+//assemble the pruned head r1p with r2
+case r1p => ASEQ(bs, r1p, r2)
+}
+//this does the duplicate component removal task
+case r => if(acc(erase(r))) AZERO else r
+}
+\end{lstlisting}
+\caption{pruning function together with its helper functions}
+\end{figure}
+\noindent
+The benefits of using
+$\textit{prune}$ such as refining the finiteness bound
+to a cubic bound has not been formalised yet.
+Therefore we choose to use Scala code rather than an Isabelle-style formal
+definition like we did for $\simp$, as the definitions might change
+to suit proof needs.
+In the rest of the chapter we will use this convention consistently.
+\begin{figure}[H]
+\begin{lstlisting}
+def distinctWith(rs: List[ARexp],
+pruneFunction: (ARexp, Set[Rexp]) => ARexp,
+acc: Set[Rexp] = Set()) : List[ARexp] =
+rs match{
+case Nil => Nil
+case r :: rs =>
+if(acc(erase(r)))
+distinctWith(rs, pruneFunction, acc)
+else {
+val pruned_r = pruneFunction(r, acc)
+pruned_r ::
+distinctWith(rs,
+pruneFunction,
+turnIntoTerms(erase(pruned_r)) ++: acc
+)
+}
+}
+\end{lstlisting}
+\caption{A Stronger Version of $\textit{distinctBy}$}
+\end{figure}
+\noindent
+The function $\textit{prune}$ is used in $\distinctWith$.
+$\distinctWith$ is a stronger version of $\distinctBy$
+which not only removes duplicates as $\distinctBy$ would
+do, but also uses the $\textit{pruneFunction}$
+argument to prune away verbose components in a regular expression.\\
+\begin{figure}[H]
+\begin{lstlisting}
+//a stronger version of simp
+def bsimpStrong(r: ARexp): ARexp =
+{
+r match {
+case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
+//normal clauses same as simp
+case (AZERO, _) => AZERO
+case (_, AZERO) => AZERO
+case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
+//bs2 can be discarded
+case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
+case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
+}
+case AALTS(bs1, rs) => {
+//distinctBy(flat_res, erase)
+distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
+case Nil => AZERO
+case s :: Nil => fuse(bs1, s)
+case rs => AALTS(bs1, rs)
+}
+}
+//stars that can be treated as 1
+case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
+case r => r
+}
+}
+\end{lstlisting}
+\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
+\end{figure}
+\noindent
+$\distinctWith$, is in turn used in $\bsimpStrong$:
+\begin{figure}[H]
+\begin{lstlisting}
+//Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
+def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
+case Nil => r
+case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
+}
+\end{lstlisting}
+\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
+\end{figure}
+\noindent
+We conjecture that the above Scala function $\bdersStrongs$,
+written $\bdersStrong{\_}{\_}$ as an infix notation,
+satisfies the following property:
+\begin{conjecture}
+	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{conjecture}
+The stronger version of $\blexersimp$'s
+code in Scala looks like:
+\begin{figure}[H]
+\begin{lstlisting}
+def strongBlexer(r: Rexp, s: String) : Option[Val] = {
+Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
+}
+def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
+case Nil => {
+if (bnullable(r)) {
+mkepsBC(r)
+}
+else
+throw new Exception("Not matched")
+}
+case c::cs => {
+strong_blex_simp(strongBsimp(bder(c, r)), cs)
+}
+}
+\end{lstlisting}
+\end{figure}
+\noindent
+We would like to preserve the correctness like the one
+we had for $\blexersimp$:
+\begin{conjecture}\label{cubicConjecture}
+	$\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{conjecture}
+To implement this idea into an algorithm, we define a "pruning function"
+$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
+$\textit{acc}$, which acts as an element
+is a stronger version of $\textit{distinctBy}$.
+Here is a concise version of it (in the style of Scala):
+\noindent
+The pruning function can be defined recursively:
+This suggests a link towrads "partial derivatives"
+introduced by Antimirov \cite{Antimirov95}.
+\section{Antimirov's partial derivatives}
+The idea behind Antimirov's partial derivatives
+is to do derivatives in a similar way as suggested by Brzozowski,
+but maintain a set of regular expressions instead of a single one:
+%TODO: antimirov proposition 3.1, needs completion
+\begin{center}
+\begin{tabular}{ccc}
+$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
+$\partial_x(\ONE)$ & $=$ & $\phi$
+\end{tabular}
+\end{center}
+Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
+using the alternatives constructor, Antimirov cleverly chose to put them into
+a set instead.  This breaks the terms in a derivative regular expression up,
+allowing us to understand what are the "atomic" components of it.
+For example, To compute what regular expression $x^*(xx + y)^*$'s
+derivative against $x$ is made of, one can do a partial derivative
+of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
+from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+To get all the "atomic" components of a regular expression's possible derivatives,
+there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
+whatever character is available at the head of the string inside the language of a
+regular expression, and gives back the character and the derivative regular expression
+as a pair (which he called "monomial"):
+\begin{center}
+\begin{tabular}{ccc}
+$\lf(\ONE)$ & $=$ & $\phi$\\
+$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
+$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+$\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
+There is a slight difference in the last three clauses compared
+with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular
+expression $r$ with every element inside $\textit{rset}$ to create a set of
+sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates
+on a set of monomials (which Antimirov called "linear form") and a regular
+expression, and returns a linear form:
+\begin{center}
+\begin{tabular}{ccc}
+$l \bigodot (\ZERO)$ & $=$ & $\phi$\\
+$l \bigodot (\ONE)$ & $=$ & $l$\\
+$\phi \bigodot t$ & $=$ & $\phi$\\
+$\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
+$\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
+$\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
+$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+$\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
+Some degree of simplification is applied when doing $\bigodot$, for example,
+$l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
+and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
+$\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
+and so on.
+With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with
+an iterative procedure:
+\begin{center}
+\begin{tabular}{llll}
+$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
+		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
+		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
+$\partial_{UNIV}(r)$ & $=$ & $\PD$ &
+\end{tabular}
+\end{center}
+$(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
 %----------------------------------------------------------------------------------------
 %	SECTION 1
 The derivative would be
 \[
 ~r \backslash c = ~ (r \backslash c)
 \]
-The most tricky part of lexing for the $~r$ regex
+The most tricky part of lexing for the $~r$ regular expression
 is creating a value for it.
 For other regular expressions, the value aligns with the
-structure of the regex:
+structure of the regular expression:
 \[
 \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
 \]
-But for the $~r$ regex, $s$ is a member of it if and only if
+But for the $~r$ regular expression, $s$ is a member of it if and only if
 $s$ does not belong to $L(r)$.
 That means when there
-is a match for the not regex, it is not possible to generate how the string $s$ matched
+is a match for the not regular expression, it is not possible to generate how the string $s$ matched
 with $r$.
 What we can do is preserve the information of how $s$ was not matched by $r$,
 and there are a number of options to do this.
-We could give a partial value when there is a partial match for the regex inside
+We could give a partial value when there is a partial match for the regular expression inside
 the $\mathbf{not}$ construct.
 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
 A value for it could be
 \[
 \vdash \textit{Not}(\Seq(\Char(a), \Char(b))) : ~((a \cdot b ) \cdot c)

changeset 591	b2d0de6aee18
parent 590	988e92a70704
child 592	7f4c353c0f6b