ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 591 b2d0de6aee18
parent 590 988e92a70704
child 592 7f4c353c0f6b
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Wed Aug 31 23:57:42 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Thu Sep 01 23:47:37 2022 +0100
@@ -59,239 +59,152 @@
 \section{A Stronger Version of Simplification}
 %TODO: search for isabelle proofs of algorithms that check equivalence 
 In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
-For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms, 
-which expresses two possibilities it will match future input.
-It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
-\begin{figure}\label{string_3parts1}
+For example, the regular expression 
+\[
+	aa \cdot a^*+ a \cdot a^* + aa\cdot a^*
+\]
+contains three terms, 
+expressing three possibilities it will match future input.
+The first and the third terms are identical, which means we can eliminate
+the latter as we know it will not be picked up by $\bmkeps$.
+In $\bsimps$, the $\distinctBy$ function takes care of this.
+The criteria $\distinctBy$ uses for removing a duplicate
+$a_2$ in the list
+\begin{center}
+	$rs_a@[a_1]@rs_b@[a_2]@rs_c$
+\end{center}
+is that 
 \begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
-        {Consumed Input
-        \nodepart{two} Expects $aa$
-         \nodepart{three} Then expects $a^*$};
-
+	$\rerase{a_1} = \rerase{a_2}$.
+\end{center}
+It can be characterised as the $LD$ 
+rewrite rule in \ref{rrewriteRules}.\\
+The problem , however, is that identical components
+in two slightly different regular expressions cannot be removed:
+\begin{figure}[H]
+\[
+	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
+\]
+\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
+\end{figure}
+\noindent
+A simplification like this actually
+cannot be omitted,
+as without it the size could blow up even with our $\simp$ function:
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}
+\begin{axis}[
+    %xlabel={$n$},
+    myplotstyle,
+    xlabel={input length},
+    ylabel={size},
+    ]
+\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
+\end{axis}
 \end{tikzpicture}
-\end{center}
+\caption{Runtime of $\blexersimp$ for matching 
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ 
+	with strings 
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
+\end{figure}
+\noindent
+We would like to apply the rewriting at some stage
+\begin{figure}[H]
+\[
+	(a+b+d) \cdot r_1  \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
+\]
+\caption{Desired simplification, but not done in $\blexersimp$}\label{desiredSimp}
 \end{figure}
 \noindent
-Or it will match at least 1 more $a$:
-\begin{figure}\label{string_3parts2}
-\begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
-        {Consumed
-        \nodepart{two} Expects $a$
-         \nodepart{three} Then expects $a^*$};
-
-\end{tikzpicture}
-\end{center}
-\end{figure}
-If these two possibilities are identical, we can eliminate
-the second one as we know the second one corresponds to 
-a match that is not $\POSIX$.
-
-
-If two identical regexes 
-happen to be grouped into different sequences, one cannot use
-the $a + a \rightsquigarrow a$ rule to eliminate them, even if they 
-are "parallel" to each other:
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-and that's because they are followed by possibly 
-different "suffixes" $r_1$ and $r_2$, and if 
-they do turn out to be different then doing 
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-might cause a possible match where the string is in $L(a \cdot r_2)$ to be lost.
-
-Here is an example for this.
-Assume we have the derivative regex 
-\[( r_1  +  r_2  +  r_3)\cdot r+ 
-( r_1  +  r_5  +  r_6)\cdot( r_1  +  r_2  +  r_3)^*
-\]
-
-occurring in an intermediate step in our bit-coded lexing algorithm.
+in our $\simp$ function,
+so that it makes the simplification in \ref{partialDedup} possible.
+For example,
+can a function like the below one be used?
+%TODO: simp' that spills things
 
-In this expression, there will be 6 "parallel" terms if we break up regexes 
-of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
-\begin{align}
-(\nonumber \\
- r_1  + &    \label{term:1} \\
- r_2  + &  \label{term:2} \\
- r_3 &  \label{term:3} \\
-&	)\cdot r\; + \; (\nonumber \\
- r_1  + & \label{term:4} \\
- r_5  + & \label{term:5} \\
- r_6 \label{term:6}\\
-&      )\cdot r\nonumber
-\end{align}
-
-They have been labelled, and each label denotes  
-one term, for example, \ref{term:1} denotes
-\[
- r_1 \cdot r
-\]
-\noindent
-and \ref{term:3} denotes
-\[
- r_3\cdot r.
-\]
-From a lexer's point of view, \ref{term:4} will never 
-be picked up in later phases of matching because there
-is a term \ref{term:1} giving identical matching information.
-The first term \ref{term:1} will match a string in $L(r_1 \cdot  r)$, 
-and so on for term \ref{term:2}, \ref{term:3}, etc.
-
-\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
-\begin{center}\label{string_2parts}
-
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$r_{x1}$
-         \nodepart{two} $r_1\cdot r$ };
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\end{tikzpicture}
-
+Unfortunately,
+if we introduce them in our
+setting we would lose the POSIX property of our calculated values. 
+For example given the regular expression 
+\begin{center}
+	$(a + ab)(bc + c)$
+\end{center}
+and the string
+\begin{center}
+	$ab$,
 \end{center}
-For term 1 \ref{term:1}, whatever was before the current
-input position was fully matched and the regex corresponding
-to it reduced to $\ONE$, 
-and in the next input token, it will start on $r_1\cdot r$.
-The resulting value will be something of a similar configuration:
-\begin{center}\label{value_2parts}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x1}$
-         \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+then our algorithm generates the following
+correct POSIX value
+\begin{center}
+	$\Seq \; (\Right \; ab) \; (\Right \; c)$.
 \end{center}
-For term 2 \ref{term:2} we have a similar value partition:
-\begin{center}\label{value_2parts2}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x2}$
-         \nodepart{two} $v_{r_2\cdot r}$ };
-\end{tikzpicture}
+Essentially it matches the string with the longer Right-alternative
+in the first sequence (and
+then the 'rest' with the character regular expression $c$ from the second sequence).
+If we add the simplification above, then we obtain the following value
+\begin{center}
+	$\Left \; (\Seq \; a \; (\Left \; bc))$
+\end{center}
+where the $\Left$-alternatives get priority. 
+However this violates the POSIX rules.
+The reason for getting this undesired value
+is that the new rule splits this regular expression up into 
+\begin{center}
+	$a\cdot(b c + c) + ab \cdot (bc + c)$,
 \end{center}
-\noindent
-and so on.
-We note that for term 4 \ref{term:4} its result value 
-after any position beyond  the current one will always
-be the same:
-\begin{center}\label{value_2parts4}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x4}$
-         \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+which becomes a regular expression with a 
+totally different structure--the original
+was a sequence, and now it becomes an alternative.
+With an alternative the maximum munch rule no longer works.\\
+A method to reconcile this is to do the 
+transformation in \ref{desiredSimp} ``non-invasively'',
+meaning that we traverse the list of regular expressions
+\begin{center}
+	$rs_a@[a]@rs_c$
+\end{center}
+in the alternative
+\begin{center}
+	$\sum ( rs_a@[a]@rs_c)$
 \end{center}
-And $\POSIX$ rules says that we can eliminate at least one of them.
-Our algorithm always puts the regex with the longest initial sub-match at the left of the 
-alternatives, so we safely throw away \ref{term:4}.
-The fact that term 1 and 4 are not immediately in the same alternative
-expression does not prevent them from being two duplicate matches at
-the current point of view.
-To implement this idea into an algorithm, we define a "pruning function"
-$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
-$\textit{acc}$, which acts as an element
-is a stronger version of $\textit{distinctBy}$.
-Here is a concise version of it (in the style of Scala):
-\begin{figure}[H]
-\begin{lstlisting}
-def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) : 
-List[ARexp] =  rs match {
-  case Nil => Nil
-  case r :: rs =>
-    if(acc.contains(erase(r)))
-      distinctByPlus(rs, acc)
-    else 
-      prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
-}
-
-\end{lstlisting}
-\caption{the distinctByPlus function}
-\end{figure}
-But for the function $\textit{prune}$ there is a difficulty:
-how could one define the $L$ function in a "computable" way,
-so that they generate a (lazy infinite) set of strings in $L(r)$.
-We also need a function that tests whether $L(r_1) \subseteq L(r_2)$
-is true.
-For the moment we cut corners and do not define these two functions
-as they are not used by Antimirov (and will probably not contribute
-to a bound better than Antimirov's cubic bound).
-Rather, we do not try to eliminate in every instance of regular expressions
-that have "language duplicates", but only eliminate the "exact duplicates".
-For this we need to break up terms $(a+b)\cdot c$ into two
-terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
+using  a function similar to $\distinctBy$,
+but this time 
+we allow a more general list rewrite:
+	\begin{mathpar}
+		\inferrule{\vspace{0mm} }{rs_a@[a]@rs_c 
+			\stackrel{s}{\rightsquigarrow }
+		rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
+	\end{mathpar}
+%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
+where $\textit{prune} \;a \; acc$ traverses $a$
+without altering the structure of $a$, removing components in $a$
+that have appeared in the accumulator $acc$.
+For example
+\begin{center}
+	$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\; \; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d] $
+\end{center}
+should be equal to 
+\begin{center}
+	$(r_g+r_h)r_d$
+\end{center}
+because $r_gr_d$ and 
+$r_hr_d$ are the only terms
+that have not appeared in the accumulator list 
+\begin{center}
+$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
+\end{center}
+We implemented 
+function $\textit{prune}$ in Scala,
+and incorporated into our lexer,
+by replacing the $\simp$ function 
+with a stronger version called $\bsimpStrong$
+that prunes regular expressions.
+We call this lexer $\blexerStrong$.
+$\blexerStrong$ is able to drastically reduce the
+internal data structure size which could
+trigger exponential behaviours in
+$\blexersimp$.
 \begin{figure}[H]
-\begin{lstlisting}
-def distinctWith(rs: List[ARexp], 
-                pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
-                acc: Set[Rexp] = Set()) : List[ARexp] = 
-  rs match{
-    case Nil => Nil
-    case r :: rs => 
-      if(acc(erase(r)))
-        distinctWith(rs, pruneFunction, acc)
-      else {
-        val pruned_r = pruneFunction(r, acc)
-        pruned_r :: 
-        distinctWith(rs, 
-          pruneFunction, 
-          turnIntoTerms(erase(pruned_r)) ++: acc
-        )
-      }
-  }
-\end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
-\end{figure}
-\noindent
-This process is done by the function $\textit{turnIntoTerms}$:
-\begin{figure}
-\begin{verbatim}
-def turnIntoTerms(r: Rexp): List[Rexp] = r match {
-  case SEQ(r1, r2)  => if(isOne(r1)) 
-                          turnIntoTerms(r2) 
-                       else 
-                          turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
-  case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
-  case ZERO => Nil
-  case _ => r :: Nil
-}
-\end{verbatim}
-\caption{function that breaks up regular expression into "atomic" terms}
-\end{figure}
-
-\noindent
-The pruning function can be defined recursively:
-\begin{figure}
-\begin{verbatim}
-def prune7(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
-  case AALTS(bs, rs) => rs.map(r => prune7(r, acc)).filter(_ != ZERO) match
-  {
-    case Nil => AZERO
-    case r::Nil => fuse(bs, r)
-    case rs1 => AALTS(bs, rs1)
-  }
-  case ASEQ(bs, r1, r2) => prune7(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
-    case AZERO => AZERO
-    case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
-    case r1p => ASEQ(bs, r1p, r2)
-  }
-  case r => if(acc(erase(r))) AZERO else r
-}
-\end{verbatim}
-\caption{pruning function}
-\end{figure}
-
-
-
-\begin{figure}
 \centering
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
 \begin{tikzpicture}
@@ -316,10 +229,275 @@
 \end{axis}
 \end{tikzpicture}\\
 \multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings 
-           of the form $\underbrace{aa..a}_{n}$.}
+	   of the form $\underbrace{aa..a}_{n}$.}
 \end{tabular}    
 \caption{aaaaaStarStar} \label{fig:aaaaaStarStar}
 \end{figure}
+\begin{figure}[H]
+\begin{lstlisting}
+    def atMostEmpty(r: Rexp) : Boolean = r match {
+      case ZERO => true
+      case ONE => true
+      case STAR(r) => atMostEmpty(r)
+      case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+      case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+      case CHAR(_) => false
+    }
+
+
+    def isOne(r: Rexp) : Boolean = r match {
+      case ONE => true
+      case SEQ(r1, r2) => isOne(r1) && isOne(r2)
+      case ALTS(r1, r2) => (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))//rs.forall(atMostEmpty) && rs.exists(isOne)
+      case STAR(r0) => atMostEmpty(r0)
+      case CHAR(c) => false
+      case ZERO => false
+    }
+
+    //r = r' ~ tail' : If tail' matches tail => returns r'
+    def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
+      case SEQ(r1, r2) => 
+        if(r2 == tail) 
+          r1
+        else
+          ZERO
+      case r => ZERO
+    }
+
+    def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
+      case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != ZERO) match
+      {
+        //all components have been removed, meaning this is effectively a duplicate
+        //flats will take care of removing this AZERO 
+        case Nil => AZERO
+        case r::Nil => fuse(bs, r)
+        case rs1 => AALTS(bs, rs1)
+      }
+      case ASEQ(bs, r1, r2) => 
+        //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
+        prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
+          //after pruning, returns 0
+          case AZERO => AZERO
+          //after pruning, got r1'.r2, where r1' is equal to 1
+          case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
+          //assemble the pruned head r1p with r2
+          case r1p => ASEQ(bs, r1p, r2)
+        }
+        //this does the duplicate component removal task
+      case r => if(acc(erase(r))) AZERO else r
+    }
+\end{lstlisting}
+\caption{pruning function together with its helper functions}
+\end{figure}
+\noindent
+The benefits of using 
+$\textit{prune}$ such as refining the finiteness bound
+to a cubic bound has not been formalised yet.
+Therefore we choose to use Scala code rather than an Isabelle-style formal 
+definition like we did for $\simp$, as the definitions might change
+to suit proof needs.
+In the rest of the chapter we will use this convention consistently.
+\begin{figure}[H]
+\begin{lstlisting}
+  def distinctWith(rs: List[ARexp], 
+    pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
+    acc: Set[Rexp] = Set()) : List[ARexp] = 
+      rs match{
+        case Nil => Nil
+        case r :: rs => 
+          if(acc(erase(r)))
+            distinctWith(rs, pruneFunction, acc)
+          else {
+            val pruned_r = pruneFunction(r, acc)
+            pruned_r :: 
+            distinctWith(rs, 
+              pruneFunction, 
+              turnIntoTerms(erase(pruned_r)) ++: acc
+              )
+          }
+      }
+\end{lstlisting}
+\caption{A Stronger Version of $\textit{distinctBy}$}
+\end{figure}
+\noindent
+The function $\textit{prune}$ is used in $\distinctWith$.
+$\distinctWith$ is a stronger version of $\distinctBy$
+which not only removes duplicates as $\distinctBy$ would
+do, but also uses the $\textit{pruneFunction}$
+argument to prune away verbose components in a regular expression.\\
+\begin{figure}[H]
+\begin{lstlisting}
+   //a stronger version of simp
+    def bsimpStrong(r: ARexp): ARexp =
+    {
+      r match {
+        case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
+          //normal clauses same as simp
+        case (AZERO, _) => AZERO
+        case (_, AZERO) => AZERO
+        case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
+        //bs2 can be discarded
+        case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
+        case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
+        }
+        case AALTS(bs1, rs) => {
+          //distinctBy(flat_res, erase)
+          distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
+            case Nil => AZERO
+            case s :: Nil => fuse(bs1, s)
+            case rs => AALTS(bs1, rs)
+          }
+        }
+        //stars that can be treated as 1
+        case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
+        case r => r
+      }
+    }
+\end{lstlisting}
+\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
+\end{figure}
+\noindent
+$\distinctWith$, is in turn used in $\bsimpStrong$:
+\begin{figure}[H]
+\begin{lstlisting}
+      //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
+      def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
+        case Nil => r
+        case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
+      }
+\end{lstlisting}
+\caption{The function $\bsimpStrong$ and $\bdersStrongs$}
+\end{figure}
+\noindent
+We conjecture that the above Scala function $\bdersStrongs$,
+written $\bdersStrong{\_}{\_}$ as an infix notation, 
+satisfies the following property:
+\begin{conjecture}
+	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{conjecture}
+The stronger version of $\blexersimp$'s
+code in Scala looks like: 
+\begin{figure}[H]
+\begin{lstlisting}
+      def strongBlexer(r: Rexp, s: String) : Option[Val] = {
+        Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
+      }
+      def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
+        case Nil => {
+          if (bnullable(r)) {
+            mkepsBC(r)
+          }
+          else
+            throw new Exception("Not matched")
+        }
+        case c::cs => {
+          strong_blex_simp(strongBsimp(bder(c, r)), cs)
+        }
+      }
+\end{lstlisting}
+\end{figure}
+\noindent
+We would like to preserve the correctness like the one 
+we had for $\blexersimp$:
+\begin{conjecture}\label{cubicConjecture}
+	$\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{conjecture}
+
+To implement this idea into an algorithm, we define a "pruning function"
+$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
+$\textit{acc}$, which acts as an element
+is a stronger version of $\textit{distinctBy}$.
+Here is a concise version of it (in the style of Scala):
+
+
+
+
+\noindent
+The pruning function can be defined recursively:
+
+
+
+
+This suggests a link towrads "partial derivatives"
+ introduced by Antimirov \cite{Antimirov95}.
+ 
+ \section{Antimirov's partial derivatives}
+ The idea behind Antimirov's partial derivatives
+is to do derivatives in a similar way as suggested by Brzozowski, 
+but maintain a set of regular expressions instead of a single one:
+
+%TODO: antimirov proposition 3.1, needs completion
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
+$\partial_x(\ONE)$ & $=$ & $\phi$
+\end{tabular}
+\end{center}
+
+Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
+using the alternatives constructor, Antimirov cleverly chose to put them into
+a set instead.  This breaks the terms in a derivative regular expression up, 
+allowing us to understand what are the "atomic" components of it.
+For example, To compute what regular expression $x^*(xx + y)^*$'s 
+derivative against $x$ is made of, one can do a partial derivative
+of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
+from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+To get all the "atomic" components of a regular expression's possible derivatives,
+there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
+whatever character is available at the head of the string inside the language of a
+regular expression, and gives back the character and the derivative regular expression
+as a pair (which he called "monomial"):
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $\lf(\ONE)$ & $=$ & $\phi$\\
+$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
+ $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+ $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
+
+There is a slight difference in the last three clauses compared
+with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
+expression $r$ with every element inside $\textit{rset}$ to create a set of 
+sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
+on a set of monomials (which Antimirov called "linear form") and a regular 
+expression, and returns a linear form:
+ \begin{center}  
+ \begin{tabular}{ccc}
+ $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
+ $l \bigodot (\ONE)$ & $=$ & $l$\\
+ $\phi \bigodot t$ & $=$ & $\phi$\\
+ $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
+ $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
+  $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
+ $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+ $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
+\end{tabular}
+\end{center}
+%TODO: completion
+
+ Some degree of simplification is applied when doing $\bigodot$, for example,
+ $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
+ and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
+  $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
+  and so on.
+  
+  With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regular expression $r$ with 
+  an iterative procedure:
+   \begin{center}  
+ \begin{tabular}{llll}
+$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
+ 		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
+		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
+$\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
+\end{tabular}
+\end{center}
+
+
+ $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
+
+
 
 
 
@@ -345,22 +523,22 @@
 ~r \backslash c = ~ (r \backslash c)
 \]
  
-The most tricky part of lexing for the $~r$ regex
+The most tricky part of lexing for the $~r$ regular expression
  is creating a value for it.
  For other regular expressions, the value aligns with the 
- structure of the regex:
+ structure of the regular expression:
  \[
  \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
  \]
-But for the $~r$ regex, $s$ is a member of it if and only if
+But for the $~r$ regular expression, $s$ is a member of it if and only if
 $s$ does not belong to $L(r)$. 
 That means when there
-is a match for the not regex, it is not possible to generate how the string $s$ matched
+is a match for the not regular expression, it is not possible to generate how the string $s$ matched
 with $r$. 
 What we can do is preserve the information of how $s$ was not matched by $r$,
 and there are a number of options to do this.
 
-We could give a partial value when there is a partial match for the regex inside
+We could give a partial value when there is a partial match for the regular expression inside
 the $\mathbf{not}$ construct.
 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$,
 A value for it could be