--- a/etnms/etnms.tex Sat Feb 08 22:01:40 2020 +0000
+++ b/etnms/etnms.tex Sat Feb 08 22:24:21 2020 +0000
@@ -361,57 +361,31 @@
algorithm.
Fortunately, one can simplify regular expressions after each derivative
-step. Various simplifications of regular expressions are possible, such
+step.
+Various simplifications of regular expressions are possible, such
as the simplification of $\ZERO + r$, $r + \ZERO$, $\ONE\cdot r$, $r
-\cdot \ONE$, and $r + r$ to just $r$. These simplifications do not
-affect the answer for whether a regular expression matches a string or
-not, but fortunately also do not affect the POSIX strategy of how
-regular expressions match strings---although the latter is much harder
+\cdot \ONE$, and $r + r$ to just $r$.
+Suppose we apply simplification after each derivative step, and compose
+these two operations together as an atomic one: $a \backslash_{simp}\,c \dn
+\textit{simp}(a \backslash c)$. Then we can build values without having
+a cumbersome regular expression, and fortunately if we are careful
+enough in making some extra rectifications, the POSIX value of how
+regular expressions match strings will not be affected---although is much harder
to establish. Some initial results in this regard have been
obtained in \cite{AusafDyckhoffUrban2016}.
+
If we want the size of derivatives in Sulzmann and Lu's algorithm to
-stay below this bound, we would need more aggressive simplifications.
+stay even lower, we would need more aggressive simplifications.
Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as
deleting duplicates whenever possible. For example, the parentheses in
$(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b
\cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
$a^*+a+\ONE$. Adding these more aggressive simplification rules help us
-to achieve the same size bound as that of the partial derivatives.
-
-
-Suppose we apply simplification after each derivative step, and view
-these two operations as an atomic one: $a \backslash_{simp}\,c \dn
-\textit{simp}(a \backslash c)$. Then we can use the previous natural
-extension from derivative w.r.t.~character to derivative
-w.r.t.~string:%\comment{simp in the [] case?}
-
-\begin{center}
-\begin{tabular}{lcl}
-$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash_{simp} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\end{center}
-
-\noindent
-we obtain an optimised version of the algorithm:
-
- \begin{center}
-\begin{tabular}{lcl}
- $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
- $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
- & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
- & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
- & & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-
-\noindent
-This algorithm keeps the regular expression size small, for example,
-with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
-input string is.
+to achieve a very tight size bound, namely,
+ the same size bound as that of the \emph{partial derivatives}.
+And we want to get rid of complex and fragile rectification of values.
In order to implement the idea of ``spilling out alternatives'' and to
@@ -769,6 +743,36 @@
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
+Having defined the $\simp$ function,
+we can use the previous notation of natural
+extension from derivative w.r.t.~character to derivative
+w.r.t.~string:%\comment{simp in the [] case?}
+
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent
+to obtain an optimised version of the algorithm:
+
+ \begin{center}
+\begin{tabular}{lcl}
+ $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+ $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
+ & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+ & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+ & & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+
+\noindent
+This algorithm keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes
+will be reduced to just 6 and stays constant, no matter how long the
+input string is.
@@ -1150,7 +1154,7 @@
&&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\
&&$\quad\textit{case} \; (a_1', a_2') \Rightarrow \textit{SEQ} \; bs \; a_1' \; a_2'$ \\
- $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
+ $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{as.map(simp)})) \; \textit{match} $ \\
&&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\
&&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\
&&$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\