etnms/etnms.tex
changeset 131 b6984212cf87
parent 130 4d6f54c478b5
child 132 69292c2b54d8
--- a/etnms/etnms.tex	Sat Feb 08 22:01:40 2020 +0000
+++ b/etnms/etnms.tex	Sat Feb 08 22:24:21 2020 +0000
@@ -361,57 +361,31 @@
 algorithm. 
 
 Fortunately, one can simplify regular expressions after each derivative
-step. Various simplifications of regular expressions are possible, such
+step. 
+Various simplifications of regular expressions are possible, such
 as the simplification of $\ZERO + r$, $r + \ZERO$, $\ONE\cdot r$, $r
-\cdot \ONE$, and $r + r$ to just $r$. These simplifications do not
-affect the answer for whether a regular expression matches a string or
-not, but fortunately also do not affect the POSIX strategy of how
-regular expressions match strings---although the latter is much harder
+\cdot \ONE$, and $r + r$ to just $r$.
+Suppose we apply simplification after each derivative step, and compose
+these two operations together as an atomic one: $a \backslash_{simp}\,c \dn
+\textit{simp}(a \backslash c)$. Then we can build values without having
+a cumbersome regular expression, and  fortunately if we are careful
+enough in making some extra rectifications, the POSIX value of how
+regular expressions match strings will not be affected---although is much harder
 to establish. Some initial results in this regard have been
 obtained in \cite{AusafDyckhoffUrban2016}. 
 
+
 If we want the size of derivatives in Sulzmann and Lu's algorithm to
-stay below this bound, we would need more aggressive simplifications.
+stay even lower, we would need more aggressive simplifications.
 Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as
 deleting duplicates whenever possible. For example, the parentheses in
 $(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b
 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
 $a^*+a+\ONE$. Adding these more aggressive simplification rules help us
-to achieve the same size bound as that of the partial derivatives. 
-
-
-Suppose we apply simplification after each derivative step, and view
-these two operations as an atomic one: $a \backslash_{simp}\,c \dn
-\textit{simp}(a \backslash c)$. Then we can use the previous natural
-extension from derivative w.r.t.~character to derivative
-w.r.t.~string:%\comment{simp in  the [] case?}
-
-\begin{center}
-\begin{tabular}{lcl}
-$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash_{simp} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\end{center}
-
-\noindent
-we obtain an optimised version of the algorithm:
-
- \begin{center}
-\begin{tabular}{lcl}
-  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
-      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
-  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
-  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
-  & & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-
-\noindent
-This algorithm keeps the regular expression size small, for example,
-with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
-input string is.
+to achieve a very tight size bound, namely,
+ the same size bound as that of the \emph{partial derivatives}. 
+And we want to get rid of complex and fragile rectification of values.
 
 
 In order to implement the idea of ``spilling out alternatives'' and to
@@ -769,6 +743,36 @@
 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
 
+Having defined the $\simp$ function,
+we can use the previous notation of  natural
+extension from derivative w.r.t.~character to derivative
+w.r.t.~string:%\comment{simp in  the [] case?}
+
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent
+to obtain an optimised version of the algorithm:
+
+ \begin{center}
+\begin{tabular}{lcl}
+  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
+  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+  & & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+
+\noindent
+This algorithm keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes
+will be reduced to just 6 and stays constant, no matter how long the
+input string is.
 
 
 
@@ -1150,7 +1154,7 @@
    &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
    &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow  \textit{SEQ} \; bs \; a_1' \;  a_2'$ \\
 
-  $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
+  $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{as.map(simp)})) \; \textit{match} $ \\
   &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
    &&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$\\