cst_tests: comparison etnms/etnms.tex

equal deleted inserted replaced

-:4d6f54c478b5
+:b6984212cf87
 $\textit{der}$ and $\nullable$ need to traverse such trees and
 consequently the bigger the size of the derivative the slower the
 algorithm.
 Fortunately, one can simplify regular expressions after each derivative
-step. Various simplifications of regular expressions are possible, such
+step.
+Various simplifications of regular expressions are possible, such
 as the simplification of $\ZERO + r$, $r + \ZERO$, $\ONE\cdot r$, $r
-\cdot \ONE$, and $r + r$ to just $r$. These simplifications do not
+\cdot \ONE$, and $r + r$ to just $r$.
-affect the answer for whether a regular expression matches a string or
+Suppose we apply simplification after each derivative step, and compose
-not, but fortunately also do not affect the POSIX strategy of how
+these two operations together as an atomic one: $a \backslash_{simp}\,c \dn
-regular expressions match strings---although the latter is much harder
+\textit{simp}(a \backslash c)$. Then we can build values without having
+a cumbersome regular expression, and  fortunately if we are careful
+enough in making some extra rectifications, the POSIX value of how
+regular expressions match strings will not be affected---although is much harder
 to establish. Some initial results in this regard have been
 obtained in \cite{AusafDyckhoffUrban2016}.
 If we want the size of derivatives in Sulzmann and Lu's algorithm to
-stay below this bound, we would need more aggressive simplifications.
+stay even lower, we would need more aggressive simplifications.
 Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as
 deleting duplicates whenever possible. For example, the parentheses in
 $(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b
 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
 $a^*+a+\ONE$. Adding these more aggressive simplification rules help us
-to achieve the same size bound as that of the partial derivatives.
+to achieve a very tight size bound, namely,
+the same size bound as that of the \emph{partial derivatives}.
+And we want to get rid of complex and fragile rectification of values.
-Suppose we apply simplification after each derivative step, and view
-these two operations as an atomic one: $a \backslash_{simp}\,c \dn
-\textit{simp}(a \backslash c)$. Then we can use the previous natural
-extension from derivative w.r.t.~character to derivative
-w.r.t.~string:%\comment{simp in  the [] case?}
-\begin{center}
-\begin{tabular}{lcl}
-$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash_{simp} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\end{center}
-\noindent
-we obtain an optimised version of the algorithm:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
-$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
-& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
-& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
-& & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-\noindent
-This algorithm keeps the regular expression size small, for example,
-with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
-input string is.
 In order to implement the idea of ``spilling out alternatives'' and to
 make them compatible with the $\textit{inj}$-mechanism, we use
 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
 \noindent
 Here $\textit{flatten}$ behaves like the traditional functional programming flatten
 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
+Having defined the $\simp$ function,
+we can use the previous notation of  natural
+extension from derivative w.r.t.~character to derivative
+w.r.t.~string:%\comment{simp in  the [] case?}
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+\noindent
+to obtain an optimised version of the algorithm:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
+& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+& & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+\noindent
+This algorithm keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes
+will be reduced to just 6 and stays constant, no matter how long the
+input string is.
 \section{Current Work and Progress}
 For reasons beyond this report, it turns out that a complete set of
 &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
 &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
 &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
 &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow  \textit{SEQ} \; bs \; a_1' \;  a_2'$ \\
-$\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
+$\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{as.map(simp)})) \; \textit{match} $ \\
 &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
 &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
 &&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$\\
 $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$

changeset 131	b6984212cf87
parent 130	4d6f54c478b5
child 132	69292c2b54d8