--- a/etnms/etnms.tex Sat Feb 08 21:57:11 2020 +0000
+++ b/etnms/etnms.tex Sat Feb 08 22:01:40 2020 +0000
@@ -183,11 +183,11 @@
$(r^*)\backslash c$ & $\dn$ & $(r\backslash c) \cdot r^*$\\
\end{tabular}
\end{center}
+\noindent
+And defines how a regular expression evolves into
+a new regular expression after all the string it contains
+is chopped off a certain head character $c$.
-%Assuming the classic notion of a
-%\emph{language} of a regular expression, written $L(\_)$, t
-
-\noindent
The main property of the derivative operation is that
\begin{center}
@@ -214,7 +214,7 @@
\]
\noindent
-Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$,
+Assuming the a string is given as a sequence of characters, say $c_0c_1..c_n$,
this algorithm presented graphically is as follows:
\begin{equation}\label{graph:*}
@@ -380,6 +380,40 @@
$a^*+a+\ONE$. Adding these more aggressive simplification rules help us
to achieve the same size bound as that of the partial derivatives.
+
+Suppose we apply simplification after each derivative step, and view
+these two operations as an atomic one: $a \backslash_{simp}\,c \dn
+\textit{simp}(a \backslash c)$. Then we can use the previous natural
+extension from derivative w.r.t.~character to derivative
+w.r.t.~string:%\comment{simp in the [] case?}
+
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent
+we obtain an optimised version of the algorithm:
+
+ \begin{center}
+\begin{tabular}{lcl}
+ $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+ $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
+ & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+ & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+ & & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+
+\noindent
+This algorithm keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes
+will be reduced to just 6 and stays constant, no matter how long the
+input string is.
+
+
In order to implement the idea of ``spilling out alternatives'' and to
make them compatible with the $\textit{inj}$-mechanism, we use
\emph{bitcodes}. They were first introduced by Sulzmann and Lu.
@@ -387,14 +421,14 @@
\begin{center}
$b ::= 1 \mid 0 \qquad
-bs ::= [] \mid b:bs
+bs ::= [] \mid b::bs
$
\end{center}
\noindent
The $1$ and $0$ are not in bold in order to avoid
confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
-bit-lists) can be used to encode values (or incomplete values) in a
+bit-lists) can be used to encode values (or potentially incomplete values) in a
compact form. This can be straightforwardly seen in the following
coding function from values to bitcodes:
@@ -735,37 +769,6 @@
function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
-Suppose we apply simplification after each derivative step, and view
-these two operations as an atomic one: $a \backslash_{simp}\,c \dn
-\textit{simp}(a \backslash c)$. Then we can use the previous natural
-extension from derivative w.r.t.~character to derivative
-w.r.t.~string:%\comment{simp in the [] case?}
-
-\begin{center}
-\begin{tabular}{lcl}
-$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash_{simp} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\end{center}
-
-\noindent
-we obtain an optimised version of the algorithm:
-
- \begin{center}
-\begin{tabular}{lcl}
- $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
- $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
- & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
- & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
- & & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-
-\noindent
-This algorithm keeps the regular expression size small, for example,
-with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
-input string is.