cst_tests: comparison etnms/etnms.tex

equal deleted inserted replaced

-:576ddb23f596
+:4d6f54c478b5
 	&   & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
 	&   & $\mathit{else}\;(r_1\backslash c) \cdot r_2$\\
 	$(r^*)\backslash c$           & $\dn$ & $(r\backslash c) \cdot r^*$\\
 \end{tabular}
 \end{center}
+\noindent
-%Assuming the classic notion of a
+And defines how a regular expression evolves into
-%\emph{language} of a regular expression, written $L(\_)$, t
+a new regular expression after all the string it contains
+is chopped off a certain head character $c$.
-\noindent
 The main property of the derivative operation is that
 \begin{center}
 $c\!::\!s \in L(r)$ holds
 if and only if $s \in L(r\backslash c)$.
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
 \noindent
-Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$,
+Assuming the a string is given as a sequence of characters, say $c_0c_1..c_n$,
 this algorithm presented graphically is as follows:
 \begin{equation}\label{graph:*}
 \begin{tikzcd}
 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO}
 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
 $a^*+a+\ONE$. Adding these more aggressive simplification rules help us
 to achieve the same size bound as that of the partial derivatives.
+Suppose we apply simplification after each derivative step, and view
+these two operations as an atomic one: $a \backslash_{simp}\,c \dn
+\textit{simp}(a \backslash c)$. Then we can use the previous natural
+extension from derivative w.r.t.~character to derivative
+w.r.t.~string:%\comment{simp in  the [] case?}
+\begin{center}
+\begin{tabular}{lcl}
+$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
+\end{tabular}
+\end{center}
+\noindent
+we obtain an optimised version of the algorithm:
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
+$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
+& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
+& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+& & $\;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+\noindent
+This algorithm keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes
+will be reduced to just 6 and stays constant, no matter how long the
+input string is.
 In order to implement the idea of ``spilling out alternatives'' and to
 make them compatible with the $\textit{inj}$-mechanism, we use
 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
 Here bits and bitcodes (lists of bits) are defined as:
 \begin{center}
 		$b ::=   1 \mid  0 \qquad
-bs ::= [] \mid b:bs
+bs ::= [] \mid b::bs
 $
 \end{center}
 \noindent
 The $1$ and $0$ are not in bold in order to avoid
 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
-bit-lists) can be used to encode values (or incomplete values) in a
+bit-lists) can be used to encode values (or potentially incomplete values) in a
 compact form. This can be straightforwardly seen in the following
 coding function from values to bitcodes:
 \begin{center}
 \begin{tabular}{lcl}
 \noindent
 Here $\textit{flatten}$ behaves like the traditional functional programming flatten
 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
-Suppose we apply simplification after each derivative step, and view
-these two operations as an atomic one: $a \backslash_{simp}\,c \dn
-\textit{simp}(a \backslash c)$. Then we can use the previous natural
-extension from derivative w.r.t.~character to derivative
-w.r.t.~string:%\comment{simp in  the [] case?}
-\begin{center}
-\begin{tabular}{lcl}
-$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash_{simp} [\,] $ & $\dn$ & $r$
-\end{tabular}
-\end{center}
-\noindent
-we obtain an optimised version of the algorithm:
-\begin{center}
-\begin{tabular}{lcl}
-$\textit{blexer\_simp}\;r\,s$ & $\dn$ &
-$\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\
-& & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
-& & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
-& & $\;\;\textit{else}\;\textit{None}$
-\end{tabular}
-\end{center}
-\noindent
-This algorithm keeps the regular expression size small, for example,
-with this simplification our previous $(a + aa)^*$ example's 8000 nodes
-will be reduced to just 6 and stays constant, no matter how long the
-input string is.
 \section{Current Work and Progress}
 For reasons beyond this report, it turns out that a complete set of

changeset 130	4d6f54c478b5
parent 129	576ddb23f596
child 131	b6984212cf87