# HG changeset patch # User Chengsong # Date 1581199300 0 # Node ID 4d6f54c478b5e579e46e19550451b92e1c3ea5e3 # Parent 576ddb23f596f5e7e2ba3b5f08f9029be2b48c09 i diff -r 576ddb23f596 -r 4d6f54c478b5 etnms/etnms.tex --- a/etnms/etnms.tex Sat Feb 08 21:57:11 2020 +0000 +++ b/etnms/etnms.tex Sat Feb 08 22:01:40 2020 +0000 @@ -183,11 +183,11 @@ $(r^*)\backslash c$ & $\dn$ & $(r\backslash c) \cdot r^*$\\ \end{tabular} \end{center} +\noindent +And defines how a regular expression evolves into +a new regular expression after all the string it contains +is chopped off a certain head character $c$. -%Assuming the classic notion of a -%\emph{language} of a regular expression, written $L(\_)$, t - -\noindent The main property of the derivative operation is that \begin{center} @@ -214,7 +214,7 @@ \] \noindent -Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$, +Assuming the a string is given as a sequence of characters, say $c_0c_1..c_n$, this algorithm presented graphically is as follows: \begin{equation}\label{graph:*} @@ -380,6 +380,40 @@ $a^*+a+\ONE$. Adding these more aggressive simplification rules help us to achieve the same size bound as that of the partial derivatives. + +Suppose we apply simplification after each derivative step, and view +these two operations as an atomic one: $a \backslash_{simp}\,c \dn +\textit{simp}(a \backslash c)$. Then we can use the previous natural +extension from derivative w.r.t.~character to derivative +w.r.t.~string:%\comment{simp in the [] case?} + +\begin{center} +\begin{tabular}{lcl} +$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\ +$r \backslash_{simp} [\,] $ & $\dn$ & $r$ +\end{tabular} +\end{center} + +\noindent +we obtain an optimised version of the algorithm: + + \begin{center} +\begin{tabular}{lcl} + $\textit{blexer\_simp}\;r\,s$ & $\dn$ & + $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\ + & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ + & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ + & & $\;\;\textit{else}\;\textit{None}$ +\end{tabular} +\end{center} + +\noindent +This algorithm keeps the regular expression size small, for example, +with this simplification our previous $(a + aa)^*$ example's 8000 nodes +will be reduced to just 6 and stays constant, no matter how long the +input string is. + + In order to implement the idea of ``spilling out alternatives'' and to make them compatible with the $\textit{inj}$-mechanism, we use \emph{bitcodes}. They were first introduced by Sulzmann and Lu. @@ -387,14 +421,14 @@ \begin{center} $b ::= 1 \mid 0 \qquad -bs ::= [] \mid b:bs +bs ::= [] \mid b::bs $ \end{center} \noindent The $1$ and $0$ are not in bold in order to avoid confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or -bit-lists) can be used to encode values (or incomplete values) in a +bit-lists) can be used to encode values (or potentially incomplete values) in a compact form. This can be straightforwardly seen in the following coding function from values to bitcodes: @@ -735,37 +769,6 @@ function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it removes parentheses, for example changing $a+(b+c)$ into $a+b+c$. -Suppose we apply simplification after each derivative step, and view -these two operations as an atomic one: $a \backslash_{simp}\,c \dn -\textit{simp}(a \backslash c)$. Then we can use the previous natural -extension from derivative w.r.t.~character to derivative -w.r.t.~string:%\comment{simp in the [] case?} - -\begin{center} -\begin{tabular}{lcl} -$r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\ -$r \backslash_{simp} [\,] $ & $\dn$ & $r$ -\end{tabular} -\end{center} - -\noindent -we obtain an optimised version of the algorithm: - - \begin{center} -\begin{tabular}{lcl} - $\textit{blexer\_simp}\;r\,s$ & $\dn$ & - $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\ - & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\ - & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\ - & & $\;\;\textit{else}\;\textit{None}$ -\end{tabular} -\end{center} - -\noindent -This algorithm keeps the regular expression size small, for example, -with this simplification our previous $(a + aa)^*$ example's 8000 nodes -will be reduced to just 6 and stays constant, no matter how long the -input string is.