etnms/etnms.tex
changeset 130 4d6f54c478b5
parent 129 576ddb23f596
child 131 b6984212cf87
equal deleted inserted replaced
129:576ddb23f596 130:4d6f54c478b5
   181 	&   & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
   181 	&   & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
   182 	&   & $\mathit{else}\;(r_1\backslash c) \cdot r_2$\\
   182 	&   & $\mathit{else}\;(r_1\backslash c) \cdot r_2$\\
   183 	$(r^*)\backslash c$           & $\dn$ & $(r\backslash c) \cdot r^*$\\
   183 	$(r^*)\backslash c$           & $\dn$ & $(r\backslash c) \cdot r^*$\\
   184 \end{tabular}
   184 \end{tabular}
   185 \end{center}
   185 \end{center}
   186 
   186 \noindent
   187 %Assuming the classic notion of a
   187 And defines how a regular expression evolves into
   188 %\emph{language} of a regular expression, written $L(\_)$, t
   188 a new regular expression after all the string it contains
   189 
   189 is chopped off a certain head character $c$.
   190 \noindent
   190 
   191 The main property of the derivative operation is that
   191 The main property of the derivative operation is that
   192 
   192 
   193 \begin{center}
   193 \begin{center}
   194 $c\!::\!s \in L(r)$ holds
   194 $c\!::\!s \in L(r)$ holds
   195 if and only if $s \in L(r\backslash c)$.
   195 if and only if $s \in L(r\backslash c)$.
   212 \[
   212 \[
   213 match\;s\;r \;\dn\; nullable(r\backslash s)
   213 match\;s\;r \;\dn\; nullable(r\backslash s)
   214 \]
   214 \]
   215 
   215 
   216 \noindent
   216 \noindent
   217 Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$, 
   217 Assuming the a string is given as a sequence of characters, say $c_0c_1..c_n$, 
   218 this algorithm presented graphically is as follows:
   218 this algorithm presented graphically is as follows:
   219 
   219 
   220 \begin{equation}\label{graph:*}
   220 \begin{equation}\label{graph:*}
   221 \begin{tikzcd}
   221 \begin{tikzcd}
   222 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO}
   222 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO}
   378 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
   378 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
   379 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
   379 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
   380 $a^*+a+\ONE$. Adding these more aggressive simplification rules help us
   380 $a^*+a+\ONE$. Adding these more aggressive simplification rules help us
   381 to achieve the same size bound as that of the partial derivatives. 
   381 to achieve the same size bound as that of the partial derivatives. 
   382 
   382 
       
   383 
       
   384 Suppose we apply simplification after each derivative step, and view
       
   385 these two operations as an atomic one: $a \backslash_{simp}\,c \dn
       
   386 \textit{simp}(a \backslash c)$. Then we can use the previous natural
       
   387 extension from derivative w.r.t.~character to derivative
       
   388 w.r.t.~string:%\comment{simp in  the [] case?}
       
   389 
       
   390 \begin{center}
       
   391 \begin{tabular}{lcl}
       
   392 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
       
   393 $r \backslash_{simp} [\,] $ & $\dn$ & $r$
       
   394 \end{tabular}
       
   395 \end{center}
       
   396 
       
   397 \noindent
       
   398 we obtain an optimised version of the algorithm:
       
   399 
       
   400  \begin{center}
       
   401 \begin{tabular}{lcl}
       
   402   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       
   403       $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
       
   404   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
       
   405   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
       
   406   & & $\;\;\textit{else}\;\textit{None}$
       
   407 \end{tabular}
       
   408 \end{center}
       
   409 
       
   410 \noindent
       
   411 This algorithm keeps the regular expression size small, for example,
       
   412 with this simplification our previous $(a + aa)^*$ example's 8000 nodes
       
   413 will be reduced to just 6 and stays constant, no matter how long the
       
   414 input string is.
       
   415 
       
   416 
   383 In order to implement the idea of ``spilling out alternatives'' and to
   417 In order to implement the idea of ``spilling out alternatives'' and to
   384 make them compatible with the $\textit{inj}$-mechanism, we use
   418 make them compatible with the $\textit{inj}$-mechanism, we use
   385 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
   419 \emph{bitcodes}. They were first introduced by Sulzmann and Lu.
   386 Here bits and bitcodes (lists of bits) are defined as:
   420 Here bits and bitcodes (lists of bits) are defined as:
   387 
   421 
   388 \begin{center}
   422 \begin{center}
   389 		$b ::=   1 \mid  0 \qquad
   423 		$b ::=   1 \mid  0 \qquad
   390 bs ::= [] \mid b:bs    
   424 bs ::= [] \mid b::bs    
   391 $
   425 $
   392 \end{center}
   426 \end{center}
   393 
   427 
   394 \noindent
   428 \noindent
   395 The $1$ and $0$ are not in bold in order to avoid 
   429 The $1$ and $0$ are not in bold in order to avoid 
   396 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
   430 confusion with the regular expressions $\ZERO$ and $\ONE$. Bitcodes (or
   397 bit-lists) can be used to encode values (or incomplete values) in a
   431 bit-lists) can be used to encode values (or potentially incomplete values) in a
   398 compact form. This can be straightforwardly seen in the following
   432 compact form. This can be straightforwardly seen in the following
   399 coding function from values to bitcodes: 
   433 coding function from values to bitcodes: 
   400 
   434 
   401 \begin{center}
   435 \begin{center}
   402 \begin{tabular}{lcl}
   436 \begin{tabular}{lcl}
   733 \noindent
   767 \noindent
   734 Here $\textit{flatten}$ behaves like the traditional functional programming flatten
   768 Here $\textit{flatten}$ behaves like the traditional functional programming flatten
   735 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
   769 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
   736 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
   770 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
   737 
   771 
   738 Suppose we apply simplification after each derivative step, and view
       
   739 these two operations as an atomic one: $a \backslash_{simp}\,c \dn
       
   740 \textit{simp}(a \backslash c)$. Then we can use the previous natural
       
   741 extension from derivative w.r.t.~character to derivative
       
   742 w.r.t.~string:%\comment{simp in  the [] case?}
       
   743 
       
   744 \begin{center}
       
   745 \begin{tabular}{lcl}
       
   746 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
       
   747 $r \backslash_{simp} [\,] $ & $\dn$ & $r$
       
   748 \end{tabular}
       
   749 \end{center}
       
   750 
       
   751 \noindent
       
   752 we obtain an optimised version of the algorithm:
       
   753 
       
   754  \begin{center}
       
   755 \begin{tabular}{lcl}
       
   756   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       
   757       $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
       
   758   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
       
   759   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
       
   760   & & $\;\;\textit{else}\;\textit{None}$
       
   761 \end{tabular}
       
   762 \end{center}
       
   763 
       
   764 \noindent
       
   765 This algorithm keeps the regular expression size small, for example,
       
   766 with this simplification our previous $(a + aa)^*$ example's 8000 nodes
       
   767 will be reduced to just 6 and stays constant, no matter how long the
       
   768 input string is.
       
   769 
   772 
   770 
   773 
   771 
   774 
   772 \section{Current Work and Progress}
   775 \section{Current Work and Progress}
   773 For reasons beyond this report, it turns out that a complete set of
   776 For reasons beyond this report, it turns out that a complete set of