ninems/ninems.tex
changeset 52 25bbbb8b0e90
parent 51 5df7faf69238
child 53 3ec403f650a8
equal deleted inserted replaced
51:5df7faf69238 52:25bbbb8b0e90
   547 by the number of characters contained in the initial regular
   547 by the number of characters contained in the initial regular
   548 expression \cite{Antimirov95}.
   548 expression \cite{Antimirov95}.
   549 
   549 
   550 Antimirov defined the "partial derivatives" of regular expressions to be this:
   550 Antimirov defined the "partial derivatives" of regular expressions to be this:
   551 %TODO definition of partial derivatives
   551 %TODO definition of partial derivatives
   552 \[ pder c 0 = emtpy set\]
   552 \begin{center}
   553 \[pder c 1 = emptry set\]
   553 \begin{tabular}{lcl}
   554 \[pder c c = if c is equal to d then set containing lambda only else  empty set\]
   554  $\textit{pder} \; c \; 0$ & $\dn$ & $\emptyset$\\
   555 \[pder c r_1+r_2 = pder c r_1 \cup pder c r_2\]
   555  $\textit{pder} \; c \; 1$ & $\dn$ & $\emptyset$ \\
   556 \[pder c r_1 \cdot r_2 = if r_1 nullable then set of r \cdot r_2' where r is in pder c r_1 \cup pder c r_2\]
   556  $\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{  1   \}  \; \textit{else} \; \emptyset$ \\ 
   557 \[else set of r \cdot r_2' where r is in pder c r_1\]
   557   $\textit{pder} \; c \; r_1+r_2$ & $\dn$ & $pder \; c \; r_1 \cup pder \; c \;  r_2$ \\
   558 \[pder c r^* = set of r \cdot r^* where r \in pder c r\]
   558    $\textit{pder} \; c \; r_1 \cdot r_2$ & $\dn$ & $\textit{if} \; nullable \; r_1 \; \{  r \cdot r_2 \mid r \in pder \; c \; r_1   \}  \cup pder \; c \; r_2 \; \textit{else} \; \{  r \cdot r_2 \mid r \in pder \; c \; r_1   \} $ \\ 
   559   
   559      $\textit{pder} \; c \; r^*$ & $\dn$ & $ \{  r' \cdot r^* \mid r' \in pder \; c \; r   \}  $ \\  
       
   560  \end{tabular}
       
   561  \end{center}
       
   562 
   560 
   563 
   561 it is essentially a set of regular expressions that come from the sub-structure of the original regular expression. 
   564 it is essentially a set of regular expressions that come from the sub-structure of the original regular expression. 
   562 Antimirov has proved a nice size bound of the size of partial derivatives. Roughly speaking the size will not exceed the fourth power of the number of nodes in that regular expression.  Interestingly, we observed from experiment that after the simplification step, our regular expression has the same size or is smaller than the partial derivatives. This allows us to prove a tight bound on the size of regular expression during the running time of the algorithm if we can establish the connection between our simplification rules and partial derivatives.
   565 Antimirov has proved a nice size bound of the size of partial derivatives. Roughly speaking the size will not exceed the fourth power of the number of nodes in that regular expression.  Interestingly, we observed from experiment that after the simplification step, our regular expression has the same size or is smaller than the partial derivatives. This allows us to prove a tight bound on the size of regular expression during the running time of the algorithm if we can establish the connection between our simplification rules and partial derivatives.
   563 
   566 
   564  %We believe, and have generated test
   567  %We believe, and have generated test
   753 inspired by Antimirov's work on partial derivatives. They maintain the
   756 inspired by Antimirov's work on partial derivatives. They maintain the
   754 idea that only the first ``copy'' of a regular expression in an
   757 idea that only the first ``copy'' of a regular expression in an
   755 alternative contributes to the calculation of a POSIX value. All
   758 alternative contributes to the calculation of a POSIX value. All
   756 subsequent copies can be pruned from the regular expression.
   759 subsequent copies can be pruned from the regular expression.
   757 
   760 
   758 A psuedocode version of our algorithm is given below:\\
   761 A recursive definition of simplification function that looks similar to scala code is given below:\\
   759 \[simp r \dn r if r = ONE bs or CHAR bs c or STAR bs r
   762 \begin{center}
   760 simp SEQ bs r_1 r_2 \dn \\
   763   \begin{tabular}{@{}lcl@{}}
   761 case (simp(r_1), simp(r_2) ) of (0, _) => 0
   764   $\textit{simp} \; a$ & $\dn$ & $\textit{a} \; \textit{if} \; a  =  (\textit{ONE} \; bs) \; or\; (\textit{CHAR} \, bs \; c) \; or\; (\textit{STAR}\; bs\; a_1)$\\  
   762 (_,0) => 0
   765   
   763 (1, r) => fuse bs r
   766   $\textit{simp} \; \textit{SEQ}\;bs\,a_1\,a_2$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
   764 (r,1) => fuse bs r
   767   &&$\textit{case} \; (0, \_) \Rightarrow  0$ \\
   765 (r_1, r_2) => SEQ bs r_1 r_2
   768    &&$ \textit{case} \; (\_, 0) \Rightarrow  0$ \\
   766 simp ALT bs rs = distinct(flatten( map simp rs)) match 
   769    &&$ \textit{case} \;  (1, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
   767 case Nil => ZERO
   770    &&$ \textit{case} \; (a_1', 1) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
   768 case r::Nil => fuse bs r
   771    &&$ \textit{case} \; (a_1', a_2') \Rightarrow  \textit{SEQ} \; bs \; a_1' \;  a_2'$ \\
   769 case rs => ALT bs rs
   772 
   770 \]
   773   $\textit{simp} \; \textit{ALT}\;bs\,as$ & $\dn$ & $\textit{ distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
       
   774   &&$\textit{case} \; [] \Rightarrow  0$ \\
       
   775    &&$ \textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
       
   776    &&$ \textit{case} \;  as' \Rightarrow  \textit{ALT bs as'}$ 
       
   777 \end{tabular}    
       
   778 \end{center}    
   771 
   779 
   772 The simplification does a pattern matching on the regular expression. When it detected that
   780 The simplification does a pattern matching on the regular expression. When it detected that
   773 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   781 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   774 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   782 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   775  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   783  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions