ninems/ninems.tex
changeset 47 d2a7e87ea6e1
parent 46 9b48724ec609
child 48 bbefcf7351f2
equal deleted inserted replaced
46:9b48724ec609 47:d2a7e87ea6e1
   733 unnecessary ``copies'' of regular expressions (very similar to
   733 unnecessary ``copies'' of regular expressions (very similar to
   734 simplifying $r + r$ to just $r$, but in a more general
   734 simplifying $r + r$ to just $r$, but in a more general
   735 setting). 
   735 setting). 
   736 A psuedocode version of our algorithm is given below:\\
   736 A psuedocode version of our algorithm is given below:\\
   737 
   737 
   738 \begin{algorithm}
   738 simp r \defn r if r = ONE bs or CHAR bs c or STAR bs r
   739 \caption{simplification of annotated regular expression}\label{euclid}
   739 simp SEQ bs r_1 r_2 \defn \\
   740 \begin{algorithmic}[1]
   740 case (simp(r_1), simp(r_2) ) of (0, _) => 0
   741 \Procedure{$Simp$}{$areg$}
   741 (_,0) => 0
   742 \Switch{$areg$}
   742 (1, r) => fuse bs r
   743 	\Case{$ALTS(bs, rs)$}
   743 (r,1) => fuse bs r
   744 		\For{\textit{rs[i] in array rs}}
   744 (r_1, r_2) => SEQ bs r_1 r_2
   745         			\State $\textit{rs[i]} \gets$ \textit{Simp(rs[i])}
   745 simp ALT bs rs = distinct(flatten( map simp rs)) match 
   746       		\EndFor
   746 case Nil => ZERO
   747 		\For{\textit{rs[i] in array rs}}
   747 case r::Nil => fuse bs r
   748         			\If{$rs[i] == ALTS(bs', rs')$}
   748 case rs => ALT bs rs
   749 				\State $rs'' \gets$ attach bits $bs'$ to all elements in $rs'$
   749 
   750 				\State Insert $rs''$ into $rs$ at position $i$ ($rs[i]$ is destroyed, replaced by its list of children regular expressions)
   750 The simplification does a pattern matching on the regular expression. When it detected that
   751 			\EndIf
   751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   752       		\EndFor
   752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   753 		\State Remove all duplicates in $rs$, only keeping the first copy for multiple occurrences of the same regular expression
   753  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   754 		\State Remove all $0$s in $rs$
   754  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   755 		\If{$ rs.length == 0$} \Return $ZERO$ \EndIf
   755 
   756 		\If {$ rs.length == 1$} \Return$ rs[0] $\EndIf
       
   757 	\EndCase
       
   758 	\Case{$SEQ(bs, r_1, r_2)$}
       
   759 		\If{$ r_1$ or $r_2$ is $ZERO$} \Return ZERO \EndIf
       
   760 		\State update $r_1$ and $r_2$ by attaching $bs$ to their front
       
   761 		\If {$r_1$ or $r_2$ is $ONE(bs')$} \Return $r_2$ or $r_1$ \EndIf
       
   762 	\EndCase
       
   763 	\Case{$Others$}
       
   764 		\Return $areg$ as it is
       
   765 	\EndCase
       
   766 \EndSwitch
       
   767 \EndProcedure
       
   768 \end{algorithmic}
       
   769 \end{algorithm}
       
   770 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6.
   756 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6.
   771 
   757 
   772 Another modification is that we use simplification rules
   758 Another modification is that we use simplification rules
   773 inspired by Antimirov's work on partial derivatives. They maintain the
   759 inspired by Antimirov's work on partial derivatives. They maintain the
   774 idea that only the first ``copy'' of a regular expression in an
   760 idea that only the first ``copy'' of a regular expression in an