733 unnecessary ``copies'' of regular expressions (very similar to |
733 unnecessary ``copies'' of regular expressions (very similar to |
734 simplifying $r + r$ to just $r$, but in a more general |
734 simplifying $r + r$ to just $r$, but in a more general |
735 setting). |
735 setting). |
736 A psuedocode version of our algorithm is given below:\\ |
736 A psuedocode version of our algorithm is given below:\\ |
737 |
737 |
738 \begin{algorithm} |
738 simp r \defn r if r = ONE bs or CHAR bs c or STAR bs r |
739 \caption{simplification of annotated regular expression}\label{euclid} |
739 simp SEQ bs r_1 r_2 \defn \\ |
740 \begin{algorithmic}[1] |
740 case (simp(r_1), simp(r_2) ) of (0, _) => 0 |
741 \Procedure{$Simp$}{$areg$} |
741 (_,0) => 0 |
742 \Switch{$areg$} |
742 (1, r) => fuse bs r |
743 \Case{$ALTS(bs, rs)$} |
743 (r,1) => fuse bs r |
744 \For{\textit{rs[i] in array rs}} |
744 (r_1, r_2) => SEQ bs r_1 r_2 |
745 \State $\textit{rs[i]} \gets$ \textit{Simp(rs[i])} |
745 simp ALT bs rs = distinct(flatten( map simp rs)) match |
746 \EndFor |
746 case Nil => ZERO |
747 \For{\textit{rs[i] in array rs}} |
747 case r::Nil => fuse bs r |
748 \If{$rs[i] == ALTS(bs', rs')$} |
748 case rs => ALT bs rs |
749 \State $rs'' \gets$ attach bits $bs'$ to all elements in $rs'$ |
749 |
750 \State Insert $rs''$ into $rs$ at position $i$ ($rs[i]$ is destroyed, replaced by its list of children regular expressions) |
750 The simplification does a pattern matching on the regular expression. When it detected that |
751 \EndIf |
751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions |
752 \EndFor |
752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification |
753 \State Remove all duplicates in $rs$, only keeping the first copy for multiple occurrences of the same regular expression |
753 at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions |
754 \State Remove all $0$s in $rs$ |
754 flatten and distinct to open up nested ALT and reduce as many duplicates as possible. |
755 \If{$ rs.length == 0$} \Return $ZERO$ \EndIf |
755 |
756 \If {$ rs.length == 1$} \Return$ rs[0] $\EndIf |
|
757 \EndCase |
|
758 \Case{$SEQ(bs, r_1, r_2)$} |
|
759 \If{$ r_1$ or $r_2$ is $ZERO$} \Return ZERO \EndIf |
|
760 \State update $r_1$ and $r_2$ by attaching $bs$ to their front |
|
761 \If {$r_1$ or $r_2$ is $ONE(bs')$} \Return $r_2$ or $r_1$ \EndIf |
|
762 \EndCase |
|
763 \Case{$Others$} |
|
764 \Return $areg$ as it is |
|
765 \EndCase |
|
766 \EndSwitch |
|
767 \EndProcedure |
|
768 \end{algorithmic} |
|
769 \end{algorithm} |
|
770 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6. |
756 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6. |
771 |
757 |
772 Another modification is that we use simplification rules |
758 Another modification is that we use simplification rules |
773 inspired by Antimirov's work on partial derivatives. They maintain the |
759 inspired by Antimirov's work on partial derivatives. They maintain the |
774 idea that only the first ``copy'' of a regular expression in an |
760 idea that only the first ``copy'' of a regular expression in an |