750 The simplification does a pattern matching on the regular expression. When it detected that |
750 The simplification does a pattern matching on the regular expression. When it detected that |
751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions |
751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions |
752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification |
752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification |
753 at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions |
753 at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions |
754 flatten and distinct to open up nested ALT and reduce as many duplicates as possible. |
754 flatten and distinct to open up nested ALT and reduce as many duplicates as possible. |
755 |
755 Function distinct keeps the first occurring copy only and remove all later ones when detected duplicates. |
756 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6. |
756 Function flatten opens up nested ALT. Its recursive definition is given below: |
|
757 \\ |
|
758 flatten ALT bs rs :: rss = (map fuse( bs,_) rs )@ flatten rss |
|
759 flatten ZERO :: rss = flatten rss |
|
760 flatten r::rss = r @ flatten rss if r is of any other shape |
|
761 |
|
762 Here flatten behaves like the traditional functional programming flatten function, |
|
763 what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$. |
|
764 |
|
765 If we apply simplification after each derivative step, we get an optimized version of the algorithm: |
|
766 \\TODO definition of blexer_simp |
|
767 |
|
768 This algorithm effectively keeps the regular expression size small, for example, |
|
769 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is. |
757 |
770 |
758 Another modification is that we use simplification rules |
771 Another modification is that we use simplification rules |
759 inspired by Antimirov's work on partial derivatives. They maintain the |
772 inspired by Antimirov's work on partial derivatives. They maintain the |
760 idea that only the first ``copy'' of a regular expression in an |
773 idea that only the first ``copy'' of a regular expression in an |
761 alternative contributes to the calculation of a POSIX value. All |
774 alternative contributes to the calculation of a POSIX value. All |