# HG changeset patch
# User Chengsong
# Date 1562276702 -3600
# Node ID bbefcf7351f2ac80b86285c8873d76f570378249
# Parent d2a7e87ea6e123045d4aa6be83a013abeb47185a
still will not compile

diff -r d2a7e87ea6e1 -r bbefcf7351f2 ninems/ninems.tex
--- a/ninems/ninems.tex	Thu Jul 04 22:28:09 2019 +0100
+++ b/ninems/ninems.tex	Thu Jul 04 22:45:02 2019 +0100
@@ -752,8 +752,21 @@
 recursively and then see if one of the children turns into 0 or 1, which might trigger further simplification at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions, flatten and distinct, to open up nested ALTs and remove as many duplicates as possible.
+ Function distinct keeps only the first occurring copy of each alternative and removes all later duplicates it detects.
+ Function flatten opens up nested ALTs. Its recursive definition is given below:
+ \\
+ $\textit{flatten} \; ((\textit{ALTS} \; bs \; rs) :: rss) = (\textit{map} \; (\textit{fuse} \; bs) \; rs) \; @ \; \textit{flatten} \; rss$\\
+ $\textit{flatten} \; (\textit{ZERO} :: rss) = \textit{flatten} \; rss$\\
+ $\textit{flatten} \; (r :: rss) = r :: \textit{flatten} \; rss$ \quad (if $r$ is of any other shape)
+
+ Here flatten behaves like the traditional functional programming flatten function:
+ what it does is essentially removing parentheses, for example changing $a+(b+c)$ into $a+b+c$.
-With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6.
+If we apply simplification after each derivative step, we get an optimized version of the algorithm:
+\\TODO definition of blexer\_simp
+
+This algorithm effectively keeps the size of the regular expressions small. For example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, no matter how long the input string is.
 
 Another modification is that we use simplification rules inspired by Antimirov's work on partial derivatives. They maintain the
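
Editor's note: the hunk above gives flatten and distinct only as prose-style pseudocode, so a small, self-contained Scala sketch of the two auxiliary functions may help make the intended behaviour concrete. It is illustrative only: the cut-down ARexp datatype, the constructor names AZERO, AONE, ACHAR and AALTS, and the simplified fuse are assumptions modelled on the surrounding text, not the paper's actual definitions.

object FlattenSketch {

  type Bits = List[Int]

  // Hypothetical cut-down datatype of annotated regular expressions:
  // only the constructors needed to illustrate flatten/distinct.
  sealed trait ARexp
  case object AZERO extends ARexp
  case class AONE(bs: Bits) extends ARexp
  case class ACHAR(bs: Bits, c: Char) extends ARexp
  case class AALTS(bs: Bits, rs: List[ARexp]) extends ARexp

  // fuse prepends the bit-annotation bs onto a term's own annotation
  def fuse(bs: Bits, r: ARexp): ARexp = r match {
    case AZERO           => AZERO
    case AONE(cs)        => AONE(bs ++ cs)
    case ACHAR(cs, c)    => ACHAR(bs ++ cs, c)
    case AALTS(cs, rs)   => AALTS(bs ++ cs, rs)
  }

  // flatten opens up nested alternatives, following the three clauses above:
  // an inner ALTS is spliced in (its annotation fused onto its children),
  // ZERO members are dropped, and anything else is kept as-is.
  def flatten(rs: List[ARexp]): List[ARexp] = rs match {
    case Nil                   => Nil
    case AALTS(bs, rs1) :: rss => rs1.map(fuse(bs, _)) ++ flatten(rss)
    case AZERO :: rss          => flatten(rss)
    case r :: rss              => r :: flatten(rss)
  }

  // distinct keeps only the first occurring copy of each alternative;
  // plain structural equality stands in for the real notion of "duplicate"
  def distinct(rs: List[ARexp], seen: Set[ARexp] = Set()): List[ARexp] = rs match {
    case Nil      => Nil
    case r :: rss =>
      if (seen.contains(r)) distinct(rss, seen)
      else r :: distinct(rss, seen + r)
  }

  def main(args: Array[String]): Unit = {
    // a + (b + c) flattens to the three-way alternative a + b + c
    val nested = List(ACHAR(Nil, 'a'), AALTS(Nil, List(ACHAR(Nil, 'b'), ACHAR(Nil, 'c'))))
    println(flatten(nested))
    // duplicates after the first occurrence are dropped
    println(distinct(List(ACHAR(Nil, 'a'), ACHAR(Nil, 'a'), ACHAR(Nil, 'b'))))
  }
}

Running main shows a+(b+c) being rewritten to the three-way alternative a+b+c and a repeated alternative being dropped, matching the two informal descriptions in the hunk.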