cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:d2a7e87ea6e1
+:bbefcf7351f2
 The simplification does a pattern matching on the regular expression. When it detected that
 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
 at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions
 flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
+Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
-With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6.
+Function flatten opens up nested ALT. Its recursive definition is given below:
+\\
+flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
+flatten ZERO :: rss = flatten rss
+flatten r::rss = r @ flatten rss if r is of any other shape
+Here flatten behaves like the traditional functional programming flatten function,
+what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
+If we apply simplification after each derivative step, we get an optimized version of the algorithm:
+\\TODO definition of blexer_simp
+This algorithm effectively keeps the regular expression size small, for example,
+with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
 Another modification is that we use simplification rules
 inspired by Antimirov's work on partial derivatives. They maintain the
 idea that only the first ``copy'' of a regular expression in an
 alternative contributes to the calculation of a POSIX value. All

changeset 48	bbefcf7351f2
parent 47	d2a7e87ea6e1
child 49	d256aabe88f3