ninems/ninems.tex
changeset 48 bbefcf7351f2
parent 47 d2a7e87ea6e1
child 49 d256aabe88f3
equal deleted inserted replaced
47:d2a7e87ea6e1 48:bbefcf7351f2
   750 The simplification does a pattern matching on the regular expression. When it detected that
   750 The simplification does a pattern matching on the regular expression. When it detected that
   751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   751 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   752 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   753  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   753  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   754  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   754  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   755 
   755  Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
   756 With this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6.
   756  Function flatten opens up nested ALT. Its recursive definition is given below:
       
   757  \\
       
   758  flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
       
   759  flatten ZERO :: rss = flatten rss
       
   760  flatten r::rss = r @ flatten rss if r is of any other shape
       
   761  
       
   762  Here flatten behaves like the traditional functional programming flatten function,
       
   763  what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
       
   764 
       
   765 If we apply simplification after each derivative step, we get an optimized version of the algorithm:
       
   766 \\TODO definition of blexer_simp
       
   767 
       
   768 This algorithm effectively keeps the regular expression size small, for example,
       
   769 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
   757 
   770 
   758 Another modification is that we use simplification rules
   771 Another modification is that we use simplification rules
   759 inspired by Antimirov's work on partial derivatives. They maintain the
   772 inspired by Antimirov's work on partial derivatives. They maintain the
   760 idea that only the first ``copy'' of a regular expression in an
   773 idea that only the first ``copy'' of a regular expression in an
   761 alternative contributes to the calculation of a POSIX value. All
   774 alternative contributes to the calculation of a POSIX value. All