ninems/ninems.tex
changeset 50 866eda9ba66a
parent 49 d256aabe88f3
child 51 5df7faf69238
equal deleted inserted replaced
49:d256aabe88f3 50:866eda9ba66a
   738 idea that only the first ``copy'' of a regular expression in an
   738 idea that only the first ``copy'' of a regular expression in an
   739 alternative contributes to the calculation of a POSIX value. All
   739 alternative contributes to the calculation of a POSIX value. All
   740 subsequent copies can be pruned from the regular expression.
   740 subsequent copies can be pruned from the regular expression.
   741 
   741 
   742 A psuedocode version of our algorithm is given below:\\
   742 A psuedocode version of our algorithm is given below:\\
   743 
   743 \[simp r \dn r if r = ONE bs or CHAR bs c or STAR bs r
   744 simp r \defn r if r = ONE bs or CHAR bs c or STAR bs r
   744 simp SEQ bs r_1 r_2 \dn \\
   745 simp SEQ bs r_1 r_2 \defn \\
       
   746 case (simp(r_1), simp(r_2) ) of (0, _) => 0
   745 case (simp(r_1), simp(r_2) ) of (0, _) => 0
   747 (_,0) => 0
   746 (_,0) => 0
   748 (1, r) => fuse bs r
   747 (1, r) => fuse bs r
   749 (r,1) => fuse bs r
   748 (r,1) => fuse bs r
   750 (r_1, r_2) => SEQ bs r_1 r_2
   749 (r_1, r_2) => SEQ bs r_1 r_2
   751 simp ALT bs rs = distinct(flatten( map simp rs)) match 
   750 simp ALT bs rs = distinct(flatten( map simp rs)) match 
   752 case Nil => ZERO
   751 case Nil => ZERO
   753 case r::Nil => fuse bs r
   752 case r::Nil => fuse bs r
   754 case rs => ALT bs rs
   753 case rs => ALT bs rs
       
   754 \]
   755 
   755 
   756 The simplification does a pattern matching on the regular expression. When it detected that
   756 The simplification does a pattern matching on the regular expression. When it detected that
   757 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   757 the regular expression is an alternative or sequence, it will try to simplify its children regular expressions
   758 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   758 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   759  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   759  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   760  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   760  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   761  Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
   761  Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
   762  Function flatten opens up nested ALT. Its recursive definition is given below:
   762  Function flatten opens up nested ALT. Its recursive definition is given below:
   763  \\
   763  \\
   764  flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
   764  \[flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
   765  flatten ZERO :: rss = flatten rss
   765  flatten ZERO :: rss = flatten rss
   766  flatten r::rss = r @ flatten rss if r is of any other shape
   766  flatten r::rss = r @ flatten rss if r is of any other shape
       
   767  \]
   767  
   768  
   768  Here flatten behaves like the traditional functional programming flatten function,
   769  Here flatten behaves like the traditional functional programming flatten function,
   769  what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
   770  what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
   770 
   771 
   771 If we apply simplification after each derivative step, we get an optimized version of the algorithm:
   772 If we apply simplification after each derivative step, we get an optimized version of the algorithm:
   772 \\TODO definition of blexer_simp
   773 \\TODO definition of $blexer_simp$
   773 
   774 
   774 This algorithm effectively keeps the regular expression size small, for example,
   775 This algorithm effectively keeps the regular expression size small, for example,
   775 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
   776 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
   776 
   777 
   777 
   778 
   812 \\definition of retrieve\\
   813 \\definition of retrieve\\
   813  This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
   814  This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
   814  Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
   815  Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
   815  inj r c v = decode retrieve (bder c (internalise r)) v
   816  inj r c v = decode retrieve (bder c (internalise r)) v
   816  Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
   817  Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
   817  Our purpose of using this, however, is try to establish 
   818  Our purpose of using this, however, is try to establish \\
   818  retrieve r v = retrieve simp(r) v'.
   819 $ retrieve \; r \; v \;=\; retrieve  \; simp(r) \; v'.$\\
   819  Although the r and v are different now,  we can still extract the bitsequence that gies the same parsing information.
   820  Although the r and v are different now,  we can still extract the bitsequence that gies the same parsing information.
   820  After establishing this, we might be able to finally bridge the gap of proving
   821  After establishing this, we might be able to finally bridge the gap of proving
   821  retrieve r \backslash s v = retrieve simp(r) \backslash s v'
   822  $retrieve \; r  \; \backslash  \;s \; v = \;retrieve \; simp(r) \; \backslash \; s \; v'$
   822  and subsequently
   823  and subsequently
   823  retrieve r \backslash s v = retrieve r \backslash{with simplification} s v'.
   824  $retrieve \; r \; \backslash \; s \; v\; = \; retrieve \; r \; \backslash{with simplification}  \; s \; v'$.
   824  This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
   825  This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
   825 
   826 
   826 The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
   827 The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
   827 So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.
   828 So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.
   828 
   829