ninems/ninems.tex
changeset 53 3ec403f650a8
parent 52 25bbbb8b0e90
child 54 4bfd28722884
equal deleted inserted replaced
52:25bbbb8b0e90 53:3ec403f650a8
   619 \end{tabular}    
   619 \end{tabular}    
   620 \end{center}    
   620 \end{center}    
   621 %\end{definition}
   621 %\end{definition}
   622 
   622 
   623 
   623 
   624 Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regular expression\emph{annotated regular expressions}~\cite{Sulzmann2014}. They are
   624 Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regular expression\cite{Sulzmann2014}. They are
   625 defined by the following grammar:
   625 defined by the following grammar:
   626 
   626 
   627 \begin{center}
   627 \begin{center}
   628 \begin{tabular}{lcl}
   628 \begin{tabular}{lcl}
   629   $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
   629   $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
   734   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
   734   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
   735   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
   735   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
   736   & & $\;\;\textit{else}\;\textit{None}$
   736   & & $\;\;\textit{else}\;\textit{None}$
   737 \end{tabular}
   737 \end{tabular}
   738 \end{center}
   738 \end{center}
       
   739 Here $(r^\uparrow)\backslash s$ is similar to what we have previously defined for 
       
   740 $r\backslash s$.
   739 
   741 
   740 The main point of the bitsequences and annotated regular expressions
   742 The main point of the bitsequences and annotated regular expressions
   741 is that we can apply rather aggressive (in terms of size)
   743 is that we can apply rather aggressive (in terms of size)
   742 simplification rules in order to keep derivatives small.  
   744 simplification rules in order to keep derivatives small.  
   743 
   745 
   782 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   784 recursively and then see if one of the children turn into 0 or 1, which might trigger further simplification
   783  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   785  at the current level. The most involved part is the ALTS clause, where we use two auxiliary functions 
   784  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   786  flatten and distinct to open up nested ALT and reduce as many duplicates as possible.
   785  Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
   787  Function distinct  keeps the first occurring copy only and remove all later ones when detected duplicates.
   786  Function flatten opens up nested ALT. Its recursive definition is given below:
   788  Function flatten opens up nested ALT. Its recursive definition is given below:
   787  \\
   789  \begin{center}
   788  \[flatten ALT bs rs :: rss = (map fuse( bs,_)  rs )@ flatten rss
   790   \begin{tabular}{@{}lcl@{}}
   789  flatten ZERO :: rss = flatten rss
   791   $\textit{flatten} \; (\textit{ALT}\;bs\,as) :: as'$ & $\dn$ & $(\textit{ map fuse}( \textit{bs, \_} )  \textit{ as}) \; +\!+ \; \textit{flatten} \; as' $ \\
   790  flatten r::rss = r @ flatten rss if r is of any other shape
   792   $\textit{flatten} \; \textit{ZERO} :: as'$ & $\dn$ & $ \textit{flatten} \;  as' $ \\
   791  \]
   793     $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as' $ 
   792  
   794 \end{tabular}    
       
   795 \end{center}  
       
   796 
   793  Here flatten behaves like the traditional functional programming flatten function,
   797  Here flatten behaves like the traditional functional programming flatten function,
   794  what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
   798  what it does is basically removing parentheses like changing $a+(b+c)$ into $a+b+c$.
   795 
   799 
   796 If we apply simplification after each derivative step, we get an optimized version of the algorithm:
   800 Suppose we apply simplification after each derivative step,
   797 \\TODO definition of $blexer_simp$
   801 and view these two operations as an atomic one: $a \backslash_{simp} c \dn \textit{simp}(a \backslash c)$.
       
   802 Then we can use the previous natural extension from derivative w.r.t   character to derivative w.r.t string:
       
   803 
       
   804 \begin{center}
       
   805 \begin{tabular}{lcl}
       
   806 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp} c) \backslash_{simp} s$ \\
       
   807 $r \backslash [\,] $ & $\dn$ & $r$
       
   808 \end{tabular}
       
   809 \end{center}
       
   810 
       
   811  we get an optimized version of the algorithm:
       
   812 \begin{center}
       
   813 \begin{tabular}{lcl}
       
   814   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       
   815       $\textit{let}\;a = (r^\uparrow)\backslash_{simp} s\;\textit{in}$\\                
       
   816   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
       
   817   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
       
   818   & & $\;\;\textit{else}\;\textit{None}$
       
   819 \end{tabular}
       
   820 \end{center}
   798 
   821 
   799 This algorithm effectively keeps the regular expression size small, for example,
   822 This algorithm effectively keeps the regular expression size small, for example,
   800 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
   823 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
   801 
   824 
   802 
   825 
   835 The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification.
   858 The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification.
   836 To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu:
   859 To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu:
   837 \\definition of retrieve\\
   860 \\definition of retrieve\\
   838  This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
   861  This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
   839  Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
   862  Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
   840  inj r c v = decode retrieve (bder c (internalise r)) v
   863  $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\; ((\textit{internalise}\; r)\backslash_{simp} c) v)$\\
       
   864  A little fact that needs to be stated to help comprehension:\\
       
   865  $r^\uparrow = a$($a$ stands for $annotated$).\\
   841  Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
   866  Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
   842  Our purpose of using this, however, is try to establish \\
   867  Our purpose of using this, however, is try to establish \\
   843 $ retrieve \; r \; v \;=\; retrieve  \; simp(r) \; v'.$\\
   868 $ \textit{retrieve} \; a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\
   844  Although the r and v are different now,  we can still extract the bitsequence that gies the same parsing information.
   869  The idea is that using $v'$,
   845  After establishing this, we might be able to finally bridge the gap of proving
   870   a simplified version of $v$ that possibly had gone through the same simplification step as $\textit{simp}(a)$ we are still  able to extract the bitsequence that gievs the same parsing information as the unsimplified one.
   846  $retrieve \; r  \; \backslash  \;s \; v = \;retrieve \; simp(r) \; \backslash \; s \; v'$
   871  After establishing this, we might be able to finally bridge the gap of proving\\
   847  and subsequently
   872  $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \; \textit{simp}(r)  \backslash  s \; v'$\\
   848  $retrieve \; r \; \backslash \; s \; v\; = \; retrieve \; r \; \backslash{with simplification}  \; s \; v'$.
   873  and subsequently\\
       
   874  $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \; r  \backslash_{simp}   s \; v'$.\\
   849  This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
   875  This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
   850 
   876 
   851 The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
   877 The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
   852 So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.
   878 So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.
   853 
   879