ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 538 8016a2480704
parent 532 cc54ce075db5
child 539 7cf9f17aa179
equal deleted inserted replaced
537:50e590823220 538:8016a2480704
     6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
     6 \label{Bitcoded2} % Change X to a consecutive number; for referencing this chapter elsewhere, use \ref{ChapterX}
     7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
     7 %Then we illustrate how the algorithm without bitcodes falls short for such aggressive 
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     8 %simplifications and therefore introduce our version of the bitcoded algorithm and 
     9 %its correctness proof in 
     9 %its correctness proof in 
    10 %Chapter 3\ref{Chapter3}. 
    10 %Chapter 3\ref{Chapter3}. 
       
    11 
       
    12 
       
    13 
       
    14 Now we introduce the simplifications, which is why we introduce the 
       
    15 bitcodes in the first place.
       
    16 
       
    17 \subsection*{Simplification Rules}
       
    18 
       
    19 This section introduces aggressive (in terms of size) simplification rules
       
    20 on annotated regular expressions
       
    21 to keep derivatives small. Such simplifications are promising
       
    22 as we have
       
    23 generated test data that show
       
    24 that a good tight bound can be achieved. We could only
       
    25 partially cover the search space as there are infinitely many regular
       
    26 expressions and strings. 
       
    27 
       
    28 One modification we introduced is to allow a list of annotated regular
       
    29 expressions in the $\sum$ constructor. This allows us to not just
       
    30 delete unnecessary $\ZERO$s and $\ONE$s from regular expressions, but
       
    31 also unnecessary ``copies'' of regular expressions (very similar to
       
    32 simplifying $r + r$ to just $r$, but in a more general setting). Another
       
    33 modification is that we use simplification rules inspired by Antimirov's
       
    34 work on partial derivatives. They maintain the idea that only the first
       
    35 ``copy'' of a regular expression in an alternative contributes to the
       
    36 calculation of a POSIX value. All subsequent copies can be pruned away from
       
    37 the regular expression. A recursive definition of our  simplification function 
       
    38 that looks somewhat similar to our Scala code is given below:
       
    39 %\comment{Use $\ZERO$, $\ONE$ and so on. 
       
    40 %Is it $ALTS$ or $ALTS$?}\\
       
    41 
       
    42 \begin{center}
       
    43   \begin{tabular}{@{}lcl@{}}
       
    44    
       
    45   $\textit{simp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
       
    46    &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
       
    47    &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
       
    48    &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
       
    49    &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
       
    50    &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$ \\
       
    51 
       
    52   $\textit{simp} \; (_{bs}\sum \textit{as})$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map} \; simp \; as)) \; \textit{match} $ \\
       
    53   &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
       
    54    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
       
    55    &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
       
    56 
       
    57    $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
       
    58 \end{tabular}    
       
    59 \end{center}    
       
    60 
       
    61 \noindent
       
    62 The simplification does a pattern matching on the regular expression.
       
    63 When it detected that the regular expression is an alternative or
       
    64 sequence, it will try to simplify its child regular expressions
       
    65 recursively and then see if one of the children turns into $\ZERO$ or
       
    66 $\ONE$, which might trigger further simplification at the current level.
       
    67 The most involved part is the $\sum$ clause, where we use two
       
    68 auxiliary functions $\textit{flatten}$ and $\textit{distinct}$ to open up nested
       
    69 alternatives and reduce as many duplicates as possible. Function
       
    70 $\textit{distinct}$  keeps the first occurring copy only and removes all later ones
       
    71 when detected duplicates. Function $\textit{flatten}$ opens up nested $\sum$s.
       
    72 Its recursive definition is given below:
       
    73 
       
    74  \begin{center}
       
    75   \begin{tabular}{@{}lcl@{}}
       
    76   $\textit{flatten} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
       
    77      (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
       
    78   $\textit{flatten} \; \ZERO :: as'$ & $\dn$ & $ \textit{flatten} \;  \textit{as'} $ \\
       
    79     $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; \textit{as'}$ \quad(otherwise) 
       
    80 \end{tabular}    
       
    81 \end{center}  
       
    82 
       
    83 \noindent
       
    84 Here $\textit{flatten}$ behaves like the traditional functional programming flatten
       
    85 function, except that it also removes $\ZERO$s. Or in terms of regular expressions, it
       
    86 removes parentheses, for example changing $a+(b+c)$ into $a+b+c$.
       
    87 
       
    88 Having defined the $\simp$ function,
       
    89 we can use the previous notation of  natural
       
    90 extension from derivative w.r.t.~character to derivative
       
    91 w.r.t.~string:%\comment{simp in  the [] case?}
       
    92 
       
    93 \begin{center}
       
    94 \begin{tabular}{lcl}
       
    95 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
       
    96 $r \backslash_{simp} [\,] $ & $\dn$ & $r$
       
    97 \end{tabular}
       
    98 \end{center}
       
    99 
       
   100 \noindent
       
   101 to obtain an optimised version of the algorithm:
       
   102 
       
   103  \begin{center}
       
   104 \begin{tabular}{lcl}
       
   105   $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
       
   106       $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
       
   107   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
       
   108   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
       
   109   & & $\;\;\textit{else}\;\textit{None}$
       
   110 \end{tabular}
       
   111 \end{center}
       
   112 
       
   113 \noindent
       
   114 This algorithm keeps the regular expression size small, for example,
       
   115 with this simplification our previous $(a + aa)^*$ example's 8000 nodes
       
   116 will be reduced to just 17 and stays constant, no matter how long the
       
   117 input string is.
       
   118 
       
   119 
       
   120 
       
   121 
       
   122 
       
   123 
    11 
   124 
    12 
   125 
    13 
   126 
    14 
   127 
    15 %----------------------------------------------------------------------------------------
   128 %----------------------------------------------------------------------------------------