ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 584 1734bd5975a3
parent 583 4aabb0629e4b
child 585 4969ef817d92
equal deleted inserted replaced
583:4aabb0629e4b 584:1734bd5975a3
    97 		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
    97 		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
    98 		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
    98 		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
    99 	\end{tabular}
    99 	\end{tabular}
   100 \end{center}
   100 \end{center}
   101 \noindent
   101 \noindent
       
   102 They suggested that the $\simp$ function should be
       
   103 applied repeatedly until a fixpoint is reached.
       
   104 We implemented the algorithm as required in Scala, 
       
   105 and found that final derivative regular expression
       
   106 size grows exponentially fast:
       
   107 \begin{center}
       
   108 \begin{figure}[H]
       
   109 \begin{tikzpicture}
       
   110 \begin{axis}[
       
   111     xlabel={$n$},
       
   112     ylabel={time in secs},
       
   113     ymode = log,
       
   114     legend entries={Sulzmann and Lu's $\simp$},  
       
   115     legend pos=north west,
       
   116     legend cell align=left]
       
   117 \addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
       
   118 \end{axis}
       
   119 \end{tikzpicture} 
       
   120 \caption{Matching the regular expression $(a^*a^*)^*$ against strings of the form
       
   121 $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
       
   122 $ using Sulzmann and Lu's lexer algorithm}\label{SulzmannLuLexer}
       
   123 \end{figure}
       
   124 \end{center}
       
   125 \noindent
       
   126 At $n= 20$ we already get an out of memory error with Scala's normal 
       
   127 JVM heap size settings,
       
   128 which seems a counterexample for their linear complexity claim:
       
   129 \begin{quote}\it
       
   130 Linear-Time Complexity Claim It is easy to see that each call of one of the functions/operations\b,simp,fuse,mkEpsBCandisPhi leadstosubcallswhose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. We yet need to work out detailed estimates regarding the space complexity of our algorithm.
       
   131 \end{quote}  
       
   132 
       
   133 
       
   134 
   102 Despite that we have already implemented the simple-minded simplifications 
   135 Despite that we have already implemented the simple-minded simplifications 
   103 such as throwing away useless $\ONE$s and $\ZERO$s.
   136 such as throwing away useless $\ONE$s and $\ZERO$s.
   104 The simplification rule $r + r \rightarrow $ cannot make a difference either
   137 The simplification rule $r + r \rightarrow $ cannot make a difference either
   105 since it only removes duplicates on the same level, not something like 
   138 since it only removes duplicates on the same level, not something like 
   106 $(r+a)+r \rightarrow r+a$.
   139 $(r+a)+r \rightarrow r+a$.