lexing: comparison ChengsongTanPhdThesis/Chapters/Bitcoded2.tex

equal deleted inserted replaced

-:4aabb0629e4b
+:1734bd5975a3
 		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
 		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
 	\end{tabular}
 \end{center}
 \noindent
+They suggested that the $\simp$ function should be
+applied repeatedly until a fixpoint is reached.
+We implemented the algorithm as required in Scala,
+and found that final derivative regular expression
+size grows exponentially fast:
+\begin{center}
+\begin{figure}[H]
+\begin{tikzpicture}
+\begin{axis}[
+xlabel={$n$},
+ylabel={time in secs},
+ymode = log,
+legend entries={Sulzmann and Lu's $\simp$},
+legend pos=north west,
+legend cell align=left]
+\addplot[red,mark=*, mark options={fill=white}] table {SulzmannLuLexer.data};
+\end{axis}
+\end{tikzpicture}
+\caption{Matching the regular expression $(a^*a^*)^*$ against strings of the form
+$\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
+$ using Sulzmann and Lu's lexer algorithm}\label{SulzmannLuLexer}
+\end{figure}
+\end{center}
+\noindent
+At $n= 20$ we already get an out of memory error with Scala's normal
+JVM heap size settings,
+which seems a counterexample for their linear complexity claim:
+\begin{quote}\it
+Linear-Time Complexity Claim It is easy to see that each call of one of the functions/operations\b,simp,fuse,mkEpsBCandisPhi leadstosubcallswhose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. We yet need to work out detailed estimates regarding the space complexity of our algorithm.
+\end{quote}
 Despite that we have already implemented the simple-minded simplifications
 such as throwing away useless $\ONE$s and $\ZERO$s.
 The simplification rule $r + r \rightarrow $ cannot make a difference either
 since it only removes duplicates on the same level, not something like
 $(r+a)+r \rightarrow r+a$.

changeset 584	1734bd5975a3
parent 583	4aabb0629e4b
child 585	4969ef817d92