lexing: comparison thys2/Paper/document/root.tex

-:d73f19be3ce6
+:46e5566ad4ba
 \begin{document}
 \maketitle
 \begin{abstract}
 Sulzmann and Lu described a lexing algorithm that calculates
-Brzozowski derivatives using bit-sequences annotated to regular
+Brzozowski derivatives using bitcodes annotated to regular
 expressions.  Their algorithm generates POSIX values which encode
 the information of \emph{how} a regular expression matches a
 string---that is, which part of the string is matched by which part
-of the regular expression.  The purpose of the bit-sequences in
-Sulzmann and Lu's algorithm is to keep the size of derivatives small
-which is achieved by `aggressively' simplifying regular expressions.
-In this paper we describe a slight variant of Sulzmann and Lu's
-algorithm and \textit{(i)} prove that this algorithm generates
-unique POSIX values; \textit{(ii)} we also establish a cubic bound
-for the size of the derivatives---in earlier works, derivatives can
-grow exponentially even after simplification.
+of the regular expression.  The purpose of the bitcodes in Sulzmann
+and Lu's algorithm is to generate POSIX values incrementally while
+derivatives are calculated. However, they also help with designing
+`aggressive' simplification methods that keep the size of
+derivatives small. Without simplification, derivatives can grow
+exponentially, resulting in an extremely slow lexing algorithm.  In
+this paper we describe a variant of Sulzmann and Lu's algorithm: our
+algorithm is a small, recursive functional program, whereas Sulzmann
+and Lu's version involves a fixpoint construction. We \textit{(i)}
+prove in Isabelle/HOL that our program is correct and generates
+unique POSIX values; we also \textit{(ii)} establish a polynomial
+bound for the size of the derivatives. Since the size can be seen as
+a proxy measure for the efficiency of the lexing algorithm, this bound
+means that our algorithm does not suffer from the exponential blowup.
-%Brzozowski introduced the notion of derivatives for regular
-%expressions. They can be used for a very simple regular expression
-%matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
-%in order to deal with POSIX matching, which is the underlying
-%disambiguation strategy for regular expressions needed in lexers.
-%Their algorithm generates POSIX values which encode the information of
-%\emph{how} a regular expression matches a string---that is, which part
-%of the string is matched by which part of the regular expression.  In
-%this paper we give our inductive definition of what a POSIX value is
-%and show $(i)$ that such a value is unique (for given regular
-%expression and string being matched) and $(ii)$ that Sulzmann and Lu's
-%algorithm always generates such a value (provided that the regular
-%expression matches the string). We show that $(iii)$ our inductive
-%definition of a POSIX value is equivalent to an alternative definition
-%by Okui and Suzuki which identifies POSIX values as least elements
-%according to an ordering of values.  We also prove the correctness of
-%Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
-%results to additional constructors for regular expressions.  \smallskip
+% Brzozowski introduced the notion of derivatives for regular
+% expressions. They can be used for a very simple regular expression
+% matching algorithm.  Sulzmann and Lu cleverly extended this
+% algorithm in order to deal with POSIX matching, which is the
+% underlying disambiguation strategy for regular expressions needed
+% in lexers.  Their algorithm generates POSIX values which encode
+% the information of \emph{how} a regular expression matches a
+% string---that is, which part of the string is matched by which
+% part of the regular expression.  In this paper we give our
+% inductive definition of what a POSIX value is and show $(i)$ that
+% such a value is unique (for given regular expression and string
+% being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
+% generates such a value (provided that the regular expression
+% matches the string). We show that $(iii)$ our inductive definition
+% of a POSIX value is equivalent to an alternative definition by
+% Okui and Suzuki which identifies POSIX values as least elements
+% according to an ordering of values.  We also prove the correctness
+% of Sulzmann's bitcoded version of the POSIX matching algorithm and
+% extend the results to additional constructors for regular
+% expressions.  \smallskip
 \end{abstract}
 \input{session}

changeset 400	46e5566ad4ba
parent 398	dac6d27c99c6
child 401	8bbe2468fedc
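
The abstract relies on Brzozowski derivatives and on simplification keeping their size small. The following minimal Python sketch illustrates that idea, under clearly stated assumptions. It uses plain derivatives only, with no bitcodes and no POSIX values, so it is not the algorithm described in the paper. The names Zero, One, Char, Alt, Seq, Star, der, simp, size and ders are hypothetical and chosen for illustration. The simplification rules in simp are also an assumption, not the rules of the Isabelle/HOL formalisation. They drop Zero alternatives, flatten nested alternatives, remove duplicate alternatives, and rewrite 0.r, r.0, 1.r and r.1.

```python
# Hypothetical sketch: plain Brzozowski derivatives (no bitcodes, no POSIX
# values) with an assumed flatten-and-deduplicate simplification.
from dataclasses import dataclass
from functools import reduce
from typing import Union

@dataclass(frozen=True)
class Zero:  # matches no string
    pass

@dataclass(frozen=True)
class One:  # matches only the empty string
    pass

@dataclass(frozen=True)
class Char:  # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt:  # r1 + r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:  # r1 followed by r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Star:  # zero or more repetitions of r
    r: "Rexp"

Rexp = Union[Zero, One, Char, Alt, Seq, Star]

def nullable(r: Rexp) -> bool:
    """Can r match the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Char

def der(c: str, r: Rexp) -> Rexp:
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def alts(r: Rexp) -> list:
    """Flatten nested top-level alternatives into a list."""
    return alts(r.r1) + alts(r.r2) if isinstance(r, Alt) else [r]

def simp(r: Rexp) -> Rexp:
    """Assumed rules: drop Zero alternatives, flatten and de-duplicate
    alternatives, and rewrite 0.r, r.0 to 0 and 1.r, r.1 to r."""
    if isinstance(r, Alt):
        parts = []
        for p in alts(Alt(simp(r.r1), simp(r.r2))):
            if not isinstance(p, Zero) and p not in parts:
                parts.append(p)
        return reduce(Alt, parts) if parts else Zero()
    if isinstance(r, Seq):
        a, b = simp(r.r1), simp(r.r2)
        if isinstance(a, Zero) or isinstance(b, Zero):
            return Zero()
        if isinstance(a, One):
            return b
        return a if isinstance(b, One) else Seq(a, b)
    return r  # Zero, One, Char and Star are left unchanged

def size(r: Rexp) -> int:
    """Number of nodes in the syntax tree of r."""
    if isinstance(r, (Alt, Seq)):
        return 1 + size(r.r1) + size(r.r2)
    return 1 + size(r.r) if isinstance(r, Star) else 1

def ders(s: str, r: Rexp, simplify: bool) -> Rexp:
    """Derivative of r with respect to the string s, optionally simplified."""
    for c in s:
        r = der(c, r)
        if simplify:
            r = simp(r)
    return r

def matches(r: Rexp, s: str, simplify: bool = True) -> bool:
    return nullable(ders(s, r, simplify))
```

As an example, take r = Star(Alt(Char("a"), Seq(Char("a"), Char("a")))), that is (a + aa)*. Here matches(r, "aaaa") is True and matches(r, "ab") is False. With simplification, the derivative of r with respect to n copies of "a" stays at size 17 for every n of at least 2. Without simplification, the derivatives never shrink and are already larger after two characters. This illustrates the size measure on a single input only. It does not reproduce the polynomial bound proved in the paper.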