thys2/Paper/document/root.tex
changeset 400 46e5566ad4ba
parent 398 dac6d27c99c6
child 401 8bbe2468fedc
--- a/thys2/Paper/document/root.tex	Sat Jan 29 16:43:51 2022 +0000
+++ b/thys2/Paper/document/root.tex	Sat Jan 29 23:53:21 2022 +0000
@@ -59,37 +59,45 @@
 
 \begin{abstract}
   Sulzmann and Lu described a lexing algorithm that calculates
-  Brzozowski derivatives using bit-sequences annotated to regular
+  Brzozowski derivatives using bitcodes annotated to regular
   expressions.  Their algorithm generates POSIX values which encode
   the information of \emph{how} a regular expression matches a
   string---that is, which part of the string is matched by which part
-  of the regular expression.  The purpose of the bit-sequences in
-  Sulzmann and Lu's algorithm is to keep the size of derivatives small
-  which is achieved by `aggressively' simplifying regular expressions.
-  In this paper we describe a slight variant of Sulzmann and Lu's
-  algorithm and \textit{(i)} prove that this algorithm generates
-  unique POSIX values; \textit{(ii)} we also establish a cubic bound
-  for the size of the derivatives---in earlier works, derivatives can
-  grow exponentially even after simplification.
+  of the regular expression.  The purpose of the bitcodes in Sulzmann
+  and Lu's algorithm is to generate POSIX values incrementally while
+  derivatives are calculated. However they also help with designing
+  `aggressive' simplification methods that keep the size of
+  derivatives small. Without simplification derivatives can grow
+  exponentially resulting in an extremely slow lexing algorithm.  In this
+  paper we describe a variant of Sulzmann and Lu's algorithm: Our
+  algorithm is a small, recursive functional program, whereas Sulzmann
+  and Lu's version involves a fixpoint construction. We \textit{(i)}
+  prove in Isabelle/HOL that our program is correct and generates
+  unique POSIX values; we also \textit{(ii)} establish a polynomial
+  bound for the size of the derivatives. The size can be seen as a
+  proxy measure for the effeciency of the lexing algorithm---that means
+  our algorithm does not suffer from the exponential blowup.
   
-%Brzozowski introduced the notion of derivatives for regular
-%expressions. They can be used for a very simple regular expression
-%matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
-%in order to deal with POSIX matching, which is the underlying
-%disambiguation strategy for regular expressions needed in lexers.
-%Their algorithm generates POSIX values which encode the information of
-%\emph{how} a regular expression matches a string---that is, which part
-%of the string is matched by which part of the regular expression.  In
-%this paper we give our inductive definition of what a POSIX value is
-%and show $(i)$ that such a value is unique (for given regular
-%expression and string being matched) and $(ii)$ that Sulzmann and Lu's
-%algorithm always generates such a value (provided that the regular
-%expression matches the string). We show that $(iii)$ our inductive
-%definition of a POSIX value is equivalent to an alternative definition
-%by Okui and Suzuki which identifies POSIX values as least elements
-%according to an ordering of values.  We also prove the correctness of
-%Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
-%results to additional constructors for regular expressions.  \smallskip
+  % Brzozowski introduced the notion of derivatives for regular
+  % expressions. They can be used for a very simple regular expression
+  % matching algorithm.  Sulzmann and Lu cleverly extended this
+  % algorithm in order to deal with POSIX matching, which is the
+  % underlying disambiguation strategy for regular expressions needed
+  % in lexers.  Their algorithm generates POSIX values which encode
+  % the information of \emph{how} a regular expression matches a
+  % string---that is, which part of the string is matched by which
+  % part of the regular expression.  In this paper we give our
+  % inductive definition of what a POSIX value is and show $(i)$ that
+  % such a value is unique (for given regular expression and string
+  % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
+  % generates such a value (provided that the regular expression
+  % matches the string). We show that $(iii)$ our inductive definition
+  % of a POSIX value is equivalent to an alternative definition by
+  % Okui and Suzuki which identifies POSIX values as least elements
+  % according to an ordering of values.  We also prove the correctness
+  % of Sulzmann's bitcoded version of the POSIX matching algorithm and
+  % extend the results to additional constructors for regular
+  % expressions.  \smallskip
 \end{abstract}