--- a/thys2/Paper/document/root.tex Sun Jan 30 23:36:31 2022 +0000
+++ b/thys2/Paper/document/root.tex Sun Jan 30 23:37:29 2022 +0000
@@ -45,33 +45,59 @@
\titlerunning{POSIX Lexing with Bitcoded Derivatives}
\author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
\author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
+\authorrunning{C.~Tan and C.~Urban}
+\keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL}
+\category{}
+\ccsdesc[100]{Design and analysis of algorithms}
+\ccsdesc[100]{Formal languages and automata theory}
+\Copyright{\mbox{}}
+\nolinenumbers
\begin{document}
\maketitle
\begin{abstract}
-Brzozowski introduced the notion of derivatives for regular
-expressions. They can be used for a very simple regular expression
-matching algorithm. Sulzmann and Lu cleverly extended this algorithm
-in order to deal with POSIX matching, which is the underlying
-disambiguation strategy for regular expressions needed in lexers.
-Their algorithm generates POSIX values which encode the information of
-\emph{how} a regular expression matches a string---that is, which part
-of the string is matched by which part of the regular expression. In
-this paper we give our inductive definition of what a POSIX value is
-and show $(i)$ that such a value is unique (for given regular
-expression and string being matched) and $(ii)$ that Sulzmann and Lu's
-algorithm always generates such a value (provided that the regular
-expression matches the string). We show that $(iii)$ our inductive
-definition of a POSIX value is equivalent to an alternative definition
-by Okui and Suzuki which identifies POSIX values as least elements
-according to an ordering of values. We also prove the correctness of
-Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
-results to additional constructors for regular expressions. \smallskip
-
-{\bf Keywords:} POSIX matching, Derivatives of Regular Expressions,
-Isabelle/HOL
+ Sulzmann and Lu described a lexing algorithm that calculates
+ Brzozowski derivatives using bitcodes annotated to regular
+ expressions. Their algorithm generates POSIX values which encode
+ the information of \emph{how} a regular expression matches a
+ string---that is, which part of the string is matched by which part
+ of the regular expression. The purpose of the bitcodes is to generate POSIX values incrementally while
+ derivatives are calculated. They also help with designing
+ an `aggressive' simplification function that keeps the size of
+ derivatives small. Without simplification derivatives can grow
+ exponentially resulting in an extremely slow lexing algorithm. In this
+ paper we describe a variant of Sulzmann and Lu's algorithm: Our
+ algorithm is a recursive functional program, whereas Sulzmann
+ and Lu's version involves a fixpoint construction. We \textit{(i)}
+ prove in Isabelle/HOL that our program is correct and generates
+ unique POSIX values; we also \textit{(ii)} establish a polynomial
+ bound for the size of the derivatives. The size can be seen as a
+ proxy measure for the efficiency of the lexing algorithm: because of
+ the polynomial bound our algorithm does not suffer from
+ the exponential blowup in earlier works.
+
+ % Brzozowski introduced the notion of derivatives for regular
+ % expressions. They can be used for a very simple regular expression
+ % matching algorithm. Sulzmann and Lu cleverly extended this
+ % algorithm in order to deal with POSIX matching, which is the
+ % underlying disambiguation strategy for regular expressions needed
+ % in lexers. Their algorithm generates POSIX values which encode
+ % the information of \emph{how} a regular expression matches a
+ % string---that is, which part of the string is matched by which
+ % part of the regular expression. In this paper we give our
+ % inductive definition of what a POSIX value is and show $(i)$ that
+ % such a value is unique (for given regular expression and string
+ % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
+ % generates such a value (provided that the regular expression
+ % matches the string). We show that $(iii)$ our inductive definition
+ % of a POSIX value is equivalent to an alternative definition by
+ % Okui and Suzuki which identifies POSIX values as least elements
+ % according to an ordering of values. We also prove the correctness
+ % of Sulzmann's bitcoded version of the POSIX matching algorithm and
+ % extend the results to additional constructors for regular
+ % expressions. \smallskip
\end{abstract}