diff -r fb23e3fd12e5 -r b7199d6c672d thys2/Paper/document/root.tex --- a/thys2/Paper/document/root.tex Tue Feb 08 01:25:26 2022 +0000 +++ b/thys2/Paper/document/root.tex Tue Feb 08 14:29:41 2022 +0000 @@ -63,7 +63,7 @@ \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{} \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{} \authorrunning{C.~Tan and C.~Urban} -\keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL} +\keywords{POSIX matching and lexing, Derivatives of Regular Expressions, Isabelle/HOL} \category{} \ccsdesc[100]{Design and analysis of algorithms} \ccsdesc[100]{Formal languages and automata theory} @@ -75,15 +75,17 @@ \maketitle \begin{abstract} - Sulzmann and Lu described a lexing algorithm that calculates + Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of \emph{how} a regular expression matches a string---that is, which part of the string is matched by which part - of the regular expression. The purpose of the bitcodes is to generate POSIX values incrementally while + of the regular expression. This information is needed in the + context of lexing in order to extract and to classify tokens. + The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an ``aggressive'' simplification function that keeps the size of - derivatives finite. Without simplification the size derivatives can grow + derivatives finite. Without simplification the size of some derivatives can grow arbitrarily big resulting in an extremely slow lexing algorithm. In this paper we describe a variant of Sulzmann and Lu's algorithm: Our algorithm is a recursive functional program, whereas Sulzmann