diff -r cc8e231529fb -r e1b74d618f1b thys2/Paper/document/root.tex --- a/thys2/Paper/document/root.tex Tue Jan 25 13:12:50 2022 +0000 +++ b/thys2/Paper/document/root.tex Thu Jan 27 23:25:26 2022 +0000 @@ -45,33 +45,50 @@ \titlerunning{POSIX Lexing with Bitcoded Derivatives} \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{} \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{} +\keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL} +\category{} +\ccsdesc[100]{Design and analysis of algorithms} +\ccsdesc[100]{Formal languages and automata theory} +\Copyright{\mbox{}} +\nolinenumbers \begin{document} \maketitle \begin{abstract} -Brzozowski introduced the notion of derivatives for regular -expressions. They can be used for a very simple regular expression -matching algorithm. Sulzmann and Lu cleverly extended this algorithm -in order to deal with POSIX matching, which is the underlying -disambiguation strategy for regular expressions needed in lexers. -Their algorithm generates POSIX values which encode the information of -\emph{how} a regular expression matches a string---that is, which part -of the string is matched by which part of the regular expression. In -this paper we give our inductive definition of what a POSIX value is -and show $(i)$ that such a value is unique (for given regular -expression and string being matched) and $(ii)$ that Sulzmann and Lu's -algorithm always generates such a value (provided that the regular -expression matches the string). We show that $(iii)$ our inductive -definition of a POSIX value is equivalent to an alternative definition -by Okui and Suzuki which identifies POSIX values as least elements -according to an ordering of values. We also prove the correctness of -Sulzmann's bitcoded version of the POSIX matching algorithm and extend the -results to additional constructors for regular expressions. \smallskip - -{\bf Keywords:} POSIX matching, Derivatives of Regular Expressions, -Isabelle/HOL + Sulzmann and Lu described a lexing algorithm that calculates + Brzozowski derivatives using bit-sequences annotated to regular + expressions. Their algorithm generates POSIX values which encode + the information of \emph{how} a regular expression matches a + string---that is, which part of the string is matched by which part + of the regular expression. The purpose of the bit-sequences in + Sulzmann and Lu's algorithm is to keep the size of derivatives small + which is achieved by `aggressively' simplifying regular expressions. + In this paper we describe a slight variant of Sulzmann and Lu's + algorithm and \textit{(i)} prove that this algorithm generates + unique POSIX values; \textit{(ii)} we also establish a cubic bound + for the size of the derivatives---in earlier works, derivatives can + grow exponentially even after simplification. + +%Brzozowski introduced the notion of derivatives for regular +%expressions. They can be used for a very simple regular expression +%matching algorithm. Sulzmann and Lu cleverly extended this algorithm +%in order to deal with POSIX matching, which is the underlying +%disambiguation strategy for regular expressions needed in lexers. +%Their algorithm generates POSIX values which encode the information of +%\emph{how} a regular expression matches a string---that is, which part +%of the string is matched by which part of the regular expression. In +%this paper we give our inductive definition of what a POSIX value is +%and show $(i)$ that such a value is unique (for given regular +%expression and string being matched) and $(ii)$ that Sulzmann and Lu's +%algorithm always generates such a value (provided that the regular +%expression matches the string). We show that $(iii)$ our inductive +%definition of a POSIX value is equivalent to an alternative definition +%by Okui and Suzuki which identifies POSIX values as least elements +%according to an ordering of values. We also prove the correctness of +%Sulzmann's bitcoded version of the POSIX matching algorithm and extend the +%results to additional constructors for regular expressions. \smallskip \end{abstract}