lexing: comparison thys2/Paper/document/root.tex

equal deleted inserted replaced

-:6291181fad07
+:1500f96707b0
 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
+\authorrunning{C.~Tan and C.~Urban}
+\keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL}
+\category{}
+\ccsdesc[100]{Design and analysis of algorithms}
+\ccsdesc[100]{Formal languages and automata theory}
+\Copyright{\mbox{}}
+\nolinenumbers
 \begin{document}
 \maketitle
 \begin{abstract}
-Brzozowski introduced the notion of derivatives for regular
+Sulzmann and Lu described a lexing algorithm that calculates
-expressions. They can be used for a very simple regular expression
+Brzozowski derivatives using bitcodes annotated to regular
-matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
+expressions.  Their algorithm generates POSIX values which encode
-in order to deal with POSIX matching, which is the underlying
+the information of \emph{how} a regular expression matches a
-disambiguation strategy for regular expressions needed in lexers.
+string---that is, which part of the string is matched by which part
-Their algorithm generates POSIX values which encode the information of
+of the regular expression.  The purpose of the bitcodes is to generate POSIX values incrementally while
-\emph{how} a regular expression matches a string---that is, which part
+derivatives are calculated. They also help with designing
-of the string is matched by which part of the regular expression.  In
+an `aggressive' simplification function that keeps the size of
-this paper we give our inductive definition of what a POSIX value is
+derivatives small. Without simplification derivatives can grow
-and show $(i)$ that such a value is unique (for given regular
+exponentially resulting in an extremely slow lexing algorithm.  In this
-expression and string being matched) and $(ii)$ that Sulzmann and Lu's
+paper we describe a variant of Sulzmann and Lu's algorithm: Our
-algorithm always generates such a value (provided that the regular
+algorithm is a recursive functional program, whereas Sulzmann
-expression matches the string). We show that $(iii)$ our inductive
+and Lu's version involves a fixpoint construction. We \textit{(i)}
-definition of a POSIX value is equivalent to an alternative definition
+prove in Isabelle/HOL that our program is correct and generates
-by Okui and Suzuki which identifies POSIX values as least elements
+unique POSIX values; we also \textit{(ii)} establish a polynomial
-according to an ordering of values.  We also prove the correctness of
+bound for the size of the derivatives. The size can be seen as a
-Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
+proxy measure for the efficiency of the lexing algorithm: because of
-results to additional constructors for regular expressions.  \smallskip
+the polynomial bound our algorithm does not suffer from
+the exponential blowup in earlier works.
-{\bf Keywords:} POSIX matching, Derivatives of Regular Expressions,
-Isabelle/HOL
+% Brzozowski introduced the notion of derivatives for regular
+% expressions. They can be used for a very simple regular expression
+% matching algorithm.  Sulzmann and Lu cleverly extended this
+% algorithm in order to deal with POSIX matching, which is the
+% underlying disambiguation strategy for regular expressions needed
+% in lexers.  Their algorithm generates POSIX values which encode
+% the information of \emph{how} a regular expression matches a
+% string---that is, which part of the string is matched by which
+% part of the regular expression.  In this paper we give our
+% inductive definition of what a POSIX value is and show $(i)$ that
+% such a value is unique (for given regular expression and string
+% being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
+% generates such a value (provided that the regular expression
+% matches the string). We show that $(iii)$ our inductive definition
+% of a POSIX value is equivalent to an alternative definition by
+% Okui and Suzuki which identifies POSIX values as least elements
+% according to an ordering of values.  We also prove the correctness
+% of Sulzmann's bitcoded version of the POSIX matching algorithm and
+% extend the results to additional constructors for regular
+% expressions.  \smallskip
 \end{abstract}
 \input{session}

changeset 404	1500f96707b0
parent 402	1612f2a77bf6
child 405	3cfea5bb5e23