thys2/Paper/document/root.tex
changeset 423 b7199d6c672d
parent 420 b66a4305749c
child 424 2416fdec6396
equal deleted inserted replaced
422:fb23e3fd12e5 423:b7199d6c672d
    61 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    61 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    62 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    62 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    63 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    63 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    64 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
    64 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
    65 \authorrunning{C.~Tan and C.~Urban}
    65 \authorrunning{C.~Tan and C.~Urban}
    66 \keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL}
    66 \keywords{POSIX matching and lexing, Derivatives of Regular Expressions, Isabelle/HOL}
    67 \category{}
    67 \category{}
    68 \ccsdesc[100]{Design and analysis of algorithms}
    68 \ccsdesc[100]{Design and analysis of algorithms}
    69 \ccsdesc[100]{Formal languages and automata theory}
    69 \ccsdesc[100]{Formal languages and automata theory}
    70 \Copyright{\mbox{}}
    70 \Copyright{\mbox{}}
    71 \nolinenumbers
    71 \nolinenumbers
    73 
    73 
    74 \begin{document}
    74 \begin{document}
    75 \maketitle
    75 \maketitle
    76 
    76 
    77 \begin{abstract}
    77 \begin{abstract}
    78   Sulzmann and Lu described a lexing algorithm that calculates
    78   Sulzmann and Lu describe a lexing algorithm that calculates
    79   Brzozowski derivatives using bitcodes annotated to regular
    79   Brzozowski derivatives using bitcodes annotated to regular
    80   expressions.  Their algorithm generates POSIX values which encode
    80   expressions.  Their algorithm generates POSIX values which encode
    81   the information of \emph{how} a regular expression matches a
    81   the information of \emph{how} a regular expression matches a
    82   string---that is, which part of the string is matched by which part
    82   string---that is, which part of the string is matched by which part
    83   of the regular expression.  The purpose of the bitcodes is to generate POSIX values incrementally while
    83   of the regular expression. This information is needed in the
       
    84   context of lexing in order to extract and to classify tokens.
       
    85   The purpose of the bitcodes is to generate POSIX values incrementally while
    84   derivatives are calculated. They also help with designing
    86   derivatives are calculated. They also help with designing
    85   an ``aggressive'' simplification function that keeps the size of
    87   an ``aggressive'' simplification function that keeps the size of
    86   derivatives finite. Without simplification the size derivatives can grow
    88   derivatives finite. Without simplification the size of some derivatives can grow
    87   arbitrarily big resulting in an extremely slow lexing algorithm.  In this
    89   arbitrarily big resulting in an extremely slow lexing algorithm.  In this
    88   paper we describe a variant of Sulzmann and Lu's algorithm: Our
    90   paper we describe a variant of Sulzmann and Lu's algorithm: Our
    89   algorithm is a recursive functional program, whereas Sulzmann
    91   algorithm is a recursive functional program, whereas Sulzmann
    90   and Lu's version involves a fixpoint construction. We \textit{(i)}
    92   and Lu's version involves a fixpoint construction. We \textit{(i)}
    91   prove in Isabelle/HOL that our algorithm is correct and generates
    93   prove in Isabelle/HOL that our algorithm is correct and generates