thys2/Paper/document/root.tex
changeset 404 1500f96707b0
parent 402 1612f2a77bf6
child 405 3cfea5bb5e23
equal deleted inserted replaced
403:6291181fad07 404:1500f96707b0
    43 
    43 
    44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    45 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    45 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
    47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
       
    48 \authorrunning{C.~Tan and C.~Urban}
       
    49 \keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL}
       
    50 \category{}
       
    51 \ccsdesc[100]{Design and analysis of algorithms}
       
    52 \ccsdesc[100]{Formal languages and automata theory}
       
    53 \Copyright{\mbox{}}
       
    54 \nolinenumbers
    48 
    55 
    49 
    56 
    50 \begin{document}
    57 \begin{document}
    51 \maketitle
    58 \maketitle
    52 
    59 
    53 \begin{abstract}
    60 \begin{abstract}
    54 Brzozowski introduced the notion of derivatives for regular
    61   Sulzmann and Lu described a lexing algorithm that calculates
    55 expressions. They can be used for a very simple regular expression
    62   Brzozowski derivatives using bitcodes annotated to regular
    56 matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
    63   expressions.  Their algorithm generates POSIX values which encode
    57 in order to deal with POSIX matching, which is the underlying
    64   the information of \emph{how} a regular expression matches a
    58 disambiguation strategy for regular expressions needed in lexers.
    65   string---that is, which part of the string is matched by which part
    59 Their algorithm generates POSIX values which encode the information of
    66   of the regular expression.  The purpose of the bitcodes is to generate POSIX values incrementally while
    60 \emph{how} a regular expression matches a string---that is, which part
    67   derivatives are calculated. They also help with designing
    61 of the string is matched by which part of the regular expression.  In
    68   an `aggressive' simplification function that keeps the size of
    62 this paper we give our inductive definition of what a POSIX value is
    69   derivatives small. Without simplification derivatives can grow
    63 and show $(i)$ that such a value is unique (for given regular
    70   exponentially resulting in an extremely slow lexing algorithm.  In this
    64 expression and string being matched) and $(ii)$ that Sulzmann and Lu's
    71   paper we describe a variant of Sulzmann and Lu's algorithm: Our
    65 algorithm always generates such a value (provided that the regular
    72   algorithm is a recursive functional program, whereas Sulzmann
    66 expression matches the string). We show that $(iii)$ our inductive
    73   and Lu's version involves a fixpoint construction. We \textit{(i)}
    67 definition of a POSIX value is equivalent to an alternative definition
    74   prove in Isabelle/HOL that our program is correct and generates
    68 by Okui and Suzuki which identifies POSIX values as least elements
    75   unique POSIX values; we also \textit{(ii)} establish a polynomial
    69 according to an ordering of values.  We also prove the correctness of
    76   bound for the size of the derivatives. The size can be seen as a
    70 Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
    77   proxy measure for the efficiency of the lexing algorithm: because of
    71 results to additional constructors for regular expressions.  \smallskip
    78   the polynomial bound our algorithm does not suffer from
    72 
    79   the exponential blowup in earlier works.
    73 {\bf Keywords:} POSIX matching, Derivatives of Regular Expressions,
    80   
    74 Isabelle/HOL
    81   % Brzozowski introduced the notion of derivatives for regular
       
    82   % expressions. They can be used for a very simple regular expression
       
    83   % matching algorithm.  Sulzmann and Lu cleverly extended this
       
    84   % algorithm in order to deal with POSIX matching, which is the
       
    85   % underlying disambiguation strategy for regular expressions needed
       
    86   % in lexers.  Their algorithm generates POSIX values which encode
       
    87   % the information of \emph{how} a regular expression matches a
       
    88   % string---that is, which part of the string is matched by which
       
    89   % part of the regular expression.  In this paper we give our
       
    90   % inductive definition of what a POSIX value is and show $(i)$ that
       
    91   % such a value is unique (for given regular expression and string
       
    92   % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
       
    93   % generates such a value (provided that the regular expression
       
    94   % matches the string). We show that $(iii)$ our inductive definition
       
    95   % of a POSIX value is equivalent to an alternative definition by
       
    96   % Okui and Suzuki which identifies POSIX values as least elements
       
    97   % according to an ordering of values.  We also prove the correctness
       
    98   % of Sulzmann's bitcoded version of the POSIX matching algorithm and
       
    99   % extend the results to additional constructors for regular
       
   100   % expressions.  \smallskip
    75 \end{abstract}
   101 \end{abstract}
    76 
   102 
    77 
   103 
    78 
   104 
    79 \input{session}
   105 \input{session}