thys2/Paper/document/root.tex
changeset 397 e1b74d618f1b
parent 396 cc8e231529fb
child 398 dac6d27c99c6
equal deleted inserted replaced
396:cc8e231529fb 397:e1b74d618f1b
    43 
    43 
    44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
    45 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    45 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
    46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
    47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
    47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
       
    48 \keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL}
       
    49 \category{}
       
    50 \ccsdesc[100]{Design and analysis of algorithms}
       
    51 \ccsdesc[100]{Formal languages and automata theory}
       
    52 \Copyright{\mbox{}}
       
    53 \nolinenumbers
    48 
    54 
    49 
    55 
    50 \begin{document}
    56 \begin{document}
    51 \maketitle
    57 \maketitle
    52 
    58 
    53 \begin{abstract}
    59 \begin{abstract}
    54 Brzozowski introduced the notion of derivatives for regular
    60   Sulzmann and Lu described a lexing algorithm that calculates
    55 expressions. They can be used for a very simple regular expression
    61   Brzozowski derivatives using bit-sequences annotated to regular
    56 matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
    62   expressions.  Their algorithm generates POSIX values which encode
    57 in order to deal with POSIX matching, which is the underlying
    63   the information of \emph{how} a regular expression matches a
    58 disambiguation strategy for regular expressions needed in lexers.
    64   string---that is, which part of the string is matched by which part
    59 Their algorithm generates POSIX values which encode the information of
    65   of the regular expression.  The purpose of the bit-sequences in
    60 \emph{how} a regular expression matches a string---that is, which part
    66   Sulzmann and Lu's algorithm is to keep the size of derivatives small
    61 of the string is matched by which part of the regular expression.  In
    67   which is achieved by `aggressively' simplifying regular expressions.
    62 this paper we give our inductive definition of what a POSIX value is
    68   In this paper we describe a slight variant of Sulzmann and Lu's
    63 and show $(i)$ that such a value is unique (for given regular
    69   algorithm and \textit{(i)} prove that this algorithm generates
    64 expression and string being matched) and $(ii)$ that Sulzmann and Lu's
    70   unique POSIX values; \textit{(ii)} we also establish a cubic bound
    65 algorithm always generates such a value (provided that the regular
    71   for the size of the derivatives---in earlier works, derivatives can
    66 expression matches the string). We show that $(iii)$ our inductive
    72   grow exponentially even after simplification.
    67 definition of a POSIX value is equivalent to an alternative definition
    73   
    68 by Okui and Suzuki which identifies POSIX values as least elements
    74 %Brzozowski introduced the notion of derivatives for regular
    69 according to an ordering of values.  We also prove the correctness of
    75 %expressions. They can be used for a very simple regular expression
    70 Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
    76 %matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
    71 results to additional constructors for regular expressions.  \smallskip
    77 %in order to deal with POSIX matching, which is the underlying
    72 
    78 %disambiguation strategy for regular expressions needed in lexers.
    73 {\bf Keywords:} POSIX matching, Derivatives of Regular Expressions,
    79 %Their algorithm generates POSIX values which encode the information of
    74 Isabelle/HOL
    80 %\emph{how} a regular expression matches a string---that is, which part
       
    81 %of the string is matched by which part of the regular expression.  In
       
    82 %this paper we give our inductive definition of what a POSIX value is
       
    83 %and show $(i)$ that such a value is unique (for given regular
       
    84 %expression and string being matched) and $(ii)$ that Sulzmann and Lu's
       
    85 %algorithm always generates such a value (provided that the regular
       
    86 %expression matches the string). We show that $(iii)$ our inductive
       
    87 %definition of a POSIX value is equivalent to an alternative definition
       
    88 %by Okui and Suzuki which identifies POSIX values as least elements
       
    89 %according to an ordering of values.  We also prove the correctness of
       
    90 %Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
       
    91 %results to additional constructors for regular expressions.  \smallskip
    75 \end{abstract}
    92 \end{abstract}
    76 
    93 
    77 
    94 
    78 
    95 
    79 \input{session}
    96 \input{session}