thys2/Paper/document/root.tex
changeset 400 46e5566ad4ba
parent 398 dac6d27c99c6
child 401 8bbe2468fedc
equal deleted inserted replaced
399:d73f19be3ce6 400:46e5566ad4ba
    57 \begin{document}
    57 \begin{document}
    58 \maketitle
    58 \maketitle
    59 
    59 
    60 \begin{abstract}
    60 \begin{abstract}
    61   Sulzmann and Lu described a lexing algorithm that calculates
    61   Sulzmann and Lu described a lexing algorithm that calculates
    62   Brzozowski derivatives using bit-sequences annotated to regular
    62   Brzozowski derivatives using bitcodes annotated to regular
    63   expressions.  Their algorithm generates POSIX values which encode
    63   expressions.  Their algorithm generates POSIX values which encode
    64   the information of \emph{how} a regular expression matches a
    64   the information of \emph{how} a regular expression matches a
    65   string---that is, which part of the string is matched by which part
    65   string---that is, which part of the string is matched by which part
    66   of the regular expression.  The purpose of the bit-sequences in
    66   of the regular expression.  The purpose of the bitcodes in Sulzmann
    67   Sulzmann and Lu's algorithm is to keep the size of derivatives small
    67   and Lu's algorithm is to generate POSIX values incrementally while
    68   which is achieved by `aggressively' simplifying regular expressions.
    68   derivatives are calculated. However they also help with designing
    69   In this paper we describe a slight variant of Sulzmann and Lu's
    69   `aggressive' simplification methods that keep the size of
    70   algorithm and \textit{(i)} prove that this algorithm generates
    70   derivatives small. Without simplification derivatives can grow
    71   unique POSIX values; \textit{(ii)} we also establish a cubic bound
    71   exponentially resulting in an extremely slow lexing algorithm.  In this
    72   for the size of the derivatives---in earlier works, derivatives can
    72   paper we describe a variant of Sulzmann and Lu's algorithm: Our
    73   grow exponentially even after simplification.
    73   algorithm is a small, recursive functional program, whereas Sulzmann
       
    74   and Lu's version involves a fixpoint construction. We \textit{(i)}
       
    75   prove in Isabelle/HOL that our program is correct and generates
       
    76   unique POSIX values; we also \textit{(ii)} establish a polynomial
       
    77   bound for the size of the derivatives. The size can be seen as a
       
    78   proxy measure for the effeciency of the lexing algorithm---that means
       
    79   our algorithm does not suffer from the exponential blowup.
    74   
    80   
    75 %Brzozowski introduced the notion of derivatives for regular
    81   % Brzozowski introduced the notion of derivatives for regular
    76 %expressions. They can be used for a very simple regular expression
    82   % expressions. They can be used for a very simple regular expression
    77 %matching algorithm.  Sulzmann and Lu cleverly extended this algorithm
    83   % matching algorithm.  Sulzmann and Lu cleverly extended this
    78 %in order to deal with POSIX matching, which is the underlying
    84   % algorithm in order to deal with POSIX matching, which is the
    79 %disambiguation strategy for regular expressions needed in lexers.
    85   % underlying disambiguation strategy for regular expressions needed
    80 %Their algorithm generates POSIX values which encode the information of
    86   % in lexers.  Their algorithm generates POSIX values which encode
    81 %\emph{how} a regular expression matches a string---that is, which part
    87   % the information of \emph{how} a regular expression matches a
    82 %of the string is matched by which part of the regular expression.  In
    88   % string---that is, which part of the string is matched by which
    83 %this paper we give our inductive definition of what a POSIX value is
    89   % part of the regular expression.  In this paper we give our
    84 %and show $(i)$ that such a value is unique (for given regular
    90   % inductive definition of what a POSIX value is and show $(i)$ that
    85 %expression and string being matched) and $(ii)$ that Sulzmann and Lu's
    91   % such a value is unique (for given regular expression and string
    86 %algorithm always generates such a value (provided that the regular
    92   % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
    87 %expression matches the string). We show that $(iii)$ our inductive
    93   % generates such a value (provided that the regular expression
    88 %definition of a POSIX value is equivalent to an alternative definition
    94   % matches the string). We show that $(iii)$ our inductive definition
    89 %by Okui and Suzuki which identifies POSIX values as least elements
    95   % of a POSIX value is equivalent to an alternative definition by
    90 %according to an ordering of values.  We also prove the correctness of
    96   % Okui and Suzuki which identifies POSIX values as least elements
    91 %Sulzmann's bitcoded version of the POSIX matching algorithm and extend the
    97   % according to an ordering of values.  We also prove the correctness
    92 %results to additional constructors for regular expressions.  \smallskip
    98   % of Sulzmann's bitcoded version of the POSIX matching algorithm and
       
    99   % extend the results to additional constructors for regular
       
   100   % expressions.  \smallskip
    93 \end{abstract}
   101 \end{abstract}
    94 
   102 
    95 
   103 
    96 
   104 
    97 \input{session}
   105 \input{session}