thys3/document/root.tex
changeset 496 f493a20feeb3
child 499 6a100d32314c
equal deleted inserted replaced
495:f9cdc295ccf7 496:f493a20feeb3
       
     1 \documentclass[runningheads]{lipics-v2021}
       
     2 \usepackage{times}
       
     3 \usepackage{isabelle}
       
     4 \usepackage{isabellesym}
       
     5 \usepackage{amsmath}
       
     6 \usepackage{amssymb}
       
     7 \usepackage{mathpartir}
       
     8 \usepackage{tikz}
       
     9 \usepackage{pgf}
       
    10 \usetikzlibrary{positioning}
       
    11 %\usepackage{pdfsetup}
       
    12 \usepackage{stmaryrd}
       
    13 %\usepackage{url}
       
    14 %\usepackage{color}
       
    15 %\usepackage[safe]{tipa}
       
    16 %\usepackage[sc]{mathpazo}
       
    17 %\usepackage{fontspec}
       
    18 %\setmainfont[Ligatures=TeX]{Palatino Linotype}
       
    19 
       
    20 
       
    21 
       
    22 
       
    23 \urlstyle{rm}
       
    24 \isabellestyle{it}
       
    25 \renewcommand{\isastyleminor}{\it}% 
       
    26 \renewcommand{\isastyle}{\normalsize\it}%
       
    27 
       
    28 
       
    29 \def\dn{\,\stackrel{\mbox{\scriptsize def}}{=}\,}
       
    30 \renewcommand{\isasymequiv}{$\dn$}
       
    31 \renewcommand{\isasymemptyset}{$\varnothing$}
       
    32 \renewcommand{\isacharunderscore}{\mbox{$\_\!\_$}}
       
    33 \renewcommand{\isasymiota}{\makebox[0mm]{${}^{\prime}$}}
       
    34 \renewcommand{\isasymin}{\ensuremath{\,\in\,}}
       
    35 
       
    36 \addtolength{\oddsidemargin}{-1.5mm}
       
    37 \addtolength{\evensidemargin}{-1.5mm}
       
    38 \addtolength{\textwidth}{4mm}
       
    39 \addtolength{\textheight}{1.5mm}
       
    40 
       
    41 \def\lexer{\mathit{lexer}}
       
    42 \def\mkeps{\mathit{mkeps}}
       
    43 \def\inj{\mathit{inj}}
       
    44 \def\Empty{\mathit{Empty}}
       
    45 \def\Left{\mathit{Left}}
       
    46 \def\Right{\mathit{Right}}
       
    47 \def\Stars{\mathit{Stars}}
       
    48 \def\Char{\mathit{Char}}
       
    49 \def\Seq{\mathit{Seq}}
       
    50 \def\Der{\mathit{Der}}
       
    51 \def\nullable{\mathit{nullable}}
       
    52 \def\Z{\mathit{Z}}
       
    53 \def\S{\mathit{S}}
       
    54 \newcommand{\ZERO}{\mbox{\bf 0}}
       
    55 \newcommand{\ONE}{\mbox{\bf 1}}
       
    56 \def\rs{\mathit{rs}}
       
    57 
       
    58 \def\Brz{Brzozowski}
       
    59 \def\der{\backslash}
       
    60 \newtheorem{falsehood}{Falsehood}
       
    61 \newtheorem{conject}{Conjecture}
       
    62 
       
    63 \bibliographystyle{plainurl}
       
    64 
       
    65 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives}
       
    66 \titlerunning{POSIX Lexing with Bitcoded Derivatives}
       
    67 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{}
       
    68 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{}
       
    69 \authorrunning{C.~Tan and C.~Urban}
       
    70 \keywords{POSIX matching and lexing, derivatives of regular expressions, Isabelle/HOL}
       
    71 \category{}
       
    72 \ccsdesc[100]{Design and analysis of algorithms}
       
    73 \ccsdesc[100]{Formal languages and automata theory}
       
    74 \Copyright{\mbox{}}
       
    75 \renewcommand{\DOIPrefix}{}
       
    76 \nolinenumbers
       
    77 
       
    78 
       
    79 \begin{document}
       
    80 \maketitle
       
    81 
       
    82 \begin{abstract}
       
    83   Sulzmann and Lu describe a lexing algorithm that calculates
       
    84   Brzozowski derivatives using bitcodes annotated to regular
       
    85   expressions.  Their algorithm generates POSIX values which encode
       
    86   the information of \emph{how} a regular expression matches a
       
    87   string---that is, which part of the string is matched by which part
       
    88   of the regular expression. This information is needed in the
       
    89   context of lexing in order to extract and to classify tokens.
       
    90   The purpose of the bitcodes is to generate POSIX values incrementally while
       
    91   derivatives are calculated. They also help with designing
       
    92   an ``aggressive'' simplification function that keeps the size of
       
    93   derivatives finitely bounded. Without simplification the size of some derivatives can grow
       
    94   arbitrarily big, resulting in an extremely slow lexing algorithm.  In this
       
    95   paper we describe a variant of Sulzmann and Lu's algorithm: Our
       
    96   variant is a recursive functional program, whereas Sulzmann
       
    97   and Lu's version involves a fixpoint construction. We \textit{(i)}
       
    98   prove in Isabelle/HOL that our algorithm is correct and generates
       
    99   unique POSIX values; we also \textit{(ii)} establish finite
       
   100   bounds for the size of the derivatives.
       
   101 
       
   102   %The size can be seen as a
       
   103   %proxy measure for the efficiency of the lexing algorithm: because of
       
   104   %the polynomial bound our algorithm does not suffer from
       
   105   %the exponential blowup in earlier works.
       
   106   
       
   107   % Brzozowski introduced the notion of derivatives for regular
       
   108   % expressions. They can be used for a very simple regular expression
       
   109   % matching algorithm.  Sulzmann and Lu cleverly extended this
       
   110   % algorithm in order to deal with POSIX matching, which is the
       
   111   % underlying disambiguation strategy for regular expressions needed
       
   112   % in lexers.  Their algorithm generates POSIX values which encode
       
   113   % the information of \emph{how} a regular expression matches a
       
   114   % string---that is, which part of the string is matched by which
       
   115   % part of the regular expression.  In this paper we give our
       
   116   % inductive definition of what a POSIX value is and show $(i)$ that
       
   117   % such a value is unique (for given regular expression and string
       
   118   % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always
       
   119   % generates such a value (provided that the regular expression
       
   120   % matches the string). We show that $(iii)$ our inductive definition
       
   121   % of a POSIX value is equivalent to an alternative definition by
       
   122   % Okui and Suzuki which identifies POSIX values as least elements
       
   123   % according to an ordering of values.  We also prove the correctness
       
   124   % of Sulzmann's bitcoded version of the POSIX matching algorithm and
       
   125   % extend the results to additional constructors for regular
       
   126   % expressions.  \smallskip
       
   127 \end{abstract}
       
   128 
       
   129 
       
   130 
       
   131 \input{session}
       
   132 
       
   133 
       
   134 
       
   135 \end{document}
       
   136 
       
   137 %%% Local Variables:
       
   138 %%% mode: latex
       
   139 %%% TeX-master: t
       
   140 %%% End: