43 |
43 |
44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives} |
44 \title{{POSIX} {L}exing with {B}itcoded {D}erivatives} |
45 \titlerunning{POSIX Lexing with Bitcoded Derivatives} |
45 \titlerunning{POSIX Lexing with Bitcoded Derivatives} |
46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{} |
46 \author{Chengsong Tan}{King's College London}{chengsong.tan@kcl.ac.uk}{}{} |
47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{} |
47 \author{Christian Urban}{King's College London}{christian.urban@kcl.ac.uk}{}{} |
|
48 \keywords{POSIX matching, Derivatives of Regular Expressions, Isabelle/HOL} |
|
49 \category{} |
|
50 \ccsdesc[100]{Design and analysis of algorithms} |
|
51 \ccsdesc[100]{Formal languages and automata theory} |
|
52 \Copyright{\mbox{}} |
|
53 \nolinenumbers |
48 |
54 |
49 |
55 |
50 \begin{document} |
56 \begin{document} |
51 \maketitle |
57 \maketitle |
52 |
58 |
53 \begin{abstract} |
59 \begin{abstract} |
54 Brzozowski introduced the notion of derivatives for regular |
60 Sulzmann and Lu described a lexing algorithm that calculates |
55 expressions. They can be used for a very simple regular expression |
61 Brzozowski derivatives using bit-sequences annotated to regular |
56 matching algorithm. Sulzmann and Lu cleverly extended this algorithm |
62 expressions. Their algorithm generates POSIX values which encode |
57 in order to deal with POSIX matching, which is the underlying |
63 the information of \emph{how} a regular expression matches a |
58 disambiguation strategy for regular expressions needed in lexers. |
64 string---that is, which part of the string is matched by which part |
59 Their algorithm generates POSIX values which encode the information of |
65 of the regular expression. The purpose of the bit-sequences in |
60 \emph{how} a regular expression matches a string---that is, which part |
66 Sulzmann and Lu's algorithm is to keep the size of derivatives small |
61 of the string is matched by which part of the regular expression. In |
67 which is achieved by `aggressively' simplifying regular expressions. |
62 this paper we give our inductive definition of what a POSIX value is |
68 In this paper we describe a slight variant of Sulzmann and Lu's |
63 and show $(i)$ that such a value is unique (for given regular |
69 algorithm and \textit{(i)} prove that this algorithm generates |
64 expression and string being matched) and $(ii)$ that Sulzmann and Lu's |
70 unique POSIX values; \textit{(ii)} we also establish a cubic bound |
65 algorithm always generates such a value (provided that the regular |
71 for the size of the derivatives---in earlier works, derivatives can |
66 expression matches the string). We show that $(iii)$ our inductive |
72 grow exponentially even after simplification. |
67 definition of a POSIX value is equivalent to an alternative definition |
73 |
68 by Okui and Suzuki which identifies POSIX values as least elements |
74 %Brzozowski introduced the notion of derivatives for regular |
69 according to an ordering of values. We also prove the correctness of |
75 %expressions. They can be used for a very simple regular expression |
70 Sulzmann's bitcoded version of the POSIX matching algorithm and extend the |
76 %matching algorithm. Sulzmann and Lu cleverly extended this algorithm |
71 results to additional constructors for regular expressions. \smallskip |
77 %in order to deal with POSIX matching, which is the underlying |
72 |
78 %disambiguation strategy for regular expressions needed in lexers. |
73 {\bf Keywords:} POSIX matching, Derivatives of Regular Expressions, |
79 %Their algorithm generates POSIX values which encode the information of |
74 Isabelle/HOL |
80 %\emph{how} a regular expression matches a string---that is, which part |
|
81 %of the string is matched by which part of the regular expression. In |
|
82 %this paper we give our inductive definition of what a POSIX value is |
|
83 %and show $(i)$ that such a value is unique (for given regular |
|
84 %expression and string being matched) and $(ii)$ that Sulzmann and Lu's |
|
85 %algorithm always generates such a value (provided that the regular |
|
86 %expression matches the string). We show that $(iii)$ our inductive |
|
87 %definition of a POSIX value is equivalent to an alternative definition |
|
88 %by Okui and Suzuki which identifies POSIX values as least elements |
|
89 %according to an ordering of values. We also prove the correctness of |
|
90 %Sulzmann's bitcoded version of the POSIX matching algorithm and extend the |
|
91 %results to additional constructors for regular expressions. \smallskip |
75 \end{abstract} |
92 \end{abstract} |
76 |
93 |
77 |
94 |
78 |
95 |
79 \input{session} |
96 \input{session} |