57 \begin{document} |
57 \begin{document} |
58 \maketitle |
58 \maketitle |
59 |
59 |
60 \begin{abstract} |
60 \begin{abstract} |
61 Sulzmann and Lu described a lexing algorithm that calculates |
61 Sulzmann and Lu described a lexing algorithm that calculates |
62 Brzozowski derivatives using bit-sequences annotated to regular |
62 Brzozowski derivatives using bitcodes annotated to regular |
63 expressions. Their algorithm generates POSIX values which encode |
63 expressions. Their algorithm generates POSIX values which encode |
64 the information of \emph{how} a regular expression matches a |
64 the information of \emph{how} a regular expression matches a |
65 string---that is, which part of the string is matched by which part |
65 string---that is, which part of the string is matched by which part |
66 of the regular expression. The purpose of the bit-sequences in |
66 of the regular expression. The purpose of the bitcodes in Sulzmann |
67 Sulzmann and Lu's algorithm is to keep the size of derivatives small |
67 and Lu's algorithm is to generate POSIX values incrementally while |
68 which is achieved by `aggressively' simplifying regular expressions. |
68 derivatives are calculated. However they also help with designing |
69 In this paper we describe a slight variant of Sulzmann and Lu's |
69 `aggressive' simplification methods that keep the size of |
70 algorithm and \textit{(i)} prove that this algorithm generates |
70 derivatives small. Without simplification derivatives can grow |
71 unique POSIX values; \textit{(ii)} we also establish a cubic bound |
71 exponentially resulting in an extremely slow lexing algorithm. In this |
72 for the size of the derivatives---in earlier works, derivatives can |
72 paper we describe a variant of Sulzmann and Lu's algorithm: Our |
73 grow exponentially even after simplification. |
73 algorithm is a small, recursive functional program, whereas Sulzmann |
|
74 and Lu's version involves a fixpoint construction. We \textit{(i)} |
|
75 prove in Isabelle/HOL that our program is correct and generates |
|
76 unique POSIX values; we also \textit{(ii)} establish a polynomial |
|
77 bound for the size of the derivatives. The size can be seen as a |
|
78 proxy measure for the effeciency of the lexing algorithm---that means |
|
79 our algorithm does not suffer from the exponential blowup. |
74 |
80 |
75 %Brzozowski introduced the notion of derivatives for regular |
81 % Brzozowski introduced the notion of derivatives for regular |
76 %expressions. They can be used for a very simple regular expression |
82 % expressions. They can be used for a very simple regular expression |
77 %matching algorithm. Sulzmann and Lu cleverly extended this algorithm |
83 % matching algorithm. Sulzmann and Lu cleverly extended this |
78 %in order to deal with POSIX matching, which is the underlying |
84 % algorithm in order to deal with POSIX matching, which is the |
79 %disambiguation strategy for regular expressions needed in lexers. |
85 % underlying disambiguation strategy for regular expressions needed |
80 %Their algorithm generates POSIX values which encode the information of |
86 % in lexers. Their algorithm generates POSIX values which encode |
81 %\emph{how} a regular expression matches a string---that is, which part |
87 % the information of \emph{how} a regular expression matches a |
82 %of the string is matched by which part of the regular expression. In |
88 % string---that is, which part of the string is matched by which |
83 %this paper we give our inductive definition of what a POSIX value is |
89 % part of the regular expression. In this paper we give our |
84 %and show $(i)$ that such a value is unique (for given regular |
90 % inductive definition of what a POSIX value is and show $(i)$ that |
85 %expression and string being matched) and $(ii)$ that Sulzmann and Lu's |
91 % such a value is unique (for given regular expression and string |
86 %algorithm always generates such a value (provided that the regular |
92 % being matched) and $(ii)$ that Sulzmann and Lu's algorithm always |
87 %expression matches the string). We show that $(iii)$ our inductive |
93 % generates such a value (provided that the regular expression |
88 %definition of a POSIX value is equivalent to an alternative definition |
94 % matches the string). We show that $(iii)$ our inductive definition |
89 %by Okui and Suzuki which identifies POSIX values as least elements |
95 % of a POSIX value is equivalent to an alternative definition by |
90 %according to an ordering of values. We also prove the correctness of |
96 % Okui and Suzuki which identifies POSIX values as least elements |
91 %Sulzmann's bitcoded version of the POSIX matching algorithm and extend the |
97 % according to an ordering of values. We also prove the correctness |
92 %results to additional constructors for regular expressions. \smallskip |
98 % of Sulzmann's bitcoded version of the POSIX matching algorithm and |
|
99 % extend the results to additional constructors for regular |
|
100 % expressions. \smallskip |
93 \end{abstract} |
101 \end{abstract} |
94 |
102 |
95 |
103 |
96 |
104 |
97 \input{session} |
105 \input{session} |