81 the information of \emph{how} a regular expression matches a |
81 the information of \emph{how} a regular expression matches a |
82 string---that is, which part of the string is matched by which part |
82 string---that is, which part of the string is matched by which part |
83 of the regular expression. The purpose of the bitcodes is to generate POSIX values incrementally while |
83 of the regular expression. The purpose of the bitcodes is to generate POSIX values incrementally while |
84 derivatives are calculated. They also help with designing |
84 derivatives are calculated. They also help with designing |
85 an ``aggressive'' simplification function that keeps the size of |
85 an ``aggressive'' simplification function that keeps the size of |
86 derivatives small. Without simplification the size derivatives can grow |
86 derivatives finite. Without simplification the size derivatives can grow |
87 exponentially resulting in an extremely slow lexing algorithm. In this |
87 arbitrarily big resulting in an extremely slow lexing algorithm. In this |
88 paper we describe a variant of Sulzmann and Lu's algorithm: Our |
88 paper we describe a variant of Sulzmann and Lu's algorithm: Our |
89 algorithm is a recursive functional program, whereas Sulzmann |
89 algorithm is a recursive functional program, whereas Sulzmann |
90 and Lu's version involves a fixpoint construction. We \textit{(i)} |
90 and Lu's version involves a fixpoint construction. We \textit{(i)} |
91 prove in Isabelle/HOL that our program is correct and generates |
91 prove in Isabelle/HOL that our program is correct and generates |
92 unique POSIX values; we also \textit{(ii)} establish a polynomial |
92 unique POSIX values; we also \textit{(ii)} establish a finite |
93 bound for the size of the derivatives. The size can be seen as a |
93 bound for the size of the derivatives. |
94 proxy measure for the efficiency of the lexing algorithm: because of |
94 |
95 the polynomial bound our algorithm does not suffer from |
95 %The size can be seen as a |
96 the exponential blowup in earlier works. |
96 %proxy measure for the efficiency of the lexing algorithm: because of |
|
97 %the polynomial bound our algorithm does not suffer from |
|
98 %the exponential blowup in earlier works. |
97 |
99 |
98 % Brzozowski introduced the notion of derivatives for regular |
100 % Brzozowski introduced the notion of derivatives for regular |
99 % expressions. They can be used for a very simple regular expression |
101 % expressions. They can be used for a very simple regular expression |
100 % matching algorithm. Sulzmann and Lu cleverly extended this |
102 % matching algorithm. Sulzmann and Lu cleverly extended this |
101 % algorithm in order to deal with POSIX matching, which is the |
103 % algorithm in order to deal with POSIX matching, which is the |