64 the information of \emph{how} a regular expression matches a |
64 the information of \emph{how} a regular expression matches a |
65 string---that is, which part of the string is matched by which part |
65 string---that is, which part of the string is matched by which part |
66 of the regular expression. The purpose of the bitcodes in Sulzmann |
66 of the regular expression. The purpose of the bitcodes in Sulzmann |
67 and Lu's algorithm is to generate POSIX values incrementally while |
67 and Lu's algorithm is to generate POSIX values incrementally while |
68 derivatives are calculated. However they also help with designing |
68 derivatives are calculated. However they also help with designing |
69 `aggressive' simplification methods that keep the size of |
69 `aggressive' simplification functions that keep the size of |
70 derivatives small. Without simplification derivatives can grow |
70 derivatives small. Without simplification derivatives can grow |
71 exponentially resulting in an extremely slow lexing algorithm. In this |
71 exponentially resulting in an extremely slow lexing algorithm. In this |
72 paper we describe a variant of Sulzmann and Lu's algorithm: Our |
72 paper we describe a variant of Sulzmann and Lu's algorithm: Our |
73 algorithm is a small, recursive functional program, whereas Sulzmann |
73 algorithm is a small, recursive functional program, whereas Sulzmann |
74 and Lu's version involves a fixpoint construction. We \textit{(i)} |
74 and Lu's version involves a fixpoint construction. We \textit{(i)} |
75 prove in Isabelle/HOL that our program is correct and generates |
75 prove in Isabelle/HOL that our program is correct and generates |
76 unique POSIX values; we also \textit{(ii)} establish a polynomial |
76 unique POSIX values; we also \textit{(ii)} establish a polynomial |
77 bound for the size of the derivatives. The size can be seen as a |
77 bound for the size of the derivatives. The size can be seen as a |
78 proxy measure for the effeciency of the lexing algorithm---that means |
78 proxy measure for the efficiency of the lexing algorithm---that means |
79 our algorithm does not suffer from the exponential blowup. |
79 because of the polynomial bound our algorithm does not suffer from |
|
80 the exponential blowup in earlier works. |
80 |
81 |
81 % Brzozowski introduced the notion of derivatives for regular |
82 % Brzozowski introduced the notion of derivatives for regular |
82 % expressions. They can be used for a very simple regular expression |
83 % expressions. They can be used for a very simple regular expression |
83 % matching algorithm. Sulzmann and Lu cleverly extended this |
84 % matching algorithm. Sulzmann and Lu cleverly extended this |
84 % algorithm in order to deal with POSIX matching, which is the |
85 % algorithm in order to deal with POSIX matching, which is the |