32 |
32 |
33 section {* Introduction *} |
33 section {* Introduction *} |
34 |
34 |
35 text {* |
35 text {* |
36 |
36 |
37 Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em |
37 In the last fifteen or so years, Brzozowski's derivatives of regular |
38 derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\ |
38 expressions have sparked quite a bit of interest in the functional |
39 a character~\<open>c\<close>, and showed that it gave a simple solution to the |
39 programming and theorem prover communities. The beauty of |
40 problem of matching a string @{term s} with a regular expression @{term |
40 Brzozowski's derivatives \cite{Brzozowski1964} is that they are neatly |
41 r}: if the derivative of @{term r} w.r.t.\ (in succession) all the |
41 expressible in any functional language, and easily definable and |
42 characters of the string matches the empty string, then @{term r} |
42 reasoned about in theorem provers---the definitions just consist of |
43 matches @{term s} (and {\em vice versa}). The derivative has the |
43 inductive datatypes and simple recursive functions. A mechanised |
44 property (which may almost be regarded as its specification) that, for |
44 correctness proof of Brzozowski's matcher in for example HOL4 has been |
45 every string @{term s} and regular expression @{term r} and character |
45 mentioned by Owens and Slind~\cite{Owens2008}. Another one in |
46 @{term c}, one has @{term "cs \<in> L(r)"} if and only if \mbox{@{term "s \<in> L(der c r)"}}. |
46 Isabelle/HOL is part of the work by Krauss and Nipkow |
47 The beauty of Brzozowski's derivatives is that |
47 \cite{Krauss2011}. And another one in Coq is given by Coquand and |
48 they are neatly expressible in any functional language, and easily |
48 Siles \cite{Coquand2012}. |
49 definable and reasoned about in theorem provers---the definitions just |
49 |
50 consist of inductive datatypes and simple recursive functions. A |
50 |
51 mechanised correctness proof of Brzozowski's matcher in for example HOL4 |
51 The notion of derivatives |
52 has been mentioned by Owens and Slind~\cite{Owens2008}. Another one in |
52 \cite{Brzozowski1964}, written @{term "der c r"}, of a regular |
53 Isabelle/HOL is part of the work by Krauss and Nipkow \cite{Krauss2011}. |
53 expression give a simple solution to the problem of matching a string |
54 And another one in Coq is given by Coquand and Siles \cite{Coquand2012}. |
54 @{term s} with a regular expression @{term r}: if the derivative of |
|
55 @{term r} w.r.t.\ (in succession) all the characters of the string |
|
56 matches the empty string, then @{term r} matches @{term s} (and {\em |
|
57 vice versa}). The derivative has the property (which may almost be |
|
58 regarded as its specification) that, for every string @{term s} and |
|
59 regular expression @{term r} and character @{term c}, one has @{term |
|
60 "cs \<in> L(r)"} if and only if \mbox{@{term "s \<in> L(der c r)"}}. |
|
61 |
55 |
62 |
56 If a regular expression matches a string, then in general there is more |
63 If a regular expression matches a string, then in general there is more |
57 than one way of how the string is matched. There are two commonly used |
64 than one way of how the string is matched. There are two commonly used |
58 disambiguation strategies to generate a unique answer: one is called |
65 disambiguation strategies to generate a unique answer: one is called |
59 GREEDY matching \cite{Frisch2004} and the other is POSIX |
66 GREEDY matching \cite{Frisch2004} and the other is POSIX |