1056 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
1056 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
1057 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
1057 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
1058 regular expression. |
1058 regular expression. |
1059 |
1059 |
1060 While regular expressions are convenient in formalisations, they have some |
1060 While regular expressions are convenient in formalisations, they have some |
1061 limitations. One is that there seems to be no notion of a minimal regular |
1061 limitations. One is that there seems to be no method of calculating a |
1062 expression, like there is for automata. For an implementation of a simple |
1062 minimal regular expression (for example in terms of length), like there is |
1063 regular expression matcher, whose correctness has been formally |
1063 for automata. For an implementation of a simple regular expression matcher, |
1064 established, we refer the reader to Owens and Slind \cite{OwensSlind08}. |
1064 whose correctness has been formally established, we refer the reader to |
|
1065 Owens and Slind \cite{OwensSlind08}. |
1065 |
1066 |
1066 |
1067 |
1067 Our formalisation consists of 790 lines of Isabelle/Isar code for the first |
1068 Our formalisation consists of 790 lines of Isabelle/Isar code for the first |
1068 direction and 475 for the second, plus standard material about regular languages. |
1069 direction and 475 for the second, plus 300 lines of standard material about |
1069 While this might be seen as too large to |
1070 regular languages. While this might be seen as too large to count as a |
1070 count as a concise proof pearl, this should be seen in the context of the |
1071 concise proof pearl, this should be seen in the context of the work done by |
1071 work done by Constable at al \cite{Constable00} who formalised the |
1072 Constable at al \cite{Constable00} who formalised the Myhill-Nerode theorem |
1072 Myhill-Nerode theorem in Nuprl using automata. |
1073 in Nuprl using automata. They write that their four-member team needed |
1073 They write that their |
1074 something on the magnitute of 18 months for their formalisation. The |
1074 four-member team needed something on the magnitute of 18 months for their |
1075 estimate for our formalisation is that we needed approximately 3 months and |
1075 formalisation. The estimate for our formalisation is that we |
1076 this included the time to find our proof arguments. Unlike Constable et al, |
1076 needed approximately 3 months and this included the time to find our proof |
1077 who were able to follow the proofs from \cite{HopcroftUllman69}, we had to |
1077 arguments. Unlike Constable et al, who were able to follow the proofs from |
1078 find our own arguments. So for us the formalisation was not the |
1078 \cite{HopcroftUllman69}, we had to find our own arguments. So for us the formalisation |
1079 bottleneck. It is hard to gauge the size of a formalisation in Nurpl, but |
1079 was not the bottleneck. It is hard to gauge the size of a formalisation in |
1080 from what is shown in the Nuprl Math Library about their development it |
1080 Nurpl, but from what is shown in the Nuprl Math Library about their |
1081 seems substantially larger than ours. The code of ours can be found in the |
1081 development it seems substantially larger than ours. The code of |
1082 Mercurial Repository at |
1082 ours can be found in the Mercurial Repository at |
|
1083 |
1083 |
1084 \begin{center} |
1084 \begin{center} |
1085 \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/} |
1085 \url{http://www4.in.tum.de/~urbanc/cgi-bin/repos.cgi/regexp/} |
1086 \end{center} |
1086 \end{center} |
1087 |
1087 |
1106 expressions and shows that there can be only finitely many of them. We could |
1106 expressions and shows that there can be only finitely many of them. We could |
1107 have used as the tag of a string @{text s} the derivative of a regular expression |
1107 have used as the tag of a string @{text s} the derivative of a regular expression |
1108 generated with respect to @{text s}. Using the fact that two strings are |
1108 generated with respect to @{text s}. Using the fact that two strings are |
1109 Myhill-Nerode related whenever their derivative is the same together with |
1109 Myhill-Nerode related whenever their derivative is the same together with |
1110 the fact that there are only finitely many derivatives for a regular |
1110 the fact that there are only finitely many derivatives for a regular |
1111 expression would give us the same argument as ours. However it seems not so easy to |
1111 expression would give us a similar argument as ours. However it seems not so easy to |
1112 calculate the derivatives and then to count them. Therefore we preferred our |
1112 calculate the derivatives and then to count them. Therefore we preferred our |
1113 direct method of using tagging-functions involving equivalence classes. This |
1113 direct method of using tagging-functions involving equivalence classes. This |
1114 is also where our method shines, because we can completely side-step the |
1114 is also where our method shines, because we can completely side-step the |
1115 standard argument \cite{Kozen97} where automata need to be composed, which |
1115 standard argument \cite{Kozen97} where automata need to be composed, which |
1116 as stated in the Introduction is not so convenient to formalise in a |
1116 as stated in the Introduction is not so convenient to formalise in a |