357 carried out in HOL-based theorem provers. Nipkow \cite{Nipkow98} establishes |
357 carried out in HOL-based theorem provers. Nipkow \cite{Nipkow98} establishes |
358 the link between regular expressions and automata in the context of |
358 the link between regular expressions and automata in the context of |
359 lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata |
359 lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata |
360 working over bit strings in the context of Presburger arithmetic. The only |
360 working over bit strings in the context of Presburger arithmetic. The only |
361 larger formalisations of automata theory are carried out in Nuprl |
361 larger formalisations of automata theory are carried out in Nuprl |
362 \cite{Constable00} and in Coq \cite{Filliatre97}. |
362 \cite{Constable00} and in Coq, e.g.~\cite{Filliatre97,Almeidaetal10}. |
363 |
363 |
364 Also one might consider automata theory and regular languages as a meticulously |
364 Also one might consider automata theory and regular languages as a meticulously |
365 researched subject where everything is crystal clear. However, paper proofs about |
365 researched subject where everything is crystal clear. However, paper proofs about |
366 automata often involve subtle side-conditions which are easily overlooked, |
366 automata often involve subtle side-conditions which are easily overlooked, |
367 but which make formal reasoning rather painful. For example Kozen's proof of |
367 but which make formal reasoning rather painful. For example Kozen's proof of |
472 the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}. |
472 the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}. |
473 Property @{text "(iv)"} states that a non-empty string in @{term "A\<star>"} can |
473 Property @{text "(iv)"} states that a non-empty string in @{term "A\<star>"} can |
474 always be split up into a non-empty prefix belonging to @{text "A"} and the |
474 always be split up into a non-empty prefix belonging to @{text "A"} and the |
475 rest being in @{term "A\<star>"}. We omit |
475 rest being in @{term "A\<star>"}. We omit |
476 the proofs for these properties, but invite the reader to consult our |
476 the proofs for these properties, but invite the reader to consult our |
477 formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/regexp.html}} |
477 formalisation.\footnote{Available in the Archive of Formal Proofs at |
|
478 \url{http://afp.sf.net/entries/Myhill-Nerode.shtml} |
|
479 \cite{myhillnerodeafp11}.} |
478 |
480 |
479 The notation in Isabelle/HOL for the quotient of a language @{text A} |
481 The notation in Isabelle/HOL for the quotient of a language @{text A} |
480 according to an equivalence relation @{term REL} is @{term "A // REL"}. We |
482 according to an equivalence relation @{term REL} is @{term "A // REL"}. We |
481 will write @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined as |
483 will write @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined as |
482 \mbox{@{text "{y | y \<approx> x}"}}, and have @{text "x \<approx> y"} if and only if @{text |
484 \mbox{@{text "{y | y \<approx> x}"}}, and have @{text "x \<approx> y"} if and only if @{text |
1757 Myhill-Nerode theorem, we have to be able to establish that for the |
1759 Myhill-Nerode theorem, we have to be able to establish that for the |
1758 corresponding language there are only finitely many derivatives---thus |
1760 corresponding language there are only finitely many derivatives---thus |
1759 ensuring that there are only finitely many equivalence |
1761 ensuring that there are only finitely many equivalence |
1760 classes. Unfortunately, this is not true in general. Sakarovitch gives an |
1762 classes. Unfortunately, this is not true in general. Sakarovitch gives an |
1761 example where a regular expression has infinitely many derivatives |
1763 example where a regular expression has infinitely many derivatives |
1762 w.r.t.~the language \mbox{@{term "({a} \<cdot> {b})\<star> \<union> ({a} \<cdot> {b})\<star> \<cdot> {a}"}} |
1764 w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally |
|
1765 written in our notation as \mbox{@{text "{[a,b]}\<^isup>\<star> \<union> ({[a,b]}\<^isup>\<star> \<cdot> {[a]})"}} |
1763 (see \cite[Page~141]{Sakarovitch09}). |
1766 (see \cite[Page~141]{Sakarovitch09}). |
1764 |
1767 |
1765 |
1768 |
1766 What Brzozowski \cite{Brzozowski64} established is that for every language there |
1769 What Brzozowski \cite{Brzozowski64} established is that for every language there |
1767 \emph{are} only finitely `dissimilar' derivatives for a regular |
1770 \emph{are} only finitely `dissimilar' derivatives for a regular |
2114 terms of already existing notions. A convenient definition for automata |
2117 terms of already existing notions. A convenient definition for automata |
2115 (based on graphs) uses a polymorphic type for the state nodes. This allows |
2118 (based on graphs) uses a polymorphic type for the state nodes. This allows |
2116 us to use the standard operation for disjoint union whenever we need to compose two |
2119 us to use the standard operation for disjoint union whenever we need to compose two |
2117 automata. Unfortunately, we cannot use such a polymorphic definition |
2120 automata. Unfortunately, we cannot use such a polymorphic definition |
2118 in HOL as part of the definition for regularity of a language (a predicate |
2121 in HOL as part of the definition for regularity of a language (a predicate |
2119 over set of strings). Consider the following attempt: |
2122 over set of strings). Consider for example the following attempt: |
2120 |
2123 |
2121 \begin{center} |
2124 \begin{center} |
2122 @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"} |
2125 @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"} |
2123 \end{center} |
2126 \end{center} |
2124 |
2127 |
2218 |
2221 |
2219 While our formalisation might appear large, it should be seen |
2222 While our formalisation might appear large, it should be seen |
2220 in the context of the work done by Constable at al \cite{Constable00} who |
2223 in the context of the work done by Constable at al \cite{Constable00} who |
2221 formalised the Myhill-Nerode theorem in Nuprl using automata. They write |
2224 formalised the Myhill-Nerode theorem in Nuprl using automata. They write |
2222 that their four-member team needed something on the magnitude of 18 months |
2225 that their four-member team needed something on the magnitude of 18 months |
2223 for their formalisation. The estimate for our formalisation is that we |
2226 for their formalisation. Also, Filli\^atre reports that his formalisation in |
|
2227 Coq of automata theory and Kleene's theorem is ``rather big''. |
|
2228 \cite{Filliatre97} More recently, Almeida et al reported about another |
|
2229 formalisation of regular languages in Coq \cite{Almeidaetal10}. Their |
|
2230 main result is the |
|
2231 correctness of Mirkin's construction of an automaton from a regular |
|
2232 expression using partial derivatives. This took approximately 10600 lines |
|
2233 of code. The estimate for our formalisation is that we |
2224 needed approximately 3 months and this included the time to find our proof |
2234 needed approximately 3 months and this included the time to find our proof |
2225 arguments. Unlike Constable et al, who were able to follow the proofs from |
2235 arguments. Unlike Constable et al, who were able to follow the Myhill-Nerode |
2226 \cite{HopcroftUllman69}, we had to find our own arguments. So for us the |
2236 proof from \cite{HopcroftUllman69}, we had to find our own arguments. So for us the |
2227 formalisation was not the bottleneck. It is hard to gauge the size of a |
2237 formalisation was not the bottleneck. It is hard to gauge the size of a |
2228 formalisation in Nurpl, but from what is shown in the Nuprl Math Library |
2238 formalisation in Nurpl, but from what is shown in the Nuprl Math Library |
2229 about their development it seems substantially larger than ours. The code of |
2239 about their development it seems substantially larger than ours. We attribute |
2230 ours can be found in the Mercurial Repository at |
2240 this to our use of regular expressions, which meant we did not need to `fight' |
2231 \mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.\medskip |
2241 the theorem prover. The code of |
|
2242 our formalisation can be found in the Archive of Formal Proofs at |
|
2243 \mbox{\url{http://afp.sf.net/entries/Myhill-Nerode.shtml}} \cite{myhillnerodeafp11}.\medskip |
2232 |
2244 |
2233 \noindent |
2245 \noindent |
2234 {\bf Acknowledgements:} |
2246 {\bf Acknowledgements:} |
2235 We are grateful for the comments we received from Larry |
2247 We are grateful for the comments we received from Larry |
2236 Paulson. |
2248 Paulson. |