diff -r 48b231495281 -r 2c56b20032a7 Journal/Paper.thy --- a/Journal/Paper.thy Mon Oct 15 13:23:52 2012 +0000 +++ b/Journal/Paper.thy Mon Dec 03 08:16:58 2012 +0000 @@ -337,8 +337,11 @@ then be careful to rename these identities apart whenever connecting two automata. This results in clunky proofs establishing that properties are invariant under renaming. Similarly, connecting two automata represented as - matrices results in very adhoc constructions, which are not pleasant to - reason about. + matrices results in messy constructions, which are not pleasant to + formally reason about. \citeN[Page 67]{Braibant12}, for example, writes that there are no + problems with reasoning about matrices, but that there is an + ``intrinsic difficulty of working with rectangular matrices'' in some + parts of his formalisation of formal languages in Coq. Functions are much better supported in Isabelle/HOL, but they still lead to similar problems as with graphs. Composing, for example, two @@ -376,15 +379,23 @@ %\emph{locales} or \emph{type classes}, which are \emph{not} available in all %HOL-based theorem provers. - Because of these problems to do with representing automata, there seems to - be no substantial formalisation of automata theory and regular languages - carried out in HOL-based theorem provers. \citeN{Nipkow98} establishes + Because of these problems to do with representing automata, formalising + automata theory is surprisingly not as easy as one might think, despite the + sometimes very detailed, but informal, textbook proofs. \citeN{LammichTuerk12} + formalised Hopcroft's algorithm using an automata library of 27 kloc in + Isabelle/HOL. There they use matrices for representing automata. + Functions are used by \citeN{Nipkow98}, who establishes the link between regular expressions and automata in the context of - lexing. \citeN{BerghoferReiter09} formalise automata - working over bit strings in the context of Presburger arithmetic. The only - larger formalisations of automata theory are carried out in Nuprl by - \citeN{Constable00} and in Coq by \citeN{Filliatre97}, \citeN{Almeidaetal10} - and \citeN{Braibant12}. + lexing. \citeN{BerghoferReiter09} use functions as well for formalising automata + working over bit strings in the context of Presburger arithmetic. + A Larger formalisation of automata theory, including the Myhill-Nerode theorem, + was carried out in Nuprl by \citeN{Constable00} which also uses functions. + Other large formailsations of automata theory in Coq are by + \citeN{Filliatre97} who essentially uses graphs and by \citeN{Almeidaetal10} + and \citeN{Braibant12}, who both use matrices. Many of these works, + like \citeN{Nipkow98} or \citeN{Braibant12}, mention intrinsic problems with + their representation of + automata which made them `fight' their respective theorem prover. %Also, one might consider automata as just convenient `vehicles' for %establishing properties about regular languages. However, paper proofs @@ -428,7 +439,8 @@ This convenience of regular expressions has recently been exploited in HOL4 with a formalisation of regular expression matching based on derivatives by \citeN{OwensSlind08}, and with an equivalence - checker for regular expressions in Isabelle/HOL by \citeN{KraussNipkow11}. The + checker for regular expressions in Isabelle/HOL by \citeN{KraussNipkow11} + and in Matida by \citeN{Asperti12} and in Coq by \citeN{CoquandSiles12}. The main purpose of this paper is to show that a central result about regular languages---the Myhill-Nerode Theorem---can be recreated by only using regular expressions. This theorem gives necessary and sufficient conditions @@ -1196,7 +1208,8 @@ \noindent Much more interesting, however, are the inductive cases. They seem hard to be solved - directly. The reader is invited to try. + directly. The reader is invited to try.\footnote{The induction hypothesis is not strong enough + to make any progress with the TIME and STAR cases.} In order to see how our proof proceeds consider the following suggestive picture given by \citeN{Constable00}: @@ -2080,7 +2093,8 @@ \noindent and assume @{text B} is any language and @{text A} is regular, then @{term "Deriv_lang B A"} is regular. To see this consider the following argument - using partial derivatives: From @{text A} being regular we know there exists + using partial derivatives (which we used in Section~\ref{derivatives}): From @{text A} + being regular we know there exists a regular expression @{text r} such that @{term "A = lang r"}. We also know that @{term "pderivs_lang B r"} is finite for every language @{text B} and regular expression @{text r} (recall Theorem~\ref{antimirov}). By definition @@ -2253,8 +2267,8 @@ \begin{proof}[of the First Part of Theorem~\ref{subseqreg}] By the second part, we know the right-hand side of \eqref{compl} is regular, which means @{term "- SUBSEQ A"} is regular. But since - we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} - must be regular. + we established already that regularity is preserved under complement (using Myhill-Nerode), + also @{term "SUBSEQ A"} must be regular. \end{proof} Finally we like to show that the Myhill-Nerode Theorem is also convenient for establishing @@ -2418,6 +2432,8 @@ algorithm is still executable. Given the infrastructure for executable sets introduced by \citeN{Haftmann09} in Isabelle/HOL, it should. + We started out by claiming that in a theorem prover it is eaiser to reason + about regular expressions than about automta. Here are some numbers: Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of Isabelle/Isar code for the first direction and 460 for the second (the one based on tagging-functions), plus around 300 lines of standard material @@ -2430,7 +2446,7 @@ The algorithm for solving equational systems, which we used in the first direction, is conceptually relatively simple. Still the use of sets over which the algorithm operates means it is not as easy to - formalise as one might hope. However, it seems sets cannot be avoided since + formalise as one might wish. However, it seems sets cannot be avoided since the `input' of the algorithm consists of equivalence classes and we cannot see how to reformulate the theory so that we can use lists or matrices. Lists would be much easier to reason about, since we can define functions over them by @@ -2445,29 +2461,35 @@ for their formalisation of the first eleven chapters of the textbook by \citeN{HopcroftUllman69}, which includes the Myhill-Nerode theorem. It is hard to gauge the size of a formalisation in Nurpl, but from what is shown in the Nuprl Math Library - about their development it seems substantially larger than ours. We attribute + about their development it seems \emph{substantially} larger than ours. We attribute this to our use of regular expressions, which meant we did not need to `fight' - the theorem prover. + the theorem prover. Recently, \citeN{LammichTuerk12} formalised Hopcroft's + algorithm in Isabelle/HOL (in 7000 lines of code) using an automata + library of 27000 lines of code. Also, \citeN{Filliatre97} reports that his formalisation in Coq of automata theory and Kleene's theorem is ``rather big''. - More recently, \citeN{Almeidaetal10} reported about another + \citeN{Almeidaetal10} reported about another formalisation of regular languages in Coq. Their main result is the correctness of Mirkin's construction of an automaton from a regular expression using partial derivatives. This took approximately 10600 lines - of code. Also \citeN{Braibant12} formalised a large part of regular language + of code. \citeN{Braibant12} formalised a large part of regular language theory and Kleene algebras in Coq. While he is mainly interested in implementing decision procedures for Kleene algebras, his library includes a proof of the Myhill-Nerode theorem. He reckons that our ``development is more concise'' than his one based on matrices \cite[Page 67]{Braibant12}. + He writes that there is no conceptual problems with formally reasoning + about matrices for automata, but notes ``intrinsic difficult[ies]'' when working + with matrices in Coq, which is the sort of `fighting' one would encounter also + in other theorem provers. + In terms of time, the estimate for our formalisation is that we needed approximately 3 months and this included the time to find our proof arguments. Unlike \citeN{Constable00}, who were able to follow the Myhill-Nerode proof by \citeN{HopcroftUllman69}, we had to find our own arguments. So for us the formalisation was not the bottleneck. The code of - our formalisation can be found in the Archive of Formal Proofs at - \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}} - \cite{myhillnerodeafp11}.\smallskip + our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at + \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.\smallskip \noindent {\bf Acknowledgements:}