regexp: comparison Journal/Paper.thy


-:05da74214979
+:28e98ede8599
 carried out in HOL-based theorem provers. Nipkow \cite{Nipkow98} establishes
 the link between regular expressions and automata in the context of
 lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
 working over bit strings in the context of Presburger arithmetic.  The only
 larger formalisations of automata theory are carried out in Nuprl
-\cite{Constable00} and in Coq \cite{Filliatre97}.
+\cite{Constable00} and in Coq, e.g.~\cite{Filliatre97,Almeidaetal10}.
 Also one might consider automata theory and regular languages as a meticulously
 researched subject where everything is crystal clear. However, paper proofs about
 automata often involve subtle side-conditions which are easily overlooked,
 but which make formal reasoning rather painful. For example Kozen's proof of
 the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}.
 Property @{text "(iv)"} states that a non-empty string in @{term "A\<star>"} can
 always be split up into a non-empty prefix belonging to @{text "A"} and the
 rest being in @{term "A\<star>"}. We omit
 the proofs for these properties, but invite the reader to consult our
-formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/regexp.html}}
+formalisation.\footnote{Available in the Archive of Formal Proofs at
+\url{http://afp.sf.net/entries/Myhill-Nerode.shtml}
+\cite{myhillnerodeafp11}.}
 The notation in Isabelle/HOL for the quotient of a language @{text A}
 according to an equivalence relation @{term REL} is @{term "A // REL"}. We
 will write @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined as
 \mbox{@{text "{y | y \<approx> x}"}}, and have @{text "x \<approx> y"} if and only if @{text
 Myhill-Nerode theorem, we have to be able to establish that for the
 corresponding language there are only finitely many derivatives---thus
 ensuring that there are only finitely many equivalence
 classes. Unfortunately, this is not true in general. Sakarovitch gives an
 example where a regular expression has infinitely many derivatives
-w.r.t.~the language \mbox{@{term "({a} \<cdot> {b})\<star> \<union> ({a} \<cdot> {b})\<star> \<cdot> {a}"}}
+w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally
+written in our notation as \mbox{@{text "{[a,b]}\<^isup>\<star> \<union> ({[a,b]}\<^isup>\<star> \<cdot> {[a]})"}}
 (see \cite[Page~141]{Sakarovitch09}).
 What Brzozowski \cite{Brzozowski64} established is that for every language there
 \emph{are} only finitely many `dissimilar' derivatives for a regular
 terms of already existing notions. A convenient definition for automata
 (based on graphs) uses a polymorphic type for the state nodes. This allows
 us to use the standard operation for disjoint union whenever we need to compose two
 automata. Unfortunately, we cannot use such a polymorphic definition
 in HOL as part of the definition for regularity of a language (a predicate
-over sets of strings).  Consider the following attempt:
+over sets of strings).  Consider for example the following attempt:
 \begin{center}
 @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
 \end{center}
 While our formalisation might appear large, it should be seen
 in the context of the work done by Constable et al \cite{Constable00} who
 formalised the Myhill-Nerode theorem in Nuprl using automata. They write
 that their four-member team needed something on the magnitude of 18 months
-for their formalisation. The estimate for our formalisation is that we
+for their formalisation. Also, Filli\^atre reports that his formalisation in
+Coq of automata theory and Kleene's theorem is ``rather big''
+\cite{Filliatre97}. More recently, Almeida et al reported on another
+formalisation of regular languages in Coq \cite{Almeidaetal10}. Their
+main result is the
+correctness of Mirkin's construction of an automaton from a regular
+expression using partial derivatives. This took approximately 10600 lines
+of code.  The estimate for our formalisation is that we
 needed approximately 3 months and this included the time to find our proof
-arguments. Unlike Constable et al, who were able to follow the proofs from
+arguments. Unlike Constable et al, who were able to follow the Myhill-Nerode
-\cite{HopcroftUllman69}, we had to find our own arguments.  So for us the
+proof from \cite{HopcroftUllman69}, we had to find our own arguments.  So for us the
 formalisation was not the bottleneck. It is hard to gauge the size of a
 formalisation in Nuprl, but from what is shown in the Nuprl Math Library
-about their development it seems substantially larger than ours. The code of
+about their development it seems substantially larger than ours. We attribute
-ours can be found in the Mercurial Repository at
+this to our use of regular expressions, which meant we did not need to `fight'
-\mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.\medskip
+the theorem prover. The code of
+our formalisation can be found in the Archive of Formal Proofs at
+\mbox{\url{http://afp.sf.net/entries/Myhill-Nerode.shtml}} \cite{myhillnerodeafp11}.\medskip
 \noindent
 {\bf Acknowledgements:}
 We are grateful for the comments we received from Larry
 Paulson.

changeset 218	28e98ede8599
parent 217	05da74214979
child 233	e2dc11e12e0b