regexp: comparison Journal/Paper.thy

equal deleted inserted replaced

-:4f303da0cd2a
+:a0bcf886b8ef
 Theorem~\ref{antimirov}). However, since partial derivatives use sets of
 regular expressions, one needs to carefully analyse whether the resulting
 algorithm is still executable. Given the infrastructure for
 executable sets  introduced by \citeN{Haftmann09} in Isabelle/HOL, it should.
-We started out by claiming that in a theorem prover it is eaiser to reason
+We started out by claiming that in a theorem prover it is eaiser to
-about regular expressions than about automta. Here are some numbers:
+reason about regular expressions than about automta. Here are some
-Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of
+numbers: Our formalisation of the Myhill-Nerode Theorem consists of
-Isabelle/Isar code for the first direction and 460 for the second (the one
+780 lines of Isabelle/Isar code for the first direction and 460 for
-based on tagging-functions), plus around 300 lines of standard material
+the second (the one based on tagging-functions), plus around 300
-about regular languages. The formalisation of derivatives and partial
+lines of standard material about regular languages. The
-derivatives shown in Section~\ref{derivatives} consists of 390 lines of
+formalisation of derivatives and partial derivatives shown in
-code.  The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg})
+Section~\ref{derivatives} consists of 390 lines of code.  The
-can be established in
+closure properties in Section~\ref{closure} (except
-100 lines of code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n b\<^sup>n"}
+Theorem~\ref{subseqreg}) can be established in 100 lines of
-require 70 lines of code.
+code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n
-The algorithm for solving equational systems, which we
+b\<^sup>n"} require 70 lines of code.  The algorithm for solving equational
-used in the first direction, is conceptually relatively simple. Still the
+systems, which we used in the first direction, is conceptually
-use of sets over which the algorithm operates means it is not as easy to
+relatively simple. Still the use of sets over which the algorithm
-formalise as one might wish. However, it seems sets cannot be avoided since
+operates means it is not as easy to formalise as one might
-the `input' of the algorithm consists of equivalence classes and we cannot
+wish. However, it seems sets cannot be avoided since the `input' of
-see how to reformulate the theory so that we can use lists or matrices. Lists would be
+the algorithm consists of equivalence classes and we cannot see how
-much easier to reason about, since we can define functions over them by
+to reformulate the theory so that we can use lists or
-recursion. For sets we have to use set-comprehensions, which is slightly
+matrices. Lists would be much easier to reason about, since we can
-unwieldy. Matrices would allow us to use the slick formalisation by
+define functions over them by recursion. For sets we have to use
-\citeN{Nipkow11} of the Gauss-Jordan algorithm.
+set-comprehensions, which is slightly unwieldy. Matrices would allow
+us to use the slick formalisation by \citeN{Nipkow11} of the
-While our formalisation might appear large, it should be seen
+Gauss-Jordan algorithm.
-in the context of the work done by \citeN{Constable00} who
-formalised the Myhill-Nerode Theorem in Nuprl using automata. They write
+% OLD
-that their four-member team would need something on the magnitude of 18 months
+%While our formalisation might appear large, it should be seen in the
-for their formalisation of the first eleven chapters of the textbook by \citeN{HopcroftUllman69},
+%context of the work done by \citeN{Constable00} who formalised the
-which includes the Myhill-Nerode theorem. It is hard to gauge the size of a
+%Myhill-Nerode Theorem in Nuprl using automata. They write that their
-formalisation in Nurpl, but from what is shown in the Nuprl Math Library
+%four-member team would need something on the magnitude of 18 months
-about their development it seems \emph{substantially} larger than ours. We attribute
+%for their formalisation of the first eleven chapters of the textbook
-this to our use of regular expressions, which meant we did not need to `fight'
+%by \citeN{HopcroftUllman69}, which includes the Myhill-Nerode
-the theorem prover. Recently, \citeN{LammichTuerk12} formalised Hopcroft's
+%theorem. It is hard to gauge the size of a formalisation in Nurpl,
-algorithm in Isabelle/HOL (in 7000 lines of code) using an automata
+%but from what is shown in the Nuprl Math Library about their
-library of 27000 lines of code.
+%development it seems \emph{substantially} larger than ours. We
-Also, \citeN{Filliatre97} reports that his formalisation in
+%attribute this to our use of regular expressions, which meant we did
-Coq of automata theory and Kleene's theorem is ``rather big''.
+%not need to `fight' the theorem prover.
-\citeN{Almeidaetal10}  reported about another
-formalisation of regular languages in Coq. Their
+%%% NEW
-main result is the
+While our formalisation might appear large, it should be seen in the
+context of the work done by \citeN{Constable00} who formalised the
+Myhill-Nerode Theorem in Nuprl using automata.  They choose to formalise the
+this theorem, because it gives them state minimization of automata
+as a corollary. It is hard to gauge the size of a formalisation in Nurpl,
+but from what is shown in the Nuprl Math Library about this
+development it seems \emph{substantially} larger than ours. We
+attribute this to our use of regular expressions, which meant we did
+not need to `fight' the theorem prover.
+%
+Recently,
+\citeN{LammichTuerk12} formalised Hopcroft's algorithm in
+Isabelle/HOL (in 7000 lines of code) using an automata library of
+27000 lines of code.  Also, \citeN{Filliatre97} reports that his
+formalisation in Coq of automata theory and Kleene's theorem is
+``rather big''.  \citeN{Almeidaetal10} reported about another
+formalisation of regular languages in Coq. Their main result is the
 correctness of Mirkin's construction of an automaton from a regular
-expression using partial derivatives. This took approximately 10600 lines
+expression using partial derivatives. This took approximately 10600
-of code.  \citeN{Braibant12} formalised a large part of regular language
+lines of code.  \citeN{Braibant12} formalised a large part of
-theory and Kleene algebras in Coq. While he is mainly interested
+regular language theory and Kleene algebras in Coq. While he is
-in implementing decision procedures for Kleene algebras, his library
+mainly interested in implementing decision procedures for Kleene
-includes a proof of the Myhill-Nerode theorem. He reckons that our
+algebras, his library includes a proof of the Myhill-Nerode
-``development is more concise'' than his one based on matrices \cite[Page 67]{Braibant12}.
+theorem. He reckons that our ``development is more concise'' than
-He writes that there is no conceptual problems with formally reasoning
+his one based on matrices \cite[Page 67]{Braibant12}.  He writes
-about matrices for automata, but notes ``intrinsic difficult[ies]'' when working
+that there is no conceptual problems with formally reasoning about
-with matrices in Coq, which is the sort of `fighting' one would encounter also
+matrices for automata, but notes ``intrinsic difficult[ies]'' when
-in other theorem provers.
+working with matrices in Coq, which is the sort of `fighting' one
+would encounter also in other theorem provers.
 In terms of time, the estimate for our formalisation is that we
 needed approximately 3 months and this included the time to find our proof
 arguments. Unlike \citeN{Constable00}, who were able to follow the Myhill-Nerode
-proof by \citeN{HopcroftUllman69}, we had to find our own arguments.  So for us the
+proof by \citeN{HopcroftUllman69}, we had to find our own arguments.
-formalisation was not the bottleneck.  The code of
+So for us the formalisation was not the bottleneck. The conclusion we draw
+from all these comparisons is that if one is interested in formalising
+results from regular language theory, not results from automata theory,
+then regular expressions are easier to work with formally.
+The code of
 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at
 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.\smallskip
 \noindent
 {\bf Acknowledgements:}

changeset 378	a0bcf886b8ef
parent 377	4f303da0cd2a
child 379	8c4b6fb43ebe