2450 Theorem~\ref{antimirov}). However, since partial derivatives use sets of |
2450 Theorem~\ref{antimirov}). However, since partial derivatives use sets of |
2451 regular expressions, one needs to carefully analyse whether the resulting |
2451 regular expressions, one needs to carefully analyse whether the resulting |
2452 algorithm is still executable. Given the infrastructure for |
2452 algorithm is still executable. Given the infrastructure for |
2453 executable sets introduced by \citeN{Haftmann09} in Isabelle/HOL, it should. |
2453 executable sets introduced by \citeN{Haftmann09} in Isabelle/HOL, it should. |
2454 |
2454 |
2455 We started out by claiming that in a theorem prover it is eaiser to reason |
2455 We started out by claiming that in a theorem prover it is eaiser to |
2456 about regular expressions than about automta. Here are some numbers: |
2456 reason about regular expressions than about automta. Here are some |
2457 Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of |
2457 numbers: Our formalisation of the Myhill-Nerode Theorem consists of |
2458 Isabelle/Isar code for the first direction and 460 for the second (the one |
2458 780 lines of Isabelle/Isar code for the first direction and 460 for |
2459 based on tagging-functions), plus around 300 lines of standard material |
2459 the second (the one based on tagging-functions), plus around 300 |
2460 about regular languages. The formalisation of derivatives and partial |
2460 lines of standard material about regular languages. The |
2461 derivatives shown in Section~\ref{derivatives} consists of 390 lines of |
2461 formalisation of derivatives and partial derivatives shown in |
2462 code. The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg}) |
2462 Section~\ref{derivatives} consists of 390 lines of code. The |
2463 can be established in |
2463 closure properties in Section~\ref{closure} (except |
2464 100 lines of code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n b\<^sup>n"} |
2464 Theorem~\ref{subseqreg}) can be established in 100 lines of |
2465 require 70 lines of code. |
2465 code. The Continuation Lemma and the non-regularity of @{text "a\<^sup>n |
2466 The algorithm for solving equational systems, which we |
2466 b\<^sup>n"} require 70 lines of code. The algorithm for solving equational |
2467 used in the first direction, is conceptually relatively simple. Still the |
2467 systems, which we used in the first direction, is conceptually |
2468 use of sets over which the algorithm operates means it is not as easy to |
2468 relatively simple. Still the use of sets over which the algorithm |
2469 formalise as one might wish. However, it seems sets cannot be avoided since |
2469 operates means it is not as easy to formalise as one might |
2470 the `input' of the algorithm consists of equivalence classes and we cannot |
2470 wish. However, it seems sets cannot be avoided since the `input' of |
2471 see how to reformulate the theory so that we can use lists or matrices. Lists would be |
2471 the algorithm consists of equivalence classes and we cannot see how |
2472 much easier to reason about, since we can define functions over them by |
2472 to reformulate the theory so that we can use lists or |
2473 recursion. For sets we have to use set-comprehensions, which is slightly |
2473 matrices. Lists would be much easier to reason about, since we can |
2474 unwieldy. Matrices would allow us to use the slick formalisation by |
2474 define functions over them by recursion. For sets we have to use |
2475 \citeN{Nipkow11} of the Gauss-Jordan algorithm. |
2475 set-comprehensions, which is slightly unwieldy. Matrices would allow |
2476 |
2476 us to use the slick formalisation by \citeN{Nipkow11} of the |
2477 While our formalisation might appear large, it should be seen |
2477 Gauss-Jordan algorithm. |
2478 in the context of the work done by \citeN{Constable00} who |
2478 |
2479 formalised the Myhill-Nerode Theorem in Nuprl using automata. They write |
2479 % OLD |
2480 that their four-member team would need something on the magnitude of 18 months |
2480 %While our formalisation might appear large, it should be seen in the |
2481 for their formalisation of the first eleven chapters of the textbook by \citeN{HopcroftUllman69}, |
2481 %context of the work done by \citeN{Constable00} who formalised the |
2482 which includes the Myhill-Nerode theorem. It is hard to gauge the size of a |
2482 %Myhill-Nerode Theorem in Nuprl using automata. They write that their |
2483 formalisation in Nurpl, but from what is shown in the Nuprl Math Library |
2483 %four-member team would need something on the magnitude of 18 months |
2484 about their development it seems \emph{substantially} larger than ours. We attribute |
2484 %for their formalisation of the first eleven chapters of the textbook |
2485 this to our use of regular expressions, which meant we did not need to `fight' |
2485 %by \citeN{HopcroftUllman69}, which includes the Myhill-Nerode |
2486 the theorem prover. Recently, \citeN{LammichTuerk12} formalised Hopcroft's |
2486 %theorem. It is hard to gauge the size of a formalisation in Nurpl, |
2487 algorithm in Isabelle/HOL (in 7000 lines of code) using an automata |
2487 %but from what is shown in the Nuprl Math Library about their |
2488 library of 27000 lines of code. |
2488 %development it seems \emph{substantially} larger than ours. We |
2489 Also, \citeN{Filliatre97} reports that his formalisation in |
2489 %attribute this to our use of regular expressions, which meant we did |
2490 Coq of automata theory and Kleene's theorem is ``rather big''. |
2490 %not need to `fight' the theorem prover. |
2491 \citeN{Almeidaetal10} reported about another |
2491 |
2492 formalisation of regular languages in Coq. Their |
2492 %%% NEW |
2493 main result is the |
2493 While our formalisation might appear large, it should be seen in the |
|
2494 context of the work done by \citeN{Constable00} who formalised the |
|
2495 Myhill-Nerode Theorem in Nuprl using automata. They choose to formalise the |
|
2496 this theorem, because it gives them state minimization of automata |
|
2497 as a corollary. It is hard to gauge the size of a formalisation in Nurpl, |
|
2498 but from what is shown in the Nuprl Math Library about this |
|
2499 development it seems \emph{substantially} larger than ours. We |
|
2500 attribute this to our use of regular expressions, which meant we did |
|
2501 not need to `fight' the theorem prover. |
|
2502 % |
|
2503 Recently, |
|
2504 \citeN{LammichTuerk12} formalised Hopcroft's algorithm in |
|
2505 Isabelle/HOL (in 7000 lines of code) using an automata library of |
|
2506 27000 lines of code. Also, \citeN{Filliatre97} reports that his |
|
2507 formalisation in Coq of automata theory and Kleene's theorem is |
|
2508 ``rather big''. \citeN{Almeidaetal10} reported about another |
|
2509 formalisation of regular languages in Coq. Their main result is the |
2494 correctness of Mirkin's construction of an automaton from a regular |
2510 correctness of Mirkin's construction of an automaton from a regular |
2495 expression using partial derivatives. This took approximately 10600 lines |
2511 expression using partial derivatives. This took approximately 10600 |
2496 of code. \citeN{Braibant12} formalised a large part of regular language |
2512 lines of code. \citeN{Braibant12} formalised a large part of |
2497 theory and Kleene algebras in Coq. While he is mainly interested |
2513 regular language theory and Kleene algebras in Coq. While he is |
2498 in implementing decision procedures for Kleene algebras, his library |
2514 mainly interested in implementing decision procedures for Kleene |
2499 includes a proof of the Myhill-Nerode theorem. He reckons that our |
2515 algebras, his library includes a proof of the Myhill-Nerode |
2500 ``development is more concise'' than his one based on matrices \cite[Page 67]{Braibant12}. |
2516 theorem. He reckons that our ``development is more concise'' than |
2501 He writes that there is no conceptual problems with formally reasoning |
2517 his one based on matrices \cite[Page 67]{Braibant12}. He writes |
2502 about matrices for automata, but notes ``intrinsic difficult[ies]'' when working |
2518 that there is no conceptual problems with formally reasoning about |
2503 with matrices in Coq, which is the sort of `fighting' one would encounter also |
2519 matrices for automata, but notes ``intrinsic difficult[ies]'' when |
2504 in other theorem provers. |
2520 working with matrices in Coq, which is the sort of `fighting' one |
2505 |
2521 would encounter also in other theorem provers. |
2506 In terms of time, the estimate for our formalisation is that we |
2522 In terms of time, the estimate for our formalisation is that we |
2507 needed approximately 3 months and this included the time to find our proof |
2523 needed approximately 3 months and this included the time to find our proof |
2508 arguments. Unlike \citeN{Constable00}, who were able to follow the Myhill-Nerode |
2524 arguments. Unlike \citeN{Constable00}, who were able to follow the Myhill-Nerode |
2509 proof by \citeN{HopcroftUllman69}, we had to find our own arguments. So for us the |
2525 proof by \citeN{HopcroftUllman69}, we had to find our own arguments. |
2510 formalisation was not the bottleneck. The code of |
2526 So for us the formalisation was not the bottleneck. The conclusion we draw |
|
2527 from all these comparisons is that if one is interested in formalising |
|
2528 results from regular language theory, not results from automata theory, |
|
2529 then regular expressions are easier to work with formally. |
|
2530 The code of |
2511 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at |
2531 our formalisation \cite{myhillnerodeafp11} can be found in the Archive of Formal Proofs at |
2512 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.\smallskip |
2532 \mbox{\url{http://afp.sourceforge.net/entries/Myhill-Nerode.shtml}}.\smallskip |
2513 |
2533 |
2514 \noindent |
2534 \noindent |
2515 {\bf Acknowledgements:} |
2535 {\bf Acknowledgements:} |