diff -r 0679a84b11ad -r 01d223421ba0 Journal/Response/response.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/Journal/Response/response.tex Wed Dec 12 11:45:04 2012 +0000 @@ -0,0 +1,255 @@ +\documentclass[12pt]{article} +\usepackage{times} +\usepackage{a4} +\usepackage{hyperref} +\usepackage[comma,square]{natbib} +\usepackage{tikz} +\usetikzlibrary{automata} + +\begin{document} + +\noindent +{\bf Dear Dale, dear Reviewers,}\bigskip + +\noindent +Thank you very much for your time and effort in reviewing this paper. We really appreciate +this. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account +when reaching your decision, and that Referee No 3, who we assume has extensive expertise in +regular languages, has proved our point in his review. He brilliantly writes: + +\begin{quote}\it +``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof + (e.g., see the one given on wikipedia) ; being able to formalise it + is not a real challenge from my point of view.'' +\end{quote} + +\noindent +The problem is it \underline{\bf is} a challenge in theorem provers! And the +real value of the paper and its novelty is that this is a surprising fact, even for experts +in regular languages, as the report of Referee No 3 shows. Because it is +surprising, we really think the paper should appear in a more general journal than one +specialised on theorem provers (after all, many theorem prover people already +appreciate the difficulties; they +just did not know of a way how to overcome them elegantly). +We hope you share with us the belief that theorem prover proofs might not actually +be used in all subjects, but they are +the gold standard of proofs and all researchers should be aware of them.\medskip + +We therefore kindly ask you to reconsider the decision from this +point of view and also ask you to consider the supporting +numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge. +We list them below again for quick reference. +We are also happy to implement every improvement you suggest that make our general point +and proofs clearer.\medskip + +We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAs +in approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of +Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our +paper describes a formalised proof of the most central property in regular language +theory taking only 1500(!) lines in Isabelle including all definitions and supporting +lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted +to Isabelle, as the Reviewers assert, and that you do not have to take +our word for it, we list below numbers and comments +made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to +\underline{\bf Matita}, for example, and therefore our paper is written to be as much as possible +agnostic with respect to a single theorem prover. \medskip + +We answer below the specific points the Reviewers raised. We also hope to surprise again + Referee No 3 +in that automata complementation is more tricky than it first might appear and +most textbooks would suggest.\bigskip + + +\newpage +\noindent +\underline{\bf Reviewer 1}\medskip + +\begin{itemize} +\item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 are +new; the introduction and conclusion are extended with more numbers and comments, which should prove +beyond doubt our general point that automata are difficult to reason about and regular expressions are +a better substitute +\item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time'' +but that they proved with numbers and an actual implementation that in the context of lexing, derivatives are +good enough in practice; we believe that this statement is still accurate and it is the only paper that we could +cite for this +\item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear +grammars)}'' +\end{itemize}\bigskip + +\noindent +\underline{\bf Reviewer 2}\medskip + +\begin{itemize} +\item we hope the Reviewer agrees with us that an actual deep analysis of the technical +details and a comparison with Coq/Sreflect would be a major distraction from the main +point that the paper wants to convey; +such an analysis would also be in violation with the editor's guidelines for ToCL +\end{itemize}\bigskip + +\noindent +\underline{\bf Reviewer 3}\medskip + +\noindent +We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much +agreed with our view we helt before we started this work---automata theory is +more than 50 years old, surely nothing can be challenging there. We hope, however, the +following comments by independent Coq/Nuprl/Isabelle-researchers destroy this view: + +\begin{itemize} +\item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers +(our formalisation is 1500 loc):}\medskip + +\item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including + Myhill-Nerode: he writes that our formalisation based on + regular expressions ``{\it is more concise}'' than his; he also writes + there are no problems with standard textbook proofs, + except for the ``{\it {\bf intrinsic} difficulties of working with + rectangular matrices}'' in Coq ($>$10k loc) + + +\item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices + leading only to Kleene's theorem, but not + Myhill-Nerode nor closure under complementation: he + writes his formalisation based on matrices is + ``{\it rather big}'' ($>$4.5k loc) + +\item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives + automata ($>$10k lines of code using matrices) + +\item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs + 11 months with Nuprl for chapters 1 - 11 from Hopcroft + \& Ullman, which includes Myhill-Nerode ($>$100k lines + of code) + +\item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata + library (7k loc) and verified Hopcroft's algorithm + (additional 7k loc) + +\item[{\bf Isabelle:}] + \citet{Nipkow98} formalises automata with functions but writes + ``{\it proofs require a painful amount of detail}'' +\end{itemize} + +\noindent +These numbers and comments should +convincingly negate the premises of the following comments from the judgement +by Reviewer No 3: + +\begin{itemize} +\item ``{\it all formalised mathematical results are well known and established + since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about + theorem prover proofs - there issues are interesting and we have not seen + any previous work that took our view of starting with a definition for regular expressions only}) +\item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof + (e.g., see the one given on wikipedia) ; being able to formalise it + is not a real challenge from my point of view.}'' {(\bf not true)} +\item ``{\it the followed approach is highly specific to Isabelle/HOL: it is + driven by the postulate that it's hard to manipulate automata or + matrices.}'' ({\bf not true}) +\item ``{\it despite what the authors claim, I'm certain that a mechanised proof + of this Theorem could be straightforward in other theorem provers, + using automata for the right-to-left direction and partial + derivatives for the left-to-right direction''} {\bf (not true)} +\end{itemize} + +\noindent +Reviewer 3 also raised the following criticisms in his judgement: + +\begin{itemize} +\item ``{\it Moreover, the authors insist on the fact that they don't want to use +automata, while they actually just rephrase automata-based proofs, +without using the word `automaton'}'' + +We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automaton +is. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble +with formalised proofs as the loc-numbers above show in comparison to ours. + +\item ``{\it All in all, the paper should be presented as `here is a way of doing +automata in HOL'. I'm sure this would make it much clearer and more +interesting, notably since doing automata is supposed to be difficult +in HOL.}'' + +The fundamental difference is that it is easier in {\bf every} theorem prover to +reason about inductive datatypes in comparison with non-inductively defined structures +(graphs, functions, matrices). The specific problem the paper addresses is that +all textbook proofs of Myhill-Nerode start with a definition of automata leading +to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of. +\end{itemize}\bigskip + +\subsection*{Complementation of Automata} + +Finally, we like to comment on the difficulty about complementation of automata, which +on {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook +reasoning and formal proofs in a theorem prover: + +\begin{quote}\it +``if we take the definition `a language is regular if recognised by + a finite deterministic automaton (DFA)' then closure under + complementation is **trivial** (revert the accepting status of + states, that's it)'' +\end{quote} + +\noindent +Given a representation of automata as graphs or +matrices, then the following + +\begin{center} +\begin{tikzpicture}[scale=3, line width=0.7mm] + \node[state, initial] (q0) at ( 0,1) {$q_0$}; + \node[state, accepting] (q1) at ( 1,1) {$q_1$}; + \path[->] (q0) edge node[above] {$a$} (q1) + (q1) edge [loop right] node {$b$} () + ; +\end{tikzpicture} +\end{center} + +\noindent +fits the definition as a graph or matrix of nodes. However interchanging the terminal and +non-terminal states gives the {\bf incorrect} result. What is needed is the +notion of a {\bf completed} automaton, such as + +\begin{center} +\begin{tikzpicture}[scale=3, line width=0.7mm] + \node[state, initial] (q0) at ( 0,1) {$q_0$}; + \node[state, accepting] (q1) at ( 1,1) {$q_1$}; + \node[state] (q3) at (0.5, 1.5) {$q_{sink}$}; + + \path[->] (q0) edge node[above] {$a$} (q1) + (q1) edge [loop right] node {$b$} () + (q0) edge node[above] {$b$} (q3) + (q1) edge node[above] {$a$} (q3) + (q3) edge [loop above] node {$a, b$} () + ; +\end{tikzpicture} +\end{center} + +\noindent +This additional restriction unfortunately percolates through formalised proofs. In the work by +\citet{LammichTuerk12}, for example, this completeness constrain is mentioned in 160 places +out of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restriction +about finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitions +based on matrices in Coq: they need to formalise the shape of matrices and carry around +this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in +Coq point out as clunky). Also, attempting to imitate the textbook definition of a finite +function of type state $\times$ letter $\rightarrow$ state, which by the way has not been +used by anybody who formalised automata theory in Coq, does not solve the difficulties: since +no theorem prover has a finite-function type built in, the additional constraint about finite +domains need to be carried around, again making proofs clunky. + +Regular expressions really solve all the problems mentioned above: they are inductive +datatypes in all theorem provers, they come with simple structural induction principles +for free and operations can be easily defined without any restrictions. The concrete question +the paper answers is whether regular expressions are enough to obtain the central result in +regular language theory side-stepping the definition of automata which, according to +all previous work, leads to cumbersome reasoning in all theorem provers. The answer we give is yes: +If you are in the business of verifying Hopcroft's algorithm there is no way around automata. +But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}. +We have not found this answer in any of the textbooks or papers available to us. We are happy to +be proved otherwise. + + +\bibliographystyle{apa} +\bibliography{root} + +\end{document}