--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Journal/Response/response.tex Wed Dec 12 11:45:04 2012 +0000
@@ -0,0 +1,255 @@
+\documentclass[12pt]{article}
+\usepackage{times}
+\usepackage{a4}
+\usepackage{hyperref}
+\usepackage[comma,square]{natbib}
+\usepackage{tikz}
+\usetikzlibrary{automata}
+
+\begin{document}
+
+\noindent
+{\bf Dear Dale, dear Reviewers,}\bigskip
+
+\noindent
+Thank you very much for your time and effort in reviewing this paper. We really appreciate
+this. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account
+when reaching your decision, and that Referee No 3, who we assume has extensive expertise in
+regular languages, has proved our point in his review. He brilliantly writes:
+
+\begin{quote}\it
+``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
+ (e.g., see the one given on wikipedia) ; being able to formalise it
+ is not a real challenge from my point of view.''
+\end{quote}
+
+\noindent
+The problem is it \underline{\bf is} a challenge in theorem provers! And the
+real value of the paper and its novelty is that this is a surprising fact, even for experts
+in regular languages, as the report of Referee No 3 shows. Because it is
+surprising, we really think the paper should appear in a more general journal than one
+specialised on theorem provers (after all, many theorem prover people already
+appreciate the difficulties; they
+just did not know of a way how to overcome them elegantly).
+We hope you share with us the belief that theorem prover proofs might not actually
+be used in all subjects, but they are
+the gold standard of proofs and all researchers should be aware of them.\medskip
+
+We therefore kindly ask you to reconsider the decision from this
+point of view and also ask you to consider the supporting
+numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge.
+We list them below again for quick reference.
+We are also happy to implement every improvement you suggest that make our general point
+and proofs clearer.\medskip
+
+We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAs
+in approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of
+Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our
+paper describes a formalised proof of the most central property in regular language
+theory taking only 1500(!) lines in Isabelle including all definitions and supporting
+lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted
+to Isabelle, as the Reviewers assert, and that you do not have to take
+our word for it, we list below numbers and comments
+made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to
+\underline{\bf Matita}, for example, and therefore our paper is written to be as much as possible
+agnostic with respect to a single theorem prover. \medskip
+
+We answer below the specific points the Reviewers raised. We also hope to surprise again
+ Referee No 3
+in that automata complementation is more tricky than it first might appear and
+most textbooks would suggest.\bigskip
+
+
+\newpage
+\noindent
+\underline{\bf Reviewer 1}\medskip
+
+\begin{itemize}
+\item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 are
+new; the introduction and conclusion are extended with more numbers and comments, which should prove
+beyond doubt our general point that automata are difficult to reason about and regular expressions are
+a better substitute
+\item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time''
+but that they proved with numbers and an actual implementation that in the context of lexing, derivatives are
+good enough in practice; we believe that this statement is still accurate and it is the only paper that we could
+cite for this
+\item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear
+grammars)}''
+\end{itemize}\bigskip
+
+\noindent
+\underline{\bf Reviewer 2}\medskip
+
+\begin{itemize}
+\item we hope the Reviewer agrees with us that an actual deep analysis of the technical
+details and a comparison with Coq/Sreflect would be a major distraction from the main
+point that the paper wants to convey;
+such an analysis would also be in violation with the editor's guidelines for ToCL
+\end{itemize}\bigskip
+
+\noindent
+\underline{\bf Reviewer 3}\medskip
+
+\noindent
+We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much
+agreed with our view we helt before we started this work---automata theory is
+more than 50 years old, surely nothing can be challenging there. We hope, however, the
+following comments by independent Coq/Nuprl/Isabelle-researchers destroy this view:
+
+\begin{itemize}
+\item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers
+(our formalisation is 1500 loc):}\medskip
+
+\item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including
+ Myhill-Nerode: he writes that our formalisation based on
+ regular expressions ``{\it is more concise}'' than his; he also writes
+ there are no problems with standard textbook proofs,
+ except for the ``{\it {\bf intrinsic} difficulties of working with
+ rectangular matrices}'' in Coq ($>$10k loc)
+
+
+\item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices
+ leading only to Kleene's theorem, but not
+ Myhill-Nerode nor closure under complementation: he
+ writes his formalisation based on matrices is
+ ``{\it rather big}'' ($>$4.5k loc)
+
+\item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives
+ automata ($>$10k lines of code using matrices)
+
+\item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs
+ 11 months with Nuprl for chapters 1 - 11 from Hopcroft
+ \& Ullman, which includes Myhill-Nerode ($>$100k lines
+ of code)
+
+\item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata
+ library (7k loc) and verified Hopcroft's algorithm
+ (additional 7k loc)
+
+\item[{\bf Isabelle:}]
+ \citet{Nipkow98} formalises automata with functions but writes
+ ``{\it proofs require a painful amount of detail}''
+\end{itemize}
+
+\noindent
+These numbers and comments should
+convincingly negate the premises of the following comments from the judgement
+by Reviewer No 3:
+
+\begin{itemize}
+\item ``{\it all formalised mathematical results are well known and established
+ since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about
+ theorem prover proofs - there issues are interesting and we have not seen
+ any previous work that took our view of starting with a definition for regular expressions only})
+\item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
+ (e.g., see the one given on wikipedia) ; being able to formalise it
+ is not a real challenge from my point of view.}'' {(\bf not true)}
+\item ``{\it the followed approach is highly specific to Isabelle/HOL: it is
+ driven by the postulate that it's hard to manipulate automata or
+ matrices.}'' ({\bf not true})
+\item ``{\it despite what the authors claim, I'm certain that a mechanised proof
+ of this Theorem could be straightforward in other theorem provers,
+ using automata for the right-to-left direction and partial
+ derivatives for the left-to-right direction''} {\bf (not true)}
+\end{itemize}
+
+\noindent
+Reviewer 3 also raised the following criticisms in his judgement:
+
+\begin{itemize}
+\item ``{\it Moreover, the authors insist on the fact that they don't want to use
+automata, while they actually just rephrase automata-based proofs,
+without using the word `automaton'}''
+
+We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automaton
+is. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble
+with formalised proofs as the loc-numbers above show in comparison to ours.
+
+\item ``{\it All in all, the paper should be presented as `here is a way of doing
+automata in HOL'. I'm sure this would make it much clearer and more
+interesting, notably since doing automata is supposed to be difficult
+in HOL.}''
+
+The fundamental difference is that it is easier in {\bf every} theorem prover to
+reason about inductive datatypes in comparison with non-inductively defined structures
+(graphs, functions, matrices). The specific problem the paper addresses is that
+all textbook proofs of Myhill-Nerode start with a definition of automata leading
+to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of.
+\end{itemize}\bigskip
+
+\subsection*{Complementation of Automata}
+
+Finally, we like to comment on the difficulty about complementation of automata, which
+on {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook
+reasoning and formal proofs in a theorem prover:
+
+\begin{quote}\it
+``if we take the definition `a language is regular if recognised by
+ a finite deterministic automaton (DFA)' then closure under
+ complementation is **trivial** (revert the accepting status of
+ states, that's it)''
+\end{quote}
+
+\noindent
+Given a representation of automata as graphs or
+matrices, then the following
+
+\begin{center}
+\begin{tikzpicture}[scale=3, line width=0.7mm]
+ \node[state, initial] (q0) at ( 0,1) {$q_0$};
+ \node[state, accepting] (q1) at ( 1,1) {$q_1$};
+ \path[->] (q0) edge node[above] {$a$} (q1)
+ (q1) edge [loop right] node {$b$} ()
+ ;
+\end{tikzpicture}
+\end{center}
+
+\noindent
+fits the definition as a graph or matrix of nodes. However interchanging the terminal and
+non-terminal states gives the {\bf incorrect} result. What is needed is the
+notion of a {\bf completed} automaton, such as
+
+\begin{center}
+\begin{tikzpicture}[scale=3, line width=0.7mm]
+ \node[state, initial] (q0) at ( 0,1) {$q_0$};
+ \node[state, accepting] (q1) at ( 1,1) {$q_1$};
+ \node[state] (q3) at (0.5, 1.5) {$q_{sink}$};
+
+ \path[->] (q0) edge node[above] {$a$} (q1)
+ (q1) edge [loop right] node {$b$} ()
+ (q0) edge node[above] {$b$} (q3)
+ (q1) edge node[above] {$a$} (q3)
+ (q3) edge [loop above] node {$a, b$} ()
+ ;
+\end{tikzpicture}
+\end{center}
+
+\noindent
+This additional restriction unfortunately percolates through formalised proofs. In the work by
+\citet{LammichTuerk12}, for example, this completeness constrain is mentioned in 160 places
+out of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restriction
+about finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitions
+based on matrices in Coq: they need to formalise the shape of matrices and carry around
+this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in
+Coq point out as clunky). Also, attempting to imitate the textbook definition of a finite
+function of type state $\times$ letter $\rightarrow$ state, which by the way has not been
+used by anybody who formalised automata theory in Coq, does not solve the difficulties: since
+no theorem prover has a finite-function type built in, the additional constraint about finite
+domains need to be carried around, again making proofs clunky.
+
+Regular expressions really solve all the problems mentioned above: they are inductive
+datatypes in all theorem provers, they come with simple structural induction principles
+for free and operations can be easily defined without any restrictions. The concrete question
+the paper answers is whether regular expressions are enough to obtain the central result in
+regular language theory side-stepping the definition of automata which, according to
+all previous work, leads to cumbersome reasoning in all theorem provers. The answer we give is yes:
+If you are in the business of verifying Hopcroft's algorithm there is no way around automata.
+But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}.
+We have not found this answer in any of the textbooks or papers available to us. We are happy to
+be proved otherwise.
+
+
+\bibliographystyle{apa}
+\bibliography{root}
+
+\end{document}