\documentclass[12pt]{article}\usepackage{times}\usepackage{a4}\usepackage{hyperref}\usepackage[comma,square]{natbib}\usepackage{tikz}\usetikzlibrary{automata}\begin{document}\noindent{\bf Dear Dale, dear Reviewers,}\bigskip\noindentThank you very much for your time and effort in reviewing this paper. We really appreciatethis. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account when reaching your decision, and that Referee No 3, who we assume has extensive expertise in regular languages, has proved our point in his review. He brilliantly writes:\begin{quote}\it``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof (e.g., see the one given on wikipedia) ; being able to formalise it is not a real challenge from my point of view.''\end{quote}\noindentThe problem is it \underline{\bf is} a challenge in theorem provers! And the real value of the paper and its novelty is that this is a surprising fact, even for expertsin regular languages, as the report of Referee No 3 shows. Because it is surprising, we really think the paper should appear in a more general journal than onespecialised on theorem provers (after all, many theorem prover people already appreciate the difficulties; they just did not know of a way how to overcome them elegantly).We hope you share with us the belief that theorem prover proofs might not actually be used in all subjects, but they are the gold standard of proofs and all researchers should be aware of them.\medskipWe therefore kindly ask you to reconsider the decision from thispoint of view and also ask you to consider the supporting numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge. We list them below again for quick reference.We are also happy to implement every improvement you suggest that make our general pointand proofs clearer.\medskipWe can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAsin approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, ourpaper describes a formalised proof of the most central property in regular languagetheory taking only 1500(!) lines in Isabelle including all definitions and supportinglemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted to Isabelle, as the Reviewers assert, and that you do not have to takeour word for it, we list below numbers and comments made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to \underline{\bf Matita}, for example, and therefore our paper is written to be as much as possible agnostic with respect to a single theorem prover. \medskipWe answer below the specific points the Reviewers raised. We also hope to surprise again Referee No 3in that automata complementation is more tricky than it first might appear and most textbooks would suggest.\bigskip\newpage\noindent\underline{\bf Reviewer 1}\medskip\begin{itemize}\item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 arenew; the introduction and conclusion are extended with more numbers and comments, which should provebeyond doubt our general point that automata are difficult to reason about and regular expressions area better substitute\item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time''but that they proved with numbers and an actual implementation that in the context of lexing, derivatives aregood enough in practice; we believe that this statement is still accurate and it is the only paper that we couldcite for this\item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear grammars)}''\end{itemize}\bigskip\noindent\underline{\bf Reviewer 2}\medskip\begin{itemize}\item we hope the Reviewer agrees with us that an actual deep analysis of the technicaldetails and a comparison with Coq/Sreflect would be a major distraction from the mainpoint that the paper wants to convey;such an analysis would also be in violation with the editor's guidelines for ToCL\end{itemize}\bigskip\noindent\underline{\bf Reviewer 3}\medskip\noindentWe honestly enjoyed the comments by Reviewer 3 very much, since they pretty much agreed with our view we helt before we started this work---automata theory ismore than 50 years old, surely nothing can be challenging there. We hope, however, thefollowing comments by independent Coq/Nuprl/Isabelle-researchers destroy this view:\begin{itemize}\item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers(our formalisation is 1500 loc):}\medskip\item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including Myhill-Nerode: he writes that our formalisation based on regular expressions ``{\it is more concise}'' than his; he also writes there are no problems with standard textbook proofs, except for the ``{\it {\bf intrinsic} difficulties of working with rectangular matrices}'' in Coq ($>$10k loc)\item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices leading only to Kleene's theorem, but not Myhill-Nerode nor closure under complementation: he writes his formalisation based on matrices is ``{\it rather big}'' ($>$4.5k loc)\item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives automata ($>$10k lines of code using matrices)\item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs 11 months with Nuprl for chapters 1 - 11 from Hopcroft \& Ullman, which includes Myhill-Nerode ($>$100k lines of code)\item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata library (7k loc) and verified Hopcroft's algorithm (additional 7k loc)\item[{\bf Isabelle:}] \citet{Nipkow98} formalises automata with functions but writes ``{\it proofs require a painful amount of detail}''\end{itemize}\noindentThese numbers and comments should convincingly negate the premises of the following comments from the judgementby Reviewer No 3:\begin{itemize}\item ``{\it all formalised mathematical results are well known and established since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about theorem prover proofs - there issues are interesting and we have not seen any previous work that took our view of starting with a definition for regular expressions only})\item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof (e.g., see the one given on wikipedia) ; being able to formalise it is not a real challenge from my point of view.}'' {(\bf not true)}\item ``{\it the followed approach is highly specific to Isabelle/HOL: it is driven by the postulate that it's hard to manipulate automata or matrices.}'' ({\bf not true})\item ``{\it despite what the authors claim, I'm certain that a mechanised proof of this Theorem could be straightforward in other theorem provers, using automata for the right-to-left direction and partial derivatives for the left-to-right direction''} {\bf (not true)}\end{itemize}\noindentReviewer 3 also raised the following criticisms in his judgement:\begin{itemize}\item ``{\it Moreover, the authors insist on the fact that they don't want to useautomata, while they actually just rephrase automata-based proofs,without using the word `automaton'}''We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automatonis. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble with formalised proofs as the loc-numbers above show in comparison to ours.\item ``{\it All in all, the paper should be presented as `here is a way of doingautomata in HOL'. I'm sure this would make it much clearer and moreinteresting, notably since doing automata is supposed to be difficultin HOL.}''The fundamental difference is that it is easier in {\bf every} theorem prover to reason about inductive datatypes in comparison with non-inductively defined structures(graphs, functions, matrices). The specific problem the paper addresses is thatall textbook proofs of Myhill-Nerode start with a definition of automata leading to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of.\end{itemize}\bigskip\subsection*{Complementation of Automata}Finally, we like to comment on the difficulty about complementation of automata, whichon {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook reasoning and formal proofs in a theorem prover:\begin{quote}\it ``if we take the definition `a language is regular if recognised by a finite deterministic automaton (DFA)' then closure under complementation is **trivial** (revert the accepting status of states, that's it)''\end{quote} \noindentGiven a representation of automata as graphs ormatrices, then the following \begin{center}\begin{tikzpicture}[scale=3, line width=0.7mm] \node[state, initial] (q0) at ( 0,1) {$q_0$}; \node[state, accepting] (q1) at ( 1,1) {$q_1$}; \path[->] (q0) edge node[above] {$a$} (q1) (q1) edge [loop right] node {$b$} () ;\end{tikzpicture}\end{center}\noindentfits the definition as a graph or matrix of nodes. However interchanging the terminal andnon-terminal states gives the {\bf incorrect} result. What is needed is thenotion of a {\bf completed} automaton, such as\begin{center}\begin{tikzpicture}[scale=3, line width=0.7mm] \node[state, initial] (q0) at ( 0,1) {$q_0$}; \node[state, accepting] (q1) at ( 1,1) {$q_1$}; \node[state] (q3) at (0.5, 1.5) {$q_{sink}$}; \path[->] (q0) edge node[above] {$a$} (q1) (q1) edge [loop right] node {$b$} () (q0) edge node[above] {$b$} (q3) (q1) edge node[above] {$a$} (q3) (q3) edge [loop above] node {$a, b$} () ;\end{tikzpicture}\end{center}\noindentThis additional restriction unfortunately percolates through formalised proofs. In the work by\citet{LammichTuerk12}, for example, this completeness constrain is mentioned in 160 placesout of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restrictionabout finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitionsbased on matrices in Coq: they need to formalise the shape of matrices and carry aroundthis baggage in their proofs and operations (which {\bf all} authors of formalised automata theory inCoq point out as clunky). Also, attempting to imitate the textbook definition of a finitefunction of type state $\times$ letter $\rightarrow$ state, which by the way has not been used by anybody who formalised automata theory in Coq, does not solve the difficulties: sinceno theorem prover has a finite-function type built in, the additional constraint about finitedomains need to be carried around, again making proofs clunky.Regular expressions really solve all the problems mentioned above: they are inductivedatatypes in all theorem provers, they come with simple structural induction principles for free and operations can be easily defined without any restrictions. The concrete questionthe paper answers is whether regular expressions are enough to obtain the central result inregular language theory side-stepping the definition of automata which, according toall previous work, leads to cumbersome reasoning in all theorem provers. The answer we give is yes:If you are in the business of verifying Hopcroft's algorithm there is no way around automata.But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}.We have not found this answer in any of the textbooks or papers available to us. We are happy tobe proved otherwise.\bibliographystyle{apa}\bibliography{root}\end{document}