\documentclass[12pt]{article}
\usepackage{times}
\usepackage{a4}
\usepackage{hyperref}
\usepackage[comma,square]{natbib}
\usepackage{tikz}
\usetikzlibrary{automata}
\begin{document}
\noindent
{\bf Dear Dale, dear Reviewers,}\bigskip
\noindent
Thank you very much for your time and effort in reviewing this paper. We really appreciate
this. However, we passionately believe that the most important aspect of the paper has \underline{\bf not} been taken into account
when reaching your decision, and that Referee No 3, who we assume has extensive expertise in
regular languages, has proved our point in his review. He brilliantly writes:
\begin{quote}\it
``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
(e.g., see the one given on wikipedia) ; being able to formalise it
is not a real challenge from my point of view.''
\end{quote}
\noindent
The problem is that it \underline{\bf is} a challenge in theorem provers! The
real value and novelty of the paper is that this fact is surprising, even for experts
in regular languages, as the report of Referee No 3 shows. Because it is
surprising, we really think the paper should appear in a more general journal than one
specialised in theorem provers (after all, many theorem prover people already
appreciate the difficulties; they
just did not know of an elegant way to overcome them).
We hope you share with us the belief that, even though theorem prover proofs might not actually
be used in all subjects, they are
the gold standard of proofs and all researchers should be aware of them.\medskip
We therefore kindly ask you to reconsider the decision from this
point of view and also ask you to consider the supporting
numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge.
We list them below again for quick reference.
We are also happy to implement every improvement you suggest that makes our general point
and proofs clearer.\medskip
We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and formalised a library for NFAs and DFAs
in approximately 7000 lines of Isabelle code---this just sets up the framework. For proving the correctness of
Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our
paper describes a formalised proof of the most central property in regular language
theory taking only 1500(!) lines in Isabelle, including all definitions and supporting
lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted
to Isabelle, contrary to what the Reviewers assert, and so that you do not have to take
our word for it, we list below numbers and comments
made by researchers working in \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to
\underline{\bf Matita}, for example, and our paper is therefore written to be as agnostic as possible
with respect to any single theorem prover.\medskip
We answer below the specific points the Reviewers raised. We also hope to surprise
Referee No 3 once more, in that automata complementation is trickier than it might
first appear and than most textbooks would suggest.\bigskip
\newpage
\noindent
\underline{\bf Reviewer 1}\medskip
\begin{itemize}
\item additions with respect to ITP: section 4 now gives a more detailed proof; sections 5 and 6 are
new; the introduction and conclusion are extended with more numbers and comments, which should prove
beyond doubt our general point that automata are difficult to reason about and that regular expressions are
a better substitute
\item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time'',
but for showing, with numbers and an actual implementation, that in the context of lexing derivatives are
good enough in practice; we believe that this statement is still accurate and that it is the only paper we could
cite for this
\item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear
grammars)}''
\end{itemize}\bigskip
\noindent
\underline{\bf Reviewer 2}\medskip
\begin{itemize}
\item we hope the Reviewer agrees with us that a deep analysis of the technical
details and a comparison with Coq/Ssreflect would be a major distraction from the main
point that the paper wants to convey;
such an analysis would also be in violation of the editor's guidelines for ToCL
\end{itemize}\bigskip
\noindent
\underline{\bf Reviewer 3}\medskip
\noindent
We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much
agreed with the view we held before we started this work---automata theory is
more than 50 years old, surely nothing can be challenging there. We hope, however, that the
following comments by independent Coq, Nuprl and Isabelle researchers dispel this view:
\begin{itemize}
\item[] {\bf Numbers and comments from the literature showing that reasoning about automata is challenging in theorem provers
(our formalisation is 1500 loc):}\medskip
\item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including
Myhill-Nerode: he writes that our formalisation based on
regular expressions ``{\it is more concise}'' than his; he also writes that
there are no problems with standard textbook proofs,
except for the ``{\it {\bf intrinsic} difficulties of working with
rectangular matrices}'' in Coq ($>$10k loc)
\item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices
leading only to Kleene's theorem, but neither to
Myhill-Nerode nor to closure under complementation: he
writes his formalisation based on matrices is
``{\it rather big}'' ($>$4.5k loc)
\item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivative
                   automata ($>$10k lines of code using matrices)
\item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs
11 months with Nuprl for chapters 1--11 of Hopcroft
\& Ullman, which include Myhill-Nerode ($>$100k lines
of code)
\item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata
library (7k loc) and verified Hopcroft's algorithm
(additional 7k loc)
\item[{\bf Isabelle:}]
\citet{Nipkow98} formalises automata with functions but writes
``{\it proofs require a painful amount of detail}''
\end{itemize}
\noindent
These numbers and comments should
convincingly negate the premises of the following comments from the judgement
by Reviewer No 3:
\begin{itemize}
\item ``{\it all formalised mathematical results are well known and established
since more than fifty years}'' ({\bf we do not contest this nor claim otherwise; we talk about
theorem prover proofs---there the issues are interesting and we have not seen
any previous work that took our view of starting with a definition of regular expressions only})
\item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
(e.g., see the one given on wikipedia) ; being able to formalise it
is not a real challenge from my point of view.}'' ({\bf not true})
\item ``{\it the followed approach is highly specific to Isabelle/HOL: it is
driven by the postulate that it's hard to manipulate automata or
matrices.}'' ({\bf not true})
\item ``{\it despite what the authors claim, I'm certain that a mechanised proof
of this Theorem could be straightforward in other theorem provers,
using automata for the right-to-left direction and partial
derivatives for the left-to-right direction}'' ({\bf not true})
\end{itemize}
\noindent
Reviewer 3 also raised the following criticisms in his judgement:
\begin{itemize}
\item ``{\it Moreover, the authors insist on the fact that they don't want to use
automata, while they actually just rephrase automata-based proofs,
without using the word `automaton'}''
What we deliberately avoided is not the word ``automaton'' but the {\bf definition} of what an automaton
is. As soon as we started with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble
in the formalised proofs, as the loc numbers above show in comparison to ours.
\item ``{\it All in all, the paper should be presented as `here is a way of doing
automata in HOL'. I'm sure this would make it much clearer and more
interesting, notably since doing automata is supposed to be difficult
in HOL.}''
The fundamental difference is that it is easier in {\bf every} theorem prover to
reason about inductive datatypes than about non-inductively defined structures
(graphs, functions, matrices); see the sketch after this list. The specific problem the paper addresses is that
all textbook proofs of Myhill-Nerode start with a definition of automata, which leads
to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of.
\end{itemize}\bigskip
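\noindent
To make this concrete, here is a small sketch in Haskell, written only for this letter; the
names are ours and it is {\bf not} code from our Isabelle development. It contrasts an inductive
datatype of regular expressions, on which operations are plain structural recursions, with a
graph-style record of an automaton, for which finiteness and wellformedness are nowhere
guaranteed by the type and therefore have to be carried around as assumptions in proofs.
{\small
\begin{verbatim}
import qualified Data.Set as Set

-- Regular expressions as an inductive datatype: every function defined by
-- recursion on the constructors is total, and structural induction is free.
data Rex = Zero | One | Atom Char
         | Plus Rex Rex | Times Rex Rex | Star Rex

-- An operation with no side conditions: language reversal, defined by
-- plain structural recursion.
rev :: Rex -> Rex
rev (Plus r s)  = Plus (rev r) (rev s)
rev (Times r s) = Times (rev s) (rev r)
rev (Star r)    = Star (rev r)
rev r           = r

-- A graph-style automaton, by contrast, is a record of sets and a
-- transition relation; nothing in the type guarantees that the state set
-- is finite, that the start state is a state, or that the edges mention
-- only states.  Each of these conditions becomes an explicit assumption
-- of nearly every lemma in a theorem prover.
data Auto q = Auto
  { states    :: Set.Set q
  , start     :: q
  , accepting :: Set.Set q
  , edges     :: Set.Set (q, Char, q) }
\end{verbatim}}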
\subsection*{Complementation of Automata}
Finally, we would like to comment on the difficulty of complementing automata, which
on {\bf paper} looks trivial and which highlights the difference between pencil-and-paper textbook
reasoning and formal proofs in a theorem prover:
\begin{quote}\it
``if we take the definition `a language is regular if recognised by
a finite deterministic automaton (DFA)' then closure under
complementation is **trivial** (revert the accepting status of
states, that's it)''
\end{quote}
\noindent
Given a representation of automata as graphs or
matrices, the following automaton
\begin{center}
\begin{tikzpicture}[scale=3, line width=0.7mm]
\node[state, initial] (q0) at ( 0,1) {$q_0$};
\node[state, accepting] (q1) at ( 1,1) {$q_1$};
\path[->] (q0) edge node[above] {$a$} (q1)
(q1) edge [loop right] node {$b$} ()
;
\end{tikzpicture}
\end{center}
\noindent
fits the definition as a graph or matrix of nodes. However, interchanging the accepting and
non-accepting states gives the {\bf incorrect} result: for example, the string $b$ is in the
complement of the language $a\,b^*$, yet it is also rejected by the state-flipped automaton,
because there is no $b$-transition out of $q_0$. What is needed is the
notion of a {\bf completed} automaton, such as
\begin{center}
\begin{tikzpicture}[scale=3, line width=0.7mm]
\node[state, initial] (q0) at ( 0,1) {$q_0$};
\node[state, accepting] (q1) at ( 1,1) {$q_1$};
\node[state] (q3) at (0.5, 1.5) {$q_{\mathit{sink}}$};
\path[->] (q0) edge node[above] {$a$} (q1)
(q1) edge [loop right] node {$b$} ()
(q0) edge node[above] {$b$} (q3)
(q1) edge node[above] {$a$} (q3)
(q3) edge [loop above] node {$a, b$} ()
;
\end{tikzpicture}
\end{center}
\noindent
This additional restriction unfortunately percolates through formalised proofs. In the work by
\citet{LammichTuerk12}, for example, this completeness constraint is mentioned in 160 places
out of almost 600 lemmas. Since their representation is based on graphs (sets of nodes and edges), the restriction
to finitely many nodes is mentioned in another 108 places. The definitions
based on matrices in Coq fare similarly badly: they need to formalise the shape of matrices and carry around
this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in
Coq point out as clunky). Also, attempting to imitate the textbook definition of a finite
function of type state $\times$ letter $\rightarrow$ state, which by the way has not been
used by anybody who formalised automata theory in Coq, does not solve the difficulties: since
no theorem prover has a finite-function type built in, the additional constraint of finite
domains needs to be carried around, again making proofs clunky.
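\medskip\noindent
The point about completion can also be made concrete in code. The following is a small
Haskell sketch written only for this letter (the state numbering, names and representation
are ours): the automaton from the first picture with its partial transition function, the
naive ``flip the accepting states'' construction, and the completed version with a sink state.
{\small
\begin{verbatim}
import Data.Maybe (fromMaybe)

type State = Int

-- The automaton from the first picture (language a b*): the transition
-- function is partial; for instance, there is no b-transition out of 0.
delta :: State -> Char -> Maybe State
delta 0 'a' = Just 1
delta 1 'b' = Just 1
delta _ _   = Nothing

-- Run an automaton from state 0; getting stuck means the word is rejected.
accepts :: [State] -> (State -> Char -> Maybe State) -> String -> Bool
accepts final d = go 0
  where go q []     = q `elem` final
        go q (c:cs) = maybe False (\q' -> go q' cs) (d q c)

original, flipped, complemented :: String -> Bool
original = accepts [1] delta      -- accepts exactly a b*
flipped  = accepts [0] delta      -- accepting states flipped: WRONG, the
                                  -- word "b" is rejected by both automata

-- Completion: make the transition function total via the sink state 2;
-- only now does flipping the accepting states yield the complement.
deltaC :: State -> Char -> Maybe State
deltaC q c = Just (fromMaybe 2 (delta q c))
complemented = accepts [0,2] deltaC
\end{verbatim}}
\medskip\noindent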
Regular expressions really solve all the problems mentioned above: they are inductive
datatypes in all theorem provers, they come with simple structural induction principles
for free, and operations on them can be defined easily and without any restrictions. The concrete question
the paper answers is whether regular expressions are enough to obtain the central result in
regular language theory while side-stepping the definition of automata, which, according to
all previous work, leads to cumbersome reasoning in all theorem provers. The answer we give is yes:
if you are in the business of verifying Hopcroft's algorithm, there is no way around automata;
but for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}.
We have not found this answer in any of the textbooks or papers available to us. We are happy to
be proved wrong.
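\medskip\noindent
As a final illustration of what we mean by ``operations can be defined easily and without any
restrictions'', here is one more Haskell sketch written only for this letter. It repeats the
regular-expression datatype from the earlier sketch and implements the well-known Brzozowski
derivative (this is textbook material, not code from our paper): both functions are plain
structural recursions, with no finiteness or wellformedness side conditions anywhere.
{\small
\begin{verbatim}
-- Regular expressions, as in the earlier sketch.
data Rex = Zero | One | Atom Char
         | Plus Rex Rex | Times Rex Rex | Star Rex

-- Does the language of a regular expression contain the empty string?
nullable :: Rex -> Bool
nullable Zero        = False
nullable One         = True
nullable (Atom _)    = False
nullable (Plus r s)  = nullable r || nullable s
nullable (Times r s) = nullable r && nullable s
nullable (Star _)    = True

-- Brzozowski derivative: der c r stands for { w | c:w in language of r }.
der :: Char -> Rex -> Rex
der _ Zero        = Zero
der _ One         = Zero
der c (Atom a)    = if c == a then One else Zero
der c (Plus r s)  = Plus (der c r) (der c s)
der c (Times r s) = if nullable r
                    then Plus (Times (der c r) s) (der c s)
                    else Times (der c r) s
der c (Star r)    = Times (der c r) (Star r)

-- Membership test, by folding derivatives over the word.
matches :: Rex -> String -> Bool
matches r = nullable . foldl (flip der) r
\end{verbatim}}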
\bibliographystyle{apa}
\bibliography{root}
\end{document}