regexp: comparison Journal/Response/response.tex

equal deleted inserted replaced

-:0679a84b11ad
+:01d223421ba0
+\documentclass[12pt]{article}
+\usepackage{times}
+\usepackage{a4}
+\usepackage{hyperref}
+\usepackage[comma,square]{natbib}
+\usepackage{tikz}
+\usetikzlibrary{automata}
+\begin{document}
+\noindent
+{\bf Dear Dale, dear Reviewers,}\bigskip
+\noindent
+Thank you very much for your time and effort in reviewing this paper. We really appreciate
+this. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account
+when reaching your decision, and that Referee No 3, who we assume has extensive expertise in
+regular languages, has proved our point in his review. He brilliantly writes:
+\begin{quote}\it
+``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
+(e.g., see the one given on wikipedia) ; being able to formalise it
+is not a real challenge from my point of view.''
+\end{quote}
+\noindent
+The problem is it \underline{\bf is} a challenge in theorem provers!  And the
+real value of the paper and its novelty is that this is a surprising fact, even for experts
+in regular languages, as the report of Referee No 3 shows. Because it is
+surprising, we really think the paper should appear in a more general journal than one
+specialised on theorem provers (after all, many theorem prover people already
+appreciate the difficulties; they
+just did not know of a way how to overcome them elegantly).
+We hope you share with us the belief that theorem prover proofs might not actually
+be used in all subjects, but they are
+the gold standard of proofs and all researchers should be aware of them.\medskip
+We therefore kindly ask you to reconsider the decision from this
+point of view and also ask you to consider the supporting
+numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge.
+We list them below again for quick reference.
+We are also happy to implement every improvement you suggest that make our general point
+and proofs clearer.\medskip
+We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAs
+in approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of
+Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our
+paper describes a formalised proof of the most central property in regular language
+theory taking only 1500(!) lines in Isabelle including all definitions and supporting
+lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted
+to Isabelle, as the Reviewers assert, and that you do not have to take
+our word for it, we list below numbers and comments
+made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to
+\underline{\bf Matita}, for example,  and therefore our paper is written to be as much as possible
+agnostic with respect to a single theorem prover. \medskip
+We answer below the specific points the Reviewers raised. We also hope to surprise again
+Referee No 3
+in that automata complementation is more tricky than it first might appear and
+most textbooks would suggest.\bigskip
+\newpage
+\noindent
+\underline{\bf Reviewer 1}\medskip
+\begin{itemize}
+\item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 are
+new; the introduction and conclusion are extended with more numbers and comments, which should prove
+beyond doubt our general point that automata are difficult to reason about and regular expressions are
+a better substitute
+\item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time''
+but that they proved with numbers and an actual implementation that in the context of lexing, derivatives are
+good enough in practice; we believe that this statement is still accurate and it is the only paper that we could
+cite for this
+\item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear
+grammars)}''
+\end{itemize}\bigskip
+\noindent
+\underline{\bf Reviewer 2}\medskip
+\begin{itemize}
+\item we hope the Reviewer agrees with us that an actual deep analysis of the technical
+details and a comparison with Coq/Sreflect would be a major distraction from the main
+point that the paper wants to convey;
+such an analysis would also be in violation with the editor's guidelines for ToCL
+\end{itemize}\bigskip
+\noindent
+\underline{\bf Reviewer 3}\medskip
+\noindent
+We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much
+agreed with our view we helt before we started this work---automata theory is
+more than 50 years old, surely nothing can be challenging there. We hope, however, the
+following comments by independent Coq/Nuprl/Isabelle-researchers destroy this view:
+\begin{itemize}
+\item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers
+(our formalisation is 1500 loc):}\medskip
+\item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including
+Myhill-Nerode: he writes that our formalisation based on
+regular expressions ``{\it is more concise}'' than his; he also writes
+there are no problems with standard textbook proofs,
+except for the ``{\it {\bf intrinsic} difficulties of working with
+rectangular matrices}'' in Coq ($>$10k loc)
+\item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices
+leading only to Kleene's theorem, but not
+Myhill-Nerode nor closure under complementation: he
+writes his formalisation based on matrices is
+``{\it rather big}'' ($>$4.5k loc)
+\item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives
+automata ($>$10k lines of code using matrices)
+\item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs
+11 months with Nuprl for chapters 1 - 11 from Hopcroft
+\& Ullman, which includes Myhill-Nerode ($>$100k lines
+of code)
+\item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata
+library (7k loc) and verified Hopcroft's algorithm
+(additional 7k loc)
+\item[{\bf Isabelle:}]
+\citet{Nipkow98} formalises automata with functions but writes
+``{\it proofs require a painful amount of detail}''
+\end{itemize}
+\noindent
+These numbers and comments should
+convincingly negate the premises of the following comments from the judgement
+by Reviewer No 3:
+\begin{itemize}
+\item ``{\it all formalised mathematical results are well known and established
+since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about
+theorem prover proofs - there issues are interesting and we have not seen
+any previous work that took our view of starting with a definition for regular expressions only})
+\item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
+(e.g., see the one given on wikipedia) ; being able to formalise it
+is not a real challenge from my point of view.}'' {(\bf not true)}
+\item ``{\it the followed approach is highly specific to Isabelle/HOL: it is
+driven by the postulate that it's hard to manipulate automata or
+matrices.}'' ({\bf not true})
+\item ``{\it despite what the authors claim, I'm certain that a mechanised proof
+of this Theorem could be straightforward in other theorem provers,
+using automata for the right-to-left direction and partial
+derivatives for the left-to-right direction''} {\bf (not true)}
+\end{itemize}
+\noindent
+Reviewer 3 also raised the following criticisms in his judgement:
+\begin{itemize}
+\item ``{\it Moreover, the authors insist on the fact that they don't want to use
+automata, while they actually just rephrase automata-based proofs,
+without using the word `automaton'}''
+We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automaton
+is. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble
+with formalised proofs as the loc-numbers above show in comparison to ours.
+\item ``{\it All in all, the paper should be presented as `here is a way of doing
+automata in HOL'. I'm sure this would make it much clearer and more
+interesting, notably since doing automata is supposed to be difficult
+in HOL.}''
+The fundamental difference is that it is easier in {\bf every} theorem prover to
+reason about inductive datatypes in comparison with non-inductively defined structures
+(graphs, functions, matrices). The specific problem the paper addresses is that
+all textbook proofs of Myhill-Nerode start with a definition of automata leading
+to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of.
+\end{itemize}\bigskip
+\subsection*{Complementation of Automata}
+Finally, we like to comment on the difficulty about complementation of automata, which
+on {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook
+reasoning and formal proofs in a theorem prover:
+\begin{quote}\it
+``if we take the definition `a language is regular if recognised by
+a finite deterministic automaton (DFA)' then closure under
+complementation is **trivial** (revert the accepting status of
+states, that's it)''
+\end{quote}
+\noindent
+Given a representation of automata as graphs or
+matrices, then the following
+\begin{center}
+\begin{tikzpicture}[scale=3, line width=0.7mm]
+\node[state, initial]        (q0) at ( 0,1) {$q_0$};
+\node[state, accepting]  (q1) at ( 1,1) {$q_1$};
+\path[->] (q0) edge node[above] {$a$} (q1)
+(q1) edge [loop right] node {$b$} ()
+;
+\end{tikzpicture}
+\end{center}
+\noindent
+fits the definition as a graph or matrix of nodes. However interchanging the terminal and
+non-terminal states gives the {\bf incorrect} result. What is needed is the
+notion of a {\bf completed} automaton, such as
+\begin{center}
+\begin{tikzpicture}[scale=3, line width=0.7mm]
+\node[state, initial]        (q0) at ( 0,1) {$q_0$};
+\node[state, accepting]  (q1) at ( 1,1) {$q_1$};
+\node[state]        (q3) at (0.5, 1.5) {$q_{sink}$};
+\path[->] (q0) edge node[above] {$a$} (q1)
+(q1) edge [loop right] node {$b$} ()
+(q0) edge node[above] {$b$} (q3)
+(q1) edge node[above] {$a$} (q3)
+(q3) edge [loop above] node {$a, b$} ()
+;
+\end{tikzpicture}
+\end{center}
+\noindent
+This additional restriction unfortunately percolates  through formalised proofs.  In the work by
+\citet{LammichTuerk12}, for example, this completeness constrain is  mentioned in 160 places
+out of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restriction
+about finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitions
+based on matrices in Coq: they need to formalise the shape of matrices and carry around
+this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in
+Coq point out as clunky).  Also, attempting to imitate the textbook definition of a finite
+function of type state $\times$ letter $\rightarrow$ state, which by the way has not been
+used by anybody who formalised automata theory in Coq, does not solve the difficulties: since
+no theorem prover has a finite-function type built in, the additional constraint about finite
+domains need to be carried around, again making proofs clunky.
+Regular expressions really solve all the problems mentioned above: they are inductive
+datatypes in all theorem provers, they come with simple structural induction principles
+for free and operations can be easily defined without any restrictions.  The concrete question
+the paper answers is whether regular expressions are enough to obtain the central result in
+regular language theory side-stepping the definition of automata which, according to
+all previous work,  leads to cumbersome reasoning in all theorem provers. The answer we give is yes:
+If you are in the business of verifying Hopcroft's algorithm there is no way around automata.
+But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}.
+We have not found this answer in any of the textbooks or papers available to us. We are happy to
+be proved otherwise.
+\bibliographystyle{apa}
+\bibliography{root}
+\end{document}