Journal/Response/response.tex
changeset 374 01d223421ba0
equal deleted inserted replaced
373:0679a84b11ad 374:01d223421ba0
       
     1 \documentclass[12pt]{article}
       
     2 \usepackage{times}
       
     3 \usepackage{a4}
       
     4 \usepackage{hyperref}
       
     5 \usepackage[comma,square]{natbib}
       
     6 \usepackage{tikz}
       
     7 \usetikzlibrary{automata}
       
     8 
       
     9 \begin{document}
       
    10 
       
    11 \noindent
       
    12 {\bf Dear Dale, dear Reviewers,}\bigskip
       
    13 
       
    14 \noindent
       
    15 Thank you very much for your time and effort in reviewing this paper. We really appreciate
       
    16 this. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account 
       
    17 when reaching your decision, and that Referee No 3, who we assume has extensive expertise in 
       
    18 regular languages, has proved our point in his review. He brilliantly writes:
       
    19 
       
    20 \begin{quote}\it
       
    21 ``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
       
    22   (e.g., see the one given on wikipedia) ; being able to formalise it
       
    23   is not a real challenge from my point of view.''
       
    24 \end{quote}
       
    25 
       
    26 \noindent
       
    27 The problem is it \underline{\bf is} a challenge in theorem provers!  And the 
       
    28 real value of the paper and its novelty is that this is a surprising fact, even for experts
       
    29 in regular languages, as the report of Referee No 3 shows. Because it is 
       
    30 surprising, we really think the paper should appear in a more general journal than one
       
    31 specialised on theorem provers (after all, many theorem prover people already 
       
    32 appreciate the difficulties; they 
       
    33 just did not know of a way how to overcome them elegantly).
       
    34 We hope you share with us the belief that theorem prover proofs might not actually 
       
    35 be used in all subjects, but they are 
       
    36 the gold standard of proofs and all researchers should be aware of them.\medskip
       
    37 
       
    38 We therefore kindly ask you to reconsider the decision from this
       
    39 point of view and also ask you to consider the supporting 
       
    40 numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge.  
       
    41 We list them below again for quick reference.
       
    42 We are also happy to implement every improvement you suggest that make our general point
       
    43 and proofs clearer.\medskip
       
    44 
       
    45 We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAs
       
    46 in approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of 
       
    47 Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our
       
    48 paper describes a formalised proof of the most central property in regular language
       
    49 theory taking only 1500(!) lines in Isabelle including all definitions and supporting
       
    50 lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted 
       
    51 to Isabelle, as the Reviewers assert, and that you do not have to take
       
    52 our word for it, we list below numbers and comments 
       
    53 made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to 
       
    54 \underline{\bf Matita}, for example,  and therefore our paper is written to be as much as possible 
       
    55 agnostic with respect to a single theorem prover. \medskip
       
    56 
       
    57 We answer below the specific points the Reviewers raised. We also hope to surprise again
       
    58  Referee No 3
       
    59 in that automata complementation is more tricky than it first might appear and 
       
    60 most textbooks would suggest.\bigskip
       
    61 
       
    62 
       
    63 \newpage
       
    64 \noindent
       
    65 \underline{\bf Reviewer 1}\medskip
       
    66 
       
    67 \begin{itemize}
       
    68 \item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 are
       
    69 new; the introduction and conclusion are extended with more numbers and comments, which should prove
       
    70 beyond doubt our general point that automata are difficult to reason about and regular expressions are
       
    71 a better substitute
       
    72 \item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time''
       
    73 but that they proved with numbers and an actual implementation that in the context of lexing, derivatives are
       
    74 good enough in practice; we believe that this statement is still accurate and it is the only paper that we could
       
    75 cite for this
       
    76 \item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear 
       
    77 grammars)}''
       
    78 \end{itemize}\bigskip
       
    79 
       
    80 \noindent
       
    81 \underline{\bf Reviewer 2}\medskip
       
    82 
       
    83 \begin{itemize}
       
    84 \item we hope the Reviewer agrees with us that an actual deep analysis of the technical
       
    85 details and a comparison with Coq/Sreflect would be a major distraction from the main
       
    86 point that the paper wants to convey;
       
    87 such an analysis would also be in violation with the editor's guidelines for ToCL
       
    88 \end{itemize}\bigskip
       
    89 
       
    90 \noindent
       
    91 \underline{\bf Reviewer 3}\medskip
       
    92 
       
    93 \noindent
       
    94 We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much 
       
    95 agreed with our view we helt before we started this work---automata theory is
       
    96 more than 50 years old, surely nothing can be challenging there. We hope, however, the
       
    97 following comments by independent Coq/Nuprl/Isabelle-researchers destroy this view:
       
    98 
       
    99 \begin{itemize}
       
   100 \item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers
       
   101 (our formalisation is 1500 loc):}\medskip
       
   102 
       
   103 \item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including  
       
   104    Myhill-Nerode: he writes that our formalisation based on
       
   105    regular expressions ``{\it is more concise}'' than his; he also writes
       
   106    there are no problems with standard textbook proofs,
       
   107    except for the ``{\it {\bf intrinsic} difficulties of working with 
       
   108    rectangular matrices}'' in Coq ($>$10k loc)
       
   109 
       
   110 
       
   111 \item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices
       
   112    leading only to Kleene's theorem, but not 
       
   113    Myhill-Nerode nor closure under complementation: he 
       
   114    writes his formalisation based on matrices is 
       
   115    ``{\it rather big}'' ($>$4.5k loc)
       
   116  
       
   117 \item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives
       
   118    automata ($>$10k lines of code using matrices)
       
   119 
       
   120 \item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs
       
   121    11 months with Nuprl for chapters 1 - 11 from Hopcroft 
       
   122    \& Ullman, which includes Myhill-Nerode ($>$100k lines
       
   123    of code)
       
   124 
       
   125 \item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata 
       
   126    library (7k loc) and verified Hopcroft's algorithm 
       
   127    (additional 7k loc)
       
   128 
       
   129 \item[{\bf Isabelle:}]  
       
   130  \citet{Nipkow98} formalises automata with functions but writes
       
   131    ``{\it proofs require a painful amount of detail}''
       
   132 \end{itemize}
       
   133 
       
   134 \noindent
       
   135 These numbers and comments should 
       
   136 convincingly negate the premises of the following comments from the judgement
       
   137 by Reviewer No 3:
       
   138 
       
   139 \begin{itemize}
       
   140 \item ``{\it all formalised mathematical results are well known and established
       
   141   since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about 
       
   142   theorem prover proofs - there issues are interesting and we have not seen
       
   143    any previous work that took our view of starting with a definition for regular expressions only})
       
   144 \item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof
       
   145   (e.g., see the one given on wikipedia) ; being able to formalise it
       
   146   is not a real challenge from my point of view.}'' {(\bf not true)}
       
   147 \item ``{\it the followed approach is highly specific to Isabelle/HOL: it is
       
   148   driven by the postulate that it's hard to manipulate automata or
       
   149   matrices.}'' ({\bf not true})
       
   150 \item ``{\it despite what the authors claim, I'm certain that a mechanised proof
       
   151   of this Theorem could be straightforward in other theorem provers, 
       
   152   using automata for the right-to-left direction and partial
       
   153   derivatives for the left-to-right direction''} {\bf (not true)}
       
   154 \end{itemize}
       
   155 
       
   156 \noindent
       
   157 Reviewer 3 also raised the following criticisms in his judgement:
       
   158 
       
   159 \begin{itemize}
       
   160 \item ``{\it Moreover, the authors insist on the fact that they don't want to use
       
   161 automata, while they actually just rephrase automata-based proofs,
       
   162 without using the word `automaton'}''
       
   163 
       
   164 We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automaton
       
   165 is. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble 
       
   166 with formalised proofs as the loc-numbers above show in comparison to ours.
       
   167 
       
   168 \item ``{\it All in all, the paper should be presented as `here is a way of doing
       
   169 automata in HOL'. I'm sure this would make it much clearer and more
       
   170 interesting, notably since doing automata is supposed to be difficult
       
   171 in HOL.}''
       
   172 
       
   173 The fundamental difference is that it is easier in {\bf every} theorem prover to 
       
   174 reason about inductive datatypes in comparison with non-inductively defined structures
       
   175 (graphs, functions, matrices). The specific problem the paper addresses is that
       
   176 all textbook proofs of Myhill-Nerode start with a definition of automata leading 
       
   177 to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of.
       
   178 \end{itemize}\bigskip
       
   179 
       
   180 \subsection*{Complementation of Automata}
       
   181 
       
   182 Finally, we like to comment on the difficulty about complementation of automata, which
       
   183 on {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook 
       
   184 reasoning and formal proofs in a theorem prover:
       
   185 
       
   186 \begin{quote}\it 
       
   187 ``if we take the definition `a language is regular if recognised by
       
   188     a finite deterministic automaton (DFA)' then closure under 
       
   189     complementation is **trivial** (revert the accepting status of 
       
   190     states, that's it)''
       
   191 \end{quote}    
       
   192     
       
   193 \noindent
       
   194 Given a representation of automata as graphs or
       
   195 matrices, then the following 
       
   196 
       
   197 \begin{center}
       
   198 \begin{tikzpicture}[scale=3, line width=0.7mm]
       
   199   \node[state, initial]        (q0) at ( 0,1) {$q_0$};
       
   200   \node[state, accepting]  (q1) at ( 1,1) {$q_1$};
       
   201   \path[->] (q0) edge node[above] {$a$} (q1)
       
   202                    (q1) edge [loop right] node {$b$} ()
       
   203                   ;
       
   204 \end{tikzpicture}
       
   205 \end{center}
       
   206 
       
   207 \noindent
       
   208 fits the definition as a graph or matrix of nodes. However interchanging the terminal and
       
   209 non-terminal states gives the {\bf incorrect} result. What is needed is the
       
   210 notion of a {\bf completed} automaton, such as
       
   211  
       
   212 \begin{center}
       
   213 \begin{tikzpicture}[scale=3, line width=0.7mm]
       
   214   \node[state, initial]        (q0) at ( 0,1) {$q_0$};
       
   215   \node[state, accepting]  (q1) at ( 1,1) {$q_1$};
       
   216   \node[state]        (q3) at (0.5, 1.5) {$q_{sink}$};
       
   217 
       
   218   \path[->] (q0) edge node[above] {$a$} (q1)
       
   219                    (q1) edge [loop right] node {$b$} ()
       
   220                    (q0) edge node[above] {$b$} (q3) 
       
   221                    (q1) edge node[above] {$a$} (q3) 
       
   222                     (q3) edge [loop above] node {$a, b$} ()
       
   223                   ;
       
   224 \end{tikzpicture}
       
   225 \end{center}
       
   226     
       
   227 \noindent
       
   228 This additional restriction unfortunately percolates  through formalised proofs.  In the work by
       
   229 \citet{LammichTuerk12}, for example, this completeness constrain is  mentioned in 160 places
       
   230 out of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restriction
       
   231 about finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitions
       
   232 based on matrices in Coq: they need to formalise the shape of matrices and carry around
       
   233 this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in
       
   234 Coq point out as clunky).  Also, attempting to imitate the textbook definition of a finite
       
   235 function of type state $\times$ letter $\rightarrow$ state, which by the way has not been 
       
   236 used by anybody who formalised automata theory in Coq, does not solve the difficulties: since
       
   237 no theorem prover has a finite-function type built in, the additional constraint about finite
       
   238 domains need to be carried around, again making proofs clunky.
       
   239  
       
   240 Regular expressions really solve all the problems mentioned above: they are inductive
       
   241 datatypes in all theorem provers, they come with simple structural induction principles 
       
   242 for free and operations can be easily defined without any restrictions.  The concrete question
       
   243 the paper answers is whether regular expressions are enough to obtain the central result in
       
   244 regular language theory side-stepping the definition of automata which, according to
       
   245 all previous work,  leads to cumbersome reasoning in all theorem provers. The answer we give is yes:
       
   246 If you are in the business of verifying Hopcroft's algorithm there is no way around automata.
       
   247 But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}.
       
   248 We have not found this answer in any of the textbooks or papers available to us. We are happy to
       
   249 be proved otherwise.
       
   250 
       
   251     
       
   252 \bibliographystyle{apa}
       
   253 \bibliography{root}
       
   254 
       
   255 \end{document}