|
1 \documentclass[12pt]{article} |
|
2 \usepackage{times} |
|
3 \usepackage{a4} |
|
4 \usepackage{hyperref} |
|
5 \usepackage[comma,square]{natbib} |
|
6 \usepackage{tikz} |
|
7 \usetikzlibrary{automata} |
|
8 |
|
9 \begin{document} |
|
10 |
|
11 \noindent |
|
12 {\bf Dear Dale, dear Reviewers,}\bigskip |
|
13 |
|
14 \noindent |
|
15 Thank you very much for your time and effort in reviewing this paper. We really appreciate |
|
16 this. However we passionately believe that the most important aspect of the paper has \underline{\bf no}t been taken into account |
|
17 when reaching your decision, and that Referee No 3, who we assume has extensive expertise in |
|
18 regular languages, has proved our point in his review. He brilliantly writes: |
|
19 |
|
20 \begin{quote}\it |
|
21 ``the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof |
|
22 (e.g., see the one given on wikipedia) ; being able to formalise it |
|
23 is not a real challenge from my point of view.'' |
|
24 \end{quote} |
|
25 |
|
26 \noindent |
|
27 The problem is it \underline{\bf is} a challenge in theorem provers! And the |
|
28 real value of the paper and its novelty is that this is a surprising fact, even for experts |
|
29 in regular languages, as the report of Referee No 3 shows. Because it is |
|
30 surprising, we really think the paper should appear in a more general journal than one |
|
31 specialised on theorem provers (after all, many theorem prover people already |
|
32 appreciate the difficulties; they |
|
33 just did not know of a way how to overcome them elegantly). |
|
34 We hope you share with us the belief that theorem prover proofs might not actually |
|
35 be used in all subjects, but they are |
|
36 the gold standard of proofs and all researchers should be aware of them.\medskip |
|
37 |
|
38 We therefore kindly ask you to reconsider the decision from this |
|
39 point of view and also ask you to consider the supporting |
|
40 numbers, which we provided in the paper in order to prove that formalising automata theory \underline{\bf is} a challenge. |
|
41 We list them below again for quick reference. |
|
42 We are also happy to implement every improvement you suggest that make our general point |
|
43 and proofs clearer.\medskip |
|
44 |
|
45 We can also give one \underline{\bf new} datapoint: \citet{LammichTuerk12} took up the challenge and have formalised a library for NFAs and DFAs |
|
46 in approximately 7000 of lines of Isabelle code---this just sets up the framework. For proving the correctness of |
|
47 Hopcroft's algorithm they need an additional 7000 lines of Isabelle code. In contrast, our |
|
48 paper describes a formalised proof of the most central property in regular language |
|
49 theory taking only 1500(!) lines in Isabelle including all definitions and supporting |
|
50 lemmas. However, the difficulties with formalising automata theory are \underline{\bf not} restricted |
|
51 to Isabelle, as the Reviewers assert, and that you do not have to take |
|
52 our word for it, we list below numbers and comments |
|
53 made by researchers from \underline{\bf Coq}, \underline{\bf Isabelle} and \underline{\bf Nuprl}. We think similar comments would apply to |
|
54 \underline{\bf Matita}, for example, and therefore our paper is written to be as much as possible |
|
55 agnostic with respect to a single theorem prover. \medskip |
|
56 |
|
57 We answer below the specific points the Reviewers raised. We also hope to surprise again |
|
58 Referee No 3 |
|
59 in that automata complementation is more tricky than it first might appear and |
|
60 most textbooks would suggest.\bigskip |
|
61 |
|
62 |
|
63 \newpage |
|
64 \noindent |
|
65 \underline{\bf Reviewer 1}\medskip |
|
66 |
|
67 \begin{itemize} |
|
68 \item additions with respect to ITP: section 4 gives now a more detailed proof; section 5 and 6 are |
|
69 new; the introduction and conclusion are extended with more numbers and comments, which should prove |
|
70 beyond doubt our general point that automata are difficult to reason about and regular expressions are |
|
71 a better substitute |
|
72 \item we did {\bf not} cite \citet{OwensReppyTuron09} for the comment that derivatives ``went lost in the sands of time'' |
|
73 but that they proved with numbers and an actual implementation that in the context of lexing, derivatives are |
|
74 good enough in practice; we believe that this statement is still accurate and it is the only paper that we could |
|
75 cite for this |
|
76 \item we would be grateful to receive a concrete pointer to the literature: ``{\it The technique is folklore, and can be found in traditional textbooks (possibly passing through left linear |
|
77 grammars)}'' |
|
78 \end{itemize}\bigskip |
|
79 |
|
80 \noindent |
|
81 \underline{\bf Reviewer 2}\medskip |
|
82 |
|
83 \begin{itemize} |
|
84 \item we hope the Reviewer agrees with us that an actual deep analysis of the technical |
|
85 details and a comparison with Coq/Sreflect would be a major distraction from the main |
|
86 point that the paper wants to convey; |
|
87 such an analysis would also be in violation with the editor's guidelines for ToCL |
|
88 \end{itemize}\bigskip |
|
89 |
|
90 \noindent |
|
91 \underline{\bf Reviewer 3}\medskip |
|
92 |
|
93 \noindent |
|
94 We honestly enjoyed the comments by Reviewer 3 very much, since they pretty much |
|
95 agreed with our view we helt before we started this work---automata theory is |
|
96 more than 50 years old, surely nothing can be challenging there. We hope, however, the |
|
97 following comments by independent Coq/Nuprl/Isabelle-researchers destroy this view: |
|
98 |
|
99 \begin{itemize} |
|
100 \item[] {\bf Numbers and comments from the literature that reasoning about automata is challenging in theorem provers |
|
101 (our formalisation is 1500 loc):}\medskip |
|
102 |
|
103 \item[{\bf Coq:}] \citet{Braibant12} formalises automata theory including |
|
104 Myhill-Nerode: he writes that our formalisation based on |
|
105 regular expressions ``{\it is more concise}'' than his; he also writes |
|
106 there are no problems with standard textbook proofs, |
|
107 except for the ``{\it {\bf intrinsic} difficulties of working with |
|
108 rectangular matrices}'' in Coq ($>$10k loc) |
|
109 |
|
110 |
|
111 \item [{\bf Coq:}] \citet{Filliatre97} formalised automata theory using matrices |
|
112 leading only to Kleene's theorem, but not |
|
113 Myhill-Nerode nor closure under complementation: he |
|
114 writes his formalisation based on matrices is |
|
115 ``{\it rather big}'' ($>$4.5k loc) |
|
116 |
|
117 \item[{\bf Coq:}] \citet{Almeidaetal10} verify Mirkin's partial derivatives |
|
118 automata ($>$10k lines of code using matrices) |
|
119 |
|
120 \item [{\bf Nuprl:}] \citet{Constable00} estimate their 4-member team needs |
|
121 11 months with Nuprl for chapters 1 - 11 from Hopcroft |
|
122 \& Ullman, which includes Myhill-Nerode ($>$100k lines |
|
123 of code) |
|
124 |
|
125 \item[{\bf Isabelle:}] \citet{LammichTuerk12} have in Isabelle formalised an automata |
|
126 library (7k loc) and verified Hopcroft's algorithm |
|
127 (additional 7k loc) |
|
128 |
|
129 \item[{\bf Isabelle:}] |
|
130 \citet{Nipkow98} formalises automata with functions but writes |
|
131 ``{\it proofs require a painful amount of detail}'' |
|
132 \end{itemize} |
|
133 |
|
134 \noindent |
|
135 These numbers and comments should |
|
136 convincingly negate the premises of the following comments from the judgement |
|
137 by Reviewer No 3: |
|
138 |
|
139 \begin{itemize} |
|
140 \item ``{\it all formalised mathematical results are well known and established |
|
141 since more than fifty years''} ({\bf we do not contest this nor claim otherwise; we talk about |
|
142 theorem prover proofs - there issues are interesting and we have not seen |
|
143 any previous work that took our view of starting with a definition for regular expressions only}) |
|
144 \item ``{\it the Myhill-Nerode theorem admits a 15-20 lines fairly detailed proof |
|
145 (e.g., see the one given on wikipedia) ; being able to formalise it |
|
146 is not a real challenge from my point of view.}'' {(\bf not true)} |
|
147 \item ``{\it the followed approach is highly specific to Isabelle/HOL: it is |
|
148 driven by the postulate that it's hard to manipulate automata or |
|
149 matrices.}'' ({\bf not true}) |
|
150 \item ``{\it despite what the authors claim, I'm certain that a mechanised proof |
|
151 of this Theorem could be straightforward in other theorem provers, |
|
152 using automata for the right-to-left direction and partial |
|
153 derivatives for the left-to-right direction''} {\bf (not true)} |
|
154 \end{itemize} |
|
155 |
|
156 \noindent |
|
157 Reviewer 3 also raised the following criticisms in his judgement: |
|
158 |
|
159 \begin{itemize} |
|
160 \item ``{\it Moreover, the authors insist on the fact that they don't want to use |
|
161 automata, while they actually just rephrase automata-based proofs, |
|
162 without using the word `automaton'}'' |
|
163 |
|
164 We did not deliberately avoid the word ``automaton'', but the {\bf definition} of what an automaton |
|
165 is. As soon as we start with such a definition, be it in terms of functions, matrices or graphs, we would be in trouble |
|
166 with formalised proofs as the loc-numbers above show in comparison to ours. |
|
167 |
|
168 \item ``{\it All in all, the paper should be presented as `here is a way of doing |
|
169 automata in HOL'. I'm sure this would make it much clearer and more |
|
170 interesting, notably since doing automata is supposed to be difficult |
|
171 in HOL.}'' |
|
172 |
|
173 The fundamental difference is that it is easier in {\bf every} theorem prover to |
|
174 reason about inductive datatypes in comparison with non-inductively defined structures |
|
175 (graphs, functions, matrices). The specific problem the paper addresses is that |
|
176 all textbook proofs of Myhill-Nerode start with a definition of automata leading |
|
177 to cumbersome theorem prover proofs in {\bf all} theorem provers we are aware of. |
|
178 \end{itemize}\bigskip |
|
179 |
|
180 \subsection*{Complementation of Automata} |
|
181 |
|
182 Finally, we like to comment on the difficulty about complementation of automata, which |
|
183 on {\bf paper} looks trivial and which highlights the differences between pencil-and-paper textbook |
|
184 reasoning and formal proofs in a theorem prover: |
|
185 |
|
186 \begin{quote}\it |
|
187 ``if we take the definition `a language is regular if recognised by |
|
188 a finite deterministic automaton (DFA)' then closure under |
|
189 complementation is **trivial** (revert the accepting status of |
|
190 states, that's it)'' |
|
191 \end{quote} |
|
192 |
|
193 \noindent |
|
194 Given a representation of automata as graphs or |
|
195 matrices, then the following |
|
196 |
|
197 \begin{center} |
|
198 \begin{tikzpicture}[scale=3, line width=0.7mm] |
|
199 \node[state, initial] (q0) at ( 0,1) {$q_0$}; |
|
200 \node[state, accepting] (q1) at ( 1,1) {$q_1$}; |
|
201 \path[->] (q0) edge node[above] {$a$} (q1) |
|
202 (q1) edge [loop right] node {$b$} () |
|
203 ; |
|
204 \end{tikzpicture} |
|
205 \end{center} |
|
206 |
|
207 \noindent |
|
208 fits the definition as a graph or matrix of nodes. However interchanging the terminal and |
|
209 non-terminal states gives the {\bf incorrect} result. What is needed is the |
|
210 notion of a {\bf completed} automaton, such as |
|
211 |
|
212 \begin{center} |
|
213 \begin{tikzpicture}[scale=3, line width=0.7mm] |
|
214 \node[state, initial] (q0) at ( 0,1) {$q_0$}; |
|
215 \node[state, accepting] (q1) at ( 1,1) {$q_1$}; |
|
216 \node[state] (q3) at (0.5, 1.5) {$q_{sink}$}; |
|
217 |
|
218 \path[->] (q0) edge node[above] {$a$} (q1) |
|
219 (q1) edge [loop right] node {$b$} () |
|
220 (q0) edge node[above] {$b$} (q3) |
|
221 (q1) edge node[above] {$a$} (q3) |
|
222 (q3) edge [loop above] node {$a, b$} () |
|
223 ; |
|
224 \end{tikzpicture} |
|
225 \end{center} |
|
226 |
|
227 \noindent |
|
228 This additional restriction unfortunately percolates through formalised proofs. In the work by |
|
229 \citet{LammichTuerk12}, for example, this completeness constrain is mentioned in 160 places |
|
230 out of almost 600 lemmas. Since, their representation is based on graphs (sets of nodes and edges), the restriction |
|
231 about finitely many nodes is mentioned in another 108 places. Similarly, badly fare the definitions |
|
232 based on matrices in Coq: they need to formalise the shape of matrices and carry around |
|
233 this baggage in their proofs and operations (which {\bf all} authors of formalised automata theory in |
|
234 Coq point out as clunky). Also, attempting to imitate the textbook definition of a finite |
|
235 function of type state $\times$ letter $\rightarrow$ state, which by the way has not been |
|
236 used by anybody who formalised automata theory in Coq, does not solve the difficulties: since |
|
237 no theorem prover has a finite-function type built in, the additional constraint about finite |
|
238 domains need to be carried around, again making proofs clunky. |
|
239 |
|
240 Regular expressions really solve all the problems mentioned above: they are inductive |
|
241 datatypes in all theorem provers, they come with simple structural induction principles |
|
242 for free and operations can be easily defined without any restrictions. The concrete question |
|
243 the paper answers is whether regular expressions are enough to obtain the central result in |
|
244 regular language theory side-stepping the definition of automata which, according to |
|
245 all previous work, leads to cumbersome reasoning in all theorem provers. The answer we give is yes: |
|
246 If you are in the business of verifying Hopcroft's algorithm there is no way around automata. |
|
247 But for Myhill-Nerode there is, and the work involved is reduced by roughly a {\bf factor of 5}. |
|
248 We have not found this answer in any of the textbooks or papers available to us. We are happy to |
|
249 be proved otherwise. |
|
250 |
|
251 |
|
252 \bibliographystyle{apa} |
|
253 \bibliography{root} |
|
254 |
|
255 \end{document} |