|
1 |
|
2 > Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be |
|
3 > proved without relying on the notion of automata and b) argues |
|
4 > shallowly thorough the paper that not having to formalize the notion |
|
5 > of automata is easier. |
|
6 > |
|
7 > While I do agree that point a) is a an interesting theoretical |
|
8 > contribution that deserves publication, I disagree with the second |
|
9 > argument. |
|
10 |
|
11 We are very grateful to the reviewer deciding the paper is a worthy |
|
12 contribution. We are especially grateful since this opinion is despite |
|
13 disagreement with parts of the paper. Thank you very much! |
|
14 |
|
15 > It should be noted that the handling of the notion of automata has |
|
16 > been difficult historically for two reasons. On the one hand, giving |
|
17 > an identifier to every state of an automata indeed yields proofs that |
|
18 > may be characterized as "clunky" (even if the wording is a bit |
|
19 > strong), but is a necessary condition for efficient implementation of |
|
20 > automata algorithms. This yields sizable formalizations as is |
|
21 > demonstrated by the Lammich and Tuerk's work or Braibant and Pous's |
|
22 > work [1]. On the other hand, it seems to be the case that the |
|
23 > "principle of definition" in HOL makes it impossible to use parametric |
|
24 > definitions of automata (but there is no problem with such a |
|
25 > definition in Coq). |
|
26 |
|
27 We agree with this summary. We cited the work by Doczal et al and wrote |
|
28 that their quantification over types is not available to us. |
|
29 |
|
30 Systems such as Coq permit quantification over types and thus can |
|
31 state such a definition. This has been recently exploited in an elegant |
|
32 and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal |
|
33 et al. (2013), but as said this is not available to us working in |
|
34 Isabelle/HOL. |
|
35 |
|
36 > But, the startup cost of a formalization depends on the scope of what |
|
37 > one intends to formalize. Perhaps Braibant concedes that your |
|
38 > development is more concise that his, yet Braibant's work dealt with |
|
39 > the formalization of an _efficient_ decision procedure for Kleene |
|
40 > algebras that required _two_ formalizations of automata (the matrices |
|
41 > are only used in the proof of correctness of this decision procedure, |
|
42 > while the computational part uses efficient representations for DAGs |
|
43 > -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only |
|
44 > adding emphasis here to demonstrate that this does not compare easily |
|
45 > with the formalization of the proof of Myhill-Nerode by itself. |
|
46 |
|
47 We have attempted to not touch upon efficiency and algorithmic issues, since |
|
48 they are an distraction for our topic at hand, that is reasoning about inductive |
|
49 structures, of which regular expressions are one instance. If you use more |
|
50 efficient datastructures, then it is expected that reasoning will be |
|
51 more difficult. The point of the paper is to prove the Myhill-Nerode theorem |
|
52 with only inductive structures, which are well-supported in all interactive |
|
53 theorem provers we are aware off. |
|
54 |
|
55 > (At this point, let me comment on your answer: "As far as we know |
|
56 > Braibant also uses ssreflect". It is not the case. One may wonder to |
|
57 > what extent the use of ssreflect there could have alleviated the |
|
58 > "intrinsic difficulties of working with rectangular matrices".) |
|
59 |
|
60 Noted. We will not make this claim and have not done so in the paper. |
|
61 |
|
62 > In the same fashion, Lammich and Tuerk's work uses efficient |
|
63 > data-structures (.e.g, for collections, automatas), and some amount of |
|
64 > data-refinement to link various representations together. |
|
65 |
|
66 Please note that the 14 kloc we cited for the Lammich/Tuerk work is |
|
67 the automata library, not the formalisation built on top of it. |
|
68 Also Lammich and Tuerk have told us that reasoning about graphs |
|
69 with state names is undesirable. If they had a better representation |
|
70 available (which they had _not_, since they formalise Hopcroft's |
|
71 algorithm), they would have used this one and establish that it is |
|
72 "equivalent" to the more efficient version with efficient data- |
|
73 structures. In their case there is no alternative formalising a library |
|
74 about graphs etc. We have shown that in case of the Myhill-Nerode theorem |
|
75 there is an alternative. We have been very careful to make claims about |
|
76 being the first ones who have done so. However, we believe we are, |
|
77 but certainly our proofs are not present in the literature (this |
|
78 might have something to do with the fact that the Myhill-Nerode |
|
79 theorem requires the reversal of Brzozowski and Arden...see |
|
80 comment further below). |
|
81 |
|
82 > My most stringent remark about the paper is that it should make clear |
|
83 > from the beginning whether or not the algorithms are computational, |
|
84 > and if they are executable, how efficient. Then, comparing with the |
|
85 > two aforementioned work will make sense. |
|
86 |
|
87 As stated earlier, we do not like to make computational issues |
|
88 explicit as they are a distraction in our opinion. The logic behind |
|
89 Isabelle/HOL and all HOL-based theorem provers is not a computational logic, |
|
90 in the sense that it is not about computable functions (unlike Coq). There |
|
91 are "uncomputable" features in HOL such as the axiom of choice (which we |
|
92 make use of in our formalisation). We still use the terminology "algorithm" |
|
93 in the widely-used sense of algorithms that contain "do-not-care" |
|
94 nondetermism. For example, unification algorithms are usually explained and |
|
95 proved correct as transition system over sets of equations. They are not |
|
96 executable in the Turing-machine sense, since they leave implicit |
|
97 the order in which the equations should be analysed and which |
|
98 rules should "fire" if there are more than one applicable. They |
|
99 are also "imperfect" as the notion of (potentially infinite) sets |
|
100 cannot be implemented, in general, as a computable function, but |
|
101 they can be reasoned about in Isabelle/HOL and other theorem provers. |
|
102 |
|
103 Still we hope that the reviewer agrees with us and with the many uses |
|
104 in the literature (unification is only one example which commits this |
|
105 "sin"), it is nevertheless perfectly sensible to call such "things" |
|
106 algorithms. Although they cannot be directly implemented, they can be |
|
107 implemented once choices are made explicit and datastructures |
|
108 fixed. Similarly, our formalisation uses a do-not-care approach |
|
109 for building a regular expression out of a set of regular expressions |
|
110 (see equation (2)). It also leaves the order of equations that should be |
|
111 analysed under-specified. But if all these choices are made explicit, we |
|
112 have an algorithm that is implementable in code. But as the explanation |
|
113 already indicates, it is quite a digression from our main topic of |
|
114 reasoning about inductive structures. |
|
115 |
|
116 If we further contemplate the efficiency of the algorithm we |
|
117 (under)specified, then our best guess is that it is exponential in the |
|
118 worst case, just like the "naive" unification algorithm is exponential |
|
119 if sharing is not treated carefully. |
|
120 |
|
121 > In fact, I have been made aware of the following work [2] that do |
|
122 > formalize automata and regular language theory without using explicit |
|
123 > identifiers for state numbers in automata. The upside is that it |
|
124 > yields much smoother proofs than previous works in Coq. The downside |
|
125 > is that this formalization strategy does not yield efficient algorithm |
|
126 > for automata constructions. I reckon that this is a technical report |
|
127 > that should be taken with a grain of salt in the context of this |
|
128 > review, but I think that it gives evidence that it is not always the |
|
129 > case that the startup cost is higher with automata rather than with |
|
130 > regular expressions: they formalize the Myhill-Nerode theorem, |
|
131 > Kleene's theorem, the relationships between various representations of |
|
132 > regular languages in around 1500 lines of ssreflect/Coq. (Again, let |
|
133 > me emphasize that this is not a computational representation of |
|
134 > automata, and thus, it does not easily compare in terms of size with |
|
135 > Lammich and Tuerk's work or Braibant and Pous's work.) I think this |
|
136 > work would make for a much more relevant comparison with your work. |
|
137 |
|
138 We cited this work and commented on it (see above). |
|
139 |
|
140 > I agree that comparing with related work is hard, and that the authors |
|
141 > have already been careful in their comparison with the formalization |
|
142 > sizes from other works. Yet, I am afraid an uninformed outsider would |
|
143 > miss the distinction between related work that are executable and |
|
144 > efficient and this formalization. |
|
145 |
|
146 In order to get meaningful work done, we have to depend on readers |
|
147 carefully reading the paper and obtaining required background material. |
|
148 I am afraid, there is no other way around. |
|
149 |
|
150 > - I am still confused about the explanation about the way the |
|
151 > equational system is set up (your comment in the conclusion does not |
|
152 > help me to pinpoint why you have to do that): I understand that |
|
153 > Brzozowski annotates the final states with lambdas, and propagates |
|
154 > them to the unique initial state. On the other hand you start with a |
|
155 > set of equivalence classes, and you want to compute a regular |
|
156 > expression for all classes that are in finals. In other words, I still |
|
157 > fail to see why the following system does not suit your needs |
|
158 > |
|
159 > X1 = a,X2 + b,X4 |
|
160 > X2 = b,X3 + a,X4 + lambda(ONE) |
|
161 > X3 = a,X4 + b,X4 + lambda(ONE) |
|
162 > X4 = a,X4 + b,X4 |
|
163 > |
|
164 > with L(r,X) = L(r) \cdot X |
|
165 > |
|
166 > and then propagate the equation to compute to compute a value for |
|
167 > X1. (And, you could draw the automata underlying the transitions at |
|
168 > the top of p. 7 to help the reader understand what is going on, and |
|
169 > the relationship with the current equational system.) |
|
170 > |
|
171 > Maybe it is simply that you could have done it, but that it would have |
|
172 > been less pleasant to work with in the context of a Myhill-Nerode |
|
173 > relation? |
|
174 |
|
175 We used in the paper the much stronger formulation of being |
|
176 _forced_ to set up the equational system in our way. And we even |
|
177 now have made the earlier comment about this issue, which was |
|
178 placed in a footnote, to be part of the main text (see paragraph after |
|
179 equation (6)). |
|
180 |
|
181 While your equational system and computations generate a regular |
|
182 expression, it does not take into account the assumption from |
|
183 the first direction of the Myhill-Nerode theorem. Recall that the |
|
184 assumption is that there are finitely many equivalence classes |
|
185 by the Myhill-Nerode relation. Therefore we have to make our equations |
|
186 to be in agreement with these equivalence classes, which the properties |
|
187 in equation (7) and (8) make explicit. We further have to maintain |
|
188 this agreement during the solution of the equational system. |
|
189 |
|
190 Your equational system is _not_ in agreement with the equivalence |
|
191 classes: consider that |
|
192 |
|
193 X1 = {[]} |
|
194 X2 = {[a]} |
|
195 X3 = {[a, b]} |
|
196 X4 = UNIV − {[], [a], [a, b]} |
|
197 |
|
198 hold, which is not true for the equations you have set up (under the |
|
199 natural interpretation you have given). Another point is that it is always the |
|
200 case that only one equivalence class in the Myhill-Nerode theorem contains |
|
201 the empty string. Therefore it cannot be the case that there are more than |
|
202 one (initial) equation containing the marker lambda(ONE), which translates |
|
203 into the empty string under the natural interpretation. So we are |
|
204 of the opinion that for the Myhill-Nerode theorem you are really forced |
|
205 to "reverse" the standard approaches. Please let us know, if this |
|
206 clarifies this issue for you. |
|
207 |
|
208 > - I really appreciate the rephrasing of section 4, especially the |
|
209 > final remark. I think it clearly helps the reader to understand the |
|
210 > proof arguments, and the relationship with automata. |
|
211 |
|
212 Thank you. |
|
213 |
|
214 > - I concur with the second reviewer to say that the executability of |
|
215 > the first half of the M-N theorem should be discussed in the |
|
216 > conclusion. Moreover, I am puzzled by your comments in the response to |
|
217 > reviewers: to what extent does your formalization need to be |
|
218 > _modified_ to make it executable? Do you need to change the code, or |
|
219 > to add more things? (More precisely, you discuss the executability of |
|
220 > the second half of the M-N theorem -- that constructs a partition of |
|
221 > the set of strings -- in the conclusion, in the context of Haftmann |
|
222 > executable sets; |
|
223 |
|
224 The first half is about the Myhill-Nerode relation partitioning the sets of |
|
225 all strings into equivalence classes. Our remark applies to this "half". |
|
226 |
|
227 > you should do the same for the first-half---that |
|
228 > constructs a regular expression). I am not sure I am happy with the |
|
229 > things you wrote about executable sets, because it is not clear from |
|
230 > my point of view if you need to modify the code, or add things. |
|
231 |
|
232 See comment above. We really not want to open up this can of worms. |
|
233 |
|
234 > - More generally, it should be made clear that when you say algorithm, |
|
235 > the reader should not understand it as something that is computable as |
|
236 > it is. In particular, I think that the following sentence is |
|
237 > misleading: "our algorithm for solving equational systems actually |
|
238 > calculates a regular expression for the complement language." You |
|
239 > actually define such an expression, but it seems not possible, as it |
|
240 > is, to open the box to see what is inside. |
|
241 |
|
242 It is possible as long as you make the "do-not-care" non-determinism |
|
243 explicit and fix appropriate datastructures for computations. |
|
244 |
|
245 > - You answered: |
|
246 > > In our opinion, Coquand and Siles' development strongly justifies our |
|
247 > > comment about formalisation of Brzozowski's argument being painful. |
|
248 > > If you like to have this reference added in the paper, we are happy |
|
249 > > to do so. |
|
250 > |
|
251 > I think that it is worth adding a comment like this one either in |
|
252 > section 5 (at the beginning of p. 20) or in the conclusion. |
|
253 |
|
254 We added to page 20: |
|
255 |
|
256 Therefore Coquand and Siles (2011) who follow through this idea |
|
257 had to spend extra effort and first formalise the quite intricate |
|
258 notion of inductively finite sets in order to formalise this property. |
|
259 |
|
260 > - The authors provide a detailed figure for the number of line of code |
|
261 > of Almeida et al's implementation of Mirkin construction, but there |
|
262 > is no such figure for some of other works that are cited. Maybe this |
|
263 > number could be dropped, or similar figures could be given for other |
|
264 > works, in order to give a more accurate picture. |
|
265 |
|
266 We are not 100% sure about this comment. We have not selected the |
|
267 explicit numbers for the Lammich-Tuerk and Almeida et al works because of |
|
268 "name-and-shaming" or because of a "beauty contest". Rather to illustrate |
|
269 a trend. We asked about the numbers for the Lammich-Tuerk work. They |
|
270 also confirmed with us that their automata-with-names approach is painful |
|
271 to work with. Unfortunately they have not written this in their paper |
|
272 such that we could have cited it. Filliatre makes explicitly such a comment |
|
273 in his paper, which we have cited. Braibant equally wrote that he reckons |
|
274 that our formalisation is more concise. There is no way we can give a number |
|
275 in terms of lines-of-code for the formalisation in Nuprl. We believe |
|
276 the number for the Almeida et al work is accurate (and not even the |
|
277 biggest formalisation from the ones listed, but one which is about |
|
278 one specific topic). |
|
279 |
|
280 > - I wonder if you have direction for future works, that would satisfy |
|
281 > your predicate "results from regular language theory, not results |
|
282 > from automata theory". It would be nice to give a few examples if |
|
283 > you have some in mind. |
|
284 |
|
285 We added the work by Sulzmann and Lu who work on regular expression |
|
286 sub-matching using derivatives of regular expressions. We would like |
|
287 to formalise this work. Thank you very much for suggesting future work. |
|
288 |