|         |      1  | 
|         |      2 > Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be | 
|         |      3 > proved without relying on the notion of automata and b) argues | 
|         |      4 > shallowly thorough the paper that not having to formalize the notion | 
|         |      5 > of automata is easier. | 
|         |      6 >  | 
|         |      7 > While I do agree that point a) is a an interesting theoretical | 
|         |      8 > contribution that deserves publication, I disagree with the second | 
|         |      9 > argument. | 
|         |     10  | 
|         |     11 We are very grateful to the reviewer deciding the paper is a worthy  | 
|         |     12 contribution. We are especially grateful since this opinion is despite  | 
|         |     13 disagreement with parts of the paper. Thank you very much! | 
|         |     14  | 
|         |     15 > It should be noted that the handling of the notion of automata has | 
|         |     16 > been difficult historically for two reasons. On the one hand, giving | 
|         |     17 > an identifier to every state of an automata indeed yields proofs that | 
|         |     18 > may be characterized as "clunky" (even if the wording is a bit | 
|         |     19 > strong), but is a necessary condition for efficient implementation of | 
|         |     20 > automata algorithms. This yields sizable formalizations as is | 
|         |     21 > demonstrated by the Lammich and Tuerk's work or Braibant and Pous's | 
|         |     22 > work [1]. On the other hand, it seems to be the case that the | 
|         |     23 > "principle of definition" in HOL makes it impossible to use parametric | 
|         |     24 > definitions of automata (but there is no problem with such a | 
|         |     25 > definition in Coq). | 
|         |     26  | 
|         |     27 We agree with this summary. We cited the work by Doczal et al and wrote  | 
|         |     28 that their quantification over types is not available to us.  | 
|         |     29  | 
|         |     30    Systems such as Coq permit quantification over types and thus can  | 
|         |     31    state such a definition. This has been recently exploited in an elegant  | 
|         |     32    and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal  | 
|         |     33    et al. (2013), but as said this is not available to us working in  | 
|         |     34    Isabelle/HOL. | 
|         |     35  | 
|         |     36 > But, the startup cost of a formalization depends on the scope of what | 
|         |     37 > one intends to formalize. Perhaps Braibant concedes that your | 
|         |     38 > development is more concise that his, yet Braibant's work dealt with | 
|         |     39 > the formalization of an _efficient_ decision procedure for Kleene | 
|         |     40 > algebras that required _two_ formalizations of automata (the matrices | 
|         |     41 > are only used in the proof of correctness of this decision procedure, | 
|         |     42 > while the computational part uses efficient representations for DAGs | 
|         |     43 > -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only | 
|         |     44 > adding emphasis here to demonstrate that this does not compare easily | 
|         |     45 > with the formalization of the proof of Myhill-Nerode by itself. | 
|         |     46  | 
|         |     47 We have attempted to not touch upon efficiency and algorithmic issues, since  | 
|         |     48 they are an distraction for our topic at hand, that is reasoning about inductive | 
|         |     49 structures, of which regular expressions are one instance. If you use more | 
|         |     50 efficient datastructures, then it is expected that reasoning will be | 
|         |     51 more difficult. The point of the paper is to prove the Myhill-Nerode theorem | 
|         |     52 with only inductive structures, which are well-supported in all interactive  | 
|         |     53 theorem provers we are aware off.  | 
|         |     54  | 
|         |     55 > (At this point, let me comment on your answer: "As far as we know | 
|         |     56 > Braibant also uses ssreflect".  It is not the case. One may wonder to | 
|         |     57 > what extent the use of ssreflect there could have alleviated the | 
|         |     58 > "intrinsic difficulties of working with rectangular matrices".) | 
|         |     59  | 
|         |     60 Noted. We will not make this claim and have not done so in the paper. | 
|         |     61  | 
|         |     62 > In the same fashion, Lammich and Tuerk's work uses efficient | 
|         |     63 > data-structures (.e.g, for collections, automatas), and some amount of | 
|         |     64 > data-refinement to link various representations together.  | 
|         |     65  | 
|         |     66 Please note that the 14 kloc we cited for the Lammich/Tuerk work is | 
|         |     67 the automata library, not the formalisation built on top of it. | 
|         |     68 Also Lammich and Tuerk have told us that reasoning about graphs  | 
|         |     69 with state names is undesirable. If they had a better representation | 
|         |     70 available (which they had _not_, since they formalise Hopcroft's | 
|         |     71 algorithm), they would have used this one and establish that it is  | 
|         |     72 "equivalent" to the more efficient version with efficient data- | 
|         |     73 structures. In their case there is no alternative formalising a library | 
|         |     74 about graphs etc. We have shown that in case of the Myhill-Nerode theorem | 
|         |     75 there is an alternative. We have been very careful to make claims about | 
|         |     76 being the first ones who have done so. However, we believe we are, | 
|         |     77 but certainly our proofs are not present in the literature (this | 
|         |     78 might have something to do with the fact that the Myhill-Nerode | 
|         |     79 theorem requires the reversal of Brzozowski and Arden...see | 
|         |     80 comment further below). | 
|         |     81   | 
|         |     82 > My most stringent remark about the paper is that it should make clear | 
|         |     83 > from the beginning whether or not the algorithms are computational, | 
|         |     84 > and if they are executable, how efficient. Then, comparing with the | 
|         |     85 > two aforementioned work will make sense.  | 
|         |     86  | 
|         |     87 As stated earlier, we do not like to make computational issues | 
|         |     88 explicit as they are a distraction in our opinion. The logic behind  | 
|         |     89 Isabelle/HOL and all HOL-based theorem provers is not a computational logic,  | 
|         |     90 in the sense that it is not about computable functions (unlike Coq). There  | 
|         |     91 are "uncomputable" features in HOL such as the axiom of choice (which we | 
|         |     92 make use of in our formalisation). We still use the terminology "algorithm" | 
|         |     93 in the widely-used sense of algorithms that contain "do-not-care" | 
|         |     94 nondetermism. For example, unification algorithms are usually explained and  | 
|         |     95 proved correct as transition system over sets of equations. They are not | 
|         |     96 executable in the Turing-machine sense, since they leave implicit | 
|         |     97 the order in which the equations should be analysed and which | 
|         |     98 rules should "fire" if there are more than one applicable. They | 
|         |     99 are also "imperfect" as the notion of (potentially infinite) sets | 
|         |    100 cannot be implemented, in general, as a computable function, but  | 
|         |    101 they can be reasoned about in Isabelle/HOL and other theorem provers. | 
|         |    102  | 
|         |    103 Still we hope that the reviewer agrees with us and with the many uses  | 
|         |    104 in the literature (unification is only one example which commits this  | 
|         |    105 "sin"), it is nevertheless perfectly sensible to call such "things"  | 
|         |    106 algorithms. Although they cannot be directly implemented, they can be | 
|         |    107 implemented once choices are made explicit and datastructures  | 
|         |    108 fixed. Similarly, our formalisation uses a do-not-care approach | 
|         |    109 for building a regular expression out of a set of regular expressions | 
|         |    110 (see equation (2)).  It also leaves the order of equations that should be  | 
|         |    111 analysed under-specified. But if all these choices are made explicit, we | 
|         |    112 have an algorithm that is implementable in code. But as the explanation | 
|         |    113 already indicates, it is quite a digression from our main topic of | 
|         |    114 reasoning about inductive structures. | 
|         |    115  | 
|         |    116 If we further contemplate the efficiency of the algorithm we  | 
|         |    117 (under)specified, then our best guess is that it is exponential in the  | 
|         |    118 worst case, just like the "naive" unification algorithm is exponential  | 
|         |    119 if sharing is not treated carefully.  | 
|         |    120  | 
|         |    121 > In fact, I have been made aware of the following work [2] that do | 
|         |    122 > formalize automata and regular language theory without using explicit | 
|         |    123 > identifiers for state numbers in automata. The upside is that it | 
|         |    124 > yields much smoother proofs than previous works in Coq. The downside | 
|         |    125 > is that this formalization strategy does not yield efficient algorithm | 
|         |    126 > for automata constructions. I reckon that this is a technical report | 
|         |    127 > that should be taken with a grain of salt in the context of this | 
|         |    128 > review, but I think that it gives evidence that it is not always the | 
|         |    129 > case that the startup cost is higher with automata rather than with | 
|         |    130 > regular expressions: they formalize the Myhill-Nerode theorem, | 
|         |    131 > Kleene's theorem, the relationships between various representations of | 
|         |    132 > regular languages in around 1500 lines of ssreflect/Coq. (Again, let | 
|         |    133 > me emphasize that this is not a computational representation of | 
|         |    134 > automata, and thus, it does not easily compare in terms of size with | 
|         |    135 > Lammich and Tuerk's work or Braibant and Pous's work.) I think this | 
|         |    136 > work would make for a much more relevant comparison with your work.  | 
|         |    137  | 
|         |    138 We cited this work and commented on it (see above). | 
|         |    139  | 
|         |    140 > I agree that comparing with related work is hard, and that the authors | 
|         |    141 > have already been careful in their comparison with the formalization | 
|         |    142 > sizes from other works. Yet, I am afraid an uninformed outsider would | 
|         |    143 > miss the distinction between related work that are executable and | 
|         |    144 > efficient and this formalization.  | 
|         |    145  | 
|         |    146 In order to get meaningful work done, we have to depend on readers  | 
|         |    147 carefully reading the paper and obtaining required background material. | 
|         |    148 I am afraid, there is no other way around.  | 
|         |    149  | 
|         |    150 > - I am still confused about the explanation about the way the | 
|         |    151 > equational system is set up (your comment in the conclusion does not | 
|         |    152 > help me to pinpoint why you have to do that): I understand that | 
|         |    153 > Brzozowski annotates the final states with lambdas, and propagates | 
|         |    154 > them to the unique initial state. On the other hand you start with a | 
|         |    155 > set of equivalence classes, and you want to compute a regular | 
|         |    156 > expression for all classes that are in finals. In other words, I still | 
|         |    157 > fail to see why the following system does not suit your needs | 
|         |    158 >  | 
|         |    159 > X1 = a,X2 + b,X4  | 
|         |    160 > X2 = b,X3 + a,X4 + lambda(ONE) | 
|         |    161 > X3 = a,X4 + b,X4 + lambda(ONE) | 
|         |    162 > X4 = a,X4 + b,X4 | 
|         |    163 > | 
|         |    164 > with L(r,X) = L(r) \cdot X | 
|         |    165 > | 
|         |    166 > and then propagate the equation to compute to compute a value for | 
|         |    167 > X1. (And, you could draw the automata underlying the transitions at | 
|         |    168 > the top of p. 7 to help the reader understand what is going on, and | 
|         |    169 > the relationship with the current equational system.) | 
|         |    170 >  | 
|         |    171 > Maybe it is simply that you could have done it, but that it would have | 
|         |    172 > been less pleasant to work with in the context of a Myhill-Nerode | 
|         |    173 > relation? | 
|         |    174  | 
|         |    175 We used in the paper the much stronger formulation of being | 
|         |    176 _forced_ to set up the equational system in our way. And we even | 
|         |    177 now have made the earlier comment about this issue, which was  | 
|         |    178 placed in a footnote, to be part of the main text (see paragraph after  | 
|         |    179 equation (6)).  | 
|         |    180  | 
|         |    181 While your equational system and computations generate a regular | 
|         |    182 expression, it does not take into account the assumption from  | 
|         |    183 the first direction of the Myhill-Nerode theorem. Recall that the  | 
|         |    184 assumption is that there are finitely many equivalence classes  | 
|         |    185 by the Myhill-Nerode relation. Therefore we have to make our equations | 
|         |    186 to be in agreement with these equivalence classes, which the properties | 
|         |    187 in equation (7) and (8) make explicit. We further have to maintain | 
|         |    188 this agreement during the solution of the equational system.  | 
|         |    189  | 
|         |    190 Your equational system is _not_ in agreement with the equivalence  | 
|         |    191 classes: consider that  | 
|         |    192  | 
|         |    193    X1 = {[]} | 
|         |    194    X2 = {[a]} | 
|         |    195    X3 = {[a, b]} | 
|         |    196    X4 = UNIV − {[], [a], [a, b]}  | 
|         |    197  | 
|         |    198 hold, which is not true for the equations you have set up (under the | 
|         |    199 natural interpretation you have given). Another point is that it is always the | 
|         |    200 case that only one equivalence class in the Myhill-Nerode theorem contains  | 
|         |    201 the empty string. Therefore it cannot be the case that there are more than  | 
|         |    202 one (initial) equation containing the marker lambda(ONE), which translates  | 
|         |    203 into the empty string under the natural interpretation. So we are | 
|         |    204 of the opinion that for the Myhill-Nerode theorem you are really forced  | 
|         |    205 to "reverse" the standard approaches. Please let us know, if this  | 
|         |    206 clarifies this issue for you. | 
|         |    207  | 
|         |    208 > - I really appreciate the rephrasing of section 4, especially the | 
|         |    209 > final remark. I think it clearly helps the reader to understand the | 
|         |    210 > proof arguments, and the relationship with automata.  | 
|         |    211  | 
|         |    212 Thank you. | 
|         |    213  | 
|         |    214 > - I concur with the second reviewer to say that the executability of | 
|         |    215 > the first half of the M-N theorem should be discussed in the | 
|         |    216 > conclusion. Moreover, I am puzzled by your comments in the response to | 
|         |    217 > reviewers: to what extent does your formalization need to be | 
|         |    218 > _modified_ to make it executable? Do you need to change the code, or | 
|         |    219 > to add more things? (More precisely, you discuss the executability of | 
|         |    220 > the second half of the M-N theorem -- that constructs a partition of | 
|         |    221 > the set of strings -- in the conclusion, in the context of Haftmann | 
|         |    222 > executable sets;  | 
|         |    223  | 
|         |    224 The first half is about the Myhill-Nerode relation partitioning the sets of  | 
|         |    225 all strings into equivalence classes. Our remark applies to this "half". | 
|         |    226  | 
|         |    227 > you should do the same for the first-half---that | 
|         |    228 > constructs a regular expression). I am not sure I am happy with the | 
|         |    229 > things you wrote about executable sets, because it is not clear from | 
|         |    230 > my point of view if you need to modify the code, or add things.  | 
|         |    231  | 
|         |    232 See comment above. We really not want to open up this can of worms. | 
|         |    233  | 
|         |    234 > - More generally, it should be made clear that when you say algorithm, | 
|         |    235 > the reader should not understand it as something that is computable as | 
|         |    236 > it is. In particular, I think that the following sentence is | 
|         |    237 > misleading: "our algorithm for solving equational systems actually | 
|         |    238 > calculates a regular expression for the complement language." You | 
|         |    239 > actually define such an expression, but it seems not possible, as it | 
|         |    240 > is, to open the box to see what is inside. | 
|         |    241  | 
|         |    242 It is possible as long as you make the "do-not-care" non-determinism  | 
|         |    243 explicit and fix appropriate datastructures for computations.  | 
|         |    244  | 
|         |    245 > - You answered:  | 
|         |    246 > > In our opinion, Coquand and Siles' development strongly justifies our | 
|         |    247 > > comment about formalisation of Brzozowski's argument being painful. | 
|         |    248 > > If you like to have this reference added in the paper, we are happy | 
|         |    249 > > to do so. | 
|         |    250 > | 
|         |    251 > I think that it is worth adding a comment like this one either in | 
|         |    252 > section 5 (at the beginning of p. 20) or in the conclusion. | 
|         |    253  | 
|         |    254 We added to page 20: | 
|         |    255  | 
|         |    256   Therefore Coquand and Siles (2011) who follow through this idea  | 
|         |    257   had to spend extra effort and first formalise the quite intricate  | 
|         |    258   notion of inductively finite sets in order to formalise this property. | 
|         |    259  | 
|         |    260 > - The authors provide a detailed figure for the number of line of code | 
|         |    261 >   of Almeida et al's implementation of Mirkin construction, but there | 
|         |    262 >   is no such figure for some of other works that are cited. Maybe this | 
|         |    263 >   number could be dropped, or similar figures could be given for other | 
|         |    264 >   works, in order to give a more accurate picture. | 
|         |    265  | 
|         |    266 We are not 100% sure about this comment. We have not selected the | 
|         |    267 explicit numbers for the Lammich-Tuerk and Almeida et al works because of   | 
|         |    268 "name-and-shaming" or because of a "beauty contest". Rather to illustrate  | 
|         |    269 a trend. We asked about the numbers for the Lammich-Tuerk work. They  | 
|         |    270 also confirmed with us that their automata-with-names approach is painful  | 
|         |    271 to work with. Unfortunately they have not written this in their paper  | 
|         |    272 such that we could have cited it. Filliatre makes explicitly such a comment  | 
|         |    273 in his paper, which we have cited. Braibant equally wrote that he reckons  | 
|         |    274 that our formalisation is more concise. There is no way we can give a number  | 
|         |    275 in terms of lines-of-code for the formalisation in Nuprl. We believe  | 
|         |    276 the number for the Almeida et al work is accurate (and not even the | 
|         |    277 biggest formalisation from the ones listed, but one which is about | 
|         |    278 one specific topic).   | 
|         |    279  | 
|         |    280 > - I wonder if you have direction for future works, that would satisfy | 
|         |    281 >  your predicate "results from regular language theory, not results | 
|         |    282 >  from automata theory". It would be nice to give a few examples if | 
|         |    283 >  you have some in mind.  | 
|         |    284  | 
|         |    285 We added the work by Sulzmann and Lu who work on regular expression | 
|         |    286 sub-matching using derivatives of regular expressions. We would like | 
|         |    287 to formalise this work. Thank you very much for suggesting future work. | 
|         |    288  |