> Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be> proved without relying on the notion of automata and b) argues> shallowly thorough the paper that not having to formalize the notion> of automata is easier.> > While I do agree that point a) is a an interesting theoretical> contribution that deserves publication, I disagree with the second> argument.We are very grateful to the reviewer deciding the paper is a worthy contribution. We are especially grateful since this opinion is despite disagreement with parts of the paper. Thank you very much!> It should be noted that the handling of the notion of automata has> been difficult historically for two reasons. On the one hand, giving> an identifier to every state of an automata indeed yields proofs that> may be characterized as "clunky" (even if the wording is a bit> strong), but is a necessary condition for efficient implementation of> automata algorithms. This yields sizable formalizations as is> demonstrated by the Lammich and Tuerk's work or Braibant and Pous's> work [1]. On the other hand, it seems to be the case that the> "principle of definition" in HOL makes it impossible to use parametric> definitions of automata (but there is no problem with such a> definition in Coq).We agree with this summary. We cited the work by Doczal et al and wrote that their quantification over types is not available to us. Systems such as Coq permit quantification over types and thus can state such a definition. This has been recently exploited in an elegant and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal et al. (2013), but as said this is not available to us working in Isabelle/HOL.> But, the startup cost of a formalization depends on the scope of what> one intends to formalize. Perhaps Braibant concedes that your> development is more concise that his, yet Braibant's work dealt with> the formalization of an _efficient_ decision procedure for Kleene> algebras that required _two_ formalizations of automata (the matrices> are only used in the proof of correctness of this decision procedure,> while the computational part uses efficient representations for DAGs> -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only> adding emphasis here to demonstrate that this does not compare easily> with the formalization of the proof of Myhill-Nerode by itself.We have attempted to not touch upon efficiency and algorithmic issues, since they are an distraction for our topic at hand, that is reasoning about inductivestructures, of which regular expressions are one instance. If you use moreefficient datastructures, then it is expected that reasoning will bemore difficult. The point of the paper is to prove the Myhill-Nerode theoremwith only inductive structures, which are well-supported in all interactive theorem provers we are aware off. > (At this point, let me comment on your answer: "As far as we know> Braibant also uses ssreflect". It is not the case. One may wonder to> what extent the use of ssreflect there could have alleviated the> "intrinsic difficulties of working with rectangular matrices".)Noted. We will not make this claim and have not done so in the paper.> In the same fashion, Lammich and Tuerk's work uses efficient> data-structures (.e.g, for collections, automatas), and some amount of> data-refinement to link various representations together. Please note that the 14 kloc we cited for the Lammich/Tuerk work isthe automata library, not the formalisation built on top of it.Also Lammich and Tuerk have told us that reasoning about graphs with state names is undesirable. If they had a better representationavailable (which they had _not_, since they formalise Hopcroft'salgorithm), they would have used this one and establish that it is "equivalent" to the more efficient version with efficient data-structures. In their case there is no alternative formalising a libraryabout graphs etc. We have shown that in case of the Myhill-Nerode theoremthere is an alternative. We have been very careful to make claims aboutbeing the first ones who have done so. However, we believe we are,but certainly our proofs are not present in the literature (thismight have something to do with the fact that the Myhill-Nerodetheorem requires the reversal of Brzozowski and Arden...seecomment further below).> My most stringent remark about the paper is that it should make clear> from the beginning whether or not the algorithms are computational,> and if they are executable, how efficient. Then, comparing with the> two aforementioned work will make sense. As stated earlier, we do not like to make computational issuesexplicit as they are a distraction in our opinion. The logic behind Isabelle/HOL and all HOL-based theorem provers is not a computational logic, in the sense that it is not about computable functions (unlike Coq). There are "uncomputable" features in HOL such as the axiom of choice (which wemake use of in our formalisation). We still use the terminology "algorithm"in the widely-used sense of algorithms that contain "do-not-care"nondetermism. For example, unification algorithms are usually explained and proved correct as transition system over sets of equations. They are notexecutable in the Turing-machine sense, since they leave implicitthe order in which the equations should be analysed and whichrules should "fire" if there are more than one applicable. Theyare also "imperfect" as the notion of (potentially infinite) setscannot be implemented, in general, as a computable function, but they can be reasoned about in Isabelle/HOL and other theorem provers.Still we hope that the reviewer agrees with us and with the many uses in the literature (unification is only one example which commits this "sin"), it is nevertheless perfectly sensible to call such "things" algorithms. Although they cannot be directly implemented, they can beimplemented once choices are made explicit and datastructures fixed. Similarly, our formalisation uses a do-not-care approachfor building a regular expression out of a set of regular expressions(see equation (2)). It also leaves the order of equations that should be analysed under-specified. But if all these choices are made explicit, wehave an algorithm that is implementable in code. But as the explanationalready indicates, it is quite a digression from our main topic ofreasoning about inductive structures.If we further contemplate the efficiency of the algorithm we (under)specified, then our best guess is that it is exponential in the worst case, just like the "naive" unification algorithm is exponential if sharing is not treated carefully. > In fact, I have been made aware of the following work [2] that do> formalize automata and regular language theory without using explicit> identifiers for state numbers in automata. The upside is that it> yields much smoother proofs than previous works in Coq. The downside> is that this formalization strategy does not yield efficient algorithm> for automata constructions. I reckon that this is a technical report> that should be taken with a grain of salt in the context of this> review, but I think that it gives evidence that it is not always the> case that the startup cost is higher with automata rather than with> regular expressions: they formalize the Myhill-Nerode theorem,> Kleene's theorem, the relationships between various representations of> regular languages in around 1500 lines of ssreflect/Coq. (Again, let> me emphasize that this is not a computational representation of> automata, and thus, it does not easily compare in terms of size with> Lammich and Tuerk's work or Braibant and Pous's work.) I think this> work would make for a much more relevant comparison with your work. We cited this work and commented on it (see above).> I agree that comparing with related work is hard, and that the authors> have already been careful in their comparison with the formalization> sizes from other works. Yet, I am afraid an uninformed outsider would> miss the distinction between related work that are executable and> efficient and this formalization. In order to get meaningful work done, we have to depend on readers carefully reading the paper and obtaining required background material.I am afraid, there is no other way around. > - I am still confused about the explanation about the way the> equational system is set up (your comment in the conclusion does not> help me to pinpoint why you have to do that): I understand that> Brzozowski annotates the final states with lambdas, and propagates> them to the unique initial state. On the other hand you start with a> set of equivalence classes, and you want to compute a regular> expression for all classes that are in finals. In other words, I still> fail to see why the following system does not suit your needs> > X1 = a,X2 + b,X4 > X2 = b,X3 + a,X4 + lambda(ONE)> X3 = a,X4 + b,X4 + lambda(ONE)> X4 = a,X4 + b,X4>> with L(r,X) = L(r) \cdot X>> and then propagate the equation to compute to compute a value for> X1. (And, you could draw the automata underlying the transitions at> the top of p. 7 to help the reader understand what is going on, and> the relationship with the current equational system.)> > Maybe it is simply that you could have done it, but that it would have> been less pleasant to work with in the context of a Myhill-Nerode> relation?We used in the paper the much stronger formulation of being_forced_ to set up the equational system in our way. And we evennow have made the earlier comment about this issue, which was placed in a footnote, to be part of the main text (see paragraph after equation (6)). While your equational system and computations generate a regularexpression, it does not take into account the assumption from the first direction of the Myhill-Nerode theorem. Recall that the assumption is that there are finitely many equivalence classes by the Myhill-Nerode relation. Therefore we have to make our equationsto be in agreement with these equivalence classes, which the propertiesin equation (7) and (8) make explicit. We further have to maintainthis agreement during the solution of the equational system. Your equational system is _not_ in agreement with the equivalence classes: consider that X1 = {[]} X2 = {[a]} X3 = {[a, b]} X4 = UNIV − {[], [a], [a, b]} hold, which is not true for the equations you have set up (under thenatural interpretation you have given). Another point is that it is always thecase that only one equivalence class in the Myhill-Nerode theorem contains the empty string. Therefore it cannot be the case that there are more than one (initial) equation containing the marker lambda(ONE), which translates into the empty string under the natural interpretation. So we areof the opinion that for the Myhill-Nerode theorem you are really forced to "reverse" the standard approaches. Please let us know, if this clarifies this issue for you.> - I really appreciate the rephrasing of section 4, especially the> final remark. I think it clearly helps the reader to understand the> proof arguments, and the relationship with automata. Thank you.> - I concur with the second reviewer to say that the executability of> the first half of the M-N theorem should be discussed in the> conclusion. Moreover, I am puzzled by your comments in the response to> reviewers: to what extent does your formalization need to be> _modified_ to make it executable? Do you need to change the code, or> to add more things? (More precisely, you discuss the executability of> the second half of the M-N theorem -- that constructs a partition of> the set of strings -- in the conclusion, in the context of Haftmann> executable sets; The first half is about the Myhill-Nerode relation partitioning the sets of all strings into equivalence classes. Our remark applies to this "half".> you should do the same for the first-half---that> constructs a regular expression). I am not sure I am happy with the> things you wrote about executable sets, because it is not clear from> my point of view if you need to modify the code, or add things. See comment above. We really not want to open up this can of worms.> - More generally, it should be made clear that when you say algorithm,> the reader should not understand it as something that is computable as> it is. In particular, I think that the following sentence is> misleading: "our algorithm for solving equational systems actually> calculates a regular expression for the complement language." You> actually define such an expression, but it seems not possible, as it> is, to open the box to see what is inside.It is possible as long as you make the "do-not-care" non-determinism explicit and fix appropriate datastructures for computations. > - You answered: > > In our opinion, Coquand and Siles' development strongly justifies our> > comment about formalisation of Brzozowski's argument being painful.> > If you like to have this reference added in the paper, we are happy> > to do so.>> I think that it is worth adding a comment like this one either in> section 5 (at the beginning of p. 20) or in the conclusion.We added to page 20: Therefore Coquand and Siles (2011) who follow through this idea had to spend extra effort and first formalise the quite intricate notion of inductively finite sets in order to formalise this property.> - The authors provide a detailed figure for the number of line of code> of Almeida et al's implementation of Mirkin construction, but there> is no such figure for some of other works that are cited. Maybe this> number could be dropped, or similar figures could be given for other> works, in order to give a more accurate picture.We are not 100% sure about this comment. We have not selected theexplicit numbers for the Lammich-Tuerk and Almeida et al works because of "name-and-shaming" or because of a "beauty contest". Rather to illustrate a trend. We asked about the numbers for the Lammich-Tuerk work. They also confirmed with us that their automata-with-names approach is painful to work with. Unfortunately they have not written this in their paper such that we could have cited it. Filliatre makes explicitly such a comment in his paper, which we have cited. Braibant equally wrote that he reckons that our formalisation is more concise. There is no way we can give a number in terms of lines-of-code for the formalisation in Nuprl. We believe the number for the Almeida et al work is accurate (and not even thebiggest formalisation from the ones listed, but one which is aboutone specific topic). > - I wonder if you have direction for future works, that would satisfy> your predicate "results from regular language theory, not results> from automata theory". It would be nice to give a few examples if> you have some in mind. We added the work by Sulzmann and Lu who work on regular expressionsub-matching using derivatives of regular expressions. We would liketo formalise this work. Thank you very much for suggesting future work.