--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Journal/JAR-response2 Tue Sep 17 11:21:58 2013 +0100
@@ -0,0 +1,288 @@
+
+> Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be
+> proved without relying on the notion of automata and b) argues
+> shallowly thorough the paper that not having to formalize the notion
+> of automata is easier.
+>
+> While I do agree that point a) is a an interesting theoretical
+> contribution that deserves publication, I disagree with the second
+> argument.
+
+We are very grateful to the reviewer deciding the paper is a worthy
+contribution. We are especially grateful since this opinion is despite
+disagreement with parts of the paper. Thank you very much!
+
+> It should be noted that the handling of the notion of automata has
+> been difficult historically for two reasons. On the one hand, giving
+> an identifier to every state of an automata indeed yields proofs that
+> may be characterized as "clunky" (even if the wording is a bit
+> strong), but is a necessary condition for efficient implementation of
+> automata algorithms. This yields sizable formalizations as is
+> demonstrated by the Lammich and Tuerk's work or Braibant and Pous's
+> work [1]. On the other hand, it seems to be the case that the
+> "principle of definition" in HOL makes it impossible to use parametric
+> definitions of automata (but there is no problem with such a
+> definition in Coq).
+
+We agree with this summary. We cited the work by Doczal et al and wrote
+that their quantification over types is not available to us.
+
+ Systems such as Coq permit quantification over types and thus can
+ state such a definition. This has been recently exploited in an elegant
+ and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal
+ et al. (2013), but as said this is not available to us working in
+ Isabelle/HOL.
+
+> But, the startup cost of a formalization depends on the scope of what
+> one intends to formalize. Perhaps Braibant concedes that your
+> development is more concise that his, yet Braibant's work dealt with
+> the formalization of an _efficient_ decision procedure for Kleene
+> algebras that required _two_ formalizations of automata (the matrices
+> are only used in the proof of correctness of this decision procedure,
+> while the computational part uses efficient representations for DAGs
+> -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only
+> adding emphasis here to demonstrate that this does not compare easily
+> with the formalization of the proof of Myhill-Nerode by itself.
+
+We have attempted to not touch upon efficiency and algorithmic issues, since
+they are an distraction for our topic at hand, that is reasoning about inductive
+structures, of which regular expressions are one instance. If you use more
+efficient datastructures, then it is expected that reasoning will be
+more difficult. The point of the paper is to prove the Myhill-Nerode theorem
+with only inductive structures, which are well-supported in all interactive
+theorem provers we are aware off.
+
+> (At this point, let me comment on your answer: "As far as we know
+> Braibant also uses ssreflect". It is not the case. One may wonder to
+> what extent the use of ssreflect there could have alleviated the
+> "intrinsic difficulties of working with rectangular matrices".)
+
+Noted. We will not make this claim and have not done so in the paper.
+
+> In the same fashion, Lammich and Tuerk's work uses efficient
+> data-structures (.e.g, for collections, automatas), and some amount of
+> data-refinement to link various representations together.
+
+Please note that the 14 kloc we cited for the Lammich/Tuerk work is
+the automata library, not the formalisation built on top of it.
+Also Lammich and Tuerk have told us that reasoning about graphs
+with state names is undesirable. If they had a better representation
+available (which they had _not_, since they formalise Hopcroft's
+algorithm), they would have used this one and establish that it is
+"equivalent" to the more efficient version with efficient data-
+structures. In their case there is no alternative formalising a library
+about graphs etc. We have shown that in case of the Myhill-Nerode theorem
+there is an alternative. We have been very careful to make claims about
+being the first ones who have done so. However, we believe we are,
+but certainly our proofs are not present in the literature (this
+might have something to do with the fact that the Myhill-Nerode
+theorem requires the reversal of Brzozowski and Arden...see
+comment further below).
+
+> My most stringent remark about the paper is that it should make clear
+> from the beginning whether or not the algorithms are computational,
+> and if they are executable, how efficient. Then, comparing with the
+> two aforementioned work will make sense.
+
+As stated earlier, we do not like to make computational issues
+explicit as they are a distraction in our opinion. The logic behind
+Isabelle/HOL and all HOL-based theorem provers is not a computational logic,
+in the sense that it is not about computable functions (unlike Coq). There
+are "uncomputable" features in HOL such as the axiom of choice (which we
+make use of in our formalisation). We still use the terminology "algorithm"
+in the widely-used sense of algorithms that contain "do-not-care"
+nondetermism. For example, unification algorithms are usually explained and
+proved correct as transition system over sets of equations. They are not
+executable in the Turing-machine sense, since they leave implicit
+the order in which the equations should be analysed and which
+rules should "fire" if there are more than one applicable. They
+are also "imperfect" as the notion of (potentially infinite) sets
+cannot be implemented, in general, as a computable function, but
+they can be reasoned about in Isabelle/HOL and other theorem provers.
+
+Still we hope that the reviewer agrees with us and with the many uses
+in the literature (unification is only one example which commits this
+"sin"), it is nevertheless perfectly sensible to call such "things"
+algorithms. Although they cannot be directly implemented, they can be
+implemented once choices are made explicit and datastructures
+fixed. Similarly, our formalisation uses a do-not-care approach
+for building a regular expression out of a set of regular expressions
+(see equation (2)). It also leaves the order of equations that should be
+analysed under-specified. But if all these choices are made explicit, we
+have an algorithm that is implementable in code. But as the explanation
+already indicates, it is quite a digression from our main topic of
+reasoning about inductive structures.
+
+If we further contemplate the efficiency of the algorithm we
+(under)specified, then our best guess is that it is exponential in the
+worst case, just like the "naive" unification algorithm is exponential
+if sharing is not treated carefully.
+
+> In fact, I have been made aware of the following work [2] that do
+> formalize automata and regular language theory without using explicit
+> identifiers for state numbers in automata. The upside is that it
+> yields much smoother proofs than previous works in Coq. The downside
+> is that this formalization strategy does not yield efficient algorithm
+> for automata constructions. I reckon that this is a technical report
+> that should be taken with a grain of salt in the context of this
+> review, but I think that it gives evidence that it is not always the
+> case that the startup cost is higher with automata rather than with
+> regular expressions: they formalize the Myhill-Nerode theorem,
+> Kleene's theorem, the relationships between various representations of
+> regular languages in around 1500 lines of ssreflect/Coq. (Again, let
+> me emphasize that this is not a computational representation of
+> automata, and thus, it does not easily compare in terms of size with
+> Lammich and Tuerk's work or Braibant and Pous's work.) I think this
+> work would make for a much more relevant comparison with your work.
+
+We cited this work and commented on it (see above).
+
+> I agree that comparing with related work is hard, and that the authors
+> have already been careful in their comparison with the formalization
+> sizes from other works. Yet, I am afraid an uninformed outsider would
+> miss the distinction between related work that are executable and
+> efficient and this formalization.
+
+In order to get meaningful work done, we have to depend on readers
+carefully reading the paper and obtaining required background material.
+I am afraid, there is no other way around.
+
+> - I am still confused about the explanation about the way the
+> equational system is set up (your comment in the conclusion does not
+> help me to pinpoint why you have to do that): I understand that
+> Brzozowski annotates the final states with lambdas, and propagates
+> them to the unique initial state. On the other hand you start with a
+> set of equivalence classes, and you want to compute a regular
+> expression for all classes that are in finals. In other words, I still
+> fail to see why the following system does not suit your needs
+>
+> X1 = a,X2 + b,X4
+> X2 = b,X3 + a,X4 + lambda(ONE)
+> X3 = a,X4 + b,X4 + lambda(ONE)
+> X4 = a,X4 + b,X4
+>
+> with L(r,X) = L(r) \cdot X
+>
+> and then propagate the equation to compute to compute a value for
+> X1. (And, you could draw the automata underlying the transitions at
+> the top of p. 7 to help the reader understand what is going on, and
+> the relationship with the current equational system.)
+>
+> Maybe it is simply that you could have done it, but that it would have
+> been less pleasant to work with in the context of a Myhill-Nerode
+> relation?
+
+We used in the paper the much stronger formulation of being
+_forced_ to set up the equational system in our way. And we even
+now have made the earlier comment about this issue, which was
+placed in a footnote, to be part of the main text (see paragraph after
+equation (6)).
+
+While your equational system and computations generate a regular
+expression, it does not take into account the assumption from
+the first direction of the Myhill-Nerode theorem. Recall that the
+assumption is that there are finitely many equivalence classes
+by the Myhill-Nerode relation. Therefore we have to make our equations
+to be in agreement with these equivalence classes, which the properties
+in equation (7) and (8) make explicit. We further have to maintain
+this agreement during the solution of the equational system.
+
+Your equational system is _not_ in agreement with the equivalence
+classes: consider that
+
+ X1 = {[]}
+ X2 = {[a]}
+ X3 = {[a, b]}
+ X4 = UNIV − {[], [a], [a, b]}
+
+hold, which is not true for the equations you have set up (under the
+natural interpretation you have given). Another point is that it is always the
+case that only one equivalence class in the Myhill-Nerode theorem contains
+the empty string. Therefore it cannot be the case that there are more than
+one (initial) equation containing the marker lambda(ONE), which translates
+into the empty string under the natural interpretation. So we are
+of the opinion that for the Myhill-Nerode theorem you are really forced
+to "reverse" the standard approaches. Please let us know, if this
+clarifies this issue for you.
+
+> - I really appreciate the rephrasing of section 4, especially the
+> final remark. I think it clearly helps the reader to understand the
+> proof arguments, and the relationship with automata.
+
+Thank you.
+
+> - I concur with the second reviewer to say that the executability of
+> the first half of the M-N theorem should be discussed in the
+> conclusion. Moreover, I am puzzled by your comments in the response to
+> reviewers: to what extent does your formalization need to be
+> _modified_ to make it executable? Do you need to change the code, or
+> to add more things? (More precisely, you discuss the executability of
+> the second half of the M-N theorem -- that constructs a partition of
+> the set of strings -- in the conclusion, in the context of Haftmann
+> executable sets;
+
+The first half is about the Myhill-Nerode relation partitioning the sets of
+all strings into equivalence classes. Our remark applies to this "half".
+
+> you should do the same for the first-half---that
+> constructs a regular expression). I am not sure I am happy with the
+> things you wrote about executable sets, because it is not clear from
+> my point of view if you need to modify the code, or add things.
+
+See comment above. We really not want to open up this can of worms.
+
+> - More generally, it should be made clear that when you say algorithm,
+> the reader should not understand it as something that is computable as
+> it is. In particular, I think that the following sentence is
+> misleading: "our algorithm for solving equational systems actually
+> calculates a regular expression for the complement language." You
+> actually define such an expression, but it seems not possible, as it
+> is, to open the box to see what is inside.
+
+It is possible as long as you make the "do-not-care" non-determinism
+explicit and fix appropriate datastructures for computations.
+
+> - You answered:
+> > In our opinion, Coquand and Siles' development strongly justifies our
+> > comment about formalisation of Brzozowski's argument being painful.
+> > If you like to have this reference added in the paper, we are happy
+> > to do so.
+>
+> I think that it is worth adding a comment like this one either in
+> section 5 (at the beginning of p. 20) or in the conclusion.
+
+We added to page 20:
+
+ Therefore Coquand and Siles (2011) who follow through this idea
+ had to spend extra effort and first formalise the quite intricate
+ notion of inductively finite sets in order to formalise this property.
+
+> - The authors provide a detailed figure for the number of line of code
+> of Almeida et al's implementation of Mirkin construction, but there
+> is no such figure for some of other works that are cited. Maybe this
+> number could be dropped, or similar figures could be given for other
+> works, in order to give a more accurate picture.
+
+We are not 100% sure about this comment. We have not selected the
+explicit numbers for the Lammich-Tuerk and Almeida et al works because of
+"name-and-shaming" or because of a "beauty contest". Rather to illustrate
+a trend. We asked about the numbers for the Lammich-Tuerk work. They
+also confirmed with us that their automata-with-names approach is painful
+to work with. Unfortunately they have not written this in their paper
+such that we could have cited it. Filliatre makes explicitly such a comment
+in his paper, which we have cited. Braibant equally wrote that he reckons
+that our formalisation is more concise. There is no way we can give a number
+in terms of lines-of-code for the formalisation in Nuprl. We believe
+the number for the Almeida et al work is accurate (and not even the
+biggest formalisation from the ones listed, but one which is about
+one specific topic).
+
+> - I wonder if you have direction for future works, that would satisfy
+> your predicate "results from regular language theory, not results
+> from automata theory". It would be nice to give a few examples if
+> you have some in mind.
+
+We added the work by Sulzmann and Lu who work on regular expression
+sub-matching using derivatives of regular expressions. We would like
+to formalise this work. Thank you very much for suggesting future work.
+
--- a/Journal/Paper.thy Sat Sep 14 14:08:19 2013 +0100
+++ b/Journal/Paper.thy Tue Sep 17 11:21:58 2013 +0100
@@ -344,7 +344,7 @@
2005.} Systems such as Coq permit quantification over types and thus can
state such a definition. This has been recently exploited in an
elegant and short formalisation of the Myhill-Nerode theorem in Coq by
- \citeN{Smolka13}, but as said this is not avaible to us working in Isabelle/HOL.
+ \citeN{Smolka13}, but as said this is not available to us working in Isabelle/HOL.
An alternative, which provides us with a single type for states of automata,
is to give every state node an identity, for example a natural number, and
@@ -1741,8 +1741,8 @@
\end{proof}
\begin{rmk}
- While our proof using tagging functions might seem like a rabbit pulled
- out of a hat, the intuition behind can be somewhat rationalised by taking the
+ While our proof using tagging functions might seem like a `rabbit pulled
+ out of a hat', the intuition behind can be somewhat rationalised by taking the
view that the equivalence classes @{term "UNIV // \<approx>(lang r)"} stand for the
states of the minimal automaton for the language @{term "lang r"}. The theorem
amounts to showing that this minimal automaton has finitely many states.
@@ -1968,7 +1968,7 @@
but the one modulo @{text "ACI"}. In principle, this can be done formally,
but it is very painful in a theorem prover---since there is no
direct characterisation of the set of dissimilar derivatives. Therefore
- \citeN{CoquandSiles12} who follow through this idea had to `stand on their heads' and
+ \citeN{CoquandSiles12} who follow through this idea had to spend extra effort and
first formalise the quite intricate notion of \emph{inductively finite sets} in
order to formalise this property.
@@ -2315,7 +2315,7 @@
\noindent
Our proof follows the one given by \citeN[Pages 92--95]{Shallit08}, except that we use
Higman's Lemma, which is already proved in the Isabelle/HOL library by
- Sternagel.
+ \citeN{Sternagel13}.
Higman's Lemma allows us to infer that every language @{text A} of antichains, satisfying
%
\begin{equation}\label{higman}