# HG changeset patch
# User Christian Urban <christian dot urban at kcl dot ac dot uk>
# Date 1379413318 -3600
# Node ID 0da31edd95b929b44da386953623047ccae789e1
# Parent  288637d9dcdee727e8434393ed92bc6010c6a9f5
new version

diff -r 288637d9dcde -r 0da31edd95b9 Journal/JAR-response2
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Journal/JAR-response2	Tue Sep 17 11:21:58 2013 +0100
@@ -0,0 +1,288 @@
+
+> Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be
+> proved without relying on the notion of automata and b) argues
+> shallowly thorough the paper that not having to formalize the notion
+> of automata is easier.
+> 
+> While I do agree that point a) is a an interesting theoretical
+> contribution that deserves publication, I disagree with the second
+> argument.
+
+We are very grateful to the reviewer deciding the paper is a worthy 
+contribution. We are especially grateful since this opinion is despite 
+disagreement with parts of the paper. Thank you very much!
+
+> It should be noted that the handling of the notion of automata has
+> been difficult historically for two reasons. On the one hand, giving
+> an identifier to every state of an automata indeed yields proofs that
+> may be characterized as "clunky" (even if the wording is a bit
+> strong), but is a necessary condition for efficient implementation of
+> automata algorithms. This yields sizable formalizations as is
+> demonstrated by the Lammich and Tuerk's work or Braibant and Pous's
+> work [1]. On the other hand, it seems to be the case that the
+> "principle of definition" in HOL makes it impossible to use parametric
+> definitions of automata (but there is no problem with such a
+> definition in Coq).
+
+We agree with this summary. We cited the work by Doczal et al and wrote 
+that their quantification over types is not available to us. 
+
+   Systems such as Coq permit quantification over types and thus can 
+   state such a definition. This has been recently exploited in an elegant 
+   and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal 
+   et al. (2013), but as said this is not available to us working in 
+   Isabelle/HOL.
+
+> But, the startup cost of a formalization depends on the scope of what
+> one intends to formalize. Perhaps Braibant concedes that your
+> development is more concise that his, yet Braibant's work dealt with
+> the formalization of an _efficient_ decision procedure for Kleene
+> algebras that required _two_ formalizations of automata (the matrices
+> are only used in the proof of correctness of this decision procedure,
+> while the computational part uses efficient representations for DAGs
+> -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only
+> adding emphasis here to demonstrate that this does not compare easily
+> with the formalization of the proof of Myhill-Nerode by itself.
+
+We have attempted to not touch upon efficiency and algorithmic issues, since 
+they are an distraction for our topic at hand, that is reasoning about inductive
+structures, of which regular expressions are one instance. If you use more
+efficient datastructures, then it is expected that reasoning will be
+more difficult. The point of the paper is to prove the Myhill-Nerode theorem
+with only inductive structures, which are well-supported in all interactive 
+theorem provers we are aware off. 
+
+> (At this point, let me comment on your answer: "As far as we know
+> Braibant also uses ssreflect".  It is not the case. One may wonder to
+> what extent the use of ssreflect there could have alleviated the
+> "intrinsic difficulties of working with rectangular matrices".)
+
+Noted. We will not make this claim and have not done so in the paper.
+
+> In the same fashion, Lammich and Tuerk's work uses efficient
+> data-structures (.e.g, for collections, automatas), and some amount of
+> data-refinement to link various representations together. 
+
+Please note that the 14 kloc we cited for the Lammich/Tuerk work is
+the automata library, not the formalisation built on top of it.
+Also Lammich and Tuerk have told us that reasoning about graphs 
+with state names is undesirable. If they had a better representation
+available (which they had _not_, since they formalise Hopcroft's
+algorithm), they would have used this one and establish that it is 
+"equivalent" to the more efficient version with efficient data-
+structures. In their case there is no alternative formalising a library
+about graphs etc. We have shown that in case of the Myhill-Nerode theorem
+there is an alternative. We have been very careful to make claims about
+being the first ones who have done so. However, we believe we are,
+but certainly our proofs are not present in the literature (this
+might have something to do with the fact that the Myhill-Nerode
+theorem requires the reversal of Brzozowski and Arden...see
+comment further below).
+ 
+> My most stringent remark about the paper is that it should make clear
+> from the beginning whether or not the algorithms are computational,
+> and if they are executable, how efficient. Then, comparing with the
+> two aforementioned work will make sense. 
+
+As stated earlier, we do not like to make computational issues
+explicit as they are a distraction in our opinion. The logic behind 
+Isabelle/HOL and all HOL-based theorem provers is not a computational logic, 
+in the sense that it is not about computable functions (unlike Coq). There 
+are "uncomputable" features in HOL such as the axiom of choice (which we
+make use of in our formalisation). We still use the terminology "algorithm"
+in the widely-used sense of algorithms that contain "do-not-care"
+nondetermism. For example, unification algorithms are usually explained and 
+proved correct as transition system over sets of equations. They are not
+executable in the Turing-machine sense, since they leave implicit
+the order in which the equations should be analysed and which
+rules should "fire" if there are more than one applicable. They
+are also "imperfect" as the notion of (potentially infinite) sets
+cannot be implemented, in general, as a computable function, but 
+they can be reasoned about in Isabelle/HOL and other theorem provers.
+
+Still we hope that the reviewer agrees with us and with the many uses 
+in the literature (unification is only one example which commits this 
+"sin"), it is nevertheless perfectly sensible to call such "things" 
+algorithms. Although they cannot be directly implemented, they can be
+implemented once choices are made explicit and datastructures 
+fixed. Similarly, our formalisation uses a do-not-care approach
+for building a regular expression out of a set of regular expressions
+(see equation (2)).  It also leaves the order of equations that should be 
+analysed under-specified. But if all these choices are made explicit, we
+have an algorithm that is implementable in code. But as the explanation
+already indicates, it is quite a digression from our main topic of
+reasoning about inductive structures.
+
+If we further contemplate the efficiency of the algorithm we 
+(under)specified, then our best guess is that it is exponential in the 
+worst case, just like the "naive" unification algorithm is exponential 
+if sharing is not treated carefully. 
+
+> In fact, I have been made aware of the following work [2] that do
+> formalize automata and regular language theory without using explicit
+> identifiers for state numbers in automata. The upside is that it
+> yields much smoother proofs than previous works in Coq. The downside
+> is that this formalization strategy does not yield efficient algorithm
+> for automata constructions. I reckon that this is a technical report
+> that should be taken with a grain of salt in the context of this
+> review, but I think that it gives evidence that it is not always the
+> case that the startup cost is higher with automata rather than with
+> regular expressions: they formalize the Myhill-Nerode theorem,
+> Kleene's theorem, the relationships between various representations of
+> regular languages in around 1500 lines of ssreflect/Coq. (Again, let
+> me emphasize that this is not a computational representation of
+> automata, and thus, it does not easily compare in terms of size with
+> Lammich and Tuerk's work or Braibant and Pous's work.) I think this
+> work would make for a much more relevant comparison with your work. 
+
+We cited this work and commented on it (see above).
+
+> I agree that comparing with related work is hard, and that the authors
+> have already been careful in their comparison with the formalization
+> sizes from other works. Yet, I am afraid an uninformed outsider would
+> miss the distinction between related work that are executable and
+> efficient and this formalization. 
+
+In order to get meaningful work done, we have to depend on readers 
+carefully reading the paper and obtaining required background material.
+I am afraid, there is no other way around. 
+
+> - I am still confused about the explanation about the way the
+> equational system is set up (your comment in the conclusion does not
+> help me to pinpoint why you have to do that): I understand that
+> Brzozowski annotates the final states with lambdas, and propagates
+> them to the unique initial state. On the other hand you start with a
+> set of equivalence classes, and you want to compute a regular
+> expression for all classes that are in finals. In other words, I still
+> fail to see why the following system does not suit your needs
+> 
+> X1 = a,X2 + b,X4 
+> X2 = b,X3 + a,X4 + lambda(ONE)
+> X3 = a,X4 + b,X4 + lambda(ONE)
+> X4 = a,X4 + b,X4
+>
+> with L(r,X) = L(r) \cdot X
+>
+> and then propagate the equation to compute to compute a value for
+> X1. (And, you could draw the automata underlying the transitions at
+> the top of p. 7 to help the reader understand what is going on, and
+> the relationship with the current equational system.)
+> 
+> Maybe it is simply that you could have done it, but that it would have
+> been less pleasant to work with in the context of a Myhill-Nerode
+> relation?
+
+We used in the paper the much stronger formulation of being
+_forced_ to set up the equational system in our way. And we even
+now have made the earlier comment about this issue, which was 
+placed in a footnote, to be part of the main text (see paragraph after 
+equation (6)). 
+
+While your equational system and computations generate a regular
+expression, it does not take into account the assumption from 
+the first direction of the Myhill-Nerode theorem. Recall that the 
+assumption is that there are finitely many equivalence classes 
+by the Myhill-Nerode relation. Therefore we have to make our equations
+to be in agreement with these equivalence classes, which the properties
+in equation (7) and (8) make explicit. We further have to maintain
+this agreement during the solution of the equational system. 
+
+Your equational system is _not_ in agreement with the equivalence 
+classes: consider that 
+
+   X1 = {[]}
+   X2 = {[a]}
+   X3 = {[a, b]}
+   X4 = UNIV − {[], [a], [a, b]} 
+
+hold, which is not true for the equations you have set up (under the
+natural interpretation you have given). Another point is that it is always the
+case that only one equivalence class in the Myhill-Nerode theorem contains 
+the empty string. Therefore it cannot be the case that there are more than 
+one (initial) equation containing the marker lambda(ONE), which translates 
+into the empty string under the natural interpretation. So we are
+of the opinion that for the Myhill-Nerode theorem you are really forced 
+to "reverse" the standard approaches. Please let us know, if this 
+clarifies this issue for you.
+
+> - I really appreciate the rephrasing of section 4, especially the
+> final remark. I think it clearly helps the reader to understand the
+> proof arguments, and the relationship with automata. 
+
+Thank you.
+
+> - I concur with the second reviewer to say that the executability of
+> the first half of the M-N theorem should be discussed in the
+> conclusion. Moreover, I am puzzled by your comments in the response to
+> reviewers: to what extent does your formalization need to be
+> _modified_ to make it executable? Do you need to change the code, or
+> to add more things? (More precisely, you discuss the executability of
+> the second half of the M-N theorem -- that constructs a partition of
+> the set of strings -- in the conclusion, in the context of Haftmann
+> executable sets; 
+
+The first half is about the Myhill-Nerode relation partitioning the sets of 
+all strings into equivalence classes. Our remark applies to this "half".
+
+> you should do the same for the first-half---that
+> constructs a regular expression). I am not sure I am happy with the
+> things you wrote about executable sets, because it is not clear from
+> my point of view if you need to modify the code, or add things. 
+
+See comment above. We really not want to open up this can of worms.
+
+> - More generally, it should be made clear that when you say algorithm,
+> the reader should not understand it as something that is computable as
+> it is. In particular, I think that the following sentence is
+> misleading: "our algorithm for solving equational systems actually
+> calculates a regular expression for the complement language." You
+> actually define such an expression, but it seems not possible, as it
+> is, to open the box to see what is inside.
+
+It is possible as long as you make the "do-not-care" non-determinism 
+explicit and fix appropriate datastructures for computations. 
+
+> - You answered: 
+> > In our opinion, Coquand and Siles' development strongly justifies our
+> > comment about formalisation of Brzozowski's argument being painful.
+> > If you like to have this reference added in the paper, we are happy
+> > to do so.
+>
+> I think that it is worth adding a comment like this one either in
+> section 5 (at the beginning of p. 20) or in the conclusion.
+
+We added to page 20:
+
+  Therefore Coquand and Siles (2011) who follow through this idea 
+  had to spend extra effort and first formalise the quite intricate 
+  notion of inductively finite sets in order to formalise this property.
+
+> - The authors provide a detailed figure for the number of line of code
+>   of Almeida et al's implementation of Mirkin construction, but there
+>   is no such figure for some of other works that are cited. Maybe this
+>   number could be dropped, or similar figures could be given for other
+>   works, in order to give a more accurate picture.
+
+We are not 100% sure about this comment. We have not selected the
+explicit numbers for the Lammich-Tuerk and Almeida et al works because of  
+"name-and-shaming" or because of a "beauty contest". Rather to illustrate 
+a trend. We asked about the numbers for the Lammich-Tuerk work. They 
+also confirmed with us that their automata-with-names approach is painful 
+to work with. Unfortunately they have not written this in their paper 
+such that we could have cited it. Filliatre makes explicitly such a comment 
+in his paper, which we have cited. Braibant equally wrote that he reckons 
+that our formalisation is more concise. There is no way we can give a number 
+in terms of lines-of-code for the formalisation in Nuprl. We believe 
+the number for the Almeida et al work is accurate (and not even the
+biggest formalisation from the ones listed, but one which is about
+one specific topic).  
+
+> - I wonder if you have direction for future works, that would satisfy
+>  your predicate "results from regular language theory, not results
+>  from automata theory". It would be nice to give a few examples if
+>  you have some in mind. 
+
+We added the work by Sulzmann and Lu who work on regular expression
+sub-matching using derivatives of regular expressions. We would like
+to formalise this work. Thank you very much for suggesting future work.
+
diff -r 288637d9dcde -r 0da31edd95b9 Journal/Paper.thy
--- a/Journal/Paper.thy	Sat Sep 14 14:08:19 2013 +0100
+++ b/Journal/Paper.thy	Tue Sep 17 11:21:58 2013 +0100
@@ -344,7 +344,7 @@
   2005.} Systems such as Coq permit quantification over types and thus can
   state such a definition.  This has been recently exploited in an
   elegant and short formalisation of the Myhill-Nerode theorem in Coq by
-  \citeN{Smolka13}, but as said this is not avaible to us working in Isabelle/HOL.
+  \citeN{Smolka13}, but as said this is not available to us working in Isabelle/HOL.
 
   An alternative, which provides us with a single type for states of automata,
   is to give every state node an identity, for example a natural number, and
@@ -1741,8 +1741,8 @@
   \end{proof}
 
   \begin{rmk}
-  While our proof using tagging functions might seem like a rabbit pulled 
-  out of a hat, the intuition behind can be somewhat rationalised by taking the 
+  While our proof using tagging functions might seem like a `rabbit pulled 
+  out of a hat', the intuition behind can be somewhat rationalised by taking the 
   view that the equivalence classes @{term "UNIV // \<approx>(lang r)"} stand for the 
   states of the minimal automaton for the language @{term "lang r"}. The theorem 
   amounts to showing that this minimal automaton has finitely many states. 
@@ -1968,7 +1968,7 @@
   but the one modulo @{text "ACI"}.  In principle, this can be done formally, 
   but it is very painful in a theorem prover---since there is no
   direct characterisation of the set of dissimilar derivatives. Therefore
-  \citeN{CoquandSiles12} who follow through this idea had to `stand on their  heads' and
+  \citeN{CoquandSiles12} who follow through this idea had to spend extra effort and
   first formalise the quite intricate notion of \emph{inductively finite sets} in 
   order to formalise this property.
 
@@ -2315,7 +2315,7 @@
   \noindent
   Our proof follows the one given by \citeN[Pages 92--95]{Shallit08}, except that we use
   Higman's Lemma, which is already proved in the Isabelle/HOL library by
-  Sternagel.
+  \citeN{Sternagel13}.
   Higman's Lemma allows us to infer that every language @{text A} of antichains, satisfying
   %
   \begin{equation}\label{higman}
diff -r 288637d9dcde -r 0da31edd95b9 Journal/document/root.bib
--- a/Journal/document/root.bib	Sat Sep 14 14:08:19 2013 +0100
+++ b/Journal/document/root.bib	Tue Sep 17 11:21:58 2013 +0100
@@ -445,3 +445,10 @@
   year =         {2013}
 }
 
+@Unpublished{Sternagel13,
+  author =       {C.~Sternagel},
+  title =        {{C}ertified {K}ruskal’s {T}ree {T}heorem},
+  note =         {Accepted for publication in {\it Proc.~of the 3rd International Conference on 
+                  Certified Programs and Proofs}},
+  year =         {2013}
+}
\ No newline at end of file
diff -r 288637d9dcde -r 0da31edd95b9 journal.pdf
Binary file journal.pdf has changed