Journal/JAR-response2
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Fri, 27 Sep 2013 09:20:58 +0100
changeset 389 796de251332c
parent 388 0da31edd95b9
permissions -rw-r--r--
added paper by Tobias


> Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be
> proved without relying on the notion of automata and b) argues
> shallowly thorough the paper that not having to formalize the notion
> of automata is easier.
> 
> While I do agree that point a) is a an interesting theoretical
> contribution that deserves publication, I disagree with the second
> argument.

We are very grateful to the reviewer deciding the paper is a worthy 
contribution. We are especially grateful since this opinion is despite 
disagreement with parts of the paper. Thank you very much!

> It should be noted that the handling of the notion of automata has
> been difficult historically for two reasons. On the one hand, giving
> an identifier to every state of an automata indeed yields proofs that
> may be characterized as "clunky" (even if the wording is a bit
> strong), but is a necessary condition for efficient implementation of
> automata algorithms. This yields sizable formalizations as is
> demonstrated by the Lammich and Tuerk's work or Braibant and Pous's
> work [1]. On the other hand, it seems to be the case that the
> "principle of definition" in HOL makes it impossible to use parametric
> definitions of automata (but there is no problem with such a
> definition in Coq).

We agree with this summary. We cited the work by Doczal et al and wrote 
that their quantification over types is not available to us. 

   Systems such as Coq permit quantification over types and thus can 
   state such a definition. This has been recently exploited in an elegant 
   and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal 
   et al. (2013), but as said this is not available to us working in 
   Isabelle/HOL.

> But, the startup cost of a formalization depends on the scope of what
> one intends to formalize. Perhaps Braibant concedes that your
> development is more concise that his, yet Braibant's work dealt with
> the formalization of an _efficient_ decision procedure for Kleene
> algebras that required _two_ formalizations of automata (the matrices
> are only used in the proof of correctness of this decision procedure,
> while the computational part uses efficient representations for DAGs
> -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only
> adding emphasis here to demonstrate that this does not compare easily
> with the formalization of the proof of Myhill-Nerode by itself.

We have attempted to not touch upon efficiency and algorithmic issues, since 
they are an distraction for our topic at hand, that is reasoning about inductive
structures, of which regular expressions are one instance. If you use more
efficient datastructures, then it is expected that reasoning will be
more difficult. The point of the paper is to prove the Myhill-Nerode theorem
with only inductive structures, which are well-supported in all interactive 
theorem provers we are aware off. 

> (At this point, let me comment on your answer: "As far as we know
> Braibant also uses ssreflect".  It is not the case. One may wonder to
> what extent the use of ssreflect there could have alleviated the
> "intrinsic difficulties of working with rectangular matrices".)

Noted. We will not make this claim and have not done so in the paper.

> In the same fashion, Lammich and Tuerk's work uses efficient
> data-structures (.e.g, for collections, automatas), and some amount of
> data-refinement to link various representations together. 

Please note that the 14 kloc we cited for the Lammich/Tuerk work is
the automata library, not the formalisation built on top of it.
Also Lammich and Tuerk have told us that reasoning about graphs 
with state names is undesirable. If they had a better representation
available (which they had _not_, since they formalise Hopcroft's
algorithm), they would have used this one and establish that it is 
"equivalent" to the more efficient version with efficient data-
structures. In their case there is no alternative formalising a library
about graphs etc. We have shown that in case of the Myhill-Nerode theorem
there is an alternative. We have been very careful to make claims about
being the first ones who have done so. However, we believe we are,
but certainly our proofs are not present in the literature (this
might have something to do with the fact that the Myhill-Nerode
theorem requires the reversal of Brzozowski and Arden...see
comment further below).
 
> My most stringent remark about the paper is that it should make clear
> from the beginning whether or not the algorithms are computational,
> and if they are executable, how efficient. Then, comparing with the
> two aforementioned work will make sense. 

As stated earlier, we do not like to make computational issues
explicit as they are a distraction in our opinion. The logic behind 
Isabelle/HOL and all HOL-based theorem provers is not a computational logic, 
in the sense that it is not about computable functions (unlike Coq). There 
are "uncomputable" features in HOL such as the axiom of choice (which we
make use of in our formalisation). We still use the terminology "algorithm"
in the widely-used sense of algorithms that contain "do-not-care"
nondetermism. For example, unification algorithms are usually explained and 
proved correct as transition system over sets of equations. They are not
executable in the Turing-machine sense, since they leave implicit
the order in which the equations should be analysed and which
rules should "fire" if there are more than one applicable. They
are also "imperfect" as the notion of (potentially infinite) sets
cannot be implemented, in general, as a computable function, but 
they can be reasoned about in Isabelle/HOL and other theorem provers.

Still we hope that the reviewer agrees with us and with the many uses 
in the literature (unification is only one example which commits this 
"sin"), it is nevertheless perfectly sensible to call such "things" 
algorithms. Although they cannot be directly implemented, they can be
implemented once choices are made explicit and datastructures 
fixed. Similarly, our formalisation uses a do-not-care approach
for building a regular expression out of a set of regular expressions
(see equation (2)).  It also leaves the order of equations that should be 
analysed under-specified. But if all these choices are made explicit, we
have an algorithm that is implementable in code. But as the explanation
already indicates, it is quite a digression from our main topic of
reasoning about inductive structures.

If we further contemplate the efficiency of the algorithm we 
(under)specified, then our best guess is that it is exponential in the 
worst case, just like the "naive" unification algorithm is exponential 
if sharing is not treated carefully. 

> In fact, I have been made aware of the following work [2] that do
> formalize automata and regular language theory without using explicit
> identifiers for state numbers in automata. The upside is that it
> yields much smoother proofs than previous works in Coq. The downside
> is that this formalization strategy does not yield efficient algorithm
> for automata constructions. I reckon that this is a technical report
> that should be taken with a grain of salt in the context of this
> review, but I think that it gives evidence that it is not always the
> case that the startup cost is higher with automata rather than with
> regular expressions: they formalize the Myhill-Nerode theorem,
> Kleene's theorem, the relationships between various representations of
> regular languages in around 1500 lines of ssreflect/Coq. (Again, let
> me emphasize that this is not a computational representation of
> automata, and thus, it does not easily compare in terms of size with
> Lammich and Tuerk's work or Braibant and Pous's work.) I think this
> work would make for a much more relevant comparison with your work. 

We cited this work and commented on it (see above).

> I agree that comparing with related work is hard, and that the authors
> have already been careful in their comparison with the formalization
> sizes from other works. Yet, I am afraid an uninformed outsider would
> miss the distinction between related work that are executable and
> efficient and this formalization. 

In order to get meaningful work done, we have to depend on readers 
carefully reading the paper and obtaining required background material.
I am afraid, there is no other way around. 

> - I am still confused about the explanation about the way the
> equational system is set up (your comment in the conclusion does not
> help me to pinpoint why you have to do that): I understand that
> Brzozowski annotates the final states with lambdas, and propagates
> them to the unique initial state. On the other hand you start with a
> set of equivalence classes, and you want to compute a regular
> expression for all classes that are in finals. In other words, I still
> fail to see why the following system does not suit your needs
> 
> X1 = a,X2 + b,X4 
> X2 = b,X3 + a,X4 + lambda(ONE)
> X3 = a,X4 + b,X4 + lambda(ONE)
> X4 = a,X4 + b,X4
>
> with L(r,X) = L(r) \cdot X
>
> and then propagate the equation to compute to compute a value for
> X1. (And, you could draw the automata underlying the transitions at
> the top of p. 7 to help the reader understand what is going on, and
> the relationship with the current equational system.)
> 
> Maybe it is simply that you could have done it, but that it would have
> been less pleasant to work with in the context of a Myhill-Nerode
> relation?

We used in the paper the much stronger formulation of being
_forced_ to set up the equational system in our way. And we even
now have made the earlier comment about this issue, which was 
placed in a footnote, to be part of the main text (see paragraph after 
equation (6)). 

While your equational system and computations generate a regular
expression, it does not take into account the assumption from 
the first direction of the Myhill-Nerode theorem. Recall that the 
assumption is that there are finitely many equivalence classes 
by the Myhill-Nerode relation. Therefore we have to make our equations
to be in agreement with these equivalence classes, which the properties
in equation (7) and (8) make explicit. We further have to maintain
this agreement during the solution of the equational system. 

Your equational system is _not_ in agreement with the equivalence 
classes: consider that 

   X1 = {[]}
   X2 = {[a]}
   X3 = {[a, b]}
   X4 = UNIV − {[], [a], [a, b]} 

hold, which is not true for the equations you have set up (under the
natural interpretation you have given). Another point is that it is always the
case that only one equivalence class in the Myhill-Nerode theorem contains 
the empty string. Therefore it cannot be the case that there are more than 
one (initial) equation containing the marker lambda(ONE), which translates 
into the empty string under the natural interpretation. So we are
of the opinion that for the Myhill-Nerode theorem you are really forced 
to "reverse" the standard approaches. Please let us know, if this 
clarifies this issue for you.

> - I really appreciate the rephrasing of section 4, especially the
> final remark. I think it clearly helps the reader to understand the
> proof arguments, and the relationship with automata. 

Thank you.

> - I concur with the second reviewer to say that the executability of
> the first half of the M-N theorem should be discussed in the
> conclusion. Moreover, I am puzzled by your comments in the response to
> reviewers: to what extent does your formalization need to be
> _modified_ to make it executable? Do you need to change the code, or
> to add more things? (More precisely, you discuss the executability of
> the second half of the M-N theorem -- that constructs a partition of
> the set of strings -- in the conclusion, in the context of Haftmann
> executable sets; 

The first half is about the Myhill-Nerode relation partitioning the sets of 
all strings into equivalence classes. Our remark applies to this "half".

> you should do the same for the first-half---that
> constructs a regular expression). I am not sure I am happy with the
> things you wrote about executable sets, because it is not clear from
> my point of view if you need to modify the code, or add things. 

See comment above. We really not want to open up this can of worms.

> - More generally, it should be made clear that when you say algorithm,
> the reader should not understand it as something that is computable as
> it is. In particular, I think that the following sentence is
> misleading: "our algorithm for solving equational systems actually
> calculates a regular expression for the complement language." You
> actually define such an expression, but it seems not possible, as it
> is, to open the box to see what is inside.

It is possible as long as you make the "do-not-care" non-determinism 
explicit and fix appropriate datastructures for computations. 

> - You answered: 
> > In our opinion, Coquand and Siles' development strongly justifies our
> > comment about formalisation of Brzozowski's argument being painful.
> > If you like to have this reference added in the paper, we are happy
> > to do so.
>
> I think that it is worth adding a comment like this one either in
> section 5 (at the beginning of p. 20) or in the conclusion.

We added to page 20:

  Therefore Coquand and Siles (2011) who follow through this idea 
  had to spend extra effort and first formalise the quite intricate 
  notion of inductively finite sets in order to formalise this property.

> - The authors provide a detailed figure for the number of line of code
>   of Almeida et al's implementation of Mirkin construction, but there
>   is no such figure for some of other works that are cited. Maybe this
>   number could be dropped, or similar figures could be given for other
>   works, in order to give a more accurate picture.

We are not 100% sure about this comment. We have not selected the
explicit numbers for the Lammich-Tuerk and Almeida et al works because of  
"name-and-shaming" or because of a "beauty contest". Rather to illustrate 
a trend. We asked about the numbers for the Lammich-Tuerk work. They 
also confirmed with us that their automata-with-names approach is painful 
to work with. Unfortunately they have not written this in their paper 
such that we could have cited it. Filliatre makes explicitly such a comment 
in his paper, which we have cited. Braibant equally wrote that he reckons 
that our formalisation is more concise. There is no way we can give a number 
in terms of lines-of-code for the formalisation in Nuprl. We believe 
the number for the Almeida et al work is accurate (and not even the
biggest formalisation from the ones listed, but one which is about
one specific topic).  

> - I wonder if you have direction for future works, that would satisfy
>  your predicate "results from regular language theory, not results
>  from automata theory". It would be nice to give a few examples if
>  you have some in mind. 

We added the work by Sulzmann and Lu who work on regular expression
sub-matching using derivatives of regular expressions. We would like
to formalise this work. Thank you very much for suggesting future work.