Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be
proved without relying on the notion of automata and b) argues
shallowly thorough the paper that not having to formalize the notion
of automata is easier.
While I do agree that point a) is a an interesting theoretical
contribution that deserves publication, I disagree with the second
argument.
It should be noted that the handling of the notion of automata has
been difficult historically for two reasons. On the one hand, giving
an identifier to every state of an automata indeed yields proofs that
may be characterized as "clunky" (even if the wording is a bit
strong), but is a necessary condition for efficient implementation of
automata algorithms. This yields sizable formalizations as is
demonstrated by the Lammich and Tuerk's work or Braibant and Pous's
work [1]. On the other hand, it seems to be the case that the
"principle of definition" in HOL makes it impossible to use parametric
definitions of automata (but there is no problem with such a
definition in Coq).
But, the startup cost of a formalization depends on the scope of what
one intends to formalize. Perhaps Braibant concedes that your
development is more concise that his, yet Braibant's work dealt with
the formalization of an _efficient_ decision procedure for Kleene
algebras that required _two_ formalizations of automata (the matrices
are only used in the proof of correctness of this decision procedure,
while the computational part uses efficient representations for DAGs
-- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only
adding emphasis here to demonstrate that this does not compare easily
with the formalization of the proof of Myhill-Nerode by itself.
(At this point, let me comment on your answer: "As far as we know
Braibant also uses ssreflect".  It is not the case. One may wonder to
what extent the use of ssreflect there could have alleviated the
"intrinsic difficulties of working with rectangular matrices".)
In the same fashion, Lammich and Tuerk's work uses efficient
data-structures (.e.g, for collections, automatas), and some amount of
data-refinement to link various representations together. 
My most stringent remark about the paper is that it should make clear
from the beginning whether or not the algorithms are computational,
and if they are executable, how efficient. Then, comparing with the
two aforementioned work will make sense. 
In fact, I have been made aware of the following work [2] that do
formalize automata and regular language theory without using explicit
identifiers for state numbers in automata. The upside is that it
yields much smoother proofs than previous works in Coq. The downside
is that this formalization strategy does not yield efficient algorithm
for automata constructions. I reckon that this is a technical report
that should be taken with a grain of salt in the context of this
review, but I think that it gives evidence that it is not always the
case that the startup cost is higher with automata rather than with
regular expressions: they formalize the Myhill-Nerode theorem,
Kleene's theorem, the relationships between various representations of
regular languages in around 1500 lines of ssreflect/Coq. (Again, let
me emphasize that this is not a computational representation of
automata, and thus, it does not easily compare in terms of size with
Lammich and Tuerk's work or Braibant and Pous's work.) I think this
work would make for a much more relevant comparison with your work. 
I agree that comparing with related work is hard, and that the authors
have already been careful in their comparison with the formalization
sizes from other works. Yet, I am afraid an uninformed outsider would
miss the distinction between related work that are executable and
efficient and this formalization. 
Minor comments about the paper
==============================
- I am still confused about the explanation about the way the
equational system is set up (your comment in the conclusion does not
help me to pinpoint why you have to do that): I understand that
Brzozowski annotates the final states with lambdas, and propagates
them to the unique initial state. On the other hand you start with a
set of equivalence classes, and you want to compute a regular
expression for all classes that are in finals. In other words, I still
fail to see why the following system does not suit your needs
X1 = a,X2 + b,X4 
X2 = b,X3 + a,X4 + lambda(ONE)
X3 = a,X4 + b,X4 + lambda(ONE)
X4 = a,X4 + b,X4
with L(r,X) = L(r) \cdot X
and then propagate the equation to compute to compute a value for
X1. (And, you could draw the automata underlying the transitions at
the top of p. 7 to help the reader understand what is going on, and
the relationship with the current equational system.)
Maybe it is simply that you could have done it, but that it would have
been less pleasant to work with in the context of a Myhill-Nerode
relation?
- I really appreciate the rephrasing of section 4, especially the
final remark. I think it clearly helps the reader to understand the
proof arguments, and the relationship with automata. 
- I concur with the second reviewer to say that the executability of
the first half of the M-N theorem should be discussed in the
conclusion. Moreover, I am puzzled by your comments in the response to
reviewers: to what extent does your formalization need to be
_modified_ to make it executable? Do you need to change the code, or
to add more things? (More precisely, you discuss the executability of
the second half of the M-N theorem -- that constructs a partition of
the set of strings -- in the conclusion, in the context of Haftmann
executable sets; you should do the same for the first-half---that
constructs a regular expression). I am not sure I am happy with the
things you wrote about executable sets, because it is not clear from
my point of view if you need to modify the code, or add things. 
- More generally, it should be made clear that when you say algorithm,
the reader should not understand it as something that is computable as
it is. In particular, I think that the following sentence is
misleading: "our algorithm for solving equational systems actually
calculates a regular expression for the complement language." You
actually define such an expression, but it seems not possible, as it
is, to open the box to see what is inside.
- You answered: 
> In our opinion, Coquand and Siles' development strongly justifies our
> comment about formalisation of Brzozowski's argument being painful.
> If you like to have this reference added in the paper, we are happy
> to do so.
I think that it is worth adding a comment like this one either in
section 5 (at the beginning of p. 20) or in the conclusion.
- There are still some spell-checking mistakes in the paper, e.g.,
  "refular" p. 26, and 
- The authors provide a detailed figure for the number of line of code
  of Almeida et al's implementation of Mirkin construction, but there
  is no such figure for some of other works that are cited. Maybe this
  number could be dropped, or similar figures could be given for other
  works, in order to give a more accurate picture.
- I wonder if you have direction for future works, that would satisfy
  your predicate "results from regular language theory, not results
  from automata theory". It would be nice to give a few examples if
  you have some in mind. 
[1] Deciding Kleene algebras in Coq, Thomas Braibant, Damien Pous
http://dx.doi.org/10.2168/LMCS-8(1:16)2012
[2] A Constructive Theory of Regular Languages in Coq, Christian
Doczkal and Jan-Oliver Kaiser and Gert Smolka
http://www.ps.uni-saarland.de/Publications/details/DoczkalEtAl:2013:A-Constructive.html