We thank the reviewers for their time and insightful comments.
We have incorporated most comments and if not, then answered them
below. We also spell-checked the paper. Thanks again for all the
comments.
Reviewer #1
===========
> Indeed, one of the premise of the paper is that representing automata
> in a theorem prover requires to fight against the theorem prover or
> requires painful amount of details. Therefore, the authors define the
> notion of regular languages as a language generated by a regular
> expression. (This is not an unusual choice from a formalization point
> of view, provided complemented by a proof of Kleene's theorem that
> states such regular languages are exactly the one that are recognized
> by finite automata.)
We concur with the reviewer that our definition of regular languages
as the ones generated by regular expressions is usual, but the paper is
unusual in that we do not need to provide a version of Kleene's theorem.
All results are obtained without formally needing a definition for automata.
If we would need Kleene's theorem as part of our arguments, then we would
need such a definition.
To make our intention with the paper more clear, we moved the quote
by Gasarch from the Conclusion to the Introduction. In this quote he asks
whether the complementation of regular-expression languages can be proved
without using automata. In this way we sharpen our motivation by being
theorem prover related, but also of general interest. See quotations and
changes right after Definition 2, p. 3.
> Second, the premise of this article is that formalizing automata
> theory require some fight with the theorem prover, and that several
> other (longer) formalizations from the related work substantiate this
> claim. I am under the impression that this comparison is misleading
> since some of the related work deals not only with regular language
> \emph{theory}, but also regular language \emph{algorithms}: e.g.,
> automata constructions, conversion from NFA to DFA, minimization,
> decision procedures for language equivalence. These algorithms may be
> proved correct, complete, or terminating; and may be more or less
> efficient. Then, these properties dictate some choices of
> representation of the objects of the formalization (and hence have
> consequences on its intricacies).
Yes, we agree with the reviewer that *if* one is in the business of
proving the correctness of, for example, Hopcroft's algorithm or the
NFA-to-DFA conversion, which are based on "concrete" automata, then yes
one clearly has to use the usual notion of the automata somewhere, even
might have to start from it.
However, we disagree with the interpretation that our comparison is
misleading. Our claim that usual `paper' definitions do not translate
smoothly into formalised definitions is not a claim just by us and
independent of any numbers: We referenced in the paper to a concrete
problem pointed out already in Slind's email cited on the bottom of
p. 2. In the context of HOL4, he made the observation
| A student who is attempting to prove the pumping lemma for regular
| languages came across the following problem. Suppose a DFA is
| represented by a record <| init ; trans ; accept |> where
|
| init : 'state
| trans : 'state -> 'symbol -> 'state
| accept : 'state -> bool
|
| Execution of a DFA on a string:
|
| (Execute machine [] s = s)
| (Execute machine (h::t) s = Execute machine t (machine.trans s h))
|
| Language recognized by a machine D:
|
| Lang(D) = {x | D.accept (Execute D x D.init)}
|
| These are all accepted by HOL with no problem. In then trying to define
| the set of regular languages, namely those accepted by some DFA, the
| following is natural to try
|
| Regular (L:'symbol list -> bool) = ?D. L = Lang(D)
|
| But this fails, since D has two type variables ('state and 'symbol) but
| Regular is a predicate on 'symbol lists. So the principle of definition
| rejects.
We also pointed to comments by Tobias Nipkow which clearly show that
he had to fight the theorem prover. Many formalisations we cited for
comparison contain statements about having to fight their respective
theorem prover.
The conclusion we want the reader to draw from our comparisons is
that if one is just interested in formalising results from regular language
theory, not results from automata theory, then regular expressions are
easier to work with formally. If you look at the numbers (especially the
ones for Isabelle/HOL), then we hope you agree with us that the startup
costs are much higher with automata in comparison with regular
expressions. We carefully commented on the examples we have cited in
the Conclusion. Nowhere did we claim that they are one-to-one comparisons with
our development. We did, for example, say that Braibant agreed with us that
*if* he had done just Myhill-Nerode using his library, he estimates our
development is more concise (see his comment in the Conclusion, p. 25).
> In this article, the authors choose
> to do the converse: avoid the notion of automata altogether, and see
> how far it goes. I find this choice a bit peculiar.
At first sight, our choice/approach might seem peculiar, since
we are not primarily verifying an algorithm. However, we are not
the only ones who pondered about this approach (we mentioned Gasarch).
We showed that this is indeed a choice that can lead you quite far
with regular language theory and lead to a relatively smooth
formalisation. We did not make the claim that regular expressions are
a substitute for automata in all situations. See our paragraph
in the Conclusion starting with "While regular expressions are
convenient, they have some limitations."
> Third, I wonder to what extent some of the proofs presented in this
> article are a rephrasing of standard proofs (that go through
> automata), inlining the notion of automata, e.g., the construction
> depicted in Section 3. Also, the notions of derivatives and partial
> derivatives underly some automata constructions: for instance, from my
> point of view, Section 5 is actually building a DFA (with transition
> relation $\delta$ and initial state ${r}$)from a regular expression
> without saying so; and it suffices to take as a tag function the
> expression $\lambda x.\delta({r},x)$ (which is a right invariant
> equivalence relation between strings, and of finite index since there
> finitely many states in the automaton). The authors take care to note
> the proximity of their arguments with existing ones when due (that
> often rely on automata), but they seem very close to the existing one
> in any case, with enough inlining. (Yet, I found the proof based on
> tagging functions quite elegant: it avoids the explicit construction
> of an automata--because automata are difficult to formalize in
> HOL--while being close to the intuitive automata constructions, even
> if they are not mentionned.)
We agree "that underlying our arguments are automata proofs", but
we argued without constructing automata explicitly. This is the very point
and contribution of the paper. If we construct automata explicitly, the
formalised proof of Myhill-Nerode theorem will become much larger as
can be seen in existing work. By abandoning automata and relying only on
regular expressions, we get a much simpler formalised proof of MN theorem.
> To sum up, I think there is a significant value to come up with fresh
> arguments that makes it easier to formalize existing pen-and-paper
> proofs and textbooks. Yet, the arguments presented in this paper are
> not really new (they look like inlinings, even if the authors chose
> not to see it). And I am not convinced by the idea of side stepping an
> elegant tool, the notion of automata, because it is hard to
> formalize. I would rather improve on the underlying tools, or develop
> new ways to formalize this notion.
We disagree in that we should not take the opportunity of using
regular expressions when they lead to simpler formalisations
in regular language theory. We have already found that many structural
inductions that only regular expressions afford are good learning
tools for students in order to come to grips with formal reasoning.
> - Lemma 21 already appears in Brzozowski (1964) who attributes it to
> Arden. I think there is little value in distinguishing the two
> versions of Arden's lemma, since their proofs and statement are parallel.
We agree that Brzozowski (1964) gives the statement of Arden's lemma, but
no proof, for which he refers to Arden (1960). Unfortunately, we do not have
access to this book and guess that this applies to many contemporary readers.
Our reference to Sakarovitch is motivated by the fact that he gives
explicitly a proof (XXX CHECK).
> - I am puzzled by the one-character transition relation between
> equivalence classes (p. 6). The $X_i$ correspond to the states of
> the minimal automaton that recognizes $L$ (and you even agree with
> that in the Related Work section!). Therefore, I disagree with your
> sentence stating that this construction does not build an
> automaton.
We deleted this comment since it also confused the second reviewer.
What we wanted to express is that we have not defined a transition
relation between states in the usual automata sense.
> Then, it would follow that the initial system would be better
> presented as a DFA (that is, associating to each equivalence class,
> a mapping from atoms to equivalence classes with the atoms being
> disjoint), even if you do not want to introduce the notion of
> automaton. I fail to see the advantages of the current presentation
> of the initial system. Then, I think that it would be easier to
> understand your presentation if you used a more involved language
> (like (ab + a)) as a running example. This would produce the
> equivalence classes
>
> X1 = {[]}
> X2 = {[a]}
> X3 = {[ab]}
> X4 = {univ - {[], [a], [ab]}}
>
> and the equational system
>
> X1 = LAMBDA(ONE)
> X2 = (X1,a)
> X3 = (X2,b)
> X4 = (X1,b) + (X2,a) + (X3,a) + (X3,b)
>
We have modified the example as you suggest. But note that assuming
we only have characters a and b in an alphabet, the equation for
X4 must be
X4 = (X1,b) + (X2,a) + (X3,a) + (X3,b) + (X4,a) + (X4,b)
> (It could be worth saying that it is a left-linear context free
> grammar in any case). Having built this example, I wonder why you
> chose to set up the equationnal system the way you did. In the RW
> section, you argue that the differences with Brzozowski's are subtle
> and impact the presentation of , but I fail to see what would go
> wrong if you chose this presentation too.
The difference is from where we start: In Brzozowski's algebraic
method, the start is an automaton from which this method calculates
a regular expression. Therefore Brzozowski annotates the final
states with lambdas and `propagates' them to the single initial
state. From there he can read off a (single) regular expression recognising
the same language as the automaton. We, on the other hand, start with a
set of equivalence classes, for all of which we want to calculate a
regular expression. In this way we have to annotate the lambda to the
initial state (containing only []) and `propagate' it to all other
equivalence classes. Once there, we can read off the regular expressions
for each equivalence class. There seems to be no convenient way to set
up the equational system the original Brzozowski-way. However, it does
mean that we have to use the reversed version of Arden's lemma in order to
solve our equational systems.
> - p. 11 "There exists a regular expression" this is not clear, unless
> one has a look at your formal proof: it should be made clear that
> the definition "regular A" is defined as exists r, A = lang r.
We clarified this sentence according to your suggestion.
> - Top of p. 12: it is stated that this algorithm provides a method to
> compute a regexp for the complement of a language and that this
> construction is similar to the construction of the complement
> automaton. I agree with this statement, because I am under the
> impression (maybe wrongly) that the construction in this section
> boils down to "automata with states indexed by the equivalences
> classes" to "regular expression". What I am uncertain of is, under
> what conditions I could use your formalization to compute this
> regular expression. (Note that I am not speaking about complexity
> issues here.)
We are not 100% sure whether we are correctly interpreting your
question: If you are after an informal "black board" calculation method
of this regular expression in front of students, then you can use our
algorithm. We have done this already. If you are asking whether
our formalisation is "constructive" enough in order to directly
extract an algorithm, then no we cannot do this. We use a theorem
prover based on classical logic. We, for example, did not make
explicit how a single regular expression can be constructed from
a set of regular expressions, still recognising the same language.
However, we do not like to open the "can-of-worms" about constructive
and unconstructive arguments and how to extract explicit algorithms.
In theory we can, in practice we cannot. Unfortunately, we do not have
a better answer for this at the moment and think this is besides
the main point of the paper.
> - It seems that the purpose of the tagging functions here is to build
> right invariant equivalence relations that refines the Myhill-Nerode
> relation (Lemma 43). Maybe you should start by that. Also, any such
> right invariant relation naturally gives rise to an automaton, and
> you build such later on. Maybe you could phrase it out at the very
> beginning of this section.
We added a long remark at the end of the section that spells
out the automata inspiration behind the tagging functions (see p. 17).
> - p. 13 In the definition of the tagging function for the plus case,
> isn't there a typo (replacing the \cdot by a comma) ?
No. Our formalisation uses this definition.
> - p. 14 There is an "x" missing on the left-hand side of the definition of
> x tag A B.
Yes. Corrected.
> - The structure of section 4 is a bit difficult to follow, because the
> proof of Theorem 31 is interleaved with the statement of many
> intermediate lemmas. Could you separate the definition of the
> tagging functions from the proof?
We have added a figure at the beginning of the proof and included
appropriate references.
> I think it would also be
> interesting to point out that the type of the codomain the tag
> functions are not the same through the section. Again, I am under
> the impression that the tagging functions defined in this section
> define states of an automaton that is not built explicitly, with the
> result of the tag function being a particular state in a suitably
> built automaton.
We added a relatively lengthy remark after the proof about
how the tagging functions can be seen as constructing particular
automata - see p. 17.
> Yet, it could be worth pointing out that this use
> of polymorphism for the tag functions does not have the same
> problems as the one encountered when one try to formalize
> automata. This is in a sense contradiction with your statement
> p. 24 that you sidestep the composition of automata (this automaton
> is not explicitly built).
We have modified the statement on p. 24.
> - p. 18: you mention the fact that it is painful to formalize
> Brzozowski's argument stating that there are only finitely many
> dissimilar derivatives. Is this because it requires to build part of
> the equationnal theory of regular expressions that you avoided so
> far? Could you comment on what Coquand and Siles did in the paper
> you cite?
Coquand and Siles first spend 3 pages of their 16-pages paper
developing a theory of inductively finite sets in type theory.
This theory uses "discrete sets" where the equality needs to
be decidable. They then view the ACI-properties as rewrite rules
leading to a decidable equality over regular expressions. This
shows (although they left out the details in their paper) that the
set of derivatives is inductively finite.
In our opinion, Coquand and Siles' development strongly justifies our
comment about formalisation of Brzozowski's argument being painful.
If you like to add this as a reference in the paper, we are happy
to do so.
> - Section 6, p20, previous to last paragraph: you say that the
> construction of the complement regular expression involves several
> transformations: you could shave off the construction of the
> non-deterministic automaton, since it can be avoided using
> Antimirov's construction of a DFA (as you do).
We feel that our sequence adequately represents procedure
that is used by informed outsiders and that is usually
taught to students. Antimirov's construction using partial
derivatives is unfortunately not widely known, which we hope
we can help to change with this paper.
> - (27) are properties, not definitions, isn't it?
Corrected.
> - A question. On the one hand, Isabelle/HOL features a classical
> logic, with the law of the excluded middle and the Hilbert's epsilon
> operator. On the other hand, in Coq, it is often involved to prove
> decidability results, which may make it more difficult to formalize
> some of the arguments you are using. Is it possible in Isabelle/HOL
> to track what are the uses of classical logic that you use?
In theory there is a way to track all classical uses in Isabelle,
but this can be practically unfeasible, since proof terms can be
gigantic (therefore the Isabelle/HOL follows the LCF approach
where only theorems need to be recorded for correctness). Having
said that there is one obvious place in our proof where we used
a "classical" shortcut by using + for generating a single regular
expression from a finite set of expressions. This would need better
"explanation" in a constructive proof. There are no other inherent
classical shortcuts in the proof. But as said before opening we
think opening this can-of-worms is beside the focus of the paper.
> In particular, I am worried about the last sentence on page 24: how do
> you make these sets computational?
We wrote:
However, since partial derivatives use sets of regular expressions, one
needs to carefully analyse whether the resulting algorithm is still
executable. Given the infrastructure for executable sets introduced by
Haftmann (2009) in Isabelle/HOL, it should.
Does this allay your fears?
> - I think a word of caution should be added to the comparison with
> related works in terms of formalization size: there is a huge gap in
> the scope of these formalizations efforts, especially when one
> defines algorithms. For instance, an usual corollary of the
> Myhill-Nerode theorem is an algorithm that minimize automata: could
> you provide, for instance, an algorithm that compute the number of
> equivalences classes of your Myhill-Nerode relation?
We have attempted to be very careful with our comments about the
formalisation sizes from other works. Our intention is only to draw
attention to the somewhat surprising point, namely that formalisation
using automata (either out of choice or being forced) led to
formalisations where the people in question had to fight their
theorem prover. On the other hand, our formalisation could rely
on the fact that regular expressions are inductive datatypes which
are generally well-supported by theorem prover reasoning. We hope
to have "balanced the picture" by saying that regular expressions
are not good for everything (see paragraph in the Conclusion
"While regular expressions are convenient, they have some
limitations...").
We have not thought about an algorithm to compute the number of
equivalence classes.
> - There has been several work that uses matrices in Coq, in the
> context of the ssreflect dialect. For instance, you may have a look
> at Dénès M, Mörtberg A, Siles V. 2012. A refinement-based approach
> to computational algebra in COQ. ITP 2012. It does not seem that
> there is a particular problem with the use of matrices.
We relied completely on the comment by (expert formaliser in Coq) Braibant
from his thesis which explicitly states
| Note that while they [Wu, Zhang, Urban] justify their work by the fact
| that formalising automata ``can be a real hassle in HOL-based provers'',
| and that neither Isabelle/HOL nor HOL4 nor HOL-light support matrices and
| graphs with libraries, we did not encounter such difficulties in Coq
| (omitting the **intrinsic difficulty** of working with **rectangular matrices**
| at some points).
The emphasis added by us. As far as we know Braibant also uses ssreflect.
Reviewer #2
===========
> This is a very nice paper, at least in terms of its content, and it
> should certainly be published in some shape or form. It's based on an
> ITP conference paper from 2011, but seems to me to include sufficient
> new content to justify republication in a journal. To be specific,
> the new material is: the second of the two proofs showing the
> implication that if a language is regular it has a finite number of
> Myhill-Nerode equivalence classes (using partial derivatives), and all
> of the closure properties. Both new contributions are very appealing.
We agree with your observations.
> Really, I have only a slew of minor criticisms about the paper, and
> these almost all have to do with minor presentation issues. I started
> to list some of these in the "Minor things" section below, but I
> eventually decided that I should simply say that 1. the English is
> sometimes less than perfect; 2. a spell-checker should be used; 3. the
> typography is occasionally dubious; and that 4. it all needs to be
> re-read with more care.
We have improved the paper. Please let us know if we
overlooked any more errors.
> Finally, the conclusion and related work sections are reasonable. I
> think the conclusion could usefully discuss the executability of the
> algorithm in the first half of the M-N proof.
Informally, the algorithm is clearly executable. But we have not
attempted this with our formalisation.
> Also, the conclusion discusses the Brzozowski's algebraic method in
> this context, and argues that the equational constraints that appear
> in this paper are not quite the same as a minimal automaton. It's
> clear that, yes, they are a little different, but it still seems to
> me that the initial state of the constraint set is isomorphic to the
> automaton. Given that, I was very surprised to read (on p6) that
> "Note that we do not define an automaton here".
Yes, we agree, behind our equational system lurks a minimal automaton. What
we wanted to clear up is that we have not defined a transition relation
between states in the usual automata sense. Despite best intentions, we
have deleted this remark, as it clearly misled the reviewers.
> Appendix: this seems unnecessary. If it's interesting enough to be in
> the main text, it should be there. Otherwise refer the reader to the
> online formalisation. In this case, the appendix is so small that it
> seems particularly silly to have one.
We deleted the appendix.