streamlined definition of alpha-equivalence for single binders (used flip instead of swap)
> Referee no 1:
>
>  * The paper can be accepted for Logical Methods in Computer Science 
> after minor
> revisions
>
> NUMBER    : LMCS-2011-675
> TITLE     : General Bindings and Alpha-Equivalence in Nominal Isabelle
> AUTHOR(S) : Christian Urban, Cezary Kaliszyk
>
> Recommendation: The paper can be accepted for Logical Methods in
> Computer Science after minor revisions.
>
> The work reported is very good, but the presentation of the paper can
> be improved.
>
> [...]
>
> However, I recommend improvement of the presentation of the paper
> before it is accepted by LMCS.  While the motivation for the work of
> this paper is clear to anyone who has tried to formalize such
> reasoning, it is not explained in the paper.  E.g. on p.1 "However,
> Nominal Isabelle has fared less well in a formalisation of the
> algorithm W [...]."  But there is no analysis in the paper of what was
> hard in algorithm W coded with single binders, or explanation of how
> it would be done in the new system reported in this paper showing why
> the new approach works better in practice.  
We have expanded on the point and given details why single 
binders lead to clumsy formalisations of W.
> Although this example is
> one of the main motivations given for the work, there is apparently no
> formalization of algorithm W in the library of examples that comes
> with Nominal2 described in this paper.  I think that should be
> provided.  Similarly for the second motivating example (on p.2 "The
> need of iterating single binders is also one reason why Nominal
> Isabelle and similar theorem provers that only provide mechanisms for
> binding single variables have not fared extremely well with the more
> advanced tasks in the POPLmark challenge [2], because also there one
> would like to bind multiple variables at once.").
We proved the main properties about type-schemes in algorithm W: 
the ones that caused problems in our earlier formalisation. Though 
the complete formalisation of W is not yet in place.
> The new Isabelle package "Nominal2", described in this paper, is not
> ready for users without a lot of hand-holding from the Nominal2
> developers.  This paper would have more impact if interested users
> could try the tool without so much difficulty.
The plan is to have Nominal Isabelle be part of the next stable 
release of Isabelle, which should be out in Autumn 2012.
At the moment it can be downloaded as a bundle and is ready
to be used (there are two groups that use Nominal2 and 
only occasionally ask questions on the mailing list). 
> A few more specific points:
>
> Bottom of p.7: I don't understand the paragraph containing equations
> (2.4) and (2.5).
We reworded this paragraph.
> Bottom of p.9: The parameters R and fa of the alpha equivalence
> relation are dropped in the examples, so the examples are not clear.
We have two minds with this: Keeping R and fa is more faithful to
the definition, but it would not provide much intuition into
the definition. We feel explaining the definition in special
cases is more beneficial to the reader.
> I think there is a typo in the first example: "It can be easily
> checked that ({x,y},x->y) and ({y,x},y->x) are alpha-equivalent [...]"
> Did you mean "({x,y},x->y) and ({y,x},x->y) are alpha-equivalent"?
Both are equivalent since sets {x, y} and {y, x} are equal. But we made
it clearer as you suggest.
> Referee no 2:
>
> [...]
>
> My main criticisms of the paper are:
>
> * The definition of the "nominal signature" language is not completely clear.
>   The general format at the beginning of section 4 is very clear, but is in
>   fact too general: not everything that can be written in this format makes
>   sense. The authors then walk the reader through a series of 
>   examples of what
>   is *forbidden* (with informal explanations why these examples are
>   forbidden), but in the end, a positive definition of what is *permitted*
>   seems to be missing.
Yes, we agree very much: a positive definition would be very desirable.
Unfortunately, this is not as easy as one would hope. Already a positive
definition for a *datatype* in HOL is rather tricky. We would rely on this
for nominal datatypes. Then we would need to think about what to do with
nested datatype, before coming to nominal matters.
The only real defence for not giving a positive we have is that we give a 
sensible "upper bound". To appreciate this point, you need to consider that 
systems like Ott go beyond this upper bound and give definitions which are 
not sensible in the context of (named) alpha-equated structures.
> * The authors have isolated an important building block, the notion of
>   (multiple-name) abstraction (in Section 3). (Actually, there are three
>   variants of it.) This is good: it makes the whole construction modular
>   and helps simplify what follows. I don't know if this will make sense
>   for the authors, but I would like them to go further in this direction:
>   identify more elementary building blocks ("combinators", if you will),
>   study their properties in isolation, and in the end combine them to
>   obtain a very simple explanation of the "nominal signature" format
>   that is exposed to the user. In the present state of the paper, the
>   design of the "nominal signature" format seems somewhat ad hoc: the
>   format of the "binds" clauses is subject to several restrictions;
>   there seems to be a distinction between "binders" and ordinary
>   "terms"; there is a distinction between "recursive" and "non-recursive"
>   binders, and a distinction between "shallow" and "deep" binders. If
>   one could identify a small number of elementary building blocks and
>   explain/motivate the design of the surface specification language in
>   terms of these elementary notions, the paper might become all the more
>   compelling.
We are not sure whether we can make progress with this. There is such a
"combinator approach" described by Francois Pottier for C-alpha-ML. His approach
only needs an inner and outer combinator. However, this has led to quite
non-intuitive representations of "nominal datatypes". We attempted
to be as close as possible to what is used in the "wild" (and in Ott). 
This led us to isolate the notions of shallow, deep and recursive binders.
>   In the present state of the paper, I think the *implementation* of the
>   nominal package is very useful for the end user, but the *theory* that is
>   presented in this paper is still a bit cumbersome: the definitions of free
>   atoms, alpha-equivalence, etc. presented on pages 16-20 are understandable
>   but not compelling by their simplicity.
We agree - simpler would be much better. However, we refer the referee to "competing" 
definitions that are substantially more complicated. Please try to understand 
the notion of alpha-equivalence for Ott (which has been published). Taking 
tgis into account, we really think our definitions are a substantial improvement,
as witnessed by the fact that we were actually able to code them up.
> * I do not quite understand the treatment of the finiteness restriction.
>   I understand that things must have finite support so as to allow picking
>   atoms outside of their support. But finiteness side conditions seem to
>   appear pretty early and in unexpected places; e.g. I would expect the
>   support of a set of atoms "as" to be equal to "as", regardless of whether
>   "as" is finite or infinite. This could be clarified.
This is a peculiarity of support which has been explained elsewhere in the nominal
literature (pointers in the paper). Assume all_atoms is the set of all atoms then
     supp all_atoms = {}
For sets of atoms the support behaves as follows:
     supp as = as             if as is finite
     supp as = all_atoms      if as and all_atoms - as are infinite
     supp as = all_atoms - as if as is co-finite 
As said, such properties have been described in the literature and it would be
overkill to include them again in this paper. Since in programming language research 
nearly always considers only finitely supported structures, we often restrict
our work to these cases. Knowing that an object of interest has only finite support
is needed when fresh atoms need to be chosen. 
> * The choice of abstraction "style" is limited to three built-in forms (list,
>   set, and set+). Perhaps one could make this user-extensible. After 
>   all, very few properties seem to be required of the basic abstraction forms, 
>   so why not let the user define new ones?
This would be nice, but we do make use of the equality properties
of abstractions (which are different for the set, set+ and list cases).
So we have yet to find a unifying framework.
> * One may argue that the set-abstractions are an attempt to kill two birds
>   with one stone. On the one hand, we take the quotient raw terms modulo a
>   standard notion of alpha-equivalence; on the other hand, at the same time,
>   we take the quotient modulo a notion of structural equivalence (permutation
>   of binders, removal or introduction of vacuous binders). One could argue
>   that dealing with structural equivalence should be left to the user, because
>   in general the structural equivalence axioms that the user needs can be
>   arbitrarily complex and application-specific. 
Yes, we can actually only deal with *one* non-trivial structural equivalence,
namely set+ (introduction of vacuous binders). For us, it seems this is an application
that occurs more often than not. Type-schemes are one example; the Psi-calculus
from the group around Joachim Parrow are another; in the paper we also point to
work by Altenkirch that uses set+. Therefore we like to give automated support 
for set+. Doing this without automatic support would be quite painful for the user.
>   There are object languages,
>   for instance, where abstractions commute with pairs: binding a name in a
>   pair is the same as binding a name within each of the pair components.
>   (This is the case in first-order logic where forall distributes over
>   conjunction.) Thus, one may fear that in many cases, the set and set+
>   abstractions will not be sufficiently powerful to encode the desired
>   structural equivalence, and the user will need to explicitly define 
> a notion
>   of structural equivalence anyway. I don't think that the paper provides
>   convincing evidence that set and set+ abstractions are useful. (That said,
>   they don't cost much, so why not include them? Sure.)
From our experience, once a feature is included, applications will
be found. Case in point, the set+ binder arises naturally in the
psi-calculus and is used in work by Altenkirch.
> * Here is little challenge related to set-abstractions. Could you explain how
>   to define the syntax of an object language with a construct like this:
>
>     let x1 = t1 and ... and xn = tn in t
>
>   where the xi's are bound in t (this is a non-recursive multiple-let form)
>   and the order of the definitions does not matter (that is, "let x1 = t1
>   and x2 = t2 in t" is alpha-equivalent to "let x2 = t2 and x1 = t1 in t")?
>   Can you use a set-abstraction to achieve this? I am guessing that this
>   might be possible, if one represents the definitions "x1 = t1 and ..."
>   using a set of pairs (or a map of names to terms) as opposed to a list
>   of pairs. I think that the system should at the very least allow encoding
>   this example, otherwise set-abstractions will not be very useful in
>   practice.
We can. We added this as an example in the conclusion section.
> p.2, "this leads to a rather clumsy formalisation of W". Could you explain
> why? Although I can understand why in some circumstances it is desirable to
> have a notion of alpha-equivalence that includes re-ordering binders, 
> I am not
> sure that the ML type system (or its inference algorithm) is a good
> illustration. If one examines the typing rules of Core ML, one finds that
> their premises involve a notion of equality between *types* (for 
> instance, the
> function application rule requires that the types of the formal and actual
> arguments match) but do not involve any notion of equality between *type
> schemes*. Type schemes are constructed and eliminated; they are never 
> compared
> with one another. For this reason, it is not clear that a notion of
> alpha-equivalence for type schemes is required at all, let alone that it must
> allow re-ordering binders and/or disregarding vacuous binders.
In the correctness *proof* of W, the notion of capture-avoiding 
substitution plays a crucial role. The notion of capture-avoiding
substitution without the notion of alpha-equivalence does not make
much sense. This is different from *implementations* of W. They
might get away with concrete representations of type-schemes. 
> p.3, "let the user chose" -> "choose"
Fixed.
> p.5, I am not sure what you mean by "automatic proofs". Do you mean
> automatically-generated proof scripts, or proofs performed automatically by a
> decision procedure, or ... ?
Fixed.
> p.5, "adaption"
Fixed.
> p.5, it seems strange to use the symbol "+" for composition, a 
> non-commutative
> operation.
Yes. This is a "flaw" in the type-class setup. If we want to reuse libraries
from Isabelle, we have to bite into this "sour" apple.
> Equation (2.2) is unfamiliar to me. I am used to seeing "supp x" defined as
> the least set L such that for every permutation pi, if pi fixes L, then pi
> fixes x. I assume that the two definitions are equivalent? Is there a reason
> why you prefer this one?
Our definition is equivalent to yours for finitely supported x. 
The advantage of our definition is that can be easily stated
in a theorem prover since it fits into the scheme
   constant args = term
Your definition as the least set...does not fit into this simple 
scheme. Our definition has been described extensively elsewhere,
to which we refer the reader.
> Proposition 2.3, item (i) is not very easy to read, because text and math
> are mixed and "as" happens to be an English word. 
Replaced by bs.
> More importantly, could
> you explain why the hypothesis "finite as" is needed? The proposition seems
> intuitively true if we remove this hypothesis: it states exactly that 
> "supp x"
> is the least set that supports x (this is actually the definition of "supp"
> that I expected, as mentioned above).
It does *not* work for non-finitely supported sets of atoms. Imagine
bs is the set of "even" atoms (a set that is not finite nor co-finite).
   bs supports bs
but 
   supp bs = all_atoms    !<=   bs 
> p.8, "equivariant functions have empty support". I suppose the converse is
> true, i.e. "functions that have empty support are equivariant". If this is
> correct, please say so.
We have reworded this paragraph.
> p.8, "we used extensively Property 2.1". You mean "Proposition 2.1". 
Fixed.
> p.8, "we identify four conditions: (i) [...] x and y need to have the same
> set of free atoms". You seem to be saying that fa(x) and fa(y) should be
> equal. But this is too strong; I suppose you mean fa(x) \ as = fa(y) \ bs.
> Please clarify. (Definition 3.1 indeed clarifies this, but I believe that
> the text that precedes it is a bit confusing.)
Fixed.
> p.9, it seems to me that alpha-equivalence for Set+ bindings (Definition 3.3)
> is in a sense the most general of the three notions presented here. Indeed,
> alpha-equivalence for Set bindings can be defined in terms of it, as follows:
>
>   (as, x) =_{Set} (bs, y)
>   if and only if
>   (as, (as, x)) =_{Set+} (bs, (bs, y))
>
> That is, I am comparing abstractions whose body has type "atom set * beta".
> The comparison of the set components forces condition (iv) of Definition 3.1.
> Similarly, alpha-equivalence for List bindings can be defined in terms of it,
> as follows:
>
>   (as, x) =_{List} (bs, y)
>   if and only if
>   (set as, (as, x)) =_{Set+} (set bs, (bs, y))
>
> That is, I am comparing abstractions whose body has type "atom list * beta".
> Am I correct to think that one can do this? If so, could this help eliminate
> some redundancy in the paper or in the implementation? And, for a 
> more radical
> suggestion, could one decide to expose only Set+ equality to the programmer,
> and let him/her explicitly encode Set/List equality where desired?
The second property holds outright. The first holds provided as and bs
are finite sets of atoms. However, we seem to not gain much code reduction
from these properties, except for defining the modes set and list in terms 
of set+. Also often proving properties for set+ is harder than for the
other modes.
Since these modes behave differently as alpha-equivalence, the encoding would, 
in our opinion, blurr this difference. We however noted these facts in the conclusion. 
> p.10, "in these relation"
Fixed.
> p.10, isn't equation (3.3) a *definition* of the action of permutations
> on the newly defined quotient type "beta abs_{set}"?
Yes, we have this now as the definition. Earlier we really had
derived this. Both methods need equal work (the definition
needs a proof to be proper). Having it as definition is clearer.
> p.11, why do you need to "assume that x has finite support" in order to
> obtain property 3.4? It seems to me that this fact should also hold for
> an x with infinite support. 
Fixed.
> Same remark in a couple of places further
> down on this page. You note that "supp bs = bs" holds "for every finite
> set of atoms bs". Is it *not* the case that this also holds for infinite
> sets? If so, what *is* the support of an infinite set of atoms? 
This property only holds for finite sets bs. See above.
> Why not
> adopt a definition of support that validates "supp bs = bs" for *every*
> set of atoms bs? Is there a difficulty due to the fact that what you
> call a "permutation" is in a fact "a permutation with finite support"?
> I think it would be good to motivate your technical choices and clarify
> exactly where/why a finite support assumption is required.
This cannot be made as it is very natural to have the property
that the set of all atoms has empty support. There is no permutation
that does not fix that set. We cannot see how this can be made
to work so that
   supp bs = bs 
for all sets.
> p.11, "The other half is a bit more involved." I would suggest removing
> this rather scary sentence. The proof actually appears very simple and
> elegant to me.
Fixed.
> p.12, "mutual recursive" -> "mutually recursive"
Fixed.
> p.12, does the tool support parameterized data type definitions? If so,
> please mention it, otherwise explain whether there is a difficulty (e.g.
> the parameters would need to come with a notion of permutation).
Yes. Note added.
> p.12, "Interestingly, [...] will make a difference [...]". At this 
> point, upon
> first reading, this is not "interesting" but rather frustrating, because it
> does not sound natural: my understanding would be very much simplified if
> "binds ... in t u" was equivalent to "binds ... in t, binds ... in 
> u". Because
> a forward pointer is missing, I cannot find immediately where this is
> explained, and this problem hinders my reading of the beginning of section 5.
We have given a forward pointer.
> p.13, the type of atoms now seems to be "name", whereas it was previously
> "atom". The remark on the last line of page 13 leads me to understand that
> "name" refers to one specific sort of atoms, whereas "atom" refers to an
> atom of any sort (right?). The function "atom" converts one to the other;
> but what is its type (is it overloaded?).
Yes, atom is overloaded. 
> p.13, you distinguish shallow binders (binds x in ...) and deep binders
> (binds bn(x) in ...). I would hope that a shallow binder is just syntactic
> sugar for a deep binder where "bn" is the "singleton list" or "singleton
> set" function. Is this the case? If not, why not? If yes, perhaps you could
> remove all mentions to shallow binders in section 5.
Not quite. A deep binder can be recursive, which is a notion
that does not make sense for shallow binders.
> p.14, "we cannot have more than one binding function for a deep binder".  You
> exclude "binds bn_1(p) bn_2(p) in t". Couldn't this be accepted and
> interpreted as "binds bn_1(p) \cup bn_2(p) in t"? (I guess it does not matter
> much either way.)
That would make the definition of fa-function more complicated.
> p.14, you also exclude "binds bn1(p) in t1, binds bn2(p) in t2". Two
> questions. First, a clarification: if bn1 and bn2 are the same function, is
> this allowed or excluded? 
For practical purposes, this is excluded. But the theory would
support this case. I added a comment in the paper.
> Second, I don't understand why you need this
> restriction, that is, why you are trying to prevent an atom to be "bound and
> free at the same time" (bound in one sub-term and free in another). I 
> mean, in
> the case of single binders, you seem to allow "binds x y in t1, binds 
> y in t2"
> (at least, you have not stated that you disallow this). There, occurrences of
> x in t1 are considered bound, whereas occurrences of x in t2 are considered
> free; is this correct? If so, why not allow "binds bn1(p) in t1, binds bn2(p)
> in t2", which seems to be of a similar nature? 
The problem is that bn1 and bn2 can pick out different atoms to
be free and bound in p. Therefore we have to exclude this case.
> p.14, example 4.4, the restriction that you impose here seems to rule out
> an interesting and potentially useful pattern, namely telescopes. A telescope
> is a list of binders, where each binder scopes over the rest of the 
> telescope,
> and in addition all of the names introduced by the telescope are considered
> bound by the telescope in some separate term. I am thinking of 
> something along
> the following lines:
>
>   nominal_datatype trm =
>   | Var name
>   | Let tele::telescope body::trm  binds bn(tele) in body
>   | ...
>
>   and telescope =
>   | TNil
>   | TCons x::name rhs::trm rest::telescope  binds x in rest
>
>   binder bn::telescope => atom list
>   where bn (TNil) = []
>       | bn (TCons x rhs rest) = [ atom x ] @ bn(rest)
>
> You write that "if we would permit bn to return y, then it would not be
> respectful and therefore cannot be lifted to alpha-equated lambda-terms". I
> can see why there is a problem: if "x" is considered bound (therefore
> anonymous) in the telescope "TCons x rhs rest", then it cannot possibly be
> returned by a (well-behaved) function "bn". I think that the answer to this
> problem should be: we must pick an appropriate notion of alpha-equivalence
> for telescopes, and this notion of alpha-equivalence must *not* consider x
> as anonymous in "TCons x rhs rest". Instead, x must be considered free in
> this telescope. The telescopes "TCons x rhs TNil" and "TCons y rhs TNil"
> must be considered distinct. Of course we could achieve this effect just by
> removing the clause "binds x in rest", but this would lead to a notion of
> alpha-equivalence for "Let" terms which is not the desired one: when writing
> "let (x1 = t1; x2 = t2) in t", we would like x1 to be bound in t2, and this
> will not be the case if we omit "binds x in rest" in the above definition.
> I conclude that your design (which seems very reasonable) cannot currently
> express telescopes. It would be nice if you could explicitly discuss this
> issue. Is it conceivable that an extension of your system could deal with
> telescopes? Other researchers have proposed approaches that can deal with
> them (I am thinking e.g. of ``Binders Unbound'' by Weirich et al.).
Yes, we cannot deal with telescopes. We added a comment. 
> Here is another general question. How would you declare a nominal data type
> for ML patterns? Informally, the syntax of patterns is:
>
>   p ::=
>     x                                                    (variable)
>   | (p, p)      where bn(p1) and bn(p2) are disjoint     (pair)
>   | (p | p)     where bn(p1) = bn(p2)                    (disjunction)
>   | ...
>
> In the case of a pair (or conjunction) pattern, one usually requires that the
> two components bind disjoint sets of names, whereas in the case of a
> disjunction pattern, one requires that the two components bind exactly the
> same sets of names. How would you deal with this? I imagine that one could
> just omit these two side conditions in the definition of the nominal data
> type, and deal with them separately by defining a well-formedness predicate.
Yes, that is how we would exclude non-linear patterns.
> One question: in the definition of the "term" data type, at the point where
> one writes "binds bn(p) in t", which variant of the "binds" keyword would one
> use: "binds", "binds(set)", or "binds(set+)"? Does it make any difference,
> considering that a pattern can have multiple occurrences of a name in binding
> position? It would be interesting if you could explain how you would handle
> this example.
You can write 
    
    binds bn(p) in t, 
    binds(set) bn(p) in t, 
    binds(set+) bn(p) in t
> p.15, just before section 5, I note that the completion process does *not*
> produce any clause of the form "binds ... in x" (in the Lam case). One could
> have expected it to produce "binds x in x", for instance. One could imagine
> that, for *every* constructor argument t, there is a clause of the 
> form "binds
> .. in t". Here, you adopt a different approach: you seem to be partitioning
> the constructor arguments in two categories, the "terms" (which after
> completion appear in the right-hand side of exactly one "binds" clause) and
> the "binders" (which appear in the left-hand side of at least one "binds"
> clause). Please clarify whether this is indeed the case. (You have 
> presented a
> series of data type definitions that you forbid, but in the end, you should
> present a succinct summary of what is allowed.) Also, I seem to understand
> that the following definition is forbidden:
>
>   nominal_datatype trm =
>   | Foo t1::trm t2::trm  binds bn(t1) in t2, binds bn(t2) in t1
>
> (for some definition of "bn"). This would be forbidden because t1 and t2 are
> used both as "terms" and as "binders" (both on the left-hand and right-hand
> side of a "binds" clause). As far as I can see, however, you have not
> explicitly forbidden this situation. So, is it forbidden or allowed? Please
> clarify.
It is disallowed. We clarified the text.
> p.17, "for each of the arguments we calculate the free atoms as 
> follows": this
> definition relies on the fact that "rhs" must be of a specific *syntactic*
> form (unions of expressions of the form "constant set" or "recursive call").
> For instance, "rhs" cannot contain the expression "my_empty_set z_i", where
> "my_empty_set" is a user-defined function that always returns the empty set;
> otherwise the third bullet would apply and we would end up considering "z_i"
> as neither free nor bound. You have mentioned near the top of page 15 that
> binding functions "can only return" certain results. You should clarify that
> you are not restricting just *the values* that these functions can 
> return, but
> the *syntactic form* of these functions.
Fixed.
> p.23, "We call these conditions as": not really grammatical.
Fixed.
> p.23, "cases lemmas": I suppose this means an elimination principle?
Yes.
> p.23, "Note that for the term constructors" -> "constructor"
Fixed.
> p.26, "avoid, or being fresh for" -> "avoid, or are fresh for"
Fixed.
> p.30, "Second, it covers cases of binders depending on other binders,
> which just do no not make sense [...]". I am curious why the designers
> of Ott thought that these cases make sense and you don't. Perhaps this
> point would deserve an example and a deeper discussion?
We have added more comments why such binders would not make sense
for alpha-equated terms (fa-functions would not lift). We do not
know why Ott allows them - probably because they do not establish
a reasoning infrastructure for alpha-equated terms.
> p.32, "It remains to be seen whether properties like [...] allow us
> to support more interesting binding functions." Could you clarify
> what you mean? Do you mean (perhaps) that fa_bn(x) could be defined
> as fa_ty(x) \ bn(x), regardless of the definition of bn(x), instead
> of by induction over x? Do you mean something else?
Yes.