Journal/JAR-response2
changeset 388 0da31edd95b9
equal deleted inserted replaced
387:288637d9dcde 388:0da31edd95b9
       
     1 
       
     2 > Reviewer #1: This paper demonstrates that a) the Myhill-Nerode theorem can be
       
     3 > proved without relying on the notion of automata and b) argues
       
     4 > shallowly thorough the paper that not having to formalize the notion
       
     5 > of automata is easier.
       
     6 > 
       
     7 > While I do agree that point a) is a an interesting theoretical
       
     8 > contribution that deserves publication, I disagree with the second
       
     9 > argument.
       
    10 
       
    11 We are very grateful to the reviewer deciding the paper is a worthy 
       
    12 contribution. We are especially grateful since this opinion is despite 
       
    13 disagreement with parts of the paper. Thank you very much!
       
    14 
       
    15 > It should be noted that the handling of the notion of automata has
       
    16 > been difficult historically for two reasons. On the one hand, giving
       
    17 > an identifier to every state of an automata indeed yields proofs that
       
    18 > may be characterized as "clunky" (even if the wording is a bit
       
    19 > strong), but is a necessary condition for efficient implementation of
       
    20 > automata algorithms. This yields sizable formalizations as is
       
    21 > demonstrated by the Lammich and Tuerk's work or Braibant and Pous's
       
    22 > work [1]. On the other hand, it seems to be the case that the
       
    23 > "principle of definition" in HOL makes it impossible to use parametric
       
    24 > definitions of automata (but there is no problem with such a
       
    25 > definition in Coq).
       
    26 
       
    27 We agree with this summary. We cited the work by Doczal et al and wrote 
       
    28 that their quantification over types is not available to us. 
       
    29 
       
    30    Systems such as Coq permit quantification over types and thus can 
       
    31    state such a definition. This has been recently exploited in an elegant 
       
    32    and short formalisation of the Myhill-Nerode theorem in Coq by Doczkal 
       
    33    et al. (2013), but as said this is not available to us working in 
       
    34    Isabelle/HOL.
       
    35 
       
    36 > But, the startup cost of a formalization depends on the scope of what
       
    37 > one intends to formalize. Perhaps Braibant concedes that your
       
    38 > development is more concise that his, yet Braibant's work dealt with
       
    39 > the formalization of an _efficient_ decision procedure for Kleene
       
    40 > algebras that required _two_ formalizations of automata (the matrices
       
    41 > are only used in the proof of correctness of this decision procedure,
       
    42 > while the computational part uses efficient representations for DAGs
       
    43 > -- see for instance [1] p. 21 or Braibant's thesis p. 49). I am only
       
    44 > adding emphasis here to demonstrate that this does not compare easily
       
    45 > with the formalization of the proof of Myhill-Nerode by itself.
       
    46 
       
    47 We have attempted to not touch upon efficiency and algorithmic issues, since 
       
    48 they are an distraction for our topic at hand, that is reasoning about inductive
       
    49 structures, of which regular expressions are one instance. If you use more
       
    50 efficient datastructures, then it is expected that reasoning will be
       
    51 more difficult. The point of the paper is to prove the Myhill-Nerode theorem
       
    52 with only inductive structures, which are well-supported in all interactive 
       
    53 theorem provers we are aware off. 
       
    54 
       
    55 > (At this point, let me comment on your answer: "As far as we know
       
    56 > Braibant also uses ssreflect".  It is not the case. One may wonder to
       
    57 > what extent the use of ssreflect there could have alleviated the
       
    58 > "intrinsic difficulties of working with rectangular matrices".)
       
    59 
       
    60 Noted. We will not make this claim and have not done so in the paper.
       
    61 
       
    62 > In the same fashion, Lammich and Tuerk's work uses efficient
       
    63 > data-structures (.e.g, for collections, automatas), and some amount of
       
    64 > data-refinement to link various representations together. 
       
    65 
       
    66 Please note that the 14 kloc we cited for the Lammich/Tuerk work is
       
    67 the automata library, not the formalisation built on top of it.
       
    68 Also Lammich and Tuerk have told us that reasoning about graphs 
       
    69 with state names is undesirable. If they had a better representation
       
    70 available (which they had _not_, since they formalise Hopcroft's
       
    71 algorithm), they would have used this one and establish that it is 
       
    72 "equivalent" to the more efficient version with efficient data-
       
    73 structures. In their case there is no alternative formalising a library
       
    74 about graphs etc. We have shown that in case of the Myhill-Nerode theorem
       
    75 there is an alternative. We have been very careful to make claims about
       
    76 being the first ones who have done so. However, we believe we are,
       
    77 but certainly our proofs are not present in the literature (this
       
    78 might have something to do with the fact that the Myhill-Nerode
       
    79 theorem requires the reversal of Brzozowski and Arden...see
       
    80 comment further below).
       
    81  
       
    82 > My most stringent remark about the paper is that it should make clear
       
    83 > from the beginning whether or not the algorithms are computational,
       
    84 > and if they are executable, how efficient. Then, comparing with the
       
    85 > two aforementioned work will make sense. 
       
    86 
       
    87 As stated earlier, we do not like to make computational issues
       
    88 explicit as they are a distraction in our opinion. The logic behind 
       
    89 Isabelle/HOL and all HOL-based theorem provers is not a computational logic, 
       
    90 in the sense that it is not about computable functions (unlike Coq). There 
       
    91 are "uncomputable" features in HOL such as the axiom of choice (which we
       
    92 make use of in our formalisation). We still use the terminology "algorithm"
       
    93 in the widely-used sense of algorithms that contain "do-not-care"
       
    94 nondetermism. For example, unification algorithms are usually explained and 
       
    95 proved correct as transition system over sets of equations. They are not
       
    96 executable in the Turing-machine sense, since they leave implicit
       
    97 the order in which the equations should be analysed and which
       
    98 rules should "fire" if there are more than one applicable. They
       
    99 are also "imperfect" as the notion of (potentially infinite) sets
       
   100 cannot be implemented, in general, as a computable function, but 
       
   101 they can be reasoned about in Isabelle/HOL and other theorem provers.
       
   102 
       
   103 Still we hope that the reviewer agrees with us and with the many uses 
       
   104 in the literature (unification is only one example which commits this 
       
   105 "sin"), it is nevertheless perfectly sensible to call such "things" 
       
   106 algorithms. Although they cannot be directly implemented, they can be
       
   107 implemented once choices are made explicit and datastructures 
       
   108 fixed. Similarly, our formalisation uses a do-not-care approach
       
   109 for building a regular expression out of a set of regular expressions
       
   110 (see equation (2)).  It also leaves the order of equations that should be 
       
   111 analysed under-specified. But if all these choices are made explicit, we
       
   112 have an algorithm that is implementable in code. But as the explanation
       
   113 already indicates, it is quite a digression from our main topic of
       
   114 reasoning about inductive structures.
       
   115 
       
   116 If we further contemplate the efficiency of the algorithm we 
       
   117 (under)specified, then our best guess is that it is exponential in the 
       
   118 worst case, just like the "naive" unification algorithm is exponential 
       
   119 if sharing is not treated carefully. 
       
   120 
       
   121 > In fact, I have been made aware of the following work [2] that do
       
   122 > formalize automata and regular language theory without using explicit
       
   123 > identifiers for state numbers in automata. The upside is that it
       
   124 > yields much smoother proofs than previous works in Coq. The downside
       
   125 > is that this formalization strategy does not yield efficient algorithm
       
   126 > for automata constructions. I reckon that this is a technical report
       
   127 > that should be taken with a grain of salt in the context of this
       
   128 > review, but I think that it gives evidence that it is not always the
       
   129 > case that the startup cost is higher with automata rather than with
       
   130 > regular expressions: they formalize the Myhill-Nerode theorem,
       
   131 > Kleene's theorem, the relationships between various representations of
       
   132 > regular languages in around 1500 lines of ssreflect/Coq. (Again, let
       
   133 > me emphasize that this is not a computational representation of
       
   134 > automata, and thus, it does not easily compare in terms of size with
       
   135 > Lammich and Tuerk's work or Braibant and Pous's work.) I think this
       
   136 > work would make for a much more relevant comparison with your work. 
       
   137 
       
   138 We cited this work and commented on it (see above).
       
   139 
       
   140 > I agree that comparing with related work is hard, and that the authors
       
   141 > have already been careful in their comparison with the formalization
       
   142 > sizes from other works. Yet, I am afraid an uninformed outsider would
       
   143 > miss the distinction between related work that are executable and
       
   144 > efficient and this formalization. 
       
   145 
       
   146 In order to get meaningful work done, we have to depend on readers 
       
   147 carefully reading the paper and obtaining required background material.
       
   148 I am afraid, there is no other way around. 
       
   149 
       
   150 > - I am still confused about the explanation about the way the
       
   151 > equational system is set up (your comment in the conclusion does not
       
   152 > help me to pinpoint why you have to do that): I understand that
       
   153 > Brzozowski annotates the final states with lambdas, and propagates
       
   154 > them to the unique initial state. On the other hand you start with a
       
   155 > set of equivalence classes, and you want to compute a regular
       
   156 > expression for all classes that are in finals. In other words, I still
       
   157 > fail to see why the following system does not suit your needs
       
   158 > 
       
   159 > X1 = a,X2 + b,X4 
       
   160 > X2 = b,X3 + a,X4 + lambda(ONE)
       
   161 > X3 = a,X4 + b,X4 + lambda(ONE)
       
   162 > X4 = a,X4 + b,X4
       
   163 >
       
   164 > with L(r,X) = L(r) \cdot X
       
   165 >
       
   166 > and then propagate the equation to compute to compute a value for
       
   167 > X1. (And, you could draw the automata underlying the transitions at
       
   168 > the top of p. 7 to help the reader understand what is going on, and
       
   169 > the relationship with the current equational system.)
       
   170 > 
       
   171 > Maybe it is simply that you could have done it, but that it would have
       
   172 > been less pleasant to work with in the context of a Myhill-Nerode
       
   173 > relation?
       
   174 
       
   175 We used in the paper the much stronger formulation of being
       
   176 _forced_ to set up the equational system in our way. And we even
       
   177 now have made the earlier comment about this issue, which was 
       
   178 placed in a footnote, to be part of the main text (see paragraph after 
       
   179 equation (6)). 
       
   180 
       
   181 While your equational system and computations generate a regular
       
   182 expression, it does not take into account the assumption from 
       
   183 the first direction of the Myhill-Nerode theorem. Recall that the 
       
   184 assumption is that there are finitely many equivalence classes 
       
   185 by the Myhill-Nerode relation. Therefore we have to make our equations
       
   186 to be in agreement with these equivalence classes, which the properties
       
   187 in equation (7) and (8) make explicit. We further have to maintain
       
   188 this agreement during the solution of the equational system. 
       
   189 
       
   190 Your equational system is _not_ in agreement with the equivalence 
       
   191 classes: consider that 
       
   192 
       
   193    X1 = {[]}
       
   194    X2 = {[a]}
       
   195    X3 = {[a, b]}
       
   196    X4 = UNIV − {[], [a], [a, b]} 
       
   197 
       
   198 hold, which is not true for the equations you have set up (under the
       
   199 natural interpretation you have given). Another point is that it is always the
       
   200 case that only one equivalence class in the Myhill-Nerode theorem contains 
       
   201 the empty string. Therefore it cannot be the case that there are more than 
       
   202 one (initial) equation containing the marker lambda(ONE), which translates 
       
   203 into the empty string under the natural interpretation. So we are
       
   204 of the opinion that for the Myhill-Nerode theorem you are really forced 
       
   205 to "reverse" the standard approaches. Please let us know, if this 
       
   206 clarifies this issue for you.
       
   207 
       
   208 > - I really appreciate the rephrasing of section 4, especially the
       
   209 > final remark. I think it clearly helps the reader to understand the
       
   210 > proof arguments, and the relationship with automata. 
       
   211 
       
   212 Thank you.
       
   213 
       
   214 > - I concur with the second reviewer to say that the executability of
       
   215 > the first half of the M-N theorem should be discussed in the
       
   216 > conclusion. Moreover, I am puzzled by your comments in the response to
       
   217 > reviewers: to what extent does your formalization need to be
       
   218 > _modified_ to make it executable? Do you need to change the code, or
       
   219 > to add more things? (More precisely, you discuss the executability of
       
   220 > the second half of the M-N theorem -- that constructs a partition of
       
   221 > the set of strings -- in the conclusion, in the context of Haftmann
       
   222 > executable sets; 
       
   223 
       
   224 The first half is about the Myhill-Nerode relation partitioning the sets of 
       
   225 all strings into equivalence classes. Our remark applies to this "half".
       
   226 
       
   227 > you should do the same for the first-half---that
       
   228 > constructs a regular expression). I am not sure I am happy with the
       
   229 > things you wrote about executable sets, because it is not clear from
       
   230 > my point of view if you need to modify the code, or add things. 
       
   231 
       
   232 See comment above. We really not want to open up this can of worms.
       
   233 
       
   234 > - More generally, it should be made clear that when you say algorithm,
       
   235 > the reader should not understand it as something that is computable as
       
   236 > it is. In particular, I think that the following sentence is
       
   237 > misleading: "our algorithm for solving equational systems actually
       
   238 > calculates a regular expression for the complement language." You
       
   239 > actually define such an expression, but it seems not possible, as it
       
   240 > is, to open the box to see what is inside.
       
   241 
       
   242 It is possible as long as you make the "do-not-care" non-determinism 
       
   243 explicit and fix appropriate datastructures for computations. 
       
   244 
       
   245 > - You answered: 
       
   246 > > In our opinion, Coquand and Siles' development strongly justifies our
       
   247 > > comment about formalisation of Brzozowski's argument being painful.
       
   248 > > If you like to have this reference added in the paper, we are happy
       
   249 > > to do so.
       
   250 >
       
   251 > I think that it is worth adding a comment like this one either in
       
   252 > section 5 (at the beginning of p. 20) or in the conclusion.
       
   253 
       
   254 We added to page 20:
       
   255 
       
   256   Therefore Coquand and Siles (2011) who follow through this idea 
       
   257   had to spend extra effort and first formalise the quite intricate 
       
   258   notion of inductively finite sets in order to formalise this property.
       
   259 
       
   260 > - The authors provide a detailed figure for the number of line of code
       
   261 >   of Almeida et al's implementation of Mirkin construction, but there
       
   262 >   is no such figure for some of other works that are cited. Maybe this
       
   263 >   number could be dropped, or similar figures could be given for other
       
   264 >   works, in order to give a more accurate picture.
       
   265 
       
   266 We are not 100% sure about this comment. We have not selected the
       
   267 explicit numbers for the Lammich-Tuerk and Almeida et al works because of  
       
   268 "name-and-shaming" or because of a "beauty contest". Rather to illustrate 
       
   269 a trend. We asked about the numbers for the Lammich-Tuerk work. They 
       
   270 also confirmed with us that their automata-with-names approach is painful 
       
   271 to work with. Unfortunately they have not written this in their paper 
       
   272 such that we could have cited it. Filliatre makes explicitly such a comment 
       
   273 in his paper, which we have cited. Braibant equally wrote that he reckons 
       
   274 that our formalisation is more concise. There is no way we can give a number 
       
   275 in terms of lines-of-code for the formalisation in Nuprl. We believe 
       
   276 the number for the Almeida et al work is accurate (and not even the
       
   277 biggest formalisation from the ones listed, but one which is about
       
   278 one specific topic).  
       
   279 
       
   280 > - I wonder if you have direction for future works, that would satisfy
       
   281 >  your predicate "results from regular language theory, not results
       
   282 >  from automata theory". It would be nice to give a few examples if
       
   283 >  you have some in mind. 
       
   284 
       
   285 We added the work by Sulzmann and Lu who work on regular expression
       
   286 sub-matching using derivatives of regular expressions. We would like
       
   287 to formalise this work. Thank you very much for suggesting future work.
       
   288