Parsing-Literature/yacc-is-dead-review
-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper describes an original approach to the derivation of parsers from grammars. The approach is based on the observation that context-free grammars are closed under Brzozowski derivatives, and that such derivatives can be easily computed. The parser described here hence parses a string by lazily computing a derivative of the grammar for each character encountered. Quite surprisingly, such a parser seems very easy to encode, especially in a language that supports streams and lazy computation; it can parse every grammar, and it is quite efficient.
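
To make the idea concrete, here is a minimal sketch of the recognition core as I understand it, restricted to the regular operators (no recursive nonterminals, and no parse trees); the encoding and the names are mine, not the authors':

  // A language knows whether it accepts the empty string, and how to
  // take its Brzozowski derivative with respect to a character.
  sealed trait Lang {
    def nullable: Boolean        // does the language contain the empty string?
    def derive(c: Char): Lang    // Brzozowski derivative with respect to c
  }
  case object Empty extends Lang {               // the empty language
    def nullable = false
    def derive(c: Char) = Empty
  }
  case object Eps extends Lang {                 // the language { "" }
    def nullable = true
    def derive(c: Char) = Empty
  }
  case class Lit(x: Char) extends Lang {         // a single character
    def nullable = false
    def derive(c: Char) = if (c == x) Eps else Empty
  }
  case class Alt(l: Lang, r: Lang) extends Lang {
    def nullable = l.nullable || r.nullable
    def derive(c: Char) = Alt(l.derive(c), r.derive(c))
  }
  case class Cat(l: Lang, r: Lang) extends Lang {
    def nullable = l.nullable && r.nullable
    def derive(c: Char) =                        // D_c(L1.L2)
      if (l.nullable) Alt(Cat(l.derive(c), r), r.derive(c))
      else Cat(l.derive(c), r)
  }
  // A string is in the language iff the iterated derivative is nullable.
  def matches(l: Lang, s: String): Boolean =
    s.foldLeft(l)(_ derive _).nullable

The contribution of the paper is precisely in extending this loop to arbitrary context-free grammars, where laziness and memoization become essential, and to the construction of parse trees.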
       
I found the paper very interesting. The studied problem is very important, and it is surprising to meet original results in a field that has been explored so much. I enjoyed the first pages of the paper very much; they are extremely crisp and clear. Unfortunately, around page 8, the paper starts getting more and more obscure; section 7 ends abruptly, and section 8, which is probably the most important one, is quite obscure. In the end I still think that the paper should be accepted, but I really hope that the authors will be able to beef section 8 up to the point of making it readable for me.

Detailed remarks for the authors only:

Example 1: "the parse tree is:" => "the parse tree of '()' is:"

In general: you should write <B||B->\epsilon> rather than <B|B->\epsilon> (this is not quantum theory, after all...)

The non-deterministic state machine: the language in \Sigma is a language over A', not over A, right? You should say that explicitly.

In the rules for bras and kets, the consequences should look like

   (L,w) =>^{\bra{n}} (D_{\bra{n}}(L),w)

I think you forgot the derivative, right?

The parse-string machine pushes a bra when it meets one, but never pops it. You can easily adjust the last rule of section 7 to pop it, but the real question is: why do you push it?

The end of section 7 is too abrupt. You have described a nice pair of abstract machines, but now you should tell me something a bit more concrete about the amount of nondeterminism involved, and how much it would hurt me in practice...

Page 10: the monotonicity property for parser combinators should look like:

  \forall p \in \tau_X(w). \exists p' \in \tau_X(w') such that (first p) is a prefix of (first p')

Section 8 is obscure. It is too terse. The equations look right and look convincing, but I was totally unable to form a mental picture of how this machinery actually works. I understand that such a combination of non-determinism and laziness creates code that tends to work in a mysterious way, but the section remains extremely unsatisfactory.

Section 9: a bit unreadable, at least for a non-Scala programmer.

PROS AND CONS

+ interesting solution for an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable

REBUTTAL


-------------------- review 2 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)
----------------------- REVIEW --------------------

MAIN PART

The authors describe how one can use Brzozowski-style derivatives to implement parser combinators. They claim that their technique leads to simple implementations, and this seems to be the case. They also claim that their implementation is efficient, but give no solid evidence for this (they added a special combinator to get decent numbers from their single reported experiment). Another claim is that their parser combinators can handle left recursion, but they give no evidence for this.

The authors seem to be unaware that Danielsson presented essentially the same technique at TYPES 2009 ("Mixing Induction and Coinduction"). However, Danielsson's draft paper does not try to sell the technique, only using it as a vehicle to implement provably total parser combinators, and he does not address efficiency of parsing, so there could be a place for both papers.

The authors seem to claim that their implementation is a lot more efficient than other parser combinator libraries:

 "[...] they are easy to implement, and easy to implement as libraries (like top-down parsers), yet they are efficient and expressive (like bottom-up parsers)."

However, they have not made any head-to-head comparisons, and they have not cited a single paper discussing efficient parser combinators. Generally there is a serious lack of references to related work on parser combinators.

Section 7 discusses another parsing technique (also based on derivatives). However, I found this section to be poorly written and hard to follow. I suggest that it should be removed.

If the claims about efficiency had been followed up with substantial evidence, then I would have recommended that the paper should be published. As it stands I do not make such a recommendation.

Detailed comments:

Section 2:

"This technique was lost to the “sands of time” until [paper from 2009]" A quick web search will tell you that a large number of papers published before 2009 have used Brzozowski derivatives (or generalisations), so this comment is incorrect. See for instance the work of Rutten, including "Automata and Coinduction; an exercise in coalgebra" and "Automata, Power Series, and Coinduction: Taking Input Derivatives Seriously".

Section 6:

The expansion of D_x(List) would be easier to follow if you added a step between the second and third lines.

"the null string terminal" and "the empty nonterminal" may not exist in N, so they may have to be added to N'.

One could be led to believe that the last "and:" only applies when m = 0.

"The empty nonterminal arises frequently with derivatives, and if this reduction is applied aggressively, the grammar will remain a manageable size." You give no proper motivation for this statement. In fact, your discussion of the "special closure construct" in Section 9 seems to imply that this statement is false. I suggest that you remove this statement.

Section 7:

The machine in this section does not make sense to me. First of all you do not state what the machine should accomplish; my guess is that you intend

 ((L, s), <>) ->* ((L', <>), <t>)

to hold iff t is a parse tree which witnesses the membership of s in L. Furthermore you give no reason (such as a proof) to believe that the machine satisfies this specification. For instance, you push nonterminals onto the stack, but you never remove them (I suspect that the last rule should remove a nonterminal), and it seems strange that the state is unchanged in the rules involving "first".

I think the paper would be better if Section 7, which is not central to the paper, was omitted.

"This approach is not well suited to typed languages, since a literal implementation of the parsing machine must use a generic parse-tree type, or else violate type-safety with casts for reductions." I suspect that you could do better with a dependently typed language, so I suggest that you remove this sentence.

Example 1: "the parse tree is:" I think you mean "one parse tree is:".

"The parsing machine demonstrates that, even without reduction rules associated with the grammar, we can use derivatives to construct a concrete syntax tree." I do not follow this sentence. What kind of reduction rules are you referring to?

Section 8:

The parser combinators that you give seem quite outdated to me. The functional programming community moved away from this set of combinators more than a decade ago, in favour of combinators based on monads or applicative functors. Please motivate this choice.
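
To be clear about what I mean: in the monadic style, a parser is a function from the input to a stream of (result, remaining input) pairs, and the combinators arise from map, flatMap and alternation. A sketch, with illustrative names rather than any particular library's API:

  // A parser of A-values over character input; results are paired with
  // the unconsumed remainder of the input.
  case class P[A](run: List[Char] => Stream[(A, List[Char])]) {
    def map[B](f: A => B): P[B] =
      P(s => run(s).map { case (a, r) => (f(a), r) })
    def flatMap[B](f: A => P[B]): P[B] =              // sequencing
      P(s => run(s).flatMap { case (a, r) => f(a).run(r) })
    def |(that: => P[A]): P[A] =                      // alternation
      P(s => run(s) ++ that.run(s))
  }
  def pure[A](a: A): P[A] = P(s => Stream((a, s)))    // the unit
  val item: P[Char] = P {                             // consume one character
    case c :: rest => Stream((c, rest))
    case Nil       => Stream.empty
  }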
       
You state two properties that parser combinators must satisfy, but you never really make any use of these properties. Are these properties necessary for your development? Is your definition of a parser combinator a function which satisfies these properties?

"A parser combinator is monotonic iff..." I assume that the second instance of \sqsubseteq should be read pointwise.

"the meaning of an input expression w \in A^* is contained within \tau_N(w)" In fact it is the one and only (complete) result, right?

"D_c(c) = eps -> \w.{(c, w)}" This does not make sense: the second argument of -> is not a parser.

Section 9:

Your parsers accept infinite streams as input. However, parseFull clearly fails to terminate for infinite input. Why do you not restrict the input to finite lists?

Your parsers return potentially infinite streams of results. However, your parse function uses "append", which (essentially) throws away its second argument if the first is infinite. Wouldn't it be better to return a fair interleaving of the two streams?
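
What I have in mind is a fair merge that alternates between the two streams, so that neither argument is starved even if the other is infinite. A sketch (the name interleave is mine, not the paper's):

  def interleave[A](xs: Stream[A], ys: Stream[A]): Stream[A] =
    if (xs.isEmpty) ys
    else xs.head #:: interleave(ys, xs.tail)   // swap the arguments at each step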
       
In internalDerive you seem to have renamed a and b to first and second, respectively.

"the longest possible parse first" Why is this relevant for performance? If you return all possible parses, then the order in which they are returned should not matter, and for unambiguous parsers at most one result is returned anyway. I can imagine that your optimisation matters in the (atypical?) case of an ambiguous parse, if you are not interested in inspecting all the results, but only want to see that there is more than one.

The sentence before the table seems to be unfinished.

I assume that "43 m" should be "43 ms".

"These results suggest that derivative-based parser combinators scale well enough to be useful in practice." To me this result suggests that I may have to repeatedly add new special-case combinators (like your closure operation) to ensure decent performance.

Section 10:

"derivative-based parsing defies the classical taxonomies", "Top-down methods", "back-tracking"... Are you not aware of parser combinator libraries which use breadth-first techniques (which your combinators also do)? See, for instance, Claessen's "Parallel Parsing Processes".

"these methods have not been able to handle left-recursive grammars" You should check out Lickman's MSc thesis "Parsing With Fixed Points" and "Parser Combinators for Ambiguous Left-Recursive Grammars" by Frost et al.

"Philosophically, neither derivative-based method “feels” like a top-down parsing method." To me your combinator implementation does feel like a (breadth-first) top-down method.

You claim (implicitly) that your parser combinators can handle left recursive grammars. However, I doubt that you can handle grammars of the form "n ::= n". If you can handle such grammars, please explain how you do it. Otherwise it would be interesting to know exactly what kind of left recursion you support.
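
(For what it is worth, the difficulty is easy to see: if I follow the construction of Section 6, the grammar "n ::= n" yields only the production D_c(n) -> D_c(n), so the derivative of n is defined purely in terms of itself, and no finite amount of unfolding can turn it into either a parse or a failure.)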
       
"they are efficient and expressive (like bottom-up parsers)" This seems to imply that there is a language which can be parsed using bottom-up techniques, but not using top-down techniques. Can you verify this claim?

You may want to cite Frost and Szydlowski, who claim worst-case cubic time for parsing ("Memoizing purely functional top-down backtracking language processors"). Swierstra's work on efficient parser combinators is also relevant (many papers).

References:

Improve formatting.

PROS AND CONS

+ Simple technique for implementing parser combinators.
- The technique has been presented before.
- No substantial evidence for claims of efficiency.
- Section 7 does not make sense to me (but could be removed).

REBUTTAL


-------------------- review 3 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper proposes a new parsing technique based on "derivatives", inspired by the derivatives of regular expressions and extended to the derivatives of context-free grammars.

Because context-free grammars may be ambiguous, the authors propose to compute the set of all parse trees, using lazy evaluation to share and delay the computation. They present their framework and an implementation in Scala. They claim that their technique is efficient in practice.

EVALUATION

Parsing techniques are old and well-established, although the current status of the theory is still unsatisfactory: grammars are still a burden to write, and any advance on this topic is welcome. However, the topic must be addressed very seriously.

Given this, this work is potentially interesting as a possible way to improve the state of the art, but not in its current presentation: the paper should not just _claim_ that using derivatives for parsing is feasible, but _demonstrate_ that writing grammars with combinators is at least as convenient and as efficient as using existing technology.

The first aspect is not discussed at all in the paper. As for efficiency, which is claimed, the benchmarks are really small, probably not representative of existing grammars, and the results are not compared with other parsing techniques.

Other questions arise: what happens when the grammar is highly ambiguous? Do you intend to reject such grammars, or to pick the meaning of the things that you are parsing at random?

The overall structure of the paper is not very clear either. You should give us a road map at the beginning and make it clear how the different sections fit together.

OTHER COMMENTS

Sections 7 and 8 are the key sections of the paper for understanding your proposal. However, I have problems with both.

I don't understand why your parsing is decomposed into the production of an intermediate parse string that is then consumed by a machine that produces a syntax tree. Since the parse string is a linear representation of the parse tree, why can't you work with the parse tree itself, which is more structured than the parse string? Why can't you directly extract a syntax tree from the parse tree? Could you run the algorithm in Section 7 on an example?

In Section 8, you compute the derivatives of parser combinators, but you could instead compute the grammar defined by the parser combinators and compute its derivative in one step. Why would this be better or worse than computing it directly?

- One line above section 2: "could [to<-] modify"
- Section 2, Line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size."
- Computation of the derivative, above theorem 1: you could explain the derivatives, especially for rules n -> s1 ... sm, and perhaps give an example (see the note after this list).
- Last two lines above section 8: "The parsing machine demonstrates... a concrete syntax tree". I don't really understand what you (intend to) mean.
- Section 8.1, Line 1: "ability [the<-to] combine them"
- End of section 8.1: you should state and comment on the fact that these definitions are recursive.
- First line of section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list of formulas without any comment?
- Section 8.5: what are the lambda-terms on the right of --> ?
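
(Note on the derivative rules: as I understand the construction above theorem 1, for each production n -> s_1 ... s_m and each index i such that s_1, ..., s_{i-1} are all nullable, the derived grammar gets a production

   D_c(n) -> D_c(s_i) s_{i+1} ... s_m

For example, a production L -> L M with L nullable would yield both D_c(L) -> D_c(L) M and D_c(L) -> D_c(M). Spelling this out, with such an example, would help the reader considerably.)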
       

-------------------- review 4 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)
----------------------- REVIEW --------------------

MAIN PART

SUMMARY

The paper proposes a basic algorithm for determining whether a sentence s is a member of a context-free language L: (i) if the sentence is empty, determine whether L contains the empty string (a decidable problem); (ii) if the sentence is of the form c.s, determine whether the suffix s is a member of the derivative c^{-1}.L. Because the derivative of a context-free language can be effectively constructed, this is an algorithm.
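
(Spelled out, the algorithm iterates the identity c.s \in L \iff s \in c^{-1}.L, so that for a sentence s = c_1 c_2 ... c_n,

   s \in L \iff \epsilon \in c_n^{-1}.( ... (c_2^{-1}.(c_1^{-1}.L)) ... )

and only the decidable test for the empty string remains at the end.)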
       
The contribution of the paper is in generalizing this basic algorithm in two ways: (i) instead of just determining membership, the parsing algorithm can be made to return a concrete syntax tree, or a semantic value; (ii) instead of viewing the derivative as an operation that accepts a monolithic grammar and produces a monolithic grammar, one can view it as an operation that accepts a parser combinator and produces a parser combinator; this means that the parsing algorithm is amenable to a modular, combinator-based implementation.

EVALUATION

The introduction of the paper is somewhat irritating, for two reasons. First, it promises a lot ("simplicity, feasibility, and ubiquity"), which the paper does not clearly deliver. Second, it criticizes the most popular parsing methods (LL and LR) on the grounds that the constraints that they impose, in order to rule out ambiguous grammars, are difficult to grasp. I can accept this criticism. But the authors do not offer any solution to the ambiguity problem: their algorithm produces a parse forest. This leaves the programmer faced with the task of writing and debugging disambiguation filters, an error-prone task. At least, an LR parser generator is able (in principle, and some implementations actually do this) to accurately explain why the grammar seems ambiguous. It can produce a sentence that is a prefix of the fringes of two distinct, valid parse trees.

The first claimed contribution of the paper is the remark that the derivative of a context-free language can be effectively constructed. (A proof appears in section 6.) This is in fact not a contribution. This property of context-free languages is well-known, at least in the formal language community (perhaps not in the programming language community). The derivative of a formal language is also known as its left quotient, and written c^{-1}L. See, for instance, the following paper, which exploits this property intensively:

 Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free languages. Fundamenta Informaticae, 25(3):217-239, 1996.

The remainder of the paper contains two distinct developments. In section 7, a version of the derivative-based parsing algorithm is developed, which produces concrete syntax trees. Then, in sections 8 and 9, another version is developed, which produces programmer-defined semantic values, and is combinator-based. It is not clear to me why these two distinct developments co-exist. In my eyes, the approach of sections 8 and 9 seems superior, because it is more flexible (it is not limited to producing concrete syntax trees) and modular.

The idea that the notion of a derivative can be generalized to parser combinators is interesting, and worthy of thought. To some extent, section 8.2 is where the paper really begins! Section 8 provides a mathematical view of the problem, while section 9 describes part of a Scala implementation.

This material is interesting, but can be criticized on several grounds:

1. The complexity analysis of the algorithm is absent. The algorithm seems very expensive in the worst case: indeed, the effective construction of a derivative in section 6 causes a quadratic blow-up of the size of the grammar, and this operation is iterated at every character of the input stream! In fact, section 9.1 mentions that a naive implementation of the algorithm is unusable (exponentially slow), and that special support for the Kleene star had to be added. However, there is no argument that this patch is sufficient for the algorithm to become usable. A worst-case analysis of the space and time complexity is needed, and, if this worst-case complexity is bad (say, exponential), then a formal or semi-formal argument is needed why the behavior of the algorithm remains tolerable in practice. In light of the efficiency claim found in the introduction, this is indispensable!

2. The description of the implementation could be improved. In particular, memoization and sharing seem to play a crucial role. It seems that two attempts to compute the derivative of a parser combinator should return the same object. This includes the particular case where the second attempt is in fact nested within the first one (i.e., it is a recursive call). For instance, if the grammar contains the single production L -> LM, a call to L.derive(t) will (indirectly) cause another call to L.derive(t). What happens then? By which mechanism is non-termination avoided? This is not clearly described. The paper contains four lines of text that seem to concern this topic (p.15, "It is important to note [...] immediately suspended."), but I find them obscure. I note that there seems to be a distinction between derive and internalDerive, which is not explained in the paper; my guess is that the former may be a memoizing version of the latter, using a table indexed by tokens. Is this the case? (See the sketch after point 3 below.)

3. The introduction claims "ubiquity", that is, if I understand correctly, it claims that the method is very easy to implement as a library. This may be true, but the paper lacks a convincing argument why this is so. In particular, how is this approach superior to the other parsing combinator libraries in existence? The paper lacks a good comparison with other combinator-based parsing techniques.
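
To illustrate the question raised in point 2: if my guess about derive and internalDerive is right, I would expect a mechanism along the following lines (my reconstruction in Scala, not code from the paper), where the memo entry is installed as a suspension before any real work is done:

  abstract class Parser[A] {
    private val memo = scala.collection.mutable.Map[Char, Parser[A]]()
    // Memoized derivative: the entry is created without computing anything,
    // so the nested call L.derive(t) provoked by L -> LM finds the entry
    // and returns immediately instead of recursing forever.
    def derive(t: Char): Parser[A] =
      memo.getOrElseUpdate(t, new Suspended(this, t))
    def internalDerive(t: Char): Parser[A]   // the real, unmemoized work
  }
  // A suspended derivative: internalDerive is forced only on demand.
  class Suspended[A](p: Parser[A], t: Char) extends Parser[A] {
    private lazy val forced: Parser[A] = p.internalDerive(t)
    def internalDerive(u: Char): Parser[A] = forced.derive(u)
    // nullability tests, result extraction, etc. would forward to `forced`
    // as well; they are omitted from this sketch
  }

If something like this is indeed the mechanism, it deserves to be spelled out in the paper.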
       
It seems that there might be a connection between the technique presented in the paper and intersection parsing (see e.g. chapter 13 of the excellent book "Parsing Techniques", by Grune and Jacobs). Computing the derivative of a context-free language with respect to a terminal symbol c is closely related to computing its intersection with the regular language c(.*). Intersection parsing, as described by Grune and Jacobs, computes the intersection of the grammar with the input (which is a string, so a regular language!) and simplifies the resulting grammar, which can then be viewed as a parse tree. Could one view the approach presented in the paper as an incremental (and modular) way of doing intersection parsing?
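
(Concretely, the connection is the identity L \cap c.A^* = c.(c^{-1}.L): up to the leading character c, the derivative is exactly the intersection of L with the language "c followed by anything".)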
       
In conclusion, although the paper contains an interesting idea, its investigation is not sufficiently thorough, and a clear case in favor of this parsing technique is missing.

DETAILED COMMENTS

The authors state in section 7 that "a literal implementation [...] must use a generic parse-tree type, or else violate type safety with casts for reductions". Strictly speaking, this depends on which type system one has in mind. If the type system is ML, for instance, the claim is probably true. If the type system has GADTs, however, it is quite possible that the discipline of the parsing machine can be encoded in the types. This has been done, for an LR parsing machine, and for an arbitrary but fixed grammar, in the following paper:

 François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers. In ACM Workshop on ML, volume 148(2) of Electronic Notes in Theoretical Computer Science, pages 155-180, 2006.

To the best of my knowledge, both Haskell and Scala have GADTs, so the authors might wish to check this out.

p.5, "for any class of decidable languages closed under the derivative, derivatives may be used to determine membership". I am not sure what is meant by "decidable" here. If it is meant that membership in the language is decidable, then this is a tautology. Perhaps one should understand that membership *of the empty word* in the language is decidable, since effectively that is all that is needed?

p.5, "the derivative is closed under regular operations": unclear formulation.

p.7, before theorem 1: please indicate the interval over which the index i ranges. My guess is 0 \leq i < m. The particular case where m = 0 seems redundant: when m = 0, there is no suitable index i, so no production is generated, which means D_c(n) is empty. In fact, the notation \emptyset in the right-hand side of a production does not make sense, since the right-hand side of a production is supposed to be a sequence of symbols.

p.7, "the empty nonterminal": this has not been defined. I assume this explains the use of \emptyset above. Define it.

p.8, example 1: "for the grammar [...] of balanced parentheses [...] the parse tree is": I assume this should be: "the parse tree for the string '()'".

p.8, "A parse string is [...] containing both nonterminal markers and reduction markers". Please indicate at this point that a parse string also contains elements of the alphabet A. In fact, the definition of the alphabet A' should be moved to this point.

p.8, "we construct a new grammar [...] over parse strings": please change this to "a new grammar over the alphabet A'".

p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be \mathbb{L}_{A'}.

p.9, "a second rule for producing bras": is the rule correct? In the right-hand side of the conclusion, instead of L, I would have expected the derivative of L with respect to <n|. (The rule, as it stands, clearly causes an infinite stream of <n|'s to be produced! Similarly for the third rule.)

p.9, "the parsing machine reduces": one might wish to state the invariant of the parsing machine which guarantees that reduction can never fail (by lack of material on the stack).

p.9, "popular as of late": please provide citation(s).

p.10, the statement of the monotonicity condition is somewhat unclear: the assertion \tau_X(w) \sqsubseteq \tau_X(w') compares two sets of pairs. An ordering over pairs is defined, but an ordering over sets of pairs is not defined.

p.11, the authors might want to spend more space explaining the definition of the derivative of a parser combinator -- why is it the right definition? how does it relate to the definition of the derivative of a formal language? Can one recover the latter as a degenerate case of the former?

p.12, section 8.3: the parser combinator \epsilon is used, but has not been defined. What is its type? In particular, what semantic value does it produce? (A unit value?) What is the meaning of the notation \lambda\epsilon. ...

p.12, equation (3): the elegance afforded by the use of \delta in the definition of the derivative of a concatenation (p.5) seems to have been lost here.
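
(For reference, the p.5 definition in question is presumably the classical one, D_c(L_1.L_2) = D_c(L_1).L_2 \cup \delta(L_1).D_c(L_2), where \delta(L) is {\epsilon} if \epsilon \in L and \emptyset otherwise.)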
       
p.13, "in practice, the cost [...] as they are derived". I don't know what this means. Please be more precise and include a formal complexity claim.

p.14, "parse trees of type A" should really be referred to as "semantic values", since they may not be trees at all -- they are programmer-defined.

p.14, "its implementation follows eq.2": actually, eq.2 corresponds only to the "else" clause of the code, so the text is badly worded. Same problem with eq.1 a few lines later.

p.15, the distinction between "derive" and "internalDerive" is not explained.

p.15, the derivative of union "short-circuits if it finds either arm is empty": is this important, say, to ensure termination? Or is it just a minor optimization?

p.16, the fact that a naïve treatment of the Kleene star required exponential time, and the fact that right recursion is significantly more efficient than left recursion, are worrisome. These problems are perhaps partially masked by the introduction of "the special closure construct", but how do we know that they are gone?

p.16, the test suite consists of just one grammar and a set of randomly generated inputs. This seems far from representative of actual usage...

p.17, "the cost of that compilation is amortized across the entire parsing process". I don't know what is meant by "compilation" here, nor do I know why its cost is amortized. Please explain.

p.18, "these combinators are slowly transforming and unfolding the grammar itself into a parse tree": this seems to be an interesting intuition. Why not try to make it more formal?

TYPOS

p.2, "could to modify"
p.2, "they critical"
p.6, "attempt interpret"
p.10, "Operations or combinators"
p.10, "the remainders of the input"
p.12, "is base case"
p.13, "able parse"
p.15, "short-circuiting if the first element is nullable": should this be: "is not nullable"?
p.16, "43 m" -> "43 ms"
p.16, "surpising"
p.17, "they extent"

PROS AND CONS

+ interesting and possibly novel idea
- no time/space complexity analysis
- implementation not clearly described
- case in favor of the idea not clearly made
- comparison with related work should be more thorough

REBUTTAL

No specific questions.