-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART
The paper describes an original approach to the derivation of parsers from grammars. The approach is based on the observation that context-free grammars are closed under Brzozowski derivatives, and that such derivatives can be easily computed. The parser described therefore parses a string by lazily computing a derivative of the grammar for each character it meets. Quite surprisingly, such a parser seems very easy to encode, especially in a language that supports streams and lazy computation; it can parse every grammar, and it is quite efficient.
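
To fix my own understanding, here is a minimal sketch of the idea in Scala, for the regular-language case only (this is my code, not the authors'; the paper's contribution is precisely the extension of this scheme to context-free grammars via laziness):

    sealed trait Lang
    case object Empty extends Lang                       // the empty language
    case object Eps extends Lang                         // the language { "" }
    final case class Chr(c: Char) extends Lang           // a single character
    final case class Alt(l: Lang, r: Lang) extends Lang  // union
    final case class Cat(l: Lang, r: Lang) extends Lang  // concatenation
    final case class Star(l: Lang) extends Lang          // Kleene star

    // Does the language contain the empty string?
    def nullable(l: Lang): Boolean = l match {
      case Empty | Chr(_) => false
      case Eps | Star(_)  => true
      case Alt(a, b)      => nullable(a) || nullable(b)
      case Cat(a, b)      => nullable(a) && nullable(b)
    }

    // The Brzozowski derivative: the residual language after consuming c.
    def derive(c: Char, l: Lang): Lang = l match {
      case Empty | Eps => Empty
      case Chr(d)      => if (c == d) Eps else Empty
      case Alt(a, b)   => Alt(derive(c, a), derive(c, b))
      case Cat(a, b)   =>
        if (nullable(a)) Alt(Cat(derive(c, a), b), derive(c, b))
        else Cat(derive(c, a), b)
      case Star(a)     => Cat(derive(c, a), Star(a))
    }

    // w is in l iff the derivative of l by all of w accepts the empty string.
    def accepts(l: Lang, w: String): Boolean =
      nullable(w.foldLeft(l)((res, c) => derive(c, res)))

Seeing how small this is for the regular case made the paper's claim of simplicity for the context-free case quite believable to me.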

I found the paper very interesting. The problem studied is very important, and it is surprising to meet original results in a field that has been explored so much. I enjoyed the first pages of the paper very much; they are extremely crisp and clear. Unfortunately, around page 8, the paper starts getting more and more obscure; section 7 ends abruptly, and section 8, which is probably the most important one, is quite obscure. In the end I still think that the paper should be accepted, but I really hope that the authors will be able to beef section 8 up to the point of making it readable for me.

Detailed remarks for the authors only:

Example 1: "the parse tree is:" => "the parse tree of '()' is:"

In general: you should write <B||B->\epsilon> rather than <B|B->\epsilon> (this is not quantum theory, after all...)

The non-deterministic state machine: the language in \Sigma is a language over A', not over A, right? You should say that explicitly.

In the rules for bras and kets, the consequences should look like

   (L,w) =>^{\bra{n}} (D_{\bra{n}}(L),w)

I think you forgot the derivative, right?

The parse-string machine pushes a bra when it meets one, but never pops it. You can easily adjust the last rule of section 7 to pop it, but the real question is: why do you push it?

The end of section 7 is too abrupt. You have described a nice pair of abstract machines, but now you should tell me something a bit more concrete about the amount of nondeterminism involved, and how much it would hurt me in practice...

Page 10: the monotonicity property for parser combinators should look like:

  \forall p \in \tau_X(w). \exists p' \in \tau_X(w'). first(p) is a prefix of first(p')

Section 8 is obscure. It is too terse. The equations look right and convincing, but I was totally unable to form a mental picture of how this machinery actually works. I understand that such a combination of non-determinism and laziness creates code that tends to work in a mysterious way, but still the section is extremely unsatisfactory.

Section 9: a bit unreadable, at least for a non-Scala programmer.

PROS AND CONS

+ interesting solution for an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable

REBUTTAL

-------------------- review 2 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)
----------------------- REVIEW --------------------

MAIN PART
The authors describe how one can use Brzozowski-style derivatives to implement parser combinators. They claim that their technique leads to simple implementations, and this seems to be the case. They also claim that their implementation is efficient, but give no solid evidence for this (they added a special combinator to get decent numbers from their single reported experiment). Another claim is that their parser combinators can handle left recursion, but they give no evidence for this.

The authors seem to be unaware that Danielsson presented essentially the same technique at TYPES 2009 ("Mixing Induction and Coinduction"). However, Danielsson's draft paper does not try to sell the technique, only using it as a vehicle to implement provably total parser combinators, and he does not address the efficiency of parsing, so there could be a place for both papers.

The authors seem to claim that their implementation is a lot more efficient than other parser combinator libraries: "[...] they are easy to implement, and easy to implement as libraries (like top-down parsers), yet they are efficient and expressive (like bottom-up parsers)." However, they have not made any head-to-head comparisons, and they have not cited a single paper discussing efficient parser combinators. Generally there is a serious lack of references to related work on parser combinators.

Section 7 discusses another parsing technique (also based on derivatives). However, I found this section to be poorly written and hard to follow. I suggest that it should be removed.

If the claims about efficiency had been followed up with substantial evidence, then I would have recommended that the paper should be published. As it stands I do not make such a recommendation.

Detailed comments:

Section 2:

"This technique was lost to the “sands of time” until [paper from 2009]" A quick web search will tell you that a large number of papers published before 2009 have used Brzozowski derivatives (or generalisations), so this comment is incorrect. See for instance the work of Rutten, including "Automata and Coinduction; an exercise in coalgebra" and "Automata, Power Series, and Coinduction: Taking Input Derivatives Seriously".

Section 6:

The expansion of D_x(List) would be easier to follow if you added a step between the second and third lines.

"the null string terminal" and "the empty nonterminal" may not exist in N, so they may have to be added to N'.

One could be led to believe that the last "and:" only applies when m = 0.

"The empty nonterminal arises frequently with derivatives, and if this reduction is applied aggressively, the grammar will remain a manageable size." You give no proper motivation for this statement. In fact, your discussion of the "special closure construct" in Section 9 seems to imply that this statement is false. I suggest that you remove this statement.

Section 7:

The machine in this section does not make sense to me. First of all, you do not state what the machine should accomplish; my guess is that you intend

 ((L, s), <>) ->* ((L', <>), <t>)

to hold iff t is a parse tree which witnesses the membership of s in L. Furthermore, you give no reason (such as a proof) to believe that the machine satisfies this specification. For instance, you push nonterminals onto the stack, but you never remove them (I suspect that the last rule should remove a nonterminal), and it seems strange that the state is unchanged in the rules involving "first".

I think the paper would be better if Section 7, which is not central to the paper, were omitted.

"This approach is not well suited to typed languages, since a literal implementation of the parsing machine must use a generic parse-tree type, or else violate type-safety with casts for reductions." I suspect that you could do better with a dependently typed language, so I suggest that you remove this sentence.

Example 1: "the parse tree is:" I think you mean "one parse tree is:".

"The parsing machine demonstrates that, even without reduction rules associated with the grammar, we can use derivatives to construct a concrete syntax tree." I do not follow this sentence. What kind of reduction rules are you referring to?

Section 8:

The parser combinators that you give seem quite outdated to me. The functional programming community moved away from this set of combinators more than a decade ago, in favour of combinators based on monads or applicative functors. Please motivate this choice.

You state two properties that parser combinators must satisfy, but you never really make any use of these properties. Are these properties necessary for your development? Is your definition of a parser combinator a function which satisfies these properties?

"A parser combinator is monotonic iff..." I assume that the second instance of \sqsubseteq should be read pointwise.

"the meaning of an input expression w \in A^* is contained within \tau_N(w)" In fact it is the one and only (complete) result, right?

"D_c(c) = eps -> \w.{(c, w)}" This does not make sense: the second argument of -> is not a parser.

Section 9:

Your parsers accept infinite streams as input. However, parseFull clearly fails to terminate for infinite input. Why do you not restrict the input to finite lists?

Your parsers return potentially infinite streams of results. However, your parse function uses "append", which (essentially) throws away its second argument if the first is infinite. Wouldn't it be better to return a fair interleaving of the two streams?
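
Something like the following would do (a hypothetical sketch in Scala using LazyList; "interleave" is my own name, not one from the paper):

    // Alternate between the two streams, swapping their roles at each step,
    // so that an infinite first stream cannot starve the second.
    def interleave[A](xs: LazyList[A], ys: LazyList[A]): LazyList[A] =
      if (xs.isEmpty) ys
      else xs.head #:: interleave(ys, xs.tail)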

In internalDerive you seem to have renamed a and b to first and second, respectively.

"the longest possible parse first" Why is this relevant for performance? If you return all possible parses, then the order in which they are returned should not matter, and for unambiguous parsers at most one result is returned anyway. I can imagine that your optimisation matters in the (atypical?) case of an ambiguous parse, if you are not interested in inspecting all the results, but only want to see that there is more than one.

The sentence before the table seems to be unfinished.

I assume that "43 m" should be "43 ms".

"These results suggest that derivative-based parser combinators scale well enough to be useful in practice." To me this result suggests that I may have to repeatedly add new special-case combinators (like your closure operation) to ensure decent performance.

Section 10:

"derivative-based parsing defies the classical taxonomies", "Top-down methods", "back-tracking"... Are you not aware of parser combinator libraries which use breadth-first techniques (which your combinators also do)? See, for instance, Claessen's "Parallel Parsing Processes".

"these methods have not been able to handle left-recursive grammars" You should check out Lickman's MSc thesis "Parsing With Fixed Points" and "Parser Combinators for Ambiguous Left-Recursive Grammars" by Frost et al.

"Philosophically, neither derivative-based method “feels” like a top-down parsing method." To me your combinator implementation does feel like a (breadth-first) top-down method.

You claim (implicitly) that your parser combinators can handle left-recursive grammars. However, I doubt that you can handle grammars of the form "n ::= n". If you can handle such grammars, please explain how you do it. Otherwise it would be interesting to know exactly what kind of left recursion you support.

"they are efficient and expressive (like bottom-up parsers)" This seems to imply that there is a language which can be parsed using bottom-up techniques, but not using top-down techniques. Can you verify this claim?

You may want to cite Frost and Szydlowski, who claim worst-case cubic time for parsing ("Memoizing purely functional top-down backtracking language processors"). Swierstra's work on efficient parser combinators is also relevant (many papers).

References:
Improve formatting.

PROS AND CONS

+ Simple technique for implementing parser combinators.
- The technique has been presented before.
- No substantial evidence for claims of efficiency.
- Section 7 does not make sense to me (but could be removed).

REBUTTAL

-------------------- review 3 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART
The paper proposes a new parsing technique based on "derivatives", inspired by the derivatives of regular expressions and extended to derivatives of context-free grammars. Because context-free grammars may be ambiguous, the authors propose to compute the set of all parse trees, using lazy evaluation to share and delay the computation. They present their framework and an implementation in Scala. They claim that their technique is efficient in practice.

EVALUATION

Parsing techniques are old and well established, although the current status of the theory is still unsatisfactory: grammars are still a burden to write, and any advance on this topic is welcome. However, the topic must be addressed very seriously.

Given this, the work is potentially interesting as a possible way to improve the state of the art, but not in its current presentation: the paper should not just _claim_ that using derivatives for parsing is feasible, but _demonstrate_ that writing grammars with combinators is at least as convenient and as efficient as using existing technology.

The first aspect is not discussed at all in the paper. As for efficiency, which is claimed, the benchmarks are really small, probably not representative of real grammars, and the results are not compared with other parsing techniques.

Other questions arise: what happens when the grammar is highly ambiguous? Do you intend to reject such grammars, or to pick the meaning of the things you are parsing at random?

The overall structure of the paper is not very clear either. You should give us a road map at the beginning and make it clear how the different sections fit together.

OTHER COMMENTS

Sections 7 and 8 are the key sections for understanding your proposal. However, I have problems with both.

I don't understand why your parsing is decomposed into the production of an intermediate parse string that is then consumed by a machine that produces a syntax tree. Since the parse string is a linear representation of the parse tree, why can't you work with the parse tree itself, which is more structured than the parse string? Why can't you directly extract a syntax tree from the parse tree? Could you run the algorithm in Section 7 on an example?

In Section 8, you compute the derivatives of parser combinators, but you could instead compute the grammar defined by the parser combinators and compute its derivative in one step. Is this better or worse than computing the derivative directly?

- One line above Section 2: "could [to<-] modify"
- Section 2, line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size."
- Computation of the derivative, above Theorem 1: you could explain the derivatives, especially for rules n -> s1..sn, and perhaps give an example.
- Last two lines above Section 8: "The parsing machine demonstrates... a concrete syntax tree". I don't really understand what you (intend to) mean.
- Section 8.1, line 1: "ability [the<-to] combine them"
- End of Section 8.1: you should state and comment on the fact that these definitions are recursive.
- First line of Section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list of formulas without any comment?
- Section 8.5: what are the lambda-terms on the right of --> ?

-------------------- review 4 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)
----------------------- REVIEW --------------------

MAIN PART
SUMMARY
The paper proposes a basic algorithm for determining whether a sentence s is a member of a context-free language L: (i) if the sentence is empty, determine whether L contains the empty string (a decidable problem); (ii) if the sentence is of the form c.s, determine whether the suffix s is a member of the derivative c^{-1}.L. Because the derivative of a context-free language can be effectively constructed, this is an algorithm.
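
(To make step (ii) concrete with a small example of my own, using a regular language for brevity: for L = a*b and the input "aab", we have D_a(L) = a*b, D_a(a*b) = a*b, and D_b(a*b) = \epsilon; the final derivative contains the empty string, so "aab" is in L.)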

The contribution of the paper is in generalizing this basic algorithm in two ways: (i) instead of just determining membership, the parsing algorithm can be made to return a concrete syntax tree, or a semantic value; (ii) instead of viewing the derivative as an operation that accepts a monolithic grammar and produces a monolithic grammar, one can view it as an operation that accepts a parser combinator and produces a parser combinator; this means that the parsing algorithm is amenable to a modular, combinator-based implementation.

EVALUATION

The introduction of the paper is somewhat irritating, for two reasons. First, it promises a lot ("simplicity, feasibility, and ubiquity"), which the paper does not clearly deliver. Second, it criticizes the most popular parsing methods (LL and LR) on the grounds that the constraints that they impose, in order to rule out ambiguous grammars, are difficult to grasp. I can accept this criticism. But the authors do not offer any solution to the ambiguity problem: their algorithm produces a parse forest. This leaves the programmer faced with the task of writing and debugging disambiguation filters, an error-prone task. At least, an LR parser generator is able (in principle, and some implementations actually do this) to accurately explain why the grammar seems ambiguous: it can produce a sentence that is a prefix of the fringes of two distinct, valid parse trees.

The first claimed contribution of the paper is the remark that the derivative of a context-free language can be effectively constructed. (A proof appears in Section 6.) This is in fact not a contribution. This property of context-free languages is well known, at least in the formal language community (perhaps not in the programming language community). The derivative of a formal language is also known as its left quotient, and written c^{-1}L. See, for instance, the following paper, which exploits this property intensively:

 Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free languages. Fundamenta Informaticae, 25(3):217-239, 1996.

The remainder of the paper contains two distinct developments. In Section 7, a version of the derivative-based parsing algorithm is developed which produces concrete syntax trees. Then, in Sections 8 and 9, another version is developed which produces programmer-defined semantic values and is combinator-based. It is not clear to me why these two distinct developments co-exist. In my eyes, the approach of Sections 8 and 9 seems superior, because it is more flexible (it is not limited to producing concrete syntax trees) and modular.

The idea that the notion of a derivative can be generalized to parser combinators is interesting, and worthy of thought. To some extent, Section 8.2 is where the paper really begins! Section 8 provides a mathematical view of the problem, while Section 9 describes part of a Scala implementation. This material is interesting, but can be criticized on several grounds:

1. The complexity analysis of the algorithm is absent. The algorithm seems very expensive in the worst case: indeed, the effective construction of a derivative in Section 6 causes a quadratic blow-up in the size of the grammar, and this operation is iterated at every character of the input stream! In fact, Section 9.1 mentions that a naive implementation of the algorithm is unusable (exponentially slow), and that special support for the Kleene star had to be added. However, there is no argument that this patch is sufficient for the algorithm to become usable. A worst-case analysis of the space and time complexity is needed, and, if this worst-case complexity is bad (say, exponential), then a formal or semi-formal argument is needed as to why the behavior of the algorithm remains tolerable in practice. In light of the efficiency claim found in the introduction, this is indispensable!

2. The description of the implementation could be improved. In particular, memoization and sharing seem to play a crucial role. It seems that two attempts to compute the derivative of a parser combinator should return the same object. This includes the particular case where the second attempt is in fact nested within the first one (i.e., it is a recursive call). For instance, if the grammar contains the single production L -> LM, a call to L.derive(t) will (indirectly) cause another call to L.derive(t). What happens then? By which mechanism is non-termination avoided? This is not clearly described. The paper contains four lines of text that seem to concern this topic (p.15, "It is important to note [...] immediately suspended."), but I find them obscure. I note that there seems to be a distinction between derive and internalDerive, which is not explained in the paper; my guess is that the former may be a memoizing version of the latter, using a table indexed by tokens. Is this the case?
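
To make my guess concrete, here is a hypothetical sketch (mine, not the authors' code) of the mechanism I have in mind: derive memoizes internalDerive in a per-parser table indexed by tokens, and non-termination on left recursion is avoided only because the combinators are lazy, so that the nested call to L.derive(t) is suspended until the outer call has populated the table.

    abstract class Parser[A] {
      private val cache = scala.collection.mutable.Map[Char, Parser[A]]()

      // Memoizing wrapper: a second request for the same derivative,
      // even one forced later from within a lazy combinator, returns
      // the cached object instead of recomputing (and diverging).
      def derive(t: Char): Parser[A] =
        cache.getOrElseUpdate(t, internalDerive(t))

      // Builds the derivative; it must hold its sub-derivatives by name
      // (lazily) for the scheme above to terminate on L -> LM.
      protected def internalDerive(t: Char): Parser[A]
    }

If something like this is what happens, it deserves a careful explanation in the paper.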

3. The introduction claims "ubiquity"; that is, if I understand correctly, it claims that the method is very easy to implement as a library. This may be true, but the paper lacks a convincing argument why this is so. In particular, how is this approach superior to the other parser combinator libraries in existence? The paper lacks a good comparison with other combinator-based parsing techniques.

It seems that there might be a connection between the technique presented in the paper and intersection parsing (see, e.g., chapter 13 of the excellent book "Parsing Techniques" by Grune and Jacobs). Computing the derivative of a context-free language with respect to a terminal symbol c is closely related to computing its intersection with the regular language c(.*). Intersection parsing, as described by Grune and Jacobs, computes the intersection of the grammar with the input (which is a string, so a regular language!) and simplifies the resulting grammar, which can then be viewed as a parse tree. Could one view the approach presented in the paper as an incremental (and modular) way of doing intersection parsing?
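
(In symbols, the connection I have in mind is c \cdot D_c(L) = L \cap c A^*; iterating the derivative along an input s = c_1 ... c_n yields s^{-1}L, and s \cdot (s^{-1}L) = L \cap s A^*, which is exactly the intersection that intersection parsing builds all at once.)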

In conclusion, although the paper contains an interesting idea, its investigation is not sufficiently thorough, and a clear case in favor of this parsing technique is missing.

DETAILED COMMENTS

The authors state in Section 7 that "a literal implementation [...] must use a generic parse-tree type, or else violate type safety with casts for reductions". Strictly speaking, this depends on which type system one has in mind. If the type system is ML's, for instance, the claim is probably true. If the type system has GADTs, however, it is quite possible that the discipline of the parsing machine can be encoded in the types. This has been done, for an LR parsing machine, and for an arbitrary but fixed grammar, in the following paper:

 François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers. In ACM Workshop on ML, volume 148(2) of Electronic Notes in Theoretical Computer Science, pages 155-180, 2006.

To the best of my knowledge, both Haskell and Scala have GADTs, so the authors might wish to check this out.
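
As a tiny illustration of the kind of encoding I mean, here is my own hypothetical sketch (not taken from that paper): give the machine's stack a type that tracks the semantic values it holds, so that reductions need no casts.

    sealed trait Stack
    case object SNil extends Stack
    final case class SCons[A, R <: Stack](top: A, rest: R) extends Stack

    // A reduction that pops two Ints and pushes their sum; the required
    // stack shape is checked statically, so no casts are needed.
    def reduceAdd[R <: Stack](s: SCons[Int, SCons[Int, R]]): SCons[Int, R] =
      SCons(s.top + s.rest.top, s.rest.rest)

Scaling this up to a complete, grammar-indexed parsing machine is essentially what the cited paper does for LR.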

p.5, "for any class of decidable languages closed under the derivative, derivatives may be used to determine membership". I am not sure what is meant by "decidable" here. If it is meant that membership in the language is decidable, then this is a tautology. Perhaps one should understand that membership *of the empty word* in the language is decidable, since effectively that is all that is needed?
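
(Indeed, w = c_1 ... c_n is in L iff \epsilon is in D_{c_n}(... D_{c_1}(L) ...), so a computable derivative together with a decidable empty-word test suffices to decide membership.)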

p.5, "the derivative is closed under regular operations": unclear formulation.

p.7, before Theorem 1: please indicate the interval over which the index i ranges. My guess is 0 \leq i < m. The particular case where m = 0 seems redundant: when m = 0, there is no suitable index i, so no production is generated, which means D_c(n) is empty. In fact, the notation \emptyset in the right-hand side of a production does not make sense, since the right-hand side of a production is supposed to be a sequence of symbols.

p.7, "the empty nonterminal": this has not been defined. I assume this explains the use of \emptyset above. Define it.

p.8, Example 1: "for the grammar [...] of balanced parentheses [...] the parse tree is": I assume this should be "the parse tree for the string '()'".

p.8, "A parse string is [...] containing both nonterminal markers and reduction markers." Please indicate at this point that a parse string also contains elements of the alphabet A. In fact, the definition of the alphabet A' should be moved to this point.

p.8, "we construct a new grammar [...] over parse strings": please change this to "a new grammar over the alphabet A'".

p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be \mathbb{L}_{A'}.

p.9, "a second rule for producing bras": is the rule correct? In the right-hand side of the conclusion, instead of L, I would have expected the derivative of L with respect to <n|. (The rule, as it stands, clearly causes an infinite stream of <n|'s to be produced!) Similarly for the third rule.

p.9, "the parsing machine reduces": one might wish to state the invariant of the parsing machine which guarantees that reduction can never fail (by lack of material on the stack).

p.9, "popular as of late": please provide citation(s).

p.10, the statement of the monotonicity condition is somewhat unclear: the assertion \tau_X(w) \sqsubseteq \tau_X(w') compares two sets of pairs. An ordering over pairs is defined, but an ordering over sets of pairs is not defined.

p.11, the authors might want to spend more space explaining the definition of the derivative of a parser combinator: why is it the right definition? How does it relate to the definition of the derivative of a formal language? Can one recover the latter as a degenerate case of the former?

p.12, Section 8.3: the parser combinator \epsilon is used, but has not been defined. What is its type? In particular, what semantic value does it produce? (A unit value?) What is the meaning of the notation \lambda\epsilon. ...?

p.12, equation (3): the elegance afforded by the use of \delta in the definition of the derivative of a concatenation (p.5) seems to have been lost here.

p.13, "in practice, the cost [...] as they are derived". I don't know what this means. Please be more precise and include a formal complexity claim.

p.14, "parse trees of type A" should really be referred to as "semantic values", since they may not be trees at all: they are programmer-defined.

p.14, "its implementation follows eq. 2": actually, eq. 2 corresponds only to the "else" clause of the code, so the text is badly worded. Same problem with eq. 1 a few lines later.

p.15, the distinction between "derive" and "internalDerive" is not explained.

p.15, the derivative of union "short-circuits if it finds either arm is empty": is this important, say, to ensure termination? Or is it just a minor optimization?

p.16, the fact that a naïve treatment of the Kleene star required exponential time, and the fact that right recursion is significantly more efficient than left recursion, are worrisome. These problems are perhaps partially masked by the introduction of "the special closure construct", but how do we know that they are gone?

p.16, the test suite consists of just one grammar and a set of randomly generated inputs. This seems far from representative of actual usage...

p.17, "the cost of that compilation is amortized across the entire parsing process". I don't know what is meant by "compilation" here, nor do I know why its cost is amortized. Please explain.

p.18, "these combinators are slowly transforming and unfolding the grammar itself into a parse tree": this seems to be an interesting intuition. Why not try to make it more formal?

TYPOS

p.2, "could to modify"
p.2, "they critical"
p.6, "attempt interpret"
p.10, "Operations or combinators"
p.10, "the remainders of the input"
p.12, "is base case"
p.13, "able parse"
p.15, "short-circuiting if the first element is nullable": should this be "is not nullable"?
p.16, "43 m" -> "43 ms"
p.16, "surpising"
p.17, "they extent"

PROS AND CONS

+ interesting and possibly novel idea
- no time/space complexity analysis
- implementation not clearly described
- case in favor of the idea not clearly made
- comparison with related work should be more thorough

REBUTTAL
No specific questions.