diff -r c265a1689bdc -r e7e4e490326b Parsing-Literature/yacc-is-dead-review
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Parsing-Literature/yacc-is-dead-review	Wed Apr 06 08:18:23 2011 +0000
@@ -0,0 +1,478 @@

-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper describes an original approach to the derivation of parsers from
grammars. The approach is based on the observation that context-free grammars
are closed under Brzozowski derivatives, and such derivatives can be easily
computed. The parser described hence parses a string by lazily computing the
derivatives of the grammar for each character it meets. Quite surprisingly,
such a parser seems very easy to encode, especially in a language that
supports streams and lazy computations, can parse every grammar, and is
quite efficient.

I found the paper very interesting. The studied problem is very important,
and it is surprising to meet original results in a field that has been
explored so much. I enjoyed the first pages of the paper very much; they are
extremely crisp and clear. Unfortunately, around page 8, the paper starts
getting more and more obscure; section 7 ends abruptly, and section 8, which
is probably the most important one, is quite obscure. In the end I still
think that the paper should be accepted, but I really hope that the authors
will be able to beef section 8 up to the point of making it readable for me.

Detailed remarks for the authors only

Example 1: "the parse tree is:" => "the parse tree of '()' is:"

In general: you should write \epsilon rather than |\epsilon> (this is not
quantum theory, after all...)

The non-deterministic state machine: the language in \Sigma is a language
over A', not over A, right? You should say that explicitly.

In the rules for bras and kets, the consequences should look like
    (L,w) =>^{\bra{n}} (D_{\bra{n}}(L),w)
I think you forgot the derivative, right?

The parse-string machine pushes a bra when it meets one, but never pops it.
You can easily adjust the last rule of section 7 to pop it, but the real
question is: why do you push it?

The end of section 7 is too abrupt. You have described a nice pair of
abstract machines, but now you should tell me something a bit more concrete
about the amount of nondeterminism involved, and how much that would hurt me
in practice...

Page 10: the monotonicity for parser combinators should look like:
    forall p in \tau_X(w) there exists p' in \tau_X(w') such that
    first(p) is a prefix of first(p')

Section 8 is obscure. It is too terse. The equations look right and look
convincing, but I was totally unable to create a mental picture of how this
machinery actually works. I understand that such a combination of
non-determinism and laziness creates code that tends to work in a mysterious
way, but still the section is extremely unsatisfactory.

Section 9: a bit unreadable, at least for a non-Scala programmer.

PROS AND CONS
+ interesting solution for an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable

REBUTTAL
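(To make the derivative idea summarized in this review concrete: for plain
regular expressions, a derivative-based recognizer fits in a few lines of
Scala. This is an illustrative sketch, not the paper's code; all names are
ours.)

  sealed trait Re
  case object Empty extends Re                    // the empty language
  case object Eps extends Re                      // the language {epsilon}
  case class Chr(c: Char) extends Re              // a single character
  case class Alt(l: Re, r: Re) extends Re         // union
  case class Cat(l: Re, r: Re) extends Re         // concatenation
  case class Star(r: Re) extends Re               // Kleene star

  // Does the language contain the empty string?
  def nullable(re: Re): Boolean = re match {
    case Empty     => false
    case Eps       => true
    case Chr(_)    => false
    case Alt(l, r) => nullable(l) || nullable(r)
    case Cat(l, r) => nullable(l) && nullable(r)
    case Star(_)   => true
  }

  // Brzozowski derivative: the language of suffixes after consuming c.
  def derive(re: Re, c: Char): Re = re match {
    case Empty     => Empty
    case Eps       => Empty
    case Chr(d)    => if (c == d) Eps else Empty
    case Alt(l, r) => Alt(derive(l, c), derive(r, c))
    case Cat(l, r) =>
      val left = Cat(derive(l, c), r)
      if (nullable(l)) Alt(left, derive(r, c)) else left
    case Star(r)   => Cat(derive(r, c), Star(r))
  }

  // Membership: one derivative per input character, then test nullability.
  def matches(re: Re, s: String): Boolean =
    if (s.isEmpty) nullable(re) else matches(derive(re, s.head), s.tail)

(The paper's step beyond this, as the review says, is extending the scheme
from regular languages to arbitrary context-free grammars and to full
parsing rather than mere recognition.)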
-------------------- review 2 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)
----------------------- REVIEW --------------------

MAIN PART

The authors describe how one can use Brzozowski-style derivatives to
implement parser combinators. They claim that their technique leads to
simple implementations, and this seems to be the case. They also claim that
their implementation is efficient, but give no solid evidence for this (they
added a special combinator to get decent numbers from their single reported
experiment). Another claim is that their parser combinators can handle left
recursion, but they give no evidence for this.

The authors seem to be unaware that Danielsson presented essentially the
same technique at TYPES 2009 ("Mixing Induction and Coinduction"). However,
Danielsson's draft paper does not try to sell the technique, only using it
as a vehicle to implement provably total parser combinators, and he does not
address efficiency of parsing, so there could be a place for both papers.

The authors seem to claim that their implementation is a lot more efficient
than other parser combinator libraries:

  "[...] they are easy to implement, and easy to implement as libraries
  (like top-down parsers), yet they are efficient and expressive (like
  bottom-up parsers)."

However, they have not made any head-to-head comparisons, and they have not
cited a single paper discussing efficient parser combinators. Generally
there is a serious lack of references to related work on parser combinators.

Section 7 discusses another parsing technique (also based on derivatives).
However, I found this section to be poorly written and hard to follow. I
suggest that it should be removed.

If the claims about efficiency had been backed up with substantial evidence,
then I would have recommended that the paper be published. As it stands, I
do not make such a recommendation.

Detailed comments:

Section 2:

"This technique was lost to the "sands of time" until [paper from 2009]"
A quick web search will tell you that a large number of papers published
before 2009 have used Brzozowski derivatives (or generalisations), so this
comment is incorrect. See for instance the work of Rutten, including
"Automata and Coinduction; an exercise in coalgebra" and "Automata, Power
Series, and Coinduction: Taking Input Derivatives Seriously".

Section 6:

The expansion of D_x(List) would be easier to follow if you added a step
between the second and third lines.

"the null string terminal" and "the empty nonterminal" may not exist in N,
so they may have to be added to N'.

One could be led to believe that the last "and:" only applies when m = 0.

"The empty nonterminal arises frequently with derivatives, and if this
reduction is applied aggressively, the grammar will remain a manageable
size." You give no proper motivation for this statement. In fact, your
discussion of the "special closure construct" in Section 9 seems to imply
that this statement is false. I suggest that you remove this statement.
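(To make the Section 6 construction these comments refer to concrete: a
Scala sketch of the per-production derivative rule, under our own names,
with nullability assumed precomputed as a least fixed point over the
grammar. Terminals are handled separately, by D_c(c) = eps and D_c(a) = the
empty nonterminal for a != c. Note that when m = 0 the comprehension below
is vacuous, so D_c(n) receives no productions at all, which bears on the
remark about the last "and:" above.)

  case class Prod(lhs: String, rhs: List[String])

  // For a production n -> s1 ... sm, the derivative D_c(n) receives one
  // production D_c(s_{i+1}) s_{i+2} ... s_m for each i in [0, m) such
  // that s1 ... s_i are all nullable.
  def deriveProd(p: Prod, c: Char, nullable: String => Boolean): List[Prod] = {
    def dc(s: String): String = s"D_$c($s)"   // fresh "primed" nonterminal
    (for {
      i <- p.rhs.indices
      if p.rhs.take(i).forall(nullable)
    } yield Prod(dc(p.lhs), dc(p.rhs(i)) :: p.rhs.drop(i + 1))).toList
  }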
Section 7:

The machine in this section does not make sense to me. First of all you do
not state what the machine should accomplish; my guess is that you intend

  ((L, s), <>) ->* ((L', <>), t)

to hold iff t is a parse tree which witnesses the membership of s in L.
Furthermore you give no reason (such as a proof) to believe that the machine
satisfies this specification. For instance, you push nonterminals onto the
stack, but you never remove them (I suspect that the last rule should remove
a nonterminal), and it seems strange that the state is unchanged in the
rules involving "first".

I think the paper would be better if Section 7, which is not central to the
paper, was omitted.

"This approach is not well suited to typed languages, since a literal
implementation of the parsing machine must use a generic parse-tree type, or
else violate type-safety with casts for reductions." I suspect that you
could do better with a dependently typed language, so I suggest that you
remove this sentence.

Example 1: "the parse tree is:" I think you mean "one parse tree is:".

"The parsing machine demonstrates that, even without reduction rules
associated with the grammar, we can use derivatives to construct a concrete
syntax tree." I do not follow this sentence. What kind of reduction rules
are you referring to?

Section 8:

The parser combinators that you give seem quite outdated to me. The
functional programming community moved away from this set of combinators
more than a decade ago, in favour of combinators based on monads or
applicative functors. Please motivate this choice.

You state two properties that parser combinators must satisfy, but you never
really make any use of these properties. Are these properties necessary for
your development? Is your definition of a parser combinator a function which
satisfies these properties?

"A parser combinator is monotonic iff..." I assume that the second instance
of \sqsubseteq should be read pointwise.

"the meaning of an input expression w <- A^* is contained within tau_N(w)"
In fact it is the one and only (complete) result, right?

"D_c(c) = eps -> \w.{(c, w)}" This does not make sense; the second argument
of -> is not a parser.

Section 9:

Your parsers accept infinite streams as input. However, parseFull clearly
fails to terminate for infinite input. Why do you not restrict the input to
finite lists?

Your parsers return potentially infinite streams of results. However, your
parse function uses "append", which (essentially) throws away its second
argument if the first is infinite. Wouldn't it be better to return a fair
interleaving of the two streams?

In internalDerive you seem to have renamed a and b to first and second,
respectively.

"the longest possible parse first" Why is this relevant for performance? If
you return all possible parses, then the order in which they are returned
should not matter, and for unambiguous parsers at most one result is
returned anyway. I can imagine that your optimisation matters in the
(atypical?) case of an ambiguous parse, if you are not interested in
inspecting all the results, but only want to see that there is more than
one.

The sentence before the table seems to be unfinished.

I assume that "43 m" should be "43 ms".

"These results suggest that derivative-based parser combinators scale well
enough to be useful in practice." To me this result suggests that I may have
to repeatedly add new special-case combinators (like your closure operation)
to ensure decent performance.
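(The fair-interleaving suggestion in the Section 9 comments above, as a
sketch in current Scala; the paper's implementation uses its own stream
type, and the name "interleave" is ours:)

  // A fair interleaving of two (possibly infinite) lazy streams: unlike
  // append, it keeps yielding elements of the second stream even when
  // the first one is infinite.
  def interleave[A](xs: LazyList[A], ys: LazyList[A]): LazyList[A] =
    if (xs.isEmpty) ys
    else xs.head #:: interleave(ys, xs.tail)  // swap arguments to alternate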
+"these methods have not been able to handle left-recursive grammars" +You should check out Lickman's MSc thesis "Parsing With Fixed Points" +and "Parser Combinators for Ambiguous Left-Recursive Grammars" by +Frost et al. +"Philosophically, neither derivative-based method “feels” like a +top-down parsing method." To me your combinator implementation does +feel like a (breadth-first) top-down method. +You claim (implicitly) that your parser combinators can handle left +recursive grammars. However, I doubt that you can handle grammars of +the form "n ::= n". If you can handle such grammars, please explain +how you do it. Otherwise it would be interesting to know exactly what +kind of left recursion you support. +"they are efficient and expressive (like bottom-up parsers)" This +seems to imply that there is a language which can be parsed using +bottom-up techniques, but not using top-down techniques. Can you +verify this claim? +You may want to cite Frost and Szydlowski, who claim worst-case cubic +time for parsing ("Memoizing purely functional top-down backtracking +language processors"). Swierstra's work on efficient parser +combinators is also relevant (many papers). +References: +Improve formatting. +PROS AND CONS ++ Simple technique for implementing parser combinators. +- The technique has been presented before. +- No substantial evidence for claims of efficiency. +- Section 7 does not make sense to me (but could be removed). +REBUTTAL + + +-------------------- review 3 -------------------- + +PAPER: 104 +TITLE: Parsing with derivatives + +OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.) +REVIEWER'S CONFIDENCE: 2 (medium) +----------------------- REVIEW -------------------- + +MAIN PART +The paper proposes a new parsing technique based on "derivatives", +inspired by the derivatives of regular expressions and extended to the +derivatives of context free grammars. +Because context free grammars may be ambiguous, they propose to compute the +set of all parse trees but using lazy evaluation to share and delay the +computation. They present their framework and an implementation in Scala. +They claim that their technique is efficient in practice. + +EVALUATION +Parsing techniques are old and well-established, although the current status +of the theory is still unsatisfactory: grammars are still a burden to write +and any advance of this topic is welcomed. However, the topic must be +addressed very seriously. +Given this, this work is potentially interesting as a possible way to +improve the state of the art situation, but not in its current +presentation, as the paper should not just _claim_ that using derivatives +for parsing is feasible, but _demonstrate_ that writing grammars with +combinators is at least both as convenient and as efficient as using +existing technology. +The first aspect is not discussed at all in the paper. As for efficiency, +which is claimed, the benchmarks are really small, and perhaps probably +representative of existing grammars, and the results are not compared with +other parsing techniques. +Other questions arise: what happens when the grammar is largely ambiguous? +Do you intend to reject those, or pick the meaning of the things that you +are parsing at random? +The overall structure of the paper is not very clear either. You should +gives us a road map at the beginning and make it clear how the different +sections fits together. + +OTHER COMMENTS +Section 7 and 8 are the key sections of the paper to understand your +proposal. 
-------------------- review 3 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper proposes a new parsing technique based on "derivatives", inspired
by the derivatives of regular expressions and extended to the derivatives of
context-free grammars. Because context-free grammars may be ambiguous, they
propose to compute the set of all parse trees, using lazy evaluation to
share and delay the computation. They present their framework and an
implementation in Scala. They claim that their technique is efficient in
practice.

EVALUATION

Parsing techniques are old and well-established, although the current status
of the theory is still unsatisfactory: grammars are still a burden to write,
and any advance on this topic is welcome. However, the topic must be
addressed very seriously.

Given this, this work is potentially interesting as a possible way to
improve the state of the art, but not in its current presentation: the paper
should not just _claim_ that using derivatives for parsing is feasible, but
_demonstrate_ that writing grammars with combinators is at least both as
convenient and as efficient as using existing technology. The first aspect
is not discussed at all in the paper. As for efficiency, which is claimed,
the benchmarks are really small, and probably not representative of existing
grammars, and the results are not compared with other parsing techniques.

Other questions arise: what happens when the grammar is largely ambiguous?
Do you intend to reject such grammars, or pick the meaning of the things
that you are parsing at random?

The overall structure of the paper is not very clear either. You should give
us a road map at the beginning and make it clear how the different sections
fit together.

OTHER COMMENTS

Sections 7 and 8 are the key sections of the paper for understanding your
proposal. However, I have problems with both.

I don't understand why your parsing is decomposed into the production of an
intermediate parse string that is then consumed by a machine that produces a
syntax tree. Since the parse string is a linear representation of the parse
tree, why can't you work with the parse tree itself, which is more
structured than the parse string? Why can't you directly extract a syntax
tree from the parse tree? Could you run the algorithm in Section 7 on an
example?

In section 8, you compute the derivatives of parser combinators, but you
could instead compute the grammar defined by the parser combinator and
compute its derivative in one step. Is this better or worse than computing
it directly?

- One line above section 2: "could [to<-] modify"
- Section 2, Line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size."
- Computation of the derivative, above theorem 1: you could explain the
  derivatives, especially for rules n -> s1 ... sm, and perhaps give an
  example.
- Last two lines above section 8: "The parsing machine demonstrates... a
  concrete syntax tree". I don't really understand what you (intend to)
  mean.
- Section 8.1, Line 1: "ability [the<-to] combine them"
- End of section 8.1: you should state and comment on the fact that these
  definitions are recursive.
- First line of section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list
  of formulas without any comment?
- Section 8.5: what are the lambda-terms on the right of --> ?

-------------------- review 4 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)
----------------------- REVIEW --------------------

MAIN PART

SUMMARY

The paper proposes a basic algorithm for determining whether a sentence s is
a member of a context-free language L: (i) if the sentence is empty,
determine whether L contains the empty string (a decidable problem); (ii) if
the sentence is of the form c.s, determine whether the suffix s is a member
of the derivative c^{-1}.L. Because the derivative of a context-free
language can be effectively constructed, this is an algorithm.

The contribution of the paper is in generalizing this basic algorithm in two
ways: (i) instead of just determining membership, the parsing algorithm can
be made to return a concrete syntax tree, or a semantic value; (ii) instead
of viewing the derivative as an operation that accepts a monolithic grammar
and produces a monolithic grammar, one can view it as an operation that
accepts a parser combinator and produces a parser combinator; this means
that the parsing algorithm is amenable to a modular, combinator-based
implementation.
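(In symbols, the basic algorithm summarized above, writing \delta(L) for the
decidable test \epsilon \in L:)

  \epsilon \in L   iff  \delta(L)
  c.s \in L        iff  s \in c^{-1}.L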
EVALUATION

The introduction of the paper is somewhat irritating, for two reasons.
First, it promises a lot ("simplicity, feasibility, and ubiquity"), which
the paper does not clearly deliver. Second, it criticizes the most popular
parsing methods (LL and LR) on the grounds that the constraints that they
impose, in order to rule out ambiguous grammars, are difficult to grasp. I
can accept this criticism. But the authors do not offer any solution to the
ambiguity problem: their algorithm produces a parse forest. This leaves the
programmer faced with the task of writing and debugging disambiguation
filters, an error-prone task. At least, an LR parser generator is able (in
principle, and some implementations actually do this) to accurately explain
why the grammar seems ambiguous. It can produce a sentence that is a prefix
of the fringes of two distinct, valid parse trees.

The first claimed contribution of the paper is the remark that the
derivative of a context-free language can be effectively constructed. (A
proof appears in section 6.) This is in fact not a contribution. This
property of context-free languages is well-known, at least in the formal
language community (perhaps not in the programming language community). The
derivative of a formal language is also known as its left quotient, and
written c^{-1}L. See, for instance, the following paper, which exploits this
property intensively:

  Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free
  languages. Fundamenta Informaticae, 25(3):217-239, 1996.

The remainder of the paper contains two distinct developments. In section 7,
a version of the derivative-based parsing algorithm is developed, which
produces concrete syntax trees. Then, in sections 8 and 9, another version
is developed, which produces programmer-defined semantic values, and is
combinator-based. It is not clear to me why these two distinct developments
co-exist. In my eyes, the approach of sections 8 and 9 seems superior,
because it is more flexible (it is not limited to producing concrete syntax
trees) and modular.

The idea that the notion of a derivative can be generalized to parser
combinators is interesting, and worthy of thought. To some extent, section
8.2 is where the paper really begins! Section 8 provides a mathematical view
of the problem, while section 9 describes part of a Scala implementation.
This material is interesting, but can be criticized on several grounds:

1. The complexity analysis of the algorithm is absent. The algorithm seems
very expensive in the worst case: indeed, the effective construction of a
derivative in section 6 causes a quadratic blow-up of the size of the
grammar, and this operation is iterated at every character of the input
stream! In fact, section 9.1 mentions that a naive implementation of the
algorithm is unusable (exponentially slow), and that special support for the
Kleene star had to be added. However, there is no argument that this patch
is sufficient for the algorithm to become usable. A worst-case analysis of
the space and time complexity is needed, and, if this worst-case complexity
is bad (say, exponential), then a formal or semi-formal argument is needed
why the behavior of the algorithm remains tolerable in practice. In light of
the efficiency claim found in the introduction, this is indispensable!

2. The description of the implementation could be improved. In particular,
memoization and sharing seem to play a crucial role. It seems that two
attempts to compute the derivative of a parser combinator should return the
same object. This includes the particular case where the second attempt is
in fact nested within the first one (i.e., it is a recursive call). For
instance, if the grammar contains the single production L -> LM, a call to
L.derive(t) will (indirectly) cause another call to L.derive(t). What
happens then? By which mechanism is non-termination avoided? This is not
clearly described. The paper contains four lines of text that seem to
concern this topic (p.15, "It is important to note [...] immediately
suspended."), but I find them obscure. I note that there seems to be a
distinction between derive and internalDerive, which is not explained in the
paper; my guess is that the former may be a memoizing version of the latter,
using a table indexed by tokens. Is this the case?
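(One plausible reading of the suspension mechanism this point asks about, as
a guess at the discipline rather than the paper's actual code: derive
installs a suspended node in its memo table before doing any work, so the
re-entrant call provoked by a production such as L -> LM finds the
suspension and ties the knot instead of looping. In Scala:)

  abstract class Parser {
    private val memo = scala.collection.mutable.Map[Char, Parser]()
    // Install a suspension *before* computing anything, so that a
    // re-entrant derive(c) on this node finds it in the table.
    def derive(c: Char): Parser =
      memo.getOrElseUpdate(c, new Delayed(this, c))
    // The real per-node derivative; only ever reached through Delayed.
    def internalDerive(c: Char): Parser
  }

  // A suspension standing for p.internalDerive(c). By the time `forced`
  // is first demanded, p.derive(c) is already memoized, so the recursive
  // call that a left-recursive production provokes returns immediately.
  class Delayed(p: Parser, c: Char) extends Parser {
    lazy val forced: Parser = p.internalDerive(c)
    def internalDerive(d: Char): Parser = forced.internalDerive(d)
  }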
3. The introduction claims "ubiquity", that is, if I understand correctly,
it claims that the method is very easy to implement as a library. This may
be true, but the paper lacks a convincing argument for why this is so. In
particular, how is this approach superior to the other parsing combinator
libraries in existence? The paper lacks a good comparison with other
combinator-based parsing techniques.

It seems that there might be a connection between the technique presented in
the paper and intersection parsing (see e.g. chapter 13 of the excellent
book "Parsing Techniques", by Grune and Jacobs). Computing the derivative of
a context-free language with respect to a terminal symbol c is closely
related to computing its intersection with the regular language c(.*).
Intersection parsing, as described by Grune and Jacobs, computes the
intersection of the grammar with the input (which is a string, so a regular
language!) and simplifies the resulting grammar, which can then be viewed as
a parse tree. Could one view the approach presented in the paper as an
incremental (and modular) way of doing intersection parsing?

In conclusion, although the paper contains an interesting idea, its
investigation is not sufficiently thorough, and a clear case in favor of
this parsing technique is missing.

DETAILED COMMENTS

The authors state in section 7 that "a literal implementation [...] must use
a generic parse-tree type, or else violate type safety with casts for
reductions". Strictly speaking, this depends on which type system one has in
mind. If the type system is ML, for instance, the claim is probably true. If
the type system has GADTs, however, it is quite possible that the discipline
of the parsing machine can be encoded in the types. This has been done, for
an LR parsing machine, and for an arbitrary but fixed grammar, in the
following paper:

  François Pottier and Yann Régis-Gianas. Towards efficient, typed LR
  parsers. In ACM Workshop on ML, volume 148(2) of Electronic Notes in
  Theoretical Computer Science, pages 155-180, 2006.

To the best of my knowledge, both Haskell and Scala have GADTs, so the
authors might wish to check this out.
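(The GADT point, rendered as a small Scala fragment; Scala expresses GADTs
as sealed hierarchies with type parameters. Indexing parsers by their result
type lets reductions carry typed semantic functions, so no universal
parse-tree type and no casts are needed. The names, and the restriction to a
fragment without alternation or sequencing, are ours:)

  sealed trait P[A] { def derive(c: Char): P[A] }

  case class Fail[A]() extends P[A] {                // the empty language
    def derive(c: Char): P[A] = this
  }
  case class Done[A](results: Set[A]) extends P[A] { // epsilon, with values
    def derive(c: Char): P[A] = Fail()
  }
  case class Tok(t: Char) extends P[Char] {          // a single token
    def derive(c: Char): P[Char] = if (c == t) Done(Set(c)) else Fail()
  }
  case class Red[A, B](p: P[A], f: A => B) extends P[B] { // typed reduction
    def derive(c: Char): P[B] = Red(p.derive(c), f)
  }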
the parse +tree is": I assume this should be: "the parse tree for the string '()'". +p.8, "A parse string is [...] containing both nonterminal markers and +reduction markers"." Please indicate at this point that a parse string also +contains elements of the alphabet A. In fact, the definition of the alphabet +A' should be moved to this point. +p.8, "we construct a new grammar [...] over parse strings": please change +this to "a new grammar over the alphabet A'". +p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be +\mathbb{L}_{A'}. +p.9, "a second rule for producing bras": is the rule correct? In the +right-hand side of the conclusion, instead of L, I would have expected the +derivative of L with respect to "43 ms" +p.16, "surpising" +p.17, "they extent" +PROS AND CONS ++ interesting and possibly novel idea +- no time/space complexity analysis +- implementation not clearly described +- case in favor of the idea not clearly made +- comparison with related work should be more thorough +REBUTTAL +No specific questions.