Parsing-Literature/yacc-is-dead-review
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Sat, 14 Sep 2013 14:08:19 +0100
changeset 387 288637d9dcde
parent 153 e7e4e490326b
permissions -rw-r--r--
more changes for final submission

-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART
The paper describes an original approach to the derivation of parsers from grammars. The approach is based on the observation that context-free grammars are closed under Brzozowski derivatives, and such derivatives can be easily computed. The parser described hence parses a string by lazily computing the derivatives of the grammar for each character met. Quite surprisingly, such a parser seems very easy to encode, especially in a language that supports streams and lazy computations, can parse every grammar, and is quite efficient.
I found the paper very interesting. The studied problem is very important, and is surprising to meet original results in a field that has been explored so much. I enjoyed the first pages of the paper very much, they are extremely crisp and clear. Unfortunately, around page 8, the paper starts getting more and more obscure; chapter 7 end abruptly, and chapter 8, which is probably the most important one, is quite obscure. At the end I still think that the paper should be accepted, but I really hope that the authors will be able to beef section 8 up to the point of making it readable for me.
Detailed remarks for the authors only
Example 1: the parse tree is: => the pasre tree of ' () ' is;
In general: you should write <B||B->\epsilon> rather then <B|B->\epsilon> (this is not quantum theory, after all...)
The non-deterministic state machine: the language in \Sigma is a language over A', not over A, right? You should say that explicitly.
In the rules for bra's and ket's, the consequences should look like
   (L,w) =>^{\bra{n}} (D_{\bra{n}}(L),w)
I think you forgot the derivative, right?
The parse-string machine pushed a bra when it meets it, but never pops it. You can easily adjust the last rule of section 7 to pop it, but the real question is: why do you push it?
The end of section 7 is too abrupt. You have described a nice pair of abstract machines, but now you should tell me something a bit more concrete about the amount of nondetermins involved, how much would that hurt me in practice...
Page 10: the monotonicity for parser combinators should look like:
  forall p in \tau_X(w) exists p' in \tau_X(w') such that first p prefix of first p'
Section 8 is obscure. It is too terse. The equations look right and look convincing, but I was totally unable to create a mental picture about how this machinery actually work. I understand that such a combination of non-determinism and lazyness creates code that tends to work in a mysterious way, but still the section is extremely unsatisfactory, especially.
Section 9: a bit unreadable, at least for a non-scala programmer.
PROS AND CONS
+ interesting solution for an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable
REBUTTAL


-------------------- review 2 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)
----------------------- REVIEW --------------------

MAIN PART
The authors describe how one can use Brzozowski-style derivatives to
implement parser combinators. They claim that their technique leads to
simple implementations, and this seems to be the case. They also claim
that their implementation is efficient, but give no solid evidence for
this (they added a special combinator to get decent numbers from their
single reported experiment). Another claim is that their parser
combinators can handle left recursion, but they give no evidence for
this.
The authors seem to be unaware that Danielsson presented essentially
the same technique at TYPES 2009 ("Mixing Induction and Coinduction").
However, Danielsson's draft paper does not try to sell the technique,
only using it as a vehicle to implement provably total parser
combinators, and he does not address efficiency of parsing, so there
could be a place for both papers.
The authors seem to claim that their implementation is a lot more
efficient than other parser combinator libraries:
 "[...] they are easy to implement, and easy to implement as
 libraries (like top-down parsers), yet they are efficient and
 expressive (like bottom-up parsers)."
However, they have not made any head-to-head comparisons, and they
have not cited a single paper discussing efficient parser combinators.
Generally there is a serious lack of references to related work on
parser combinators.
Section 7 discusses another parsing technique (also based on
derivatives). However, I found this section to be poorly written and
hard to follow. I suggest that it should be removed.
If the claims about efficiency had been followed up with substantial
evidence, then I would have recommended that the paper should be
published. As it stands I do not make such a recommendation.
Detailed comments:
Section 2:
"This technique was lost to the “sands of time” until [paper from
2009]" A quick web search will tell you that a large number of papers
published before 2009 have used Brzozowski derivatives (or
generalisations), so this comment is incorrect. See for instance the
work of Rutten, including "Automata and Coinduction; an exercise in
coalgebra" and "Automata, Power Series, and Coinduction: Taking Input
Derivatives Seriously".
Section 6:
The expansion of D_x(List) would be easier to follow if you added a
step between the second and third lines.
"the null string terminal" and "the empty nonterminal" may not exist
in N, so they may have to be added to N'.
One could be led to believe that the last "and:" only applies when
m = 0.
"The empty nonterminal arises frequently with derivatives, and if this
reduction is applied aggressively, the grammar will remain a
manageable size." You give no proper motivation for this statement. In
fact, your discussion of the "special closure construct" in Section 9
seems to imply that this statement is false. I suggest that you remove
this statement.
Section 7:
The machine in this section does not make sense to me. First of all
you do not state what the machine should accomplish; my guess is that
you intend
 ((L, s), <>) ->* ((L', <>), <t>)
to hold iff t is a parse tree which witnesses the membership of s in
L. Furthermore you give no reason (such as a proof) to believe that
the machine satisfies this specification. For instance, you push
nonterminals onto the stack, but you never remove them (I suspect that
the last rule should remove a nonterminal), and it seems strange that
the state is unchanged in the rules involving "first".
I think the paper would be better if Section 7, which is not central
to the paper, was omitted.
"This approach is not well suited to typed languages, since a literal
implementation of the parsing machine must use a generic parse-tree
type, or else violate type-safety with casts for reductions." I
suspect that you could do better with a dependently typed language, so
I suggest that you remove this sentence.
Example 1: "the parse tree is:" I think you mean "one parse tree is:".
"The parsing machine demonstrates that, even without reduction rules
associated with the grammar, we can use derivatives to construct a
concrete syntax tree." I do not follow this sentence. What kind of
reduction rules are you referring to?
Section 8:
The parser combinators that you give seem quite outdated to me. The
functional programming community moved away from this set of
combinators more than a decade ago, in favour of combinators based on
monads or applicative functors. Please motivate this choice.
You state two properties that parser combinators must satisfy, but you
never really make any use of these properties. Are these properties
necessary for your development? Is your definition of a parser
combinator a function which satisfies these properties?
"A parser combinator is monotonic iff..." I assume that the second
instance of \sqsubseteq should be read pointwise.
"the meaning of an input expression w <- A^∗ is contained within
tau_N(w)" In fact it is the one and only (complete) result, right?
"D_c(c) = eps -> \w.{(c, w)}" This does not make sense, the second
argument of -> is not a parser.
Section 9:
Your parsers accept infinite streams as input. However, parseFull
clearly fails to terminate for infinite input. Why do you not restrict
the input to finite lists?
Your parsers return potentially infinite streams of results. However,
your parse function uses "append", which (essentially) throws away its
second argument if the first is infinite. Wouldn't it be better to
return a fair interleaving of the two streams?
In internalDerive you seem to have renamed a and b to first and
second, respectively.
"the longest possible parse first" Why is this relevant for
performance? If you return all possible parses, then the order in
which they are returned should not matter, and for unambiguous parsers
at most one result is returned anyway. I can imagine that your
optimisation matters in the (atypical?) case of an ambiguous parse, if
you are not interested in inspecting all the results, but only want to
see that there is more than one.
The sentence before the table seems to be unfinished.
I assume that "43 m" should be "43 ms".
"These results suggest that derivative-based parser combinators scale
well enough to be useful in practice." To me this result suggests that
I may have to repeatedly add new special-case combinators (like your
closure operation) to ensure decent performance.
Section 10:
"derivative-based parsing defies the classical taxonomies", "Top-down
methods", "back-tracking"... Are you not aware of parser combinator
libraries which use breadth-first techniques (which your combinators
also do)? See, for instance, Claessen's "Parallel Parsing Processes".
"these methods have not been able to handle left-recursive grammars"
You should check out Lickman's MSc thesis "Parsing With Fixed Points"
and "Parser Combinators for Ambiguous Left-Recursive Grammars" by
Frost et al.
"Philosophically, neither derivative-based method “feels” like a
top-down parsing method." To me your combinator implementation does
feel like a (breadth-first) top-down method.
You claim (implicitly) that your parser combinators can handle left
recursive grammars. However, I doubt that you can handle grammars of
the form "n ::= n". If you can handle such grammars, please explain
how you do it. Otherwise it would be interesting to know exactly what
kind of left recursion you support.
"they are efficient and expressive (like bottom-up parsers)" This
seems to imply that there is a language which can be parsed using
bottom-up techniques, but not using top-down techniques. Can you
verify this claim?
You may want to cite Frost and Szydlowski, who claim worst-case cubic
time for parsing ("Memoizing purely functional top-down backtracking
language processors"). Swierstra's work on efficient parser
combinators is also relevant (many papers).
References:
Improve formatting.
PROS AND CONS
+ Simple technique for implementing parser combinators.
- The technique has been presented before.
- No substantial evidence for claims of efficiency.
- Section 7 does not make sense to me (but could be removed).
REBUTTAL


-------------------- review 3 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART
The paper proposes a new parsing technique based on "derivatives",
inspired by the derivatives of regular expressions and extended to the
derivatives of context free grammars.
Because context free grammars may be ambiguous, they propose to compute the
set of all parse trees but using lazy evaluation to share and delay the
computation. They present their framework and an implementation in Scala.
They claim that their technique is efficient in practice.

EVALUATION
Parsing techniques are old and well-established, although the current status
of the theory is still unsatisfactory: grammars are still a burden to write
and any advance of this topic is welcomed.  However, the topic must be
addressed very seriously.
Given this, this work is potentially interesting as a possible way to
improve the state of the art situation, but not in its current
presentation, as the paper should not just _claim_ that using derivatives
for parsing is feasible, but _demonstrate_ that writing grammars with
combinators is at least both as convenient and as efficient as using
existing technology.
The first aspect is not discussed at all in the paper.  As for efficiency,
which is claimed, the benchmarks are really small, and perhaps probably
representative of existing grammars, and the results are not compared with
other parsing techniques.
Other questions arise: what happens when the grammar is largely ambiguous?
Do you intend to reject those, or pick the meaning of the things that you
are parsing at random?
The overall structure of the paper is not very clear either. You should
gives us a road map at the beginning and make it clear how the different
sections fits together.

OTHER COMMENTS
Section 7 and 8 are the key sections of the paper to understand your
proposal. However, I have problems with both.
I don't understand while your parsing is decomposed into production of an
intermediate parsing string that is then consumed by the machine that
produces a syntax tree. Since the parsing string is a linear representation
of the parsing tree, why can't you work with the parsing tree itself, which
is more structured than the parsing string? Why can't you directly extract
a syntax tree from the parsing tree? Could you run the algorithm in Section
7 on an example?
In section 8, you compute the derivatives of parser combinators, but you
could instead compute the grammar defined by the parser combinator and
compute its derivative in one step. What is this bad/worse than computing it
directly?

- One line above section 2: "could [to<-] modify"
- Section 2, Line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size.
- Computation of the derivative, above theorem 1:
 You could explain the derivatives, especially for rules n-> s1..sn
 and perhaps give an example.
- Last two lines above section 8: "The parsing machine demonstrates...
 a concrete syntax tree".
 I don't really understand what you (intend to) mean.
- Section 8.1, Line 1: "ability [the<-to] combine them"
- End of section 8.1:
 you should say and comment on the fact that these definitions are
 recursive.
- First line of section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list of
 formulas without any comment?

- Section 8.5: what are the lambda-terms on the right of --> ?


-------------------- review 4 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)
----------------------- REVIEW --------------------

MAIN PART
SUMMARY
The paper proposes a basic algorithm for determining whether a sentence s is a
member of a context-free language L: (i) if the sentence is empty, determine
whether L contains the empty string (a decidable problem); (ii) if the
sentence is of the form c.s, determine whether the suffix s is a member of the
derivative c^{-1}.L. Because the derivative of a context-free language can be
effectively constructed, this is an algorithm.
The contribution of the paper is in generalizing this basic algorithm in two
ways: (i) instead of just determining membership, the parsing algorithm can be
made to return a concrete syntax tree, or a semantic value; (ii) instead of
viewing the derivative as an operation that accepts a monolithic grammar and
produces a monolithic grammar, one can view it as an operation that accepts a
parser combinator and produces a parser combinator; this means that the
parsing algorithm is amenable to a modular, combinator-based implementation.
EVALUATION
The introduction of the paper is somewhat irritating, for two reasons. First,
it promises a lot ("simplicity, feasibility, and ubiquity"), which the paper
does not clearly deliver. Second, it criticizes the most popular parsing
methods (LL and LR) on the grounds that the constraints that they impose, in
order to rule out ambiguous grammars, are difficult to grasp. I can accept
this criticism. But the authors do not offer any solution to the ambiguity
problem: their algorithm produces a parse forest. This leaves the programmer
faced with the task of writing and debugging disambiguation filters, an error
prone task. At least, an LR parser generator is able (in principle, and some
implementations actually do this) to accurately explain why the grammar seems
ambiguous. It can produce a sentence that is a prefix of the fringes of two
distinct, valid parse trees.
The first claimed contribution of the paper is the remark that the derivative
of a context free language can be effectively constructed. (A proof appears in
section 6.) This is in fact not a contribution. This property of context free
languages is well-known, at least in the formal language community (perhaps
not in the programming language community). The derivative of a formal language
is also known as its left quotient, and written c^{-1}L. See, for instance, the
following paper, which exploits this property intensively:
 Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free
 languages. Fundamentae Informaticae, 25(3):217-239, 1996.
The remainder of the paper contains two distinct developments. In section 7, a
version of the derivative-based parsing algorithm is developed, which produces
concrete syntax trees. Then, in sections 8 and 9, another version is developed,
which produces programmer-defined semantic values, and is combinator-based.
It is not clear to me why these two distinct developments co-exist. In my eyes,
the approach of sections 8 and 9 seems superior, because it is more flexible
(it is not limited to producing concrete syntax trees) and modular.
The idea that the notion of a derivative can be generalized to parser
combinators is interesting, and worthy of thought. To some extent, section 8.2
is where the paper really begins! Section 8 provides a mathematical view of
the problem, while section 9 describes part of a Scala implementation.
This material is interesting, but can be criticized on several grounds:
1. The complexity analysis of the algorithm is absent. The algorithm seems
very expensive in the worst case: indeed, the effective construction of a
derivative in section 6 causes a quadratic blow up of the size of the grammar,
and this operation is iterated at every character of the input stream! In
fact, section 9.1 mentions that a naive implementation of the algorithm is
unusable (exponentially slow), and that special support for the Kleene star
had to be added. However, there is no argument that this patch is sufficient
for the algorithm to become useable. A worst-case analysis of the space and
time complexity is needed, and, if this worst-case complexity is bad (say,
exponential), then a formal or semi-formal argument is needed why the behavior
of the algorithm remains tolerable in practice. In light of the efficiency
claim found in the introduction, this is indispensable!
2. The description of the implementation could be improved. In particular,
memoization and sharing seem to play a crucial role. It seems that two
attempts to compute the derivative of a parser combinator should return the
same object. This includes the particular case where the second attempt is in
fact nested within the first one (i.e., it is a recursive call). For instance,
if the grammar contains the single production L -> LM, a call to L.derive(t)
will (indirectly) cause another call to L.derive(t). What happens then? By
which mechanism is non-termination avoided? This is not clearly described.
The paper contains four lines of text that seem to concern this topic (p.15,
"It is important to note [...] immediately suspended."), but I find them
obscure. I note that there seems to be a distinction between derive and
internalDerive, which is not explained in the paper; my guess is that the
former may be a memoizing version of the latter, using a table indexed by
tokens. Is this the case?
3. The introduction claims "ubiquity", that is, if I understand correctly, it
claims that the method is very easy to implement as a library. This may be
true, but the paper lacks a convincing argument why this is so. In particular,
how is this approach superior to the other parsing combinator libraries in
existence? The paper lacks a good comparison with other combinator-based
parsing techniques.
It seems that there might be a connection between the technique presented in
the paper and intersection parsing (see e.g. chapter 13 of the excellent book
"Parsing techniques", by Grune and Jacobs). Computing the derivative of a
context free language with respect to a terminal symbol c is closely related
to computing its intersection with the regular language c(.*). Intersection
parsing, as described by Grune and Jacobs, computes the intersection of the
grammar with the input (which is a string, so a regular language!) and
simplifies the resulting grammar, which can then be viewed as a parse
tree. Could one view the approach presented in the paper as an incremental
(and modular) way of doing intersection parsing?
In conclusion, although the paper contains an interesting idea, its
investigation is not sufficiently thorough, and a clear case in favor
of this parsing technique is missing.
DETAILED COMMENTS
The authors state in section 7 that "a literal implementation [...] must use a
generic parse-tree type, or else violate type safety with casts for
reductions". Strictly speaking, this depends on which type system one has in
mind. If the type system is ML, for instance, the claim is probably true. If
the type system has GADTs, however, it is quite possible that the discipline
of the parsing machine can be encoded in the types. This has been done, for an
LR parsing machine, and for an arbitrary but fixed grammar, in the following
paper:
 François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers.
 In ACM Workshop on ML, volume 148(2) of Electronic Notes in Theoretical Computer
 Science, pages 155-180, 2006.
To the best of my knowledge, both Haskell and Scala have GADTs, so the authors
might wish to check this out.
p.5, "for any class of decidable languages closed under the derivative,
derivatives may be used to determine membership". I am not sure what is meant
by "decidable" here. If it is meant that membership in the language is
decidable, then this is a tautology. Perhaps one should understand that
membership *of the empty word* in the language is decidable, since effectively
that is all that is needed?
p.5, "the derivative is closed under regular operations": unclear formulation.
p.7, before theorem 1: please indicate the interval over which the index i
ranges. My guess is 0 \geq i < m. The particular case where m = 0 seems
redundant: when m = 0, there is no suitable index i, so no production is
generated, which means D_c(n) is empty. In fact, the notation \emptyset
in the right-hand side of a production does not make sense, since the
right-hand side of a production is supposed to be a sequence of symbols.
p.7, "the empty nonterminal": this has not been defined. I assume this
explains the use of \emptyset above. Define it.
p.8, example 1: "for the grammar [...] of balanced parentheses [...] the parse
tree is": I assume this should be: "the parse tree for the string '()'".
p.8, "A parse string is [...] containing both nonterminal markers and
reduction markers"." Please indicate at this point that a parse string also
contains elements of the alphabet A. In fact, the definition of the alphabet
A' should be moved to this point.
p.8, "we construct a new grammar [...] over parse strings": please change
this to "a new grammar over the alphabet A'".
p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be
\mathbb{L}_{A'}.
p.9, "a second rule for producing bras": is the rule correct? In the
right-hand side of the conclusion, instead of L, I would have expected the
derivative of L with respect to <n|. (The rule, as it stands, clearly causes
an infinite stream of <n|'s to be produced! Similarly for the third rule.
p.9, "the parsing machine reduces": one might wish to state the invariant
of the parsing machine which guarantees that reduction can never fail (by
lack of material on the stack).
p.9, "popular as of late": please provide citation(s).
p.10, the statement of the monotonicity condition is somewhat unclear: the
assertion \tau_X(w) \sqsubseteq \tau_X(w') compares two sets of pairs. An
ordering over pairs is defined, but an ordering over sets of pairs is not
defined.
p.11, the authors might want to spend more space explaining the definition of
the derivative of a parser combinator -- why is the right definition? and how
does it relate with the definition of the derivative of a formal language?
Can one recover the latter as a degenerate case of the former?
p.12, section 8.3: the parser combinator \epsilon is used, but has not been
defined. What is its type? In particular, what semantic value does it produce?
(A unit value?) What is the meaning of the notation \lambda\epsilon. ...
p.12, equation (3): the elegance afforded by the use of \delta in the
definition of the derivative of a concatenation (p.5) seems to have been lost
here.
p.13, "in practice, the cost [...] as they are derived". I don't know what
this means. Please be more precise and include a formal complexity claim.
p.14, "parse trees of type A" should really be referred to as "semantic
values", since they may not be trees at all -- they are programmer-defined.
p.14, "its implementation follows eq.2": actually, eq.2 corresponds only
to the "else" clause of the code, so the text is badly worded. Same problem
with eq.1 a few lines later.
p.15, the distinction between "derive" and "internalDerive" is not explained.
p.15, the derivative of union "short-circuits if it finds either arm is
empty": is this important, say, to ensure termination? or is it just a minor
optimization?
p.16, the fact that a naïve treatment of the Kleene star required exponential
time, and the fact that right recursion is significantly more efficient than
left recursion, are worrisome. These problems are perhaps partially masked by
the introduction of "the special closure construct", but how do we know that
they are gone?
p.16, the test suite consists of just one grammar and a set of randomly
generated inputs. This seems far from representative of actual usage...
p.17, "the cost of that compilation is amortized across the entire parsing
process". I don't know what is meant by "compilation" here, nor do I know
why its cost is amortized. Please explain.
p.18, "these combinators are slowly transforming and unfolding the grammar
itself into a parse tree": this seems to be an interesting intuition. Why
not try to make it more formal?

TYPOS
p.2, "could to modify"
p.2, "they critical"
p.6, "attempt interpret"
p.10, "Operations or combinators"
p.10, "the remainders of the input"
p.12, "is base case"
p.13, "able parse"
p.15, "short-circuiting if the first element is nullable": should this be: "is
not nullable"?
p.16, "43 m" -> "43 ms"
p.16, "surpising"
p.17, "they extent"
PROS AND CONS
+ interesting and possibly novel idea
- no time/space complexity analysis
- implementation not clearly described
- case in favor of the idea not clearly made
- comparison with related work should be more thorough
REBUTTAL
No specific questions.