-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper describes an original approach to the derivation of parsers from grammars. The approach is based on the observation that context-free grammars are closed under Brzozowski derivatives, and that such derivatives are easy to compute. The parser described here parses a string by lazily computing derivatives of the grammar for each character it encounters. Quite surprisingly, such a parser seems very easy to encode, especially in a language that supports streams and lazy evaluation; it can parse every grammar, and it is quite efficient.
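
For context, the derivative story is easy to state for regular languages; a minimal illustrative sketch in Scala follows (not the authors' code — the paper's contribution is extending this to context-free grammars and to producing parse trees):

  // Brzozowski derivatives for regular languages (illustrative sketch).
  sealed trait Lang
  case object Empty extends Lang                        // the empty language
  case object Eps extends Lang                          // { "" }
  final case class Chr(c: Char) extends Lang            // a single character
  final case class Alt(l: Lang, r: Lang) extends Lang   // union
  final case class Cat(l: Lang, r: Lang) extends Lang   // concatenation

  // Does the language accept the empty string?
  def nullable(l: Lang): Boolean = l match {
    case Eps       => true
    case Alt(a, b) => nullable(a) || nullable(b)
    case Cat(a, b) => nullable(a) && nullable(b)
    case _         => false
  }

  // D_c(L) = { w | cw in L }
  def derive(l: Lang, c: Char): Lang = l match {
    case Chr(d) if d == c => Eps
    case Alt(a, b)        => Alt(derive(a, c), derive(b, c))
    case Cat(a, b) =>
      val step = Cat(derive(a, c), b)
      if (nullable(a)) Alt(step, derive(b, c)) else step
    case _                => Empty
  }

  // s is in L iff the iterated derivative accepts the empty string.
  def matches(l: Lang, s: String): Boolean =
    nullable(s.foldLeft(l)(derive))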

I found the paper very interesting. The problem studied is very important, and it is surprising to encounter original results in a field that has been explored so thoroughly. I enjoyed the first pages of the paper very much; they are extremely crisp and clear. Unfortunately, around page 8 the paper starts getting more and more obscure; section 7 ends abruptly, and section 8, which is probably the most important one, is quite obscure. In the end I still think that the paper should be accepted, but I really hope that the authors will be able to beef section 8 up to the point of making it readable for me.

Detailed remarks for the authors only

Example 1: "the parse tree is:" => "the parse tree of '()' is:"

In general: you should write <B||B->\epsilon> rather than <B|B->\epsilon> (this is not quantum theory, after all...)

The non-deterministic state machine: the language in \Sigma is a language over A', not over A, right? You should say that explicitly.

In the rules for bras and kets, the consequences should look like

   (L, w) =>^{\bra{n}} (D_{\bra{n}}(L), w)

I think you forgot the derivative, right?

The parse-string machine pushes a bra when it meets one, but never pops it. You can easily adjust the last rule of section 7 to pop it, but the real question is: why do you push it?

The end of section 7 is too abrupt. You have described a nice pair of abstract machines, but now you should tell me something a bit more concrete about the amount of nondeterminism involved, and how much it would hurt me in practice...

Page 10: the monotonicity condition for parser combinators should look like:

  \forall p \in \tau_X(w)\ \exists p' \in \tau_X(w') such that the first component of p is a prefix of the first component of p'

Section 8 is obscure. It is too terse. The equations look right and convincing, but I was totally unable to form a mental picture of how this machinery actually works. I understand that such a combination of non-determinism and laziness creates code that tends to work in a mysterious way, but the section remains extremely unsatisfactory.

Section 9: a bit unreadable, at least for a non-Scala programmer.

PROS AND CONS

+ interesting solution to an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable

REBUTTAL


-------------------- review 2 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)
----------------------- REVIEW --------------------

MAIN PART

The authors describe how one can use Brzozowski-style derivatives to implement parser combinators. They claim that their technique leads to simple implementations, and this seems to be the case. They also claim that their implementation is efficient, but give no solid evidence for this (they added a special combinator to get decent numbers from their single reported experiment). Another claim is that their parser combinators can handle left recursion, but they give no evidence for this.

The authors seem to be unaware that Danielsson presented essentially the same technique at TYPES 2009 ("Mixing Induction and Coinduction"). However, Danielsson's draft paper does not try to sell the technique, using it only as a vehicle to implement provably total parser combinators, and he does not address efficiency of parsing, so there could be a place for both papers.

The authors seem to claim that their implementation is a lot more efficient than other parser combinator libraries:

 "[...] they are easy to implement, and easy to implement as
 libraries (like top-down parsers), yet they are efficient and
 expressive (like bottom-up parsers)."

However, they have not made any head-to-head comparisons, and they have not cited a single paper discussing efficient parser combinators. Generally there is a serious lack of references to related work on parser combinators.

Section 7 discusses another parsing technique (also based on derivatives). However, I found this section to be poorly written and hard to follow. I suggest that it should be removed.

If the claims about efficiency had been followed up with substantial evidence, then I would have recommended that the paper be published. As it stands I do not make such a recommendation.

Detailed comments:

Section 2:

"This technique was lost to the “sands of time” until [paper from 2009]" A quick web search will tell you that a large number of papers published before 2009 have used Brzozowski derivatives (or generalisations), so this comment is incorrect. See for instance the work of Rutten, including "Automata and Coinduction; an exercise in coalgebra" and "Automata, Power Series, and Coinduction: Taking Input Derivatives Seriously".

Section 6:

The expansion of D_x(List) would be easier to follow if you added a step between the second and third lines.

"the null string terminal" and "the empty nonterminal" may not exist in N, so they may have to be added to N'.

One could be led to believe that the last "and:" only applies when m = 0.

"The empty nonterminal arises frequently with derivatives, and if this reduction is applied aggressively, the grammar will remain a manageable size." You give no proper motivation for this statement. In fact, your discussion of the "special closure construct" in Section 9 seems to imply that this statement is false. I suggest that you remove this statement.

Section 7:

The machine in this section does not make sense to me. First of all, you do not state what the machine should accomplish; my guess is that you intend

 ((L, s), <>) ->* ((L', <>), <t>)

to hold iff t is a parse tree which witnesses the membership of s in L. Furthermore, you give no reason (such as a proof) to believe that the machine satisfies this specification. For instance, you push nonterminals onto the stack, but you never remove them (I suspect that the last rule should remove a nonterminal), and it seems strange that the state is unchanged in the rules involving "first".

I think the paper would be better if Section 7, which is not central to the paper, were omitted.

"This approach is not well suited to typed languages, since a literal implementation of the parsing machine must use a generic parse-tree type, or else violate type-safety with casts for reductions." I suspect that you could do better with a dependently typed language, so I suggest that you remove this sentence.

Example 1: "the parse tree is:" I think you mean "one parse tree is:".

"The parsing machine demonstrates that, even without reduction rules associated with the grammar, we can use derivatives to construct a concrete syntax tree." I do not follow this sentence. What kind of reduction rules are you referring to?

Section 8:

The parser combinators that you give seem quite outdated to me. The functional programming community moved away from this set of combinators more than a decade ago, in favour of combinators based on monads or applicative functors. Please motivate this choice.
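
For reference, the style the reviewer has in mind looks roughly like this; a minimal, hypothetical Scala sketch of monadic parser combinators (illustrative only, not code from the paper):

  // A parser returns every way to split the input into a result and a rest.
  final case class P[A](run: String => List[(A, String)]) {
    def map[B](f: A => B): P[B] =
      P(s => run(s).map { case (a, rest) => (f(a), rest) })
    def flatMap[B](f: A => P[B]): P[B] =
      P(s => run(s).flatMap { case (a, rest) => f(a).run(rest) })
    def |(that: P[A]): P[A] =                  // alternation
      P(s => run(s) ++ that.run(s))
  }
  def pure[A](a: A): P[A] = P(s => List((a, s)))
  def chr(c: Char): P[Char] =
    P(s => if (s.nonEmpty && s.head == c) List((c, s.tail)) else Nil)

  // Sequencing via for-comprehension, e.g. a parser for "ab":
  val ab: P[(Char, Char)] = for { a <- chr('a'); b <- chr('b') } yield (a, b)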

You state two properties that parser combinators must satisfy, but you never really make any use of these properties. Are these properties necessary for your development? Is your definition of a parser combinator a function which satisfies these properties?

"A parser combinator is monotonic iff..." I assume that the second instance of \sqsubseteq should be read pointwise.

"the meaning of an input expression w <- A^* is contained within tau_N(w)" In fact it is the one and only (complete) result, right?

"D_c(c) = eps -> \w.{(c, w)}" This does not make sense; the second argument of -> is not a parser.

Section 9:

Your parsers accept infinite streams as input. However, parseFull clearly fails to terminate for infinite input. Why do you not restrict the input to finite lists?

Your parsers return potentially infinite streams of results. However, your parse function uses "append", which (essentially) throws away its second argument if the first is infinite. Wouldn't it be better to return a fair interleaving of the two streams?
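
For concreteness, here is one hypothetical way to write such a fair interleaving in Scala, taking elements from the two (possibly infinite) lazy streams alternately so that neither is starved:

  // Alternate between xs and ys; productive even when both are infinite.
  def interleave[A](xs: LazyList[A], ys: LazyList[A]): LazyList[A] =
    if (xs.isEmpty) ys
    else xs.head #:: interleave(ys, xs.tail)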

In internalDerive you seem to have renamed a and b to first and second, respectively.

"the longest possible parse first" Why is this relevant for performance? If you return all possible parses, then the order in which they are returned should not matter, and for unambiguous parsers at most one result is returned anyway. I can imagine that your optimisation matters in the (atypical?) case of an ambiguous parse, if you are not interested in inspecting all the results but only want to see that there is more than one.

The sentence before the table seems to be unfinished.

I assume that "43 m" should be "43 ms".

"These results suggest that derivative-based parser combinators scale well enough to be useful in practice." To me this result suggests that I may have to repeatedly add new special-case combinators (like your closure operation) to ensure decent performance.

Section 10:

"derivative-based parsing defies the classical taxonomies", "Top-down methods", "back-tracking"... Are you not aware of parser combinator libraries which use breadth-first techniques (which your combinators also do)? See, for instance, Claessen's "Parallel Parsing Processes".

"these methods have not been able to handle left-recursive grammars" You should check out Lickman's MSc thesis "Parsing With Fixed Points" and "Parser Combinators for Ambiguous Left-Recursive Grammars" by Frost et al.

"Philosophically, neither derivative-based method “feels” like a top-down parsing method." To me your combinator implementation does feel like a (breadth-first) top-down method.

You claim (implicitly) that your parser combinators can handle left-recursive grammars. However, I doubt that you can handle grammars of the form "n ::= n". If you can handle such grammars, please explain how you do it. Otherwise it would be interesting to know exactly what kind of left recursion you support.

"they are efficient and expressive (like bottom-up parsers)" This seems to imply that there is a language which can be parsed using bottom-up techniques but not using top-down techniques. Can you verify this claim?

You may want to cite Frost and Szydlowski, who claim worst-case cubic time for parsing ("Memoizing purely functional top-down backtracking language processors"). Swierstra's work on efficient parser combinators is also relevant (many papers).

References:

Improve formatting.

PROS AND CONS

+ Simple technique for implementing parser combinators.
- The technique has been presented before.
- No substantial evidence for claims of efficiency.
- Section 7 does not make sense to me (but could be removed).

REBUTTAL


-------------------- review 3 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)
----------------------- REVIEW --------------------

MAIN PART

The paper proposes a new parsing technique based on "derivatives", inspired by the derivatives of regular expressions and extended to the derivatives of context-free grammars.

Because context-free grammars may be ambiguous, they propose to compute the set of all parse trees, using lazy evaluation to share and delay the computation. They present their framework and an implementation in Scala. They claim that their technique is efficient in practice.

EVALUATION

Parsing techniques are old and well-established, although the current status of the theory is still unsatisfactory: grammars are still a burden to write, and any advance on this topic is welcome. However, the topic must be addressed very seriously.

Given this, this work is potentially interesting as a possible way to improve the state of the art, but not in its current presentation, as the paper should not just _claim_ that using derivatives for parsing is feasible, but _demonstrate_ that writing grammars with combinators is at least both as convenient and as efficient as using existing technology.

The first aspect is not discussed at all in the paper. As for efficiency, which is claimed, the benchmarks are really small, probably not representative of real grammars, and the results are not compared with other parsing techniques.

Other questions arise: what happens when the grammar is highly ambiguous? Do you intend to reject such grammars, or pick the meaning of the things that you are parsing at random?

The overall structure of the paper is not very clear either. You should give us a road map at the beginning and make it clear how the different sections fit together.

OTHER COMMENTS

Sections 7 and 8 are the key sections of the paper for understanding your proposal. However, I have problems with both.

I don't understand why your parsing is decomposed into the production of an intermediate parse string that is then consumed by a machine that produces a syntax tree. Since the parse string is a linear representation of the parse tree, why can't you work with the parse tree itself, which is more structured than the parse string? Why can't you directly extract a syntax tree from the parse tree? Could you run the algorithm in Section 7 on an example?

In Section 8, you compute the derivatives of parser combinators, but you could instead compute the grammar defined by the parser combinator and compute its derivative in one step. Why would this be better or worse than computing it directly?

- One line above Section 2: "could [to<-] modify"
- Section 2, Line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size."
- Computation of the derivative, above Theorem 1: you could explain the derivatives, especially for rules n -> s1..sn, and perhaps give an example.
- Last two lines above Section 8: "The parsing machine demonstrates... a concrete syntax tree". I don't really understand what you (intend to) mean.
- Section 8.1, Line 1: "ability [the<-to] combine them"
- End of Section 8.1: you should note and comment on the fact that these definitions are recursive.
- First line of Section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list of formulas without any comment?
- Section 8.5: what are the lambda-terms on the right of --> ?

-------------------- review 4 --------------------

PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)
----------------------- REVIEW --------------------

MAIN PART

SUMMARY

The paper proposes a basic algorithm for determining whether a sentence s is a member of a context-free language L: (i) if the sentence is empty, determine whether L contains the empty string (a decidable problem); (ii) if the sentence is of the form c.s, determine whether the suffix s is a member of the derivative c^{-1}L. Because the derivative of a context-free language can be effectively constructed, this is an algorithm.

The contribution of the paper is in generalizing this basic algorithm in two ways: (i) instead of just determining membership, the parsing algorithm can be made to return a concrete syntax tree, or a semantic value; (ii) instead of viewing the derivative as an operation that accepts a monolithic grammar and produces a monolithic grammar, one can view it as an operation that accepts a parser combinator and produces a parser combinator; this means that the parsing algorithm is amenable to a modular, combinator-based implementation.

EVALUATION

The introduction of the paper is somewhat irritating, for two reasons. First, it promises a lot ("simplicity, feasibility, and ubiquity"), which the paper does not clearly deliver. Second, it criticizes the most popular parsing methods (LL and LR) on the grounds that the constraints that they impose, in order to rule out ambiguous grammars, are difficult to grasp. I can accept this criticism. But the authors do not offer any solution to the ambiguity problem: their algorithm produces a parse forest. This leaves the programmer faced with the task of writing and debugging disambiguation filters, an error-prone task. At least, an LR parser generator is able (in principle, and some implementations actually do this) to accurately explain why the grammar seems ambiguous. It can produce a sentence that is a prefix of the fringes of two distinct, valid parse trees.

The first claimed contribution of the paper is the remark that the derivative of a context-free language can be effectively constructed. (A proof appears in section 6.) This is in fact not a contribution. This property of context-free languages is well-known, at least in the formal language community (perhaps not in the programming language community). The derivative of a formal language is also known as its left quotient, and written c^{-1}L. See, for instance, the following paper, which exploits this property intensively:

 Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free languages. Fundamenta Informaticae, 25(3):217-239, 1996.

The remainder of the paper contains two distinct developments. In section 7, a version of the derivative-based parsing algorithm is developed, which produces concrete syntax trees. Then, in sections 8 and 9, another version is developed, which produces programmer-defined semantic values, and is combinator-based. It is not clear to me why these two distinct developments co-exist. In my eyes, the approach of sections 8 and 9 seems superior, because it is more flexible (it is not limited to producing concrete syntax trees) and modular.

The idea that the notion of a derivative can be generalized to parser combinators is interesting, and worthy of thought. To some extent, section 8.2 is where the paper really begins! Section 8 provides a mathematical view of the problem, while section 9 describes part of a Scala implementation. This material is interesting, but can be criticized on several grounds:

1. The complexity analysis of the algorithm is absent. The algorithm seems very expensive in the worst case: indeed, the effective construction of a derivative in section 6 causes a quadratic blow-up of the size of the grammar, and this operation is iterated at every character of the input stream! In fact, section 9.1 mentions that a naive implementation of the algorithm is unusable (exponentially slow), and that special support for the Kleene star had to be added. However, there is no argument that this patch is sufficient for the algorithm to become usable. A worst-case analysis of the space and time complexity is needed, and, if this worst-case complexity is bad (say, exponential), then a formal or semi-formal argument is needed as to why the behavior of the algorithm remains tolerable in practice. In light of the efficiency claim found in the introduction, this is indispensable!

2. The description of the implementation could be improved. In particular, memoization and sharing seem to play a crucial role. It seems that two attempts to compute the derivative of a parser combinator should return the same object. This includes the particular case where the second attempt is in fact nested within the first one (i.e., it is a recursive call). For instance, if the grammar contains the single production L -> LM, a call to L.derive(t) will (indirectly) cause another call to L.derive(t). What happens then? By which mechanism is non-termination avoided? This is not clearly described. The paper contains four lines of text that seem to concern this topic (p.15, "It is important to note [...] immediately suspended."), but I find them obscure. I note that there seems to be a distinction between derive and internalDerive, which is not explained in the paper; my guess is that the former may be a memoizing version of the latter, using a table indexed by tokens. Is this the case?
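
One plausible mechanism (a hypothetical Scala sketch, not the authors' code) is to install the memo-table entry before computing the real derivative, so that the nested, recursive call to L.derive(t) finds a lazily suspended result instead of looping:

  // Sketch: memoized derivatives, with a lazy proxy to break cycles.
  abstract class Lang {
    protected def internalDerive(c: Char): Lang   // the real work
    private val memo = scala.collection.mutable.Map.empty[Char, Lang]
    final def derive(c: Char): Lang =
      memo.getOrElse(c, {
        // Install a proxy *before* internalDerive runs, so a nested
        // derive(c) on this node returns it immediately and suspends.
        val proxy = new Delay(() => internalDerive(c))
        memo(c) = proxy
        proxy
      })
  }
  // A node whose underlying language is computed only on demand.
  final class Delay(thunk: () => Lang) extends Lang {
    private lazy val underlying: Lang = thunk()
    protected def internalDerive(c: Char): Lang = underlying.derive(c)
  }

Nullability over a cyclic grammar needs a similar fixed-point treatment, elided here; whether this is what the paper's derive/internalDerive split actually does is precisely the reviewer's open question.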

3. The introduction claims "ubiquity", that is, if I understand correctly, it claims that the method is very easy to implement as a library. This may be true, but the paper lacks a convincing argument why this is so. In particular, how is this approach superior to the other parser combinator libraries in existence? The paper lacks a good comparison with other combinator-based parsing techniques.

It seems that there might be a connection between the technique presented in the paper and intersection parsing (see e.g. chapter 13 of the excellent book "Parsing Techniques", by Grune and Jacobs). Computing the derivative of a context-free language with respect to a terminal symbol c is closely related to computing its intersection with the regular language c(.*). Intersection parsing, as described by Grune and Jacobs, computes the intersection of the grammar with the input (which is a string, so a regular language!) and simplifies the resulting grammar, which can then be viewed as a parse tree. Could one view the approach presented in the paper as an incremental (and modular) way of doing intersection parsing?

In conclusion, although the paper contains an interesting idea, its investigation is not sufficiently thorough, and a clear case in favor of this parsing technique is missing.

DETAILED COMMENTS

The authors state in section 7 that "a literal implementation [...] must use a generic parse-tree type, or else violate type safety with casts for reductions". Strictly speaking, this depends on which type system one has in mind. If the type system is ML's, for instance, the claim is probably true. If the type system has GADTs, however, it is quite possible that the discipline of the parsing machine can be encoded in the types. This has been done, for an LR parsing machine, and for an arbitrary but fixed grammar, in the following paper:

 François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers. In ACM Workshop on ML, volume 148(2) of Electronic Notes in Theoretical Computer Science, pages 155-180, 2006.

To the best of my knowledge, both Haskell and Scala have GADTs, so the authors might wish to check this out.
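
To illustrate the reviewer's point, here is a small, hypothetical Scala 3 sketch (unrelated to the paper's code) of GADT-style typing, where the type of the semantic value is tracked by the tree's type index, so reduction needs no casts:

  // A parse tree indexed by the type of value it denotes.
  sealed trait Tree[A]
  final case class Leaf(c: Char) extends Tree[Char]
  final case class Pair[A, B](a: Tree[A], b: Tree[B]) extends Tree[(A, B)]

  // Reduction is total and cast-free: matching refines A in each case.
  def reduce[A](t: Tree[A]): A = t match {
    case Leaf(c)    => c
    case Pair(a, b) => (reduce(a), reduce(b))
  }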

p.5, "for any class of decidable languages closed under the derivative, derivatives may be used to determine membership". I am not sure what is meant by "decidable" here. If it is meant that membership in the language is decidable, then this is a tautology. Perhaps one should understand that membership *of the empty word* in the language is decidable, since effectively that is all that is needed?
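
For reference, the reduction in question is the standard one:

  c_1 c_2 \cdots c_n \in L \iff \epsilon \in D_{c_n}(\cdots D_{c_2}(D_{c_1}(L)) \cdots)

so decidability of "does the language contain the empty word" (nullability) is indeed all that is required.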

p.5, "the derivative is closed under regular operations": unclear formulation.

p.7, before theorem 1: please indicate the interval over which the index i ranges. My guess is 0 \leq i < m. The particular case where m = 0 seems redundant: when m = 0, there is no suitable index i, so no production is generated, which means D_c(n) is empty. In fact, the notation \emptyset in the right-hand side of a production does not make sense, since the right-hand side of a production is supposed to be a sequence of symbols.
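
For reference, the standard form of this rule (an assumption about the intended construction, not necessarily the paper's exact statement) is, for each production n \to s_1 s_2 \cdots s_m:

  D_c(n) \to D_c(s_{i+1})\ s_{i+2} \cdots s_m \quad \text{for each } 0 \le i < m \text{ such that } s_1 \cdots s_i \text{ is nullable}

With m = 0 there is no suitable index i, so no production is generated and D_c(n) is empty, which matches the remark above.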

p.7, "the empty nonterminal": this has not been defined. I assume this explains the use of \emptyset above. Define it.

p.8, example 1: "for the grammar [...] of balanced parentheses [...] the parse tree is": I assume this should be: "the parse tree for the string '()'".

p.8, "A parse string is [...] containing both nonterminal markers and reduction markers". Please indicate at this point that a parse string also contains elements of the alphabet A. In fact, the definition of the alphabet A' should be moved to this point.

p.8, "we construct a new grammar [...] over parse strings": please change this to "a new grammar over the alphabet A'".

p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be \mathbb{L}_{A'}.

p.9, "a second rule for producing bras": is the rule correct? In the right-hand side of the conclusion, instead of L, I would have expected the derivative of L with respect to <n|. (The rule, as it stands, clearly causes an infinite stream of <n|'s to be produced! Similarly for the third rule.)

p.9, "the parsing machine reduces": one might wish to state the invariant of the parsing machine which guarantees that reduction can never fail (by lack of material on the stack).

p.9, "popular as of late": please provide citation(s).

p.10, the statement of the monotonicity condition is somewhat unclear: the assertion \tau_X(w) \sqsubseteq \tau_X(w') compares two sets of pairs. An ordering over pairs is defined, but an ordering over sets of pairs is not defined.
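
One plausible reading, consistent with the fix suggested in review 1, is the Hoare-style lifting of the ordering on pairs to sets of pairs:

  S \sqsubseteq S' \iff \forall p \in S.\ \exists p' \in S'.\ p \sqsubseteq p'

This is an assumption about the intended definition, not something stated in the paper.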

p.11, the authors might want to spend more space explaining the definition of the derivative of a parser combinator -- why is it the right definition? and how does it relate to the definition of the derivative of a formal language? Can one recover the latter as a degenerate case of the former?

p.12, section 8.3: the parser combinator \epsilon is used, but has not been defined. What is its type? In particular, what semantic value does it produce? (A unit value?) What is the meaning of the notation \lambda\epsilon. ...

p.12, equation (3): the elegance afforded by the use of \delta in the definition of the derivative of a concatenation (p.5) seems to have been lost here.

p.13, "in practice, the cost [...] as they are derived". I don't know what this means. Please be more precise and include a formal complexity claim.

p.14, "parse trees of type A" should really be referred to as "semantic values", since they may not be trees at all -- they are programmer-defined.

p.14, "its implementation follows eq.2": actually, eq.2 corresponds only to the "else" clause of the code, so the text is badly worded. Same problem with eq.1 a few lines later.

p.15, the distinction between "derive" and "internalDerive" is not explained.

p.15, the derivative of union "short-circuits if it finds either arm is empty": is this important, say, to ensure termination? Or is it just a minor optimization?

p.16, the fact that a naïve treatment of the Kleene star required exponential time, and the fact that right recursion is significantly more efficient than left recursion, are worrisome. These problems are perhaps partially masked by the introduction of "the special closure construct", but how do we know that they are gone?

p.16, the test suite consists of just one grammar and a set of randomly generated inputs. This seems far from representative of actual usage...

p.17, "the cost of that compilation is amortized across the entire parsing process". I don't know what is meant by "compilation" here, nor do I know why its cost is amortized. Please explain.

p.18, "these combinators are slowly transforming and unfolding the grammar itself into a parse tree": this seems to be an interesting intuition. Why not try to make it more formal?

TYPOS

p.2, "could to modify"
p.2, "they critical"
p.6, "attempt interpret"
p.10, "Operations or combinators"
p.10, "the remainders of the input"
p.12, "is base case"
p.13, "able parse"
p.15, "short-circuiting if the first element is nullable": should this be "is not nullable"?
p.16, "43 m" -> "43 ms"
p.16, "surpising"
p.17, "they extent"

PROS AND CONS

+ interesting and possibly novel idea
- no time/space complexity analysis
- implementation not clearly described
- case in favor of the idea not clearly made
- comparison with related work should be more thorough

REBUTTAL

No specific questions.