370 |
370 |
371 @{ML_response [display,gray] |
371 @{ML_response [display,gray] |
372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
374 |
374 |
375 (FIXME: In which situations is this useful? Give examples.) |
375 \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} |
|
376 |
|
377 Be aware of recursive parsers. Suppose you want to read strings |
|
378 separated by commas and by parentheses into a tree. We assume |
|
379 the trees are represented by the datatype: |
|
380 *} |
|
381 |
|
382 ML{*datatype tree = |
|
383 Lf of string |
|
384 | Br of tree * tree*} |
|
385 |
|
386 text {* |
|
387 Since nested parentheses should be treated in a meaningful way---for example |
|
388 the string @{text [quotes] "((A))"} should be read into a single |
|
389 leaf---you might implement the following parser. |
|
390 *} |
|
391 |
|
392 ML{*fun parse_basic s = |
|
393 $$ s >> Lf |
|
394 || $$ "(" |-- parse_tree s --| $$ ")" |
|
395 and parse_tree s = |
|
396 parse_basic s --| $$ "," -- parse_tree s >> Br |
|
397 || parse_basic s*} |
|
398 |
|
399 text {* |
|
400 The parameter @{text "s"} is the string over which the tree is parsed. The |
|
401 parser @{ML parse_basic} reads either a leaf or a tree enclosed in |
|
402 parentheses. The parser @{ML parse_tree} reads either a pair of trees |
|
403 separated by a comma, or acts like @{ML parse_basic}. Unfortunately, |
|
404 because of the mutual recursion, this parser will immediately run into a |
|
405 loop, even if it is called without any input. For example |
|
406 |
|
407 @{ML_response_fake_both [display, gray] |
|
408 "parse_tree \"A\"" |
|
409 "*** Exception- TOPLEVEL_ERROR raised"} |
|
410 |
|
411 raises an exception indicating that the stack limit is reached. Such |
|
412 looping parser are not useful at all, because of ML's strict evaluation of |
|
413 arguments. Therefore we need to delay the execution of the |
|
414 parser until an input is given. This can be done by adding the parsed |
|
415 string as an explicit argument. |
|
416 *} |
|
417 |
|
418 ML{*fun parse_basic s xs = |
|
419 ($$ s >> Lf |
|
420 || $$ "(" |-- parse_tree s --| $$ ")") xs |
|
421 and parse_tree s xs = |
|
422 (parse_basic s --| $$ "," -- parse_tree s >> Br |
|
423 || parse_basic s) xs*} |
|
424 |
|
425 text {* |
|
426 While the type of the parser is unchanged by the addition, its behaviour |
|
427 changed: with this version of the parser the execution is delayed until |
|
428 some string is applied for the argument @{text "xs"}. This gives us |
|
429 exactly the parser what we wanted. An example is as follows: |
|
430 |
|
431 @{ML_response [display, gray] |
|
432 "let |
|
433 val input = Symbol.explode \"(A,((A))),A\" |
|
434 in |
|
435 Scan.finite Symbol.stopper (parse_tree \"A\") input |
|
436 end" |
|
437 "(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"} |
|
438 |
376 |
439 |
377 \begin{exercise}\label{ex:scancmts} |
440 \begin{exercise}\label{ex:scancmts} |
378 Write a parser that parses an input string so that any comment enclosed |
441 Write a parser that parses an input string so that any comment enclosed |
379 within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within |
442 within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within |
380 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
443 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |