ProgTutorial/Parsing.thy
changeset 390 8ad407e77ea0
parent 376 76b1b679e845
child 391 ae2f0b40c840
equal deleted inserted replaced
389:4cc6df387ce9 390:8ad407e77ea0
   370 
   370 
   371 @{ML_response [display,gray]
   371 @{ML_response [display,gray]
   372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   374 
   374 
   375   (FIXME: In which situations is this useful? Give examples.) 
   375   \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} 
       
   376 
       
   377   Be aware of recursive parsers. Suppose you want to read strings
       
   378   separated by commas and by parentheses into a tree. We assume 
       
   379   the trees are represented by the datatype:
       
   380 *}
       
   381 
       
   382 ML{*datatype tree = 
       
   383     Lf of string 
       
   384   | Br of tree * tree*}
       
   385 
       
   386 text {*
       
   387   Since nested parentheses should be treated in a meaningful way---for example
       
   388   the string @{text [quotes] "((A))"} should be read into a single 
       
   389   leaf---you might implement the following parser.
       
   390 *}
       
   391 
       
   392 ML{*fun parse_basic s = 
       
   393   $$ s >> Lf 
       
   394   || $$ "(" |-- parse_tree s --| $$ ")"  
       
   395 and parse_tree s = 
       
   396   parse_basic s --| $$ "," -- parse_tree s >> Br 
       
   397   || parse_basic s*}
       
   398 
       
   399 text {*
       
   400   The parameter @{text "s"} is the string over which the tree is parsed. The
       
   401   parser @{ML parse_basic} reads either a leaf or a tree enclosed in
       
   402   parentheses. The parser @{ML parse_tree} reads either a pair of trees
       
   403   separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
       
   404   because of the mutual recursion, this parser will immediately run into a
       
   405   loop, even if it is called without any input. For example
       
   406 
       
   407   @{ML_response_fake_both [display, gray]
       
   408   "parse_tree \"A\""
       
   409   "*** Exception- TOPLEVEL_ERROR raised"}
       
   410 
       
   411   raises an exception indicating that the stack limit is reached. Such
       
   412   looping parser are not useful at all, because of ML's strict evaluation of
       
   413   arguments. Therefore we need to delay the execution of the
       
   414   parser until an input is given. This can be done by adding the parsed
       
   415   string as an explicit argument.
       
   416 *}
       
   417 
       
   418 ML{*fun parse_basic s xs = 
       
   419   ($$ s >> Lf 
       
   420    || $$ "(" |-- parse_tree s --| $$ ")") xs 
       
   421 and parse_tree s xs = 
       
   422   (parse_basic s --| $$ "," -- parse_tree s >> Br 
       
   423    || parse_basic s) xs*}
       
   424 
       
   425 text {*
       
   426   While the type of the parser is unchanged by the addition, its behaviour 
       
   427   changed: with this version of the parser the execution is delayed until 
       
   428   some string is  applied for the argument @{text "xs"}. This gives us 
       
   429   exactly the parser what we wanted. An example is as follows:
       
   430 
       
   431   @{ML_response [display, gray]
       
   432   "let
       
   433   val input = Symbol.explode \"(A,((A))),A\"
       
   434 in
       
   435   Scan.finite Symbol.stopper (parse_tree \"A\") input
       
   436 end"
       
   437   "(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"}
       
   438 
   376 
   439 
   377   \begin{exercise}\label{ex:scancmts}
   440   \begin{exercise}\label{ex:scancmts}
   378   Write a parser that parses an input string so that any comment enclosed
   441   Write a parser that parses an input string so that any comment enclosed
   379   within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within
   442   within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within
   380   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
   443   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the