ProgTutorial/Parsing.thy
changeset 397 6b423b39cc11
parent 394 0019ebf76e10
child 414 5fc2fb34c323
equal deleted inserted replaced
396:a2e49e0771b3 397:6b423b39cc11
   372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   372 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   373 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   374 
   374 
   375   \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} 
   375   \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} 
   376 
   376 
   377   Be aware of recursive parsers. Suppose you want to read strings
   377   Be aware of recursive parsers. Suppose you want to read strings separated by
   378   separated by commas and by parentheses into a tree datastructure. 
   378   commas and by parentheses into a tree datastructure; for example, generating
   379   We assume the trees are represented by the datatype:
   379   the tree corresponding to the string @{text [quotes] "(A, A), (A, A)"} where
       
   380   the @{text "A"}s will be the leaves.  We assume the trees are represented by the
       
   381   datatype:
   380 *}
   382 *}
   381 
   383 
   382 ML{*datatype tree = 
   384 ML{*datatype tree = 
   383     Lf of string 
   385     Lf of string 
   384   | Br of tree * tree*}
   386   | Br of tree * tree*}
   388   the string @{text [quotes] "((A))"} should be read into a single 
   390   the string @{text [quotes] "((A))"} should be read into a single 
   389   leaf---you might implement the following parser.
   391   leaf---you might implement the following parser.
   390 *}
   392 *}
   391 
   393 
   392 ML{*fun parse_basic s = 
   394 ML{*fun parse_basic s = 
   393   $$ s >> Lf 
   395   $$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")"  
   394   || $$ "(" |-- parse_tree s --| $$ ")"  
   396 
   395 and parse_tree s = 
   397 and parse_tree s = 
   396   parse_basic s --| $$ "," -- parse_tree s >> Br 
   398   parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s*}
   397   || parse_basic s*}
   399 
   398 
   400 text {*
   399 text {*
   401   This parser corrsponds to the grammar:
       
   402 
       
   403   \begin{center}
       
   404   \begin{tabular}{lcl}
       
   405   @{text "<Basic>"}  & @{text "::="} & @{text "<String> | (<Tree>)"}\\
       
   406   @{text "<Tree>"}   & @{text "::="} & @{text "<Basic>, <Tree> | <Basic>"}\\
       
   407   \end{tabular}
       
   408   \end{center}
       
   409 
   400   The parameter @{text "s"} is the string over which the tree is parsed. The
   410   The parameter @{text "s"} is the string over which the tree is parsed. The
   401   parser @{ML parse_basic} reads either a leaf or a tree enclosed in
   411   parser @{ML parse_basic} reads either a leaf or a tree enclosed in
   402   parentheses. The parser @{ML parse_tree} reads either a pair of trees
   412   parentheses. The parser @{ML parse_tree} reads either a pair of trees
   403   separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
   413   separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
   404   because of the mutual recursion, this parser will immediately run into a
   414   because of the mutual recursion, this parser will immediately run into a
   410 
   420 
   411   raises an exception indicating that the stack limit is reached. Such
   421   raises an exception indicating that the stack limit is reached. Such
   412   looping parser are not useful, because of ML's strict evaluation of
   422   looping parser are not useful, because of ML's strict evaluation of
   413   arguments. Therefore we need to delay the execution of the
   423   arguments. Therefore we need to delay the execution of the
   414   parser until an input is given. This can be done by adding the parsed
   424   parser until an input is given. This can be done by adding the parsed
   415   string as an explicit argument.
   425   string as an explicit argument. So the parser above should be implemented
       
   426   as follows.
   416 *}
   427 *}
   417 
   428 
   418 ML{*fun parse_basic s xs = 
   429 ML{*fun parse_basic s xs = 
   419   ($$ s >> Lf 
   430   ($$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")") xs 
   420    || $$ "(" |-- parse_tree s --| $$ ")") xs 
   431 
   421 and parse_tree s xs = 
   432 and parse_tree s xs = 
   422   (parse_basic s --| $$ "," -- parse_tree s >> Br 
   433   (parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s) xs*}
   423    || parse_basic s) xs*}
       
   424 
   434 
   425 text {*
   435 text {*
   426   While the type of the parser is unchanged by the addition, its behaviour 
   436   While the type of the parser is unchanged by the addition, its behaviour 
   427   changed: with this version of the parser the execution is delayed until 
   437   changed: with this version of the parser the execution is delayed until 
   428   some string is  applied for the argument @{text "xs"}. This gives us 
   438   some string is  applied for the argument @{text "xs"}. This gives us