ProgTutorial/Parsing.thy
changeset 390 8ad407e77ea0
parent 376 76b1b679e845
child 391 ae2f0b40c840
--- a/ProgTutorial/Parsing.thy	Sun Nov 15 13:47:31 2009 +0100
+++ b/ProgTutorial/Parsing.thy	Tue Nov 17 18:40:11 2009 +0100
@@ -372,7 +372,70 @@
 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
 
-  (FIXME: In which situations is this useful? Give examples.) 
+  \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} 
+
+  Be aware of recursive parsers. Suppose you want to read strings
+  separated by commas and by parentheses into a tree. We assume 
+  the trees are represented by the datatype:
+*}
+
+ML{*datatype tree = 
+    Lf of string 
+  | Br of tree * tree*}
+
+text {*
+  Since nested parentheses should be treated in a meaningful way---for example
+  the string @{text [quotes] "((A))"} should be read into a single 
+  leaf---you might implement the following parser.
+*}
+
+ML{*fun parse_basic s = 
+  $$ s >> Lf 
+  || $$ "(" |-- parse_tree s --| $$ ")"  
+and parse_tree s = 
+  parse_basic s --| $$ "," -- parse_tree s >> Br 
+  || parse_basic s*}
+
+text {*
+  The parameter @{text "s"} is the string over which the tree is parsed. The
+  parser @{ML parse_basic} reads either a leaf or a tree enclosed in
+  parentheses. The parser @{ML parse_tree} reads either a pair of trees
+  separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
+  because of the mutual recursion, this parser will immediately run into a
+  loop, even if it is called without any input. For example
+
+  @{ML_response_fake_both [display, gray]
+  "parse_tree \"A\""
+  "*** Exception- TOPLEVEL_ERROR raised"}
+
+  raises an exception indicating that the stack limit is reached. Such
+  looping parser are not useful at all, because of ML's strict evaluation of
+  arguments. Therefore we need to delay the execution of the
+  parser until an input is given. This can be done by adding the parsed
+  string as an explicit argument.
+*}
+
+ML{*fun parse_basic s xs = 
+  ($$ s >> Lf 
+   || $$ "(" |-- parse_tree s --| $$ ")") xs 
+and parse_tree s xs = 
+  (parse_basic s --| $$ "," -- parse_tree s >> Br 
+   || parse_basic s) xs*}
+
+text {*
+  While the type of the parser is unchanged by the addition, its behaviour 
+  changed: with this version of the parser the execution is delayed until 
+  some string is  applied for the argument @{text "xs"}. This gives us 
+  exactly the parser what we wanted. An example is as follows:
+
+  @{ML_response [display, gray]
+  "let
+  val input = Symbol.explode \"(A,((A))),A\"
+in
+  Scan.finite Symbol.stopper (parse_tree \"A\") input
+end"
+  "(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"}
+
 
   \begin{exercise}\label{ex:scancmts}
   Write a parser that parses an input string so that any comment enclosed