CookBook/Parsing.thy
changeset 53 0c3580c831a4
parent 52 a04bdee4fb1e
child 54 1783211b3494
--- a/CookBook/Parsing.thy	Fri Nov 28 05:56:28 2008 +0100
+++ b/CookBook/Parsing.thy	Sat Nov 29 21:20:18 2008 +0000
@@ -94,7 +94,7 @@
 
   Parsers that explore 
   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
-  parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, 
+  parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, 
   otherwise it returns the result of @{ML_text "q"}. For example
 
 @{ML_response [display] 
@@ -121,7 +121,7 @@
 end"
   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
 
-  The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
+  The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   @{ML_text "p"}, if it succeeds; otherwise it returns 
   the default value @{ML_text "x"}. For example
 
@@ -152,14 +152,14 @@
   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
   one might write
 
-  @{ML_open [display] "(p -- q) || r" for p q r}
+  @{ML [display] "(p -- q) || r" for p q r}
 
   However, this parser is problematic for producing an appropriate error message, in case
-  the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser
+  the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser
   above the information 
   that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
   which @{ML_text p} 
-  is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail 
+  is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail 
   and the 
   alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong
   parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then 
@@ -191,7 +191,7 @@
   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
   (FIXME: give reference to later place). 
   
-  Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
+  Returning to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want
   to generate the correct error message for p-followed-by-q, then
   we have to write:
 *}
@@ -219,7 +219,7 @@
   
   yields the expected parsing. 
 
-  The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as 
+  The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as 
   often as it succeeds. For example
   
   @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
@@ -241,7 +241,7 @@
   in practise, because it is already done by the infrastructure for you. 
 
   After parsing succeeded, one nearly always wants to apply a function on the parsed 
-  items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
+  items. This is done using the function @{ML "(p >> f)" for p f} which runs 
   first the parser @{ML_text p} and upon successful completion applies the 
   function @{ML_text f} to the result. For example
 
@@ -281,14 +281,17 @@
   \end{readmore}
 
   The structure @{ML_struct OuterLex} defines several kinds of token (for example 
-  @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
-  @{ML "OuterLex.Command"} for commands). Some token parsers take into account the 
+  @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
+  @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
   kind of token.
-  
-  We can generate a token list out of a string using the function @{ML
-  "OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument
-  since, at the moment, we are not interested in generating precise error
-  messages. The following
+*}  
+
+text {*
+  For the examples below, we can generate a token list out of a string using
+  the function @{ML "OuterSyntax.scan"}, which we give below @{ML
+  "Position.none"} as argument since, at the moment, we are not interested in
+  generating precise error messages. The following
+
 
 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" 
 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
@@ -302,7 +305,7 @@
   the token.} The second indicates a space.
 
   Many parsing functions later on will require spaces, comments and the like
-  to have been filtered out.  In what follows below, we are going to use the 
+  to have already been filtered out.  So from now on we are going to use the 
   functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example
 
 
@@ -367,20 +370,20 @@
 end"
 "((\"|\",\"in\"),[])"}
 
-  The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty 
-  list of items recognised by the parser @{ML_text p}, where the items are 
-  separated by the string @{ML_text s}. For example
+  The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
+  list of items recognised by the parser @{ML_text p}, where the items being parsed
+  are separated by the string @{ML_text s}. For example
 
 @{ML_response [display]
 "let 
-  val input = filtered_input \"in | in | in end\"
+  val input = filtered_input \"in | in | in foo\"
 in 
   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end" 
 "([\"in\",\"in\",\"in\"],[\<dots>])"}
 
   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
-  be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end
+  be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end
   of the parsed string, otherwise the parser would have consumed all tokens
   and then failed with the exception @{ML_text "MORE"}. Like in the previous
   section, we can avoid this exception using the wrapper @{ML
@@ -391,10 +394,21 @@
 "let 
   val input = filtered_input \"in | in | in\"
 in 
-  Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+  Scan.finite OuterLex.stopper 
+         (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end" 
 "([\"in\",\"in\",\"in\"],[])"}
 
+  The following function will help us to run examples
+
+*}
+
+ML {*
+fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input
+*}
+
+text {*
+
   The function @{ML "OuterParse.!!!"} can be used to force termination of the
   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
   except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
@@ -405,14 +419,22 @@
   val input = filtered_input \"in |\"
   val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
 in 
-  Scan.error (OuterParse.!!! parse_bar_then_in) input 
+  parse (OuterParse.!!! parse_bar_then_in) input 
 end"
 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
 but keyword in was found\" raised"
 }
 
+  \begin{exercise}
+  A type-identifier, for example @{typ "'a"}, is a token of 
+  kind @{ML "Keyword" in OuterLex} can be parsed 
+
+  \end{exercise}
+
+
 *}
 
+
 section {* Positional Information *}
 
 text {*