CookBook/Parsing.thy
changeset 41 b11653b11bd3
parent 40 35e1dff0d9bb
child 42 cd612b489504
--- a/CookBook/Parsing.thy	Fri Oct 17 17:41:34 2008 -0400
+++ b/CookBook/Parsing.thy	Mon Oct 20 06:22:11 2008 +0000
@@ -30,7 +30,6 @@
 
 section {* Building Up Generic Parsers *}
 
-
 text {*
 
   Let us first have a look at parsing strings using generic parsing combinators. 
@@ -43,22 +42,25 @@
   @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
 
   This function will either succeed (as in the two examples above) or raise the exeption 
-  @{ML_text "FAIL"} if no string can be consumed. For example in the next example
-  try to parse: 
+  @{ML_text "FAIL"} if no string can be consumed. For example trying to parse
 
-  @{ML_text [display] "($$ \"x\") (explode \"world\")"}
-
+  @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" 
+                               "Exception FAIL raised"}
+  
+  will raise the exception @{ML_text "FAIL"}.
   There are three exceptions used in the parsing combinators:
 
-  (FIXME: describe exceptions)
-
   \begin{itemize}
-  \item @{ML_text "FAIL"}
-  \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"}
-  \item @{ML_text "ABORT"} dead end
+  \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing 
+  might be explored. 
+  \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example 
+  in @{ML_text "($$ \"h\") []"}.
+  \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. 
+  It is used for example in the function @{ML "(op !!)"} (see below).
   \end{itemize}
 
-  Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
+
+  Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it 
   takes a predicate as argument and then parses exactly one item from the input list 
   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
   or a @{ML_text "w"}:
@@ -74,7 +76,8 @@
     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
 
   Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. 
-  For example   
+  For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this 
+  sequence can be achieved by
 
   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
@@ -83,7 +86,7 @@
 
   Parsers that explore 
   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
-  parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
+  parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
   otherwise it returns the result of @{ML_text "q"}. For example
 
 @{ML_response [display] 
@@ -131,15 +134,15 @@
 
   @{ML_open [display] "(p -- q) || r" for p q r}
 
-  However, this way is problematic for producing an appropriate error message, in case
+  However, this parser is problematic for producing an appropriate error message, in case
   the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
   that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
   is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
-  alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong
-  parser for the input and therefore probably fail. However, the error message is then caused by the
+  alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong
+  parser for the input and therefore will fail. The error message is then caused by the
   failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
-  can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of
-  parsing and invokes an error message. For example if we invoke the parser
+  can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of
+  parsing in case of failure and invokes an error message. For example if we invoke the parser
   
   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
 
@@ -149,20 +152,22 @@
                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
 
-  In contrast if we invoke it on @{ML_text "world"}
+  but if we invoke it on @{ML_text "world"}
   
-  @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"}
+  @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
+                               "Exception ABORT raised"}
 
   the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
   see the error message properly, we need to prefix the parser with the function 
   @{ML "Scan.error"}. For example
 
-  @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"}
+  @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"
+                               "Exception Error \"foo\" raised"}
 
   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
-  (FIXME: see below). 
+  (FIXME: give reference to late). 
   
-  Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
+  Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
   to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
   we have to write
 *}
@@ -170,39 +175,42 @@
 ML {*
   fun p_followed_by_q p q r =
   let 
-   val err = (fn _ => p ^ " is not followed by " ^ q)
+     val err = (fn _ => p ^ " is not followed by " ^ q)
   in
-  (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
+    (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
   end
 *}
 
+
 text {*
   Running this parser with
 
-  @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"}
+  @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
+                               "Exception ERROR \"h is not followed by e\" raised"} 
 
-  gives the correct error message and running it with
+  gives the correct error message. Running it with
  
   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   
   yields the expected parsing. 
 
-  The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle
+  The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as 
+  often as it succeeds. For example
   
-  @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
+  @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   
   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
-  @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open  "Scan.repeat1 p" for p}
-  the parse @{ML_text "p"} succeeds at least once.
+  @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} 
+  succeeds at least once.
 *}
  
 text {* 
-  After parsing succeeded, one wants to apply functions on the parsed items. This is
-  done using the function @{ML_open "(p >> f)" for p f} which applies first the 
-  parser @{ML_text p} upon successful completion applies the function @{ML_text f}.
-  For example
+  After parsing succeeded, one nearly always wants to apply a function on the parsed 
+  items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
+  first the parser @{ML_text p} and upon successful completion applies the 
+  function @{ML_text f} to the result. For example
 
 @{ML_response [display]
 "let 
@@ -212,6 +220,8 @@
 end"
 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
 
+  doubles the two parsed input strings.
+ 
   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   the given parser to the second component of the pair and leaves the  first component 
   untouched. For example
@@ -223,17 +233,38 @@
   (FIXME: In which situations is this useful?) 
 *}
 
-section {* Parsing Tokens *}
+section {* Parsing Theory Syntax *}
 
 text {*
-  Most of the time, however, we will have to deal with tokens that are not just strings.
-  The parsers for the theory syntax, as well as the parsers for the argument syntax
-  of proof methods and attributes use the token type @{ML_type OuterParse.token},
-  which is identical to @{ML_type OuterLex.token}.
+  Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
+  This is because the parsers for the theory syntax, as well as the parsers for the 
+  argument syntax of proof methods and attributes, use the type 
+  @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}).
   The parser functions for the theory syntax are contained in the structure
-  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
+  @{ML_struct OuterParse} defined in the file\linebreak @{ML_file  "Pure/Isar/outer_parse.ML"}.
+*}
+
+ML {* map OuterLex.mk_text (explode "hello") *} 
+
+text {* 
+
+  @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"  
+       "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"} 
+
 *}
 
+ML {*
+  OuterLex.mk_text "hello"
+*}
+
+ML {*
+
+  let val input = [OuterLex.mk_text "hello", OuterLex.mk_text "world"];
+  in (Scan.one (fn _ => true)) input end
+
+*}
+
+
 
 chapter {* Parsing *}