CookBook/Parsing.thy
changeset 50 3d4b49921cdb
parent 49 a0edabf14457
child 52 a04bdee4fb1e
--- a/CookBook/Parsing.thy	Mon Nov 24 02:51:08 2008 +0100
+++ b/CookBook/Parsing.thy	Tue Nov 25 05:19:27 2008 +0000
@@ -21,7 +21,7 @@
 
   \begin{readmore}
   The library 
-  for writing parser combinators can be split up, roughly, into two parts. 
+  for writing parser combinators is split up, roughly, into two parts. 
   The first part consists of a collection of generic parser combinators defined
   in the structure @{ML_struct Scan} in the file 
   @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
@@ -37,7 +37,7 @@
 text {*
 
   Let us first have a look at parsing strings using generic parsing combinators. 
-  The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
+  The function @{ML "(op $$)"} takes a string as argument and will ``consume'' this string from 
   a given input list of strings. ``Consume'' in this context means that it will 
   return a pair consisting of this string and the rest of the input list. 
   For example:
@@ -63,7 +63,7 @@
   It is used for example in the function @{ML "(op !!)"} (see below).
   \end{itemize}
 
-  However, note that these exception private to the parser and cannot be accessed
+  However, note that these exceptions are private to the parser and cannot be accessed
   by the programmer (for example to handle them).
   
   Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
@@ -108,7 +108,7 @@
   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
 
   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
-  for parsers, except that they discard the item parsed by the first (respectively second)
+  for parsers, except that they discard the item being parsed by the first (respectively second)
   parser. For example
   
 @{ML_response [display]
@@ -136,7 +136,16 @@
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
 
   The function @{ML Scan.option} works similarly, except no default value can
-  be given and in the failure case this parser does nothing.
+  be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
+
+@{ML_response [display]
+"let 
+  val p = Scan.option ($$ \"h\")
+  val input1 = (explode \"hello\")
+  val input2 = (explode \"world\")  
+in 
+  (p input1, p input2)
+end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
 
   The function @{ML "(op !!)"} helps to produce appropriate error messages
   during parsing. For example if one wants to parse that @{ML_text p} is immediately 
@@ -157,7 +166,7 @@
   caused by the
   failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation
   can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of
-  parsing in case of failure and invokes an error message. For example if we invoke the parser
+  parsing in case of a failure and invokes an error message. For example if we invoke the parser
   
   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
 
@@ -184,7 +193,7 @@
   
   Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
   to generate the correct error message for p-followed-by-q, then
-  we have to write, for example
+  we have to write:
 *}
 
 ML {* 
@@ -221,14 +230,15 @@
   succeeds at least once.
 
   Also note that the parser would have aborted with the exception @{ML_text MORE}, if
-  we had run it only on just @{ML_text [quotes] "hhhh"}. This can be awoided using
+  we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using
   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
   them we can write
   
   @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
 
-  However, the Isabelle-develloper rarely needs to do this kind of wrapping manually. 
+  However, this kind of manually wrapping needs to be done only very rarely
+  in practise, because it is already done by the infrastructure for you. 
 
   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
@@ -272,31 +282,58 @@
 
   The structure @{ML_struct OuterLex} defines several kinds of token (for example 
   @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
-  @{ML "OuterLex.Command"} for commands). Some parsers take into account the 
+  @{ML "OuterLex.Command"} for commands). Some token parsers take into account the 
   kind of token.
   
-  We can generate a token list using the function @{ML "OuterSyntax.scan"}, which we give
-  below @{ML "Position.none"} as argument since, at the moment, we are not interested in
-  generating precise error messages. The following\footnote{There is something funny 
-  going on with the pretty printing of the result token list.}
+  We can generate a token list out of a string using the function @{ML
+  "OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument
+  since, at the moment, we are not interested in generating precise error
+  messages. The following
 
 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" 
-"[Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>), 
- Token (\<dots>,(OuterLex.Space, \" \"),\<dots>), 
- Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}
+"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
+ Token (\<dots>,(Space, \" \"),\<dots>), 
+ Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
+
+  produces three tokens where the first and the last are identifiers, since
+  @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any
+  other syntactic category.\footnote{Note that because of a possible a bug in
+  the PolyML runtime system the result is printed as @{text "?"}, instead of
+  the token.} The second indicates a space.
+
+  Many parsing functions later on will require spaces, comments and the like
+  to have been filtered out.  In what follows below, we are going to use the 
+  functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example
+
 
-  produces three tokens where the first and the last are identifiers, since 
-  @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match 
-  any other category. The second indicates a space. Many parsing functions 
-  later on will require spaces, comments and the like to have been filtered out.
+@{ML_response_fake [display]
+"let
+   val input = OuterSyntax.scan Position.none \"hello world\"
+in
+   filter OuterLex.is_proper input
+end" 
+"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
+
+  For convenience we are going to use the function
+
+*}
+
+ML {* 
+fun filtered_input str = 
+       filter OuterLex.is_proper (OuterSyntax.scan Position.none str) 
+*}
+
+text {*
+
   If we parse
 
-@{ML_response_fake [display] "OuterSyntax.scan Position.none \"inductive|for\"" 
-"[Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>), 
- Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>), 
- Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}
+@{ML_response_fake [display] 
+"filtered_input \"inductive | for\"" 
+"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
+ Token (\<dots>,(Keyword, \"|\"),\<dots>), 
+ Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
 
-  we obtain a list consiting of only command and keyword tokens.
+  we obtain a list consiting of only a command and two keyword tokens.
   If you want to see which keywords and commands are currently known, use
   the following (you might have to adjust the @{ML print_depth} in order to
   see the complete list):
@@ -310,11 +347,11 @@
 "([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}
 
   Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example
- 
+
 @{ML_response [display]
 "let 
-  val input1 = OuterSyntax.scan Position.none \"where for\"
-  val input2 = OuterSyntax.scan Position.none \"|in\"
+  val input1 = filtered_input \"where for\"
+  val input2 = filtered_input \"| in\"
 in 
   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
 end"
@@ -324,7 +361,7 @@
 
 @{ML_response [display]
 "let 
-  val input = OuterSyntax.scan Position.none \"|in\"
+  val input = filtered_input \"| in\"
 in 
   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
 end"
@@ -336,24 +373,23 @@
 
 @{ML_response [display]
 "let 
-  val input = OuterSyntax.scan Position.none \"in|in|in\\n\"
+  val input = filtered_input \"in | in | in end\"
 in 
   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end" 
 "([\"in\",\"in\",\"in\"],[\<dots>])"}
 
-  @{ML "OuterParse.enum1"} works similarly,  except that the parsed list must 
-  be non-empty.
-
-  Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed
-  string, otherwise the parser would have consumed all tokens and then failed with
-  the exception @{ML_text "MORE"}. Like in the previous section, we can avoid
-  this exception using the wrapper @{ML Scan.finite}. This time, however, we
-  have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write
+  @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
+  be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end
+  of the parsed string, otherwise the parser would have consumed all tokens
+  and then failed with the exception @{ML_text "MORE"}. Like in the previous
+  section, we can avoid this exception using the wrapper @{ML
+  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
+  OuterLex.stopper}. We can write
 
 @{ML_response [display]
 "let 
-  val input = OuterSyntax.scan Position.none \"in|in|in\"
+  val input = filtered_input \"in | in | in\"
 in 
   Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end" 
@@ -366,7 +402,7 @@
 
 @{ML_response_fake [display]
 "let 
-  val input = OuterSyntax.scan Position.none \"in|\"
+  val input = filtered_input \"in |\"
   val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
 in 
   Scan.error (OuterParse.!!! parse_bar_then_in) input 
@@ -377,15 +413,6 @@
 
 *}
 
-
-ML {*
-let 
-  val input = filter OuterLex.is_proper (OuterSyntax.scan Position.none "(in foo)")
-in 
-  OuterParse.target input
-end
-*}
-
 section {* Positional Information *}
 
 text {*
@@ -410,6 +437,19 @@
 
 *}
 
+ML {*
+  OuterParse.opt_target
+*}
+
+ML {*
+  OuterParse.opt_target --
+  OuterParse.fixes -- 
+  OuterParse.for_fixes --
+  Scan.optional (OuterParse.$$$ "where" |--
+    OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
+
+*}
+
 text {* (FIXME funny output for a proposition) *}