CookBook/Parsing.thy
changeset 42 cd612b489504
parent 41 b11653b11bd3
child 43 02f76f1b6e7b
--- a/CookBook/Parsing.thy	Mon Oct 20 06:22:11 2008 +0000
+++ b/CookBook/Parsing.thy	Mon Oct 27 18:48:52 2008 +0100
@@ -3,6 +3,7 @@
 
 begin
 
+
 chapter {* Parsing *}
 
 text {*
@@ -15,16 +16,19 @@
   is driven by priority grammars. Parsers for outer syntax are built up by functional
   parsing combinators. These combinators are a well-established technique for parsing, 
   which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
-  
-  Isabelle developers are usually concerned with writing parsers for outer
-  syntax, either for new definitional packages or for calling tactics. The library 
-  for writing such parsers in can roughly be split up into two parts. 
+  Isabelle developers are usually concerned with writing these outer syntax parsers, 
+  either for new definitional packages or for calling tactics with specific arguments. 
+
+  \begin{readmore}
+  The library 
+  for writing parser combinators can be split up, roughly, into two parts. 
   The first part consists of a collection of generic parser combinators defined
   in the structure @{ML_struct Scan} in the file 
   @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
   combinators for dealing with specific token types, which are defined in the 
   structure @{ML_struct OuterParse} in the file 
   @{ML_file "Pure/Isar/outer_parse.ML"}.
+  \end{readmore}
 
 *}
 
@@ -50,6 +54,8 @@
   will raise the exception @{ML_text "FAIL"}.
   There are three exceptions used in the parsing combinators:
 
+  (FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?)
+
   \begin{itemize}
   \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing 
   might be explored. 
@@ -86,7 +92,7 @@
 
   Parsers that explore 
   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
-  parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
+  parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, 
   otherwise it returns the result of @{ML_text "q"}. For example
 
 @{ML_response [display] 
@@ -135,24 +141,27 @@
   @{ML_open [display] "(p -- q) || r" for p q r}
 
   However, this parser is problematic for producing an appropriate error message, in case
-  the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
-  that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
-  is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
+  the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser
+  above the information 
+  that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
+  which @{ML_text p} 
+  is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail 
+  and the 
   alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong
-  parser for the input and therefore will fail. The error message is then caused by the
-  failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
-  can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of
+  parser for the input ``p-followed-by-q'' and therefore will fail. The error message is then caused by the
+  failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation
+  can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of
   parsing in case of failure and invokes an error message. For example if we invoke the parser
   
   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
 
-  on @{ML_text "hello"}, the parsing succeeds
+  on @{ML_text [quotes] "hello"}, the parsing succeeds
 
   @{ML_response [display] 
                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
 
-  but if we invoke it on @{ML_text "world"}
+  but if we invoke it on @{ML_text [quotes] "world"}
   
   @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
                                "Exception ABORT raised"}
@@ -165,14 +174,14 @@
                                "Exception Error \"foo\" raised"}
 
   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
-  (FIXME: give reference to late). 
+  (FIXME: give reference to later place). 
   
   Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
-  to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
+  to generate the correct error message for p-followed-by-q, then
   we have to write
 *}
 
-ML {*
+ML {* 
   fun p_followed_by_q p q r =
   let 
      val err = (fn _ => p ^ " is not followed by " ^ q)
@@ -240,12 +249,43 @@
   This is because the parsers for the theory syntax, as well as the parsers for the 
   argument syntax of proof methods and attributes, use the type 
   @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}).
+
+  \begin{readmore}
   The parser functions for the theory syntax are contained in the structure
-  @{ML_struct OuterParse} defined in the file\linebreak @{ML_file  "Pure/Isar/outer_parse.ML"}.
+  @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
+  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
+  \end{readmore}
+*}
+
+ML {* OuterLex.mk_text "hello" *}
+
+ML {* print_depth 50 *}
+
+ML {* 
+  let open OuterLex in 
+  OuterSyntax.scan Position.none "for" 
+  end;
+
 *}
 
 ML {* map OuterLex.mk_text (explode "hello") *} 
 
+ML {*
+
+fun my_scan lex pos str =
+   Source.of_string str
+   |> Symbol.source {do_recover = false}
+   |> OuterLex.source {do_recover = SOME false}
+        (fn () => pairself (Scan.make_lexicon o map Symbol.explode) lex) pos
+   |> Source.exhaust;
+
+*}
+
+ML {*
+let val toks = my_scan (["hello"], []) Position.none "hello"
+in (OuterParse.$$$ "hello") toks end
+*}
+
 text {* 
 
   @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"