more on the parsing section
authorChristian Urban <urbanc@in.tum.de>
Fri, 17 Oct 2008 17:41:34 -0400
changeset 40 35e1dff0d9bb
parent 39 631d12c25bde
child 41 b11653b11bd3
more on the parsing section
CookBook/FirstSteps.thy
CookBook/Parsing.thy
cookbook.pdf
--- a/CookBook/FirstSteps.thy	Fri Oct 17 05:02:04 2008 -0400
+++ b/CookBook/FirstSteps.thy	Fri Oct 17 17:41:34 2008 -0400
@@ -5,7 +5,6 @@
 
 chapter {* First Steps *}
 
-
 text {* 
   Isabelle programming is done in Standard ML.
   Just like lemmas and proofs, code in Isabelle is part of a 
@@ -50,7 +49,7 @@
   then Isabelle's undo operation has no effect on the definition of 
   @{ML_text "foo"}. Note that from now on we will drop the 
   \isacommand{ML} @{ML_text "{"}@{ML_text "* \<dots> *"}@{ML_text "}"} whenever
-  we show code.
+  we show code and its response.
 
   Once a portion of code is relatively stable, one usually wants to 
   export it to a separate ML-file. Such files can then be included in a 
@@ -572,7 +571,7 @@
       ml_val ys ("let open " ^ space_implode " " xs ^ " in " ^ txt ^ " end");
 
 fun ml_pat (rhs, pat) =
-  let val pat' = implode (map (fn "\\<dots>" => "_" | s => s)
+  let val pat' = implode (map (fn "\<dots>" => "_" | s => s)
     (Symbol.explode pat))
   in
     "val " ^ pat' ^ " = " ^ rhs
--- a/CookBook/Parsing.thy	Fri Oct 17 05:02:04 2008 -0400
+++ b/CookBook/Parsing.thy	Fri Oct 17 17:41:34 2008 -0400
@@ -50,12 +50,12 @@
 
   There are three exceptions used in the parsing combinators:
 
-  (FIXME: describe)
+  (FIXME: describe exceptions)
 
   \begin{itemize}
   \item @{ML_text "FAIL"}
-  \item @{ML_text "MORE"}
-  \item @{ML_text "ABORT"}
+  \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"}
+  \item @{ML_text "ABORT"} dead end
   \end{itemize}
 
   Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
@@ -64,7 +64,8 @@
   or a @{ML_text "w"}:
 
 @{ML_response [display] 
-"let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
+"let 
+  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
   val input1 = (explode \"hello\")
   val input2 = (explode \"world\")
 in
@@ -86,7 +87,8 @@
   otherwise it returns the result of @{ML_text "q"}. For example
 
 @{ML_response [display] 
-"let val hw = ($$ \"h\") || ($$ \"w\")
+"let 
+  val hw = ($$ \"h\") || ($$ \"w\")
   val input1 = (explode \"hello\")
   val input2 = (explode \"world\")
 in
@@ -94,13 +96,13 @@
 end"
   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
 
-  will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
   for parsers, except that they discard the item parsed by the first (respectively second)
   parser. For example
   
 @{ML_response [display]
-"let val just_h = ($$ \"h\") |-- ($$ \"e\") 
+"let 
+  val just_h = ($$ \"h\") |-- ($$ \"e\") 
   val just_e = ($$ \"h\") --| ($$ \"e\") 
   val input = (explode \"hello\")  
 in 
@@ -113,7 +115,8 @@
   the default value @{ML_text "x"}. For example
 
 @{ML_response [display]
-"let val p = Scan.optional ($$ \"h\") \"x\"
+"let 
+  val p = Scan.optional ($$ \"h\") \"x\"
   val input1 = (explode \"hello\")
   val input2 = (explode \"world\")  
 in 
@@ -122,32 +125,115 @@
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
 
   The function @{ML "(op !!)"} helps to produce appropriate error messages
-  during parsing. 
+  during parsing. For example if one wants to parse @{ML_text p} immediately 
+  followed by @{ML_text q}, or start a completely different parser @{ML_text r},
+  one might write
+
+  @{ML_open [display] "(p -- q) || r" for p q r}
+
+  However, this way is problematic for producing an appropriate error message, in case
+  the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
+  that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
+  is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
+  alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong
+  parser for the input and therefore probably fail. However, the error message is then caused by the
+  failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
+  can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of
+  parsing and invokes an error message. For example if we invoke the parser
+  
+  @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
+
+  on @{ML_text "hello"}, the parsing succeeds
 
+  @{ML_response [display] 
+                "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
+                "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
+
+  In contrast if we invoke it on @{ML_text "world"}
+  
+  @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"}
+
+  the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
+  see the error message properly, we need to prefix the parser with the function 
+  @{ML "Scan.error"}. For example
+
+  @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"}
+
+  This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
+  (FIXME: see below). 
+  
+  Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
+  to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
+  we have to write
 *}
 
 ML {*
-  
-  val err_fn = (fn _ => "foo");
-  val p = (!! err_fn ($$ "h"))  || ($$ "w");
-  val input1 = (explode "hello");
-  val input2 = (explode "world");
-*}  
-
-ML {*
-
-  (*Scan.error p input2;*)
+  fun p_followed_by_q p q r =
+  let 
+   val err = (fn _ => p ^ " is not followed by " ^ q)
+  in
+  (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
+  end
 *}
 
-text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} 
+text {*
+  Running this parser with
+
+  @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"}
+
+  gives the correct error message and running it with
+ 
+  @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
+                          "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
+  
+  yields the expected parsing. 
 
-text {* (FIXME: explain function application) *}
+  The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle
+  
+  @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
+                "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
+  
+  Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
+  @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open  "Scan.repeat1 p" for p}
+  the parse @{ML_text "p"} succeeds at least once.
+*}
+ 
+text {* 
+  After parsing succeeded, one wants to apply functions on the parsed items. This is
+  done using the function @{ML_open "(p >> f)" for p f} which applies first the 
+  parser @{ML_text p} upon successful completion applies the function @{ML_text f}.
+  For example
 
-ML {* fun parse_fn (x,y) = (x,y^y) *}
+@{ML_response [display]
+"let 
+  fun double (x,y) = (x^x,y^y) 
+in
+  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
+end"
+"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
+
+  The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
+  the given parser to the second component of the pair and leaves the  first component 
+  untouched. For example
 
-ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *}
+@{ML_response [display]
+"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
+"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
+
+  (FIXME: In which situations is this useful?) 
+*}
+
+section {* Parsing Tokens *}
 
-text {* (FIXME: explain @{ML_text "lift"}) *}
+text {*
+  Most of the time, however, we will have to deal with tokens that are not just strings.
+  The parsers for the theory syntax, as well as the parsers for the argument syntax
+  of proof methods and attributes use the token type @{ML_type OuterParse.token},
+  which is identical to @{ML_type OuterLex.token}.
+  The parser functions for the theory syntax are contained in the structure
+  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
+*}
+
 
 chapter {* Parsing *}
 
Binary file cookbook.pdf has changed