isabelle-cookbook: comparison CookBook/Parsing.thy


-:631d12c25bde
+:35e1dff0d9bb
 @{ML_text [display] "($$ \"x\") (explode \"world\")"}
 There are three exceptions used in the parsing combinators:
-(FIXME: describe)
+(FIXME: describe exceptions)
 \begin{itemize}
 \item @{ML_text "FAIL"}
-\item @{ML_text "MORE"}
+\item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"}
-\item @{ML_text "ABORT"}
+\item @{ML_text "ABORT"} dead end
 \end{itemize}
 Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it
 takes a predicate as argument and then parses exactly one item from the input list
 satisfying this predicate. For example, the following parser either consumes an @{ML_text "h"}
 or a @{ML_text "w"}:
 @{ML_response [display]
-"let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
+"let
+val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
 val input1 = (explode \"hello\")
 val input2 = (explode \"world\")
 in
 (hw input1, hw input2)
 end"
 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the
 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds,
 otherwise it returns the result of @{ML_text "q"}. For example
 @{ML_response [display]
-"let val hw = ($$ \"h\") || ($$ \"w\")
+"let
+val hw = ($$ \"h\") || ($$ \"w\")
 val input1 = (explode \"hello\")
 val input2 = (explode \"world\")
 in
 (hw input1, hw input2)
 end"
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
-will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function
 for parsers, except that they discard the item parsed by the first (respectively second)
 parser. For example
 @{ML_response [display]
-"let val just_h = ($$ \"h\") |-- ($$ \"e\")
+"let
+val just_h = ($$ \"h\") |-- ($$ \"e\")
 val just_e = ($$ \"h\") --| ($$ \"e\")
 val input = (explode \"hello\")
 in
 (just_h input, just_e input)
 end"
 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser
 @{ML_text "p"}, if it succeeds; otherwise it returns
 the default value @{ML_text "x"}. For example
 @{ML_response [display]
-"let val p = Scan.optional ($$ \"h\") \"x\"
+"let
+val p = Scan.optional ($$ \"h\") \"x\"
 val input1 = (explode \"hello\")
 val input2 = (explode \"world\")
 in
 (p input1, p input2)
 end"
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
 The function @{ML "(op !!)"} helps to produce appropriate error messages
-during parsing.
+during parsing. For example if one wants to parse @{ML_text p} immediately
+followed by @{ML_text q}, or start a completely different parser @{ML_text r},
+one might write
+@{ML_open [display] "(p -- q) || r" for p q r}
+However, this approach makes it hard to produce an appropriate error message when
+the parsing of @{ML_open "(p -- q)" for p q} fails, because one loses the information
+that @{ML_text p} should be followed by @{ML_text q}. To see this, consider the case that @{ML_text p}
+is present in the input, but @{ML_text q} is not. Then @{ML_open "(p -- q)" for p q} will fail and the
+alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong
+parser for the input and will therefore probably fail as well. However, the error message is then caused by the
+failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. Such situations
+can be avoided using the function @{ML "(op !!)"}, which aborts the whole process of
+parsing and invokes an error message. For example, if we invoke the parser
+@{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
+on @{ML_text "hello"}, the parsing succeeds
+@{ML_response [display]
+"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
+"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
+In contrast if we invoke it on @{ML_text "world"}
+@{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"}
+the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
+see the error message properly, we need to prefix the parser with the function
+@{ML "Scan.error"}. For example
+@{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"}
+This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
+(FIXME: see below).
+Let us return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
+to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
+we have to write
 *}
 ML {*
+fun p_followed_by_q p q r =
-val err_fn = (fn _ => "foo");
+let
-val p = (!! err_fn ($$ "h"))  || ($$ "w");
+val err = (fn _ => p ^ " is not followed by " ^ q)
-val input1 = (explode "hello");
+in
-val input2 = (explode "world");
+(($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
-*}
+end
+*}
-ML {*
+text {*
-(*Scan.error p input2;*)
+Running this parser with
-*}
+@{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"}
-text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *}
+gives the correct error message and running it with
-text {* (FIXME: explain function application) *}
+@{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
-ML {* fun parse_fn (x,y) = (x,y^y) *}
+"((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
-ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *}
+yields the expected parsing.
-text {* (FIXME: explain @{ML_text "lift"}) *}
+The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For example
+@{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"
+"([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
+Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
+@{ML "Scan.repeat1"} is similar, but requires that in @{ML_open  "Scan.repeat1 p" for p}
+the parser @{ML_text "p"} succeeds at least once.
+*}
+text {*
+After parsing has succeeded, one often wants to apply functions to the parsed items. This is
+done using the function @{ML_open "(p >> f)" for p f}, which first applies the
+parser @{ML_text p} and, upon successful completion, applies the function @{ML_text f} to the result.
+For example
+@{ML_response [display]
+"let
+fun double (x,y) = (x^x,y^y)
+in
+(($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
+end"
+"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
+The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
+the given parser to the second component of the pair and leaves the first component
+untouched. For example
+@{ML_response [display]
+"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
+"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
+(FIXME: In which situations is this useful?)
+*}
+section {* Parsing Tokens *}
+text {*
+Most of the time, however, we will have to deal with tokens that are not just strings.
+The parsers for the theory syntax, as well as the parsers for the argument syntax
+of proof methods and attributes, use the token type @{ML_type OuterParse.token},
+which is identical to @{ML_type OuterLex.token}.
+The parser functions for the theory syntax are contained in the structure
+@{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
+*}
 chapter {* Parsing *}
 text {*

changeset 40	35e1dff0d9bb
parent 39	631d12c25bde
child 41	b11653b11bd3