isabelle-cookbook: comparison CookBook/Parsing.thy

equal deleted inserted replaced

-:a04bdee4fb1e
+:0c3580c831a4
 Note how the result of consumed strings builds up on the left as nested pairs.
 Parsers that explore
 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the
-parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds,
+parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds,
 otherwise it returns the result of @{ML_text "q"}. For example
 @{ML_response [display]
 "let
 val hw = ($$ \"h\") || ($$ \"w\")
 in
 (just_e input, just_h input)
 end"
 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
-The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser
+The parser @{ML "Scan.optional p x" for p x} returns the result of the parser
 @{ML_text "p"}, if it succeeds; otherwise it returns
 the default value @{ML_text "x"}. For example
 @{ML_response [display]
 "let
 The function @{ML "(op !!)"} helps to produce appropriate error messages
 during parsing. For example if one wants to parse that @{ML_text p} is immediately
 followed by @{ML_text q}, or start a completely different parser @{ML_text r},
 one might write
-@{ML_open [display] "(p -- q) || r" for p q r}
+@{ML [display] "(p -- q) || r" for p q r}
 However, this parser is problematic for producing an appropriate error message, in case
-the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser
+the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser
 above the information
 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
 which @{ML_text p}
-is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail
+is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail
 and the
 alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong
 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then
 caused by the
 failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation
 "Exception Error \"foo\" raised"}
 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
 (FIXME: give reference to later place).
-Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
+Returning to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want
 to generate the correct error message for p-followed-by-q, then
 we have to write:
 *}
 ML {*
 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
 yields the expected parsing.
-The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as
+The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as
 often as it succeeds. For example
 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"
 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
 However, this kind of manually wrapping needs to be done only very rarely
 in practise, because it is already done by the infrastructure for you.
 After parsing succeeded, one nearly always wants to apply a function on the parsed
-items. This is done using the function @{ML_open "(p >> f)" for p f} which runs
+items. This is done using the function @{ML "(p >> f)" for p f} which runs
 first the parser @{ML_text p} and upon successful completion applies the
 function @{ML_text f} to the result. For example
 @{ML_response [display]
 "let
 @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
 \end{readmore}
 The structure @{ML_struct OuterLex} defines several kinds of token (for example
-@{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
+@{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
-@{ML "OuterLex.Command"} for commands). Some token parsers take into account the
+@{ML "Command" in OuterLex} for commands). Some token parsers take into account the
 kind of token.
+*}
-We can generate a token list out of a string using the function @{ML
-"OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument
+text {*
-since, at the moment, we are not interested in generating precise error
+For the examples below, we can generate a token list out of a string using
-messages. The following
+the function @{ML "OuterSyntax.scan"}, which we give below @{ML
+"Position.none"} as argument since, at the moment, we are not interested in
+generating precise error messages. The following
 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\""
 "[Token (\<dots>,(Ident, \"hello\"),\<dots>),
 Token (\<dots>,(Space, \" \"),\<dots>),
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
 other syntactic category.\footnote{Note that because of a possible a bug in
 the PolyML runtime system the result is printed as @{text "?"}, instead of
 the token.} The second indicates a space.
 Many parsing functions later on will require spaces, comments and the like
-to have been filtered out.  In what follows below, we are going to use the
+to have already been filtered out.  So from now on we are going to use the
 functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example
 @{ML_response_fake [display]
 "let
 in
 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
 end"
 "((\"|\",\"in\"),[])"}
-The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty
+The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty
-list of items recognised by the parser @{ML_text p}, where the items are
+list of items recognised by the parser @{ML_text p}, where the items being parsed
-separated by the string @{ML_text s}. For example
+are separated by the string @{ML_text s}. For example
 @{ML_response [display]
 "let
-val input = filtered_input \"in | in | in end\"
+val input = filtered_input \"in | in | in foo\"
 in
 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end"
 "([\"in\",\"in\",\"in\"],[\<dots>])"}
 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
-be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end
+be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end
 of the parsed string, otherwise the parser would have consumed all tokens
 and then failed with the exception @{ML_text "MORE"}. Like in the previous
 section, we can avoid this exception using the wrapper @{ML
 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
 OuterLex.stopper}. We can write
 @{ML_response [display]
 "let
 val input = filtered_input \"in | in | in\"
 in
-Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+Scan.finite OuterLex.stopper
+(OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end"
 "([\"in\",\"in\",\"in\"],[])"}
+The following function will help us to run examples
+*}
+ML {*
+fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input
+*}
+text {*
 The function @{ML "OuterParse.!!!"} can be used to force termination of the
 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section),
 except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
 with a relatively precise description of the failure. For example:
 @{ML_response_fake [display]
 "let
 val input = filtered_input \"in |\"
 val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
 in
-Scan.error (OuterParse.!!! parse_bar_then_in) input
+parse (OuterParse.!!! parse_bar_then_in) input
 end"
 "Exception ERROR \"Outer syntax error: keyword \"|\" expected,
 but keyword in was found\" raised"
 }
-*}
+\begin{exercise}
+A type-identifier, for example @{typ "'a"}, is a token of
+kind @{ML "Keyword" in OuterLex} can be parsed
+\end{exercise}
+*}
 section {* Positional Information *}
 text {*

changeset 53	0c3580c831a4
parent 52	a04bdee4fb1e
child 54	1783211b3494