isabelle-cookbook: comparison ProgTutorial/Parsing.thy

equal deleted inserted replaced

-:352e31d9dacc
+:f76135c6c527
 these outer syntax parsers, either for new definitional packages or for
 calling methods with specific arguments.
 \begin{readmore}
 The library for writing parser combinators is split up, roughly, into two
-parts. The first part consists of a collection of generic parser combinators
+parts: The first part consists of a collection of generic parser combinators
 defined in the structure @{ML_struct Scan} in the file @{ML_file
 "Pure/General/scan.ML"}. The second part of the library consists of
 combinators for dealing with specific token types, which are defined in the
 structure @{ML_struct OuterParse} in the file @{ML_file
-"Pure/Isar/outer_parse.ML"}. Specific parsers for packages are defined in
+"Pure/Isar/outer_parse.ML"}. In addition specific parsers for packages are
-@{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments are
+defined in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments
-defined in @{ML_file "Pure/Isar/args.ML"}.
+are defined in @{ML_file "Pure/Isar/args.ML"}.
 \end{readmore}
 *}
 section {* Building Generic Parsers *}
 However, note that these exceptions are private to the parser and cannot be accessed
 by the programmer (for example to handle them).
 In the examples above we use the function @{ML_ind Symbol.explode}, instead of the
 more standard library function @{ML_ind explode}, for obtaining an input list for
 the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences,
 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see
 the difference consider
 @{ML_response_fake [display,gray]
 "let
 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
 Note how the result of consumed strings builds up on the left as nested pairs.
 If, as in the previous example, you want to parse a particular string,
-then you should use the function @{ML_ind  this_string in Scan}:
+then you can use the function @{ML_ind this_string in Scan}.
 @{ML_response [display,gray]
 "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
 "(\"hell\", [\"o\"])"}
 val input2 = Symbol.explode \"world\"
 in
 (p input1, p input2)
 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
-The function @{ML_ind  "!!"} helps to produce appropriate error messages
+The function @{ML_ind  ahead in Scan} parses some input, but leaves the original
-for parsing. For example if you want to parse @{text p} immediately
+input unchanged. For example:
+@{ML_response [display,gray]
+"Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")"
+"(\"foo\", [\"f\", \"o\", \"o\"])"}
+The function @{ML_ind  "!!"} helps with producing appropriate error messages
+during parsing. For example if you want to parse @{text p} immediately
 followed by @{text q}, or start a completely different parser @{text r},
 you might write:
 @{ML [display,gray] "(p -- q) || r" for p q r}
-However, this parser is problematic for producing an appropriate error
+However, this parser is problematic for producing a useful error
-message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that
+message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the
-case you lose the information that @{text p} should be followed by @{text q}.
+parser above you lose the information that @{text p} should be followed by @{text q}.
 To see this assume that @{text p} is present in the input, but it is not
 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and
 hence the alternative parser @{text r} will be tried. However, in many
 circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something''
 and therefore will also fail. The error message is then caused by the failure
 This ``prefixing'' is usually done by wrappers such as @{ML_ind  local_theory in OuterSyntax}
 (see Section~\ref{sec:newcommand} which explains this function in more detail).
 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
-r}. If you want to generate the correct error message for
+r}. If you want to generate the correct error message for failure
-@{text "p"}-followed-by-@{text "q"}, then you have to write:
+of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
 *}
 ML{*fun p_followed_by_q p q r =
 let
 val err_msg = fn _ => p ^ " is not followed by " ^ q
 Note that @{ML_ind  repeat in Scan} stores the parsed items in a list. The function
 @{ML_ind  repeat1 in Scan} is similar, but requires that the parser @{text "p"}
 succeeds at least once.
 Also note that the parser would have aborted with the exception @{text MORE}, if
-you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
+you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
 the wrapper @{ML_ind  finite in Scan} and the ``stopper-token''
 @{ML_ind  stopper in Symbol}. With them you can write:
 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")"
 "([\"h\", \"h\", \"h\", \"h\"], [])"}
-@{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
+The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
 other stoppers need to be used when parsing, for example, tokens. However, this kind of
 manually wrapping is often already done by the surrounding infrastructure.
 The function @{ML_ind  repeat in Scan} can be used with @{ML_ind  one in Scan} to read any
 string as in
 function @{text f} to the result. For example
 @{ML_response [display,gray]
 "let
 fun double (x, y) = (x ^ x, y ^ y)
+val parser = $$ \"h\" -- $$ \"e\"
 in
-(($$ \"h\") -- ($$ \"e\") >> double) (Symbol.explode \"hello\")
+(parser >> double) (Symbol.explode \"hello\")
 end"
 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
 doubles the two parsed input strings; or
 end"
 "(\"foo bar foo\",[])"}
 where the single-character strings in the parsed output are transformed
 back into one string.
-(FIXME:  move to an earlier place)
-The function @{ML_ind  ahead in Scan} parses some input, but leaves the original
-input unchanged. For example:
-@{ML_response [display,gray]
-"Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")"
-"(\"foo\", [\"f\", \"o\", \"o\"])"}
 The function @{ML_ind  lift in Scan} takes a parser and a pair as arguments. This function applies
 the given parser to the second component of the pair and leaves the  first component
 untouched. For example
 produces three tokens where the first and the last are identifiers, since
 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
 other syntactic category. The second indicates a space.
-We can easily change what is recognised as a keyword with
+We can easily change what is recognised as a keyword with the function
-@{ML_ind  keyword in OuterKeyword}. For example calling this function
+@{ML_ind  keyword in OuterKeyword}. For example calling it with
 *}
 ML{*val _ = OuterKeyword.keyword "hello"*}
 text {*
 in
 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end"
 "([\"in\", \"in\", \"in\"], [\<dots>])"}
-@{ML_ind  enum1 in OuterParse} works similarly, except that the parsed list must
+The function @{ML_ind enum1 in OuterParse} works similarly, except that the
-be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
+parsed list must be non-empty. Note that we had to add a string @{text
-end of the parsed string, otherwise the parser would have consumed all
+[quotes] "foo"} at the end of the parsed string, otherwise the parser would
-tokens and then failed with the exception @{text "MORE"}. Like in the
+have consumed all tokens and then failed with the exception @{text
-previous section, we can avoid this exception using the wrapper @{ML
+"MORE"}. Like in the previous section, we can avoid this exception using the
-Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
+wrapper @{ML Scan.finite}. This time, however, we have to use the
-OuterLex.stopper}. We can write:
+``stopper-token'' @{ML OuterLex.stopper}. We can write:
 @{ML_response [display,gray]
 "let
 val input = filtered_input \"in | in | in\"
+val p = OuterParse.enum \"|\" (OuterParse.$$$ \"in\")
 in
-Scan.finite OuterLex.stopper
+Scan.finite OuterLex.stopper p input
-(OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end"
 "([\"in\", \"in\", \"in\"], [])"}
 The following function will help to run examples.
 *}
 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
 text {*
+The function @{ML_ind "!!!" in OuterParse} can be used to force termination
-The function @{ML_ind  "!!!" in OuterParse} can be used to force termination of the
+of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
-parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section).
+section). A difference, however, is that the error message of @{ML
-Except that the error message of @{ML "OuterParse.!!!"} is fixed to be
+"OuterParse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
-@{text [quotes] "Outer syntax error"}
 together with a relatively precise description of the failure. For example:
 @{ML_response_fake [display,gray]
 "let
 val input = filtered_input \"in |\"
 (FIXME: or give parser for numbers)
 Whenever there is a possibility that the processing of user input can fail,
 it is a good idea to give all available information about where the error
 occurred. For this Isabelle can attach positional information to tokens
-and then thread this information up the processing chain. To see this,
+and then thread this information up the ``processing chain''. To see this,
-modify the function @{ML filtered_input} described earlier to
+modify the function @{ML filtered_input}, described earlier, as follows
 *}
 ML{*fun filtered_input' str =
 filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}
 in which the @{text [quotes] "\\n"} causes the second token to be in
 line 8.
-By using the parser @{ML OuterParse.position} you can access the token position
+By using the parser @{ML position in OuterParse} you can access the token
-and return it as part of the parser result. For example
+position and return it as part of the parser result. For example
 @{ML_response_fake [display,gray]
 "let
 val input = filtered_input' \"where\"
 in
 @{ML_file "Pure/General/position.ML"}.
 \end{readmore}
 *}
-text {*
+section {* Parsers for ML-Code (TBD) *}
-(FIXME: there are also handy parsers for ML-expressions and ML-files)
+text {*
+@{ML_ind ML_source in OuterParse}
 *}
 section {* Context Parser (TBD) *}
 text {*
+@{ML_ind Args.context}
+*}
+(*
+ML {*
+let
+val parser = Args.context -- Scan.lift Args.name_source
+fun term_pat (ctxt, str) =
+str |> Syntax.read_prop ctxt
+in
+(parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)")
+|> fst
+end
+*}
+*)
+text {*
+@{ML_ind Args.context}
 Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.
 *}
 section {* Argument and Attribute Parsers (TBD) *}
 section {* Parsing Inner Syntax *}
 text {*
 There is usually no need to write your own parser for parsing inner syntax, that is
 for terms and  types: you can just call the predefined parsers. Terms can
-be parsed using the function @{ML_ind  term in OuterParse}. For example:
+be parsed using the function @{ML_ind term in OuterParse}. For example:
 @{ML_response [display,gray]
 "let
 val input = OuterSyntax.scan Position.none \"foo\"
 in
 OuterParse.term input
 end"
 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
-The function @{ML_ind  prop in OuterParse} is similar, except that it gives a different
+The function @{ML_ind prop in OuterParse} is similar, except that it gives a different
 error message, when parsing fails. As you can see, the parser not just returns
 the parsed string, but also some encoded information. You can decode the
-information with the function @{ML_ind  parse in YXML}. For example
+information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. For example
 @{ML_response [display,gray]
 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
 @{text [display] "$ isabelle emacs -k foobar a_theory_file"}
 If you now build a theory on top of @{text "Command.thy"},
-then the command \isacommand{foobar} can be used. You can just write
+then you can use the command \isacommand{foobar}. You can just write
 *}
 foobar
 text {*
 but you will not see any action as we chose to implement this command to do
-nothing.  A similarly procedure has to be done with any other new command, and
+nothing. The point of this command is to only show the procedure of how
-also any new keyword that is introduced with @{ML_ind OuterKeyword.keyword}.
+to interact with ProofGeneral. A similar procedure has to be done with any
+other new command, and also any new keyword that is introduced with
+@{ML_ind OuterKeyword.keyword}.
 *}
 ML{*val _ = OuterKeyword.keyword "blink" *}
 text {*
 in
 Proof.theorem_i NONE (after_qed thm_name) [[(trm,[])]] lthy
 end
 val parser = SpecParse.opt_thm_name ":" -- OuterParse.ML_source
 in
 OuterSyntax.local_theory_to_proof "foobar_prove" "proving a proposition"
 OuterKeyword.thy_goal (parser >> setup_proof)
 end*}

changeset 326	f76135c6c527
parent 324	4172c0743cf2
child 327	ce754ad78bc9