isabelle-cookbook: comparison ProgTutorial/Parsing.thy


-:79202e2eab6a
+:de49d5780f57
 section {* Building Generic Parsers *}
 text {*
 Let us first have a look at parsing strings using generic parsing
-combinators. The function @{ML [index] "$$"} takes a string as argument and will
+combinators. The function @{ML_ind [index] "$$"} takes a string as argument and will
 ``consume'' this string from a given input list of strings. ``Consume'' in
 this context means that it will return a pair consisting of this string and
 the rest of the input list. For example:
 @{ML_response [display,gray]
 end"
 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
 Slightly more general than the parser @{ML "$$"} is the function
-@{ML [index] one in Scan}, in that it takes a predicate as argument and
+@{ML_ind [index] one in Scan}, in that it takes a predicate as argument and
 then parses exactly
 one item from the input list satisfying this predicate. For example the
 following parser either consumes an @{text [quotes] "h"} or a @{text
 [quotes] "w"}:
 in
 (hw input1, hw input2)
 end"
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
-Two parsers can be connected in sequence by using the function @{ML [index] "--"}.
+Two parsers can be connected in sequence by using the function @{ML_ind [index] "--"}.
 For example, you can parse @{text "h"}, @{text "e"} and @{text "l"} (in this
 order) with:
 @{ML_response [display,gray]
 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
 Note how the result of consumed strings builds up on the left as nested pairs.
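To make the nesting concrete, here is a minimal sketch in plain Standard ML of how a consuming parser and a sequencing combinator could be written. The names `item`, `andThen` and `chars` are hypothetical stand-ins chosen for this illustration; they are not the definitions from Isabelle's `Scan` structure, and `chars` only approximates `Symbol.explode` for plain ASCII input.

```sml
(* Sketch only: hypothetical stand-ins, not Isabelle's Scan structure. *)
exception FAIL

(* Consume exactly the string s from the front of the input list. *)
fun item (s : string) (x :: xs) = if x = s then (x, xs) else raise FAIL
  | item _ [] = raise FAIL

(* Run p, then run q on the input that p left over; pair the two results. *)
fun andThen (p, q) input =
  let
    val (a, rest1) = p input
    val (b, rest2) = q rest1
  in
    ((a, b), rest2)
  end

(* Split a string into single-character strings (ASCII only). *)
fun chars s = map str (explode s)

val result =
  andThen (andThen (item "h", item "e"), item "l") (chars "hello")
(* result = ((("h", "e"), "l"), ["l", "o"]) *)
```

Because each use of `andThen` pairs the result so far with the next item, chaining it to the left yields the left-nested pairs seen above.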
 If, as in the previous example, you want to parse a particular string,
-then you should use the function @{ML [index] this_string in Scan}:
+then you should use the function @{ML_ind [index] this_string in Scan}:
 @{ML_response [display,gray]
 "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
 "(\"hell\", [\"o\"])"}
 Parsers that explore alternatives can be constructed using the function
-@{ML [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
+@{ML_ind [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
 result of @{text "p"}, in case it succeeds, otherwise it returns the
 result of @{text "q"}. For example:
 @{ML_response [display,gray]
 in
 (hw input1, hw input2)
 end"
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
-The functions @{ML [index] "|--"} and @{ML [index] "--|"} work like the sequencing function
+The functions @{ML_ind [index] "|--"} and @{ML_ind [index] "--|"} work like the sequencing function
 for parsers, except that they discard the item being parsed by the first (respectively second)
 parser. For example:
 @{ML_response [display,gray]
 "let
 in
 (p input1, p input2)
 end"
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
-The function @{ML [index] option in Scan} works similarly, except no default value can
+The function @{ML_ind [index] option in Scan} works similarly, except no default value can
 be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
 @{ML_response [display,gray]
 "let
 val p = Scan.option ($$ \"h\")
 val input2 = Symbol.explode \"world\"
 in
 (p input1, p input2)
 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
-The function @{ML [index] "!!"} helps to produce appropriate error messages
+The function @{ML_ind [index] "!!"} helps to produce appropriate error messages
 for parsing. For example if you want to parse @{text p} immediately
 followed by @{text q}, or start a completely different parser @{text r},
 you might write:
 @{ML [display,gray] "(p -- q) || r" for p q r}
 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
 "Exception ABORT raised"}
 then the parsing aborts and the error message @{text "foo"} is printed. In order to
 see the error message properly, you need to prefix the parser with the function
-@{ML [index] error in Scan}. For example:
+@{ML_ind [index] error in Scan}. For example:
 @{ML_response_fake [display,gray]
 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
 "Exception Error \"foo\" raised"}
-This ``prefixing'' is usually done by wrappers such as @{ML [index] local_theory in OuterSyntax}
+This ``prefixing'' is usually done by wrappers such as @{ML_ind [index] local_theory in OuterSyntax}
 (see Section~\ref{sec:newcommand} which explains this function in more detail).
 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
 r}. If you want to generate the correct error message for
 @{text "p"}-followed-by-@{text "q"}, then you have to write:
 often as it succeeds. For example:
 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")"
 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
-Note that @{ML [index] repeat in Scan} stores the parsed items in a list. The function
+Note that @{ML_ind [index] repeat in Scan} stores the parsed items in a list. The function
-@{ML [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"}
+@{ML_ind [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"}
 succeeds at least once.
 Also note that the parser would have aborted with the exception @{text MORE}, if
 you had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
-the wrapper @{ML [index] finite in Scan} and the ``stopper-token''
+the wrapper @{ML_ind [index] finite in Scan} and the ``stopper-token''
-@{ML [index] stopper in Symbol}. With them you can write:
+@{ML_ind [index] stopper in Symbol}. With them you can write:
 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")"
 "([\"h\", \"h\", \"h\", \"h\"], [])"}
 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
 other stoppers need to be used when parsing, for example, tokens. However, this kind of
 manual wrapping is often already done by the surrounding infrastructure.
-The function @{ML [index] repeat in Scan} can be used with @{ML [index] one in Scan} to read any
+The function @{ML_ind [index] repeat in Scan} can be used with @{ML_ind [index] one in Scan} to read any
 string as in
 @{ML_response [display,gray]
 "let
 val p = Scan.repeat (Scan.one Symbol.not_eof)
 in
 Scan.finite Symbol.stopper p input
 end"
 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
-where the function @{ML [index] not_eof in Symbol} ensures that we do not read beyond the
+where the function @{ML_ind [index] not_eof in Symbol} ensures that we do not read beyond the
 end of the input string (i.e.~stopper symbol).
-The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can
+The function @{ML_ind [index] unless in Scan} takes two parsers: if the first one can
 parse the input, then the whole parser fails; if not, then the second is tried. Therefore
 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
 "Exception FAIL raised"}
 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
 succeeds.
-The functions @{ML [index] repeat in Scan} and @{ML [index] unless in Scan} can
+The functions @{ML_ind [index] repeat in Scan} and @{ML_ind [index] unless in Scan} can
 be combined to read any input until a certain marker symbol is reached. In the
 example below the marker symbol is a @{text [quotes] "*"}.
 @{ML_response [display,gray]
 "let
 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
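The idea of reading up to a marker can be sketched in plain Standard ML as follows. All names here (`item`, `any`, `unless`, `repeat`, `chars`) are hypothetical stand-ins written for this illustration, with tupled rather than curried arguments; they are not Isabelle's `Scan` functions. In particular, this simplified `repeat` just stops at the end of the input and never raises `MORE`, so no stopper is needed.

```sml
(* Sketch only: hypothetical stand-ins, not Isabelle's Scan structure. *)
exception FAIL

(* Consume exactly the string s from the front of the input list. *)
fun item (s : string) (x :: xs) = if x = s then (x, xs) else raise FAIL
  | item _ [] = raise FAIL

(* Consume any single item; fail on empty input. *)
fun any (x :: xs) = (x, xs)
  | any [] = raise FAIL

(* Fail if p would succeed on the input; otherwise run q on it. *)
fun unless (p, q) input =
  let
    val pSucceeds = (p input; true) handle FAIL => false
  in
    if pSucceeds then raise FAIL else q input
  end

(* Apply p as often as it succeeds, collecting the results in a list. *)
fun repeat p input =
  case (SOME (p input) handle FAIL => NONE) of
    SOME (x, rest) =>
      let val (xs, rest') = repeat p rest in (x :: xs, rest') end
  | NONE => ([], input)

(* Split a string into single-character strings (ASCII only). *)
fun chars s = map str (explode s)

val untilStar = repeat (unless (item "*", any))
val result = untilStar (chars "foo*ooo")
(* result = (["f", "o", "o"], ["*", "o", "o", "o"]) *)
```

The marker itself is left in the remaining input, because `unless` fails before anything is consumed and `repeat` then stops.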
 After parsing is done, you almost always want to apply a function to the parsed
-items. One way to do this is the function @{ML [index]">>"} where
+items. One way to do this is the function @{ML_ind [index]">>"} where
 @{ML "(p >> f)" for p f} first runs
 the parser @{text p} and upon successful completion applies the
 function @{text f} to the result. For example
 @{ML_response [display,gray]
 where the single-character strings in the parsed output are transformed
 back into one string.
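As a rough illustration of applying a function to a parse result, the following plain Standard ML sketch joins repeated single-character strings back into one string. The names `one`, `repeat`, `mapResult` and `chars` are hypothetical and defined here only for the example; they are not taken from Isabelle's `Scan` structure, and `String.concat` from the Standard ML Basis Library is used for the joining step.

```sml
(* Sketch only: hypothetical stand-ins, not Isabelle's Scan structure. *)
exception FAIL

(* Consume one item that satisfies the predicate. *)
fun one pred (x :: xs) = if pred x then (x, xs) else raise FAIL
  | one _ [] = raise FAIL

(* Apply p as often as it succeeds, collecting the results in a list. *)
fun repeat p input =
  case (SOME (p input) handle FAIL => NONE) of
    SOME (x, rest) =>
      let val (xs, rest') = repeat p rest in (x :: xs, rest') end
  | NONE => ([], input)

(* Run p and apply f to its result; the remaining input is passed through. *)
fun mapResult (p, f) input =
  let val (x, rest) = p input in (f x, rest) end

(* Split a string into single-character strings (ASCII only). *)
fun chars s = map str (explode s)

val hs = mapResult (repeat (one (fn s => s = "h")), String.concat)
val result = hs (chars "hhhhello")
(* result = ("hhhh", ["e", "l", "l", "o"]) *)
```

Only the parsed value is transformed; the unconsumed input is returned unchanged.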
 (FIXME:  move to an earlier place)
-The function @{ML [index] ahead in Scan} parses some input, but leaves the original
+The function @{ML_ind [index] ahead in Scan} parses some input, but leaves the original
 input unchanged. For example:
 @{ML_response [display,gray]
 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")"
 "(\"foo\", [\"f\", \"o\", \"o\"])"}
-The function @{ML [index] lift in Scan} takes a parser and a pair as arguments. This function applies
+The function @{ML_ind [index] lift in Scan} takes a parser and a pair as arguments. This function applies
 the given parser to the second component of the pair and leaves the  first component
 untouched. For example
 @{ML_response [display,gray]
 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
 @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
 \end{readmore}
 The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for
-example @{ML [index] Ident in OuterLex} for identifiers, @{ML Keyword in
+example @{ML_ind [index] Ident in OuterLex} for identifiers, @{ML Keyword in
-OuterLex} for keywords and @{ML [index] Command in OuterLex} for commands). Some
+OuterLex} for keywords and @{ML_ind [index] Command in OuterLex} for commands). Some
 token parsers take into account the kind of tokens. The first example shows
 how to generate a token list out of a string using the function
-@{ML [index] scan in OuterSyntax}. It is given the argument
+@{ML_ind [index] scan in OuterSyntax}. It is given the argument
 @{ML "Position.none"} since,
 at the moment, we are not interested in generating precise error
 messages. The following code\footnote{Note that because of a possible bug in
 the PolyML runtime system, the result is printed as @{text [quotes] "?"},
 instead of the tokens.}
 produces three tokens where the first and the last are identifiers, since
 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
 other syntactic category. The second indicates a space.
 We can easily change what is recognised as a keyword with
-@{ML [index] keyword in OuterKeyword}. For example calling this function
+@{ML_ind [index] keyword in OuterKeyword}. For example calling this function
 *}
 ML{*val _ = OuterKeyword.keyword "hello"*}
 text {*
 Token (\<dots>,(Space, \" \"),\<dots>),
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
 Many parsing functions later on will require white space, comments and the like
 to have already been filtered out.  So from now on we are going to use the
-functions @{ML filter} and @{ML [index] is_proper in OuterLex} to do this.
+functions @{ML filter} and @{ML_ind [index] is_proper in OuterLex} to do this.
 For example:
 @{ML_response_fake [display,gray]
 "let
 val input = OuterSyntax.scan Position.none \"hello world\"
 in
 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
 end"
 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
-You might have to adjust the @{ML [index] print_depth} in order to
+You might have to adjust the @{ML_ind [index] print_depth} in order to
 see the complete list.
-The parser @{ML [index] "$$$" in OuterParse} parses a single keyword. For example:
+The parser @{ML_ind [index] "$$$" in OuterParse} parses a single keyword. For example:
 @{ML_response [display,gray]
 "let
 val input1 = filtered_input \"where for\"
 val input2 = filtered_input \"| in\"
 in
 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
 end"
 "((\"where\",\<dots>), (\"|\",\<dots>))"}
-Any non-keyword string can be parsed with the function @{ML [index] reserved in OuterParse}.
+Any non-keyword string can be parsed with the function @{ML_ind [index] reserved in OuterParse}.
 For example:
 @{ML_response [display,gray]
 "let
 val p = OuterParse.reserved \"bar\"
 in
 p input
 end"
 "(\"bar\",[])"}
-Like before, you can sequentially connect parsers with @{ML [index] "--"}. For example:
+Like before, you can sequentially connect parsers with @{ML_ind [index] "--"}. For example:
 @{ML_response [display,gray]
 "let
 val input = filtered_input \"| in\"
 in
 in
 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
 end"
 "([\"in\", \"in\", \"in\"], [\<dots>])"}
-@{ML [index] enum1 in OuterParse} works similarly, except that the parsed list must
+@{ML_ind [index] enum1 in OuterParse} works similarly, except that the parsed list must
 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
 end of the parsed string, otherwise the parser would have consumed all
 tokens and then failed with the exception @{text "MORE"}. Like in the
 previous section, we can avoid this exception using the wrapper @{ML
 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
 text {*
-The function @{ML [index] "!!!" in OuterParse} can be used to force termination of the
+The function @{ML_ind [index] "!!!" in OuterParse} can be used to force termination of the
 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section),
 except that the error message of @{ML "OuterParse.!!!"} is fixed to be
 @{text [quotes] "Outer syntax error"}
 together with a relatively precise description of the failure. For example:
 but keyword in was found\" raised"
 }
 \begin{exercise} (FIXME)
 A type-identifier, for example @{typ "'a"}, is a token of
-kind @{ML [index] Keyword in OuterLex}. It can be parsed using
+kind @{ML_ind [index] Keyword in OuterLex}. It can be parsed using
 the function @{ML type_ident in OuterParse}.
 \end{exercise}
 (FIXME: or give parser for numbers)
 section {* Parsing Inner Syntax *}
 text {*
 There is usually no need to write your own parser for parsing inner syntax, that is
 for terms and  types: you can just call the predefined parsers. Terms can
-be parsed using the function @{ML [index] term in OuterParse}. For example:
+be parsed using the function @{ML_ind [index] term in OuterParse}. For example:
 @{ML_response [display,gray]
 "let
 val input = OuterSyntax.scan Position.none \"foo\"
 in
 OuterParse.term input
 end"
 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
-The function @{ML [index] prop in OuterParse} is similar, except that it gives a different
+The function @{ML_ind [index] prop in OuterParse} is similar, except that it gives a different
 error message when parsing fails. As you can see, the parser returns not just
 the parsed string, but also some encoded information. You can decode the
-information with the function @{ML [index] parse in YXML}. For example
+information with the function @{ML_ind [index] parse in YXML}. For example
 @{ML_response [display,gray]
 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
 As you see, the result is a pair consisting of a list of
 variables with optional type-annotation and syntax-annotation, and a list of
 rules where every rule has optionally a name and an attribute.
-The function @{ML [index] "fixes" in OuterParse} in Line 2 of the parser reads an
+The function @{ML_ind [index] "fixes" in OuterParse} in Line 2 of the parser reads an
 \isacommand{and}-separated
 list of variables that can include optional type annotations and syntax translations.
 For example:\footnote{Note that in the code we need to write
 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
 in the compound type.}
 text {*
 Whenever types are given, they are stored in the @{ML SOME}s. The types are
 not yet used to type the variables: this must be done by type-inference later
 on. Since types are part of the inner syntax they are strings with some
 encoded information (see previous section). If a mixfix-syntax is
-present for a variable, then it is stored in the @{ML [index] Mixfix} data structure;
+present for a variable, then it is stored in the @{ML_ind [index] Mixfix} data structure;
-no syntax translation is indicated by @{ML [index] NoSyn}.
+no syntax translation is indicated by @{ML_ind [index] NoSyn}.
 \begin{readmore}
 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
 \end{readmore}
 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
 list of introduction rules, that is propositions with theorem annotations
 such as rule names and attributes. The introduction rules are propositions
-parsed by @{ML [index] prop  in OuterParse}. However, they can include an optional
+parsed by @{ML_ind [index] prop  in OuterParse}. However, they can include an optional
 theorem name plus some attributes. For example
 @{ML_response [display,gray] "let
 val input = filtered_input \"foo_lemma[intro,dest!]:\"
 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input
 in
 (name, map Args.dest_src attrib)
 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
-The function @{ML [index] opt_thm_name in SpecParse} is the ``optional'' variant of
+The function @{ML_ind [index] opt_thm_name in SpecParse} is the ``optional'' variant of
-@{ML [index] thm_name in SpecParse}. Theorem names can contain attributes. The name
+@{ML_ind [index] thm_name in SpecParse}. Theorem names can contain attributes. The name
 has to end with @{text [quotes] ":"}---see the argument of
 the function @{ML SpecParse.opt_thm_name} in Line 7.
 \begin{readmore}
 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"}
 text_raw {*
 \begin{exercise}
 Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
 in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
 to the ``where-part'' of the introduction rules given above. Below
-we paraphrase the code of @{ML [index] where_alt_specs in SpecParse} adapted to our
+we paraphrase the code of @{ML_ind [index] where_alt_specs in SpecParse} adapted to our
 purposes.
 \begin{isabelle}
 *}
 ML %linenosgray{*val spec_parser' =
 OuterParse.fixes --
 in
 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
 end *}
 text {*
-The crucial function @{ML [index] local_theory in OuterSyntax} expects a name for the command, a
+The crucial function @{ML_ind [index] local_theory in OuterSyntax} expects a name for the command, a
 short description, a kind indicator (which we will explain more thoroughly later) and a
 parser producing a local theory transition (its purpose will also be explained
 later).
 While this is everything you have to do on the ML-level, you need a keyword
 next by letting it take a proposition as argument and printing this proposition
 inside the tracing buffer.
 The crucial part of a command is the function that determines the behaviour
 of the command. In the code above we used a ``do-nothing''-function, which
-because of @{ML [index] succeed in Scan} does not parse any argument, but immediately
+because of @{ML_ind [index] succeed in Scan} does not parse any argument, but immediately
 returns the simple function @{ML "LocalTheory.theory I"}. We can
 replace this code by a function that first parses a proposition (using the
 parser @{ML OuterParse.prop}), then prints out the tracing
 information (using a new function @{text trace_prop}) and
 finally does nothing. For this you can write:
 @{text "> \"True \<and> False\""}
 \end{isabelle}
 and see the proposition in the tracing buffer.
-Note that so far we used @{ML [index] thy_decl in OuterKeyword} as the kind
+Note that so far we used @{ML_ind [index] thy_decl in OuterKeyword} as the kind
 indicator for the command.  This means that the command finishes as soon as
 the arguments are processed. Examples of this kind of command are
 \isacommand{definition} and \isacommand{declare}.  In other cases, commands
 are expected to parse some arguments, for example a proposition, and then
 ``open up'' a proof in order to prove the proposition (for example
 \isacommand{lemma}) or prove some other properties (for example
 \isacommand{function}). To achieve this kind of behaviour, you have to use
-the kind indicator @{ML [index] thy_goal in OuterKeyword} and the function @{ML
+the kind indicator @{ML_ind [index] thy_goal in OuterKeyword} and the function @{ML
 "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
 however, that once you change the ``kind'' of a command from @{ML thy_decl in
 OuterKeyword} to @{ML thy_goal in OuterKeyword}, the keyword file needs
 to be re-created!
 end *}
 text {*
 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
 proved) and a context as arguments.  The context is necessary in order to be able to use
-@{ML [index] read_prop in Syntax}, which converts a string into a proper proposition.
+@{ML_ind [index] read_prop in Syntax}, which converts a string into a proper proposition.
-In Line 6 the function @{ML [index] theorem_i in Proof} starts the proof for the
+In Line 6 the function @{ML_ind [index] theorem_i in Proof} starts the proof for the
 proposition. Its argument @{ML NONE} stands for a locale (which we chose to
 omit); the argument @{ML "(K I)"} stands for a function that determines what
 should be done with the theorem once it is proved (we chose to just forget
 about it). Line 9 contains the parser for the proposition.
 \isacommand{apply}@{text "(rule conjI)"}\\
 \isacommand{apply}@{text "(rule TrueI)+"}\\
 \isacommand{done}
 \end{isabelle}
-(FIXME: read a name and show how to store theorems; see @{ML [index] note in LocalTheory})
+(FIXME: read a name and show how to store theorems; see @{ML_ind [index] note in LocalTheory})
 *}
 section {* Methods (TBD) *}
 text {*
 It defines the method @{text foo}, which takes no arguments (therefore the
 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which
 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function
-@{ML [index] SIMPLE_METHOD}
+@{ML_ind [index] SIMPLE_METHOD}
 turns such a tactic into a method. The method @{text "foo"} can be used as follows
 *}
 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
 apply(foo)

changeset 315	de49d5780f57
parent 310	007922777ff1
child 316	74f0a06f751f