theory Parsingimports Base "Helper/Command/Command" "Package/Simple_Inductive_Package"uses "Parsing.ML"beginchapter {* Parsing *}text {* Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. Commands, such as \isacommand{definition}, \isacommand{inductive} and so on, belong to the outer syntax, whereas terms, types and so on belong to the inner syntax. For parsing inner syntax, Isabelle uses a rather general and sophisticated algorithm, which is driven by priority grammars. Parsers for outer syntax are built up by functional parsing combinators. These combinators are a well-established technique for parsing, which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. Isabelle developers are usually concerned with writing these outer syntax parsers, either for new definitional packages or for calling methods with specific arguments. \begin{readmore} The library for writing parser combinators is split up, roughly, into two parts: The first part consists of a collection of generic parser combinators defined in the structure @{ML_struct Scan} in the file @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of combinators for dealing with specific token types, which are defined in the structure @{ML_struct OuterParse} in the file @{ML_file "Pure/Isar/outer_parse.ML"}. In addition specific parsers for packages are defined in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments are defined in @{ML_file "Pure/Isar/args.ML"}. \end{readmore}*}section {* Building Generic Parsers *}text {* Let us first have a look at parsing strings using generic parsing combinators. The function @{ML_ind "$$"} takes a string as argument and will ``consume'' this string from a given input list of strings. ``Consume'' in this context means that it will return a pair consisting of this string and the rest of the input list. For example: @{ML_response [display,gray] "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} @{ML_response [display,gray] "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception @{text "FAIL"} if no string can be consumed. For example trying to parse @{ML_response_fake [display,gray] "($$ \"x\") (Symbol.explode \"world\")" "Exception FAIL raised"} will raise the exception @{text "FAIL"}. There are three exceptions used in the parsing combinators: \begin{itemize} \item @{text "FAIL"} is used to indicate that alternative routes of parsing might be explored. \item @{text "MORE"} indicates that there is not enough input for the parser. For example in @{text "($$ \"h\") []"}. \item @{text "ABORT"} is the exception that is raised when a dead end is reached. It is used for example in the function @{ML "!!"} (see below). \end{itemize} However, note that these exceptions are private to the parser and cannot be accessed by the programmer (for example to handle them). In the examples above we use the function @{ML_ind Symbol.explode}, instead of the more standard library function @{ML_ind explode}, for obtaining an input list for the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see the difference consider@{ML_response_fake [display,gray]"let val input = \"\<foo> bar\"in (explode input, Symbol.explode input)end""([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} Slightly more general than the parser @{ML "$$"} is the function @{ML_ind one in Scan}, in that it takes a predicate as argument and then parses exactly one item from the input list satisfying this predicate. For example the following parser either consumes an @{text [quotes] "h"} or a @{text [quotes] "w"}:@{ML_response [display,gray] "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") val input1 = Symbol.explode \"hello\" val input2 = Symbol.explode \"world\"in (hw input1, hw input2)end" "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} Two parsers can be connected in sequence by using the function @{ML_ind "--"}. For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this order) you can achieve by: @{ML_response [display,gray] "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} Note how the result of consumed strings builds up on the left as nested pairs. If, as in the previous example, you want to parse a particular string, then you can use the function @{ML_ind this_string in Scan}. @{ML_response [display,gray] "Scan.this_string \"hell\" (Symbol.explode \"hello\")" "(\"hell\", [\"o\"])"} Parsers that explore alternatives can be constructed using the function @{ML_ind "||"}. The parser @{ML "(p || q)" for p q} returns the result of @{text "p"}, in case it succeeds, otherwise it returns the result of @{text "q"}. For example:@{ML_response [display,gray] "let val hw = $$ \"h\" || $$ \"w\" val input1 = Symbol.explode \"hello\" val input2 = Symbol.explode \"world\"in (hw input1, hw input2)end" "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} The functions @{ML_ind "|--"} and @{ML_ind "--|"} work like the sequencing function for parsers, except that they discard the item being parsed by the first (respectively second) parser. For example:@{ML_response [display,gray]"let val just_e = $$ \"h\" |-- $$ \"e\" val just_h = $$ \"h\" --| $$ \"e\" val input = Symbol.explode \"hello\" in (just_e input, just_h input)end" "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} The parser @{ML "Scan.optional p x" for p x} returns the result of the parser @{text "p"}, if it succeeds; otherwise it returns the default value @{text "x"}. For example:@{ML_response [display,gray]"let val p = Scan.optional ($$ \"h\") \"x\" val input1 = Symbol.explode \"hello\" val input2 = Symbol.explode \"world\" in (p input1, p input2)end" "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML_ind option in Scan} works similarly, except no default value can be given. Instead, the result is wrapped as an @{text "option"}-type. For example:@{ML_response [display,gray]"let val p = Scan.option ($$ \"h\") val input1 = Symbol.explode \"hello\" val input2 = Symbol.explode \"world\" in (p input1, p input2)end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML_ind ahead in Scan} parses some input, but leaves the original input unchanged. For example: @{ML_response [display,gray] "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" "(\"foo\", [\"f\", \"o\", \"o\"])"} The function @{ML_ind "!!"} helps with producing appropriate error messages during parsing. For example if you want to parse @{text p} immediately followed by @{text q}, or start a completely different parser @{text r}, you might write: @{ML [display,gray] "(p -- q) || r" for p q r} However, this parser is problematic for producing a useful error message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the parser above you lose the information that @{text p} should be followed by @{text q}. To see this assume that @{text p} is present in the input, but it is not followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and hence the alternative parser @{text r} will be tried. However, in many circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something'' and therefore will also fail. The error message is then caused by the failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation can be avoided when using the function @{ML "!!"}. This function aborts the whole process of parsing in case of a failure and prints an error message. For example if you invoke the parser @{ML [display,gray] "!! (fn _ => \"foo\") ($$ \"h\")"} on @{text [quotes] "hello"}, the parsing succeeds @{ML_response [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} but if you invoke it on @{text [quotes] "world"} @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" "Exception ABORT raised"} then the parsing aborts and the error message @{text "foo"} is printed. In order to see the error message properly, you need to prefix the parser with the function @{ML_ind error in Scan}. For example: @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" "Exception Error \"foo\" raised"} This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} (see Section~\ref{sec:newcommand} which explains this function in more detail). Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}. If you want to generate the correct error message for failure of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:*}ML{*fun p_followed_by_q p q r =let val err_msg = fn _ => p ^ " is not followed by " ^ qin ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)end *}text {* Running this parser with the arguments @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and the input @{text [quotes] "holle"} @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")" "Exception ERROR \"h is not followed by e\" raised"} produces the correct error message. Running it with @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")" "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} yields the expected parsing. The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as often as it succeeds. For example: @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} succeeds at least once. Also note that the parser would have aborted with the exception @{text MORE}, if you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' @{ML_ind stopper in Symbol}. With them you can write: @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" "([\"h\", \"h\", \"h\", \"h\"], [])"} The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; other stoppers need to be used when parsing, for example, tokens. However, this kind of manually wrapping is often already done by the surrounding infrastructure. The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any string as in @{ML_response [display,gray] "let val p = Scan.repeat (Scan.one Symbol.not_eof) val input = Symbol.explode \"foo bar foo\"in Scan.finite Symbol.stopper p inputend" "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the end of the input string (i.e.~stopper symbol). The function @{ML_ind unless in Scan} takes two parsers: if the first one can parse the input, then the whole parser fails; if not, then the second is tried. Therefore @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" "Exception FAIL raised"} fails, while @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} succeeds. The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can be combined to read any input until a certain marker symbol is reached. In the example below the marker symbol is a @{text [quotes] "*"}. @{ML_response [display,gray]"let val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) val input1 = Symbol.explode \"fooooo\" val input2 = Symbol.explode \"foo*ooo\"in (Scan.finite Symbol.stopper p input1, Scan.finite Symbol.stopper p input2)end""(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} After parsing is done, you almost always want to apply a function to the parsed items. One way to do this is the function @{ML_ind ">>"} where @{ML "(p >> f)" for p f} runs first the parser @{text p} and upon successful completion applies the function @{text f} to the result. For example@{ML_response [display,gray]"let fun double (x, y) = (x ^ x, y ^ y) val parser = $$ \"h\" -- $$ \"e\"in (parser >> double) (Symbol.explode \"hello\")end""((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} doubles the two parsed input strings; or @{ML_response [display,gray] "let val p = Scan.repeat (Scan.one Symbol.not_eof) val input = Symbol.explode \"foo bar foo\" in Scan.finite Symbol.stopper (p >> implode) inputend" "(\"foo bar foo\",[])"} where the single-character strings in the parsed output are transformed back into one string. The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies the given parser to the second component of the pair and leaves the first component untouched. For example@{ML_response [display,gray]"Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")""((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} (FIXME: In which situations is this useful? Give examples.) \begin{exercise}\label{ex:scancmts} Write a parser that parses an input string so that any comment enclosed within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML "s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper nesting of comments. \end{exercise}*}section {* Parsing Theory Syntax *}text {* Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. These token parsers have the type:*}ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}text {* The reason for using token parsers is that theory syntax, as well as the parsers for the arguments of proof methods, use the type @{ML_type OuterLex.token}. \begin{readmore} The parser functions for the theory syntax are contained in the structure @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. \end{readmore} The structure @{ML_struct OuterLex} defines several kinds of tokens (for example @{ML_ind Ident in OuterLex} for identifiers, @{ML Keyword in OuterLex} for keywords and @{ML_ind Command in OuterLex} for commands). Some token parsers take into account the kind of tokens. The first example shows how to generate a token list out of a string using the function @{ML_ind scan in OuterSyntax}. It is given the argument @{ML "Position.none"} since, at the moment, we are not interested in generating precise error messages. The following code\footnote{Note that because of a possible bug in the PolyML runtime system, the result is printed as @{text [quotes] "?"}, instead of the tokens.}@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" "[Token (\<dots>,(Ident, \"hello\"),\<dots>), Token (\<dots>,(Space, \" \"),\<dots>), Token (\<dots>,(Ident, \"world\"),\<dots>)]"} produces three tokens where the first and the last are identifiers, since @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any other syntactic category. The second indicates a space. We can easily change what is recognised as a keyword with the function @{ML_ind keyword in OuterKeyword}. For example calling it with *}ML{*val _ = OuterKeyword.keyword "hello"*}text {* then lexing @{text [quotes] "hello world"} will produce @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" "[Token (\<dots>,(Keyword, \"hello\"),\<dots>), Token (\<dots>,(Space, \" \"),\<dots>), Token (\<dots>,(Ident, \"world\"),\<dots>)]"} Many parsing functions later on will require white space, comments and the like to have already been filtered out. So from now on we are going to use the functions @{ML filter} and @{ML_ind is_proper in OuterLex} to do this. For example:@{ML_response_fake [display,gray]"let val input = OuterSyntax.scan Position.none \"hello world\"in filter OuterLex.is_proper inputend" "[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"} For convenience we define the function:*}ML{*fun filtered_input str = filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}text {* If you now parse@{ML_response_fake [display,gray] "filtered_input \"inductive | for\"" "[Token (\<dots>,(Command, \"inductive\"),\<dots>), Token (\<dots>,(Keyword, \"|\"),\<dots>), Token (\<dots>,(Keyword, \"for\"),\<dots>)]"} you obtain a list consisting of only one command and two keyword tokens. If you want to see which keywords and commands are currently known to Isabelle, type:@{ML_response_fake [display,gray] "let val (keywords, commands) = OuterKeyword.get_lexicons ()in (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)end" "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} You might have to adjust the @{ML_ind print_depth} in order to see the complete list. The parser @{ML_ind "$$$" in OuterParse} parses a single keyword. For example:@{ML_response [display,gray]"let val input1 = filtered_input \"where for\" val input2 = filtered_input \"| in\"in (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)end""((\"where\",\<dots>), (\"|\",\<dots>))"} Any non-keyword string can be parsed with the function @{ML_ind reserved in OuterParse}. For example: @{ML_response [display,gray]"let val p = OuterParse.reserved \"bar\" val input = filtered_input \"bar\"in p inputend" "(\"bar\",[])"} Like before, you can sequentially connect parsers with @{ML_ind "--"}. For example: @{ML_response [display,gray]"let val input = filtered_input \"| in\"in (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") inputend""((\"|\", \"in\"), [])"} The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty list of items recognised by the parser @{text p}, where the items being parsed are separated by the string @{text s}. For example:@{ML_response [display,gray]"let val input = filtered_input \"in | in | in foo\"in (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) inputend" "([\"in\", \"in\", \"in\"], [\<dots>])"} The function @{ML_ind enum1 in OuterParse} works similarly, except that the parsed list must be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the end of the parsed string, otherwise the parser would have consumed all tokens and then failed with the exception @{text "MORE"}. Like in the previous section, we can avoid this exception using the wrapper @{ML Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write:@{ML_response [display,gray]"let val input = filtered_input \"in | in | in\" val p = OuterParse.enum \"|\" (OuterParse.$$$ \"in\")in Scan.finite OuterLex.stopper p inputend" "([\"in\", \"in\", \"in\"], [])"} The following function will help to run examples.*}ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}text {* The function @{ML_ind "!!!" in OuterParse} can be used to force termination of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). A difference, however, is that the error message of @{ML "OuterParse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} together with a relatively precise description of the failure. For example:@{ML_response_fake [display,gray]"let val input = filtered_input \"in |\" val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"in parse (OuterParse.!!! parse_bar_then_in) input end""Exception ERROR \"Outer syntax error: keyword \"|\" expected, but keyword in was found\" raised"} \begin{exercise} (FIXME) A type-identifier, for example @{typ "'a"}, is a token of kind @{ML_ind Keyword in OuterLex}. It can be parsed using the function @{ML type_ident in OuterParse}. \end{exercise} (FIXME: or give parser for numbers) Whenever there is a possibility that the processing of user input can fail, it is a good idea to give all available information about where the error occurred. For this Isabelle can attach positional information to tokens and then thread this information up the ``processing chain''. To see this, modify the function @{ML filtered_input}, described earlier, as follows *}ML{*fun filtered_input' str = filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}text {* where we pretend the parsed string starts on line 7. An example is@{ML_response_fake [display,gray]"filtered_input' \"foo \\n bar\"""[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>), Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"} in which the @{text [quotes] "\\n"} causes the second token to be in line 8. By using the parser @{ML position in OuterParse} you can access the token position and return it as part of the parser result. For example@{ML_response_fake [display,gray]"let val input = filtered_input' \"where\"in parse (OuterParse.position (OuterParse.$$$ \"where\")) inputend""((\"where\", {line=7, end_line=7}), [])"} \begin{readmore} The functions related to positions are implemented in the file @{ML_file "Pure/General/position.ML"}. \end{readmore}*}section {* Parsers for ML-Code (TBD) *}text {* @{ML_ind ML_source in OuterParse}*}section {* Context Parser (TBD) *}text {* @{ML_ind Args.context}*}(*ML {*let val parser = Args.context -- Scan.lift Args.name_source fun term_pat (ctxt, str) = str |> Syntax.read_prop ctxtin (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)") |> fstend*}*)text {* @{ML_ind Args.context} Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.*}section {* Argument and Attribute Parsers (TBD) *}section {* Parsing Inner Syntax *}text {* There is usually no need to write your own parser for parsing inner syntax, that is for terms and types: you can just call the predefined parsers. Terms can be parsed using the function @{ML_ind term in OuterParse}. For example:@{ML_response [display,gray]"let val input = OuterSyntax.scan Position.none \"foo\"in OuterParse.term inputend""(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} The function @{ML_ind prop in OuterParse} is similar, except that it gives a different error message, when parsing fails. As you can see, the parser not just returns the parsed string, but also some encoded information. You can decode the information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. For example @{ML_response [display,gray] "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} The result of the decoding is an XML-tree. You can see better what is going on if you replace @{ML Position.none} by @{ML "Position.line 42"}, say:@{ML_response [display,gray]"let val input = OuterSyntax.scan (Position.line 42) \"foo\"in YXML.parse (fst (OuterParse.term input))end""XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"} The positional information is stored as part of an XML-tree so that code called later on will be able to give more precise error messages. \begin{readmore} The functions to do with input and output of XML and YXML are defined in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}. \end{readmore}*}section {* Parsing Specifications\label{sec:parsingspecs} *}text {* There are a number of special purpose parsers that help with parsing specifications of function definitions, inductive predicates and so on. In Chapter~\ref{chp:package}, for example, we will need to parse specifications for inductive predicates of the form:*}simple_inductive even and oddwhere even0: "even 0"| evenS: "odd n \<Longrightarrow> even (Suc n)"| oddS: "even n \<Longrightarrow> odd (Suc n)"text {* For this we are going to use the parser:*}ML %linenosgray{*val spec_parser = OuterParse.fixes -- Scan.optional (OuterParse.$$$ "where" |-- OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}text {* Note that the parser must not parse the keyword \simpleinductive, even if it is meant to process definitions as shown above. The parser of the keyword will be given by the infrastructure that will eventually call @{ML spec_parser}. To see what the parser returns, let us parse the string corresponding to the definition of @{term even} and @{term odd}:@{ML_response [display,gray]"let val input = filtered_input (\"even and odd \" ^ \"where \" ^ \" even0[intro]: \\\"even 0\\\" \" ^ \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ \"| oddS[intro]: \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")in parse spec_parser inputend""(([(even, NONE, NoSyn), (odd, NONE, NoSyn)], [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"), ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"), ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"} As you see, the result is a pair consisting of a list of variables with optional type-annotation and syntax-annotation, and a list of rules where every rule has optionally a name and an attribute. The function @{ML_ind "fixes" in OuterParse} in Line 2 of the parser reads an \isacommand{and}-separated list of variables that can include optional type annotations and syntax translations. For example:\footnote{Note that in the code we need to write @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes in the compound type.}@{ML_response [display,gray]"let val input = filtered_input \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"in parse OuterParse.fixes inputend""([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn), (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), (blonk, NONE, NoSyn)],[])"} *}text {* Whenever types are given, they are stored in the @{ML SOME}s. The types are not yet used to type the variables: this must be done by type-inference later on. Since types are part of the inner syntax they are strings with some encoded information (see previous section). If a mixfix-syntax is present for a variable, then it is stored in the @{ML_ind Mixfix} data structure; no syntax translation is indicated by @{ML_ind NoSyn}. \begin{readmore} The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. \end{readmore} Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a list of introduction rules, that is propositions with theorem annotations such as rule names and attributes. The introduction rules are propositions parsed by @{ML_ind prop in OuterParse}. However, they can include an optional theorem name plus some attributes. For example@{ML_response [display,gray] "let val input = filtered_input \"foo_lemma[intro,dest!]:\" val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input in (name, map Args.dest_src attrib)end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} The function @{ML_ind opt_thm_name in SpecParse} is the ``optional'' variant of @{ML_ind thm_name in SpecParse}. Theorem names can contain attributes. The name has to end with @{text [quotes] ":"}---see the argument of the function @{ML SpecParse.opt_thm_name} in Line 7. \begin{readmore} Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} and @{ML_file "Pure/Isar/args.ML"}. \end{readmore}*}text_raw {* \begin{exercise} Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds to the ``where-part'' of the introduction rules given above. Below we paraphrase the code of @{ML_ind where_alt_specs in SpecParse} adapted to our purposes. \begin{isabelle}*}ML %linenosgray{*val spec_parser' = OuterParse.fixes -- Scan.optional (OuterParse.$$$ "where" |-- OuterParse.!!! (OuterParse.enum1 "|" ((SpecParse.opt_thm_name ":" -- OuterParse.prop) --| Scan.option (Scan.ahead (OuterParse.name || OuterParse.$$$ "[") -- OuterParse.!!! (OuterParse.$$$ "|"))))) [] *}text_raw {* \end{isabelle} Both parsers accept the same input% that's not true: % spec_parser accepts input that is refuted by spec_parser' , but if you look closely, you can notice an additional ``tail'' (Lines 8 to 10) in @{ML spec_parser'}. What is the purpose of this additional ``tail''? \end{exercise}*}text {* (FIXME: @{ML OuterParse.type_args}, @{ML OuterParse.typ}, @{ML OuterParse.opt_mixfix})*}section {* New Commands and Keyword Files\label{sec:newcommand} *}text {* Often new commands, for example for providing new definitional principles, need to be implemented. While this is not difficult on the ML-level, new commands, in order to be useful, need to be recognised by ProofGeneral. This results in some subtle configuration issues, which we will explain in this section. To keep things simple, let us start with a ``silly'' command that does nothing at all. We shall name this command \isacommand{foobar}. On the ML-level it can be defined as:*}ML{*let val do_nothing = Scan.succeed (LocalTheory.theory I) val kind = OuterKeyword.thy_declin OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothingend *}text {* The crucial function @{ML_ind local_theory in OuterSyntax} expects a name for the command, a short description, a kind indicator (which we will explain later more thoroughly) and a parser producing a local theory transition (its purpose will also explained later). While this is everything you have to do on the ML-level, you need a keyword file that can be loaded by ProofGeneral. This is to enable ProofGeneral to recognise \isacommand{foobar} as a command. Such a keyword file can be generated with the command-line: @{text [display] "$ isabelle keywords -k foobar some_log_files"} The option @{text "-k foobar"} indicates which postfix the name of the keyword file will be assigned. In the case above the file will be named @{text "isar-keywords-foobar.el"}. This command requires log files to be present (in order to extract the keywords from them). To generate these log files, you first need to package the code above into a separate theory file named @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the complete code. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{figure}[t] \begin{graybox}\small \isacommand{theory}~@{text Command}\\ \isacommand{imports}~@{text Main}\\ \isacommand{begin}\\ \isacommand{ML}~@{text "\<verbopen>"}\\ @{ML"let val do_nothing = Scan.succeed (LocalTheory.theory I) val kind = OuterKeyword.thy_declin OuterSyntax.local_theory \"foobar\" \"description of foobar\" kind do_nothingend"}\\ @{text "\<verbclose>"}\\ \isacommand{end} \end{graybox} \caption{This file can be used to generate a log file. This log file in turn can be used to generate a keyword file containing the command \isacommand{foobar}. \label{fig:commandtheory}} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For our purposes it is sufficient to use the log files of the theories @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as the log file for the theory @{text "Command.thy"}, which contains the new \isacommand{foobar}-command. If you target other logics besides HOL, such as Nominal or ZF, then you need to adapt the log files appropriately. @{text Pure} and @{text HOL} are usually compiled during the installation of Isabelle. So log files for them should be already available. If not, then they can be conveniently compiled with the help of the build-script from the Isabelle distribution. @{text [display] "$ ./build -m \"Pure\"$ ./build -m \"HOL\""} The @{text "Pure-ProofGeneral"} theory needs to be compiled with: @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""} For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory with: @{text [display] "$ isabelle mkdir FoobarCommand"} This generates a directory containing the files: @{text [display] "./IsaMakefile./FoobarCommand/ROOT.ML./FoobarCommand/document./FoobarCommand/document/root.tex"} You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"} and add the line @{text [display] "no_document use_thy \"Command\";"} to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing: @{text [display] "$ isabelle make"} If the compilation succeeds, you have finally created all the necessary log files. They are stored in the directory @{text [display] "~/.isabelle/heaps/Isabelle2009/polyml-5.2.1_x86-linux/log"} or something similar depending on your Isabelle distribution and architecture. One quick way to assign a shell variable to this directory is by typing @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"} on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the directory should include the files: @{text [display] "Pure.gzHOL.gzPure-ProofGeneral.gzHOL-FoobarCommand.gz"} From them you can create the keyword files. Assuming the name of the directory is in @{text "$ISABELLE_LOGS"}, then the Unix command for creating the keyword file is:@{text [display]"$ isabelle keywords -k foobar $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"} The result is the file @{text "isar-keywords-foobar.el"}. It should contain the string @{text "foobar"} twice.\footnote{To see whether things are fine, check that @{text "grep foobar"} on this file returns something non-empty.} This keyword file needs to be copied into the directory @{text "~/.isabelle/etc"}. To make ProofGeneral aware of it, you have to start Isabelle with the option @{text "-k foobar"}, that is: @{text [display] "$ isabelle emacs -k foobar a_theory_file"} If you now build a theory on top of @{text "Command.thy"}, then you can use the command \isacommand{foobar}. You can just write*}foobartext {* but you will not see any action as we chose to implement this command to do nothing. The point of this command is only to show the procedure of how to interact with ProofGeneral. A similar procedure has to be done with any other new command, and also any new keyword that is introduced with the function @{ML_ind keyword in OuterKeyword}. For example:*}ML{*val _ = OuterKeyword.keyword "blink" *}text {* At the moment the command \isacommand{foobar} is not very useful. Let us refine it a bit next by letting it take a proposition as argument and printing this proposition inside the tracing buffer. The crucial part of a command is the function that determines the behaviour of the command. In the code above we used a ``do-nothing''-function, which because of @{ML_ind succeed in Scan} does not parse any argument, but immediately returns the simple function @{ML "LocalTheory.theory I"}. We can replace this code by a function that first parses a proposition (using the parser @{ML OuterParse.prop}), then prints out the tracing information (using a new function @{text trace_prop}) and finally does nothing. For this you can write:*}ML{*let fun trace_prop str = LocalTheory.theory (fn ctxt => (tracing str; ctxt)) val kind = OuterKeyword.thy_declin OuterSyntax.local_theory "foobar_trace" "traces a proposition" kind (OuterParse.prop >> trace_prop)end *}text {* The command is now \isacommand{foobar\_trace} and can be used to see the proposition in the tracing buffer. *}foobar_trace "True \<and> False"text {* Note that so far we used @{ML_ind thy_decl in OuterKeyword} as the kind indicator for the command. This means that the command finishes as soon as the arguments are processed. Examples of this kind of commands are \isacommand{definition} and \isacommand{declare}. In other cases, commands are expected to parse some arguments, for example a proposition, and then ``open up'' a proof in order to prove the proposition (for example \isacommand{lemma}) or prove some other properties (for example \isacommand{function}). To achieve this kind of behaviour, you have to use the kind indicator @{ML_ind thy_goal in OuterKeyword} and the function @{ML "local_theory_to_proof" in OuterSyntax} to set up the command. Note, however, once you change the ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs to be re-created! Below we show the command \isacommand{foobar\_goal} which takes a proposition as argument and then starts a proof in order to prove it. Therefore in Line 9, we set the kind indicator to @{ML thy_goal in OuterKeyword}.*}ML%linenosgray{*let fun goal_prop str lthy = let val prop = Syntax.read_prop lthy str in Proof.theorem_i NONE (K I) [[(prop,[])]] lthy end val kind = OuterKeyword.thy_goalin OuterSyntax.local_theory_to_proof "foobar_goal" "proves a proposition" kind (OuterParse.prop >> goal_prop)end *}text {* The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be proved) and a context as argument. The context is necessary in order to be able to use @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition. In Line 6 the function @{ML_ind theorem_i in Proof} starts the proof for the proposition. Its argument @{ML NONE} stands for a locale (which we chose to omit); the argument @{ML "(K I)"} stands for a function that determines what should be done with the theorem once it is proved (we chose to just forget about it). Line 9 contains the parser for the proposition. If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"}, you obtain the following proof state*}foobar_goal "True \<and> True"txt {* \begin{minipage}{\textwidth} @{subgoals [display]} \end{minipage}\medskip and can prove the proposition as follows.*}apply(rule conjI)apply(rule TrueI)+donetext {* {\bf TBD below} (FIXME: read a name and show how to store theorems; see @{ML_ind note in LocalTheory})*}ML_val{*val r = Unsynchronized.ref (NONE:(unit -> term) option)*}ML{*let fun after_qed thm_name thms lthy = LocalTheory.note Thm.theoremK (thm_name, (flat thms)) lthy |> snd fun setup_proof (thm_name, (txt, pos)) lthy = let val trm = ML_Context.evaluate lthy true ("r", r) txt in Proof.theorem_i NONE (after_qed thm_name) [[(trm,[])]] lthy end val parser = SpecParse.opt_thm_name ":" -- OuterParse.ML_sourcein OuterSyntax.local_theory_to_proof "foobar_prove" "proving a proposition" OuterKeyword.thy_goal (parser >> setup_proof)end*}foobar_prove test: {* @{prop "True"} *}apply(rule TrueI)done(*ML {*structure TacticData = ProofDataFun( type T = thm list -> tactic; fun init _ = undefined;);val set_tactic = TacticData.put;*}ML {* TacticData.get @{context}*}ML {* Method.set_tactic *}ML {* fun tactic (facts: thm list) : tactic = (atac 1) *}ML {* Context.map_proof *}ML {* ML_Context.expression *}ML {* METHOD *}ML {* fun myexpression pos bind body txt =let val _ = tracing ("bind)" ^ bind) val _ = tracing ("body)" ^ body) val _ = tracing ("txt)" ^ txt) val _ = tracing ("result) " ^ "Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^ " end (ML_Context.the_generic_context ())));")in ML_Context.exec (fn () => ML_Context.eval false pos ("Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^ " end (ML_Context.the_generic_context ())));"))end*}ML {*fun ml_tactic (txt, pos) ctxt =let val ctxt' = ctxt |> Context.proof_map (myexpression pos "fun tactic (facts: thm list) : tactic" "Context.map_proof (Method.set_tactic tactic)" txt);in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt')end;*}ML {*fun tactic3 (txt, pos) ctxt = let val _ = tracing ("1) " ^ txt ) in METHOD (ml_tactic (txt, pos) ctxt; K (atac 1)) end*}setup {*Method.setup (Binding.name "tactic3") (Scan.lift (OuterParse.position Args.name) >> tactic3) "ML tactic as proof method"*}lemma "A \<Longrightarrow> A"apply(tactic3 {* (atac 1) *})doneML {* (ML_Context.the_generic_context ())*}ML {*Context.set_thread_data;ML_Context.the_generic_context*}lemma "A \<Longrightarrow> A"ML_prf {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}ML {*Context.set_thread_data (SOME ((let fun tactic (facts: thm list) : tactic = (atac 1) in 3 end) (ML_Context.the_generic_context ())));*}ML {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}ML {*let fun tactic (facts: thm list) : tactic = atacin Context.map_proof (Method.set_tactic tactic)end *}end *}ML {* Toplevel.program (fn () => (ML_Context.expression Position.none "val plus : int" "3 + 4" "1" (Context.Proof @{context})))*}ML {*fun ml_tactic (txt, pos) ctxt = let val ctxt' = ctxt |> Context.proof_map (ML_Context.expression pos "fun tactic (facts: thm list) : tactic" "Context.map_proof (Method.set_tactic tactic)" txt); in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt') end;*}ML {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}*)section {* Methods (TBD) *}text {* (FIXME: maybe move to after the tactic section) Methods are central to Isabelle. They are the ones you use for example in \isacommand{apply}. To print out all currently known methods you can use the Isabelle command: \begin{isabelle} \isacommand{print\_methods}\\ @{text "> methods:"}\\ @{text "> -: do nothing (insert current facts only)"}\\ @{text "> HOL.default: apply some intro/elim rule (potentially classical)"}\\ @{text "> ..."} \end{isabelle} An example of a very simple method is:*}method_setup %gray foo = {* Scan.succeed (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *} "foo method for conjE and conjI"text {* It defines the method @{text foo}, which takes no arguments (therefore the parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which applies @{thm [source] conjE} and then @{thm [source] conjI}. The function @{ML_ind SIMPLE_METHOD} turns such a tactic into a method. The method @{text "foo"} can be used as follows*}lemma shows "A \<and> B \<Longrightarrow> C \<and> D"apply(foo)txt {* where it results in the goal state \begin{minipage}{\textwidth} @{subgoals} \end{minipage} *}(*<*)oops(*>*)(*ML {* SIMPLE_METHOD *}ML {* METHOD *}ML {* K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1)) *}ML {* Scan.succeed *}*)text {* (FIXME: explain a version of rule-tac)*}(*<*)(* THIS IS AN OLD VERSION OF THE PARSING CHAPTER BY JEREMY DAWSON *)chapter {* Parsing *}text {* Lots of Standard ML code is given in this document, for various reasons, including: \begin{itemize} \item direct quotation of code found in the Isabelle source files, or simplified versions of such code \item identifiers found in the Isabelle source code, with their types (or specialisations of their types) \item code examples, which can be run by the reader, to help illustrate the behaviour of functions found in the Isabelle source code \item ancillary functions, not from the Isabelle source code, which enable the reader to run relevant code examples \item type abbreviations, which help explain the uses of certain functions \end{itemize}*}section {* Parsing Isar input *}text {* The typical parsing function has the type \texttt{'src -> 'res * 'src}, with input of type \texttt{'src}, returning a result of type \texttt{'res}, which is (or is derived from) the first part of the input, and also returning the remainder of the input. (In the common case, when it is clear what the ``remainder of the input'' means, we will just say that the functions ``returns'' the value of type \texttt{'res}). An exception is raised if an appropriate value cannot be produced from the input. A range of exceptions can be used to identify different reasons for the failure of a parse. This contrasts the standard parsing function in Standard ML, which is of type \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option}; (for example, \texttt{List.getItem} and \texttt{Substring.getc}). However, much of the discussion at FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html is relevant. Naturally one may convert between the two different sorts of parsing functions as follows: \begin{verbatim} open StringCvt ; type ('res, 'src) ex_reader = 'src -> 'res * 'src ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader fun ex_reader rdr src = Option.valOf (rdr src) ; reader : ('res, 'src) ex_reader -> ('res, 'src) reader fun reader exrdr src = SOME (exrdr src) handle _ => NONE ; \end{verbatim}*}section{* The \texttt{Scan} structure *}text {* The source file is \texttt{src/General/scan.ML}. This structure provides functions for using and combining parsing functions of the type \texttt{'src -> 'res * 'src}. Three exceptions are used: \begin{verbatim} exception MORE of string option; (*need more input (prompt)*) exception FAIL of string option; (*try alternatives (reason of failure)*) exception ABORT of string; (*dead end*) \end{verbatim} Many functions in this structure (generally those with names composed of symbols) are declared as infix. Some functions from that structure are \begin{verbatim} |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> 'src -> 'res2 * 'src'' --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> 'src -> 'res1 * 'src'' -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> 'src -> ('res1 * 'res2) * 'src'' ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') -> 'src -> string * 'src'' \end{verbatim} These functions parse a result off the input source twice. \texttt{|--} and \texttt{--|} return the first result and the second result, respectively. \texttt{--} returns both. \verb|^^| returns the result of concatenating the two results (which must be strings). Note how, although the types \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same, the types as shown help suggest the behaviour of the functions. \begin{verbatim} :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') -> 'src -> ('res1 * 'res2) * 'src'' :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') -> 'src -> 'res2 * 'src'' \end{verbatim} These are similar to \texttt{|--} and \texttt{--|}, except that the second parsing function can depend on the result of the first. \begin{verbatim} >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src' || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src \end{verbatim} \texttt{p >> f} applies a function \texttt{f} to the result of a parse. \texttt{||} tries a second parsing function if the first one fails by raising an exception of the form \texttt{FAIL \_}. \begin{verbatim} succeed : 'res -> ('src -> 'res * 'src) ; fail : ('src -> 'res_src) ; !! : ('src * string option -> string) -> ('src -> 'res_src) -> ('src -> 'res_src) ; \end{verbatim} \texttt{succeed r} returns \texttt{r}, with the input unchanged. \texttt{fail} always fails, raising exception \texttt{FAIL NONE}. \texttt{!! f} only affects the failure mode, turning a failure that raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}. This is used to prevent recovery from the failure --- thus, in \texttt{!! parse1 || parse2}, if \texttt{parse1} fails, it won't recover by trying \texttt{parse2}. \begin{verbatim} one : ('si -> bool) -> ('si list -> 'si * 'si list) ; some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ; \end{verbatim} These require the input to be a list of items: they fail, raising \texttt{MORE NONE} if the list is empty. On other failures they raise \texttt{FAIL NONE} \texttt{one p} takes the first item from the list if it satisfies \texttt{p}, otherwise fails. \texttt{some f} takes the first item from the list and applies \texttt{f} to it, failing if this returns \texttt{NONE}. \begin{verbatim} many : ('si -> bool) -> 'si list -> 'si list * 'si list ; \end{verbatim} \texttt{many p} takes items from the input until it encounters one which does not satisfy \texttt{p}. If it reaches the end of the input it fails, raising \texttt{MORE NONE}. \texttt{many1} (with the same type) fails if the first item does not satisfy \texttt{p}. \begin{verbatim} option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src) optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src) \end{verbatim} \texttt{option}: where the parser \texttt{f} succeeds with result \texttt{r} or raises \texttt{FAIL \_}, \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}. \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_}, \texttt{optional f default} provides the result \texttt{default}. \begin{verbatim} repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src \end{verbatim} \texttt{repeat f} repeatedly parses an item off the remaining input until \texttt{f} fails with \texttt{FAIL \_} \texttt{repeat1} is as for \texttt{repeat}, but requires at least one successful parse. \begin{verbatim} lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src)) \end{verbatim} \texttt{lift} changes the source type of a parser by putting in an extra component \texttt{'ex}, which is ignored in the parsing. The \texttt{Scan} structure also provides the type \texttt{lexicon}, HOW DO THEY WORK ?? TO BE COMPLETED \begin{verbatim} dest_lexicon: lexicon -> string list ; make_lexicon: string list list -> lexicon ; empty_lexicon: lexicon ; extend_lexicon: string list list -> lexicon -> lexicon ; merge_lexicons: lexicon -> lexicon -> lexicon ; is_literal: lexicon -> string list -> bool ; literal: lexicon -> string list -> string list * string list ; \end{verbatim} Two lexicons, for the commands and keywords, are stored and can be retrieved by: \begin{verbatim} val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ; val commands = Scan.dest_lexicon command_lexicon ; val keywords = Scan.dest_lexicon keyword_lexicon ; \end{verbatim}*}section{* The \texttt{OuterLex} structure *}text {* The source file is @{text "src/Pure/Isar/outer_lex.ML"}. In some other source files its name is abbreviated: \begin{verbatim} structure T = OuterLex; \end{verbatim} This structure defines the type \texttt{token}. (The types \texttt{OuterLex.token}, \texttt{OuterParse.token} and \texttt{SpecParse.token} are all the same). Input text is split up into tokens, and the input source type for many parsing functions is \texttt{token list}. The datatype definition (which is not published in the signature) is \begin{verbatim} datatype token = Token of Position.T * (token_kind * string); \end{verbatim} but here are some runnable examples for viewing tokens: *}ML{* val toks = OuterSyntax.scan Position.none "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;*}ML{* print_depth 20 ;*}ML{* map OuterLex.text_of toks ;*}ML{* val proper_toks = filter OuterLex.is_proper toks ;*} ML{* map OuterLex.kind_of proper_toks *}ML{* map OuterLex.unparse proper_toks ;*}ML{* OuterLex.stopper*}text {* The function \texttt{is\_proper : token -> bool} identifies tokens which are not white space or comments: many parsing functions assume require spaces or comments to have been filtered out. There is a special end-of-file token: \begin{verbatim} val (tok_eof : token, is_eof : token -> bool) = T.stopper ; (* end of file token *) \end{verbatim}*}section {* The \texttt{OuterParse} structure *}text {* The source file is \texttt{src/Pure/Isar/outer\_parse.ML}. In some other source files its name is abbreviated: \begin{verbatim} structure P = OuterParse; \end{verbatim} Here the parsers use \texttt{token list} as the input source type. Some of the parsers simply select the first token, provided that it is of the right kind (as returned by \texttt{T.kind\_of}): these are \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var, type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof} Others select the first token, provided that it is one of several kinds, (eg, \texttt{name, xname, text, typ}). \begin{verbatim} type 'a tlp = token list -> 'a * token list ; (* token list parser *) $$$ : string -> string tlp nat : int tlp ; maybe : 'a tlp -> 'a option tlp ; \end{verbatim} \texttt{\$\$\$ s} returns the first token, if it equals \texttt{s} \emph{and} \texttt{s} is a keyword. \texttt{nat} returns the first token, if it is a number, and evaluates it. \texttt{maybe}: if \texttt{p} returns \texttt{r}, then \texttt{maybe p} returns \texttt{SOME r} ; if the first token is an underscore, it returns \texttt{NONE}. A few examples: \begin{verbatim} P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *) P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *) val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ; val proper_toks = List.filter T.is_proper toks ; P.list P.nat toks ; (* OK, doesn't recognize white space *) P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *) P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *) P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *) val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ; P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *) \end{verbatim} The following code helps run examples: \begin{verbatim} fun parse_str tlp str = let val toks : token list = OuterSyntax.scan str ; val proper_toks = List.filter T.is_proper toks @ [tok_eof] ; val (res, rem_toks) = tlp proper_toks ; val rem_str = String.concat (Library.separate " " (List.map T.unparse rem_toks)) ; in (res, rem_str) end ; \end{verbatim} Some examples from \texttt{src/Pure/Isar/outer\_parse.ML} \begin{verbatim} val type_args = type_ident >> Library.single || $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") || Scan.succeed []; \end{verbatim} There are three ways parsing a list of type arguments can succeed. The first line reads a single type argument, and turns it into a singleton list. The second line reads "(", and then the remainder, ignoring the "(" ; the remainder consists of a list of type identifiers (at least one), and then a ")" which is also ignored. The \texttt{!!!} ensures that if the parsing proceeds this far and then fails, it won't try the third line (see the description of \texttt{Scan.!!}). The third line consumes no input and returns the empty list. \begin{verbatim} fun triple2 (x, (y, z)) = (x, y, z); val arity = xname -- ($$$ "::" |-- !!! ( Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) [] -- sort)) >> triple2; \end{verbatim} The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is ignored), then optionally a list $ss$ of sorts and then another sort $s$. The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$. The second line reads the optional list of sorts: it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored, and between them a comma-separated list of sorts. If this list is absent, the default \texttt{[]} provides the list of sorts. \begin{verbatim} parse_str P.type_args "('a, 'b) ntyp" ; parse_str P.type_args "'a ntyp" ; parse_str P.type_args "ntyp" ; parse_str P.arity "ty :: tycl" ; parse_str P.arity "ty :: (tycl1, tycl2) tycl" ; \end{verbatim}*}section {* The \texttt{SpecParse} structure *}text {* The source file is \texttt{src/Pure/Isar/spec\_parse.ML}. This structure contains token list parsers for more complicated values. For example, \begin{verbatim} open SpecParse ; attrib : Attrib.src tok_rdr ; attribs : Attrib.src list tok_rdr ; opt_attribs : Attrib.src list tok_rdr ; xthm : (thmref * Attrib.src list) tok_rdr ; xthms1 : (thmref * Attrib.src list) list tok_rdr ; parse_str attrib "simp" ; parse_str opt_attribs "hello" ; val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ; map Args.dest_src ass ; val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ; parse_str xthm "mythm [attr]" ; parse_str xthms1 "thm1 [attr] thms2" ; \end{verbatim} As you can see, attributes are described using types of the \texttt{Args} structure, described below.*}section{* The \texttt{Args} structure *}text {* The source file is \texttt{src/Pure/Isar/args.ML}. The primary type of this structure is the \texttt{src} datatype; the single constructors not published in the signature, but \texttt{Args.src} and \texttt{Args.dest\_src} are in fact the constructor and destructor functions. Note that the types \texttt{Attrib.src} and \texttt{Method.src} are in fact \texttt{Args.src}. \begin{verbatim} src : (string * Args.T list) * Position.T -> Args.src ; dest_src : Args.src -> (string * Args.T list) * Position.T ; Args.pretty_src : Proof.context -> Args.src -> Pretty.T ; fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ; val thy = ML_Context.the_context () ; val ctxt = ProofContext.init thy ; map (pr_src ctxt) ass ; \end{verbatim} So an \texttt{Args.src} consists of the first word, then a list of further ``arguments'', of type \texttt{Args.T}, with information about position in the input. \begin{verbatim} (* how an Args.src is parsed *) P.position : 'a tlp -> ('a * Position.T) tlp ; P.arguments : Args.T list tlp ; val parse_src : Args.src tlp = P.position (P.xname -- P.arguments) >> Args.src ; \end{verbatim} \begin{verbatim} val ((first_word, args), pos) = Args.dest_src asrc ; map Args.string_of args ; \end{verbatim} The \texttt{Args} structure contains more parsers and parser transformers for which the input source type is \texttt{Args.T list}. For example, \begin{verbatim} type 'a atlp = Args.T list -> 'a * Args.T list ; open Args ; nat : int atlp ; (* also Args.int *) thm_sel : PureThy.interval list atlp ; list : 'a atlp -> 'a list atlp ; attribs : (string -> string) -> Args.src list atlp ; opt_attribs : (string -> string) -> Args.src list atlp ; (* parse_atl_str : 'a atlp -> (string -> 'a * string) ; given an Args.T list parser, to get a string parser *) fun parse_atl_str atlp str = let val (ats, rem_str) = parse_str P.arguments str ; val (res, rem_ats) = atlp ats ; in (res, String.concat (Library.separate " " (List.map Args.string_of rem_ats @ [rem_str]))) end ; parse_atl_str Args.int "-1-," ; parse_atl_str (Scan.option Args.int) "x1-," ; parse_atl_str Args.thm_sel "(1-,4,13-22)" ; val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I) "[THEN trans [THEN sym], simp, OF sym]" ; \end{verbatim} From here, an attribute is interpreted using \texttt{Attrib.attribute}. \texttt{Args} has a large number of functions which parse an \texttt{Args.src} and also refer to a generic context. Note the use of \texttt{Scan.lift} for this. (as does \texttt{Attrib} - RETHINK THIS) (\texttt{Args.syntax} shown below has type specialised) \begin{verbatim} type ('res, 'src) parse_fn = 'src -> 'res * 'src ; type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ; Scan.lift : 'a atlp -> 'a cgatlp ; term : term cgatlp ; typ : typ cgatlp ; Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ; Attrib.thm : thm cgatlp ; Attrib.thms : thm list cgatlp ; Attrib.multi_thm : thm list cgatlp ; (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ; given a (Context.generic * Args.T list) parser, to get a string parser *) fun parse_cgatl_str cgatlp str = let (* use the current generic context *) val generic = Context.Theory thy ; val (ats, rem_str) = parse_str P.arguments str ; (* ignore any change to the generic context *) val (res, (_, rem_ats)) = cgatlp (generic, ats) ; in (res, String.concat (Library.separate " " (List.map Args.string_of rem_ats @ [rem_str]))) end ; \end{verbatim}*}section{* Attributes, and the \texttt{Attrib} structure *}text {* The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}. The source file for the \texttt{Attrib} structure is \texttt{src/Pure/Isar/attrib.ML}. Most attributes use a theorem to change a generic context (for example, by declaring that the theorem should be used, by default, in simplification), or change a theorem (which most often involves referring to the current theory). The functions \texttt{Thm.rule\_attribute} and \texttt{Thm.declaration\_attribute} create attributes of these kinds. \begin{verbatim} type attribute = Context.generic * thm -> Context.generic * thm; type 'a trf = 'a -> 'a ; (* transformer of a given type *) Thm.rule_attribute : (Context.generic -> thm -> thm) -> attribute ; Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ; Attrib.print_attributes : theory -> unit ; Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ; List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ; \end{verbatim} An attribute is stored in a theory as indicated by: \begin{verbatim} Attrib.add_attributes : (bstring * (src -> attribute) * string) list -> theory trf ; (* Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ; *) \end{verbatim} where the first and third arguments are name and description of the attribute, and the second is a function which parses the attribute input text (including the attribute name, which has necessarily already been parsed). Here, \texttt{THEN\_att} is a function declared in the code for the structure \texttt{Attrib}, but not published in its signature. The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of \texttt{Attrib.add\_attributes} to add a number of attributes. \begin{verbatim} FullAttrib.THEN_att : src -> attribute ; FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ; FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ; \end{verbatim} \begin{verbatim} Attrib.syntax : attribute cgatlp -> src -> attribute ; Attrib.no_args : attribute -> src -> attribute ; \end{verbatim} When this is called as \texttt{syntax scan src (gc, th)} the generic context \texttt{gc} is used (and potentially changed to \texttt{gc'}) by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would then be applied to \texttt{(gc', th)}. The source for parsing the attribute is the arguments part of \texttt{src}, which must all be consumed by the parse. For example, for \texttt{Attrib.no\_args attr src}, the attribute parser simply returns \texttt{attr}, requiring that the arguments part of \texttt{src} must be empty. Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified: \begin{verbatim} fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ; rot_att_n : int -> attribute ; val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ; val rotated_att : src -> attribute = Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ; val THEN_arg : int cgatlp = Scan.lift (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ; Attrib.thm : thm cgatlp ; THEN_arg -- Attrib.thm : (int * thm) cgatlp ; fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ; THEN_att_n : int * thm -> attribute ; val THEN_att : src -> attribute = Attrib.syntax (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp); \end{verbatim} The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg} read an optional argument, which for \texttt{rotated} is an integer, and for \texttt{THEN} is a natural enclosed in square brackets; the default, if the argument is absent, is 1 in each case. Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is parsed by \texttt{Attrib.thm}. Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}.*}section{* Methods, and the \texttt{Method} structure *}text {* The source file is \texttt{src/Pure/Isar/method.ML}. The type \texttt{method} is defined by the datatype declaration \begin{verbatim} (* datatype method = Meth of thm list -> cases_tactic; *) RuleCases.NO_CASES : tactic -> cases_tactic ; \end{verbatim} In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor \texttt{Meth}. A \texttt{cases\_tactic} is an elaborated version of a tactic. \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of a \texttt{cases\_tactic} without any further case information. For further details see the description of structure \texttt{RuleCases} below. The list of theorems to be passed to a method consists of the current \emph{facts} in the proof. \begin{verbatim} RAW_METHOD : (thm list -> tactic) -> method ; METHOD : (thm list -> tactic) -> method ; SIMPLE_METHOD : tactic -> method ; SIMPLE_METHOD' : (int -> tactic) -> method ; SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ; RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ; METHOD_CASES : (thm list -> cases_tactic) -> method ; \end{verbatim} A method is, in its simplest form, a tactic; applying the method is to apply the tactic to the current goal state. Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying \texttt{tacf} to the current {facts}, and applying that tactic to the goal state. \texttt{METHOD} is similar but also first applies \texttt{Goal.conjunction\_tac} to all subgoals. \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then applies \texttt{tacf}. \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then applies \texttt{tacf} to subgoal 1. \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by \texttt{quant}, which may be, for example, \texttt{ALLGOALS} (all subgoals), \texttt{TRYALL} (try all subgoals, failure is OK), \texttt{FIRSTGOAL} (try subgoals until it succeeds once), \texttt{(fn tacf => tacf 4)} (subgoal 4), etc (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}). A method is stored in a theory as indicated by: \begin{verbatim} Method.add_method : (bstring * (src -> Proof.context -> method) * string) -> theory trf ; ( * * ) \end{verbatim} where the first and third arguments are name and description of the method, and the second is a function which parses the method input text (including the method name, which has necessarily already been parsed). Here, \texttt{xxx} is a function declared in the code for the structure \texttt{Method}, but not published in its signature. The source file \texttt{src/Pure/Isar/method.ML} shows the use of \texttt{Method.add\_method} to add a number of methods.*}(*>*)end