theory Parsingimports Base "Package/Simple_Inductive_Package"keywords "foobar" "foobar_trace" :: thy_decl and "foobar_goal" "foobar_prove" :: thy_goalbeginchapter \<open>Parsing\label{chp:parsing}\<close>text \<open> \begin{flushright} {\em An important principle underlying the success and popularity of Unix\\ is the philosophy of building on the work of others.} \\[1ex] Tony Travis in an email about the\\ ``LINUX is obsolete'' debate \end{flushright} \begin{flushright} {\em Document your code as if someone else might have to take it over at any moment and know what to do with it. That person might actually be you -- how often have you had to revisit your own code and thought to yourself, what was I trying to do here?} \\[1ex] Phil Chu in ``Seven Habits of Highly Effective Programmers'' \end{flushright} Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. Commands, such as \isacommand{definition}, \isacommand{inductive} and so on, belong to the outer syntax, whereas terms, types and so on belong to the inner syntax. For parsing inner syntax, Isabelle uses a rather general and sophisticated algorithm, which is driven by priority grammars. Parsers for outer syntax are built up by functional parsing combinators. These combinators are a well-established technique for parsing, which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. Isabelle developers are usually concerned with writing these outer syntax parsers, either for new definitional packages or for calling methods with specific arguments. \begin{readmore} The library for writing parser combinators is split up, roughly, into two parts: The first part consists of a collection of generic parser combinators defined in the structure @{ML_structure Scan} in the file @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of combinators for dealing with specific token types, which are defined in the structure @{ML_structure Parse} in the file @{ML_file "Pure/Isar/parse.ML"}. In addition specific parsers for packages are defined in @{ML_file "Pure/Isar/parse_spec.ML"}. Parsers for method arguments are defined in @{ML_file "Pure/Isar/args.ML"}. \end{readmore}\<close>section \<open>Building Generic Parsers\<close>text \<open> Let us first have a look at parsing strings using generic parsing combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will ``consume'' this string from a given input list of strings. ``Consume'' in this context means that it will return a pair consisting of this string and the rest of the input list. For example: @{ML_matchresult [display,gray] \<open>($$ "h") (Symbol.explode "hello")\<close> \<open>("h", ["e", "l", "l", "o"])\<close>} @{ML_matchresult [display,gray] \<open>($$ "w") (Symbol.explode "world")\<close> \<open>("w", ["o", "r", "l", "d"])\<close>} The function @{ML \<open>$$\<close>} will either succeed (as in the two examples above) or raise the exception \<open>FAIL\<close> if no string can be consumed. For example trying to parse @{ML_response [display,gray] \<open>($$ "x") (Symbol.explode "world")\<close> \<open>exception FAIL NONE raised\<close>} will raise the exception \<open>FAIL\<close>. The function @{ML_ind "$$" in Scan} will also fail if you try to consume more than a single character. There are three exceptions that are raised by the parsing combinators: \begin{itemize} \item \<open>FAIL\<close> indicates that alternative routes of parsing might be explored. \item \<open>MORE\<close> indicates that there is not enough input for the parser. For example in \<open>($$ "h") []\<close>. \item \<open>ABORT\<close> indicates that a dead end is reached. It is used for example in the function @{ML \<open>!!\<close>} (see below). \end{itemize} However, note that these exceptions are private to the parser and cannot be accessed by the programmer (for example to handle them). In the examples above we use the function @{ML_ind explode in Symbol} from the structure @{ML_structure Symbol}, instead of the more standard library function @{ML_ind explode in String}, for obtaining an input list for the parser. The reason is that @{ML explode in Symbol} is aware of character sequences, for example \<open>\<foo>\<close>, that have a special meaning in Isabelle. To see the difference consider@{ML_matchresult [display,gray]\<open>String.explode "\<foo> bar"\<close>\<open>[#"\\", #"<", #"f", #"o", #"o", #">", #" ", #"b", #"a", #"r"]\<close>}@{ML_matchresult [display,gray]\<open>Symbol.explode "\<foo> bar"\<close>\<open>["\<foo>", " ", "b", "a", "r"]\<close>} Slightly more general than the parser @{ML \<open>$$\<close>} is the function @{ML_ind one in Scan}, in that it takes a predicate as argument and then parses exactly one item from the input list satisfying this predicate. For example the following parser either consumes an @{text [quotes] "h"} or a @{text [quotes] "w"}:@{ML_matchresult [display,gray] \<open>let val hw = Scan.one (fn x => x = "h" orelse x = "w") val input1 = Symbol.explode "hello" val input2 = Symbol.explode "world"in (hw input1, hw input2)end\<close> \<open>(("h", ["e", "l", "l", "o"]),("w", ["o", "r", "l", "d"]))\<close>} Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this order) you can achieve by: @{ML_matchresult [display,gray] \<open>($$ "h" -- $$ "e" -- $$ "l") (Symbol.explode "hello")\<close> \<open>((("h", "e"), "l"), ["l", "o"])\<close>} Note how the result of consumed strings builds up on the left as nested pairs. If, as in the previous example, you want to parse a particular string, then you can use the function @{ML_ind this_string in Scan}. @{ML_matchresult [display,gray] \<open>Scan.this_string "hell" (Symbol.explode "hello")\<close> \<open>("hell", ["o"])\<close>} Parsers that explore alternatives can be constructed using the function @{ML_ind "||" in Scan}. The parser @{ML \<open>(p || q)\<close> for p q} returns the result of \<open>p\<close>, in case it succeeds; otherwise it returns the result of \<open>q\<close>. For example:@{ML_matchresult [display,gray] \<open>let val hw = $$ "h" || $$ "w" val input1 = Symbol.explode "hello" val input2 = Symbol.explode "world"in (hw input1, hw input2)end\<close> \<open>(("h", ["e", "l", "l", "o"]), ("w", ["o", "r", "l", "d"]))\<close>} The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing function for parsers, except that they discard the item being parsed by the first (respectively second) parser. That means the item being dropped is the one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away. For example:@{ML_matchresult [display,gray]\<open>let val just_e = $$ "h" |-- $$ "e" val just_h = $$ "h" --| $$ "e" val input = Symbol.explode "hello" in (just_e input, just_h input)end\<close> \<open>(("e", ["l", "l", "o"]), ("h", ["l", "l", "o"]))\<close>} The parser @{ML \<open>Scan.optional p x\<close> for p x} returns the result of the parser \<open>p\<close>, in case it succeeds; otherwise it returns the default value \<open>x\<close>. For example:@{ML_matchresult [display,gray]\<open>let val p = Scan.optional ($$ "h") "x" val input1 = Symbol.explode "hello" val input2 = Symbol.explode "world" in (p input1, p input2)end\<close> \<open>(("h", ["e", "l", "l", "o"]), ("x", ["w", "o", "r", "l", "d"]))\<close>} The function @{ML_ind option in Scan} works similarly, except no default value can be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example:@{ML_matchresult [display,gray]\<open>let val p = Scan.option ($$ "h") val input1 = Symbol.explode "hello" val input2 = Symbol.explode "world" in (p input1, p input2)end\<close> \<open>((SOME "h", ["e", "l", "l", "o"]), (NONE, ["w", "o", "r", "l", "d"]))\<close>} The function @{ML_ind ahead in Scan} parses some input, but leaves the original input unchanged. For example: @{ML_matchresult [display,gray] \<open>Scan.ahead (Scan.this_string "foo") (Symbol.explode "foo")\<close> \<open>("foo", ["f", "o", "o"])\<close>} The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages during parsing. For example if you want to parse \<open>p\<close> immediately followed by \<open>q\<close>, or start a completely different parser \<open>r\<close>, you might write: @{ML [display,gray] \<open>(p -- q) || r\<close> for p q r} However, this parser is problematic for producing a useful error message, if the parsing of @{ML \<open>(p -- q)\<close> for p q} fails. Because with the parser above you lose the information that \<open>p\<close> should be followed by \<open>q\<close>. To see this assume that \<open>p\<close> is present in the input, but it is not followed by \<open>q\<close>. That means @{ML \<open>(p -- q)\<close> for p q} will fail and hence the alternative parser \<open>r\<close> will be tried. However, in many circumstances this will be the wrong parser for the input ``\<open>p\<close>-followed-by-something'' and therefore will also fail. The error message is then caused by the failure of \<open>r\<close>, not by the absence of \<open>q\<close> in the input. Such situation can be avoided when using the function @{ML \<open>!!\<close>}. This function aborts the whole process of parsing in case of a failure and prints an error message. For example if you invoke the parser @{ML [display,gray] \<open>!! (fn _ => fn _ =>"foo") ($$ "h")\<close>} on @{text [quotes] "hello"}, the parsing succeeds @{ML_matchresult [display,gray] \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "hello")\<close> \<open>("h", ["e", "l", "l", "o"])\<close>} but if you invoke it on @{text [quotes] "world"} @{ML_response [display,gray] \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close> \<open>exception ABORT fn raised\<close>} then the parsing aborts and the error message \<open>foo\<close> is printed. In order to see the error message properly, you need to prefix the parser with the function @{ML_ind error in Scan}. For example: @{ML_response [display,gray] \<open>Scan.error (!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close> \<open>foo\<close>} This kind of ``prefixing'' to see the correct error message is usually done by wrappers such as @{ML_ind local_theory in Outer_Syntax} (see Section~\ref{sec:newcommand} which explains this function in more detail). Let us now return to our example of parsing @{ML \<open>(p -- q) || r\<close> for p q r}. If you want to generate the correct error message for failure of parsing \<open>p\<close>-followed-by-\<open>q\<close>, then you have to write:\<close>ML %grayML\<open>fun p_followed_by_q p q r =let val err_msg = fn _ => p ^ " is not followed by " ^ qin ($$ p -- (!! (fn _ => err_msg) ($$ q))) || ($$ r -- $$ r)end\<close>text \<open> Running this parser with the arguments @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and the input @{text [quotes] "holle"} @{ML_response [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "holle")\<close> \<open>h is not followed by e\<close>} produces the correct error message. Running it with @{ML_matchresult [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "wworld")\<close> \<open>(("w", "w"), ["o", "r", "l", "d"])\<close>} yields the expected parsing. The function @{ML \<open>Scan.repeat p\<close> for p} will apply a parser \<open>p\<close> as long as it succeeds. For example: @{ML_matchresult [display,gray] \<open>Scan.repeat ($$ "h") (Symbol.explode "hhhhello")\<close> \<open>(["h", "h", "h", "h"], ["e", "l", "l", "o"])\<close>} Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> succeeds at least once. Also note that the parser would have aborted with the exception \<open>MORE\<close>, if you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' @{ML_ind stopper in Symbol}. With them you can write: @{ML_matchresult [display,gray] \<open>Scan.finite Symbol.stopper (Scan.repeat ($$ "h")) (Symbol.explode "hhhh")\<close> \<open>(["h", "h", "h", "h"], [])\<close>} The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; other stoppers need to be used when parsing, for example, tokens. However, this kind of manually wrapping is often already done by the surrounding infrastructure. The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any string as in @{ML_matchresult [display,gray] \<open>let val p = Scan.repeat (Scan.one Symbol.not_eof) val input = Symbol.explode "foo bar foo"in Scan.finite Symbol.stopper p inputend\<close> \<open>(["f", "o", "o", " ", "b", "a", "r", " ", "f", "o", "o"], [])\<close>} where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the end of the input string (i.e.~stopper symbol). The function @{ML_ind unless in Scan} takes two parsers: if the first one can parse the input, then the whole parser fails; if not, then the second is tried. Therefore @{ML_response [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "hello")\<close> "exception FAIL NONE raised"} fails, while @{ML_matchresult [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "world")\<close> \<open>("w",["o", "r", "l", "d"])\<close>} succeeds. The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can be combined to read any input until a certain marker symbol is reached. In the example below the marker symbol is a @{text [quotes] "*"}. @{ML_matchresult [display,gray]\<open>let val p = Scan.repeat (Scan.unless ($$ "*") (Scan.one Symbol.not_eof)) val input1 = Symbol.explode "fooooo" val input2 = Symbol.explode "foo*ooo"in (Scan.finite Symbol.stopper p input1, Scan.finite Symbol.stopper p input2)end\<close>\<open>((["f", "o", "o", "o", "o", "o"], []), (["f", "o", "o"], ["*", "o", "o", "o"]))\<close>} After parsing is done, you almost always want to apply a function to the parsed items. One way to do this is the function @{ML_ind ">>" in Scan} where @{ML \<open>(p >> f)\<close> for p f} runs first the parser \<open>p\<close> and upon successful completion applies the function \<open>f\<close> to the result. For example@{ML_matchresult [display,gray]\<open>let fun double (x, y) = (x ^ x, y ^ y) val parser = $$ "h" -- $$ "e"in (parser >> double) (Symbol.explode "hello")end\<close>\<open>(("hh", "ee"), ["l", "l", "o"])\<close>} doubles the two parsed input strings; or @{ML_matchresult [display,gray] \<open>let val p = Scan.repeat (Scan.one Symbol.not_eof) val input = Symbol.explode "foo bar foo" in Scan.finite Symbol.stopper (p >> implode) inputend\<close> \<open>("foo bar foo",[])\<close>} where the single-character strings in the parsed output are transformed back into one string. The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies the given parser to the second component of the pair and leaves the first component untouched. For example@{ML_matchresult [display,gray]\<open>Scan.lift ($$ "h" -- $$ "e") (1, Symbol.explode "hello")\<close>\<open>(("h", "e"), (1, ["l", "l", "o"]))\<close>} \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} Be aware of recursive parsers. Suppose you want to read strings separated by commas and by parentheses into a tree datastructure; for example, generating the tree corresponding to the string @{text [quotes] "(A, A), (A, A)"} where the \<open>A\<close>s will be the leaves. We assume the trees are represented by the datatype:\<close>ML %grayML\<open>datatype tree = Lf of string | Br of tree * tree\<close>text \<open> Since nested parentheses should be treated in a meaningful way---for example the string @{text [quotes] "((A))"} should be read into a single leaf---you might implement the following parser.\<close>ML %grayML\<open>fun parse_basic s = $$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")" and parse_tree s = parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s\<close>text \<open> This parser corresponds to the grammar: \begin{center} \begin{tabular}{lcl} \<open><Basic>\<close> & \<open>::=\<close> & \<open><String> | (<Tree>)\<close>\\ \<open><Tree>\<close> & \<open>::=\<close> & \<open><Basic>, <Tree> | <Basic>\<close>\\ \end{tabular} \end{center} The parameter \<open>s\<close> is the string over which the tree is parsed. The parser @{ML parse_basic} reads either a leaf or a tree enclosed in parentheses. The parser @{ML parse_tree} reads either a pair of trees separated by a comma, or acts like @{ML parse_basic}. Unfortunately, because of the mutual recursion, this parser will immediately run into a loop, even if it is called without any input. For example @{ML_matchresult_fake_both [display, gray] \<open>parse_tree "A"\<close> \<open>*** Exception- TOPLEVEL_ERROR raised\<close>} raises an exception indicating that the stack limit is reached. Such looping parser are not useful, because of ML's strict evaluation of arguments. Therefore we need to delay the execution of the parser until an input is given. This can be done by adding the parsed string as an explicit argument. So the parser above should be implemented as follows.\<close>ML %grayML\<open>fun parse_basic s xs = ($$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")") xs and parse_tree s xs = (parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s) xs\<close>text \<open> While the type of the parser is unchanged by the addition, its behaviour changed: with this version of the parser the execution is delayed until some string is applied for the argument \<open>xs\<close>. This gives us exactly the parser what we wanted. An example is as follows: @{ML_matchresult [display, gray] \<open>let val input = Symbol.explode "(A,((A))),A"in Scan.finite Symbol.stopper (parse_tree "A") inputend\<close> \<open>(Br (Br (Lf "A", Lf "A"), Lf "A"), [])\<close>} \begin{exercise}\label{ex:scancmts} Write a parser that parses an input string so that any comment enclosed within \<open>(*\<dots>*)\<close> is replaced by the same comment but enclosed within \<open>(**\<dots>**)\<close> in the output string. To enclose a string, you can use the function @{ML \<open>enclose s1 s2 s\<close> for s1 s2 s} which produces the string @{ML \<open>s1 ^ s ^ s2\<close> for s1 s2 s}. Hint: To simplify the task ignore the proper nesting of comments. \end{exercise}\<close>section \<open>Parsing Theory Syntax\<close>text \<open> Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. These token parsers have the type:\<close>ML %grayML\<open>type 'a parser = Token.T list -> 'a * Token.T list\<close>ML "Thy_Header.get_keywords'"text \<open> {\bf REDO!!} The reason for using token parsers is that theory syntax, as well as the parsers for the arguments of proof methods, use the type @{ML_type Token.T}. \begin{readmore} The parser functions for the theory syntax are contained in the structure @{ML_structure Parse} defined in the file @{ML_file "Pure/Isar/parse.ML"}. The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}. \end{readmore} The structure @{ML_structure Token} defines several kinds of tokens (for example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in Token} for keywords and @{ML_ind Command in Token} for commands). Some token parsers take into account the kind of tokens. The first example shows how to generate a token list out of a string using the function @{ML_ind explode in Token}. It is given the argument @{ML \<open>Position.none\<close>} since, at the moment, we are not interested in generating precise error messages. The following code@{ML_response [display,gray] \<open>Token.explode (Thy_Header.get_keywords' @{context}) Position.none "hello world"\<close>\<open>[Token (\<dots>(Ident, "hello"),\<dots>), Token (\<dots>(Space, " "),\<dots>), Token (\<dots>(Ident, "world"),\<dots>)]\<close>} produces three tokens where the first and the last are identifiers, since @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any other syntactic category. The second indicates a space. We can easily change what is recognised as a keyword with the function @{ML_ind add_keywords in Thy_Header}. For example calling it with \<close>setup \<open>Thy_Header.add_keywords [(("hello", Position.none),Keyword.no_spec)]\<close>text \<open> then lexing @{text [quotes] "hello world"} will produce @{ML_response [display,gray] \<open>Token.explode (Thy_Header.get_keywords' @{context}) Position.none "hello world"\<close> \<open>[Token (\<dots>(Keyword, "hello"),\<dots>), Token (\<dots>(Space, " "),\<dots>), Token (\<dots>(Ident, "world"),\<dots>)]\<close>} Many parsing functions later on will require white space, comments and the like to have already been filtered out. So from now on we are going to use the functions @{ML filter} and @{ML_ind is_proper in Token} to do this. For example:@{ML_response [display,gray]\<open>let val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none "hello world"in filter Token.is_proper inputend\<close> \<open>[Token (\<dots>(Keyword, "hello"), \<dots>), Token (\<dots>(Ident, "world"), \<dots>)]\<close>} For convenience we define the function:\<close>ML %grayML\<open>fun filtered_input str = filter Token.is_proper (Token.explode (Thy_Header.get_keywords' @{context}) Position.none str)\<close>ML \<open>filtered_input "inductive | for"\<close>text \<open> If you now parse@{ML_response [display,gray] \<open>filtered_input "inductive | for"\<close> \<open>[Token (\<dots>(Command, "inductive"),\<dots>), Token (\<dots>(Keyword, "|"),\<dots>), Token (\<dots>(Keyword, "for"),\<dots>)]\<close>} you obtain a list consisting of only one command and two keyword tokens. If you want to see which keywords and commands are currently known to Isabelle, use the function @{ML_ind get_keywords' in Thy_Header}: You might have to adjust the \<open>ML_print_depth\<close> in order to see the complete list. The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example:@{ML_matchresult [display,gray]\<open>let val input1 = filtered_input "where for" val input2 = filtered_input "| in"in (Parse.$$$ "where" input1, Parse.$$$ "|" input2)end\<close>\<open>(("where",_), ("|",_))\<close>} Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}. For example: @{ML_matchresult [display,gray]\<open>let val p = Parse.reserved "bar" val input = filtered_input "bar"in p inputend\<close> \<open>("bar",[])\<close>} Like before, you can sequentially connect parsers with @{ML \<open>--\<close>}. For example: @{ML_matchresult [display,gray]\<open>let val input = filtered_input "| in"in (Parse.$$$ "|" -- Parse.$$$ "in") inputend\<close>\<open>(("|", "in"), [])\<close>} The parser @{ML \<open>Parse.enum s p\<close> for s p} parses a possibly empty list of items recognised by the parser \<open>p\<close>, where the items being parsed are separated by the string \<open>s\<close>. For example:@{ML_matchresult [display,gray]\<open>let val input = filtered_input "in | in | in foo"in (Parse.enum "|" (Parse.$$$ "in")) inputend\<close> \<open>(["in", "in", "in"], [_])\<close>} The function @{ML_ind enum1 in Parse} works similarly, except that the parsed list must be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the end of the parsed string, otherwise the parser would have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the wrapper @{ML Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML Token.stopper}. We can write:@{ML_matchresult [display,gray]\<open>let val input = filtered_input "in | in | in" val p = Parse.enum "|" (Parse.$$$ "in")in Scan.finite Token.stopper p inputend\<close> \<open>(["in", "in", "in"], [])\<close>} The following function will help to run examples.\<close>ML %grayML\<open>fun parse p input = Scan.finite Token.stopper (Scan.error p) input\<close>text \<open> The function @{ML_ind "!!!" in Parse} can be used to force termination of the parser in case of a dead end, just like @{ML \<open>Scan.!!\<close>} (see previous section). A difference, however, is that the error message of @{ML \<open>Parse.!!!\<close>} is fixed to be @{text [quotes] "Outer syntax error"} together with a relatively precise description of the failure. For example:@{ML_response [display,gray]\<open>let val input = filtered_input "in |" val parse_bar_then_in = Parse.$$$ "|" -- Parse.$$$ "in"in parse (Parse.!!! parse_bar_then_in) input end\<close>\<open>Outer syntax error: keyword "|" expected, but keyword in was found\<close>} \begin{exercise} (FIXME) A type-identifier, for example @{typ "'a"}, is a token of kind @{ML_ind Keyword in Token}. It can be parsed using the function @{ML type_ident in Parse}. \end{exercise} (FIXME: or give parser for numbers) Whenever there is a possibility that the processing of user input can fail, it is a good idea to give all available information about where the error occurred. For this Isabelle can attach positional information to tokens and then thread this information up the ``processing chain''. To see this, modify the function @{ML filtered_input}, described earlier, as follows \<close>ML %grayML\<open>fun filtered_input' str = filter Token.is_proper (Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 7) str)\<close>text \<open> where we pretend the parsed string starts on line 7. An example is@{ML_response [display,gray]\<open>filtered_input' "foo \n bar"\<close>\<open>[Token (("foo", ({line=7, offset=1, end_offset=4}, {line=7, offset=4})), (Ident, "foo"),\<dots>), Token (("bar", ({line=8, offset=7, end_offset=10}, {line=8, offset=10})), (Ident, "bar"),\<dots>)]\<close>} in which the @{text [quotes] "\\n"} causes the second token to be in line 8. By using the parser @{ML position in Parse} you can access the token position and return it as part of the parser result. For example@{ML_response [display,gray]\<open>let val input = filtered_input' "where"in parse (Parse.position (Parse.$$$ "where")) inputend\<close>\<open>(("where", {line=7, offset=1, end_offset=6}), [])\<close>} \begin{readmore} The functions related to positions are implemented in the file @{ML_file "Pure/General/position.ML"}. \end{readmore} \begin{exercise}\label{ex:contextfree} Write a parser for the context-free grammar representing arithmetic expressions with addition and multiplication. As usual, multiplication binds stronger than addition, and both of them nest to the right. The context-free grammar is defined as: \begin{center} \begin{tabular}{lcl} \<open><Basic>\<close> & \<open>::=\<close> & \<open><Number> | (<Expr>)\<close>\\ \<open><Factor>\<close> & \<open>::=\<close> & \<open><Basic> * <Factor> | <Basic>\<close>\\ \<open><Expr>\<close> & \<open>::=\<close> & \<open><Factor> + <Expr> | <Factor>\<close>\\ \end{tabular} \end{center} Hint: Be careful with recursive parsers. \end{exercise}\<close>section \<open>Parsers for ML-Code (TBD)\<close>text \<open> @{ML_ind ML_source in Parse}\<close>section \<open>Context Parser (TBD)\<close>text \<open> @{ML_ind Args.context}\<close>(*ML {*let val parser = Args.context -- Scan.lift Args.name_inner_syntax fun term_pat (ctxt, str) = str |> Syntax.read_prop ctxtin (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)") |> fstend*}*)text \<open> @{ML_ind Args.context} Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.\<close>section \<open>Argument and Attribute Parsers (TBD)\<close>section \<open>Parsing Inner Syntax\<close>text \<open> There is usually no need to write your own parser for parsing inner syntax, that is for terms and types: you can just call the predefined parsers. Terms can be parsed using the function @{ML_ind term in Parse}. For example:@{ML_response [display,gray]\<open>let val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none "foo"in Parse.term inputend\<close>\<open>("<markup>", [])\<close>} The function @{ML_ind prop in Parse} is similar, except that it gives a different error message, when parsing fails. As you can see, the parser not just returns the parsed string, but also some markup information. You can decode the information with the function @{ML_ind parse in YXML} in @{ML_structure YXML}. The result of the decoding is an XML-tree. You can see better what is going on if you replace @{ML Position.none} by @{ML \<open>Position.line 42\<close>}, say:@{ML_response [display,gray]\<open>let val input = Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 42) "foo"in YXML.parse (fst (Parse.term input))end\<close>\<open><input delimited="false" line="42" offset="1" end_offset="4">foo</input>\<close>} The positional information is stored as part of an XML-tree so that code called later on will be able to give more precise error messages. \begin{readmore} The functions to do with input and output of XML and YXML are defined in @{ML_file "Pure/PIDE/xml.ML"} and @{ML_file "Pure/PIDE/yxml.ML"}. \end{readmore} FIXME: @{ML_ind parse_term in Syntax} @{ML_ind check_term in Syntax} @{ML_ind parse_typ in Syntax} @{ML_ind check_typ in Syntax} @{ML_ind read_term in Syntax} @{ML_ind read_term in Syntax}\<close>section \<open>Parsing Specifications\label{sec:parsingspecs}\<close>text \<open> There are a number of special purpose parsers that help with parsing specifications of function definitions, inductive predicates and so on. In Chapter~\ref{chp:package}, for example, we will need to parse specifications for inductive predicates of the form:\<close>simple_inductive even and oddwhere even0: "even 0"| evenS: "odd n \<Longrightarrow> even (Suc n)"| oddS: "even n \<Longrightarrow> odd (Suc n)"text \<open> For this we are going to use the parser:\<close>ML %linenosgray\<open>val spec_parser = Parse.vars -- Scan.optional (Parse.$$$ "where" |-- Parse.!!! (Parse.enum1 "|" (Parse_Spec.opt_thm_name ":" -- Parse.prop))) []\<close>text \<open> Note that the parser must not parse the keyword \simpleinductive, even if it is meant to process definitions as shown above. The parser of the keyword will be given by the infrastructure that will eventually call @{ML spec_parser}. To see what the parser returns, let us parse the string corresponding to the definition of @{term even} and @{term odd}:@{ML_matchresult [display,gray]\<open>let val input = filtered_input ("even and odd " ^ "where " ^ " even0[intro]: \"even 0\" " ^ "| evenS[intro]: \"odd n \<Longrightarrow> even (Suc n)\" " ^ "| oddS[intro]: \"even n \<Longrightarrow> odd (Suc n)\"")in parse spec_parser inputend\<close>\<open>(([(even, NONE, NoSyn), (odd, NONE, NoSyn)], [((even0,_),_), ((evenS,_),_), ((oddS,_),_)]), [])\<close>}\<close>text \<open> As you see, the result is a pair consisting of a list of variables with optional type-annotation and syntax-annotation, and a list of rules where every rule has optionally a name and an attribute. The function @{ML_ind "vars" in Parse} in Line 2 of the parser reads an \isacommand{and}-separated list of variables that can include optional type annotations and syntax translations. For example:\footnote{Note that in the code we need to write \<open>"int \<Rightarrow> bool"\<close> in order to properly escape the double quotes in the compound type.}@{ML_response [display,gray]\<open>let val input = filtered_input "foo::\"int \<Rightarrow> bool\" and bar::nat (\"BAR\" 100) and blonk"in parse Parse.vars inputend\<close>\<open>([("foo", SOME \<dots>, NoSyn), ("bar", SOME \<dots>, Mixfix (Source {\<dots>text = "BAR"}, [], 100, \<dots>)), ("blonk", NONE, NoSyn)], [])\<close>} \<close>text \<open> Whenever types are given, they are stored in the @{ML SOME}s. The types are not yet used to type the variables: this must be done by type-inference later on. Since types are part of the inner syntax they are strings with some encoded information (see previous section). If a mixfix-syntax is present for a variable, then it is stored in the @{ML Mixfix} data structure; no syntax translation is indicated by @{ML_ind NoSyn in Syntax}. \begin{readmore} The data structure for mixfix annotations are implemented in @{ML_file "Pure/Syntax/mixfix.ML"} and @{ML_file "Pure/Syntax/syntax.ML"}. \end{readmore} Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a list of introduction rules, that is propositions with theorem annotations such as rule names and attributes. The introduction rules are propositions parsed by @{ML_ind prop in Parse}. However, they can include an optional theorem name plus some attributes. For example@{ML_matchresult [display,gray] \<open>let val input = filtered_input "foo_lemma[intro,dest!]:" val ((name, attrib), _) = parse (Parse_Spec.thm_name ":") input in (name, map Token.name_of_src attrib)end\<close> \<open>(foo_lemma, [("intro", _), ("dest", _)])\<close>} The function @{ML_ind opt_thm_name in Parse_Spec} is the ``optional'' variant of @{ML_ind thm_name in Parse_Spec}. Theorem names can contain attributes. The name has to end with @{text [quotes] ":"}---see the argument of the function @{ML Parse_Spec.opt_thm_name} in Line 7. \begin{readmore} Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} and @{ML_file "Pure/Isar/args.ML"}. \end{readmore}\<close>text_raw \<open> \begin{exercise} Have a look at how the parser @{ML Parse_Spec.where_multi_specs} is implemented in file @{ML_file "Pure/Isar/parse_spec.ML"}. This parser corresponds to the ``where-part'' of the introduction rules given above. Below we paraphrase the code of @{ML_ind where_multi_specs in Parse_Spec} adapted to our purposes. \begin{isabelle}\<close>ML %linenosgray\<open>val spec_parser' = Parse.vars -- Scan.optional (Parse.$$$ "where" |-- Parse.!!! (Parse.enum1 "|" ((Parse_Spec.opt_thm_name ":" -- Parse.prop) --| Scan.option (Scan.ahead (Parse.name || Parse.$$$ "[") -- Parse.!!! (Parse.$$$ "|"))))) []\<close>text_raw \<open> \end{isabelle} Both parsers accept the same input% that's not true: % spec_parser accepts input that is refuted by spec_parser' , but if you look closely, you can notice an additional ``tail'' (Lines 8 to 10) in @{ML spec_parser'}. What is the purpose of this additional ``tail''? \end{exercise}\<close>text \<open> (FIXME: @{ML Parse.type_args}, @{ML Parse.typ}, @{ML Parse.opt_mixfix})\<close>section \<open>New Commands\label{sec:newcommand}\<close>text \<open> Often new commands, for example for providing new definitional principles, need to be implemented. Let us start with a ``silly'' command that does nothing at all. We shall name this command \isacommand{foobar}. Before you can implement any new command, you have to ``announce'' it in the \isacommand{keywords}-section of your theory header. For \isacommand{foobar} we need to write something like \begin{graybox} \isacommand{theory}~\<open>Foo\<close>\\ \isacommand{imports}~\<open>Main\<close>\\ \isacommand{keywords} @{text [quotes] "foobar"} \<open>::\<close> \<open>thy_decl\<close>\\ ... \end{graybox} where @{ML_ind "thy_decl" in Keyword} indicates the kind of the command. Another possible kind is \<open>thy_goal\<close>, or you can also omit the kind entirely, in which case you declare a keyword (something that is part of a command). Now you can implement \isacommand{foobar} as follows.\<close>ML %grayML\<open>let val do_nothing = Scan.succeed (Local_Theory.background_theory I)in Outer_Syntax.local_theory @{command_keyword "foobar"} "description of foobar" do_nothingend\<close>text \<open> The crucial function @{ML_ind local_theory in Outer_Syntax} expects the name for the command, a kind indicator, a short description and a parser producing a local theory transition (explained later). For the name and the kind, you can use the ML-antiquotation \<open>@{command_spec ...}\<close>. You can now write in your theory \<close>foobartext \<open> but of course you will not see anything since \isacommand{foobar} is not intended to do anything. As it stands, the command \isacommand{foobar} is not very useful. Let us refine it a bit next by letting it take a proposition as argument and printing this proposition inside the tracing buffer. We announce the command \isacommand{foobar\_trace} in the theory header as \begin{graybox} \isacommand{keywords} @{text [quotes] "foobar_trace"} \<open>::\<close> \<open>thy_decl\<close> \end{graybox} The crucial part of a command is the function that determines the behaviour of the command. In the code above we used a ``do-nothing''-function, which because of the parser @{ML_ind succeed in Scan} does not parse any argument, but immediately returns the simple function @{ML \<open>Local_Theory.background_theory I\<close>}. We can replace this code by a function that first parses a proposition (using the parser @{ML Parse.prop}), then prints out some tracing information (using the function \<open>trace_prop\<close>) and finally does nothing. For this you can write:\<close>ML %grayML\<open>let fun trace_prop str = Local_Theory.background_theory (fn ctxt => (tracing str; ctxt))in Outer_Syntax.local_theory @{command_keyword "foobar_trace"} "traces a proposition" (Parse.prop >> trace_prop)end\<close>text \<open> This command can now be used to see the proposition in the tracing buffer.\<close>foobar_trace "True \<and> False"text \<open> Note that so far we used @{ML_ind thy_decl in Keyword} as the kind indicator for the new command. This means that the command finishes as soon as the arguments are processed. Examples of this kind of commands are \isacommand{definition} and \isacommand{declare}. In other cases, commands are expected to parse some arguments, for example a proposition, and then ``open up'' a proof in order to prove the proposition (for example \isacommand{lemma}) or prove some other properties (for example \isacommand{function}). To achieve this kind of behaviour, you have to use the kind indicator @{ML_ind thy_goal in Keyword} and the function @{ML \<open>local_theory_to_proof\<close> in Outer_Syntax} to set up the command. Below we show the command \isacommand{foobar\_goal} which takes a proposition as argument and then starts a proof in order to prove it. Therefore, we need to announce this command in the header as \<open>thy_goal\<close>. \begin{graybox} \isacommand{keywords} @{text [quotes] "foobar_goal"} \<open>::\<close> \<open>thy_goal\<close> \end{graybox} Then we can write:\<close>ML%linenosgray\<open>let fun goal_prop str ctxt = let val prop = Syntax.read_prop ctxt str in Proof.theorem NONE (K I) [[(prop, [])]] ctxt endin Outer_Syntax.local_theory_to_proof @{command_keyword "foobar_goal"} "proves a proposition" (Parse.prop >> goal_prop)end\<close>text \<open> The function \<open>goal_prop\<close> in Lines 2 to 7 takes a string (the proposition to be proved) and a context as argument. The context is necessary in order to be able to use @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition. In Line 6 the function @{ML_ind theorem in Proof} starts the proof for the proposition. Its argument @{ML NONE} stands for a locale (which we chose to omit); the argument @{ML \<open>(K I)\<close>} stands for a function that determines what should be done with the theorem once it is proved (we chose to just forget about it). If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"}, you obtain the following proof state:\<close>foobar_goal "True \<and> True"txt \<open> \begin{minipage}{\textwidth} @{subgoals [display]} \end{minipage}\medskip and can prove the proposition as follows.\<close>apply(rule conjI)apply(rule TrueI)+donetext \<open> The last command we describe here is \isacommand{foobar\_proof}. Like \isacommand{foobar\_goal}, its purpose is to take a proposition and open a corresponding proof-state that allows us to give a proof for it. However, unlike \isacommand{foobar\_goal}, the proposition will be given as a ML-value. Such a command is quite useful during development when you generate a goal on the ML-level and want to see whether it is provable. In addition we want to allow the proved proposition to have a name that can be referenced later on. The first problem for \isacommand{foobar\_proof} is to parse some text as ML-source and then interpret it as an Isabelle term using the ML-runtime. For the parsing part, we can use the function @{ML_ind "ML_source" in Parse} in the structure @{ML_structure Parse}. For running the ML-interpreter we need the following scaffolding code.\<close>ML %grayML\<open>structure Result = Proof_Data (type T = unit -> term fun init thy () = error "Result")val result_cookie = (Result.get, Result.put, "Result.put")\<close>text \<open> With this in place, we can implement the code for \isacommand{foobar\_prove} as follows.\<close>ML %linenosgray\<open>let fun after_qed thm_name thms lthy = Local_Theory.note (thm_name, (flat thms)) lthy |> snd fun setup_proof (thm_name, src) lthy = let val (text, _) = Input.source_content src val trm = Code_Runtime.value lthy result_cookie ("", text) in Proof.theorem NONE (after_qed thm_name) [[(trm, [])]] lthy end val parser = Parse_Spec.opt_thm_name ":" -- Parse.ML_sourcein Outer_Syntax.local_theory_to_proof @{command_keyword "foobar_prove"} "proving a proposition" (parser >> setup_proof)end\<close>text \<open> In Line 13, we implement a parser that first reads in an optional lemma name (terminated by ``:'') and then some ML-code. The function in Lines 5 to 10 takes the ML-text and lets the ML-runtime evaluate it using the function @{ML_ind value in Code_Runtime} in the structure @{ML_structure Code_Runtime}. Once the ML-text has been turned into a term, the function @{ML theorem in Proof} opens a corresponding proof-state. This function takes the function \<open>after_qed\<close> as argument, whose purpose is to store the theorem (once it is proven) under the given name \<open>thm_name\<close>. You can now define a term, for example\<close>ML %grayML\<open>val prop_true = @{prop "True"}\<close>text \<open> and give it a proof using \isacommand{foobar\_prove}:\<close>foobar_prove test: prop_trueapply(rule TrueI)donetext \<open> Finally you can test whether the lemma has been stored under the given name. \begin{isabelle} \isacommand{thm}~\<open>test\<close>\\ \<open>> \<close>~@{thm TrueI} \end{isabelle}\<close>(*text {* {\bf TBD below} *}*)(*ML {*structure TacticData = ProofDataFun( type T = thm list -> tactic; fun init _ = undefined;)val set_tactic = TacticData.put;*}ML {* TacticData.get @{context}*}ML {* Method.set_tactic *}ML {* fun tactic (facts: thm list) : tactic = (atac 1) *}ML {* Context.map_proof *}ML {* ML_Context.expression *}ML {* METHOD *}ML {* fun myexpression pos bind body txt =let val _ = tracing ("bind)" ^ bind) val _ = tracing ("body)" ^ body) val _ = tracing ("txt)" ^ txt) val _ = tracing ("result) " ^ "Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^ " end (ML_Context.the_generic_context ())));")in ML_Context.exec (fn () => ML_Context.eval false pos ("Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^ " end (ML_Context.the_generic_context ())));"))end*}ML {*fun ml_tactic (txt, pos) ctxt =let val ctxt' = ctxt |> Context.proof_map (myexpression pos "fun tactic (facts: thm list) : tactic" "Context.map_proof (Method.set_tactic tactic)" txt);in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt')end;*}ML {*fun tactic3 (txt, pos) ctxt = let val _ = tracing ("1) " ^ txt ) in METHOD (ml_tactic (txt, pos) ctxt; K (atac 1)) end*}setup {*Method.setup (Binding.name "tactic3") (Scan.lift (Parse.position Args.name) >> tactic3) "ML tactic as proof method"*}lemma "A \<Longrightarrow> A"apply(tactic3 {* (atac 1) *})doneML {* (ML_Context.the_generic_context ())*}ML {*Context.set_thread_data;ML_Context.the_generic_context*}lemma "A \<Longrightarrow> A"ML_prf {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}ML {*Context.set_thread_data (SOME ((let fun tactic (facts: thm list) : tactic = (atac 1) in 3 end) (ML_Context.the_generic_context ())));*}ML {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}ML {*let fun tactic (facts: thm list) : tactic = atacin Context.map_proof (Method.set_tactic tactic)end *}end *}ML {* Toplevel.program (fn () => (ML_Context.expression Position.none "val plus : int" "3 + 4" "1" (Context.Proof @{context})))*}ML {*fun ml_tactic (txt, pos) ctxt = let val ctxt' = ctxt |> Context.proof_map (ML_Context.expression pos "fun tactic (facts: thm list) : tactic" "Context.map_proof (Method.set_tactic tactic)" txt); in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt') end;*}ML {*Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));*}*)section \<open>Methods (TBD)\<close>text \<open> (FIXME: maybe move to after the tactic section) Methods are central to Isabelle. You use them, for example, in \isacommand{apply}. To print out all currently known methods you can use the Isabelle command: \begin{isabelle} \isacommand{print\_methods}\\ \<open>> methods:\<close>\\ \<open>> -: do nothing (insert current facts only)\<close>\\ \<open>> HOL.default: apply some intro/elim rule (potentially classical)\<close>\\ \<open>> ...\<close> \end{isabelle} An example of a very simple method is:\<close>method_setup %gray foo = \<open>Scan.succeed (fn ctxt => (SIMPLE_METHOD ((eresolve_tac ctxt [@{thm conjE}] THEN' resolve_tac ctxt [@{thm conjI}]) 1)))\<close> "foo method for conjE and conjI"text \<open> It defines the method \<open>foo\<close>, which takes no arguments (therefore the parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which applies @{thm [source] conjE} and then @{thm [source] conjI}. The function @{ML_ind SIMPLE_METHOD in Method} turns such a tactic into a method. The method \<open>foo\<close> can be used as follows\<close>lemma shows "A \<and> B \<Longrightarrow> C \<and> D"apply(foo)txt \<open> where it results in the goal state \begin{minipage}{\textwidth} @{subgoals} \end{minipage}\<close>(*<*)oops(*>*)method_setup test = \<open> Scan.lift (Scan.succeed (K Method.succeed))\<close> \<open>bla\<close>lemma "True"apply(test)oopsmethod_setup joker = \<open> Scan.lift (Scan.succeed (fn ctxt => Method.cheating true))\<close> \<open>bla\<close>lemma "False"apply(joker)oopstext \<open>if true is set then always works\<close>ML \<open>assume_tac\<close> method_setup first_atac = \<open>Scan.lift (Scan.succeed (fn ctxt => (SIMPLE_METHOD (assume_tac ctxt 1))))\<close> \<open>bla\<close>ML \<open>HEADGOAL\<close>lemma "A \<Longrightarrow> A"apply(first_atac)oopsmethod_setup my_atac = \<open>Scan.lift (Scan.succeed (fn ctxt => (SIMPLE_METHOD' (assume_tac ctxt))))\<close> \<open>bla\<close>lemma "A \<Longrightarrow> A"apply(my_atac)oops(*ML {* SIMPLE_METHOD *}ML {* METHOD *}ML {* K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1)) *}ML {* Scan.succeed *}*)ML \<open>resolve_tac\<close>method_setup myrule = \<open>Scan.lift (Scan.succeed (fn ctxt => (METHOD (fn thms => resolve_tac ctxt thms 1))))\<close> \<open>bla\<close>lemma assumes a: "A \<Longrightarrow> B \<Longrightarrow> C" shows "C"using aapply(myrule)oopstext \<open> (********************************************************) (FIXME: explain a version of rule-tac)\<close>end