theory Parsing
imports Base "Helper/Command/Command" "Package/Simple_Inductive_Package"
begin

(*<*)
setup {*
open_file_with_prelude 
  "Parsing_Code.thy"
  ["theory Parsing", 
   "imports Base \"Package/Simple_Inductive_Package\"", 
   "begin"]
*}
(*>*)

chapter {* Parsing\label{chp:parsing} *}

text {*
\begin{flushright}
{\em An important principle underlying the success and popularity of Unix\\ is
the philosophy of building on the work of others.} \\[1ex]
Linus Torvalds in the email exchange\\ with Andrew S.~Tanenbaum
\end{flushright}


Isabelle distinguishes between \emph{outer} and \emph{inner}
syntax. Commands, such as \isacommand{definition}, \isacommand{inductive}
and so on, belong to the outer syntax, whereas terms, types and so on belong
to the inner syntax. For parsing inner syntax, Isabelle uses a rather
general and sophisticated algorithm, which is driven by priority
grammars. Parsers for outer syntax are built up by functional parsing
combinators. These combinators are a well-established technique for parsing,
which has, for example, been described in Paulson's classic ML-book
\cite{paulson-ml2}. Isabelle developers are usually concerned with writing
these outer syntax parsers, either for new definitional packages or for
calling methods with specific arguments.

\begin{readmore}
The library for writing parser combinators is split up, roughly, into two
parts: The first part consists of a collection of generic parser combinators
defined in the structure @{ML_struct Scan} in the file @{ML_file
"Pure/General/scan.ML"}. The second part of the library consists of
combinators for dealing with specific token types, which are defined in the
structure @{ML_struct Parse} in the file @{ML_file
"Pure/Isar/parse.ML"}. In addition specific parsers for packages are 
defined in @{ML_file "Pure/Isar/parse_spec.ML"}. Parsers for method arguments 
are defined in @{ML_file "Pure/Isar/args.ML"}.
\end{readmore}

*}

section {* Building Generic Parsers *}

text {*

Let us first have a look at parsing strings using generic parsing
combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will
``consume'' this string from a given input list of strings. ``Consume'' in
this context means that it will return a pair consisting of this string and
the rest of the input list. For example:

@{ML_response [display,gray] 
"($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

@{ML_response [display,gray] 
"($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}

The function @{ML "$$"} will either succeed (as in the two examples above)
or raise the exception @{text "FAIL"} if no string can be consumed. For
example trying to parse

@{ML_response_fake [display,gray] 
"($$ \"x\") (Symbol.explode \"world\")" 
"Exception FAIL raised"}

will raise the exception @{text "FAIL"}. There are three exceptions used in
the parsing combinators:

\begin{itemize}
\item @{text "FAIL"} is used to indicate that alternative routes of parsing 
might be explored. 
\item @{text "MORE"} indicates that there is not enough input for the parser. For example 
in @{text "($$ \"h\") []"}.
\item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
It is used for example in the function @{ML "!!"} (see below).
\end{itemize}

However, note that these exceptions are private to the parser and cannot be accessed
by the programmer (for example to handle them).
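
To illustrate the @{text "MORE"} case from the list above, here is a sketch 
in the style of the @{text "FAIL"} example, running @{ML "$$"} on an empty 
input list:

@{ML_response_fake [display,gray]
"($$ \"h\") []"
"Exception MORE raised"}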

In the examples above we use the function @{ML_ind explode in Symbol} from the
structure @{ML_struct Symbol}, instead of the more standard library function
@{ML_ind explode in String}, for obtaining an input list for the parser. The reason is
that @{ML explode in Symbol} is aware of character
sequences, for example @{text "\<foo>"}, that have a special meaning in
Isabelle. To see the difference consider

@{ML_response_fake [display,gray]
"let 
  val input = \"\<foo> bar\"
in
  (String.explode input, Symbol.explode input)
end"
"([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}

Slightly more general than the parser @{ML "$$"} is the function 
@{ML_ind one in Scan}, in that it takes a predicate as argument and 
then parses exactly one item from the input list satisfying this predicate. 
For example the following parser either consumes an @{text [quotes] "h"} 
or a @{text [quotes] "w"}:

@{ML_response [display,gray] 
"let 
  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"
in
  (hw input1, hw input2)
end"
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. 
For example, parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
order) can be achieved by:

@{ML_response [display,gray] 
"($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
"(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

Note how the result of consumed strings builds up on the left as nested pairs.

If, as in the previous example, you want to parse a particular string, 
then you can use the function @{ML_ind this_string in Scan}.

@{ML_response [display,gray] 
"Scan.this_string \"hell\" (Symbol.explode \"hello\")"
"(\"hell\", [\"o\"])"}

Parsers that explore alternatives can be constructed using the function 
@{ML_ind "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the
result of @{text "p"}, in case it succeeds; otherwise it returns the
result of @{text "q"}. For example:

@{ML_response [display,gray] 
"let 
  val hw = $$ \"h\" || $$ \"w\"
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"
in
  (hw input1, hw input2)
end"
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing
function for parsers, except that they discard the item parsed by the
first (respectively second) parser. That means the item being dropped is the 
one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away from.
For example:

@{ML_response [display,gray]
"let 
  val just_e = $$ \"h\" |-- $$ \"e\" 
  val just_h = $$ \"h\" --| $$ \"e\" 
  val input = Symbol.explode \"hello\" 
in 
  (just_e input, just_h input)
end"
"((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}

The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
@{text "p"}, if it succeeds; otherwise it returns 
the default value @{text "x"}. For example:

@{ML_response [display,gray]
"let 
  val p = Scan.optional ($$ \"h\") \"x\"
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\" 
in 
  (p input1, p input2)
end" 
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

The function @{ML_ind option in Scan} works similarly, except no default value can
be given. Instead, the result is wrapped as an @{text "option"}-type. For example:

@{ML_response [display,gray]
"let 
  val p = Scan.option ($$ \"h\")
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\" 
in 
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

The function @{ML_ind ahead in Scan} parses some input, but leaves the original
input unchanged. For example:

@{ML_response [display,gray]
"Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
"(\"foo\", [\"f\", \"o\", \"o\"])"} 

The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages
during parsing. For example, if you want to parse @{text p} immediately 
followed by @{text q}, or start a completely different parser @{text r},
you might write:

@{ML [display,gray] "(p -- q) || r" for p q r}

However, this parser is problematic for producing a useful error
message if the parsing of @{ML "(p -- q)" for p q} fails, because you lose
the information that @{text p} should be followed by @{text q}.
To see this, assume that @{text p} is present in the input, but it is not
followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and
hence the alternative parser @{text r} will be tried. However, in many
circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something''
and therefore will also fail. The error message is then caused by the failure
of @{text r}, not by the absence of @{text q} in the input. This kind of
situation can be avoided by using the function @{ML "!!"}. This function
aborts the whole process of parsing in case of a failure and prints an error
message. For example, if you invoke the parser

@{ML [display,gray] "!! (fn _ => fn _ => \"foo\") ($$ \"h\")"}
*}
text {*
on @{text [quotes] "hello"}, the parsing succeeds

@{ML_response [display,gray] 
"(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" 
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

but if you invoke it on @{text [quotes] "world"}

@{ML_response_fake [display,gray] 
"(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
"Exception ABORT raised"}

then the parsing aborts and the error message @{text "foo"} is printed. In order to
see the error message properly, you need to prefix the parser with the function 
@{ML_ind error in Scan}. For example:

@{ML_response_fake [display,gray] 
"Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\"))"
"Exception Error \"foo\" raised"}

This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in Outer_Syntax} 
(see Section~\ref{sec:newcommand} which explains this function in more detail). 

Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
r}. If you want to generate the correct error message for failure
of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
*}

ML{*fun p_followed_by_q p q r =
let 
  val err_msg = fn _ => p ^ " is not followed by " ^ q
in
  ($$ p -- (!! (fn _ => err_msg) ($$ q))) || ($$ r -- $$ r)
end *}

text {*
Running this parser with the arguments
@{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and 
the input @{text [quotes] "holle"} 

@{ML_response_fake [display,gray] 
"Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")"
"Exception ERROR \"h is not followed by e\" raised"} 

produces the correct error message. Running it with

@{ML_response [display,gray] 
"Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")"
"((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}

yields the expected parsing. 

The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
often as it succeeds. For example:

@{ML_response [display,gray] 
"Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
"([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}

Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function
@{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
succeeds at least once.
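
For example, a sketch of the difference: while @{ML "Scan.repeat ($$ \"h\")"} 
succeeds on the input @{text [quotes] "world"} with an empty list of parsed 
items, @{ML "Scan.repeat1 ($$ \"h\")"} fails on it.

@{ML_response_fake [display,gray]
"Scan.repeat1 ($$ \"h\") (Symbol.explode \"world\")"
"Exception FAIL raised"}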

Also note that the parser would have aborted with the exception @{text MORE}, if
you had run it with the string @{text [quotes] "hhhh"}. This can be avoided by using
the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' 
@{ML_ind stopper in Symbol}. With them you can write:

@{ML_response [display,gray] 
"Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
"([\"h\", \"h\", \"h\", \"h\"], [])"}

The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
other stoppers need to be used when parsing, for example, tokens. However, this kind of 
manual wrapping is often already done by the surrounding infrastructure. 

The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any 
string as in

@{ML_response [display,gray] 
"let 
  val p = Scan.repeat (Scan.one Symbol.not_eof)
  val input = Symbol.explode \"foo bar foo\"
in
  Scan.finite Symbol.stopper p input
end" 
"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}

where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the 
end of the input string (i.e.~stopper symbol).

The function @{ML_ind unless in Scan} takes two parsers: if the first one can 
parse the input, then the whole parser fails; if not, then the second is tried. Therefore

@{ML_response_fake_both [display,gray] 
"Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
"Exception FAIL raised"}

fails, while

@{ML_response [display,gray] 
"Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
"(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}

succeeds. 

The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can 
be combined to read any input until a certain marker symbol is reached. In the 
example below the marker symbol is a @{text [quotes] "*"}.

@{ML_response [display,gray]
"let 
  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
  val input1 = Symbol.explode \"fooooo\"
  val input2 = Symbol.explode \"foo*ooo\"
in
  (Scan.finite Symbol.stopper p input1, 
   Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}

After parsing is done, you almost always want to apply a function to the parsed + −
items. One way to do this is the function @{ML_ind ">>" in Scan} where + −
@{ML "(p >> f)" for p f} runs + −
first the parser @{text p} and upon successful completion applies the + −
function @{text f} to the result. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
fun double (x, y) = (x ^ x, y ^ y) + −
val parser = $$ \"h\" -- $$ \"e\"+ −
in+ −
(parser >> double) (Symbol.explode \"hello\")+ −
end"+ −
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}+ −
+ −
doubles the two parsed input strings; or+ −
+ −
@{ML_response [display,gray] + −
"let + −
val p = Scan.repeat (Scan.one Symbol.not_eof)+ −
val input = Symbol.explode \"foo bar foo\" + −
in+ −
Scan.finite Symbol.stopper (p >> implode) input+ −
end" + −
"(\"foo bar foo\",[])"}+ −
+ −
where the single-character strings in the parsed output are transformed+ −
back into one string.+ −
+ −
The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies+ −
the given parser to the second component of the pair and leaves the first component + −
untouched. For example+ −
+ −
@{ML_response [display,gray]+ −
"Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"+ −
"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}+ −
+ −
\footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} + −
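
One situation where @{ML lift in Scan} is useful (a sketch, anticipating the
method section at the end of this chapter): parsers for method arguments
operate on pairs consisting of a context and a token list, so a plain token
parser has to be lifted before it can be used there, as in

@{ML [display,gray] "Scan.lift (Parse.position Args.name)"}

which is the form used in the \isacommand{method\_setup} examples later on.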

Be aware of recursive parsers. Suppose you want to read strings separated by
commas and by parentheses into a tree datastructure; for example, generating
the tree corresponding to the string @{text [quotes] "(A, A), (A, A)"} where
the @{text "A"}s will be the leaves. We assume the trees are represented by the
datatype:
*}

ML{*datatype tree = 
  Lf of string 
| Br of tree * tree*}

text {*
Since nested parentheses should be treated in a meaningful way---for example
the string @{text [quotes] "((A))"} should be read into a single 
leaf---you might implement the following parser.
*}

ML{*fun parse_basic s = 
  $$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")" 

and parse_tree s = 
  parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s*}

text {*
This parser corresponds to the grammar:

\begin{center}
\begin{tabular}{lcl}
@{text "<Basic>"} & @{text "::="} & @{text "<String> | (<Tree>)"}\\
@{text "<Tree>"} & @{text "::="} & @{text "<Basic>, <Tree> | <Basic>"}\\
\end{tabular}
\end{center}

The parameter @{text "s"} is the string over which the tree is parsed. The
parser @{ML parse_basic} reads either a leaf or a tree enclosed in
parentheses. The parser @{ML parse_tree} reads either a pair of trees
separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
because of the mutual recursion, this parser will immediately run into a
loop, even if it is called without any input. For example

@{ML_response_fake_both [display, gray]
"parse_tree \"A\""
"*** Exception- TOPLEVEL_ERROR raised"}

raises an exception indicating that the stack limit is reached. Such
looping parsers are not useful, because of ML's strict evaluation of
arguments. Therefore we need to delay the execution of the
parser until an input is given. This can be done by adding the parsed
string as an explicit argument. So the parser above should be implemented
as follows.
*}

ML{*fun parse_basic s xs = 
  ($$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")") xs 

and parse_tree s xs = 
  (parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s) xs*}

text {*
While the type of the parser is unchanged by the addition, its behaviour 
has changed: with this version of the parser the execution is delayed until 
some string is supplied for the argument @{text "xs"}. This gives us 
exactly the parser we wanted. An example is as follows:

@{ML_response [display, gray]
"let
  val input = Symbol.explode \"(A,((A))),A\"
in
  Scan.finite Symbol.stopper (parse_tree \"A\") input
end"
"(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"}


\begin{exercise}\label{ex:scancmts}
Write a parser that parses an input string so that any comment enclosed
within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within
@{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
"s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper 
nesting of comments.
\end{exercise}
*}

section {* Parsing Theory Syntax *}

text {*
Most of the time, however, Isabelle developers have to deal with parsing
tokens, not strings. These token parsers have the type:
*}

ML{*type 'a parser = Token.T list -> 'a * Token.T list*}

text {*
The reason for using token parsers is that theory syntax, as well as the
parsers for the arguments of proof methods, use the type @{ML_type
Token.T}.

\begin{readmore}
The parser functions for the theory syntax are contained in the structure
@{ML_struct Parse} defined in the file @{ML_file "Pure/Isar/parse.ML"}.
The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}.
\end{readmore}

The structure @{ML_struct Token} defines several kinds of tokens (for
example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in
Token} for keywords and @{ML_ind Command in Token} for commands). Some
token parsers take into account the kind of tokens. The first example shows
how to generate a token list out of a string using the function 
@{ML_ind scan in Outer_Syntax}. It is given the argument 
@{ML "Position.none"} since,
at the moment, we are not interested in generating precise error
messages. The following code

@{ML_response_fake [display,gray] 
"Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
other syntactic category. The second indicates a space.

We can easily change what is recognised as a keyword with the function
@{ML_ind keyword in Keyword}. For example calling it with 
*}

ML{*val _ = Keyword.keyword "hello"*}

text {*
then lexing @{text [quotes] "hello world"} will produce

@{ML_response_fake [display,gray] 
"Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Keyword, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

Many parsing functions later on will require white space, comments and the like
to have already been filtered out. So from now on we are going to use the 
functions @{ML filter} and @{ML_ind is_proper in Token} to do this. 
For example:

@{ML_response_fake [display,gray]
"let
  val input = Outer_Syntax.scan Position.none \"hello world\"
in
  filter Token.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

For convenience we define the function:
*}

ML{*fun filtered_input str = 
  filter Token.is_proper (Outer_Syntax.scan Position.none str) *}

text {* 
If you now parse

@{ML_response_fake [display,gray] 
"filtered_input \"inductive | for\"" 
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
 Token (\<dots>,(Keyword, \"|\"),\<dots>), 
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

you obtain a list consisting of only one command and two keyword tokens.
If you want to see which keywords and commands are currently known to Isabelle, 
use the function @{ML_ind get_lexicons in Keyword}:

@{ML_response_fake [display,gray] 
"let 
  val (keywords, commands) = Keyword.get_lexicons ()
in 
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end" 
"([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}

You might have to adjust the @{ML_ind print_depth} in order to
see the complete list.
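
For example, a sketch: evaluating @{ML "print_depth 200"} in an ML block 
raises the printing limit so that longer lists are shown in full.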

The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example:

@{ML_response [display,gray]
"let 
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in 
  (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>), (\"|\",\<dots>))"}

Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}.
For example:

@{ML_response [display,gray]
"let 
  val p = Parse.reserved \"bar\"
  val input = filtered_input \"bar\"
in
  p input
end"
"(\"bar\",[])"}

Like before, you can sequentially connect parsers with @{ML "--"}. For example: 

@{ML_response [display,gray]
"let 
  val input = filtered_input \"| in\"
in 
  (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input
end"
"((\"|\", \"in\"), [])"}

The parser @{ML "Parse.enum s p" for s p} parses a possibly empty 
list of items recognised by the parser @{text p}, where the items being parsed
are separated by the string @{text s}. For example:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in foo\"
in 
  (Parse.enum \"|\" (Parse.$$$ \"in\")) input
end" 
"([\"in\", \"in\", \"in\"], [\<dots>])"}

The function @{ML_ind enum1 in Parse} works similarly, except that the
parsed list must be non-empty. Note that we had to add a string @{text
[quotes] "foo"} at the end of the parsed string, otherwise the parser would
have consumed all tokens and then failed with the exception @{text
"MORE"}. Like in the previous section, we can avoid this exception using the
wrapper @{ML Scan.finite}. This time, however, we have to use the
``stopper-token'' @{ML Token.stopper}. We can write:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in\"
  val p = Parse.enum \"|\" (Parse.$$$ \"in\")
in 
  Scan.finite Token.stopper p input
end" 
"([\"in\", \"in\", \"in\"], [])"}

The following function will help to run examples.
*}

ML{*fun parse p input = Scan.finite Token.stopper (Scan.error p) input *}
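
text {*
A quick sketch of this helper in action, using @{ML filtered_input} from above:

@{ML_response [display,gray]
"parse (Parse.$$$ \"where\") (filtered_input \"where\")"
"(\"where\", [])"}
*}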

text {*
The function @{ML_ind "!!!" in Parse} can be used to force termination
of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
section). A difference, however, is that the error message of @{ML
"Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
together with a relatively precise description of the failure. For example:

@{ML_response_fake [display,gray]
"let 
  val input = filtered_input \"in |\"
  val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\"
in 
  parse (Parse.!!! parse_bar_then_in) input 
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
but keyword in was found\" raised"}

\begin{exercise} (FIXME)
A type-identifier, for example @{typ "'a"}, is a token of 
kind @{ML_ind Keyword in Token}. It can be parsed using 
the function @{ML type_ident in Parse}.
\end{exercise}

(FIXME: or give parser for numbers)
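
As a sketch of both of the above (assuming the @{ML parse} and 
@{ML filtered_input} helpers; a type-identifier can be parsed with 
@{ML type_ident in Parse} and a natural number with @{ML_ind nat in Parse}):

@{ML_response_fake [display,gray]
"(parse Parse.type_ident (filtered_input \"'a\"),
 parse Parse.nat (filtered_input \"42\"))"
"((\"'a\", []), (42, []))"}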

Whenever there is a possibility that the processing of user input can fail, 
it is a good idea to give all available information about where the error 
occurred. For this Isabelle can attach positional information to tokens
and then thread this information up the ``processing chain''. To see this,
modify the function @{ML filtered_input}, described earlier, as follows 
*}

ML{*fun filtered_input' str = 
  filter Token.is_proper (Outer_Syntax.scan (Position.line 7) str) *}

text {*
where we pretend the parsed string starts on line 7. An example is

@{ML_response_fake [display,gray]
"filtered_input' \"foo \\n bar\""
"[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>),
 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}

in which the @{text [quotes] "\\n"} causes the second token to be in 
line 8.

By using the parser @{ML position in Parse} you can access the token 
position and return it as part of the parser result. For example

@{ML_response_fake [display,gray]
"let
  val input = filtered_input' \"where\"
in 
  parse (Parse.position (Parse.$$$ \"where\")) input
end"
"((\"where\", {line=7, end_line=7}), [])"}

\begin{readmore}
The functions related to positions are implemented in the file
@{ML_file "Pure/General/position.ML"}.
\end{readmore}

\begin{exercise}\label{ex:contextfree}
Write a parser for the context-free grammar representing arithmetic 
expressions with addition and multiplication. As usual, multiplication 
binds stronger than addition, and both of them nest to the right. 
The context-free grammar is defined as:

\begin{center}
\begin{tabular}{lcl}
@{text "<Basic>"} & @{text "::="} & @{text "<Number> | (<Expr>)"}\\
@{text "<Factor>"} & @{text "::="} & @{text "<Basic> * <Factor> | <Basic>"}\\
@{text "<Expr>"} & @{text "::="} & @{text "<Factor> + <Expr> | <Factor>"}\\
\end{tabular}
\end{center}

Hint: Be careful with recursive parsers.
\end{exercise}
*}

section {* Parsers for ML-Code (TBD) *}

text {*
@{ML_ind ML_source in Parse}
*}

section {* Context Parser (TBD) *}

text {*
@{ML_ind Args.context}
*}
(*
ML {*
let
  val parser = Args.context -- Scan.lift Args.name_source

  fun term_pat (ctxt, str) =
    str |> Syntax.read_prop ctxt
in
  (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)")
  |> fst
end
*}
*)

text {*
@{ML_ind Args.context}

Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.
*}

section {* Argument and Attribute Parsers (TBD) *}

section {* Parsing Inner Syntax *}

text {*
There is usually no need to write your own parser for parsing inner syntax, that is 
for terms and types: you can just call the predefined parsers. Terms can 
be parsed using the function @{ML_ind term in Parse}. For example:

@{ML_response [display,gray]
"let 
  val input = Outer_Syntax.scan Position.none \"foo\"
in 
  Parse.term input
end"
"(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}

The function @{ML_ind prop in Parse} is similar, except that it gives a different
error message when parsing fails. As you can see, the parser does not just return 
the parsed string, but also some encoded information. You can decode the
information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. For example

@{ML_response_fake [display,gray]
"YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
"Text \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""}

The result of the decoding is an XML-tree. You can see better what is going on if
you replace @{ML Position.none} by @{ML "Position.line 42"}, say:

@{ML_response_fake [display,gray]
"let 
  val input = Outer_Syntax.scan (Position.line 42) \"foo\"
in 
  YXML.parse (fst (Parse.term input))
end"
"Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}

The positional information is stored as part of an XML-tree so that code 
called later on will be able to give more precise error messages. 

\begin{readmore}
The functions to do with input and output of XML and YXML are defined 
in @{ML_file "Pure/PIDE/xml.ML"} and @{ML_file "Pure/PIDE/yxml.ML"}.
\end{readmore}

FIXME:
@{ML_ind parse_term in Syntax} @{ML_ind check_term in Syntax}
@{ML_ind parse_typ in Syntax} @{ML_ind check_typ in Syntax}
@{ML_ind read_term in Syntax} @{ML_ind read_typ in Syntax}
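
A rough sketch of the difference between these functions (assuming a HOL
context): @{ML parse_term in Syntax} only parses and yields a term that is
not yet type-checked, while @{ML read_term in Syntax} combines parsing and
checking.

@{ML_response_fake [display,gray]
"Syntax.read_term @{context} \"True \<and> False\""
"Const (\"HOL.conj\", \<dots>) $ Const (\"HOL.True\", \<dots>) $ Const (\"HOL.False\", \<dots>)"}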

*}

section {* Parsing Specifications\label{sec:parsingspecs} *}

text {*
There are a number of special purpose parsers that help with parsing
specifications of function definitions, inductive predicates and so on. In
Chapter~\ref{chp:package}, for example, we will need to parse specifications
for inductive predicates of the form:
*}


simple_inductive
  even and odd
where
  even0: "even 0"
| evenS: "odd n \<Longrightarrow> even (Suc n)"
| oddS: "even n \<Longrightarrow> odd (Suc n)"

text {*
For this we are going to use the parser:
*}

ML %linenosgray{*val spec_parser = 
  Parse.fixes -- 
  Scan.optional 
    (Parse.$$$ "where" |--
       Parse.!!! 
         (Parse.enum1 "|" 
            (Parse_Spec.opt_thm_name ":" -- Parse.prop))) []*}

text {*
Note that the parser must not parse the keyword \simpleinductive, even if it is
meant to process definitions as shown above. The parser of the keyword 
will be given by the infrastructure that will eventually call @{ML spec_parser}.


To see what the parser returns, let us parse the string corresponding to the 
definition of @{term even} and @{term odd}:

@{ML_response [display,gray]
"let
  val input = filtered_input
     (\"even and odd \" ^ 
      \"where \" ^
      \"  even0[intro]: \\\"even 0\\\" \" ^ 
      \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ 
      \"| oddS[intro]: \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
in
  parse spec_parser input
end"
"(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
 [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),
  ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"),
  ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"}

As you see, the result is a pair consisting of a list of
variables with optional type-annotation and syntax-annotation, and a list of
rules where every rule has optionally a name and an attribute.

The function @{ML_ind "fixes" in Parse} in Line 2 of the parser reads an 
\isacommand{and}-separated 
list of variables that can include optional type annotations and syntax translations. 
For example:\footnote{Note that in the code we need to write 
@{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
in the compound type.}

@{ML_response [display,gray]
"let
  val input = filtered_input 
        \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
in
  parse Parse.fixes input
end"
"([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn), 
  (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
  (blonk, NONE, NoSyn)],[])"} 
*}

text {*
Whenever types are given, they are stored in the @{ML SOME}s. The types are
not yet used to type the variables: this must be done by type-inference later
on. Since types are part of the inner syntax they are strings with some
encoded information (see previous section). If a mixfix-syntax is
present for a variable, then it is stored in the 
@{ML Mixfix} data structure; no syntax translation is indicated by @{ML_ind NoSyn in Syntax}.

\begin{readmore}
The data structure for mixfix annotations is implemented in 
@{ML_file "Pure/Syntax/mixfix.ML"} and @{ML_file "Pure/Syntax/syntax.ML"}.
\end{readmore}

Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
list of introduction rules, that is propositions with theorem annotations
such as rule names and attributes. The introduction rules are propositions
parsed by @{ML_ind prop in Parse}. However, they can include an optional
theorem name plus some attributes. For example

@{ML_response [display,gray] 
"let 
  val input = filtered_input \"foo_lemma[intro,dest!]:\"
  val ((name, attrib), _) = parse (Parse_Spec.thm_name \":\") input 
in 
  (name, map Args.dest_src attrib)
end" 
"(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}

The function @{ML_ind opt_thm_name in Parse_Spec} is the ``optional'' variant of
@{ML_ind thm_name in Parse_Spec}. Theorem names can contain attributes. The name 
has to end with @{text [quotes] ":"}---see the argument of 
the function @{ML Parse_Spec.opt_thm_name} in Line 7.

\begin{readmore}
Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
and @{ML_file "Pure/Isar/args.ML"}.
\end{readmore}
*}

text_raw {*
\begin{exercise}
Have a look at how the parser @{ML Parse_Spec.where_alt_specs} is implemented
in file @{ML_file "Pure/Isar/parse_spec.ML"}. This parser corresponds
to the ``where-part'' of the introduction rules given above. Below
we paraphrase the code of @{ML_ind where_alt_specs in Parse_Spec} adapted to our
purposes. 
\begin{isabelle}
*}
ML %linenosgray{*val spec_parser' = 
  Parse.fixes -- 
  Scan.optional
    (Parse.$$$ "where" |-- 
       Parse.!!! 
         (Parse.enum1 "|" 
            ((Parse_Spec.opt_thm_name ":" -- Parse.prop) --| 
              Scan.option (Scan.ahead (Parse.name || 
              Parse.$$$ "[") -- 
              Parse.!!! (Parse.$$$ "|"))))) [] *}
text_raw {*
\end{isabelle}
Both parsers accept essentially the same input (in fact @{ML spec_parser}
accepts some input that is refuted by @{ML spec_parser'}), but if you look
closely, you can notice an additional ``tail'' (Lines 8 to 10) in @{ML
spec_parser'}. What is the purpose of this additional ``tail''?
\end{exercise}
*}

text {*
(FIXME: @{ML Parse.type_args}, @{ML Parse.typ}, @{ML Parse.opt_mixfix})
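
As a sketch of the first of these (assuming the @{ML parse} and 
@{ML filtered_input} helpers from above; @{ML type_args in Parse} reads a 
possibly empty list of type variables):

@{ML_response_fake [display,gray]
"parse Parse.type_args (filtered_input \"('a, 'b)\")"
"([\"'a\", \"'b\"], [])"}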
*}


section {* New Commands and Keyword Files\label{sec:newcommand} *}

text {*
Often new commands, for example for providing new definitional principles,
need to be implemented. While this is not difficult on the ML-level,
new commands, in order to be useful, need to be recognised by
ProofGeneral. This results in some subtle configuration issues, which we
will explain in this section.

To keep things simple, let us start with a ``silly'' command that does nothing 
at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
defined as:
*}

ML{*let
  val do_nothing = Scan.succeed (Local_Theory.background_theory I)
  val kind = Keyword.thy_decl
in
  Outer_Syntax.local_theory "foobar" "description of foobar" kind do_nothing
end *}

text {*
The crucial function @{ML_ind local_theory in Outer_Syntax} expects a name for the command, a
short description, a kind indicator (which we will explain later more thoroughly) and a
parser producing a local theory transition (its purpose will also be explained
later). 

While this is everything you have to do on the ML-level, you need a keyword
file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
recognise \isacommand{foobar} as a command. Such a keyword file can be
generated with the command-line:

@{text [display] "$ isabelle keywords -k foobar some_log_files"}

The option @{text "-k foobar"} indicates the postfix that will be assigned to 
the name of the keyword file. In the case above the file will be named @{text
"isar-keywords-foobar.el"}. This command requires log files to be
present (in order to extract the keywords from them). To generate these log
files, you first need to package the code above into a separate theory file named
@{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
complete code.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\begin{graybox}\small
\isacommand{theory}~@{text Command}\\
\isacommand{imports}~@{text Main}\\
\isacommand{begin}\\
\isacommand{ML}~@{text "\<verbopen>"}\\
@{ML
"let
  val do_nothing = Scan.succeed (Local_Theory.background_theory I)
  val kind = Keyword.thy_decl
in
  Outer_Syntax.local_theory \"foobar\" \"description of foobar\" kind do_nothing
end"}\\
@{text "\<verbclose>"}\\
\isacommand{end}
\end{graybox}
\caption{This file can be used to generate a log file. This log file in turn can
be used to generate a keyword file containing the command \isacommand{foobar}.
\label{fig:commandtheory}}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For our purposes it is sufficient to use the log files of the theories
@{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
the log file for the theory @{text "Command.thy"}, which contains the new
\isacommand{foobar}-command. If you target other logics besides HOL, such
as Nominal or ZF, then you need to adapt the log files appropriately.

@{text Pure} and @{text HOL} are usually compiled during the installation of
Isabelle. So log files for them should be already available. If not, then
they can be conveniently compiled with the help of the build-script from the Isabelle
distribution.

@{text [display] 
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}

The @{text "Pure-ProofGeneral"} theory needs to be compiled with:

@{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory 
with:

@{text [display] "$ isabelle mkdir FoobarCommand"}

This generates a directory containing the files: 

@{text [display] 
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
./FoobarCommand/document/root.tex"}


You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
and add the line 

@{text [display] "no_document use_thy \"Command\";"} 

to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:

@{text [display] "$ isabelle make"}

If the compilation succeeds, you have finally created all the necessary log files. 
They are stored in the directory 

@{text [display] "~/.isabelle/heaps/Isabelle2009/polyml-5.2.1_x86-linux/log"}

or something similar depending on your Isabelle distribution and architecture.
One quick way to assign a shell variable to this directory is by typing

@{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}

on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the 
directory should include the files:

@{text [display] 
"Pure.gz
HOL.gz
Pure-ProofGeneral.gz
HOL-FoobarCommand.gz"} 

From them you can create the keyword files. Assuming the name 
of the directory is in @{text "$ISABELLE_LOGS"},
then the Unix command for creating the keyword file is:

@{text [display]
"$ isabelle keywords -k foobar 
   $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}

The result is the file @{text "isar-keywords-foobar.el"}. It should contain
the string @{text "foobar"} twice.\footnote{To see whether things are fine,
check that @{text "grep foobar"} on this file returns something non-empty.}
This keyword file needs to be copied into the directory @{text
"~/.isabelle/etc"}. To make ProofGeneral aware of it, you have to start
Isabelle with the option @{text "-k foobar"}, that is:


@{text [display] "$ isabelle emacs -k foobar a_theory_file"}

If you now build a theory on top of @{text "Command.thy"}, 
then you can use the command \isacommand{foobar}. You can just write
*}

foobar

text {* 
but you will not see any action as we chose to implement this command to do
nothing. The point of this command is only to show the procedure of how
to interact with ProofGeneral. A similar procedure has to be done with any 
other new command, and also any new keyword that is introduced with 
the function @{ML_ind keyword in Keyword}. For example:
*}

ML{*val _ = Keyword.keyword "blink" *}

text {*
At the moment the command \isacommand{foobar} is not very useful. Let us
refine it a bit next by letting it take a proposition as argument and
printing this proposition inside the tracing buffer.

The crucial part of a command is the function that determines the behaviour
of the command. In the code above we used a ``do-nothing''-function, which
because of @{ML_ind succeed in Scan} does not parse any argument, but immediately
returns the simple function @{ML "Local_Theory.background_theory I"}. We can
replace this code by a function that first parses a proposition (using the
parser @{ML Parse.prop}), then prints out the tracing
information (using a new function @{text trace_prop}) and 
finally does nothing. For this you can write:
*}

ML{*let
  fun trace_prop str = 
    Local_Theory.background_theory (fn ctxt => (tracing str; ctxt))

  val kind = Keyword.thy_decl
in
  Outer_Syntax.local_theory "foobar_trace" "traces a proposition" 
    kind (Parse.prop >> trace_prop)
end *}

text {*
The command is now \isacommand{foobar\_trace} and can be used to 
see the proposition in the tracing buffer. 
*}

foobar_trace "True \<and> False"

text {*
Note that so far we used @{ML_ind thy_decl in Keyword} as the kind
indicator for the command. This means that the command finishes as soon as
the arguments are processed. Examples of this kind of commands are
\isacommand{definition} and \isacommand{declare}. In other cases, commands
are expected to parse some arguments, for example a proposition, and then
``open up'' a proof in order to prove the proposition (for example
\isacommand{lemma}) or prove some other properties (for example
\isacommand{function}). To achieve this kind of behaviour, you have to use
the kind indicator @{ML_ind thy_goal in Keyword} and the function @{ML
"local_theory_to_proof" in Outer_Syntax} to set up the command. Note,
however, that once you change the ``kind'' of a command from @{ML thy_decl in
Keyword} to @{ML thy_goal in Keyword}, the keyword file needs
to be re-created!

Below we show the command \isacommand{foobar\_goal} which takes a
proposition as argument and then starts a proof in order to prove
it. Therefore in Line 9, we set the kind indicator to @{ML thy_goal in
Keyword}.
*}

ML%linenosgray{*let
  fun goal_prop str lthy =
    let
      val prop = Syntax.read_prop lthy str
    in
      Proof.theorem NONE (K I) [[(prop, [])]] lthy
    end

  val kind = Keyword.thy_goal
in
  Outer_Syntax.local_theory_to_proof "foobar_goal" "proves a proposition" 
    kind (Parse.prop >> goal_prop)
end *}

text {*
The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be
proved) and a context as argument. The context is necessary in order to be able to use
@{ML_ind read_prop in Syntax}, which converts a string into a proper proposition.
In Line 6 the function @{ML_ind theorem in Proof} starts the proof for the
proposition. Its argument @{ML NONE} stands for a locale (which we chose to
omit); the argument @{ML "(K I)"} stands for a function that determines what
should be done with the theorem once it is proved (we chose to just forget
about it). Line 9 contains the parser for the proposition.

If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"},
you obtain the following proof state
*}

foobar_goal "True \<and> True"
txt {*
\begin{minipage}{\textwidth}
@{subgoals [display]}
\end{minipage}\medskip

and can prove the proposition as follows.
*}
apply(rule conjI)
apply(rule TrueI)+
done

text {*
{\bf TBD below}

(FIXME: read a name and show how to store theorems; see @{ML_ind note in Local_Theory})

*}

ML {*
structure Result = Proof_Data(
  type T = unit -> term
  fun init thy () = error "Result")

val result_cookie = (Result.get, Result.put, "Result.put")
*}

ML{*let
  fun after_qed thm_name thms lthy =
    Local_Theory.note (thm_name, (flat thms)) lthy |> snd

  fun setup_proof (thm_name, (txt, pos)) lthy =
    let
      val trm = Code_Runtime.value lthy result_cookie ("", txt)
    in
      Proof.theorem NONE (after_qed thm_name) [[(trm,[])]] lthy
    end

  val parser = Parse_Spec.opt_thm_name ":" -- Parse.ML_source
in
  Outer_Syntax.local_theory_to_proof "foobar_prove" "proving a proposition" 
    Keyword.thy_goal (parser >> setup_proof)
end*}

(*
foobar_prove test: {* @{prop "True"} *}
apply(rule TrueI)
done
*)

(*
ML {*
structure TacticData = ProofDataFun
(
  type T = thm list -> tactic;
  fun init _ = undefined;
)

val set_tactic = TacticData.put;
*}

ML {*
TacticData.get @{context}
*}

ML {* Method.set_tactic *}
ML {* fun tactic (facts: thm list) : tactic = (atac 1) *}
ML {* Context.map_proof *}
ML {* ML_Context.expression *}
ML {* METHOD *}


ML {* 
fun myexpression pos bind body txt =
let
  val _ = tracing ("bind)" ^ bind)
  val _ = tracing ("body)" ^ body)
  val _ = tracing ("txt)" ^ txt)
  val _ = tracing ("result) " ^ "Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^
    " end (ML_Context.the_generic_context ())));")
in
  ML_Context.exec (fn () => ML_Context.eval false pos
    ("Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^
     " end (ML_Context.the_generic_context ())));"))
end
*}


ML {*
fun ml_tactic (txt, pos) ctxt =
let
  val ctxt' = ctxt |> Context.proof_map
    (myexpression pos
      "fun tactic (facts: thm list) : tactic"
      "Context.map_proof (Method.set_tactic tactic)" txt);
in 
  Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt')
end;
*}

ML {*
fun tactic3 (txt, pos) ctxt = 
let
  val _ = tracing ("1) " ^ txt )
in 
  METHOD (ml_tactic (txt, pos) ctxt; K (atac 1))
end
*}

setup {*
Method.setup (Binding.name "tactic3") (Scan.lift (Parse.position Args.name) 
  >> tactic3)
  "ML tactic as proof method"
*}

lemma "A \<Longrightarrow> A"
apply(tactic3 {* (atac 1) *})
done

ML {* 
(ML_Context.the_generic_context ())
*}

ML {*
Context.set_thread_data;
ML_Context.the_generic_context
*}

lemma "A \<Longrightarrow> A"
ML_prf {*
Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));
*}

ML {*
Context.set_thread_data (SOME ((let fun tactic (facts: thm list) : tactic = (atac 1) in 3 end) (ML_Context.the_generic_context ())));
*}

ML {*
Context.set_thread_data (SOME (let 
  fun tactic (facts: thm list) : tactic = (atac 1) 
in 
  Context.map_proof (Method.set_tactic tactic) 
end 
(ML_Context.the_generic_context ())));
*}


ML {*
let 
  fun tactic (facts: thm list) : tactic = atac
in
  Context.map_proof (Method.set_tactic tactic)
end *}

end *}

ML {* Toplevel.program (fn () => 
  (ML_Context.expression Position.none "val plus : int" "3 + 4" "1" (Context.Proof @{context})))*}


ML {*
fun ml_tactic (txt, pos) ctxt =
let
  val ctxt' = ctxt |> Context.proof_map
    (ML_Context.expression pos
      "fun tactic (facts: thm list) : tactic"
      "Context.map_proof (Method.set_tactic tactic)" txt);
in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt') end;
*}

ML {*
Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic = (atac 1) in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));
*}
*)

section {* Methods (TBD) *}

text {*
(FIXME: maybe move to after the tactic section)

Methods are central to Isabelle. They are the ones you use for example
in \isacommand{apply}. To print out all currently known methods you can use the 
Isabelle command:

\begin{isabelle}
\isacommand{print\_methods}\\
@{text "> methods:"}\\
@{text "> -: do nothing (insert current facts only)"}\\
@{text "> HOL.default: apply some intro/elim rule (potentially classical)"}\\
@{text "> ..."}
\end{isabelle}

An example of a very simple method is:
*}

method_setup %gray foo = 
  {* Scan.succeed
      (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *}
  "foo method for conjE and conjI"

text {*
It defines the method @{text foo}, which takes no arguments (therefore the
parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
@{ML_ind SIMPLE_METHOD in Method}
turns such a tactic into a method. The method @{text "foo"} can be used as follows
*}

lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
apply(foo)
txt {*
where it results in the goal state

\begin{minipage}{\textwidth}
@{subgoals}
\end{minipage} *}
(*<*)oops(*>*)

method_setup test = {* Scan.lift (Scan.succeed (K Method.succeed)) *} {* bla *}

lemma "True"
apply(test)
oops

method_setup joker = {* Scan.lift (Scan.succeed (fn ctxt => Method.cheating true ctxt)) *} {* bla *}

lemma "False"
apply(joker)
oops

text {* if true is set then always works *}

ML {* atac *}

method_setup first_atac = {* Scan.lift (Scan.succeed (K (SIMPLE_METHOD (atac 1)))) *} {* bla *}

ML {* HEADGOAL *}

lemma "A \<Longrightarrow> A"
apply(first_atac)
oops

method_setup my_atac = {* Scan.lift (Scan.succeed (K (SIMPLE_METHOD' atac))) *} {* bla *}

lemma "A \<Longrightarrow> A"
apply(my_atac)
oops


(*
ML {* SIMPLE_METHOD *}
ML {* METHOD *}
ML {* K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1)) *}
ML {* Scan.succeed *}
*)

ML {* resolve_tac *}

method_setup myrule =
  {* Scan.lift (Scan.succeed (K (METHOD (fn thms => resolve_tac thms 1)))) *}
  {* bla *}

lemma
  assumes a: "A \<Longrightarrow> B \<Longrightarrow> C"
  shows "C"
using a
apply(myrule)
oops


text {*
(********************************************************)
(FIXME: explain a version of rule-tac)
*}

end