theory Parsing+ −
imports Base + −
+ −
begin+ −
+ −
+ −
chapter {* Parsing *}+ −
+ −
text {*+ −
+ −
Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
on, belong to the outer syntax, whereas items inside double quotation marks, such 
as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
Isabelle uses a rather general and sophisticated algorithm, due to Earley, which 
is driven by priority grammars. Parsers for outer syntax are built up from functional
parser combinators. These combinators are a well-established technique for parsing, 
which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
Isabelle developers are usually concerned with writing these outer syntax parsers, 
either for new definitional packages or for calling tactics with specific arguments. 
+ −
\begin{readmore}+ −
The library + −
for writing parser combinators is split up, roughly, into two parts. + −
The first part consists of a collection of generic parser combinators defined+ −
in the structure @{ML_struct Scan} in the file + −
@{ML_file "Pure/General/scan.ML"}. The second part of the library consists of + −
combinators for dealing with specific token types, which are defined in the + −
structure @{ML_struct OuterParse} in the file + −
@{ML_file "Pure/Isar/outer_parse.ML"}.+ −
\end{readmore}+ −
+ −
*}+ −
+ −
section {* Building Generic Parsers *}+ −
+ −
text {*+ −
+ −
Let us first have a look at parsing strings using generic parsing combinators. 
The function @{ML "(op $$)"} takes a string (of a single symbol) as argument and will 
``consume'' this string from a given input list of strings. ``Consume'' in this context 
means that it will return a pair consisting of this string and the rest of the input list. 
For example:
+ −
@{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}+ −
+ −
@{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}+ −
+ −
This function will either succeed (as in the two examples above) or raise the exception 
@{text "FAIL"} if no string can be consumed. For example, trying to parse
+ −
@{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" + −
"Exception FAIL raised"}+ −
+ −
will raise the exception @{text "FAIL"}.+ −
There are three exceptions used in the parsing combinators:+ −
+ −
\begin{itemize}+ −
\item @{text "FAIL"} is used to indicate that alternative routes of parsing + −
might be explored. + −
\item @{text "MORE"} indicates that there is not enough input for the parser. For example + −
in @{text "($$ \"h\") []"}.+ −
\item @{text "ABORT"} is the exception that is raised when a dead end is reached. + −
It is used for example in the function @{ML "(op !!)"} (see below).+ −
\end{itemize}+ −
+ −
However, note that these exceptions are private to the parser and cannot be accessed+ −
by the programmer (for example to handle them).+ −
+ −
Slightly more general than the parser @{ML "(op $$)"} is the function @{ML+ −
Scan.one}, in that it takes a predicate as argument and then parses exactly+ −
one item from the input list satisfying this predicate. For example the+ −
following parser either consumes an @{text [quotes] "h"} or a @{text+ −
[quotes] "w"}:+ −
+ −
+ −
@{ML_response [display,gray] + −
"let + −
val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")+ −
val input1 = (explode \"hello\")+ −
val input2 = (explode \"world\")+ −
in+ −
(hw input1, hw input2)+ −
end"+ −
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}+ −
+ −
Two parsers can be connected in sequence by using the function @{ML "(op --)"}. 
For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this + −
sequence can be achieved by+ −
+ −
@{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"+ −
"(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}+ −
+ −
Note how the result of consumed strings builds up on the left as nested pairs. + −
+ −
If, as in the previous example, one wants to parse a particular string, + −
then one should use the function @{ML Scan.this_string}:+ −
+ −
@{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"+ −
"(\"hell\", [\"o\"])"}+ −
+ −
Parsers that explore alternatives can be constructed using the function @{ML
"(op ||)"}. The parser @{ML "(p || q)" for p q} returns the
result of @{text "p"}, in case it succeeds, otherwise it returns the
result of @{text "q"}. For example
+ −
+ −
@{ML_response [display,gray] + −
"let + −
val hw = ($$ \"h\") || ($$ \"w\")+ −
val input1 = (explode \"hello\")+ −
val input2 = (explode \"world\")+ −
in+ −
(hw input1, hw input2)+ −
end"+ −
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}+ −
+ −
The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function + −
for parsers, except that they discard the item being parsed by the first (respectively second)+ −
parser. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val just_e = ($$ \"h\") |-- ($$ \"e\") + −
val just_h = ($$ \"h\") --| ($$ \"e\") + −
val input = (explode \"hello\") + −
in + −
(just_e input, just_h input)+ −
end"+ −
"((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}+ −
+ −
The parser @{ML "Scan.optional p x" for p x} returns the result of the parser + −
@{text "p"}, if it succeeds; otherwise it returns + −
the default value @{text "x"}. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val p = Scan.optional ($$ \"h\") \"x\"+ −
val input1 = (explode \"hello\")+ −
val input2 = (explode \"world\") + −
in + −
(p input1, p input2)+ −
end" + −
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}+ −
+ −
The function @{ML Scan.option} works similarly, except no default value can+ −
be given. Instead, the result is wrapped as an @{text "option"}-type. For example:+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val p = Scan.option ($$ \"h\")+ −
val input1 = (explode \"hello\")+ −
val input2 = (explode \"world\") + −
in + −
(p input1, p input2)+ −
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}+ −
+ −
The function @{ML "(op !!)"} helps to produce appropriate error messages+ −
during parsing. For example if one wants to parse that @{text p} is immediately + −
followed by @{text q}, or start a completely different parser @{text r},+ −
one might write+ −
+ −
@{ML [display,gray] "(p -- q) || r" for p q r}+ −
+ −
However, this parser is problematic for producing an appropriate error message, in case+ −
the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses the information + −
that @{text p} should be followed by @{text q}. To see this consider the case in+ −
which @{text p} + −
is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail + −
and the + −
alternative parser @{text r} will be tried. However in many circumstance this will be the wrong+ −
parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then + −
caused by the+ −
failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation+ −
can be avoided when using the function @{ML "(op !!)"}. This function aborts the whole process of+ −
parsing in case of a failure and prints an error message. For example if we invoke the parser+ −
+ −
@{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}+ −
+ −
on @{text [quotes] "hello"}, the parsing succeeds+ −
+ −
@{ML_response [display,gray] + −
"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" + −
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}+ −
+ −
but if we invoke it on @{text [quotes] "world"}+ −
+ −
@{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"+ −
"Exception ABORT raised"}+ −
+ −
then the parsing aborts with the error message @{text "foo"}. In order to
see this error message, we need to wrap the parser with the function 
@{ML "Scan.error"}. For example

@{ML_response_fake [display,gray] 
"Scan.error (!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
"Exception ERROR \"foo\" raised"}
+ −
This wrapping with @{ML "Scan.error"} is usually done by functions such as 
@{ML "OuterSyntax.command"} (FIXME: give reference to later place). 
+ −
Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want+ −
to generate the correct error message for p-followed-by-q, then+ −
we have to write:+ −
*}+ −
+ −
ML{*fun p_followed_by_q p q r =+ −
let + −
val err_msg = (fn _ => p ^ " is not followed by " ^ q)+ −
in+ −
($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)+ −
end *}+ −
+ −
+ −
text {*+ −
Running this parser with @{text [quotes] "h"}, @{text [quotes] "e"} and 
@{text [quotes] "w"}, and the input @{text [quotes] "holle"} 
+ −
@{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"+ −
"Exception ERROR \"h is not followed by e\" raised"} + −
+ −
produces the correct error message. Running it with+ −
+ −
@{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"+ −
"((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}+ −
+ −
yields the expected parsing. + −
+ −
The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as + −
often as it succeeds. For example+ −
+ −
@{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" + −
"([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}+ −
+ −
Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function+ −
@{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} + −
succeeds at least once.+ −
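
For example, a small sketch of @{ML Scan.repeat1} in action:

@{ML_response [display,gray] "Scan.repeat1 ($$ \"h\") (explode \"hello\")" 
"([\"h\"], [\"e\", \"l\", \"l\", \"o\"])"}

If the first item cannot be parsed, @{ML Scan.repeat1} fails by raising 
@{text "FAIL"}, whereas @{ML Scan.repeat} just returns the empty list.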
+ −
Also note that the parser would have raised the exception @{text MORE}, if
we had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With+ −
them we can write+ −
+ −
@{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" + −
"([\"h\", \"h\", \"h\", \"h\"], [])"}+ −
+ −
@{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
other stoppers need to be used when parsing tokens, for example. However, this kind of 
manual wrapping is often already done by the surrounding infrastructure. 
+ −
The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any + −
string as in+ −
+ −
@{ML_response [display,gray] + −
"let + −
val p = Scan.repeat (Scan.one Symbol.not_eof)+ −
val input = (explode \"foo bar foo\") + −
in+ −
Scan.finite Symbol.stopper p input+ −
end" + −
"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}+ −
+ −
where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
end of the input string (i.e.~the stopper symbol).
+ −
The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can + −
parse the input, then the whole parser fails; if not, then the second is tried. Therefore+ −
+ −
@{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"+ −
"Exception FAIL raised"}+ −
+ −
fails, while+ −
+ −
@{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"+ −
"(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}+ −
+ −
succeeds. + −
+ −
The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any+ −
input until a certain marker symbol is reached. In the example below the marker+ −
symbol is a @{text [quotes] "*"}.+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))+ −
val input1 = (explode \"fooooo\")+ −
val input2 = (explode \"foo*ooo\")+ −
in+ −
(Scan.finite Symbol.stopper p input1, + −
Scan.finite Symbol.stopper p input2)+ −
end"+ −
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),+ −
([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}+ −
+ −
After parsing is done, one nearly always wants to apply a function on the parsed + −
items. To do this the function @{ML "(p >> f)" for p f} can be employed, which runs + −
first the parser @{text p} and upon successful completion applies the + −
function @{text f} to the result. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
fun double (x,y) = (x^x,y^y) + −
in+ −
(($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")+ −
end"+ −
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}+ −
+ −
doubles the two parsed input strings. Or+ −
+ −
@{ML_response [display,gray] + −
"let + −
val p = Scan.repeat (Scan.one Symbol.not_eof) >> implode+ −
val input = (explode \"foo bar foo\") + −
in+ −
Scan.finite Symbol.stopper p input+ −
end" + −
"(\"foo bar foo\",[])"}+ −
+ −
where the single-character strings in the parsed output are transformed+ −
back into one string.+ −
+ −
\begin{exercise}\label{ex:scancmts}
Write a parser that parses an input string so that any comment enclosed
inside @{text "(*\<dots>*)"} is replaced by the same comment but enclosed inside
@{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
"s1^s^s2" for s1 s2 s}.
\end{exercise}
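
For example, with the bracketing strings needed in the exercise:

@{ML_response [display,gray] "enclose \"(**\" \"**)\" \"foo\"" "\"(**foo**)\""}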
+ −
+ −
The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies+ −
the given parser to the second component of the pair and leaves the first component + −
untouched. For example+ −
+ −
@{ML_response [display,gray]+ −
"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"+ −
"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}+ −
+ −
(FIXME: In which situations is this useful? Give examples.) One such
situation arises when the input of a parser is a pair of a context and a
token list (as, for instance, with the argument parsers in the structure
@{ML_struct Args}); there @{ML Scan.lift} turns a plain token-list parser
into one that leaves the context component untouched.
*}+ −
+ −
section {* Parsing Theory Syntax *}+ −
+ −
text {*+ −
Most of the time, however, Isabelle developers have to deal with parsing
tokens, not strings. This is because the parsers for the theory syntax, as
well as the parsers for the argument syntax of proof methods and attributes, 
use the type @{ML_type OuterLex.token} (which is identical to the type 
@{ML_type OuterParse.token}). There are also parsers for ML-expressions and ML-files.
+ −
\begin{readmore}+ −
The parser functions for the theory syntax are contained in the structure+ −
@{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.+ −
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.+ −
\end{readmore}+ −
+ −
The structure @{ML_struct OuterLex} defines several kinds of tokens (for example + −
@{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and+ −
@{ML "Command" in OuterLex} for commands). Some token parsers take into account the + −
kind of tokens.+ −
*} + −
+ −
text {*+ −
For the examples below, we can generate a token list out of a string using
the function @{ML "OuterSyntax.scan"}, to which we give @{ML
"Position.none"} as argument since, at the moment, we are not interested in
generating precise error messages. The following code
+ −
+ −
@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" + −
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), + −
Token (\<dots>,(Space, \" \"),\<dots>), + −
Token (\<dots>,(Ident, \"world\"),\<dots>)]"}+ −
+ −
produces three tokens where the first and the last are identifiers, since
@{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
other syntactic category.\footnote{Note that because of a possible bug in
the PolyML runtime system the result is printed as @{text "?"}, instead of
the tokens.} The second token indicates a space.
+ −
Many parsing functions later on will require spaces, comments and the like
to have already been filtered out. So from now on we are going to use the 
functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example
+ −
+ −
@{ML_response_fake [display,gray]+ −
"let+ −
val input = OuterSyntax.scan Position.none \"hello world\"+ −
in+ −
filter OuterLex.is_proper input+ −
end" + −
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}+ −
+ −
For convenience we define the function+ −
+ −
*}+ −
+ −
ML{*fun filtered_input str = + −
filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}+ −
+ −
text {*+ −
+ −
If we parse+ −
+ −
@{ML_response_fake [display,gray] + −
"filtered_input \"inductive | for\"" + −
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), + −
Token (\<dots>,(Keyword, \"|\"),\<dots>), + −
Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}+ −
+ −
we obtain a list consisting of only a command and two keyword tokens.+ −
If you want to see which keywords and commands are currently known, type in+ −
the following code (you might have to adjust the @{ML print_depth} in order to+ −
see the complete list):+ −
+ −
@{ML_response_fake [display,gray] + −
"let + −
val (keywords, commands) = OuterKeyword.get_lexicons ()+ −
in + −
(Scan.dest_lexicon commands, Scan.dest_lexicon keywords)+ −
end" + −
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}+ −
+ −
Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val input1 = filtered_input \"where for\"+ −
val input2 = filtered_input \"| in\"+ −
in + −
(OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)+ −
end"+ −
"((\"where\",\<dots>),(\"|\",\<dots>))"}+ −
+ −
Like before, we can sequentially connect parsers with @{ML "(op --)"}. For example + −
+ −
@{ML_response [display,gray]+ −
"let + −
val input = filtered_input \"| in\"+ −
in + −
(OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input+ −
end"+ −
"((\"|\",\"in\"),[])"}+ −
+ −
The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty + −
list of items recognised by the parser @{text p}, where the items being parsed+ −
are separated by the string @{text s}. For example+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val input = filtered_input \"in | in | in foo\"+ −
in + −
(OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input+ −
end" + −
"([\"in\",\"in\",\"in\"],[\<dots>])"}+ −
+ −
@{ML "OuterParse.enum1"} works similarly, except that the parsed list must+ −
be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the end+ −
of the parsed string, otherwise the parser would have consumed all tokens+ −
and then failed with the exception @{text "MORE"}. Like in the previous+ −
section, we can avoid this exception using the wrapper @{ML+ −
Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML+ −
OuterLex.stopper}. We can write+ −
+ −
@{ML_response [display,gray]+ −
"let + −
val input = filtered_input \"in | in | in\"+ −
in + −
Scan.finite OuterLex.stopper + −
(OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input+ −
end" + −
"([\"in\",\"in\",\"in\"],[])"}+ −
+ −
The following function will help to run examples.+ −
+ −
*}+ −
+ −
ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}+ −
+ −
text {*+ −
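
For example, the @{ML OuterParse.enum} example from above becomes:

@{ML_response [display,gray]
"parse (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) (filtered_input \"in | in | in\")"
"([\"in\", \"in\", \"in\"], [])"}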
+ −
The function @{ML "OuterParse.!!!"} can be used to force termination of the+ −
parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), + −
except that the error message is fixed to be @{text [quotes] "Outer syntax error"}+ −
with a relatively precise description of the failure. For example:+ −
+ −
@{ML_response_fake [display,gray]+ −
"let + −
val input = filtered_input \"in |\"+ −
val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"+ −
in + −
parse (OuterParse.!!! parse_bar_then_in) input + −
end"+ −
"Exception ERROR \"Outer syntax error: keyword \"|\" expected, + −
but keyword in was found\" raised"+ −
}+ −
+ −
\begin{exercise} (FIXME)
A type-identifier, for example @{typ "'a"}, is a token of 
kind @{ML "TypeIdent" in OuterLex}. It can be parsed using 
the function @{ML OuterParse.type_ident}.
\end{exercise}
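
As a quick check, using the helper functions from above, one might run:

@{ML_response [display,gray] 
"parse OuterParse.type_ident (filtered_input \"'a\")" 
"(\"'a\", [])"}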
+ −
+ −
*}+ −
+ −
+ −
+ −
+ −
+ −
section {* Positional Information *}+ −
+ −
text {*
The parser @{ML OuterParse.position} transforms a given parser so that it 
returns the parsed item together with its position in the input.
*}
+ −
ML{*+ −
OuterParse.position+ −
*}+ −
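
text {*
A small sketch of its use, assuming the helper functions 
@{text filtered_input} and @{text parse} from above:
*}

ML{*let
  val input = filtered_input "where for"
in
  parse (OuterParse.position (OuterParse.$$$ "where")) input
end *}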
+ −
+ −
section {* Parsing Inner Syntax *}+ −
+ −
ML{*(* parsing the string "0" as an inner-syntax proposition *)
let 
  val input = OuterSyntax.scan Position.none "0"
in 
  OuterParse.prop input
end *}
+ −
ML{*(* parser for an optional target specification, such as "(in locale)" *)
OuterParse.opt_target
*}
+ −
text {* (FIXME funny output for a proposition) *}+ −
+ −
ML{*(* the argument parser of a typical definitional package, such as inductive:
   an optional target, fixed variables, "for"-parameters and a list of 
   introduction rules *)
OuterParse.opt_target --
OuterParse.fixes -- 
OuterParse.for_fixes --
Scan.optional (OuterParse.$$$ "where" |--
  OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
*}
+ −
ML{* OuterSyntax.command *}+ −
+ −
section {* New Commands and Keyword Files *}+ −
+ −
text {*+ −
Often new commands, for example for providing new definitional principles,+ −
need to be implemented. While this is not difficult on the ML-level,+ −
new commands, in order to be useful, need to be recognised by+ −
ProofGeneral. This results in some subtle configuration issues, which we+ −
will explain in this section.+ −
+ −
To keep things simple, let us start with a ``silly'' command that does nothing + −
at all. We shall name this command \isacommand{foobar}. On the ML-level it can be + −
defined as+ −
*}+ −
+ −
ML{*let+ −
val do_nothing = Scan.succeed (Toplevel.theory I)+ −
val kind = OuterKeyword.thy_decl+ −
in+ −
OuterSyntax.command "foobar" "description of foobar" kind do_nothing+ −
end *}+ −
+ −
text {*+ −
The crucial function @{ML OuterSyntax.command} expects a name for the command, a
short description, a kind indicator (which we will explain later on more thoroughly) and a
parser producing a top-level transition function (its purpose will also be explained
later). 
+ −
While this is everything we have to do on the ML-level, we need a keyword+ −
file that can be loaded by ProofGeneral. This is to enable ProofGeneral to+ −
recognise \isacommand{foobar} as a command. Such a keyword file can be+ −
generated with the command-line:+ −
+ −
@{text [display] "$ isabelle keywords -k foobar some_log_files"}+ −
+ −
The option @{text "-k foobar"} indicates which postfix the name of the keyword file + −
will be assigned. In the case above the file will be named @{text+ −
"isar-keywords-foobar.el"}. As can be seen, this command requires log files to be+ −
present (in order to extract the keywords from them). To generate these log+ −
files, we first package the code above into a separate theory file named+ −
@{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the+ −
complete code.+ −
+ −
+ −
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%+ −
\begin{figure}[t]+ −
\begin{graybox}\small+ −
\isacommand{theory}~@{text Command}\\+ −
\isacommand{imports}~@{text Main}\\+ −
\isacommand{begin}\\+ −
\isacommand{ML}~\isa{\isacharverbatimopen}\\+ −
@{ML+ −
"let+ −
val do_nothing = Scan.succeed (Toplevel.theory I)+ −
val kind = OuterKeyword.thy_decl+ −
in+ −
OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing+ −
end"}\\+ −
\isa{\isacharverbatimclose}\\+ −
\isacommand{end}+ −
\end{graybox}+ −
\caption{\small The file @{text "Command.thy"} is necessary for generating a log + −
file. This log file enables Isabelle to generate a keyword file containing + −
the command \isacommand{foobar}.\label{fig:commandtheory}}+ −
\end{figure}+ −
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%+ −
+ −
For our purposes it is sufficient to use the log files of the theories+ −
@{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as+ −
the log file for the theory @{text "Command.thy"}, which contains the new+ −
\isacommand{foobar}-command. If you target other logics besides HOL, such+ −
as Nominal or ZF, then you need to adapt the log files appropriately.+ −
@{text Pure} and @{text HOL} are usually compiled during the installation of+ −
Isabelle. So log files for them should be already available. If not, then+ −
they can be conveniently compiled with the help of the build-script from the Isabelle+ −
distribution+ −
+ −
+ −
@{text [display] + −
"$ ./build -m \"Pure\"+ −
$ ./build -m \"HOL\""}+ −
+ −
The @{text "Pure-ProofGeneral"} theory needs to be compiled with+ −
+ −
@{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}+ −
+ −
For the theory @{text "Command.thy"}, we first create a ``managed'' subdirectory + −
with+ −
+ −
@{text [display] "$ isabelle mkdir FoobarCommand"}+ −
+ −
This generates a directory containing the files + −
+ −
@{text [display] + −
"./IsaMakefile+ −
./FoobarCommand/ROOT.ML+ −
./FoobarCommand/document+ −
./FoobarCommand/document/root.tex"}+ −
+ −
+ −
We need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}+ −
and add the line + −
+ −
@{text [display] "use_thy \"Command\";"} + −
+ −
to the file @{text "./FoobarCommand/ROOT.ML"}. We can now compile the theory by just typing+ −
+ −
@{text [display] "$ isabelle make"}+ −
+ −
We have now created all the necessary log files. They are stored 
in the directory 
+ −
@{text [display] "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"}+ −
+ −
or something similar depending on your Isabelle distribution and architecture.+ −
One quick way to assign a shell variable to this directory is by typing+ −
+ −
@{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}+ −
+ −
on the Unix prompt. The directory should include the files+ −
+ −
@{text [display] + −
"Pure.gz+ −
HOL.gz+ −
Pure-ProofGeneral.gz+ −
HOL-FoobarCommand.gz"} + −
+ −
From them we create the keyword files. Assuming the name + −
of the directory is in @{text "$ISABELLE_LOGS"},+ −
then the Unix command for creating the keyword file is:+ −
+ −
@{text [display]+ −
"$ isabelle keywords -k foobar + −
$ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}+ −
+ −
The result is the file @{text "isar-keywords-foobar.el"}. It should contain+ −
the string @{text "foobar"} twice (to see whether things went wrong, check+ −
that @{text "grep foobar"} on this file returns something+ −
non-empty). This keyword file needs to+ −
be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle+ −
aware of this keyword file, we have to start Isabelle with the option @{text+ −
"-k foobar"}, i.e.+ −
+ −
+ −
@{text [display] "$ isabelle -k foobar a_theory_file"}+ −
+ −
If we now build a theory on top of @{text "Command.thy"}, + −
then we can make use of the command \isacommand{foobar}. + −
Similarly with any other new command. + −
+ −
+ −
At the moment \isacommand{foobar} is not very useful. Let us refine it a bit + −
next by taking a proposition as argument and printing this proposition inside+ −
the tracing buffer. + −
+ −
The crucial part of a command is the function that determines the behaviour+ −
of the command. In the code above we used a ``do-nothing''-function, which+ −
because of @{ML Scan.succeed} does not parse any argument, but immediately+ −
returns the simple toplevel function @{ML "Toplevel.theory I"}. We can+ −
replace this code by a function that first parses a proposition (using the+ −
parser @{ML OuterParse.prop}), then prints out the tracing+ −
information (using a new top-level function @{text trace_top_lvl}) and + −
finally does nothing. For this we can write+ −
+ −
*}+ −
+ −
ML{*let+ −
fun trace_top_lvl str = + −
Toplevel.theory (fn thy => (tracing str; thy))+ −
+ −
val trace_prop = OuterParse.prop >> trace_top_lvl+ −
+ −
val kind = OuterKeyword.thy_decl+ −
in+ −
OuterSyntax.command "foobar" "traces a proposition" kind trace_prop+ −
end *}+ −
+ −
text {*+ −
Now we can type+ −
+ −
\begin{isabelle}+ −
\isacommand{foobar}~@{text [quotes] "True \<and> False"}\\+ −
@{text "> \"True \<and> False\""}+ −
\end{isabelle}+ −
+ −
and see the proposition in the tracing buffer. + −
+ −
Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator+ −
for the command. This means that the command finishes as soon as the+ −
arguments are processed. Examples of this kind of commands are+ −
\isacommand{definition} and \isacommand{declare}. In other cases,+ −
commands are expected to parse some arguments, for example a proposition,+ −
and then ``open up'' a proof in order to prove the proposition (for example+ −
\isacommand{lemma}) or prove some other properties (for example in+ −
\isacommand{function}). To achieve this kind of behaviour, we have to use the kind+ −
indicator @{ML thy_goal in OuterKeyword}.+ −
+ −
Below we change \isacommand{foobar} so that it takes a proposition as+ −
argument and then starts a proof in order to prove it. Therefore in Line 13+ −
below, we set the kind indicator to @{ML thy_goal in OuterKeyword}.+ −
*}+ −
+ −
ML%linenumbers{*let+ −
fun set_up_thm str ctxt =+ −
let+ −
val prop = Syntax.read_prop ctxt str+ −
in+ −
Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt+ −
end;+ −
+ −
val prove_prop = OuterParse.prop >> + −
(fn str => Toplevel.print o + −
Toplevel.local_theory_to_proof NONE (set_up_thm str))+ −
+ −
val kind = OuterKeyword.thy_goal+ −
in+ −
OuterSyntax.command "foobar" "proving a proposition" kind prove_prop+ −
end *}+ −
+ −
text {*+ −
The function @{text set_up_thm} takes a string (the proposition to be+ −
proved) and a context. The context is necessary in order to be able to use+ −
@{ML Syntax.read_prop}, which converts a string into a proper proposition.+ −
In Line 6 the function @{ML Proof.theorem_i} starts the proof for the+ −
proposition. Its argument @{ML NONE} stands for a locale (which we chose to+ −
omit); the argument @{ML "(K I)"} stands for a function that determines what+ −
should be done with the theorem once it is proved (we chose to just forget+ −
about it). Lines 9 to 11 contain the parser for the proposition.
+ −
If we now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, we obtain the following + −
proof state:+ −
+ −
\begin{isabelle}+ −
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\+ −
@{text "goal (1 subgoal)"}\\+ −
@{text "1. True \<and> True"}+ −
\end{isabelle}+ −
+ −
and we can build the proof+ −
+ −
\begin{isabelle}+ −
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\+ −
\isacommand{apply}@{text "(rule conjI)"}\\+ −
\isacommand{apply}@{text "(rule TrueI)+"}\\+ −
\isacommand{done}+ −
\end{isabelle}+ −
+ −
+ −
(FIXME What does @{text "Toplevel.theory"} @{text "Toplevel.print"}?)+ −
+ −
(FIXME read a name and show how to store theorems)+ −
+ −
(FIXME possibly also read a locale)+ −
*}+ −
+ −
(*<*)+ −
+ −
chapter {* Parsing *}+ −
+ −
text {*+ −
+ −
Lots of Standard ML code is given in this document, for various reasons,+ −
including:+ −
\begin{itemize}+ −
\item direct quotation of code found in the Isabelle source files,+ −
or simplified versions of such code+ −
\item identifiers found in the Isabelle source code, with their types + −
(or specialisations of their types)+ −
\item code examples, which can be run by the reader, to help illustrate the+ −
behaviour of functions found in the Isabelle source code+ −
\item ancillary functions, not from the Isabelle source code, + −
which enable the reader to run relevant code examples+ −
\item type abbreviations, which help explain the uses of certain functions+ −
\end{itemize}+ −
+ −
*}+ −
+ −
section {* Parsing Isar input *}+ −
+ −
text {*+ −
+ −
The typical parsing function has the type+ −
\texttt{'src -> 'res * 'src}, with input + −
of type \texttt{'src}, returning a result + −
of type \texttt{'res}, which is (or is derived from) the first part of the+ −
input, and also returning the remainder of the input.+ −
(In the common case, when it is clear what the ``remainder of the input''+ −
means, we will just say that the function ``returns'' the
value of type \texttt{'res}). + −
An exception is raised if an appropriate value + −
cannot be produced from the input.+ −
A range of exceptions can be used to identify different reasons + −
for the failure of a parse.+ −
+ −
This contrasts with the standard parsing function in Standard ML,
which is of type + −
\texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option};+ −
(for example, \texttt{List.getItem} and \texttt{Substring.getc}).+ −
However, much of the discussion at + −
FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html+ −
is relevant.+ −
+ −
Naturally one may convert between the two different sorts of parsing functions+ −
as follows:+ −
\begin{verbatim}+ −
open StringCvt ;+ −
type ('res, 'src) ex_reader = 'src -> 'res * 'src+ −
ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader + −
fun ex_reader rdr src = Option.valOf (rdr src) ;+ −
reader : ('res, 'src) ex_reader -> ('res, 'src) reader + −
fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;+ −
\end{verbatim}+ −
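
For example, a small sketch using \texttt{List.getItem}:
\begin{verbatim}
ex_reader List.getItem [1, 2, 3] ;
(* result: (1, [2, 3]) *)
\end{verbatim}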
+ −
*}+ −
+ −
section{* The \texttt{Scan} structure *}+ −
+ −
text {* + −
The source file is \texttt{src/General/scan.ML}.+ −
This structure provides functions for using and combining parsing functions+ −
of the type \texttt{'src -> 'res * 'src}.+ −
Three exceptions are used:+ −
\begin{verbatim}+ −
exception MORE of string option; (*need more input (prompt)*)+ −
exception FAIL of string option; (*try alternatives (reason of failure)*)+ −
exception ABORT of string; (*dead end*)+ −
\end{verbatim}+ −
Many functions in this structure (generally those with names composed of+ −
symbols) are declared as infix.+ −
+ −
Some functions from that structure are+ −
\begin{verbatim}+ −
|-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->+ −
'src -> 'res2 * 'src''+ −
--| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->+ −
'src -> 'res1 * 'src''+ −
-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->+ −
'src -> ('res1 * 'res2) * 'src''+ −
^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->+ −
'src -> string * 'src''+ −
\end{verbatim}+ −
Each of these functions parses two results off the input source, one after the other.
+ −
\texttt{|--} and \texttt{--|} + −
return the first result and the second result, respectively.+ −
+ −
\texttt{--} returns both.+ −
+ −
\verb|^^| returns the result of concatenating the two results+ −
(which must be strings).+ −
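
For example, a small sketch using the parser \texttt{\$\$} from this structure:
\begin{verbatim}
($$ "h" ^^ $$ "e") (explode "hello") ;
(* result: ("he", ["l", "l", "o"]) *)
\end{verbatim}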
+ −
Note how, although the types + −
\texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,+ −
the types as shown help suggest the behaviour of the functions.+ −
\begin{verbatim}+ −
:-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->+ −
'src -> ('res1 * 'res2) * 'src''+ −
:|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->+ −
'src -> 'res2 * 'src''+ −
\end{verbatim}+ −
These are similar to \texttt{--} and \texttt{|--} respectively,
except that the second parsing function can depend on the result of the first.
\begin{verbatim}+ −
>> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'+ −
|| : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src+ −
\end{verbatim}+ −
\texttt{p >> f} applies a function \texttt{f} to the result of a parse.+ −
+ −
\texttt{||} tries a second parsing function if the first one+ −
fails by raising an exception of the form \texttt{FAIL \_}.+ −
+ −
\begin{verbatim}+ −
succeed : 'res -> ('src -> 'res * 'src) ;+ −
fail : ('src -> 'res_src) ;+ −
!! : ('src * string option -> string) -> + −
('src -> 'res_src) -> ('src -> 'res_src) ;+ −
\end{verbatim}+ −
\texttt{succeed r} returns \texttt{r}, with the input unchanged.+ −
\texttt{fail} always fails, raising exception \texttt{FAIL NONE}.+ −
\texttt{!! f} only affects the failure mode, turning a failure that
raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
This is used to prevent recovery from the failure ---
thus, in \texttt{!! f parse1 || parse2}, if \texttt{parse1} fails, 
it won't recover by trying \texttt{parse2}.
+ −
\begin{verbatim}+ −
one : ('si -> bool) -> ('si list -> 'si * 'si list) ;+ −
some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;+ −
\end{verbatim}+ −
These require the input to be a list of items:+ −
they fail, raising \texttt{MORE NONE} if the list is empty.+ −
On other failures they raise \texttt{FAIL NONE}. 
+ −
\texttt{one p} takes the first+ −
item from the list if it satisfies \texttt{p}, otherwise fails.+ −
+ −
\texttt{some f} takes the first+ −
item from the list and applies \texttt{f} to it, failing if this returns+ −
\texttt{NONE}. + −
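
For example, a small sketch:
\begin{verbatim}
Scan.some (fn s => if s = "h" then SOME 1 else NONE) (explode "hello") ;
(* result: (1, ["e", "l", "l", "o"]) *)
\end{verbatim}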
+ −
\begin{verbatim}+ −
many : ('si -> bool) -> 'si list -> 'si list * 'si list ; + −
\end{verbatim}+ −
\texttt{many p} takes items from the input until it encounters one + −
which does not satisfy \texttt{p}. If it reaches the end of the input+ −
it fails, raising \texttt{MORE NONE}.+ −
+ −
\texttt{many1} (with the same type) fails if the first item + −
does not satisfy \texttt{p}. + −
+ −
\begin{verbatim}+ −
option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)+ −
optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)+ −
\end{verbatim}+ −
\texttt{option}: + −
where the parser \texttt{f} succeeds with result \texttt{r} + −
or raises \texttt{FAIL \_},+ −
\texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}.+ −
+ −
\texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},+ −
\texttt{optional f default} provides the result \texttt{default}.+ −
+ −
\begin{verbatim}+ −
repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src+ −
repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src+ −
bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src + −
\end{verbatim}+ −
\texttt{repeat f} repeatedly parses an item off the remaining input until 
\texttt{f} fails with \texttt{FAIL \_}.
+ −
\texttt{repeat1} is as for \texttt{repeat}, but requires at least one+ −
successful parse.+ −
+ −
\begin{verbatim}+ −
lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))+ −
\end{verbatim}+ −
\texttt{lift} changes the source type of a parser by putting in an extra+ −
component \texttt{'ex}, which is ignored in the parsing.+ −
+ −
The \texttt{Scan} structure also provides the type \texttt{lexicon}, + −
HOW DO THEY WORK ?? TO BE COMPLETED+ −
\begin{verbatim}+ −
dest_lexicon: lexicon -> string list ;+ −
make_lexicon: string list list -> lexicon ;+ −
empty_lexicon: lexicon ;+ −
extend_lexicon: string list list -> lexicon -> lexicon ;+ −
merge_lexicons: lexicon -> lexicon -> lexicon ;+ −
is_literal: lexicon -> string list -> bool ;+ −
literal: lexicon -> string list -> string list * string list ;+ −
\end{verbatim}+ −
Two lexicons, for the commands and keywords, are stored and can be retrieved+ −
by:+ −
\begin{verbatim}+ −
val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;+ −
val commands = Scan.dest_lexicon command_lexicon ;+ −
val keywords = Scan.dest_lexicon keyword_lexicon ;+ −
\end{verbatim}+ −
*}+ −
+ −
section{* The \texttt{OuterLex} structure *}+ −
+ −
text {*+ −
The source file is @{text "src/Pure/Isar/outer_lex.ML"}.+ −
In some other source files its name is abbreviated:+ −
\begin{verbatim}+ −
structure T = OuterLex;+ −
\end{verbatim}+ −
This structure defines the type \texttt{token}.+ −
(The types+ −
\texttt{OuterLex.token},+ −
\texttt{OuterParse.token} and+ −
\texttt{SpecParse.token} are all the same).+ −
+ −
Input text is split up into tokens, and the input source type for many parsing+ −
functions is \texttt{token list}.+ −
+ −
The datatype definition (which is not published in the signature) is+ −
\begin{verbatim}+ −
datatype token = Token of Position.T * (token_kind * string);+ −
\end{verbatim}+ −
but here are some runnable examples for viewing tokens: + −
+ −
*}+ −
+ −
+ −
+ −
+ −
ML{*+ −
val toks = OuterSyntax.scan Position.none+ −
"theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;+ −
*}+ −
+ −
ML{*+ −
print_depth 20 ;+ −
*}+ −
+ −
ML{*+ −
map OuterLex.text_of toks ;+ −
*}+ −
+ −
ML{*+ −
val proper_toks = filter OuterLex.is_proper toks ;+ −
*} + −
+ −
ML{*+ −
map OuterLex.kind_of proper_toks + −
*}+ −
+ −
ML{*+ −
map OuterLex.unparse proper_toks ;+ −
*}+ −
+ −
ML{*+ −
OuterLex.stopper+ −
*}+ −
+ −
text {*+ −
+ −
The function \texttt{is\_proper : token -> bool} identifies tokens which are
not white space or comments: many parsing functions assume that spaces and
comments have already been filtered out.
+ −
There is a special end-of-file token:+ −
\begin{verbatim}+ −
val (tok_eof : token, is_eof : token -> bool) = T.stopper ; + −
(* end of file token *)+ −
\end{verbatim}+ −
+ −
*}+ −
+ −
section {* The \texttt{OuterParse} structure *}+ −
+ −
text {*+ −
The source file is \texttt{src/Pure/Isar/outer\_parse.ML}.+ −
In some other source files its name is abbreviated:+ −
\begin{verbatim}+ −
structure P = OuterParse;+ −
\end{verbatim}+ −
Here the parsers use \texttt{token list} as the input source type. + −
+ −
Some of the parsers simply select the first token, provided that it is of the+ −
right kind (as returned by \texttt{T.kind\_of}): these are + −
\texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var,+ −
type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}+ −
Others select the first token, provided that it is one of several kinds,+ −
(eg, \texttt{name, xname, text, typ}).+ −
+ −
\begin{verbatim}+ −
type 'a tlp = token list -> 'a * token list ; (* token list parser *)+ −
$$$ : string -> string tlp+ −
nat : int tlp ;+ −
maybe : 'a tlp -> 'a option tlp ;+ −
\end{verbatim}+ −
+ −
\texttt{\$\$\$ s} returns the first token,+ −
if it equals \texttt{s} \emph{and} \texttt{s} is a keyword.+ −
+ −
\texttt{nat} returns the first token, if it is a number, and evaluates it.+ −
+ −
\texttt{maybe}: if \texttt{p} returns \texttt{r}, + −
then \texttt{maybe p} returns \texttt{SOME r} ;+ −
if the first token is an underscore, it returns \texttt{NONE}.+ −
+ −
A few examples:+ −
\begin{verbatim}+ −
P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)+ −
P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)+ −
val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ;+ −
val proper_toks = List.filter T.is_proper toks ;+ −
P.list P.nat toks ; (* OK, doesn't recognize white space *)+ −
P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)+ −
P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)+ −
P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)+ −
val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ;+ −
P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)+ −
\end{verbatim}+ −
+ −
The following code helps run examples:+ −
\begin{verbatim}+ −
fun parse_str tlp str = + −
let val toks : token list = OuterSyntax.scan Position.none str ;
val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;+ −
val (res, rem_toks) = tlp proper_toks ;+ −
val rem_str = String.concat+ −
(Library.separate " " (List.map T.unparse rem_toks)) ;+ −
in (res, rem_str) end ;+ −
\end{verbatim}+ −
+ −
Some examples from \texttt{src/Pure/Isar/outer\_parse.ML}+ −
\begin{verbatim}+ −
val type_args =+ −
type_ident >> Library.single ||+ −
$$$ "(" |-- !!! (list1 type_ident --| $$$ ")") ||+ −
Scan.succeed [];+ −
\end{verbatim}+ −
There are three ways parsing a list of type arguments can succeed.+ −
The first line reads a single type argument, and turns it into a singleton+ −
list.+ −
The second line reads "(", and then the remainder, ignoring the "(" ;+ −
the remainder consists of a list of type identifiers (at least one),+ −
and then a ")" which is also ignored.+ −
The \texttt{!!!} ensures that if the parsing proceeds this far and then fails,+ −
it won't try the third line (see the description of \texttt{Scan.!!}).+ −
The third line consumes no input and returns the empty list.+ −
+ −
\begin{verbatim}+ −
fun triple2 (x, (y, z)) = (x, y, z);+ −
val arity = xname -- ($$$ "::" |-- !!! (+ −
Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) []+ −
-- sort)) >> triple2;+ −
\end{verbatim}+ −
The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is+ −
ignored), then optionally a list $ss$ of sorts and then another sort $s$.+ −
The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$.+ −
The second line reads the optional list of sorts:+ −
it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored,+ −
and between them a comma-separated list of sorts.+ −
If this list is absent, the default \texttt{[]} provides the list of sorts.+ −
+ −
\begin{verbatim}+ −
parse_str P.type_args "('a, 'b) ntyp" ;+ −
parse_str P.type_args "'a ntyp" ;+ −
parse_str P.type_args "ntyp" ;+ −
parse_str P.arity "ty :: tycl" ;+ −
parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;+ −
\end{verbatim}+ −
+ −
*}+ −
+ −
section {* The \texttt{SpecParse} structure *}+ −
+ −
text {*+ −
The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.+ −
This structure contains token list parsers for more complicated values.+ −
For example, + −
\begin{verbatim}+ −
open SpecParse ;+ −
attrib : Attrib.src tok_rdr ; + −
attribs : Attrib.src list tok_rdr ;+ −
opt_attribs : Attrib.src list tok_rdr ;+ −
xthm : (thmref * Attrib.src list) tok_rdr ;+ −
xthms1 : (thmref * Attrib.src list) list tok_rdr ;+ −
+ −
parse_str attrib "simp" ;+ −
parse_str opt_attribs "hello" ;+ −
val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;+ −
map Args.dest_src ass ;+ −
val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;+ −
+ −
parse_str xthm "mythm [attr]" ;+ −
parse_str xthms1 "thm1 [attr] thms2" ;+ −
\end{verbatim}+ −
+ −
As you can see, attributes are described using types of the \texttt{Args}+ −
structure, described below.+ −
*}+ −
+ −
section{* The \texttt{Args} structure *}+ −
+ −
text {*+ −
The source file is \texttt{src/Pure/Isar/args.ML}.+ −
The primary type of this structure is the \texttt{src} datatype;+ −
its single constructor is not published in the signature, but 
\texttt{Args.src} and \texttt{Args.dest\_src}+ −
are in fact the constructor and destructor functions.+ −
Note that the types \texttt{Attrib.src} and \texttt{Method.src}+ −
are in fact \texttt{Args.src}.+ −
+ −
\begin{verbatim}+ −
src : (string * Args.T list) * Position.T -> Args.src ;+ −
dest_src : Args.src -> (string * Args.T list) * Position.T ;+ −
Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;+ −
fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;+ −
+ −
val thy = ML_Context.the_context () ;+ −
val ctxt = ProofContext.init thy ;+ −
map (pr_src ctxt) ass ;+ −
\end{verbatim}+ −
+ −
So an \texttt{Args.src} consists of the first word, then a list of further + −
``arguments'', of type \texttt{Args.T}, with information about position in the+ −
input.+ −
\begin{verbatim}+ −
(* how an Args.src is parsed *)+ −
P.position : 'a tlp -> ('a * Position.T) tlp ;+ −
P.arguments : Args.T list tlp ;+ −
+ −
val parse_src : Args.src tlp =+ −
P.position (P.xname -- P.arguments) >> Args.src ;+ −
\end{verbatim}+ −
+ −
\begin{verbatim}+ −
val ((first_word, args), pos) = Args.dest_src asrc ;+ −
map Args.string_of args ;+ −
\end{verbatim}+ −
+ −
The \texttt{Args} structure contains more parsers and parser transformers + −
for which the input source type is \texttt{Args.T list}. For example,+ −
\begin{verbatim}+ −
type 'a atlp = Args.T list -> 'a * Args.T list ;+ −
open Args ;+ −
nat : int atlp ; (* also Args.int *)+ −
thm_sel : PureThy.interval list atlp ;+ −
list : 'a atlp -> 'a list atlp ;+ −
attribs : (string -> string) -> Args.src list atlp ;+ −
opt_attribs : (string -> string) -> Args.src list atlp ;+ −
+ −
(* parse_atl_str : 'a atlp -> (string -> 'a * string) ;+ −
given an Args.T list parser, to get a string parser *)+ −
fun parse_atl_str atlp str = + −
let val (ats, rem_str) = parse_str P.arguments str ;+ −
val (res, rem_ats) = atlp ats ;+ −
in (res, String.concat (Library.separate " "+ −
(List.map Args.string_of rem_ats @ [rem_str]))) end ;+ −
+ −
parse_atl_str Args.int "-1-," ;+ −
parse_atl_str (Scan.option Args.int) "x1-," ;+ −
parse_atl_str Args.thm_sel "(1-,4,13-22)" ;+ −
+ −
val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)+ −
"[THEN trans [THEN sym], simp, OF sym]" ;+ −
\end{verbatim}+ −
+ −
From here, an attribute is interpreted using \texttt{Attrib.attribute}.+ −
+ −
\texttt{Args} has a large number of functions which parse an \texttt{Args.src}+ −
and also refer to a generic context. + −
Note the use of \texttt{Scan.lift} for this.+ −
(as does \texttt{Attrib} - RETHINK THIS)+ −
+ −
(\texttt{Args.syntax} shown below has type specialised)+ −
+ −
\begin{verbatim}+ −
type ('res, 'src) parse_fn = 'src -> 'res * 'src ;+ −
type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ;+ −
Scan.lift : 'a atlp -> 'a cgatlp ;+ −
term : term cgatlp ;+ −
typ : typ cgatlp ;+ −
+ −
Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ;+ −
Attrib.thm : thm cgatlp ;+ −
Attrib.thms : thm list cgatlp ;+ −
Attrib.multi_thm : thm list cgatlp ;+ −
+ −
(* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;+ −
given a (Context.generic * Args.T list) parser, to get a string parser *)+ −
fun parse_cgatl_str cgatlp str = + −
let + −
(* use the current generic context *)+ −
val generic = Context.Theory thy ;+ −
val (ats, rem_str) = parse_str P.arguments str ;+ −
(* ignore any change to the generic context *)+ −
val (res, (_, rem_ats)) = cgatlp (generic, ats) ;+ −
in (res, String.concat (Library.separate " "+ −
(List.map Args.string_of rem_ats @ [rem_str]))) end ;+ −
\end{verbatim}+ −
*}+ −
+ −
section{* Attributes, and the \texttt{Attrib} structure *}+ −
+ −
text {*+ −
The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}.+ −
The source file for the \texttt{Attrib} structure is+ −
\texttt{src/Pure/Isar/attrib.ML}.+ −
Most attributes use a theorem to change a generic context (for example, + −
by declaring that the theorem should be used, by default, in simplification),+ −
or change a theorem (which most often involves referring to the current+ −
theory). + −
The functions \texttt{Thm.rule\_attribute} and+ −
\texttt{Thm.declaration\_attribute} create attributes of these kinds.+ −
+ −
\begin{verbatim}+ −
type attribute = Context.generic * thm -> Context.generic * thm;+ −
type 'a trf = 'a -> 'a ; (* transformer of a given type *)+ −
Thm.rule_attribute : (Context.generic -> thm -> thm) -> attribute ;+ −
Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;+ −
+ −
Attrib.print_attributes : theory -> unit ;+ −
Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;+ −
+ −
List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;+ −
\end{verbatim}+ −
+ −
An attribute is stored in a theory as indicated by:+ −
\begin{verbatim}+ −
Attrib.add_attributes : + −
(bstring * (src -> attribute) * string) list -> theory trf ; + −
(*+ −
Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ;+ −
*)+ −
\end{verbatim}+ −
where the first and third arguments are name and description of the attribute,+ −
and the second is a function which parses the attribute input text + −
(including the attribute name, which has necessarily already been parsed).+ −
Here, \texttt{THEN\_att} is a function declared in the code for the+ −
structure \texttt{Attrib}, but not published in its signature.+ −
The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of + −
\texttt{Attrib.add\_attributes} to add a number of attributes.+ −
+ −
\begin{verbatim}+ −
FullAttrib.THEN_att : src -> attribute ;+ −
FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;+ −
FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;+ −
\end{verbatim}+ −
+ −
\begin{verbatim}+ −
Attrib.syntax : attribute cgatlp -> src -> attribute ;+ −
Attrib.no_args : attribute -> src -> attribute ;+ −
\end{verbatim}+ −
When this is called as \texttt{syntax scan src (gc, th)}+ −
the generic context \texttt{gc} is used + −
(and potentially changed to \texttt{gc'})+ −
by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would+ −
then be applied to \texttt{(gc', th)}.+ −
The source for parsing the attribute is the arguments part of \texttt{src},+ −
which must all be consumed by the parse.+ −
+ −
For example, for \texttt{Attrib.no\_args attr src}, the attribute parser + −
simply returns \texttt{attr}, requiring that the arguments part of+ −
\texttt{src} must be empty.+ −
+ −
Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified:+ −
\begin{verbatim}+ −
fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;+ −
rot_att_n : int -> attribute ;+ −
val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;+ −
val rotated_att : src -> attribute =+ −
Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;+ −
+ −
val THEN_arg : int cgatlp = Scan.lift + −
(Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;+ −
+ −
Attrib.thm : thm cgatlp ;+ −
+ −
THEN_arg -- Attrib.thm : (int * thm) cgatlp ;+ −
+ −
fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;+ −
THEN_att_n : int * thm -> attribute ;+ −
+ −
val THEN_att : src -> attribute = Attrib.syntax+ −
(THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);+ −
\end{verbatim}+ −
The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg}+ −
read an optional argument, which for \texttt{rotated} is an integer, + −
and for \texttt{THEN} is a natural enclosed in square brackets;+ −
the default, if the argument is absent, is 1 in each case.+ −
Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into+ −
attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is+ −
parsed by \texttt{Attrib.thm}. + −
Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}.+ −
+ −
*}+ −
+ −
section{* Methods, and the \texttt{Method} structure *}+ −
+ −
text {*+ −
The source file is \texttt{src/Pure/Isar/method.ML}.+ −
The type \texttt{method} is defined by the datatype declaration+ −
\begin{verbatim}+ −
(* datatype method = Meth of thm list -> cases_tactic; *)+ −
RuleCases.NO_CASES : tactic -> cases_tactic ;+ −
\end{verbatim}+ −
In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
\texttt{Meth}.
A \texttt{cases\_tactic} is an elaborated version of a tactic.
\texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of the
tactic \texttt{tac} without any further case information.
For further details see the description of structure \texttt{RuleCases} below.+ −
The list of theorems to be passed to a method consists of the current+ −
\emph{facts} in the proof.+ −
+ −
\begin{verbatim}+ −
RAW_METHOD : (thm list -> tactic) -> method ;+ −
METHOD : (thm list -> tactic) -> method ;+ −
+ −
SIMPLE_METHOD : tactic -> method ;+ −
SIMPLE_METHOD' : (int -> tactic) -> method ;+ −
SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;+ −
+ −
RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;+ −
METHOD_CASES : (thm list -> cases_tactic) -> method ;+ −
\end{verbatim}+ −
A method is, in its simplest form, a tactic; applying the method is to apply+ −
the tactic to the current goal state.+ −
+ −
Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying 
\texttt{tacf} to the current \emph{facts}, and applying that tactic to the
goal state.
+ −
\texttt{METHOD} is similar but also first applies+ −
\texttt{Goal.conjunction\_tac} to all subgoals.+ −
+ −
\texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
applies \texttt{tac}.
+ −
\texttt{SIMPLE\_METHOD' tacf} inserts the facts and then+ −
applies \texttt{tacf} to subgoal 1.+ −
+ −
\texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by+ −
\texttt{quant}, which may be, for example,+ −
\texttt{ALLGOALS} (all subgoals),+ −
\texttt{TRYALL} (try all subgoals, failure is OK),+ −
\texttt{FIRSTGOAL} (try subgoals until it succeeds once), + −
\texttt{(fn tacf => tacf 4)} (subgoal 4), etc+ −
(see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}).+ −
+ −
A method is stored in a theory as indicated by:+ −
\begin{verbatim}+ −
Method.add_method : + −
(bstring * (src -> Proof.context -> method) * string) -> theory trf ; + −
( *+ −
* )+ −
\end{verbatim}+ −
where the first and third arguments are name and description of the method,+ −
and the second is a function which parses the method input text + −
(including the method name, which has necessarily already been parsed).+ −
+ −
Here, \texttt{xxx} is a function declared in the code for the+ −
structure \texttt{Method}, but not published in its signature.+ −
The source file \texttt{src/Pure/Isar/method.ML} shows the use of + −
\texttt{Method.add\_method} to add a number of methods.+ −
+ −
+ −
*}+ −
(*>*)+ −
end+ −