diff -r 8939b8fd8603 -r 069d525f8f1d ProgTutorial/Parsing.thy --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ProgTutorial/Parsing.thy Thu Mar 19 13:28:16 2009 +0100 @@ -0,0 +1,1684 @@ +theory Parsing +imports Base "Package/Simple_Inductive_Package" +begin + + +chapter {* Parsing *} + +text {* + + Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. + Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so + on, belong to the outer syntax, whereas items inside double quotation marks, such + as terms, types and so on, belong to the inner syntax. For parsing inner syntax, + Isabelle uses a rather general and sophisticated algorithm, which + is driven by priority grammars. Parsers for outer syntax are built up by functional + parsing combinators. These combinators are a well-established technique for parsing, + which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. + Isabelle developers are usually concerned with writing these outer syntax parsers, + either for new definitional packages or for calling tactics with specific arguments. + + \begin{readmore} + The library + for writing parser combinators is split up, roughly, into two parts. + The first part consists of a collection of generic parser combinators defined + in the structure @{ML_struct Scan} in the file + @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of + combinators for dealing with specific token types, which are defined in the + structure @{ML_struct OuterParse} in the file + @{ML_file "Pure/Isar/outer_parse.ML"}. + \end{readmore} + +*} + +section {* Building Generic Parsers *} + +text {* + + Let us first have a look at parsing strings using generic parsing combinators. + The function @{ML "$$"} takes a string as argument and will ``consume'' this string from + a given input list of strings. ``Consume'' in this context means that it will + return a pair consisting of this string and the rest of the input list. + For example: + + @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} + + @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} + + The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception + @{text "FAIL"} if no string can be consumed. For example trying to parse + + @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" + "Exception FAIL raised"} + + will raise the exception @{text "FAIL"}. + There are three exceptions used in the parsing combinators: + + \begin{itemize} + \item @{text "FAIL"} is used to indicate that alternative routes of parsing + might be explored. + \item @{text "MORE"} indicates that there is not enough input for the parser. For example + in @{text "($$ \"h\") []"}. + \item @{text "ABORT"} is the exception that is raised when a dead end is reached. + It is used for example in the function @{ML "!!"} (see below). + \end{itemize} + + However, note that these exceptions are private to the parser and cannot be accessed + by the programmer (for example to handle them). + + Slightly more general than the parser @{ML "$$"} is the function @{ML + Scan.one}, in that it takes a predicate as argument and then parses exactly + one item from the input list satisfying this predicate. 
For example the
+  following parser either consumes an @{text [quotes] "h"} or a @{text
+  [quotes] "w"}:
+
+
+@{ML_response [display,gray] 
+"let 
+  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
+  val input1 = (explode \"hello\")
+  val input2 = (explode \"world\")
+in
+  (hw input1, hw input2)
+end"
+  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
+
+  Two parsers can be connected in sequence by using the function @{ML "--"}. 
+  For example, parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
+  sequence can be achieved by:
+
+  @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
+  "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
+
+  Note how the result of consumed strings builds up on the left as nested pairs.
+
+  If, as in the previous example, you want to parse a particular string, 
+  then you should use the function @{ML Scan.this_string}:
+
+  @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
+  "(\"hell\", [\"o\"])"}
+
+  Parsers that explore alternatives can be constructed using the function @{ML
+  "||"}. For example, the parser @{ML "(p || q)" for p q} returns the
+  result of @{text "p"}, in case it succeeds; otherwise it returns the
+  result of @{text "q"}. For example:
+
+
+@{ML_response [display,gray] 
+"let 
+  val hw = ($$ \"h\") || ($$ \"w\")
+  val input1 = (explode \"hello\")
+  val input2 = (explode \"world\")
+in
+  (hw input1, hw input2)
+end"
+  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
+
+  The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function 
+  for parsers, except that they discard the item being parsed by the first (respectively second)
+  parser. For example:
+
+@{ML_response [display,gray]
+"let 
+  val just_e = ($$ \"h\") |-- ($$ \"e\") 
+  val just_h = ($$ \"h\") --| ($$ \"e\") 
+  val input = (explode \"hello\")
+in 
+  (just_e input, just_h input)
+end"
+  "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
+
+  The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
+  @{text "p"}, if it succeeds; otherwise it returns 
+  the default value @{text "x"}. For example:
+
+@{ML_response [display,gray]
+"let 
+  val p = Scan.optional ($$ \"h\") \"x\"
+  val input1 = (explode \"hello\")
+  val input2 = (explode \"world\")
+in 
+  (p input1, p input2)
+end"
+  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
+
+  The function @{ML Scan.option} works similarly, except no default value can
+  be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
+
+@{ML_response [display,gray]
+"let 
+  val p = Scan.option ($$ \"h\")
+  val input1 = (explode \"hello\")
+  val input2 = (explode \"world\")
+in 
+  (p input1, p input2)
+end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
+
+  The function @{ML "!!"} helps to produce appropriate error messages
+  during parsing. For example, if you want to parse that @{text p} is immediately 
+  followed by @{text q}, or start a completely different parser @{text r},
+  you might write:
+
+  @{ML [display,gray] "(p -- q) || r" for p q r}
+
+  However, this parser is problematic for producing an appropriate error
+  message in case the parsing of @{ML "(p -- q)" for p q} fails, because
+  then you lose the information that @{text p} should be followed by
+  @{text q}. To see this, consider the case in which @{text p} is present in 
+  the input, but not @{text q}.
That means @{ML "(p -- q)" for p q} will fail
+  and the alternative parser @{text r} will be tried. However, in many
+  circumstances this will be the wrong parser for the input ``p-followed-by-q''
+  and therefore will also fail. The error message is then caused by the
+  failure of @{text r}, not by the absence of @{text q} in the input. This
+  kind of situation can be avoided when using the function @{ML "!!"}. 
+  This function aborts the whole process of parsing in case of a
+  failure and prints an error message. For example, if you invoke the parser
+
+
+  @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
+
+  on @{text [quotes] "hello"}, the parsing succeeds
+
+  @{ML_response [display,gray] 
+  "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
+  "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
+
+  but if you invoke it on @{text [quotes] "world"}
+
+  @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
+  "Exception ABORT raised"}
+
+  then the parsing aborts and the error message @{text "foo"} is printed. In order to
+  see the error message properly, you need to prefix the parser with the function 
+  @{ML "Scan.error"}. For example:
+
+  @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
+  "Exception ERROR \"foo\" raised"}
+
+  This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
+  (see Section~\ref{sec:newcommand}, which explains this function in more detail).
+
+  Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
+  r}. If you want to generate the correct error message for p-followed-by-q,
+  then you have to write:
+*}
+
+ML{*fun p_followed_by_q p q r =
+let 
+  val err_msg = (fn _ => p ^ " is not followed by " ^ q)
+in
+  ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
+end *}
+
+
+text {*
+  Running this parser with the arguments @{text [quotes] "h"}, @{text [quotes] "e"}
+  and @{text [quotes] "w"}, and the input @{text [quotes] "holle"} 
+
+  @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
+  "Exception ERROR \"h is not followed by e\" raised"}
+
+  produces the correct error message. Running it with
+
+  @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
+  "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
+
+  yields the expected parsing. 
+
+  The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
+  often as it succeeds. For example:
+
+  @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
+  "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
+
+  Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
+  @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
+  succeeds at least once.
+
+  Also note that the parser would have aborted with the exception @{text MORE}, if
+  you had run it just on @{text [quotes] "hhhh"}. This can be avoided by using
+  the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
+  them you can write:
+
+  @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
+  "([\"h\", \"h\", \"h\", \"h\"], [])"}
+
+  @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
+  other stoppers need to be used when parsing, for example, tokens. However, this kind of 
+  manual wrapping is often already done by the surrounding infrastructure.
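+
+  Note that @{ML Scan.finite} only prevents the @{text MORE} exception at
+  the end of the input; a genuine parsing failure still shows through. For
+  instance, @{ML Scan.repeat1} must succeed at least once and therefore
+  fails on unsuitable input even inside @{ML Scan.finite}. A small check of
+  this behaviour (we assume the failure surfaces as @{text FAIL}, as
+  described earlier):
+
+  @{ML_response_fake [display,gray] 
+  "Scan.finite Symbol.stopper (Scan.repeat1 ($$ \"h\")) (explode \"world\")" 
+  "Exception FAIL raised"}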
+
+  The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
+  string as in
+
+  @{ML_response [display,gray] 
+"let 
+  val p = Scan.repeat (Scan.one Symbol.not_eof)
+  val input = (explode \"foo bar foo\") 
+in
+  Scan.finite Symbol.stopper p input
+end" 
+"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
+
+  where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
+  end of the input string (i.e.~the stopper symbol).
+
+  The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
+  parse the input, then the whole parser fails; if not, then the second is tried. Therefore
+
+  @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
+  "Exception FAIL raised"}
+
+  fails, while
+
+  @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
+  "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
+
+  succeeds. 
+
+  The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
+  input until a certain marker symbol is reached. In the example below the marker
+  symbol is a @{text [quotes] "*"}.
+
+  @{ML_response [display,gray]
+"let 
+  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
+  val input1 = (explode \"fooooo\")
+  val input2 = (explode \"foo*ooo\")
+in
+  (Scan.finite Symbol.stopper p input1, 
+   Scan.finite Symbol.stopper p input2)
+end"
+"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
+ ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
+
+  After parsing is done, you nearly always want to apply a function to the parsed 
+  items. One way to do this is the function @{ML "(p >> f)" for p f}, which first
+  runs the parser @{text p} and upon successful completion applies the 
+  function @{text f} to the result. For example
+
+@{ML_response [display,gray]
+"let 
+  fun double (x,y) = (x ^ x, y ^ y) 
+in
+  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
+end"
+"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
+
+  doubles the two parsed input strings; or
+
+  @{ML_response [display,gray] 
+"let 
+  val p = Scan.repeat (Scan.one Symbol.not_eof)
+  val input = (explode \"foo bar foo\") 
+in
+  Scan.finite Symbol.stopper (p >> implode) input
+end" 
+"(\"foo bar foo\",[])"}
+
+  where the single-character strings in the parsed output are transformed
+  back into one string.
+
+  The function @{ML Scan.ahead} parses some input, but leaves the original
+  input unchanged. For example:
+
+  @{ML_response [display,gray]
+  "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" 
+  "(\"foo\", [\"f\", \"o\", \"o\"])"} 
+
+  The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
+  the given parser to the second component of the pair and leaves the first component 
+  untouched. For example
+
+@{ML_response [display,gray]
+"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
+"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
+
+  (FIXME: In which situations is this useful? Give examples.)
+
+  \begin{exercise}\label{ex:scancmts}
+  Write a parser that parses an input string so that any comment enclosed
+  inside @{text "(*\*)"} is replaced by the same comment but enclosed inside
+  @{text "(**\**)"} in the output string. To enclose a string, you can use the
+  function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
+  "s1 ^ s ^ s2" for s1 s2 s}.
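+  For example, the expression @{ML "enclose \"(*\" \"*)\" \"foo\""} should
+  yield the string @{text [quotes] "(*foo*)"}.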
\end{exercise}
+*}
+
+section {* Parsing Theory Syntax *}
+
+text {*
+  (FIXME: context parser)
+
+  Most of the time, however, Isabelle developers have to deal with parsing
+  tokens, not strings. These token parsers have the type:
+*}
+
+ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
+
+text {*
+  The reason for using token parsers is that theory syntax, as well as the
+  parsers for the arguments of proof methods, use the type @{ML_type
+  OuterLex.token} (which is identical to the type @{ML_type
+  OuterParse.token}). However, there are also handy parsers for
+  ML-expressions and ML-files.
+
+  \begin{readmore}
+  The parser functions for the theory syntax are contained in the structure
+  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
+  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
+  \end{readmore}
+
+  The structure @{ML_struct OuterLex} defines several kinds of tokens (for example 
+  @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
+  @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
+  kind of tokens.
+*}
+
+text {*
+  The first example shows how to generate a token list out of a string using
+  the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"}
+  since, at the moment, we are not interested in generating
+  precise error messages. The following code
+
+@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
+"[Token (\,(Ident, \"hello\"),\), 
+ Token (\,(Space, \" \"),\), 
+ Token (\,(Ident, \"world\"),\)]"}
+
+  produces three tokens where the first and the last are identifiers, since
+  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
+  other syntactic category.\footnote{Note that because of a possible bug in
+  the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
+  the tokens.} The second indicates a space.
+
+  Many parsing functions later on will require spaces, comments and the like
+  to have already been filtered out. So from now on we are going to use the 
+  functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:
+
+@{ML_response_fake [display,gray]
+"let
+   val input = OuterSyntax.scan Position.none \"hello world\"
+in
+   filter OuterLex.is_proper input
+end" 
+"[Token (\,(Ident, \"hello\"), \), Token (\,(Ident, \"world\"), \)]"}
+
+  For convenience we define the function:
+
+*}
+
+ML{*fun filtered_input str = 
+  filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}
+
+text {* 
+
+  If you now parse
+
+@{ML_response_fake [display,gray] 
+"filtered_input \"inductive | for\"" 
+"[Token (\,(Command, \"inductive\"),\), 
+ Token (\,(Keyword, \"|\"),\), 
+ Token (\,(Keyword, \"for\"),\)]"}
+
+  you obtain a list consisting of only a command and two keyword tokens.
+  If you want to see which keywords and commands are currently known to Isabelle, type in
+  the following code (you might have to adjust the @{ML print_depth} in order to
+  see the complete list):
+
+@{ML_response_fake [display,gray] 
+"let 
+  val (keywords, commands) = OuterKeyword.get_lexicons ()
+in 
+  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
+end" 
+"([\"}\", \"{\", \], [\"\\", \"\\", \])"}
+
+  The parser @{ML "OuterParse.$$$"} parses a single keyword.
For example:
+
+@{ML_response [display,gray]
+"let 
+  val input1 = filtered_input \"where for\"
+  val input2 = filtered_input \"| in\"
+in 
+  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
+end"
+"((\"where\",\), (\"|\",\))"}
+
+  Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
+
+@{ML_response [display,gray]
+"let 
+  val input = filtered_input \"| in\"
+in 
+  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
+end"
+"((\"|\", \"in\"), [])"}
+
+  The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
+  list of items recognised by the parser @{text p}, where the items being parsed
+  are separated by the string @{text s}. For example:
+
+@{ML_response [display,gray]
+"let 
+  val input = filtered_input \"in | in | in foo\"
+in 
+  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+end" 
+"([\"in\", \"in\", \"in\"], [\])"}
+
+  @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
+  be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
+  end of the parsed string, otherwise the parser would have consumed all
+  tokens and then failed with the exception @{text "MORE"}. Like in the
+  previous section, we can avoid this exception using the wrapper @{ML
+  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
+  OuterLex.stopper}. We can write:
+
+@{ML_response [display,gray]
+"let 
+  val input = filtered_input \"in | in | in\"
+in 
+  Scan.finite OuterLex.stopper
+    (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+end" 
+"([\"in\", \"in\", \"in\"], [])"}
+
+  The following function will help to run examples.
+
+*}
+
+ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
+
+text {*
+
+  The function @{ML "OuterParse.!!!"} can be used to force termination of the
+  parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
+  except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
+  with a relatively precise description of the failure. For example:
+
+@{ML_response_fake [display,gray]
+"let 
+  val input = filtered_input \"in |\"
+  val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
+in 
+  parse (OuterParse.!!! parse_bar_then_in) input 
+end"
+"Exception ERROR \"Outer syntax error: keyword \"|\" expected,
+but keyword in was found\" raised"
+}
+
+  \begin{exercise} (FIXME)
+  A type-identifier, for example @{typ "'a"}, is a token of 
+  kind @{ML "Keyword" in OuterLex}. It can be parsed using 
+  the function @{ML OuterParse.type_ident}.
+  \end{exercise}
+
+  (FIXME: or give parser for numbers)
+
+  Whenever there is a possibility that the processing of user input can fail, 
+  it is a good idea to give as much information as possible about where the error 
+  occurred. For this, Isabelle can attach positional information to tokens
+  and then thread this information up the processing chain. To see this,
+  modify the function @{ML filtered_input} described earlier to
+*}
+
+ML{*fun filtered_input' str = 
+  filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
+
+text {*
+  where we pretend the parsed string starts on line 7. An example is
+
+@{ML_response_fake [display,gray]
+"filtered_input' \"foo \\n bar\""
+"[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \),
+ Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \)]"}
+
+  in which the @{text [quotes] "\\n"} causes the second token to be in
+  line 8.
+
+  By using the parser @{ML OuterParse.position} you can decode the positional
+  information and return it as part of the parsed input. For example
+
+@{ML_response_fake [display,gray]
+"let
+  val input = (filtered_input' \"where\")
+in 
+  parse (OuterParse.position (OuterParse.$$$ \"where\")) input
+end"
+"((\"where\", {line=7, end_line=7}), [])"}
+
+  \begin{readmore}
+  The functions related to positions are implemented in the file
+  @{ML_file "Pure/General/position.ML"}.
+  \end{readmore}
+
+*}
+
+section {* Parsing Inner Syntax *}
+
+text {*
+  There is usually no need to write your own parser for parsing inner syntax, that is 
+  for terms and types: you can just call the pre-defined parsers. Terms can 
+  be parsed using the function @{ML OuterParse.term}. For example:
+
+@{ML_response [display,gray]
+"let 
+  val input = OuterSyntax.scan Position.none \"foo\"
+in 
+  OuterParse.term input
+end"
+"(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
+
+  The function @{ML OuterParse.prop} is similar, except that it gives a different
+  error message when parsing fails. As you can see, the parser does not just return 
+  the parsed string, but also some encoded information. You can decode the
+  information with the function @{ML YXML.parse}. For example
+
+  @{ML_response [display,gray]
+  "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
+  "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
+
+  The result of the decoding is an XML-tree. You can see better what is going on if
+  you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
+
+@{ML_response [display,gray]
+"let 
+  val input = OuterSyntax.scan (Position.line 42) \"foo\"
+in 
+  YXML.parse (fst (OuterParse.term input))
+end"
+"XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
+
+  The positional information is stored as part of an XML-tree so that code
+  called later on will be able to give more precise error messages.
+
+  \begin{readmore}
+  The functions to do with input and output of XML and YXML are defined 
+  in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
+  \end{readmore}
+
+*}
+
+section {* Parsing Specifications\label{sec:parsingspecs} *}
+
+text {*
+  There are a number of special purpose parsers that help with parsing
+  specifications of function definitions, inductive predicates and so on. In
+  Chapter~\ref{chp:package}, for example, we will need to parse specifications
+  for inductive predicates of the form:
+*}
+
+simple_inductive
+  even and odd
+where
+  even0: "even 0"
+| evenS: "odd n \ even (Suc n)"
+| oddS: "even n \ odd (Suc n)"
+
+text {*
+  For this we are going to use the parser:
+*}
+
+ML %linenosgray{*val spec_parser = 
+     OuterParse.fixes -- 
+     Scan.optional 
+       (OuterParse.$$$ "where" |--
+          OuterParse.!!! 
+            (OuterParse.enum1 "|" 
+               (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
+
+text {*
+  Note that the parser does not parse the keyword \simpleinductive, even though it is
+  meant to process definitions as shown above. The parser for the keyword 
+  will be given by the infrastructure that will eventually call @{ML spec_parser}.
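+
+  Note also that the @{text "where"}-part is wrapped in @{ML Scan.optional}
+  with the default value @{ML "[]"} (Lines 3 and 7), so a specification may
+  omit the rules altogether. Parsing only the fixed variables should
+  therefore yield (a small check, reusing the functions @{ML filtered_input}
+  and @{ML parse} from above):
+
+@{ML_response [display,gray]
+"let
+  val input = filtered_input \"foo and bar\"
+in
+  parse spec_parser input
+end"
+"(([(foo, NONE, NoSyn), (bar, NONE, NoSyn)], []), [])"}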
+
+
+  To see what the parser returns, let us parse the string corresponding to the 
+  definition of @{term even} and @{term odd}:
+
+@{ML_response [display,gray]
+"let
+  val input = filtered_input
+     (\"even and odd \" ^  
+      \"where \" ^
+      \"  even0[intro]: \\\"even 0\\\" \" ^ 
+      \"| evenS[intro]: \\\"odd n \ even (Suc n)\\\" \" ^ 
+      \"| oddS[intro]:  \\\"even n \ odd (Suc n)\\\"\")
+in
+  parse spec_parser input
+end"
+"(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
+     [((even0,\), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),
+      ((evenS,\), \"\\^E\\^Ftoken\\^Eodd n \ even (Suc n)\\^E\\^F\\^E\"),
+      ((oddS,\), \"\\^E\\^Ftoken\\^Eeven n \ odd (Suc n)\\^E\\^F\\^E\")]), [])"}
+
+  As you see, the result is a pair consisting of a list of
+  variables with optional type-annotation and syntax-annotation, and a list of
+  rules where every rule optionally has a name and an attribute.
+
+  The function @{ML OuterParse.fixes} in Line 2 of the parser reads an 
+  \isacommand{and}-separated 
+  list of variables that can include optional type annotations and syntax translations. 
+  For example:\footnote{Note that in the code we need to write 
+  @{text "\\\"int \ bool\\\""} in order to properly escape the double quotes
+  in the compound type.}
+
+@{ML_response [display,gray]
+"let
+  val input = filtered_input 
+        \"foo::\\\"int \ bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
+in
+  parse OuterParse.fixes input
+end"
+"([(foo, SOME \"\\^E\\^Ftoken\\^Eint \ bool\\^E\\^F\\^E\", NoSyn), 
+  (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
+  (blonk, NONE, NoSyn)],[])"}  
+*}
+
+text {*
+  Whenever types are given, they are stored in the @{ML SOME}s. The types are
+  not yet used to type the variables: this must be done by type-inference later
+  on. Since types are part of the inner syntax they are strings with some
+  encoded information (see previous section). If a syntax translation is
+  present for a variable, then it is stored in the @{ML Mixfix} datastructure;
+  no syntax translation is indicated by @{ML NoSyn}.
+
+  \begin{readmore}
+  The datastructure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
+  \end{readmore}
+
+  Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
+  list of introduction rules, that is propositions with theorem
+  annotations. The introduction rules are propositions parsed by @{ML
+  OuterParse.prop}. However, they can include an optional theorem name plus
+  some attributes. For example
+
+@{ML_response [display,gray] "let 
+  val input = filtered_input \"foo_lemma[intro,dest!]:\"
+  val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
+in 
+  (name, map Args.dest_src attrib)
+end" "(foo_lemma, [((\"intro\", []), \), ((\"dest\", [\]), \)])"}
+ 
+  The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
+  @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name 
+  has to end with @{text [quotes] ":"}---see the argument of 
+  the function @{ML SpecParse.opt_thm_name} in Line 7.
+
+  \begin{readmore}
+  Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
+  and @{ML_file "Pure/Isar/args.ML"}.
+  \end{readmore}
+*}
+
+section {* New Commands and Keyword Files\label{sec:newcommand} *}
+
+text {*
+  (FIXME: update to the right command setup)
+
+  Often new commands, for example for providing new definitional principles,
+  need to be implemented. While this is not difficult on the ML-level,
+  new commands, in order to be useful, need to be recognised by
+  ProofGeneral.
This results in some subtle configuration issues, which we
+  will explain in this section.
+
+  To keep things simple, let us start with a ``silly'' command that does nothing 
+  at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
+  defined as:
+*}
+
+ML{*let
+  val do_nothing = Scan.succeed (Toplevel.theory I)
+  val kind = OuterKeyword.thy_decl
+in
+  OuterSyntax.command "foobar" "description of foobar" kind do_nothing
+end *}
+
+text {*
+  The crucial function @{ML OuterSyntax.command} expects a name for the command, a
+  short description, a kind indicator (which we will explain more thoroughly later
+  on) and a parser producing a top-level transition function (its purpose will also
+  be explained later). 
+
+  While this is everything you have to do on the ML-level, you need a keyword
+  file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
+  recognise \isacommand{foobar} as a command. Such a keyword file can be
+  generated with the command-line:
+
+  @{text [display] "$ isabelle keywords -k foobar some_log_files"}
+
+  The option @{text "-k foobar"} indicates which postfix will be assigned to the 
+  name of the keyword file. In the case above the file will be named @{text
+  "isar-keywords-foobar.el"}. This command requires log files to be
+  present (in order to extract the keywords from them). To generate these log
+  files, you first need to package the code above into a separate theory file named
+  @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
+  complete code.
+
+
+  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+  \begin{figure}[t]
+  \begin{graybox}\small
+  \isacommand{theory}~@{text Command}\\
+  \isacommand{imports}~@{text Main}\\
+  \isacommand{begin}\\
+  \isacommand{ML}~@{text "\"}\\
+  @{ML
+"let
+  val do_nothing = Scan.succeed (Toplevel.theory I)
+  val kind = OuterKeyword.thy_decl
+in
+  OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing
+end"}\\
+  @{text "\"}\\
+  \isacommand{end}
+  \end{graybox}
+  \caption{\small The file @{text "Command.thy"} is necessary for generating a log 
+  file. This log file enables Isabelle to generate a keyword file containing 
+  the command \isacommand{foobar}.\label{fig:commandtheory}}
+  \end{figure}
+  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+  For our purposes it is sufficient to use the log files of the theories
+  @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
+  the log file for the theory @{text "Command.thy"}, which contains the new
+  \isacommand{foobar}-command. If you target other logics besides HOL, such
+  as Nominal or ZF, then you need to adapt the log files appropriately.
+
+  @{text Pure} and @{text HOL} are usually compiled during the installation of
+  Isabelle. So log files for them should already be available. If not, then
+  they can be conveniently compiled with the help of the build-script from the Isabelle
+  distribution.
+ + @{text [display] +"$ ./build -m \"Pure\" +$ ./build -m \"HOL\""} + + The @{text "Pure-ProofGeneral"} theory needs to be compiled with: + + @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""} + + For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory + with: + + @{text [display] "$ isabelle mkdir FoobarCommand"} + + This generates a directory containing the files: + + @{text [display] +"./IsaMakefile +./FoobarCommand/ROOT.ML +./FoobarCommand/document +./FoobarCommand/document/root.tex"} + + + You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"} + and add the line + + @{text [display] "use_thy \"Command\";"} + + to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing: + + @{text [display] "$ isabelle make"} + + If the compilation succeeds, you have finally created all the necessary log files. + They are stored in the directory + + @{text [display] "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"} + + or something similar depending on your Isabelle distribution and architecture. + One quick way to assign a shell variable to this directory is by typing + + @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"} + + on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the + directory should include the files: + + @{text [display] +"Pure.gz +HOL.gz +Pure-ProofGeneral.gz +HOL-FoobarCommand.gz"} + + From them you can create the keyword files. Assuming the name + of the directory is in @{text "$ISABELLE_LOGS"}, + then the Unix command for creating the keyword file is: + +@{text [display] +"$ isabelle keywords -k foobar + $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"} + + The result is the file @{text "isar-keywords-foobar.el"}. It should contain + the string @{text "foobar"} twice.\footnote{To see whether things are fine, check + that @{text "grep foobar"} on this file returns something + non-empty.} This keyword file needs to + be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle + aware of this keyword file, you have to start Isabelle with the option @{text + "-k foobar"}, that is: + + + @{text [display] "$ isabelle emacs -k foobar a_theory_file"} + + If you now build a theory on top of @{text "Command.thy"}, + then the command \isacommand{foobar} can be used. + Similarly with any other new command. + + + At the moment \isacommand{foobar} is not very useful. Let us refine it a bit + next by letting it take a proposition as argument and printing this proposition + inside the tracing buffer. + + The crucial part of a command is the function that determines the behaviour + of the command. In the code above we used a ``do-nothing''-function, which + because of @{ML Scan.succeed} does not parse any argument, but immediately + returns the simple toplevel function @{ML "Toplevel.theory I"}. We can + replace this code by a function that first parses a proposition (using the + parser @{ML OuterParse.prop}), then prints out the tracing + information (using a new top-level function @{text trace_top_lvl}) and + finally does nothing. 
For this you can write: +*} + +ML{*let + fun trace_top_lvl str = + Toplevel.theory (fn thy => (tracing str; thy)) + + val trace_prop = OuterParse.prop >> trace_top_lvl + + val kind = OuterKeyword.thy_decl +in + OuterSyntax.command "foobar" "traces a proposition" kind trace_prop +end *} + +text {* + Now you can type + + \begin{isabelle} + \isacommand{foobar}~@{text [quotes] "True \ False"}\\ + @{text "> \"True \ False\""} + \end{isabelle} + + and see the proposition in the tracing buffer. + + Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator + for the command. This means that the command finishes as soon as the + arguments are processed. Examples of this kind of commands are + \isacommand{definition} and \isacommand{declare}. In other cases, + commands are expected to parse some arguments, for example a proposition, + and then ``open up'' a proof in order to prove the proposition (for example + \isacommand{lemma}) or prove some other properties (for example + \isacommand{function}). To achieve this kind of behaviour, you have to use the kind + indicator @{ML thy_goal in OuterKeyword}. Note, however, once you change the + ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword} + then the keyword file needs to be re-created! + + Below we change \isacommand{foobar} so that it takes a proposition as + argument and then starts a proof in order to prove it. Therefore in Line 13, + we set the kind indicator to @{ML thy_goal in OuterKeyword}. +*} + +ML%linenosgray{*let + fun set_up_thm str ctxt = + let + val prop = Syntax.read_prop ctxt str + in + Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt + end; + + val prove_prop = OuterParse.prop >> + (fn str => Toplevel.print o + Toplevel.local_theory_to_proof NONE (set_up_thm str)) + + val kind = OuterKeyword.thy_goal +in + OuterSyntax.command "foobar" "proving a proposition" kind prove_prop +end *} + +text {* + The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be + proved) and a context as argument. The context is necessary in order to be able to use + @{ML Syntax.read_prop}, which converts a string into a proper proposition. + In Line 6 the function @{ML Proof.theorem_i} starts the proof for the + proposition. Its argument @{ML NONE} stands for a locale (which we chose to + omit); the argument @{ML "(K I)"} stands for a function that determines what + should be done with the theorem once it is proved (we chose to just forget + about it). Lines 9 to 11 contain the parser for the proposition. + + If you now type \isacommand{foobar}~@{text [quotes] "True \ True"}, you obtain the following + proof state: + + \begin{isabelle} + \isacommand{foobar}~@{text [quotes] "True \ True"}\\ + @{text "goal (1 subgoal):"}\\ + @{text "1. True \ True"} + \end{isabelle} + + and you can build the proof + + \begin{isabelle} + \isacommand{foobar}~@{text [quotes] "True \ True"}\\ + \isacommand{apply}@{text "(rule conjI)"}\\ + \isacommand{apply}@{text "(rule TrueI)+"}\\ + \isacommand{done} + \end{isabelle} + + + + (FIXME What do @{ML "Toplevel.theory"} + @{ML "Toplevel.print"} + @{ML Toplevel.local_theory} do?) + + (FIXME read a name and show how to store theorems) + +*} + +section {* Methods *} + +text {* + Methods are a central concept in Isabelle. They are the ones you use for example + in \isacommand{apply}. To print out all currently known methods you can use the + Isabelle command. +*} + +print_methods + +text {* + An example of a very simple method is the following code. 
+*} + +method_setup %gray foobar_meth = + {* Scan.succeed + (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *} + "foobar method" + +text {* + It defines the method @{text foobar_meth}, which takes no arguments (therefore the + parser @{ML Scan.succeed}) and + only applies the tactic @{thm [source] conjE} and then @{thm [source] conjI}. + This method can be used in the following proof +*} + +lemma shows "A \ B \ C \ D" +apply(foobar_meth) +txt {* + where it results in the goal state + + \begin{minipage}{\textwidth} + @{subgoals} + \end{minipage} *} +(*<*)oops(*>*) + +text {* + (FIXME: explain a version of rule-tac) +*} + +(*<*) + +chapter {* Parsing *} + +text {* + + Lots of Standard ML code is given in this document, for various reasons, + including: + \begin{itemize} + \item direct quotation of code found in the Isabelle source files, + or simplified versions of such code + \item identifiers found in the Isabelle source code, with their types + (or specialisations of their types) + \item code examples, which can be run by the reader, to help illustrate the + behaviour of functions found in the Isabelle source code + \item ancillary functions, not from the Isabelle source code, + which enable the reader to run relevant code examples + \item type abbreviations, which help explain the uses of certain functions + \end{itemize} + +*} + +section {* Parsing Isar input *} + +text {* + + The typical parsing function has the type + \texttt{'src -> 'res * 'src}, with input + of type \texttt{'src}, returning a result + of type \texttt{'res}, which is (or is derived from) the first part of the + input, and also returning the remainder of the input. + (In the common case, when it is clear what the ``remainder of the input'' + means, we will just say that the functions ``returns'' the + value of type \texttt{'res}). + An exception is raised if an appropriate value + cannot be produced from the input. + A range of exceptions can be used to identify different reasons + for the failure of a parse. + + This contrasts the standard parsing function in Standard ML, + which is of type + \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option}; + (for example, \texttt{List.getItem} and \texttt{Substring.getc}). + However, much of the discussion at + FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html + is relevant. + + Naturally one may convert between the two different sorts of parsing functions + as follows: + \begin{verbatim} + open StringCvt ; + type ('res, 'src) ex_reader = 'src -> 'res * 'src + ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader + fun ex_reader rdr src = Option.valOf (rdr src) ; + reader : ('res, 'src) ex_reader -> ('res, 'src) reader + fun reader exrdr src = SOME (exrdr src) handle _ => NONE ; + \end{verbatim} + +*} + +section{* The \texttt{Scan} structure *} + +text {* + The source file is \texttt{src/General/scan.ML}. + This structure provides functions for using and combining parsing functions + of the type \texttt{'src -> 'res * 'src}. + Three exceptions are used: + \begin{verbatim} + exception MORE of string option; (*need more input (prompt)*) + exception FAIL of string option; (*try alternatives (reason of failure)*) + exception ABORT of string; (*dead end*) + \end{verbatim} + Many functions in this structure (generally those with names composed of + symbols) are declared as infix. 
+
+  Some functions from that structure are
+  \begin{verbatim}
+  |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
+    'src -> 'res2 * 'src''
+  --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
+    'src -> 'res1 * 'src''
+  -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
+    'src -> ('res1 * 'res2) * 'src''
+  ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->
+    'src -> string * 'src''
+  \end{verbatim}
+  Each of these functions parses two results off the input source, one
+  after the other.
+
+  \texttt{|--} and \texttt{--|}
+  return the second result and the first result, respectively.
+
+  \texttt{--} returns both.
+
+  \verb|^^| returns the result of concatenating the two results
+  (which must be strings).
+
+  Note how, although the types 
+  \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,
+  the types as shown help suggest the behaviour of the functions.
+  \begin{verbatim}
+  :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
+    'src -> ('res1 * 'res2) * 'src''
+  :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
+    'src -> 'res2 * 'src''
+  \end{verbatim}
+  These are similar to \texttt{--} and \texttt{|--},
+  except that the second parsing function can depend on the result of the first.
+  \begin{verbatim}
+  >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'
+  || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src
+  \end{verbatim}
+  \texttt{p >> f} applies a function \texttt{f} to the result of a parse.
+
+  \texttt{||} tries a second parsing function if the first one
+  fails by raising an exception of the form \texttt{FAIL \_}.
+
+  \begin{verbatim}
+  succeed : 'res -> ('src -> 'res * 'src) ;
+  fail : ('src -> 'res_src) ;
+  !! : ('src * string option -> string) ->
+    ('src -> 'res_src) -> ('src -> 'res_src) ;
+  \end{verbatim}
+  \texttt{succeed r} returns \texttt{r}, with the input unchanged.
+  \texttt{fail} always fails, raising exception \texttt{FAIL NONE}.
+  \texttt{!! f} only affects the failure mode, turning a failure that
+  raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
+  This is used to prevent recovery from the failure ---
+  thus, in \texttt{(!! f parse1) || parse2}, if \texttt{parse1} fails,
+  it won't recover by trying \texttt{parse2}.
+
+  \begin{verbatim}
+  one : ('si -> bool) -> ('si list -> 'si * 'si list) ;
+  some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;
+  \end{verbatim}
+  These require the input to be a list of items:
+  they fail, raising \texttt{MORE NONE} if the list is empty.
+  On other failures they raise \texttt{FAIL NONE}.
+
+  \texttt{one p} takes the first
+  item from the list if it satisfies \texttt{p}, otherwise fails.
+
+  \texttt{some f} takes the first
+  item from the list and applies \texttt{f} to it, failing if this returns
+  \texttt{NONE}.
+
+  \begin{verbatim}
+  many : ('si -> bool) -> 'si list -> 'si list * 'si list ;
+  \end{verbatim}
+  \texttt{many p} takes items from the input until it encounters one
+  which does not satisfy \texttt{p}. If it reaches the end of the input
+  it fails, raising \texttt{MORE NONE}.
+
+  \texttt{many1} (with the same type) fails if the first item
+  does not satisfy \texttt{p}.
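+
+  For example, assuming single-character strings as items (as in the earlier
+  sections):
+  \begin{verbatim}
+  Scan.many (fn s => s = "h") (explode "hhello") ;
+  (* = (["h", "h"], ["e", "l", "l", "o"]) *)
+  \end{verbatim}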
+
+  \begin{verbatim}
+  option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)
+  optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)
+  \end{verbatim}
+  \texttt{option}:
+  where the parser \texttt{f} succeeds with result \texttt{r}
+  or raises \texttt{FAIL \_},
+  \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}.
+
+  \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},
+  \texttt{optional f default} provides the result \texttt{default}.
+
+  \begin{verbatim}
+  repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
+  repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
+  bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
+  \end{verbatim}
+  \texttt{repeat f} repeatedly parses an item off the remaining input until
+  \texttt{f} fails with \texttt{FAIL \_}.
+
+  \texttt{repeat1} is as for \texttt{repeat}, but requires at least one
+  successful parse.
+
+  \begin{verbatim}
+  lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))
+  \end{verbatim}
+  \texttt{lift} changes the source type of a parser by putting in an extra
+  component \texttt{'ex}, which is ignored in the parsing.
+
+  The \texttt{Scan} structure also provides the type \texttt{lexicon},
+  HOW DO THEY WORK ?? TO BE COMPLETED
+  \begin{verbatim}
+  dest_lexicon: lexicon -> string list ;
+  make_lexicon: string list list -> lexicon ;
+  empty_lexicon: lexicon ;
+  extend_lexicon: string list list -> lexicon -> lexicon ;
+  merge_lexicons: lexicon -> lexicon -> lexicon ;
+  is_literal: lexicon -> string list -> bool ;
+  literal: lexicon -> string list -> string list * string list ;
+  \end{verbatim}
+  Two lexicons, for the commands and keywords, are stored and can be retrieved
+  by:
+  \begin{verbatim}
+  val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;
+  val commands = Scan.dest_lexicon command_lexicon ;
+  val keywords = Scan.dest_lexicon keyword_lexicon ;
+  \end{verbatim}
+*}
+
+section{* The \texttt{OuterLex} structure *}
+
+text {*
+  The source file is @{text "src/Pure/Isar/outer_lex.ML"}.
+  In some other source files its name is abbreviated:
+  \begin{verbatim}
+  structure T = OuterLex;
+  \end{verbatim}
+  This structure defines the type \texttt{token}.
+  (The types
+  \texttt{OuterLex.token},
+  \texttt{OuterParse.token} and
+  \texttt{SpecParse.token} are all the same).
+
+  Input text is split up into tokens, and the input source type for many parsing
+  functions is \texttt{token list}.
+
+  The datatype definition (which is not published in the signature) is
+  \begin{verbatim}
+  datatype token = Token of Position.T * (token_kind * string);
+  \end{verbatim}
+  but here are some runnable examples for viewing tokens:
+
+*}
+
+
+
+
+ML{*
+  val toks = OuterSyntax.scan Position.none
+   "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
+*}
+
+ML{*
+  print_depth 20 ;
+*}
+
+ML{*
+  map OuterLex.text_of toks ;
+*}
+
+ML{*
+  val proper_toks = filter OuterLex.is_proper toks ;
+*}
+
+ML{*
+  map OuterLex.kind_of proper_toks
+*}
+
+ML{*
+  map OuterLex.unparse proper_toks ;
+*}
+
+ML{*
+  OuterLex.stopper
+*}
+
+text {*
+
+  The function \texttt{is\_proper : token -> bool} identifies tokens which are
+  not white space or comments: many parsing functions require spaces or
+  comments to have been filtered out.
+ + There is a special end-of-file token: + \begin{verbatim} + val (tok_eof : token, is_eof : token -> bool) = T.stopper ; + (* end of file token *) + \end{verbatim} + +*} + +section {* The \texttt{OuterParse} structure *} + +text {* + The source file is \texttt{src/Pure/Isar/outer\_parse.ML}. + In some other source files its name is abbreviated: + \begin{verbatim} + structure P = OuterParse; + \end{verbatim} + Here the parsers use \texttt{token list} as the input source type. + + Some of the parsers simply select the first token, provided that it is of the + right kind (as returned by \texttt{T.kind\_of}): these are + \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var, + type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof} + Others select the first token, provided that it is one of several kinds, + (eg, \texttt{name, xname, text, typ}). + + \begin{verbatim} + type 'a tlp = token list -> 'a * token list ; (* token list parser *) + $$$ : string -> string tlp + nat : int tlp ; + maybe : 'a tlp -> 'a option tlp ; + \end{verbatim} + + \texttt{\$\$\$ s} returns the first token, + if it equals \texttt{s} \emph{and} \texttt{s} is a keyword. + + \texttt{nat} returns the first token, if it is a number, and evaluates it. + + \texttt{maybe}: if \texttt{p} returns \texttt{r}, + then \texttt{maybe p} returns \texttt{SOME r} ; + if the first token is an underscore, it returns \texttt{NONE}. + + A few examples: + \begin{verbatim} + P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *) + P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *) + val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ; + val proper_toks = List.filter T.is_proper toks ; + P.list P.nat toks ; (* OK, doesn't recognize white space *) + P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *) + P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *) + P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *) + val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ; + P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *) + \end{verbatim} + + The following code helps run examples: + \begin{verbatim} + fun parse_str tlp str = + let val toks : token list = OuterSyntax.scan str ; + val proper_toks = List.filter T.is_proper toks @ [tok_eof] ; + val (res, rem_toks) = tlp proper_toks ; + val rem_str = String.concat + (Library.separate " " (List.map T.unparse rem_toks)) ; + in (res, rem_str) end ; + \end{verbatim} + + Some examples from \texttt{src/Pure/Isar/outer\_parse.ML} + \begin{verbatim} + val type_args = + type_ident >> Library.single || + $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") || + Scan.succeed []; + \end{verbatim} + There are three ways parsing a list of type arguments can succeed. + The first line reads a single type argument, and turns it into a singleton + list. + The second line reads "(", and then the remainder, ignoring the "(" ; + the remainder consists of a list of type identifiers (at least one), + and then a ")" which is also ignored. + The \texttt{!!!} ensures that if the parsing proceeds this far and then fails, + it won't try the third line (see the description of \texttt{Scan.!!}). + The third line consumes no input and returns the empty list. + + \begin{verbatim} + fun triple2 (x, (y, z)) = (x, y, z); + val arity = xname -- ($$$ "::" |-- !!! ( + Scan.optional ($$$ "(" |-- !!! 
(list1 sort --| $$$ \")\")) []
+      -- sort)) >> triple2;
+  \end{verbatim}
+  The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is
+  ignored), then optionally a list $ss$ of sorts and then another sort $s$.
+  The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$.
+  The second line reads the optional list of sorts:
+  it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored,
+  and between them a comma-separated list of sorts.
+  If this list is absent, the default \texttt{[]} provides the list of sorts.
+
+  \begin{verbatim}
+  parse_str P.type_args "('a, 'b) ntyp" ;
+  parse_str P.type_args "'a ntyp" ;
+  parse_str P.type_args "ntyp" ;
+  parse_str P.arity "ty :: tycl" ;
+  parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;
+  \end{verbatim}
+
+*}
+
+section {* The \texttt{SpecParse} structure *}
+
+text {*
+  The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.
+  This structure contains token list parsers for more complicated values.
+  For example,
+  \begin{verbatim}
+  open SpecParse ;
+  attrib : Attrib.src tok_rdr ;
+  attribs : Attrib.src list tok_rdr ;
+  opt_attribs : Attrib.src list tok_rdr ;
+  xthm : (thmref * Attrib.src list) tok_rdr ;
+  xthms1 : (thmref * Attrib.src list) list tok_rdr ;
+
+  parse_str attrib "simp" ;
+  parse_str opt_attribs "hello" ;
+  val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;
+  map Args.dest_src ass ;
+  val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;
+
+  parse_str xthm "mythm [attr]" ;
+  parse_str xthms1 "thm1 [attr] thms2" ;
+  \end{verbatim}
+
+  As you can see, attributes are described using types of the \texttt{Args}
+  structure, described below.
+*}
+
+section{* The \texttt{Args} structure *}
+
+text {*
+  The source file is \texttt{src/Pure/Isar/args.ML}.
+  The primary type of this structure is the \texttt{src} datatype;
+  its single constructor is not published in the signature, but
+  \texttt{Args.src} and \texttt{Args.dest\_src}
+  are in fact the constructor and destructor functions.
+  Note that the types \texttt{Attrib.src} and \texttt{Method.src}
+  are in fact \texttt{Args.src}.
+
+  \begin{verbatim}
+  src : (string * Args.T list) * Position.T -> Args.src ;
+  dest_src : Args.src -> (string * Args.T list) * Position.T ;
+  Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;
+  fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;
+
+  val thy = ML_Context.the_context () ;
+  val ctxt = ProofContext.init thy ;
+  map (pr_src ctxt) ass ;
+  \end{verbatim}
+
+  So an \texttt{Args.src} consists of the first word, then a list of further
+  ``arguments'', of type \texttt{Args.T}, with information about position in the
+  input.
+  \begin{verbatim}
+  (* how an Args.src is parsed *)
+  P.position : 'a tlp -> ('a * Position.T) tlp ;
+  P.arguments : Args.T list tlp ;
+
+  val parse_src : Args.src tlp =
+    P.position (P.xname -- P.arguments) >> Args.src ;
+  \end{verbatim}
+
+  \begin{verbatim}
+  val ((first_word, args), pos) = Args.dest_src asrc ;
+  map Args.string_of args ;
+  \end{verbatim}
+
+  The \texttt{Args} structure contains more parsers and parser transformers
+  for which the input source type is \texttt{Args.T list}.
For example, + \begin{verbatim} + type 'a atlp = Args.T list -> 'a * Args.T list ; + open Args ; + nat : int atlp ; (* also Args.int *) + thm_sel : PureThy.interval list atlp ; + list : 'a atlp -> 'a list atlp ; + attribs : (string -> string) -> Args.src list atlp ; + opt_attribs : (string -> string) -> Args.src list atlp ; + + (* parse_atl_str : 'a atlp -> (string -> 'a * string) ; + given an Args.T list parser, to get a string parser *) + fun parse_atl_str atlp str = + let val (ats, rem_str) = parse_str P.arguments str ; + val (res, rem_ats) = atlp ats ; + in (res, String.concat (Library.separate " " + (List.map Args.string_of rem_ats @ [rem_str]))) end ; + + parse_atl_str Args.int "-1-," ; + parse_atl_str (Scan.option Args.int) "x1-," ; + parse_atl_str Args.thm_sel "(1-,4,13-22)" ; + + val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I) + "[THEN trans [THEN sym], simp, OF sym]" ; + \end{verbatim} + + From here, an attribute is interpreted using \texttt{Attrib.attribute}. + + \texttt{Args} has a large number of functions which parse an \texttt{Args.src} + and also refer to a generic context. + Note the use of \texttt{Scan.lift} for this. + (as does \texttt{Attrib} - RETHINK THIS) + + (\texttt{Args.syntax} shown below has type specialised) + + \begin{verbatim} + type ('res, 'src) parse_fn = 'src -> 'res * 'src ; + type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ; + Scan.lift : 'a atlp -> 'a cgatlp ; + term : term cgatlp ; + typ : typ cgatlp ; + + Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ; + Attrib.thm : thm cgatlp ; + Attrib.thms : thm list cgatlp ; + Attrib.multi_thm : thm list cgatlp ; + + (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ; + given a (Context.generic * Args.T list) parser, to get a string parser *) + fun parse_cgatl_str cgatlp str = + let + (* use the current generic context *) + val generic = Context.Theory thy ; + val (ats, rem_str) = parse_str P.arguments str ; + (* ignore any change to the generic context *) + val (res, (_, rem_ats)) = cgatlp (generic, ats) ; + in (res, String.concat (Library.separate " " + (List.map Args.string_of rem_ats @ [rem_str]))) end ; + \end{verbatim} +*} + +section{* Attributes, and the \texttt{Attrib} structure *} + +text {* + The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}. + The source file for the \texttt{Attrib} structure is + \texttt{src/Pure/Isar/attrib.ML}. + Most attributes use a theorem to change a generic context (for example, + by declaring that the theorem should be used, by default, in simplification), + or change a theorem (which most often involves referring to the current + theory). + The functions \texttt{Thm.rule\_attribute} and + \texttt{Thm.declaration\_attribute} create attributes of these kinds. 
+ + \begin{verbatim} + type attribute = Context.generic * thm -> Context.generic * thm; + type 'a trf = 'a -> 'a ; (* transformer of a given type *) + Thm.rule_attribute : (Context.generic -> thm -> thm) -> attribute ; + Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ; + + Attrib.print_attributes : theory -> unit ; + Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ; + + List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ; + \end{verbatim} + + An attribute is stored in a theory as indicated by: + \begin{verbatim} + Attrib.add_attributes : + (bstring * (src -> attribute) * string) list -> theory trf ; + (* + Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ; + *) + \end{verbatim} + where the first and third arguments are name and description of the attribute, + and the second is a function which parses the attribute input text + (including the attribute name, which has necessarily already been parsed). + Here, \texttt{THEN\_att} is a function declared in the code for the + structure \texttt{Attrib}, but not published in its signature. + The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of + \texttt{Attrib.add\_attributes} to add a number of attributes. + + \begin{verbatim} + FullAttrib.THEN_att : src -> attribute ; + FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ; + FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ; + \end{verbatim} + + \begin{verbatim} + Attrib.syntax : attribute cgatlp -> src -> attribute ; + Attrib.no_args : attribute -> src -> attribute ; + \end{verbatim} + When this is called as \texttt{syntax scan src (gc, th)} + the generic context \texttt{gc} is used + (and potentially changed to \texttt{gc'}) + by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would + then be applied to \texttt{(gc', th)}. + The source for parsing the attribute is the arguments part of \texttt{src}, + which must all be consumed by the parse. + + For example, for \texttt{Attrib.no\_args attr src}, the attribute parser + simply returns \texttt{attr}, requiring that the arguments part of + \texttt{src} must be empty. + + Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified: + \begin{verbatim} + fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ; + rot_att_n : int -> attribute ; + val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ; + val rotated_att : src -> attribute = + Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ; + + val THEN_arg : int cgatlp = Scan.lift + (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ; + + Attrib.thm : thm cgatlp ; + + THEN_arg -- Attrib.thm : (int * thm) cgatlp ; + + fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ; + THEN_att_n : int * thm -> attribute ; + + val THEN_att : src -> attribute = Attrib.syntax + (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp); + \end{verbatim} + The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg} + read an optional argument, which for \texttt{rotated} is an integer, + and for \texttt{THEN} is a natural enclosed in square brackets; + the default, if the argument is absent, is 1 in each case. + Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into + attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is + parsed by \texttt{Attrib.thm}. + Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}. 
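+
+  As a small sketch of \texttt{Thm.rule\_attribute} (the name
+  \texttt{my\_sym\_att} is made up for illustration; \texttt{ML\_Context.thm}
+  is used, as above, to obtain a theorem), here is an attribute that ignores
+  the context and simply resolves the given theorem with \texttt{sym}:
+  \begin{verbatim}
+  (* context-ignoring attribute: resolve the theorem with sym *)
+  val my_sym_att : attribute =
+    Thm.rule_attribute (fn _ => fn th => th RS ML_Context.thm "sym") ;
+  \end{verbatim}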
+
+*}
+
+section{* Methods, and the \texttt{Method} structure *}
+
+text {*
+  The source file is \texttt{src/Pure/Isar/method.ML}.
+  The type \texttt{method} is defined by the datatype declaration
+  \begin{verbatim}
+  (* datatype method = Meth of thm list -> cases_tactic; *)
+  RuleCases.NO_CASES : tactic -> cases_tactic ;
+  \end{verbatim}
+  In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
+  \texttt{Meth}.
+  A \texttt{cases\_tactic} is an elaborated version of a tactic.
+  \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of the
+  tactic \texttt{tac} without any further case information.
+  For further details see the description of structure \texttt{RuleCases} below.
+  The list of theorems to be passed to a method consists of the current
+  \emph{facts} in the proof.
+
+  \begin{verbatim}
+  RAW_METHOD : (thm list -> tactic) -> method ;
+  METHOD : (thm list -> tactic) -> method ;
+
+  SIMPLE_METHOD : tactic -> method ;
+  SIMPLE_METHOD' : (int -> tactic) -> method ;
+  SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;
+
+  RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;
+  METHOD_CASES : (thm list -> cases_tactic) -> method ;
+  \end{verbatim}
+  A method is, in its simplest form, a tactic; applying the method amounts to
+  applying the tactic to the current goal state.
+
+  Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying
+  \texttt{tacf} to the current \emph{facts}, and applying that tactic to the
+  goal state.
+
+  \texttt{METHOD} is similar but also first applies
+  \texttt{Goal.conjunction\_tac} to all subgoals.
+
+  \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
+  applies \texttt{tac}.
+
+  \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then
+  applies \texttt{tacf} to subgoal 1.
+
+  \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by
+  \texttt{quant}, which may be, for example,
+  \texttt{ALLGOALS} (all subgoals),
+  \texttt{TRYALL} (try all subgoals, failure is OK),
+  \texttt{FIRSTGOAL} (try subgoals until it succeeds once),
+  \texttt{(fn tacf => tacf 4)} (subgoal 4), etc.\
+  (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}).
+
+  A method is stored in a theory as indicated by:
+  \begin{verbatim}
+  Method.add_method :
+    (bstring * (src -> Proof.context -> method) * string) -> theory trf ;
+  ( *
+  * )
+  \end{verbatim}
+  where the first and third arguments are name and description of the method,
+  and the second is a function which parses the method input text 
+  (including the method name, which has necessarily already been parsed).
+
+  Here, \texttt{xxx} is a function declared in the code for the
+  structure \texttt{Method}, but not published in its signature.
+  The source file \texttt{src/Pure/Isar/method.ML} shows the use of 
+  \texttt{Method.add\_method} to add a number of methods.
+
+
+*}
+(*>*)
+end
\ No newline at end of file