| 
     1 theory Parsing  | 
         | 
     2 imports Base "Package/Simple_Inductive_Package"  | 
         | 
     3 begin  | 
         | 
     4   | 
         | 
     5   | 
         | 
     6 chapter {* Parsing *} | 
         | 
     7   | 
         | 
     8 text {* | 
         | 
     9   | 
         | 
    10   Isabelle distinguishes between \emph{outer} and \emph{inner} syntax.  | 
         | 
    11   Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so | 
         | 
    12   on, belong to the outer syntax, whereas items inside double quotation marks, such   | 
         | 
    13   as terms, types and so on, belong to the inner syntax. For parsing inner syntax,   | 
         | 
    14   Isabelle uses a rather general and sophisticated algorithm, which   | 
         | 
    15   is driven by priority grammars. Parsers for outer syntax are built up by functional  | 
         | 
    16   parsing combinators. These combinators are a well-established technique for parsing,   | 
         | 
    17   which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. | 
         | 
    18   Isabelle developers are usually concerned with writing these outer syntax parsers,   | 
         | 
    19   either for new definitional packages or for calling tactics with specific arguments.   | 
         | 
    20   | 
         | 
    21   \begin{readmore} | 
         | 
    22   The library   | 
         | 
    23   for writing parser combinators is split up, roughly, into two parts.   | 
         | 
    24   The first part consists of a collection of generic parser combinators defined  | 
         | 
    25   in the structure @{ML_struct Scan} in the file  | 
         | 
    26   @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of  | 
         | 
    27   combinators for dealing with specific token types, which are defined in the   | 
         | 
    28   structure @{ML_struct OuterParse} in the file  | 
         | 
    29   @{ML_file "Pure/Isar/outer_parse.ML"}. | 
         | 
    30   \end{readmore} | 
         | 
    31   | 
         | 
    32 *}  | 
         | 
    33   | 
         | 
    34 section {* Building Generic Parsers *} | 
         | 
    35   | 
         | 
    36 text {* | 
         | 
    37   | 
         | 
    38   Let us first have a look at parsing strings using generic parsing combinators.   | 
         | 
    39   The function @{ML "$$"} takes a string as argument and will ``consume'' this string from  | 
         | 
    40   a given input list of strings. ``Consume'' in this context means that it will   | 
         | 
    41   return a pair consisting of this string and the rest of the input list.   | 
         | 
    42   For example:  | 
         | 
    43   | 
         | 
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} | 
         | 
    45   | 
         | 
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} | 
         | 
    47   | 
         | 
    48   The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception  | 
         | 
    49   @{text "FAIL"} if no string can be consumed. For example trying to parse | 
         | 
    50   | 
         | 
    51   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")"  | 
         | 
    52                                "Exception FAIL raised"}  | 
         | 
    53     | 
         | 
    54   will raise the exception @{text "FAIL"}. | 
         | 
    55   There are three exceptions used in the parsing combinators:  | 
         | 
    56   | 
         | 
    57   \begin{itemize} | 
         | 
    58   \item @{text "FAIL"} is used to indicate that alternative routes of parsing  | 
         | 
    59   might be explored.   | 
         | 
    60   \item @{text "MORE"} indicates that there is not enough input for the parser. For example  | 
         | 
    61   in @{text "($$ \"h\") []"}. | 
         | 
    62   \item @{text "ABORT"} is the exception that is raised when a dead end is reached.  | 
         | 
    63   It is used for example in the function @{ML "!!"} (see below). | 
         | 
    64   \end{itemize} | 
         | 
    65   | 
         | 
    66   However, note that these exceptions are private to the parser and cannot be accessed  | 
         | 
    67   by the programmer (for example to handle them).  | 
         | 
    68     | 
         | 
    69   Slightly more general than the parser @{ML "$$"} is the function @{ML | 
         | 
    70   Scan.one}, in that it takes a predicate as argument and then parses exactly  | 
         | 
    71   one item from the input list satisfying this predicate. For example the  | 
         | 
    72   following parser either consumes an @{text [quotes] "h"} or a @{text | 
         | 
    73   [quotes] "w"}:  | 
         | 
    74   | 
         | 
    75   | 
         | 
    76 @{ML_response [display,gray]  | 
         | 
    77 "let   | 
         | 
    78   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")  | 
         | 
    79   val input1 = (explode \"hello\")  | 
         | 
    80   val input2 = (explode \"world\")  | 
         | 
    81 in  | 
         | 
    82     (hw input1, hw input2)  | 
         | 
    83 end"  | 
         | 
    84     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}  | 
         | 
    85   | 
         | 
    86   Two parser can be connected in sequence by using the function @{ML "--"}.  | 
         | 
    87   For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this  | 
         | 
    88   sequence you can achieve by:  | 
         | 
    89   | 
         | 
    90   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" | 
         | 
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}  | 
         | 
    92   | 
         | 
    93   Note how the result of consumed strings builds up on the left as nested pairs.    | 
         | 
    94   | 
         | 
    95   If, as in the previous example, you want to parse a particular string,   | 
         | 
    96   then you should use the function @{ML Scan.this_string}: | 
         | 
    97   | 
         | 
    98   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")" | 
         | 
    99                           "(\"hell\", [\"o\"])"}  | 
         | 
   100   | 
         | 
   101   Parsers that explore alternatives can be constructed using the function @{ML | 
         | 
   102   "||"}. For example, the parser @{ML "(p || q)" for p q} returns the | 
         | 
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the | 
         | 
   104   result of @{text "q"}. For example: | 
         | 
   105   | 
         | 
   106   | 
         | 
   107 @{ML_response [display,gray]  | 
         | 
   108 "let   | 
         | 
   109   val hw = ($$ \"h\") || ($$ \"w\")  | 
         | 
   110   val input1 = (explode \"hello\")  | 
         | 
   111   val input2 = (explode \"world\")  | 
         | 
   112 in  | 
         | 
   113   (hw input1, hw input2)  | 
         | 
   114 end"  | 
         | 
   115   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}  | 
         | 
   116   | 
         | 
   117   The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function  | 
         | 
   118   for parsers, except that they discard the item being parsed by the first (respectively second)  | 
         | 
   119   parser. For example:  | 
         | 
   120     | 
         | 
   121 @{ML_response [display,gray] | 
         | 
   122 "let   | 
         | 
   123   val just_e = ($$ \"h\") |-- ($$ \"e\")   | 
         | 
   124   val just_h = ($$ \"h\") --| ($$ \"e\")   | 
         | 
   125   val input = (explode \"hello\")    | 
         | 
   126 in   | 
         | 
   127   (just_e input, just_h input)  | 
         | 
   128 end"  | 
         | 
   129   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}  | 
         | 
   130   | 
         | 
   131   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser  | 
         | 
   132   @{text "p"}, if it succeeds; otherwise it returns  | 
         | 
   133   the default value @{text "x"}. For example: | 
         | 
   134   | 
         | 
   135 @{ML_response [display,gray] | 
         | 
   136 "let   | 
         | 
   137   val p = Scan.optional ($$ \"h\") \"x\"  | 
         | 
   138   val input1 = (explode \"hello\")  | 
         | 
   139   val input2 = (explode \"world\")    | 
         | 
   140 in   | 
         | 
   141   (p input1, p input2)  | 
         | 
   142 end"   | 
         | 
   143  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}  | 
         | 
   144   | 
         | 
   145   The function @{ML Scan.option} works similarly, except no default value can | 
         | 
   146   be given. Instead, the result is wrapped as an @{text "option"}-type. For example: | 
         | 
   147   | 
         | 
   148 @{ML_response [display,gray] | 
         | 
   149 "let   | 
         | 
   150   val p = Scan.option ($$ \"h\")  | 
         | 
   151   val input1 = (explode \"hello\")  | 
         | 
   152   val input2 = (explode \"world\")    | 
         | 
   153 in   | 
         | 
   154   (p input1, p input2)  | 
         | 
   155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}  | 
         | 
   156   | 
         | 
   157   The function @{ML "!!"} helps to produce appropriate error messages | 
         | 
   158   during parsing. For example if you want to parse that @{text p} is immediately  | 
         | 
   159   followed by @{text q}, or start a completely different parser @{text r}, | 
         | 
   160   you might write:  | 
         | 
   161   | 
         | 
   162   @{ML [display,gray] "(p -- q) || r" for p q r} | 
         | 
   163   | 
         | 
   164   However, this parser is problematic for producing an appropriate error  | 
         | 
   165   message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in | 
         | 
   166   that case you lose the information that @{text p} should be followed by | 
         | 
   167   @{text q}. To see this consider the case in which @{text p} is present in | 
         | 
   168   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail | 
         | 
   169   and the alternative parser @{text r} will be tried. However in many | 
         | 
   170   circumstance this will be the wrong parser for the input ``p-followed-by-q''  | 
         | 
   171   and therefore will also fail. The error message is then caused by the  | 
         | 
   172   failure of @{text r}, not by the absence of @{text q} in the input. This | 
         | 
   173   kind of situation can be avoided when using the function @{ML "!!"}.  | 
         | 
   174   This function aborts the whole process of parsing in case of a  | 
         | 
   175   failure and prints an error message. For example if you invoke the parser  | 
         | 
   176   | 
         | 
   177     | 
         | 
   178   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} | 
         | 
   179   | 
         | 
   180   on @{text [quotes] "hello"}, the parsing succeeds | 
         | 
   181   | 
         | 
   182   @{ML_response [display,gray]  | 
         | 
   183                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"   | 
         | 
   184                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}  | 
         | 
   185   | 
         | 
   186   but if you invoke it on @{text [quotes] "world"} | 
         | 
   187     | 
         | 
   188   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" | 
         | 
   189                                "Exception ABORT raised"}  | 
         | 
   190   | 
         | 
   191   then the parsing aborts and the error message @{text "foo"} is printed. In order to | 
         | 
   192   see the error message properly, you need to prefix the parser with the function   | 
         | 
   193   @{ML "Scan.error"}. For example: | 
         | 
   194   | 
         | 
   195   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" | 
         | 
   196                                "Exception Error \"foo\" raised"}  | 
         | 
   197   | 
         | 
   198   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}  | 
         | 
   199   (see Section~\ref{sec:newcommand} which explains this function in more detail).  | 
         | 
   200     | 
         | 
   201   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q | 
         | 
   202   r}. If you want to generate the correct error message for p-followed-by-q,  | 
         | 
   203   then you have to write:  | 
         | 
   204 *}  | 
         | 
   205   | 
         | 
   206 ML{*fun p_followed_by_q p q r = | 
         | 
   207 let   | 
         | 
   208   val err_msg = (fn _ => p ^ " is not followed by " ^ q)  | 
         | 
   209 in  | 
         | 
   210   ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)  | 
         | 
   211 end *}  | 
         | 
   212   | 
         | 
   213   | 
         | 
   214 text {* | 
         | 
   215   Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and  | 
         | 
   216   the input @{text [quotes] "holle"}  | 
         | 
   217   | 
         | 
   218   @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" | 
         | 
   219                                "Exception ERROR \"h is not followed by e\" raised"}   | 
         | 
   220   | 
         | 
   221   produces the correct error message. Running it with  | 
         | 
   222    | 
         | 
   223   @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" | 
         | 
   224                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}  | 
         | 
   225     | 
         | 
   226   yields the expected parsing.   | 
         | 
   227   | 
         | 
   228   The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as  | 
         | 
   229   often as it succeeds. For example:  | 
         | 
   230     | 
         | 
   231   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"  | 
         | 
   232                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}  | 
         | 
   233     | 
         | 
   234   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function | 
         | 
   235   @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"}  | 
         | 
   236   succeeds at least once.  | 
         | 
   237   | 
         | 
   238   Also note that the parser would have aborted with the exception @{text MORE}, if | 
         | 
   239   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using | 
         | 
   240   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With | 
         | 
   241   them you can write:  | 
         | 
   242     | 
         | 
   243   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")"  | 
         | 
   244                 "([\"h\", \"h\", \"h\", \"h\"], [])"}  | 
         | 
   245   | 
         | 
   246   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; | 
         | 
   247   other stoppers need to be used when parsing, for example, tokens. However, this kind of   | 
         | 
   248   manually wrapping is often already done by the surrounding infrastructure.   | 
         | 
   249   | 
         | 
   250   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any  | 
         | 
   251   string as in  | 
         | 
   252   | 
         | 
   253   @{ML_response [display,gray]  | 
         | 
   254 "let   | 
         | 
   255    val p = Scan.repeat (Scan.one Symbol.not_eof)  | 
         | 
   256    val input = (explode \"foo bar foo\")   | 
         | 
   257 in  | 
         | 
   258    Scan.finite Symbol.stopper p input  | 
         | 
   259 end"   | 
         | 
   260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}  | 
         | 
   261   | 
         | 
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the  | 
         | 
   263   end of the input string (i.e.~stopper symbol).  | 
         | 
   264   | 
         | 
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can  | 
         | 
   266   parse the input, then the whole parser fails; if not, then the second is tried. Therefore  | 
         | 
   267     | 
         | 
   268   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")" | 
         | 
   269                                "Exception FAIL raised"}  | 
         | 
   270   | 
         | 
   271   fails, while  | 
         | 
   272   | 
         | 
   273   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")" | 
         | 
   274                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}  | 
         | 
   275   | 
         | 
   276   succeeds.   | 
         | 
   277   | 
         | 
   278   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any | 
         | 
   279   input until a certain marker symbol is reached. In the example below the marker  | 
         | 
   280   symbol is a @{text [quotes] "*"}. | 
         | 
   281   | 
         | 
   282   @{ML_response [display,gray] | 
         | 
   283 "let   | 
         | 
   284   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))  | 
         | 
   285   val input1 = (explode \"fooooo\")  | 
         | 
   286   val input2 = (explode \"foo*ooo\")  | 
         | 
   287 in  | 
         | 
   288   (Scan.finite Symbol.stopper p input1,   | 
         | 
   289    Scan.finite Symbol.stopper p input2)  | 
         | 
   290 end"  | 
         | 
   291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),  | 
         | 
   292  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}  | 
         | 
   293   | 
         | 
   294   After parsing is done, you nearly always want to apply a function on the parsed   | 
         | 
   295   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs  | 
         | 
   296   first the parser @{text p} and upon successful completion applies the  | 
         | 
   297   function @{text f} to the result. For example | 
         | 
   298   | 
         | 
   299 @{ML_response [display,gray] | 
         | 
   300 "let   | 
         | 
   301   fun double (x,y) = (x ^ x, y ^ y)   | 
         | 
   302 in  | 
         | 
   303   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")  | 
         | 
   304 end"  | 
         | 
   305 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}  | 
         | 
   306   | 
         | 
   307   doubles the two parsed input strings; or  | 
         | 
   308   | 
         | 
   309   @{ML_response [display,gray]  | 
         | 
   310 "let   | 
         | 
   311    val p = Scan.repeat (Scan.one Symbol.not_eof)  | 
         | 
   312    val input = (explode \"foo bar foo\")   | 
         | 
   313 in  | 
         | 
   314    Scan.finite Symbol.stopper (p >> implode) input  | 
         | 
   315 end"   | 
         | 
   316 "(\"foo bar foo\",[])"}  | 
         | 
   317   | 
         | 
   318   where the single-character strings in the parsed output are transformed  | 
         | 
   319   back into one string.  | 
         | 
   320    | 
         | 
   321   The function @{ML Scan.ahead} parses some input, but leaves the original | 
         | 
   322   input unchanged. For example:  | 
         | 
   323   | 
         | 
   324   @{ML_response [display,gray] | 
         | 
   325   "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")"   | 
         | 
   326   "(\"foo\", [\"f\", \"o\", \"o\"])"}   | 
         | 
   327   | 
         | 
   328   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies | 
         | 
   329   the given parser to the second component of the pair and leaves the  first component   | 
         | 
   330   untouched. For example  | 
         | 
   331   | 
         | 
   332 @{ML_response [display,gray] | 
         | 
   333 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"  | 
         | 
   334 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}  | 
         | 
   335   | 
         | 
   336   (FIXME: In which situations is this useful? Give examples.)   | 
         | 
   337   | 
         | 
   338   \begin{exercise}\label{ex:scancmts} | 
         | 
   339   Write a parser that parses an input string so that any comment enclosed  | 
         | 
   340   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside | 
         | 
   341   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the | 
         | 
   342   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML | 
         | 
   343   "s1 ^ s ^ s2" for s1 s2 s}.  | 
         | 
   344   \end{exercise} | 
         | 
   345 *}  | 
         | 
   346   | 
         | 
   347 section {* Parsing Theory Syntax *} | 
         | 
   348   | 
         | 
   349 text {* | 
         | 
   350   (FIXME: context parser)  | 
         | 
   351   | 
         | 
   352   Most of the time, however, Isabelle developers have to deal with parsing  | 
         | 
   353   tokens, not strings.  These token parsers have the type:  | 
         | 
   354 *}  | 
         | 
   355     | 
         | 
   356 ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*} | 
         | 
   357   | 
         | 
   358 text {* | 
         | 
   359   The reason for using token parsers is that theory syntax, as well as the  | 
         | 
   360   parsers for the arguments of proof methods, use the type @{ML_type | 
         | 
   361   OuterLex.token} (which is identical to the type @{ML_type | 
         | 
   362   OuterParse.token}).  However, there are also handy parsers for  | 
         | 
   363   ML-expressions and ML-files.  | 
         | 
   364   | 
         | 
   365   \begin{readmore} | 
         | 
   366   The parser functions for the theory syntax are contained in the structure  | 
         | 
   367   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}. | 
         | 
   368   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. | 
         | 
   369   \end{readmore} | 
         | 
   370   | 
         | 
   371   The structure @{ML_struct OuterLex} defines several kinds of tokens (for example  | 
         | 
   372   @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and | 
         | 
   373   @{ML "Command" in OuterLex} for commands). Some token parsers take into account the  | 
         | 
   374   kind of tokens.  | 
         | 
   375 *}    | 
         | 
   376   | 
         | 
   377 text {* | 
         | 
   378   The first example shows how to generate a token list out of a string using  | 
         | 
   379   the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"} | 
         | 
   380   since, at the moment, we are not interested in generating  | 
         | 
   381   precise error messages. The following code  | 
         | 
   382   | 
         | 
   383 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\""  | 
         | 
   384 "[Token (\<dots>,(Ident, \"hello\"),\<dots>),   | 
         | 
   385  Token (\<dots>,(Space, \" \"),\<dots>),   | 
         | 
   386  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}  | 
         | 
   387   | 
         | 
   388   produces three tokens where the first and the last are identifiers, since  | 
         | 
   389   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any | 
         | 
   390   other syntactic category.\footnote{Note that because of a possible a bug in | 
         | 
   391   the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of | 
         | 
   392   the tokens.} The second indicates a space.  | 
         | 
   393   | 
         | 
   394   Many parsing functions later on will require spaces, comments and the like  | 
         | 
   395   to have already been filtered out.  So from now on we are going to use the   | 
         | 
   396   functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example: | 
         | 
   397   | 
         | 
   398 @{ML_response_fake [display,gray] | 
         | 
   399 "let  | 
         | 
   400    val input = OuterSyntax.scan Position.none \"hello world\"  | 
         | 
   401 in  | 
         | 
   402    filter OuterLex.is_proper input  | 
         | 
   403 end"   | 
         | 
   404 "[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}  | 
         | 
   405   | 
         | 
   406   For convenience we define the function:  | 
         | 
   407   | 
         | 
   408 *}  | 
         | 
   409   | 
         | 
   410 ML{*fun filtered_input str =  | 
         | 
   411   filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}  | 
         | 
   412   | 
         | 
   413 text {* | 
         | 
   414   | 
         | 
   415   If you now parse  | 
         | 
   416   | 
         | 
   417 @{ML_response_fake [display,gray]  | 
         | 
   418 "filtered_input \"inductive | for\""   | 
         | 
   419 "[Token (\<dots>,(Command, \"inductive\"),\<dots>),   | 
         | 
   420  Token (\<dots>,(Keyword, \"|\"),\<dots>),   | 
         | 
   421  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}  | 
         | 
   422   | 
         | 
   423   you obtain a list consisting of only a command and two keyword tokens.  | 
         | 
   424   If you want to see which keywords and commands are currently known to Isabelle, type in  | 
         | 
   425   the following code (you might have to adjust the @{ML print_depth} in order to | 
         | 
   426   see the complete list):  | 
         | 
   427   | 
         | 
   428 @{ML_response_fake [display,gray]  | 
         | 
   429 "let   | 
         | 
   430   val (keywords, commands) = OuterKeyword.get_lexicons ()  | 
         | 
   431 in   | 
         | 
   432   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)  | 
         | 
   433 end"   | 
         | 
   434 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} | 
         | 
   435   | 
         | 
   436   The parser @{ML "OuterParse.$$$"} parses a single keyword. For example: | 
         | 
   437   | 
         | 
   438 @{ML_response [display,gray] | 
         | 
   439 "let   | 
         | 
   440   val input1 = filtered_input \"where for\"  | 
         | 
   441   val input2 = filtered_input \"| in\"  | 
         | 
   442 in   | 
         | 
   443   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)  | 
         | 
   444 end"  | 
         | 
   445 "((\"where\",\<dots>), (\"|\",\<dots>))"}  | 
         | 
   446   | 
         | 
   447   Like before, you can sequentially connect parsers with @{ML "--"}. For example:  | 
         | 
   448   | 
         | 
   449 @{ML_response [display,gray] | 
         | 
   450 "let   | 
         | 
   451   val input = filtered_input \"| in\"  | 
         | 
   452 in   | 
         | 
   453   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input  | 
         | 
   454 end"  | 
         | 
   455 "((\"|\", \"in\"), [])"}  | 
         | 
   456   | 
         | 
   457   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty  | 
         | 
   458   list of items recognised by the parser @{text p}, where the items being parsed | 
         | 
   459   are separated by the string @{text s}. For example: | 
         | 
   460   | 
         | 
   461 @{ML_response [display,gray] | 
         | 
   462 "let   | 
         | 
   463   val input = filtered_input \"in | in | in foo\"  | 
         | 
   464 in   | 
         | 
   465   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input  | 
         | 
   466 end"   | 
         | 
   467 "([\"in\", \"in\", \"in\"], [\<dots>])"}  | 
         | 
   468   | 
         | 
   469   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must | 
         | 
   470   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the | 
         | 
   471   end of the parsed string, otherwise the parser would have consumed all  | 
         | 
   472   tokens and then failed with the exception @{text "MORE"}. Like in the | 
         | 
   473   previous section, we can avoid this exception using the wrapper @{ML | 
         | 
   474   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML | 
         | 
   475   OuterLex.stopper}. We can write:  | 
         | 
   476   | 
         | 
   477 @{ML_response [display,gray] | 
         | 
   478 "let   | 
         | 
   479   val input = filtered_input \"in | in | in\"  | 
         | 
   480 in   | 
         | 
   481   Scan.finite OuterLex.stopper   | 
         | 
   482          (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input  | 
         | 
   483 end"   | 
         | 
   484 "([\"in\", \"in\", \"in\"], [])"}  | 
         | 
   485   | 
         | 
   486   The following function will help to run examples.  | 
         | 
   487   | 
         | 
   488 *}  | 
         | 
   489   | 
         | 
   490 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} | 
         | 
   491   | 
         | 
   492 text {* | 
         | 
   493   | 
         | 
   494   The function @{ML "OuterParse.!!!"} can be used to force termination of the | 
         | 
   495   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section),  | 
         | 
   496   except that the error message is fixed to be @{text [quotes] "Outer syntax error"} | 
         | 
   497   with a relatively precise description of the failure. For example:  | 
         | 
   498   | 
         | 
   499 @{ML_response_fake [display,gray] | 
         | 
   500 "let   | 
         | 
   501   val input = filtered_input \"in |\"  | 
         | 
   502   val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"  | 
         | 
   503 in   | 
         | 
   504   parse (OuterParse.!!! parse_bar_then_in) input   | 
         | 
   505 end"  | 
         | 
   506 "Exception ERROR \"Outer syntax error: keyword \"|\" expected,   | 
         | 
   507 but keyword in was found\" raised"  | 
         | 
   508 }  | 
         | 
   509   | 
         | 
   510   \begin{exercise} (FIXME) | 
         | 
   511   A type-identifier, for example @{typ "'a"}, is a token of  | 
         | 
   512   kind @{ML "Keyword" in OuterLex}. It can be parsed using  | 
         | 
   513   the function @{ML OuterParse.type_ident}. | 
         | 
   514   \end{exercise} | 
         | 
   515   | 
         | 
   516   (FIXME: or give parser for numbers)  | 
         | 
   517   | 
         | 
   518   Whenever there is a possibility that the processing of user input can fail,   | 
         | 
   519   it is a good idea to give as much information about where the error   | 
         | 
   520   occured. For this Isabelle can attach positional information to tokens  | 
         | 
   521   and then thread this information up the processing chain. To see this,  | 
         | 
   522   modify the function @{ML filtered_input} described earlier to  | 
         | 
   523 *}  | 
         | 
   524   | 
         | 
   525 ML{*fun filtered_input' str =  | 
         | 
   526        filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}  | 
         | 
   527   | 
         | 
   528 text {* | 
         | 
   529   where we pretend the parsed string starts on line 7. An example is  | 
         | 
   530   | 
         | 
   531 @{ML_response_fake [display,gray] | 
         | 
   532 "filtered_input' \"foo \\n bar\""  | 
         | 
   533 "[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>), | 
         | 
   534  Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"} | 
         | 
   535   | 
         | 
   536   in which the @{text [quotes] "\\n"} causes the second token to be in  | 
         | 
   537   line 8.  | 
         | 
   538   | 
         | 
   539   By using the parser @{ML OuterParse.position} you can decode the positional | 
         | 
   540   information and return it as part of the parsed input. For example  | 
         | 
   541   | 
         | 
   542 @{ML_response_fake [display,gray] | 
         | 
   543 "let  | 
         | 
   544   val input = (filtered_input' \"where\")  | 
         | 
   545 in   | 
         | 
   546   parse (OuterParse.position (OuterParse.$$$ \"where\")) input  | 
         | 
   547 end"  | 
         | 
   548 "((\"where\", {line=7, end_line=7}), [])"} | 
         | 
   549   | 
         | 
   550   \begin{readmore} | 
         | 
   551   The functions related to positions are implemented in the file  | 
         | 
   552   @{ML_file "Pure/General/position.ML"}. | 
         | 
   553   \end{readmore} | 
         | 
   554   | 
         | 
   555 *}  | 
         | 
   556   | 
         | 
   557 section {* Parsing Inner Syntax *} | 
         | 
   558   | 
         | 
   559 text {* | 
         | 
   560   There is usually no need to write your own parser for parsing inner syntax, that is   | 
         | 
   561   for terms and  types: you can just call the pre-defined parsers. Terms can   | 
         | 
   562   be parsed using the function @{ML OuterParse.term}. For example: | 
         | 
   563   | 
         | 
   564 @{ML_response [display,gray] | 
         | 
   565 "let   | 
         | 
   566   val input = OuterSyntax.scan Position.none \"foo\"  | 
         | 
   567 in   | 
         | 
   568   OuterParse.term input  | 
         | 
   569 end"  | 
         | 
   570 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}  | 
         | 
   571   | 
         | 
   572   The function @{ML OuterParse.prop} is similar, except that it gives a different | 
         | 
   573   error message, when parsing fails. As you can see, the parser not just returns   | 
         | 
   574   the parsed string, but also some encoded information. You can decode the  | 
         | 
   575   information with the function @{ML YXML.parse}. For example | 
         | 
   576   | 
         | 
   577   @{ML_response [display,gray] | 
         | 
   578   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""  | 
         | 
   579   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}  | 
         | 
   580   | 
         | 
   581   The result of the decoding is an XML-tree. You can see better what is going on if  | 
         | 
   582   you replace @{ML Position.none} by @{ML "Position.line 42"}, say: | 
         | 
   583   | 
         | 
   584 @{ML_response [display,gray] | 
         | 
   585 "let   | 
         | 
   586   val input = OuterSyntax.scan (Position.line 42) \"foo\"  | 
         | 
   587 in   | 
         | 
   588   YXML.parse (fst (OuterParse.term input))  | 
         | 
   589 end"  | 
         | 
   590 "XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}  | 
         | 
   591     | 
         | 
   592   The positional information is stored as part of an XML-tree so that code   | 
         | 
   593   called later on will be able to give more precise error messages.   | 
         | 
   594   | 
         | 
   595   \begin{readmore} | 
         | 
   596   The functions to do with input and output of XML and YXML are defined   | 
         | 
   597   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}. | 
         | 
   598   \end{readmore} | 
         | 
   599     | 
         | 
   600 *}  | 
         | 
   601   | 
         | 
   602 section {* Parsing Specifications\label{sec:parsingspecs} *} | 
         | 
   603   | 
         | 
   604 text {* | 
         | 
   605   There are a number of special purpose parsers that help with parsing  | 
         | 
   606   specifications of function definitions, inductive predicates and so on. In  | 
         | 
   607   Capter~\ref{chp:package}, for example, we will need to parse specifications | 
         | 
   608   for inductive predicates of the form:  | 
         | 
   609 *}  | 
         | 
   610   | 
         | 
   611 simple_inductive  | 
         | 
   612   even and odd  | 
         | 
   613 where  | 
         | 
   614   even0: "even 0"  | 
         | 
   615 | evenS: "odd n \<Longrightarrow> even (Suc n)"  | 
         | 
   616 | oddS: "even n \<Longrightarrow> odd (Suc n)"  | 
         | 
   617   | 
         | 
   618 text {* | 
         | 
   619   For this we are going to use the parser:  | 
         | 
   620 *}  | 
         | 
   621   | 
         | 
   622 ML %linenosgray{*val spec_parser =  | 
         | 
   623      OuterParse.fixes --   | 
         | 
   624      Scan.optional   | 
         | 
   625        (OuterParse.$$$ "where" |--  | 
         | 
   626           OuterParse.!!!   | 
         | 
   627             (OuterParse.enum1 "|"   | 
         | 
   628                (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}  | 
         | 
   629   | 
         | 
   630 text {* | 
         | 
   631   Note that the parser does not parse the keyword \simpleinductive, even if it is  | 
         | 
   632   meant to process definitions as shown above. The parser of the keyword   | 
         | 
   633   will be given by the infrastructure that will eventually call @{ML spec_parser}. | 
         | 
   634     | 
         | 
   635   | 
         | 
   636   To see what the parser returns, let us parse the string corresponding to the   | 
         | 
   637   definition of @{term even} and @{term odd}: | 
         | 
   638   | 
         | 
   639 @{ML_response [display,gray] | 
         | 
   640 "let  | 
         | 
   641   val input = filtered_input  | 
         | 
   642      (\"even and odd \" ^    | 
         | 
   643       \"where \" ^  | 
         | 
   644       \"  even0[intro]: \\\"even 0\\\" \" ^   | 
         | 
   645       \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^   | 
         | 
   646       \"| oddS[intro]:  \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")  | 
         | 
   647 in  | 
         | 
   648   parse spec_parser input  | 
         | 
   649 end"  | 
         | 
   650 "(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],  | 
         | 
   651      [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),  | 
         | 
   652       ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"),  | 
         | 
   653       ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"}  | 
         | 
   654   | 
         | 
   655   As you see, the result is a pair consisting of a list of  | 
         | 
   656   variables with optional type-annotation and syntax-annotation, and a list of  | 
         | 
   657   rules where every rule has optionally a name and an attribute.  | 
         | 
   658   | 
         | 
   659   The function @{ML OuterParse.fixes} in Line 2 of the parser reads an  | 
         | 
   660   \isacommand{and}-separated  | 
         | 
   661   list of variables that can include optional type annotations and syntax translations.   | 
         | 
   662   For example:\footnote{Note that in the code we need to write  | 
         | 
   663   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes | 
         | 
   664   in the compound type.}  | 
         | 
   665   | 
         | 
   666 @{ML_response [display,gray] | 
         | 
   667 "let  | 
         | 
   668   val input = filtered_input   | 
         | 
   669         \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"  | 
         | 
   670 in  | 
         | 
   671    parse OuterParse.fixes input  | 
         | 
   672 end"  | 
         | 
   673 "([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn),   | 
         | 
   674   (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)),   | 
         | 
   675   (blonk, NONE, NoSyn)],[])"}    | 
         | 
   676 *}  | 
         | 
   677   | 
         | 
   678 text {* | 
         | 
   679   Whenever types are given, they are stored in the @{ML SOME}s. The types are | 
         | 
   680   not yet used to type the variables: this must be done by type-inference later  | 
         | 
   681   on. Since types are part of the inner syntax they are strings with some  | 
         | 
   682   encoded information (see previous section). If a syntax translation is  | 
         | 
   683   present for a variable, then it is stored in the @{ML Mixfix} datastructure; | 
         | 
   684   no syntax translation is indicated by @{ML NoSyn}. | 
         | 
   685   | 
         | 
   686   \begin{readmore} | 
         | 
   687   The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. | 
         | 
   688   \end{readmore} | 
         | 
   689   | 
         | 
   690   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a | 
         | 
   691   list of introduction rules, that is propositions with theorem  | 
         | 
   692   annotations. The introduction rules are propositions parsed by @{ML | 
         | 
   693   OuterParse.prop}. However, they can include an optional theorem name plus  | 
         | 
   694   some attributes. For example  | 
         | 
   695   | 
         | 
   696 @{ML_response [display,gray] "let  | 
         | 
   697   val input = filtered_input \"foo_lemma[intro,dest!]:\"  | 
         | 
   698   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input   | 
         | 
   699 in   | 
         | 
   700   (name, map Args.dest_src attrib)  | 
         | 
   701 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}  | 
         | 
   702    | 
         | 
   703   The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of | 
         | 
   704   @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name  | 
         | 
   705   has to end with @{text [quotes] ":"}---see the argument of  | 
         | 
   706   the function @{ML SpecParse.opt_thm_name} in Line 7. | 
         | 
   707   | 
         | 
   708   \begin{readmore} | 
         | 
   709   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"}  | 
         | 
   710   and @{ML_file "Pure/Isar/args.ML"}. | 
         | 
   711   \end{readmore} | 
         | 
   712 *}  | 
         | 
   713   | 
         | 
   714 section {* New Commands and Keyword Files\label{sec:newcommand} *} | 
         | 
   715   | 
         | 
   716 text {* | 
         | 
   717   (FIXME: update to the right command setup)  | 
         | 
   718   | 
         | 
   719   Often new commands, for example for providing new definitional principles,  | 
         | 
   720   need to be implemented. While this is not difficult on the ML-level,  | 
         | 
   721   new commands, in order to be useful, need to be recognised by  | 
         | 
   722   ProofGeneral. This results in some subtle configuration issues, which we  | 
         | 
   723   will explain in this section.  | 
         | 
   724   | 
         | 
   725   To keep things simple, let us start with a ``silly'' command that does nothing   | 
         | 
   726   at all. We shall name this command \isacommand{foobar}. On the ML-level it can be  | 
         | 
   727   defined as:  | 
         | 
   728 *}  | 
         | 
   729   | 
         | 
   730 ML{*let | 
         | 
   731   val do_nothing = Scan.succeed (Toplevel.theory I)  | 
         | 
   732   val kind = OuterKeyword.thy_decl  | 
         | 
   733 in  | 
         | 
   734   OuterSyntax.command "foobar" "description of foobar" kind do_nothing  | 
         | 
   735 end *}  | 
         | 
   736   | 
         | 
   737 text {* | 
         | 
   738   The crucial function @{ML OuterSyntax.command} expects a name for the command, a | 
         | 
   739   short description, a kind indicator (which we will explain later on more thoroughly) and a  | 
         | 
   740   parser producing a top-level transition function (its purpose will also explained  | 
         | 
   741   later).   | 
         | 
   742   | 
         | 
   743   While this is everything you have to do on the ML-level, you need a keyword  | 
         | 
   744   file that can be loaded by ProofGeneral. This is to enable ProofGeneral to  | 
         | 
   745   recognise \isacommand{foobar} as a command. Such a keyword file can be | 
         | 
   746   generated with the command-line:  | 
         | 
   747   | 
         | 
   748   @{text [display] "$ isabelle keywords -k foobar some_log_files"} | 
         | 
   749   | 
         | 
   750   The option @{text "-k foobar"} indicates which postfix the name of the keyword file  | 
         | 
   751   will be assigned. In the case above the file will be named @{text | 
         | 
   752   "isar-keywords-foobar.el"}. This command requires log files to be  | 
         | 
   753   present (in order to extract the keywords from them). To generate these log  | 
         | 
   754   files, you first need to package the code above into a separate theory file named  | 
         | 
   755   @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the | 
         | 
   756   complete code.  | 
         | 
   757   | 
         | 
   758   | 
         | 
   759   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  | 
         | 
   760   \begin{figure}[t] | 
         | 
   761   \begin{graybox}\small | 
         | 
   762   \isacommand{theory}~@{text Command}\\ | 
         | 
   763   \isacommand{imports}~@{text Main}\\ | 
         | 
   764   \isacommand{begin}\\ | 
         | 
   765   \isacommand{ML}~@{text "\<verbopen>"}\\ | 
         | 
   766   @{ML | 
         | 
   767 "let  | 
         | 
   768   val do_nothing = Scan.succeed (Toplevel.theory I)  | 
         | 
   769   val kind = OuterKeyword.thy_decl  | 
         | 
   770 in  | 
         | 
   771   OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing  | 
         | 
   772 end"}\\  | 
         | 
   773   @{text "\<verbclose>"}\\ | 
         | 
   774   \isacommand{end} | 
         | 
   775   \end{graybox} | 
         | 
   776   \caption{\small The file @{text "Command.thy"} is necessary for generating a log  | 
         | 
   777   file. This log file enables Isabelle to generate a keyword file containing   | 
         | 
   778   the command \isacommand{foobar}.\label{fig:commandtheory}} | 
         | 
   779   \end{figure} | 
         | 
   780   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  | 
         | 
   781   | 
         | 
   782   For our purposes it is sufficient to use the log files of the theories  | 
         | 
   783   @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as | 
         | 
   784   the log file for the theory @{text "Command.thy"}, which contains the new | 
         | 
   785   \isacommand{foobar}-command. If you target other logics besides HOL, such | 
         | 
   786   as Nominal or ZF, then you need to adapt the log files appropriately.  | 
         | 
   787     | 
         | 
   788   @{text Pure} and @{text HOL} are usually compiled during the installation of | 
         | 
   789   Isabelle. So log files for them should be already available. If not, then  | 
         | 
   790   they can be conveniently compiled with the help of the build-script from the Isabelle  | 
         | 
   791   distribution.  | 
         | 
   792   | 
         | 
   793   @{text [display]  | 
         | 
   794 "$ ./build -m \"Pure\"  | 
         | 
   795 $ ./build -m \"HOL\""}  | 
         | 
   796     | 
         | 
   797   The @{text "Pure-ProofGeneral"} theory needs to be compiled with: | 
         | 
   798   | 
         | 
   799   @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""} | 
         | 
   800   | 
         | 
   801   For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory  | 
         | 
   802   with:  | 
         | 
   803   | 
         | 
   804   @{text [display] "$ isabelle mkdir FoobarCommand"} | 
         | 
   805   | 
         | 
   806   This generates a directory containing the files:   | 
         | 
   807   | 
         | 
   808   @{text [display]  | 
         | 
   809 "./IsaMakefile  | 
         | 
   810 ./FoobarCommand/ROOT.ML  | 
         | 
   811 ./FoobarCommand/document  | 
         | 
   812 ./FoobarCommand/document/root.tex"}  | 
         | 
   813   | 
         | 
   814   | 
         | 
   815   You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"} | 
         | 
   816   and add the line   | 
         | 
   817   | 
         | 
   818   @{text [display] "use_thy \"Command\";"}  | 
         | 
   819     | 
         | 
   820   to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing: | 
         | 
   821   | 
         | 
   822   @{text [display] "$ isabelle make"} | 
         | 
   823   | 
         | 
   824   If the compilation succeeds, you have finally created all the necessary log files.   | 
         | 
   825   They are stored in the directory   | 
         | 
   826     | 
         | 
   827   @{text [display]  "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"} | 
         | 
   828   | 
         | 
   829   or something similar depending on your Isabelle distribution and architecture.  | 
         | 
   830   One quick way to assign a shell variable to this directory is by typing  | 
         | 
   831   | 
         | 
   832   @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"} | 
         | 
   833    | 
         | 
   834   on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the  | 
         | 
   835   directory should include the files:  | 
         | 
   836   | 
         | 
   837   @{text [display]  | 
         | 
   838 "Pure.gz  | 
         | 
   839 HOL.gz  | 
         | 
   840 Pure-ProofGeneral.gz  | 
         | 
   841 HOL-FoobarCommand.gz"}   | 
         | 
   842   | 
         | 
   843   From them you can create the keyword files. Assuming the name   | 
         | 
   844   of the directory is in  @{text "$ISABELLE_LOGS"}, | 
         | 
   845   then the Unix command for creating the keyword file is:  | 
         | 
   846   | 
         | 
   847 @{text [display] | 
         | 
   848 "$ isabelle keywords -k foobar   | 
         | 
   849    $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"} | 
         | 
   850   | 
         | 
   851   The result is the file @{text "isar-keywords-foobar.el"}. It should contain | 
         | 
   852   the string @{text "foobar"} twice.\footnote{To see whether things are fine, check | 
         | 
   853   that @{text "grep foobar"} on this file returns something | 
         | 
   854   non-empty.}  This keyword file needs to  | 
         | 
   855   be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle | 
         | 
   856   aware of this keyword file, you have to start Isabelle with the option @{text | 
         | 
   857   "-k foobar"}, that is:  | 
         | 
   858   | 
         | 
   859   | 
         | 
   860   @{text [display] "$ isabelle emacs -k foobar a_theory_file"} | 
         | 
   861   | 
         | 
   862   If you now build a theory on top of @{text "Command.thy"},  | 
         | 
   863   then the command \isacommand{foobar} can be used.  | 
         | 
   864   Similarly with any other new command.   | 
         | 
   865   | 
         | 
   866   | 
         | 
   867   At the moment \isacommand{foobar} is not very useful. Let us refine it a bit  | 
         | 
   868   next by letting it take a proposition as argument and printing this proposition   | 
         | 
   869   inside the tracing buffer.   | 
         | 
   870   | 
         | 
   871   The crucial part of a command is the function that determines the behaviour  | 
         | 
   872   of the command. In the code above we used a ``do-nothing''-function, which  | 
         | 
   873   because of @{ML Scan.succeed} does not parse any argument, but immediately | 
         | 
   874   returns the simple toplevel function @{ML "Toplevel.theory I"}. We can | 
         | 
   875   replace this code by a function that first parses a proposition (using the  | 
         | 
   876   parser @{ML OuterParse.prop}), then prints out the tracing | 
         | 
   877   information (using a new top-level function @{text trace_top_lvl}) and  | 
         | 
   878   finally does nothing. For this you can write:  | 
         | 
   879 *}  | 
         | 
   880   | 
         | 
   881 ML{*let | 
         | 
   882   fun trace_top_lvl str =   | 
         | 
   883      Toplevel.theory (fn thy => (tracing str; thy))  | 
         | 
   884   | 
         | 
   885   val trace_prop = OuterParse.prop >> trace_top_lvl  | 
         | 
   886   | 
         | 
   887   val kind = OuterKeyword.thy_decl  | 
         | 
   888 in  | 
         | 
   889   OuterSyntax.command "foobar" "traces a proposition" kind trace_prop  | 
         | 
   890 end *}  | 
         | 
   891   | 
         | 
   892 text {* | 
         | 
   893   Now you can type  | 
         | 
   894   | 
         | 
   895   \begin{isabelle} | 
         | 
   896   \isacommand{foobar}~@{text [quotes] "True \<and> False"}\\ | 
         | 
   897   @{text "> \"True \<and> False\""} | 
         | 
   898   \end{isabelle} | 
         | 
   899     | 
         | 
   900   and see the proposition in the tracing buffer.    | 
         | 
   901   | 
         | 
   902   Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator | 
         | 
   903   for the command.  This means that the command finishes as soon as the  | 
         | 
   904   arguments are processed. Examples of this kind of commands are  | 
         | 
   905   \isacommand{definition} and \isacommand{declare}.  In other cases, | 
         | 
   906   commands are expected to parse some arguments, for example a proposition,  | 
         | 
   907   and then ``open up'' a proof in order to prove the proposition (for example  | 
         | 
   908   \isacommand{lemma}) or prove some other properties (for example | 
         | 
   909   \isacommand{function}). To achieve this kind of behaviour, you have to use the kind | 
         | 
   910   indicator @{ML thy_goal in OuterKeyword}.  Note, however, once you change the  | 
         | 
   911   ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword}  | 
         | 
   912   then the keyword file needs to be re-created!  | 
         | 
   913   | 
         | 
   914   Below we change \isacommand{foobar} so that it takes a proposition as | 
         | 
   915   argument and then starts a proof in order to prove it. Therefore in Line 13,   | 
         | 
   916   we set the kind indicator to @{ML thy_goal in OuterKeyword}. | 
         | 
   917 *}  | 
         | 
   918   | 
         | 
   919 ML%linenosgray{*let | 
         | 
   920   fun set_up_thm str ctxt =  | 
         | 
   921     let  | 
         | 
   922       val prop = Syntax.read_prop ctxt str  | 
         | 
   923     in  | 
         | 
   924       Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt  | 
         | 
   925     end;  | 
         | 
   926     | 
         | 
   927   val prove_prop = OuterParse.prop >>    | 
         | 
   928       (fn str => Toplevel.print o   | 
         | 
   929                     Toplevel.local_theory_to_proof NONE (set_up_thm str))  | 
         | 
   930     | 
         | 
   931   val kind = OuterKeyword.thy_goal  | 
         | 
   932 in  | 
         | 
   933   OuterSyntax.command "foobar" "proving a proposition" kind prove_prop  | 
         | 
   934 end *}  | 
         | 
   935   | 
         | 
   936 text {* | 
         | 
   937   The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be | 
         | 
   938   proved) and a context as argument.  The context is necessary in order to be able to use  | 
         | 
   939   @{ML Syntax.read_prop}, which converts a string into a proper proposition. | 
         | 
   940   In Line 6 the function @{ML Proof.theorem_i} starts the proof for the | 
         | 
   941   proposition. Its argument @{ML NONE} stands for a locale (which we chose to | 
         | 
   942   omit); the argument @{ML "(K I)"} stands for a function that determines what | 
         | 
   943   should be done with the theorem once it is proved (we chose to just forget  | 
         | 
   944   about it). Lines 9 to 11 contain the parser for the proposition.  | 
         | 
   945   | 
         | 
   946   If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following  | 
         | 
   947   proof state:  | 
         | 
   948   | 
         | 
   949   \begin{isabelle} | 
         | 
   950   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\ | 
         | 
   951   @{text "goal (1 subgoal):"}\\ | 
         | 
   952   @{text "1. True \<and> True"} | 
         | 
   953   \end{isabelle} | 
         | 
   954   | 
         | 
   955   and you can build the proof  | 
         | 
   956   | 
         | 
   957   \begin{isabelle} | 
         | 
   958   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\ | 
         | 
   959   \isacommand{apply}@{text "(rule conjI)"}\\ | 
         | 
   960   \isacommand{apply}@{text "(rule TrueI)+"}\\ | 
         | 
   961   \isacommand{done} | 
         | 
   962   \end{isabelle} | 
         | 
   963   | 
         | 
   964    | 
         | 
   965     | 
         | 
   966   (FIXME What do @{ML "Toplevel.theory"}  | 
         | 
   967   @{ML "Toplevel.print"}  | 
         | 
   968   @{ML Toplevel.local_theory} do?) | 
         | 
   969   | 
         | 
   970   (FIXME read a name and show how to store theorems)  | 
         | 
   971   | 
         | 
   972 *}  | 
         | 
   973   | 
         | 
   974 section {* Methods *} | 
         | 
   975   | 
         | 
   976 text {* | 
         | 
   977   Methods are a central concept in Isabelle. They are the ones you use for example  | 
         | 
   978   in \isacommand{apply}. To print out all currently known methods you can use the  | 
         | 
   979   Isabelle command.   | 
         | 
   980 *}  | 
         | 
   981   | 
         | 
   982 print_methods  | 
         | 
   983   | 
         | 
   984 text {* | 
         | 
   985   An example of a very simple method is the following code.  | 
         | 
   986 *}  | 
         | 
   987   | 
         | 
   988 method_setup %gray foobar_meth =   | 
         | 
   989  {* Scan.succeed | 
         | 
   990       (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *} | 
         | 
   991          "foobar method"  | 
         | 
   992   | 
         | 
   993 text {* | 
         | 
   994   It defines the method @{text foobar_meth}, which takes no arguments (therefore the | 
         | 
   995   parser @{ML Scan.succeed}) and  | 
         | 
   996   only applies the tactic @{thm [source] conjE} and then @{thm [source] conjI}. | 
         | 
   997   This method can be used in the following proof  | 
         | 
   998 *}  | 
         | 
   999   | 
         | 
  1000 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"  | 
         | 
  1001 apply(foobar_meth)  | 
         | 
  1002 txt {* | 
         | 
  1003   where it results in the goal state  | 
         | 
  1004   | 
         | 
  1005   \begin{minipage}{\textwidth} | 
         | 
  1006   @{subgoals} | 
         | 
  1007   \end{minipage} *} | 
         | 
  1008 (*<*)oops(*>*)  | 
         | 
  1009   | 
         | 
  1010 text {* | 
         | 
  1011   (FIXME: explain a version of rule-tac)  | 
         | 
  1012 *}  | 
         | 
  1013   | 
         | 
  1014 (*<*)  | 
         | 
  1015   | 
         | 
  1016 chapter {* Parsing *} | 
         | 
  1017   | 
         | 
  1018 text {* | 
         | 
  1019   | 
         | 
  1020   Lots of Standard ML code is given in this document, for various reasons,  | 
         | 
  1021   including:  | 
         | 
  1022   \begin{itemize} | 
         | 
  1023   \item direct quotation of code found in the Isabelle source files,  | 
         | 
  1024   or simplified versions of such code  | 
         | 
  1025   \item identifiers found in the Isabelle source code, with their types   | 
         | 
  1026   (or specialisations of their types)  | 
         | 
  1027   \item code examples, which can be run by the reader, to help illustrate the  | 
         | 
  1028   behaviour of functions found in the Isabelle source code  | 
         | 
  1029   \item ancillary functions, not from the Isabelle source code,   | 
         | 
  1030   which enable the reader to run relevant code examples  | 
         | 
  1031   \item type abbreviations, which help explain the uses of certain functions  | 
         | 
  1032   \end{itemize} | 
         | 
  1033   | 
         | 
  1034 *}  | 
         | 
  1035   | 
         | 
  1036 section {* Parsing Isar input *} | 
         | 
  1037   | 
         | 
  1038 text {* | 
         | 
  1039   | 
         | 
  1040   The typical parsing function has the type  | 
         | 
  1041   \texttt{'src -> 'res * 'src}, with input   | 
         | 
  1042   of type \texttt{'src}, returning a result  | 
         | 
  1043   of type \texttt{'res}, which is (or is derived from) the first part of the | 
         | 
  1044   input, and also returning the remainder of the input.  | 
         | 
  1045   (In the common case, when it is clear what the ``remainder of the input''  | 
         | 
  1046   means, we will just say that the functions ``returns'' the  | 
         | 
  1047   value of type \texttt{'res}).  | 
         | 
  1048   An exception is raised if an appropriate value   | 
         | 
  1049   cannot be produced from the input.  | 
         | 
  1050   A range of exceptions can be used to identify different reasons   | 
         | 
  1051   for the failure of a parse.  | 
         | 
  1052     | 
         | 
  1053   This contrasts the standard parsing function in Standard ML,  | 
         | 
  1054   which is of type   | 
         | 
  1055   \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option}; | 
         | 
  1056   (for example, \texttt{List.getItem} and \texttt{Substring.getc}). | 
         | 
  1057   However, much of the discussion at   | 
         | 
  1058   FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html  | 
         | 
  1059   is relevant.  | 
         | 
  1060   | 
         | 
  1061   Naturally one may convert between the two different sorts of parsing functions  | 
         | 
  1062   as follows:  | 
         | 
  1063   \begin{verbatim} | 
         | 
  1064   open StringCvt ;  | 
         | 
  1065   type ('res, 'src) ex_reader = 'src -> 'res * 'src | 
         | 
  1066   ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader  | 
         | 
  1067   fun ex_reader rdr src = Option.valOf (rdr src) ;  | 
         | 
  1068   reader : ('res, 'src) ex_reader -> ('res, 'src) reader  | 
         | 
  1069   fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;  | 
         | 
  1070   \end{verbatim} | 
         | 
  1071     | 
         | 
  1072 *}  | 
         | 
  1073   | 
         | 
  1074 section{* The \texttt{Scan} structure *} | 
         | 
  1075   | 
         | 
  1076 text {*  | 
         | 
  1077   The source file is \texttt{src/General/scan.ML}. | 
         | 
  1078   This structure provides functions for using and combining parsing functions  | 
         | 
  1079   of the type \texttt{'src -> 'res * 'src}. | 
         | 
  1080   Three exceptions are used:  | 
         | 
  1081   \begin{verbatim} | 
         | 
  1082   exception MORE of string option;  (*need more input (prompt)*)  | 
         | 
  1083   exception FAIL of string option;  (*try alternatives (reason of failure)*)  | 
         | 
  1084   exception ABORT of string;        (*dead end*)  | 
         | 
  1085   \end{verbatim} | 
         | 
  1086   Many functions in this structure (generally those with names composed of  | 
         | 
  1087   symbols) are declared as infix.  | 
         | 
  1088   | 
         | 
  1089   Some functions from that structure are  | 
         | 
  1090   \begin{verbatim} | 
         | 
  1091   |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> | 
         | 
  1092   'src -> 'res2 * 'src''  | 
         | 
  1093   --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> | 
         | 
  1094   'src -> 'res1 * 'src''  | 
         | 
  1095   -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') -> | 
         | 
  1096   'src -> ('res1 * 'res2) * 'src'' | 
         | 
  1097   ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') -> | 
         | 
  1098   'src -> string * 'src''  | 
         | 
  1099   \end{verbatim} | 
         | 
  1100   These functions parse a result off the input source twice.  | 
         | 
  1101   | 
         | 
  1102   \texttt{|--} and \texttt{--|}  | 
         | 
  1103   return the first result and the second result, respectively.  | 
         | 
  1104   | 
         | 
  1105   \texttt{--} returns both. | 
         | 
  1106   | 
         | 
  1107   \verb|^^| returns the result of concatenating the two results  | 
         | 
  1108   (which must be strings).  | 
         | 
  1109   | 
         | 
  1110   Note how, although the types   | 
         | 
  1111   \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same, | 
         | 
  1112   the types as shown help suggest the behaviour of the functions.  | 
         | 
  1113   \begin{verbatim} | 
         | 
  1114   :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') -> | 
         | 
  1115   'src -> ('res1 * 'res2) * 'src'' | 
         | 
  1116   :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') -> | 
         | 
  1117   'src -> 'res2 * 'src''  | 
         | 
  1118   \end{verbatim} | 
         | 
  1119   These are similar to \texttt{|--} and \texttt{--|}, | 
         | 
  1120   except that the second parsing function can depend on the result of the first.  | 
         | 
  1121   \begin{verbatim} | 
         | 
  1122   >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src' | 
         | 
  1123   || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src | 
         | 
  1124   \end{verbatim} | 
         | 
  1125   \texttt{p >> f} applies a function \texttt{f} to the result of a parse. | 
         | 
  1126     | 
         | 
  1127   \texttt{||} tries a second parsing function if the first one | 
         | 
  1128   fails by raising an exception of the form \texttt{FAIL \_}. | 
         | 
  1129     | 
         | 
  1130   \begin{verbatim} | 
         | 
  1131   succeed : 'res -> ('src -> 'res * 'src) ; | 
         | 
  1132   fail : ('src -> 'res_src) ; | 
         | 
  1133   !! : ('src * string option -> string) ->  | 
         | 
  1134   ('src -> 'res_src) -> ('src -> 'res_src) ; | 
         | 
  1135   \end{verbatim} | 
         | 
  1136   \texttt{succeed r} returns \texttt{r}, with the input unchanged. | 
         | 
  1137   \texttt{fail} always fails, raising exception \texttt{FAIL NONE}. | 
         | 
  1138   \texttt{!! f} only affects the failure mode, turning a failure that | 
         | 
  1139   raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}. | 
         | 
  1140   This is used to prevent recovery from the failure ---  | 
         | 
  1141   thus, in \texttt{!! parse1 || parse2}, if \texttt{parse1} fails,  | 
         | 
  1142   it won't recover by trying \texttt{parse2}. | 
         | 
  1143   | 
         | 
  1144   \begin{verbatim} | 
         | 
  1145   one : ('si -> bool) -> ('si list -> 'si * 'si list) ; | 
         | 
  1146   some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ; | 
         | 
  1147   \end{verbatim} | 
         | 
  1148   These require the input to be a list of items:  | 
         | 
  1149   they fail, raising \texttt{MORE NONE} if the list is empty. | 
         | 
  1150   On other failures they raise \texttt{FAIL NONE}  | 
         | 
  1151   | 
         | 
  1152   \texttt{one p} takes the first | 
         | 
  1153   item from the list if it satisfies \texttt{p}, otherwise fails. | 
         | 
  1154   | 
         | 
  1155   \texttt{some f} takes the first | 
         | 
  1156   item from the list and applies \texttt{f} to it, failing if this returns | 
         | 
  1157   \texttt{NONE}.   | 
         | 
  1158   | 
         | 
  1159   \begin{verbatim} | 
         | 
  1160   many : ('si -> bool) -> 'si list -> 'si list * 'si list ;  | 
         | 
  1161   \end{verbatim} | 
         | 
  1162   \texttt{many p} takes items from the input until it encounters one  | 
         | 
  1163   which does not satisfy \texttt{p}.  If it reaches the end of the input | 
         | 
  1164   it fails, raising \texttt{MORE NONE}. | 
         | 
  1165   | 
         | 
  1166   \texttt{many1} (with the same type) fails if the first item  | 
         | 
  1167   does not satisfy \texttt{p}.   | 
         | 
  1168   | 
         | 
  1169   \begin{verbatim} | 
         | 
  1170   option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src) | 
         | 
  1171   optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src) | 
         | 
  1172   \end{verbatim} | 
         | 
  1173   \texttt{option}:  | 
         | 
  1174   where the parser \texttt{f} succeeds with result \texttt{r}  | 
         | 
  1175   or raises \texttt{FAIL \_}, | 
         | 
  1176   \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}. | 
         | 
  1177     | 
         | 
  1178   \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_}, | 
         | 
  1179   \texttt{optional f default} provides the result \texttt{default}. | 
         | 
  1180   | 
         | 
  1181   \begin{verbatim} | 
         | 
  1182   repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src | 
         | 
  1183   repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src | 
         | 
  1184   bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src  | 
         | 
  1185   \end{verbatim} | 
         | 
  1186   \texttt{repeat f} repeatedly parses an item off the remaining input until  | 
         | 
  1187   \texttt{f} fails with \texttt{FAIL \_} | 
         | 
  1188   | 
         | 
  1189   \texttt{repeat1} is as for \texttt{repeat}, but requires at least one | 
         | 
  1190   successful parse.  | 
         | 
  1191   | 
         | 
  1192   \begin{verbatim} | 
         | 
  1193   lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src)) | 
         | 
  1194   \end{verbatim} | 
         | 
  1195   \texttt{lift} changes the source type of a parser by putting in an extra | 
         | 
  1196   component \texttt{'ex}, which is ignored in the parsing. | 
         | 
  1197   | 
         | 
  1198   The \texttt{Scan} structure also provides the type \texttt{lexicon},  | 
         | 
  1199   HOW DO THEY WORK ?? TO BE COMPLETED  | 
         | 
  1200   \begin{verbatim} | 
         | 
  1201   dest_lexicon: lexicon -> string list ;  | 
         | 
  1202   make_lexicon: string list list -> lexicon ;  | 
         | 
  1203   empty_lexicon: lexicon ;  | 
         | 
  1204   extend_lexicon: string list list -> lexicon -> lexicon ;  | 
         | 
  1205   merge_lexicons: lexicon -> lexicon -> lexicon ;  | 
         | 
  1206   is_literal: lexicon -> string list -> bool ;  | 
         | 
  1207   literal: lexicon -> string list -> string list * string list ;  | 
         | 
  1208   \end{verbatim} | 
         | 
  1209   Two lexicons, for the commands and keywords, are stored and can be retrieved  | 
         | 
  1210   by:  | 
         | 
  1211   \begin{verbatim} | 
         | 
  1212   val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;  | 
         | 
  1213   val commands = Scan.dest_lexicon command_lexicon ;  | 
         | 
  1214   val keywords = Scan.dest_lexicon keyword_lexicon ;  | 
         | 
  1215   \end{verbatim} | 
         | 
  1216 *}  | 
         | 
  1217   | 
         | 
  1218 section{* The \texttt{OuterLex} structure *} | 
         | 
  1219   | 
         | 
  1220 text {* | 
         | 
  1221   The source file is @{text "src/Pure/Isar/outer_lex.ML"}. | 
         | 
  1222   In some other source files its name is abbreviated:  | 
         | 
  1223   \begin{verbatim} | 
         | 
  1224   structure T = OuterLex;  | 
         | 
  1225   \end{verbatim} | 
         | 
  1226   This structure defines the type \texttt{token}. | 
         | 
  1227   (The types  | 
         | 
  1228   \texttt{OuterLex.token}, | 
         | 
  1229   \texttt{OuterParse.token} and | 
         | 
  1230   \texttt{SpecParse.token} are all the same). | 
         | 
  1231     | 
         | 
  1232   Input text is split up into tokens, and the input source type for many parsing  | 
         | 
  1233   functions is \texttt{token list}. | 
         | 
  1234   | 
         | 
  1235   The datatype definition (which is not published in the signature) is  | 
         | 
  1236   \begin{verbatim} | 
         | 
  1237   datatype token = Token of Position.T * (token_kind * string);  | 
         | 
  1238   \end{verbatim} | 
         | 
  1239   but here are some runnable examples for viewing tokens:   | 
         | 
  1240   | 
         | 
  1241 *}  | 
         | 
  1242   | 
         | 
  1243   | 
         | 
  1244   | 
         | 
  1245   | 
         | 
  1246 ML{* | 
         | 
  1247   val toks = OuterSyntax.scan Position.none  | 
         | 
  1248    "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ; | 
         | 
  1249 *}  | 
         | 
  1250   | 
         | 
  1251 ML{* | 
         | 
  1252   print_depth 20 ;  | 
         | 
  1253 *}  | 
         | 
  1254   | 
         | 
  1255 ML{* | 
         | 
  1256   map OuterLex.text_of toks ;  | 
         | 
  1257 *}  | 
         | 
  1258   | 
         | 
  1259 ML{* | 
         | 
  1260   val proper_toks = filter OuterLex.is_proper toks ;  | 
         | 
  1261 *}    | 
         | 
  1262   | 
         | 
  1263 ML{* | 
         | 
  1264   map OuterLex.kind_of proper_toks   | 
         | 
  1265 *}  | 
         | 
  1266   | 
         | 
  1267 ML{* | 
         | 
  1268   map OuterLex.unparse proper_toks ;  | 
         | 
  1269 *}  | 
         | 
  1270   | 
         | 
  1271 ML{* | 
         | 
  1272   OuterLex.stopper  | 
         | 
  1273 *}  | 
         | 
  1274   | 
         | 
  1275 text {* | 
         | 
  1276   | 
         | 
  1277   The function \texttt{is\_proper : token -> bool} identifies tokens which are | 
         | 
  1278   not white space or comments: many parsing functions assume require spaces or  | 
         | 
  1279   comments to have been filtered out.  | 
         | 
  1280     | 
         | 
  1281   There is a special end-of-file token:  | 
         | 
  1282   \begin{verbatim} | 
         | 
  1283   val (tok_eof : token, is_eof : token -> bool) = T.stopper ;   | 
         | 
  1284   (* end of file token *)  | 
         | 
  1285   \end{verbatim} | 
         | 
  1286   | 
         | 
  1287 *}  | 
         | 
  1288   | 
         | 
  1289 section {* The \texttt{OuterParse} structure *} | 
         | 
  1290   | 
         | 
  1291 text {* | 
         | 
  1292   The source file is \texttt{src/Pure/Isar/outer\_parse.ML}. | 
         | 
  1293   In some other source files its name is abbreviated:  | 
         | 
  1294   \begin{verbatim} | 
         | 
  1295   structure P = OuterParse;  | 
         | 
  1296   \end{verbatim} | 
         | 
  1297   Here the parsers use \texttt{token list} as the input source type.  | 
         | 
  1298     | 
         | 
  1299   Some of the parsers simply select the first token, provided that it is of the  | 
         | 
  1300   right kind (as returned by \texttt{T.kind\_of}): these are  | 
         | 
  1301   \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var, | 
         | 
  1302   type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}  | 
         | 
  1303   Others select the first token, provided that it is one of several kinds,  | 
         | 
  1304   (eg, \texttt{name, xname, text, typ}). | 
         | 
  1305   | 
         | 
  1306   \begin{verbatim} | 
         | 
  1307   type 'a tlp = token list -> 'a * token list ; (* token list parser *)  | 
         | 
  1308   $$$ : string -> string tlp  | 
         | 
  1309   nat : int tlp ;  | 
         | 
  1310   maybe : 'a tlp -> 'a option tlp ;  | 
         | 
  1311   \end{verbatim} | 
         | 
  1312   | 
         | 
  1313   \texttt{\$\$\$ s} returns the first token, | 
         | 
  1314   if it equals \texttt{s} \emph{and} \texttt{s} is a keyword. | 
         | 
  1315   | 
         | 
  1316   \texttt{nat} returns the first token, if it is a number, and evaluates it. | 
         | 
  1317   | 
         | 
  1318   \texttt{maybe}: if \texttt{p} returns \texttt{r},  | 
         | 
  1319   then \texttt{maybe p} returns \texttt{SOME r} ; | 
         | 
  1320   if the first token is an underscore, it returns \texttt{NONE}. | 
         | 
  1321   | 
         | 
  1322   A few examples:  | 
         | 
  1323   \begin{verbatim} | 
         | 
  1324   P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)  | 
         | 
  1325   P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)  | 
         | 
  1326   val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ;  | 
         | 
  1327   val proper_toks = List.filter T.is_proper toks ;  | 
         | 
  1328   P.list P.nat toks ; (* OK, doesn't recognize white space *)  | 
         | 
  1329   P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)  | 
         | 
  1330   P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)  | 
         | 
  1331   P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)  | 
         | 
  1332   val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ;  | 
         | 
  1333   P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)  | 
         | 
  1334   \end{verbatim} | 
         | 
  1335   | 
         | 
  1336   The following code helps run examples:  | 
         | 
  1337   \begin{verbatim} | 
         | 
  1338   fun parse_str tlp str =   | 
         | 
  1339   let val toks : token list = OuterSyntax.scan str ;  | 
         | 
  1340   val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;  | 
         | 
  1341   val (res, rem_toks) = tlp proper_toks ;  | 
         | 
  1342   val rem_str = String.concat  | 
         | 
  1343   (Library.separate " " (List.map T.unparse rem_toks)) ;  | 
         | 
  1344   in (res, rem_str) end ;  | 
         | 
  1345   \end{verbatim} | 
         | 
  1346   | 
         | 
  1347   Some examples from \texttt{src/Pure/Isar/outer\_parse.ML} | 
         | 
  1348   \begin{verbatim} | 
         | 
  1349   val type_args =  | 
         | 
  1350   type_ident >> Library.single ||  | 
         | 
  1351   $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") || | 
         | 
  1352   Scan.succeed [];  | 
         | 
  1353   \end{verbatim} | 
         | 
  1354   There are three ways parsing a list of type arguments can succeed.  | 
         | 
  1355   The first line reads a single type argument, and turns it into a singleton  | 
         | 
  1356   list.  | 
         | 
  1357   The second line reads "(", and then the remainder, ignoring the "(" ; | 
         | 
  1358   the remainder consists of a list of type identifiers (at least one),  | 
         | 
  1359   and then a ")" which is also ignored.  | 
         | 
  1360   The \texttt{!!!} ensures that if the parsing proceeds this far and then fails, | 
         | 
  1361   it won't try the third line (see the description of \texttt{Scan.!!}). | 
         | 
  1362   The third line consumes no input and returns the empty list.  | 
         | 
  1363   | 
         | 
  1364   \begin{verbatim} | 
         | 
  1365   fun triple2 (x, (y, z)) = (x, y, z);  | 
         | 
  1366   val arity = xname -- ($$$ "::" |-- !!! (  | 
         | 
  1367   Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) [] | 
         | 
  1368   -- sort)) >> triple2;  | 
         | 
  1369   \end{verbatim} | 
         | 
  1370   The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is | 
         | 
  1371   ignored), then optionally a list $ss$ of sorts and then another sort $s$.  | 
         | 
  1372   The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$. | 
         | 
  1373   The second line reads the optional list of sorts:  | 
         | 
  1374   it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored, | 
         | 
  1375   and between them a comma-separated list of sorts.  | 
         | 
  1376   If this list is absent, the default \texttt{[]} provides the list of sorts. | 
         | 
  1377   | 
         | 
  1378   \begin{verbatim} | 
         | 
  1379   parse_str P.type_args "('a, 'b) ntyp" ; | 
         | 
  1380   parse_str P.type_args "'a ntyp" ;  | 
         | 
  1381   parse_str P.type_args "ntyp" ;  | 
         | 
  1382   parse_str P.arity "ty :: tycl" ;  | 
         | 
  1383   parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;  | 
         | 
  1384   \end{verbatim} | 
         | 
  1385   | 
         | 
  1386 *}  | 
         | 
  1387   | 
         | 
  1388 section {* The \texttt{SpecParse} structure *} | 
         | 
  1389   | 
         | 
  1390 text {* | 
         | 
  1391   The source file is \texttt{src/Pure/Isar/spec\_parse.ML}. | 
         | 
  1392   This structure contains token list parsers for more complicated values.  | 
         | 
  1393   For example,   | 
         | 
  1394   \begin{verbatim} | 
         | 
  1395   open SpecParse ;  | 
         | 
  1396   attrib : Attrib.src tok_rdr ;   | 
         | 
  1397   attribs : Attrib.src list tok_rdr ;  | 
         | 
  1398   opt_attribs : Attrib.src list tok_rdr ;  | 
         | 
  1399   xthm : (thmref * Attrib.src list) tok_rdr ;  | 
         | 
  1400   xthms1 : (thmref * Attrib.src list) list tok_rdr ;  | 
         | 
  1401     | 
         | 
  1402   parse_str attrib "simp" ;  | 
         | 
  1403   parse_str opt_attribs "hello" ;  | 
         | 
  1404   val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;  | 
         | 
  1405   map Args.dest_src ass ;  | 
         | 
  1406   val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;  | 
         | 
  1407     | 
         | 
  1408   parse_str xthm "mythm [attr]" ;  | 
         | 
  1409   parse_str xthms1 "thm1 [attr] thms2" ;  | 
         | 
  1410   \end{verbatim} | 
         | 
  1411     | 
         | 
  1412   As you can see, attributes are described using types of the \texttt{Args} | 
         | 
  1413   structure, described below.  | 
         | 
  1414 *}  | 
         | 
  1415   | 
         | 
  1416 section{* The \texttt{Args} structure *} | 
         | 
  1417   | 
         | 
  1418 text {* | 
         | 
  1419   The source file is \texttt{src/Pure/Isar/args.ML}. | 
         | 
  1420   The primary type of this structure is the \texttt{src} datatype; | 
         | 
  1421   the single constructors not published in the signature, but   | 
         | 
  1422   \texttt{Args.src} and \texttt{Args.dest\_src} | 
         | 
  1423   are in fact the constructor and destructor functions.  | 
         | 
  1424   Note that the types \texttt{Attrib.src} and \texttt{Method.src} | 
         | 
  1425   are in fact \texttt{Args.src}. | 
         | 
  1426   | 
         | 
  1427   \begin{verbatim} | 
         | 
  1428   src : (string * Args.T list) * Position.T -> Args.src ;  | 
         | 
  1429   dest_src : Args.src -> (string * Args.T list) * Position.T ;  | 
         | 
  1430   Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;  | 
         | 
  1431   fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;  | 
         | 
  1432   | 
         | 
  1433   val thy = ML_Context.the_context () ;  | 
         | 
  1434   val ctxt = ProofContext.init thy ;  | 
         | 
  1435   map (pr_src ctxt) ass ;  | 
         | 
  1436   \end{verbatim} | 
         | 
  1437   | 
         | 
  1438   So an \texttt{Args.src} consists of the first word, then a list of further  | 
         | 
  1439   ``arguments'', of type \texttt{Args.T}, with information about position in the | 
         | 
  1440   input.  | 
         | 
  1441   \begin{verbatim} | 
         | 
  1442   (* how an Args.src is parsed *)  | 
         | 
  1443   P.position : 'a tlp -> ('a * Position.T) tlp ; | 
         | 
  1444   P.arguments : Args.T list tlp ;  | 
         | 
  1445   | 
         | 
  1446   val parse_src : Args.src tlp =  | 
         | 
  1447   P.position (P.xname -- P.arguments) >> Args.src ;  | 
         | 
  1448   \end{verbatim} | 
         | 
  1449   | 
         | 
  1450   \begin{verbatim} | 
         | 
  1451   val ((first_word, args), pos) = Args.dest_src asrc ;  | 
         | 
  1452   map Args.string_of args ;  | 
         | 
  1453   \end{verbatim} | 
         | 
  1454   | 
         | 
  1455   The \texttt{Args} structure contains more parsers and parser transformers  | 
         | 
  1456   for which the input source type is \texttt{Args.T list}.  For example, | 
         | 
  1457   \begin{verbatim} | 
         | 
  1458   type 'a atlp = Args.T list -> 'a * Args.T list ;  | 
         | 
  1459   open Args ;  | 
         | 
  1460   nat : int atlp ; (* also Args.int *)  | 
         | 
  1461   thm_sel : PureThy.interval list atlp ;  | 
         | 
  1462   list : 'a atlp -> 'a list atlp ;  | 
         | 
  1463   attribs : (string -> string) -> Args.src list atlp ;  | 
         | 
  1464   opt_attribs : (string -> string) -> Args.src list atlp ;  | 
         | 
  1465     | 
         | 
  1466   (* parse_atl_str : 'a atlp -> (string -> 'a * string) ;  | 
         | 
  1467   given an Args.T list parser, to get a string parser *)  | 
         | 
  1468   fun parse_atl_str atlp str =   | 
         | 
  1469   let val (ats, rem_str) = parse_str P.arguments str ;  | 
         | 
  1470   val (res, rem_ats) = atlp ats ;  | 
         | 
  1471   in (res, String.concat (Library.separate " "  | 
         | 
  1472   (List.map Args.string_of rem_ats @ [rem_str]))) end ;  | 
         | 
  1473   | 
         | 
  1474   parse_atl_str Args.int "-1-," ;  | 
         | 
  1475   parse_atl_str (Scan.option Args.int) "x1-," ;  | 
         | 
  1476   parse_atl_str Args.thm_sel "(1-,4,13-22)" ;  | 
         | 
  1477   | 
         | 
  1478   val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)  | 
         | 
  1479   "[THEN trans [THEN sym], simp, OF sym]" ;  | 
         | 
  1480   \end{verbatim} | 
         | 
  1481   | 
         | 
  1482   From here, an attribute is interpreted using \texttt{Attrib.attribute}. | 
         | 
  1483   | 
         | 
  1484   \texttt{Args} has a large number of functions which parse an \texttt{Args.src} | 
         | 
  1485   and also refer to a generic context.    | 
         | 
  1486   Note the use of \texttt{Scan.lift} for this. | 
         | 
  1487   (as does \texttt{Attrib} - RETHINK THIS) | 
         | 
  1488     | 
         | 
  1489   (\texttt{Args.syntax} shown below has type specialised) | 
         | 
  1490   | 
         | 
  1491   \begin{verbatim} | 
         | 
  1492   type ('res, 'src) parse_fn = 'src -> 'res * 'src ; | 
         | 
  1493   type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ; | 
         | 
  1494   Scan.lift : 'a atlp -> 'a cgatlp ;  | 
         | 
  1495   term : term cgatlp ;  | 
         | 
  1496   typ : typ cgatlp ;  | 
         | 
  1497     | 
         | 
  1498   Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ; | 
         | 
  1499   Attrib.thm : thm cgatlp ;  | 
         | 
  1500   Attrib.thms : thm list cgatlp ;  | 
         | 
  1501   Attrib.multi_thm : thm list cgatlp ;  | 
         | 
  1502     | 
         | 
  1503   (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;  | 
         | 
  1504   given a (Context.generic * Args.T list) parser, to get a string parser *)  | 
         | 
  1505   fun parse_cgatl_str cgatlp str =   | 
         | 
  1506   let   | 
         | 
  1507     (* use the current generic context *)  | 
         | 
  1508     val generic = Context.Theory thy ;  | 
         | 
  1509     val (ats, rem_str) = parse_str P.arguments str ;  | 
         | 
  1510     (* ignore any change to the generic context *)  | 
         | 
  1511     val (res, (_, rem_ats)) = cgatlp (generic, ats) ;  | 
         | 
  1512   in (res, String.concat (Library.separate " "  | 
         | 
  1513       (List.map Args.string_of rem_ats @ [rem_str]))) end ;  | 
         | 
  1514   \end{verbatim} | 
         | 
  1515 *}  | 
         | 
  1516   | 
         | 
  1517 section{* Attributes, and the \texttt{Attrib} structure *} | 
         | 
  1518   | 
         | 
  1519 text {* | 
         | 
  1520   The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}. | 
         | 
  1521   The source file for the \texttt{Attrib} structure is | 
         | 
  1522   \texttt{src/Pure/Isar/attrib.ML}. | 
         | 
  1523   Most attributes use a theorem to change a generic context (for example,   | 
         | 
  1524   by declaring that the theorem should be used, by default, in simplification),  | 
         | 
  1525   or change a theorem (which most often involves referring to the current  | 
         | 
  1526   theory).   | 
         | 
  1527   The functions \texttt{Thm.rule\_attribute} and | 
         | 
  1528   \texttt{Thm.declaration\_attribute} create attributes of these kinds. | 
         | 
  1529   | 
         | 
  1530   \begin{verbatim} | 
         | 
  1531   type attribute = Context.generic * thm -> Context.generic * thm;  | 
         | 
  1532   type 'a trf = 'a -> 'a ; (* transformer of a given type *)  | 
         | 
  1533   Thm.rule_attribute  : (Context.generic -> thm -> thm) -> attribute ;  | 
         | 
  1534   Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;  | 
         | 
  1535   | 
         | 
  1536   Attrib.print_attributes : theory -> unit ;  | 
         | 
  1537   Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;  | 
         | 
  1538   | 
         | 
  1539   List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;  | 
         | 
  1540   \end{verbatim} | 
         | 
  1541   | 
         | 
  1542   An attribute is stored in a theory as indicated by:  | 
         | 
  1543   \begin{verbatim} | 
         | 
  1544   Attrib.add_attributes :   | 
         | 
  1545   (bstring * (src -> attribute) * string) list -> theory trf ;   | 
         | 
  1546   (*  | 
         | 
  1547   Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ; | 
         | 
  1548   *)  | 
         | 
  1549   \end{verbatim} | 
         | 
  1550   where the first and third arguments are name and description of the attribute,  | 
         | 
  1551   and the second is a function which parses the attribute input text   | 
         | 
  1552   (including the attribute name, which has necessarily already been parsed).  | 
         | 
  1553   Here, \texttt{THEN\_att} is a function declared in the code for the | 
         | 
  1554   structure \texttt{Attrib}, but not published in its signature. | 
         | 
  1555   The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of  | 
         | 
  1556   \texttt{Attrib.add\_attributes} to add a number of attributes. | 
         | 
  1557   | 
         | 
  1558   \begin{verbatim} | 
         | 
  1559   FullAttrib.THEN_att : src -> attribute ;  | 
         | 
  1560   FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;  | 
         | 
  1561   FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;  | 
         | 
  1562   \end{verbatim} | 
         | 
  1563   | 
         | 
  1564   \begin{verbatim} | 
         | 
  1565   Attrib.syntax : attribute cgatlp -> src -> attribute ;  | 
         | 
  1566   Attrib.no_args : attribute -> src -> attribute ;  | 
         | 
  1567   \end{verbatim} | 
         | 
  1568   When this is called as \texttt{syntax scan src (gc, th)} | 
         | 
  1569   the generic context \texttt{gc} is used  | 
         | 
  1570   (and potentially changed to \texttt{gc'}) | 
         | 
  1571   by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would | 
         | 
  1572   then be applied to \texttt{(gc', th)}. | 
         | 
  1573   The source for parsing the attribute is the arguments part of \texttt{src}, | 
         | 
  1574   which must all be consumed by the parse.  | 
         | 
  1575   | 
         | 
  1576   For example, for \texttt{Attrib.no\_args attr src}, the attribute parser  | 
         | 
  1577   simply returns \texttt{attr}, requiring that the arguments part of | 
         | 
  1578   \texttt{src} must be empty. | 
         | 
  1579   | 
         | 
  1580   Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified: | 
         | 
  1581   \begin{verbatim} | 
         | 
  1582   fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;  | 
         | 
  1583   rot_att_n : int -> attribute ;  | 
         | 
  1584   val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;  | 
         | 
  1585   val rotated_att : src -> attribute =  | 
         | 
  1586   Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;  | 
         | 
  1587     | 
         | 
  1588   val THEN_arg : int cgatlp = Scan.lift   | 
         | 
  1589   (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;  | 
         | 
  1590   | 
         | 
  1591   Attrib.thm : thm cgatlp ;  | 
         | 
  1592   | 
         | 
  1593   THEN_arg -- Attrib.thm : (int * thm) cgatlp ;  | 
         | 
  1594   | 
         | 
  1595   fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;  | 
         | 
  1596   THEN_att_n : int * thm -> attribute ;  | 
         | 
  1597   | 
         | 
  1598   val THEN_att : src -> attribute = Attrib.syntax  | 
         | 
  1599   (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);  | 
         | 
  1600   \end{verbatim} | 
         | 
  1601   The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg} | 
         | 
  1602   read an optional argument, which for \texttt{rotated} is an integer,  | 
         | 
  1603   and for \texttt{THEN} is a natural enclosed in square brackets; | 
         | 
  1604   the default, if the argument is absent, is 1 in each case.  | 
         | 
  1605   Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into | 
         | 
  1606   attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is | 
         | 
  1607   parsed by \texttt{Attrib.thm}.   | 
         | 
  1608   Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}. | 
         | 
  1609   | 
         | 
  1610 *}  | 
         | 
  1611   | 
         | 
  1612 section{* Methods, and the \texttt{Method} structure *} | 
         | 
  1613   | 
         | 
  1614 text {* | 
         | 
  1615   The source file is \texttt{src/Pure/Isar/method.ML}. | 
         | 
  1616   The type \texttt{method} is defined by the datatype declaration | 
         | 
  1617   \begin{verbatim} | 
         | 
  1618   (* datatype method = Meth of thm list -> cases_tactic; *)  | 
         | 
  1619   RuleCases.NO_CASES : tactic -> cases_tactic ;  | 
         | 
  1620   \end{verbatim} | 
         | 
  1621   In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor | 
         | 
  1622   \texttt{Meth}. | 
         | 
  1623   A \texttt{cases\_tactic} is an elaborated version of a tactic. | 
         | 
  1624   \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of a | 
         | 
  1625   \texttt{cases\_tactic} without any further case information. | 
         | 
  1626   For further details see the description of structure \texttt{RuleCases} below. | 
         | 
  1627   The list of theorems to be passed to a method consists of the current  | 
         | 
  1628   \emph{facts} in the proof. | 
         | 
  1629     | 
         | 
  1630   \begin{verbatim} | 
         | 
  1631   RAW_METHOD : (thm list -> tactic) -> method ;  | 
         | 
  1632   METHOD : (thm list -> tactic) -> method ;  | 
         | 
  1633     | 
         | 
  1634   SIMPLE_METHOD : tactic -> method ;  | 
         | 
  1635   SIMPLE_METHOD' : (int -> tactic) -> method ;  | 
         | 
  1636   SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;  | 
         | 
  1637   | 
         | 
  1638   RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;  | 
         | 
  1639   METHOD_CASES : (thm list -> cases_tactic) -> method ;  | 
         | 
  1640   \end{verbatim} | 
         | 
  1641   A method is, in its simplest form, a tactic; applying the method is to apply  | 
         | 
  1642   the tactic to the current goal state.  | 
         | 
  1643   | 
         | 
  1644   Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying  | 
         | 
  1645   \texttt{tacf} to the current {facts}, and applying that tactic to the | 
         | 
  1646   goal state.  | 
         | 
  1647   | 
         | 
  1648   \texttt{METHOD} is similar but also first applies | 
         | 
  1649   \texttt{Goal.conjunction\_tac} to all subgoals. | 
         | 
  1650   | 
         | 
  1651   \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then | 
         | 
  1652   applies \texttt{tacf}. | 
         | 
  1653   | 
         | 
  1654   \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then | 
         | 
  1655   applies \texttt{tacf} to subgoal 1. | 
         | 
  1656   | 
         | 
  1657   \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by | 
         | 
  1658   \texttt{quant}, which may be, for example, | 
         | 
  1659   \texttt{ALLGOALS} (all subgoals), | 
         | 
  1660   \texttt{TRYALL} (try all subgoals, failure is OK), | 
         | 
  1661   \texttt{FIRSTGOAL} (try subgoals until it succeeds once),  | 
         | 
  1662   \texttt{(fn tacf => tacf 4)} (subgoal 4), etc | 
         | 
  1663   (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}). | 
         | 
  1664   | 
         | 
  1665   A method is stored in a theory as indicated by:  | 
         | 
  1666   \begin{verbatim} | 
         | 
  1667   Method.add_method :   | 
         | 
  1668   (bstring * (src -> Proof.context -> method) * string) -> theory trf ;   | 
         | 
  1669   ( *  | 
         | 
  1670   * )  | 
         | 
  1671   \end{verbatim} | 
         | 
  1672   where the first and third arguments are name and description of the method,  | 
         | 
  1673   and the second is a function which parses the method input text   | 
         | 
  1674   (including the method name, which has necessarily already been parsed).  | 
         | 
  1675   | 
         | 
  1676   Here, \texttt{xxx} is a function declared in the code for the | 
         | 
  1677   structure \texttt{Method}, but not published in its signature. | 
         | 
  1678   The source file \texttt{src/Pure/Isar/method.ML} shows the use of  | 
         | 
  1679   \texttt{Method.add\_method} to add a number of methods. | 
         | 
  1680   | 
         | 
  1681   | 
         | 
  1682 *}  | 
         | 
  1683 (*>*)  | 
         | 
  1684 end  |