ProgTutorial/Parsing.thy
changeset 189 069d525f8f1d
parent 188 8939b8fd8603
child 192 2fff636e1fa0
equal deleted inserted replaced
188:8939b8fd8603 189:069d525f8f1d
       
     1 theory Parsing
       
     2 imports Base "Package/Simple_Inductive_Package"
       
     3 begin
       
     4 
       
     5 
       
     6 chapter {* Parsing *}
       
     7 
       
     8 text {*
       
     9 
       
    10   Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
       
    11   Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
       
    12   on, belong to the outer syntax, whereas items inside double quotation marks, such 
       
    13   as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
       
    14   Isabelle uses a rather general and sophisticated algorithm, which 
       
    15   is driven by priority grammars. Parsers for outer syntax are built up by functional
       
    16   parsing combinators. These combinators are a well-established technique for parsing, 
       
    17   which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
       
    18   Isabelle developers are usually concerned with writing these outer syntax parsers, 
       
    19   either for new definitional packages or for calling tactics with specific arguments. 
       
    20 
       
    21   \begin{readmore}
       
    22   The library 
       
    23   for writing parser combinators is split up, roughly, into two parts. 
       
    24   The first part consists of a collection of generic parser combinators defined
       
    25   in the structure @{ML_struct Scan} in the file 
       
    26   @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
       
    27   combinators for dealing with specific token types, which are defined in the 
       
    28   structure @{ML_struct OuterParse} in the file 
       
    29   @{ML_file "Pure/Isar/outer_parse.ML"}.
       
    30   \end{readmore}
       
    31 
       
    32 *}
       
    33 
       
    34 section {* Building Generic Parsers *}
       
    35 
       
    36 text {*
       
    37 
       
    38   Let us first have a look at parsing strings using generic parsing combinators. 
       
    39   The function @{ML "$$"} takes a string as argument and will ``consume'' this string from 
       
    40   a given input list of strings. ``Consume'' in this context means that it will 
       
    41   return a pair consisting of this string and the rest of the input list. 
       
    42   For example:
       
    43 
       
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
       
    45 
       
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
       
    47 
       
    48   The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception 
       
    49   @{text "FAIL"} if no string can be consumed. For example trying to parse
       
    50 
       
    51   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
       
    52                                "Exception FAIL raised"}
       
    53   
       
    54   will raise the exception @{text "FAIL"}.
       
    55   There are three exceptions used in the parsing combinators:
       
    56 
       
    57   \begin{itemize}
       
    58   \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
       
    59   might be explored. 
       
    60   \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
       
    61   in @{text "($$ \"h\") []"}.
       
    62   \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
       
    63   It is used for example in the function @{ML "!!"} (see below).
       
    64   \end{itemize}
       
    65 
       
    66   However, note that these exceptions are private to the parser and cannot be accessed
       
    67   by the programmer (for example to handle them).
       
    68   
       
    69   Slightly more general than the parser @{ML "$$"} is the function @{ML
       
    70   Scan.one}, in that it takes a predicate as argument and then parses exactly
       
    71   one item from the input list satisfying this predicate. For example the
       
    72   following parser either consumes an @{text [quotes] "h"} or a @{text
       
    73   [quotes] "w"}:
       
    74 
       
    75 
       
    76 @{ML_response [display,gray] 
       
    77 "let 
       
    78   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
       
    79   val input1 = (explode \"hello\")
       
    80   val input2 = (explode \"world\")
       
    81 in
       
    82     (hw input1, hw input2)
       
    83 end"
       
    84     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
       
    85 
       
    86   Two parser can be connected in sequence by using the function @{ML "--"}. 
       
    87   For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
       
    88   sequence you can achieve by:
       
    89 
       
    90   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
       
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
       
    92 
       
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
       
    94 
       
    95   If, as in the previous example, you want to parse a particular string, 
       
    96   then you should use the function @{ML Scan.this_string}:
       
    97 
       
    98   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
       
    99                           "(\"hell\", [\"o\"])"}
       
   100 
       
   101   Parsers that explore alternatives can be constructed using the function @{ML
       
   102   "||"}. For example, the parser @{ML "(p || q)" for p q} returns the
       
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the
       
   104   result of @{text "q"}. For example:
       
   105 
       
   106 
       
   107 @{ML_response [display,gray] 
       
   108 "let 
       
   109   val hw = ($$ \"h\") || ($$ \"w\")
       
   110   val input1 = (explode \"hello\")
       
   111   val input2 = (explode \"world\")
       
   112 in
       
   113   (hw input1, hw input2)
       
   114 end"
       
   115   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
       
   116 
       
   117   The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function 
       
   118   for parsers, except that they discard the item being parsed by the first (respectively second)
       
   119   parser. For example:
       
   120   
       
   121 @{ML_response [display,gray]
       
   122 "let 
       
   123   val just_e = ($$ \"h\") |-- ($$ \"e\") 
       
   124   val just_h = ($$ \"h\") --| ($$ \"e\") 
       
   125   val input = (explode \"hello\")  
       
   126 in 
       
   127   (just_e input, just_h input)
       
   128 end"
       
   129   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
       
   130 
       
   131   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
       
   132   @{text "p"}, if it succeeds; otherwise it returns 
       
   133   the default value @{text "x"}. For example:
       
   134 
       
   135 @{ML_response [display,gray]
       
   136 "let 
       
   137   val p = Scan.optional ($$ \"h\") \"x\"
       
   138   val input1 = (explode \"hello\")
       
   139   val input2 = (explode \"world\")  
       
   140 in 
       
   141   (p input1, p input2)
       
   142 end" 
       
   143  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
       
   144 
       
   145   The function @{ML Scan.option} works similarly, except no default value can
       
   146   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
       
   147 
       
   148 @{ML_response [display,gray]
       
   149 "let 
       
   150   val p = Scan.option ($$ \"h\")
       
   151   val input1 = (explode \"hello\")
       
   152   val input2 = (explode \"world\")  
       
   153 in 
       
   154   (p input1, p input2)
       
   155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
       
   156 
       
   157   The function @{ML "!!"} helps to produce appropriate error messages
       
   158   during parsing. For example if you want to parse that @{text p} is immediately 
       
   159   followed by @{text q}, or start a completely different parser @{text r},
       
   160   you might write:
       
   161 
       
   162   @{ML [display,gray] "(p -- q) || r" for p q r}
       
   163 
       
   164   However, this parser is problematic for producing an appropriate error
       
   165   message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in
       
   166   that case you lose the information that @{text p} should be followed by
       
   167   @{text q}. To see this consider the case in which @{text p} is present in
       
   168   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
       
   169   and the alternative parser @{text r} will be tried. However in many
       
   170   circumstance this will be the wrong parser for the input ``p-followed-by-q''
       
   171   and therefore will also fail. The error message is then caused by the
       
   172   failure of @{text r}, not by the absence of @{text q} in the input. This
       
   173   kind of situation can be avoided when using the function @{ML "!!"}. 
       
   174   This function aborts the whole process of parsing in case of a
       
   175   failure and prints an error message. For example if you invoke the parser
       
   176 
       
   177   
       
   178   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
       
   179 
       
   180   on @{text [quotes] "hello"}, the parsing succeeds
       
   181 
       
   182   @{ML_response [display,gray] 
       
   183                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
       
   184                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
       
   185 
       
   186   but if you invoke it on @{text [quotes] "world"}
       
   187   
       
   188   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
       
   189                                "Exception ABORT raised"}
       
   190 
       
   191   then the parsing aborts and the error message @{text "foo"} is printed. In order to
       
   192   see the error message properly, you need to prefix the parser with the function 
       
   193   @{ML "Scan.error"}. For example:
       
   194 
       
   195   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
       
   196                                "Exception Error \"foo\" raised"}
       
   197 
       
   198   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
       
   199   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
       
   200   
       
   201   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
       
   202   r}. If you want to generate the correct error message for p-followed-by-q,
       
   203   then you have to write:
       
   204 *}
       
   205 
       
   206 ML{*fun p_followed_by_q p q r =
       
   207 let 
       
   208   val err_msg = (fn _ => p ^ " is not followed by " ^ q)
       
   209 in
       
   210   ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
       
   211 end *}
       
   212 
       
   213 
       
   214 text {*
       
   215   Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and 
       
   216   the input @{text [quotes] "holle"} 
       
   217 
       
   218   @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
       
   219                                "Exception ERROR \"h is not followed by e\" raised"} 
       
   220 
       
   221   produces the correct error message. Running it with
       
   222  
       
   223   @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
       
   224                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
       
   225   
       
   226   yields the expected parsing. 
       
   227 
       
   228   The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
       
   229   often as it succeeds. For example:
       
   230   
       
   231   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
       
   232                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
       
   233   
       
   234   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
       
   235   @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
       
   236   succeeds at least once.
       
   237 
       
   238   Also note that the parser would have aborted with the exception @{text MORE}, if
       
   239   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
       
   240   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
       
   241   them you can write:
       
   242   
       
   243   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
       
   244                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
       
   245 
       
   246   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
       
   247   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
       
   248   manually wrapping is often already done by the surrounding infrastructure. 
       
   249 
       
   250   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
       
   251   string as in
       
   252 
       
   253   @{ML_response [display,gray] 
       
   254 "let 
       
   255    val p = Scan.repeat (Scan.one Symbol.not_eof)
       
   256    val input = (explode \"foo bar foo\") 
       
   257 in
       
   258    Scan.finite Symbol.stopper p input
       
   259 end" 
       
   260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
       
   261 
       
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
       
   263   end of the input string (i.e.~stopper symbol).
       
   264 
       
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
       
   266   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
       
   267   
       
   268   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
       
   269                                "Exception FAIL raised"}
       
   270 
       
   271   fails, while
       
   272 
       
   273   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
       
   274                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
       
   275 
       
   276   succeeds. 
       
   277 
       
   278   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
       
   279   input until a certain marker symbol is reached. In the example below the marker
       
   280   symbol is a @{text [quotes] "*"}.
       
   281 
       
   282   @{ML_response [display,gray]
       
   283 "let 
       
   284   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
       
   285   val input1 = (explode \"fooooo\")
       
   286   val input2 = (explode \"foo*ooo\")
       
   287 in
       
   288   (Scan.finite Symbol.stopper p input1, 
       
   289    Scan.finite Symbol.stopper p input2)
       
   290 end"
       
   291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
       
   292  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
       
   293 
       
   294   After parsing is done, you nearly always want to apply a function on the parsed 
       
   295   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
       
   296   first the parser @{text p} and upon successful completion applies the 
       
   297   function @{text f} to the result. For example
       
   298 
       
   299 @{ML_response [display,gray]
       
   300 "let 
       
   301   fun double (x,y) = (x ^ x, y ^ y) 
       
   302 in
       
   303   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
       
   304 end"
       
   305 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
       
   306 
       
   307   doubles the two parsed input strings; or
       
   308 
       
   309   @{ML_response [display,gray] 
       
   310 "let 
       
   311    val p = Scan.repeat (Scan.one Symbol.not_eof)
       
   312    val input = (explode \"foo bar foo\") 
       
   313 in
       
   314    Scan.finite Symbol.stopper (p >> implode) input
       
   315 end" 
       
   316 "(\"foo bar foo\",[])"}
       
   317 
       
   318   where the single-character strings in the parsed output are transformed
       
   319   back into one string.
       
   320  
       
   321   The function @{ML Scan.ahead} parses some input, but leaves the original
       
   322   input unchanged. For example:
       
   323 
       
   324   @{ML_response [display,gray]
       
   325   "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" 
       
   326   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
       
   327 
       
   328   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
       
   329   the given parser to the second component of the pair and leaves the  first component 
       
   330   untouched. For example
       
   331 
       
   332 @{ML_response [display,gray]
       
   333 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
       
   334 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
       
   335 
       
   336   (FIXME: In which situations is this useful? Give examples.) 
       
   337 
       
   338   \begin{exercise}\label{ex:scancmts}
       
   339   Write a parser that parses an input string so that any comment enclosed
       
   340   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
       
   341   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
       
   342   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
       
   343   "s1 ^ s ^ s2" for s1 s2 s}.
       
   344   \end{exercise}
       
   345 *}
       
   346 
       
   347 section {* Parsing Theory Syntax *}
       
   348 
       
   349 text {*
       
   350   (FIXME: context parser)
       
   351 
       
   352   Most of the time, however, Isabelle developers have to deal with parsing
       
   353   tokens, not strings.  These token parsers have the type:
       
   354 *}
       
   355   
       
   356 ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
       
   357 
       
   358 text {*
       
   359   The reason for using token parsers is that theory syntax, as well as the
       
   360   parsers for the arguments of proof methods, use the type @{ML_type
       
   361   OuterLex.token} (which is identical to the type @{ML_type
       
   362   OuterParse.token}).  However, there are also handy parsers for
       
   363   ML-expressions and ML-files.
       
   364 
       
   365   \begin{readmore}
       
   366   The parser functions for the theory syntax are contained in the structure
       
   367   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
       
   368   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
       
   369   \end{readmore}
       
   370 
       
   371   The structure @{ML_struct OuterLex} defines several kinds of tokens (for example 
       
   372   @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
       
   373   @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
       
   374   kind of tokens.
       
   375 *}  
       
   376 
       
   377 text {*
       
   378   The first example shows how to generate a token list out of a string using
       
   379   the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"}
       
   380   since, at the moment, we are not interested in generating
       
   381   precise error messages. The following code
       
   382 
       
   383 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
       
   384 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
       
   385  Token (\<dots>,(Space, \" \"),\<dots>), 
       
   386  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
       
   387 
       
   388   produces three tokens where the first and the last are identifiers, since
       
   389   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
       
   390   other syntactic category.\footnote{Note that because of a possible a bug in
       
   391   the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
       
   392   the tokens.} The second indicates a space.
       
   393 
       
   394   Many parsing functions later on will require spaces, comments and the like
       
   395   to have already been filtered out.  So from now on we are going to use the 
       
   396   functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example:
       
   397 
       
   398 @{ML_response_fake [display,gray]
       
   399 "let
       
   400    val input = OuterSyntax.scan Position.none \"hello world\"
       
   401 in
       
   402    filter OuterLex.is_proper input
       
   403 end" 
       
   404 "[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
       
   405 
       
   406   For convenience we define the function:
       
   407 
       
   408 *}
       
   409 
       
   410 ML{*fun filtered_input str = 
       
   411   filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}
       
   412 
       
   413 text {*
       
   414 
       
   415   If you now parse
       
   416 
       
   417 @{ML_response_fake [display,gray] 
       
   418 "filtered_input \"inductive | for\"" 
       
   419 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
       
   420  Token (\<dots>,(Keyword, \"|\"),\<dots>), 
       
   421  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
       
   422 
       
   423   you obtain a list consisting of only a command and two keyword tokens.
       
   424   If you want to see which keywords and commands are currently known to Isabelle, type in
       
   425   the following code (you might have to adjust the @{ML print_depth} in order to
       
   426   see the complete list):
       
   427 
       
   428 @{ML_response_fake [display,gray] 
       
   429 "let 
       
   430   val (keywords, commands) = OuterKeyword.get_lexicons ()
       
   431 in 
       
   432   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
       
   433 end" 
       
   434 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
       
   435 
       
   436   The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:
       
   437 
       
   438 @{ML_response [display,gray]
       
   439 "let 
       
   440   val input1 = filtered_input \"where for\"
       
   441   val input2 = filtered_input \"| in\"
       
   442 in 
       
   443   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
       
   444 end"
       
   445 "((\"where\",\<dots>), (\"|\",\<dots>))"}
       
   446 
       
   447   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
       
   448 
       
   449 @{ML_response [display,gray]
       
   450 "let 
       
   451   val input = filtered_input \"| in\"
       
   452 in 
       
   453   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
       
   454 end"
       
   455 "((\"|\", \"in\"), [])"}
       
   456 
       
   457   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
       
   458   list of items recognised by the parser @{text p}, where the items being parsed
       
   459   are separated by the string @{text s}. For example:
       
   460 
       
   461 @{ML_response [display,gray]
       
   462 "let 
       
   463   val input = filtered_input \"in | in | in foo\"
       
   464 in 
       
   465   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
       
   466 end" 
       
   467 "([\"in\", \"in\", \"in\"], [\<dots>])"}
       
   468 
       
   469   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
       
   470   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
       
   471   end of the parsed string, otherwise the parser would have consumed all
       
   472   tokens and then failed with the exception @{text "MORE"}. Like in the
       
   473   previous section, we can avoid this exception using the wrapper @{ML
       
   474   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
       
   475   OuterLex.stopper}. We can write:
       
   476 
       
   477 @{ML_response [display,gray]
       
   478 "let 
       
   479   val input = filtered_input \"in | in | in\"
       
   480 in 
       
   481   Scan.finite OuterLex.stopper 
       
   482          (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
       
   483 end" 
       
   484 "([\"in\", \"in\", \"in\"], [])"}
       
   485 
       
   486   The following function will help to run examples.
       
   487 
       
   488 *}
       
   489 
       
   490 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
       
   491 
       
   492 text {*
       
   493 
       
   494   The function @{ML "OuterParse.!!!"} can be used to force termination of the
       
   495   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
       
   496   except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
       
   497   with a relatively precise description of the failure. For example:
       
   498 
       
   499 @{ML_response_fake [display,gray]
       
   500 "let 
       
   501   val input = filtered_input \"in |\"
       
   502   val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
       
   503 in 
       
   504   parse (OuterParse.!!! parse_bar_then_in) input 
       
   505 end"
       
   506 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
       
   507 but keyword in was found\" raised"
       
   508 }
       
   509 
       
   510   \begin{exercise} (FIXME)
       
   511   A type-identifier, for example @{typ "'a"}, is a token of 
       
   512   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
       
   513   the function @{ML OuterParse.type_ident}.
       
   514   \end{exercise}
       
   515 
       
   516   (FIXME: or give parser for numbers)
       
   517 
       
   518   Whenever there is a possibility that the processing of user input can fail, 
       
   519   it is a good idea to give as much information about where the error 
       
   520   occured. For this Isabelle can attach positional information to tokens
       
   521   and then thread this information up the processing chain. To see this,
       
   522   modify the function @{ML filtered_input} described earlier to 
       
   523 *}
       
   524 
       
   525 ML{*fun filtered_input' str = 
       
   526        filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
       
   527 
       
   528 text {*
       
   529   where we pretend the parsed string starts on line 7. An example is
       
   530 
       
   531 @{ML_response_fake [display,gray]
       
   532 "filtered_input' \"foo \\n bar\""
       
   533 "[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>),
       
   534  Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}
       
   535 
       
   536   in which the @{text [quotes] "\\n"} causes the second token to be in 
       
   537   line 8.
       
   538 
       
   539   By using the parser @{ML OuterParse.position} you can decode the positional
       
   540   information and return it as part of the parsed input. For example
       
   541 
       
   542 @{ML_response_fake [display,gray]
       
   543 "let
       
   544   val input = (filtered_input' \"where\")
       
   545 in 
       
   546   parse (OuterParse.position (OuterParse.$$$ \"where\")) input
       
   547 end"
       
   548 "((\"where\", {line=7, end_line=7}), [])"}
       
   549 
       
   550   \begin{readmore}
       
   551   The functions related to positions are implemented in the file
       
   552   @{ML_file "Pure/General/position.ML"}.
       
   553   \end{readmore}
       
   554 
       
   555 *}
       
   556 
       
   557 section {* Parsing Inner Syntax *}
       
   558 
       
   559 text {*
       
   560   There is usually no need to write your own parser for parsing inner syntax, that is 
       
   561   for terms and  types: you can just call the pre-defined parsers. Terms can 
       
   562   be parsed using the function @{ML OuterParse.term}. For example:
       
   563 
       
   564 @{ML_response [display,gray]
       
   565 "let 
       
   566   val input = OuterSyntax.scan Position.none \"foo\"
       
   567 in 
       
   568   OuterParse.term input
       
   569 end"
       
   570 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
       
   571 
       
   572   The function @{ML OuterParse.prop} is similar, except that it gives a different
       
   573   error message, when parsing fails. As you can see, the parser not just returns 
       
   574   the parsed string, but also some encoded information. You can decode the
       
   575   information with the function @{ML YXML.parse}. For example
       
   576 
       
   577   @{ML_response [display,gray]
       
   578   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
       
   579   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
       
   580 
       
   581   The result of the decoding is an XML-tree. You can see better what is going on if
       
   582   you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
       
   583 
       
   584 @{ML_response [display,gray]
       
   585 "let 
       
   586   val input = OuterSyntax.scan (Position.line 42) \"foo\"
       
   587 in 
       
   588   YXML.parse (fst (OuterParse.term input))
       
   589 end"
       
   590 "XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
       
   591   
       
   592   The positional information is stored as part of an XML-tree so that code 
       
   593   called later on will be able to give more precise error messages. 
       
   594 
       
   595   \begin{readmore}
       
   596   The functions to do with input and output of XML and YXML are defined 
       
   597   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
       
   598   \end{readmore}
       
   599   
       
   600 *}
       
   601 
       
   602 section {* Parsing Specifications\label{sec:parsingspecs} *}
       
   603 
       
   604 text {*
       
   605   There are a number of special purpose parsers that help with parsing
       
   606   specifications of function definitions, inductive predicates and so on. In
       
   607   Capter~\ref{chp:package}, for example, we will need to parse specifications
       
   608   for inductive predicates of the form:
       
   609 *}
       
   610 
       
   611 simple_inductive
       
   612   even and odd
       
   613 where
       
   614   even0: "even 0"
       
   615 | evenS: "odd n \<Longrightarrow> even (Suc n)"
       
   616 | oddS: "even n \<Longrightarrow> odd (Suc n)"
       
   617 
       
   618 text {*
       
   619   For this we are going to use the parser:
       
   620 *}
       
   621 
       
   622 ML %linenosgray{*val spec_parser = 
       
   623      OuterParse.fixes -- 
       
   624      Scan.optional 
       
   625        (OuterParse.$$$ "where" |--
       
   626           OuterParse.!!! 
       
   627             (OuterParse.enum1 "|" 
       
   628                (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
       
   629 
       
   630 text {*
       
   631   Note that the parser does not parse the keyword \simpleinductive, even if it is
       
   632   meant to process definitions as shown above. The parser of the keyword 
       
   633   will be given by the infrastructure that will eventually call @{ML spec_parser}.
       
   634   
       
   635 
       
   636   To see what the parser returns, let us parse the string corresponding to the 
       
   637   definition of @{term even} and @{term odd}:
       
   638 
       
   639 @{ML_response [display,gray]
       
   640 "let
       
   641   val input = filtered_input
       
   642      (\"even and odd \" ^  
       
   643       \"where \" ^
       
   644       \"  even0[intro]: \\\"even 0\\\" \" ^ 
       
   645       \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ 
       
   646       \"| oddS[intro]:  \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
       
   647 in
       
   648   parse spec_parser input
       
   649 end"
       
   650 "(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
       
   651      [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),
       
   652       ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"),
       
   653       ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"}
       
   654 
       
   655   As you see, the result is a pair consisting of a list of
       
   656   variables with optional type-annotation and syntax-annotation, and a list of
       
   657   rules where every rule has optionally a name and an attribute.
       
   658 
       
   659   The function @{ML OuterParse.fixes} in Line 2 of the parser reads an 
       
   660   \isacommand{and}-separated 
       
   661   list of variables that can include optional type annotations and syntax translations. 
       
   662   For example:\footnote{Note that in the code we need to write 
       
   663   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
       
   664   in the compound type.}
       
   665 
       
   666 @{ML_response [display,gray]
       
   667 "let
       
   668   val input = filtered_input 
       
   669         \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
       
   670 in
       
   671    parse OuterParse.fixes input
       
   672 end"
       
   673 "([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn), 
       
   674   (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
       
   675   (blonk, NONE, NoSyn)],[])"}  
       
   676 *}
       
   677 
       
   678 text {*
       
   679   Whenever types are given, they are stored in the @{ML SOME}s. The types are
       
   680   not yet used to type the variables: this must be done by type-inference later
       
   681   on. Since types are part of the inner syntax they are strings with some
       
   682   encoded information (see previous section). If a syntax translation is
       
   683   present for a variable, then it is stored in the @{ML Mixfix} datastructure;
       
   684   no syntax translation is indicated by @{ML NoSyn}.
       
   685 
       
   686   \begin{readmore}
       
   687   The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
       
   688   \end{readmore}
       
   689 
       
   690   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
       
   691   list of introduction rules, that is propositions with theorem
       
   692   annotations. The introduction rules are propositions parsed by @{ML
       
   693   OuterParse.prop}. However, they can include an optional theorem name plus
       
   694   some attributes. For example
       
   695 
       
   696 @{ML_response [display,gray] "let 
       
   697   val input = filtered_input \"foo_lemma[intro,dest!]:\"
       
   698   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
       
   699 in 
       
   700   (name, map Args.dest_src attrib)
       
   701 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
       
   702  
       
   703   The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
       
   704   @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name 
       
   705   has to end with @{text [quotes] ":"}---see the argument of 
       
   706   the function @{ML SpecParse.opt_thm_name} in Line 7.
       
   707 
       
   708   \begin{readmore}
       
   709   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
       
   710   and @{ML_file "Pure/Isar/args.ML"}.
       
   711   \end{readmore}
       
   712 *}
       
   713 
       
   714 section {* New Commands and Keyword Files\label{sec:newcommand} *}
       
   715 
       
   716 text {*
       
   717   (FIXME: update to the right command setup)
       
   718 
       
   719   Often new commands, for example for providing new definitional principles,
       
   720   need to be implemented. While this is not difficult on the ML-level,
       
   721   new commands, in order to be useful, need to be recognised by
       
   722   ProofGeneral. This results in some subtle configuration issues, which we
       
   723   will explain in this section.
       
   724 
       
   725   To keep things simple, let us start with a ``silly'' command that does nothing 
       
   726   at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
       
   727   defined as:
       
   728 *}
       
   729 
       
   730 ML{*let
       
   731   val do_nothing = Scan.succeed (Toplevel.theory I)
       
   732   val kind = OuterKeyword.thy_decl
       
   733 in
       
   734   OuterSyntax.command "foobar" "description of foobar" kind do_nothing
       
   735 end *}
       
   736 
       
   737 text {*
       
   738   The crucial function @{ML OuterSyntax.command} expects a name for the command, a
       
   739   short description, a kind indicator (which we will explain later on more thoroughly) and a
       
   740   parser producing a top-level transition function (its purpose will also explained
       
   741   later). 
       
   742 
       
   743   While this is everything you have to do on the ML-level, you need a keyword
       
   744   file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
       
   745   recognise \isacommand{foobar} as a command. Such a keyword file can be
       
   746   generated with the command-line:
       
   747 
       
   748   @{text [display] "$ isabelle keywords -k foobar some_log_files"}
       
   749 
       
   750   The option @{text "-k foobar"} indicates which postfix the name of the keyword file 
       
   751   will be assigned. In the case above the file will be named @{text
       
   752   "isar-keywords-foobar.el"}. This command requires log files to be
       
   753   present (in order to extract the keywords from them). To generate these log
       
   754   files, you first need to package the code above into a separate theory file named
       
   755   @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
       
   756   complete code.
       
   757 
       
   758 
       
   759   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       
   760   \begin{figure}[t]
       
   761   \begin{graybox}\small
       
   762   \isacommand{theory}~@{text Command}\\
       
   763   \isacommand{imports}~@{text Main}\\
       
   764   \isacommand{begin}\\
       
   765   \isacommand{ML}~@{text "\<verbopen>"}\\
       
   766   @{ML
       
   767 "let
       
   768   val do_nothing = Scan.succeed (Toplevel.theory I)
       
   769   val kind = OuterKeyword.thy_decl
       
   770 in
       
   771   OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing
       
   772 end"}\\
       
   773   @{text "\<verbclose>"}\\
       
   774   \isacommand{end}
       
   775   \end{graybox}
       
   776   \caption{\small The file @{text "Command.thy"} is necessary for generating a log 
       
   777   file. This log file enables Isabelle to generate a keyword file containing 
       
   778   the command \isacommand{foobar}.\label{fig:commandtheory}}
       
   779   \end{figure}
       
   780   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       
   781 
       
   782   For our purposes it is sufficient to use the log files of the theories
       
   783   @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
       
   784   the log file for the theory @{text "Command.thy"}, which contains the new
       
   785   \isacommand{foobar}-command. If you target other logics besides HOL, such
       
   786   as Nominal or ZF, then you need to adapt the log files appropriately.
       
   787   
       
   788   @{text Pure} and @{text HOL} are usually compiled during the installation of
       
   789   Isabelle. So log files for them should be already available. If not, then
       
   790   they can be conveniently compiled with the help of the build-script from the Isabelle
       
   791   distribution.
       
   792 
       
   793   @{text [display] 
       
   794 "$ ./build -m \"Pure\"
       
   795 $ ./build -m \"HOL\""}
       
   796   
       
   797   The @{text "Pure-ProofGeneral"} theory needs to be compiled with:
       
   798 
       
   799   @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}
       
   800 
       
   801   For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory 
       
   802   with:
       
   803 
       
   804   @{text [display] "$ isabelle mkdir FoobarCommand"}
       
   805 
       
   806   This generates a directory containing the files: 
       
   807 
       
   808   @{text [display] 
       
   809 "./IsaMakefile
       
   810 ./FoobarCommand/ROOT.ML
       
   811 ./FoobarCommand/document
       
   812 ./FoobarCommand/document/root.tex"}
       
   813 
       
   814 
       
   815   You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
       
   816   and add the line 
       
   817 
       
   818   @{text [display] "use_thy \"Command\";"} 
       
   819   
       
   820   to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:
       
   821 
       
   822   @{text [display] "$ isabelle make"}
       
   823 
       
   824   If the compilation succeeds, you have finally created all the necessary log files. 
       
   825   They are stored in the directory 
       
   826   
       
   827   @{text [display]  "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"}
       
   828 
       
   829   or something similar depending on your Isabelle distribution and architecture.
       
   830   One quick way to assign a shell variable to this directory is by typing
       
   831 
       
   832   @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
       
   833  
       
   834   on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the 
       
   835   directory should include the files:
       
   836 
       
   837   @{text [display] 
       
   838 "Pure.gz
       
   839 HOL.gz
       
   840 Pure-ProofGeneral.gz
       
   841 HOL-FoobarCommand.gz"} 
       
   842 
       
   843   From them you can create the keyword files. Assuming the name 
       
   844   of the directory is in  @{text "$ISABELLE_LOGS"},
       
   845   then the Unix command for creating the keyword file is:
       
   846 
       
   847 @{text [display]
       
   848 "$ isabelle keywords -k foobar 
       
   849    $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}
       
   850 
       
   851   The result is the file @{text "isar-keywords-foobar.el"}. It should contain
       
   852   the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
       
   853   that @{text "grep foobar"} on this file returns something
       
   854   non-empty.}  This keyword file needs to
       
   855   be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
       
   856   aware of this keyword file, you have to start Isabelle with the option @{text
       
   857   "-k foobar"}, that is:
       
   858 
       
   859 
       
   860   @{text [display] "$ isabelle emacs -k foobar a_theory_file"}
       
   861 
       
   862   If you now build a theory on top of @{text "Command.thy"}, 
       
   863   then the command \isacommand{foobar} can be used. 
       
   864   Similarly with any other new command. 
       
   865 
       
   866 
       
   867   At the moment \isacommand{foobar} is not very useful. Let us refine it a bit 
       
   868   next by letting it take a proposition as argument and printing this proposition 
       
   869   inside the tracing buffer. 
       
   870 
       
   871   The crucial part of a command is the function that determines the behaviour
       
   872   of the command. In the code above we used a ``do-nothing''-function, which
       
   873   because of @{ML Scan.succeed} does not parse any argument, but immediately
       
   874   returns the simple toplevel function @{ML "Toplevel.theory I"}. We can
       
   875   replace this code by a function that first parses a proposition (using the
       
   876   parser @{ML OuterParse.prop}), then prints out the tracing
       
   877   information (using a new top-level function @{text trace_top_lvl}) and 
       
   878   finally does nothing. For this you can write:
       
   879 *}
       
   880 
       
   881 ML{*let
       
   882   fun trace_top_lvl str = 
       
   883      Toplevel.theory (fn thy => (tracing str; thy))
       
   884 
       
   885   val trace_prop = OuterParse.prop >> trace_top_lvl
       
   886 
       
   887   val kind = OuterKeyword.thy_decl
       
   888 in
       
   889   OuterSyntax.command "foobar" "traces a proposition" kind trace_prop
       
   890 end *}
       
   891 
       
   892 text {*
       
   893   Now you can type
       
   894 
       
   895   \begin{isabelle}
       
   896   \isacommand{foobar}~@{text [quotes] "True \<and> False"}\\
       
   897   @{text "> \"True \<and> False\""}
       
   898   \end{isabelle}
       
   899   
       
   900   and see the proposition in the tracing buffer.  
       
   901 
       
   902   Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator
       
   903   for the command.  This means that the command finishes as soon as the
       
   904   arguments are processed. Examples of this kind of commands are
       
   905   \isacommand{definition} and \isacommand{declare}.  In other cases,
       
   906   commands are expected to parse some arguments, for example a proposition,
       
   907   and then ``open up'' a proof in order to prove the proposition (for example
       
   908   \isacommand{lemma}) or prove some other properties (for example
       
   909   \isacommand{function}). To achieve this kind of behaviour, you have to use the kind
       
   910   indicator @{ML thy_goal in OuterKeyword}.  Note, however, once you change the 
       
   911   ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword} 
       
   912   then the keyword file needs to be re-created!
       
   913 
       
   914   Below we change \isacommand{foobar} so that it takes a proposition as
       
   915   argument and then starts a proof in order to prove it. Therefore in Line 13, 
       
   916   we set the kind indicator to @{ML thy_goal in OuterKeyword}.
       
   917 *}
       
   918 
       
   919 ML%linenosgray{*let
       
   920   fun set_up_thm str ctxt =
       
   921     let
       
   922       val prop = Syntax.read_prop ctxt str
       
   923     in
       
   924       Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt
       
   925     end;
       
   926   
       
   927   val prove_prop = OuterParse.prop >>  
       
   928       (fn str => Toplevel.print o 
       
   929                     Toplevel.local_theory_to_proof NONE (set_up_thm str))
       
   930   
       
   931   val kind = OuterKeyword.thy_goal
       
   932 in
       
   933   OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
       
   934 end *}
       
   935 
       
   936 text {*
       
   937   The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
       
   938   proved) and a context as argument.  The context is necessary in order to be able to use
       
   939   @{ML Syntax.read_prop}, which converts a string into a proper proposition.
       
   940   In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
       
   941   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
       
   942   omit); the argument @{ML "(K I)"} stands for a function that determines what
       
   943   should be done with the theorem once it is proved (we chose to just forget
       
   944   about it). Lines 9 to 11 contain the parser for the proposition.
       
   945 
       
   946   If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following 
       
   947   proof state:
       
   948 
       
   949   \begin{isabelle}
       
   950   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
       
   951   @{text "goal (1 subgoal):"}\\
       
   952   @{text "1. True \<and> True"}
       
   953   \end{isabelle}
       
   954 
       
   955   and you can build the proof
       
   956 
       
   957   \begin{isabelle}
       
   958   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
       
   959   \isacommand{apply}@{text "(rule conjI)"}\\
       
   960   \isacommand{apply}@{text "(rule TrueI)+"}\\
       
   961   \isacommand{done}
       
   962   \end{isabelle}
       
   963 
       
   964  
       
   965   
       
   966   (FIXME What do @{ML "Toplevel.theory"} 
       
   967   @{ML "Toplevel.print"} 
       
   968   @{ML Toplevel.local_theory} do?)
       
   969 
       
   970   (FIXME read a name and show how to store theorems)
       
   971 
       
   972 *}
       
   973 
       
   974 section {* Methods *}
       
   975 
       
   976 text {*
       
   977   Methods are a central concept in Isabelle. They are the ones you use for example
       
   978   in \isacommand{apply}. To print out all currently known methods you can use the 
       
   979   Isabelle command. 
       
   980 *}
       
   981 
       
   982 print_methods
       
   983 
       
   984 text {*
       
   985   An example of a very simple method is the following code.
       
   986 *}
       
   987 
       
   988 method_setup %gray foobar_meth = 
       
   989  {* Scan.succeed
       
   990       (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *}
       
   991          "foobar method"
       
   992 
       
   993 text {*
       
   994   It defines the method @{text foobar_meth}, which takes no arguments (therefore the
       
   995   parser @{ML Scan.succeed}) and 
       
   996   only applies the tactic @{thm [source] conjE} and then @{thm [source] conjI}.
       
   997   This method can be used in the following proof
       
   998 *}
       
   999 
       
  1000 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
       
  1001 apply(foobar_meth)
       
  1002 txt {*
       
  1003   where it results in the goal state
       
  1004 
       
  1005   \begin{minipage}{\textwidth}
       
  1006   @{subgoals}
       
  1007   \end{minipage} *}
       
  1008 (*<*)oops(*>*)
       
  1009 
       
  1010 text {*
       
  1011   (FIXME: explain a version of rule-tac)
       
  1012 *}
       
  1013 
       
  1014 (*<*)
       
  1015 
       
  1016 chapter {* Parsing *}
       
  1017 
       
  1018 text {*
       
  1019 
       
  1020   Lots of Standard ML code is given in this document, for various reasons,
       
  1021   including:
       
  1022   \begin{itemize}
       
  1023   \item direct quotation of code found in the Isabelle source files,
       
  1024   or simplified versions of such code
       
  1025   \item identifiers found in the Isabelle source code, with their types 
       
  1026   (or specialisations of their types)
       
  1027   \item code examples, which can be run by the reader, to help illustrate the
       
  1028   behaviour of functions found in the Isabelle source code
       
  1029   \item ancillary functions, not from the Isabelle source code, 
       
  1030   which enable the reader to run relevant code examples
       
  1031   \item type abbreviations, which help explain the uses of certain functions
       
  1032   \end{itemize}
       
  1033 
       
  1034 *}
       
  1035 
       
  1036 section {* Parsing Isar input *}
       
  1037 
       
  1038 text {*
       
  1039 
       
  1040   The typical parsing function has the type
       
  1041   \texttt{'src -> 'res * 'src}, with input  
       
  1042   of type \texttt{'src}, returning a result 
       
  1043   of type \texttt{'res}, which is (or is derived from) the first part of the
       
  1044   input, and also returning the remainder of the input.
       
  1045   (In the common case, when it is clear what the ``remainder of the input''
       
  1046   means, we will just say that the functions ``returns'' the
       
  1047   value of type \texttt{'res}). 
       
  1048   An exception is raised if an appropriate value 
       
  1049   cannot be produced from the input.
       
  1050   A range of exceptions can be used to identify different reasons 
       
  1051   for the failure of a parse.
       
  1052   
       
  1053   This contrasts the standard parsing function in Standard ML,
       
  1054   which is of type 
       
  1055   \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option};
       
  1056   (for example, \texttt{List.getItem} and \texttt{Substring.getc}).
       
  1057   However, much of the discussion at 
       
  1058   FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html
       
  1059   is relevant.
       
  1060 
       
  1061   Naturally one may convert between the two different sorts of parsing functions
       
  1062   as follows:
       
  1063   \begin{verbatim}
       
  1064   open StringCvt ;
       
  1065   type ('res, 'src) ex_reader = 'src -> 'res * 'src
       
  1066   ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader 
       
  1067   fun ex_reader rdr src = Option.valOf (rdr src) ;
       
  1068   reader : ('res, 'src) ex_reader -> ('res, 'src) reader 
       
  1069   fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;
       
  1070   \end{verbatim}
       
  1071   
       
  1072 *}
       
  1073 
       
  1074 section{* The \texttt{Scan} structure *}
       
  1075 
       
  1076 text {* 
       
  1077   The source file is \texttt{src/General/scan.ML}.
       
  1078   This structure provides functions for using and combining parsing functions
       
  1079   of the type \texttt{'src -> 'res * 'src}.
       
  1080   Three exceptions are used:
       
  1081   \begin{verbatim}
       
  1082   exception MORE of string option;  (*need more input (prompt)*)
       
  1083   exception FAIL of string option;  (*try alternatives (reason of failure)*)
       
  1084   exception ABORT of string;        (*dead end*)
       
  1085   \end{verbatim}
       
  1086   Many functions in this structure (generally those with names composed of
       
  1087   symbols) are declared as infix.
       
  1088 
       
  1089   Some functions from that structure are
       
  1090   \begin{verbatim}
       
  1091   |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
       
  1092   'src -> 'res2 * 'src''
       
  1093   --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
       
  1094   'src -> 'res1 * 'src''
       
  1095   -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
       
  1096   'src -> ('res1 * 'res2) * 'src''
       
  1097   ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->
       
  1098   'src -> string * 'src''
       
  1099   \end{verbatim}
       
  1100   These functions parse a result off the input source twice.
       
  1101 
       
  1102   \texttt{|--} and \texttt{--|} 
       
  1103   return the first result and the second result, respectively.
       
  1104 
       
  1105   \texttt{--} returns both.
       
  1106 
       
  1107   \verb|^^| returns the result of concatenating the two results
       
  1108   (which must be strings).
       
  1109 
       
  1110   Note how, although the types 
       
  1111   \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,
       
  1112   the types as shown help suggest the behaviour of the functions.
       
  1113   \begin{verbatim}
       
  1114   :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
       
  1115   'src -> ('res1 * 'res2) * 'src''
       
  1116   :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
       
  1117   'src -> 'res2 * 'src''
       
  1118   \end{verbatim}
       
  1119   These are similar to \texttt{|--} and \texttt{--|},
       
  1120   except that the second parsing function can depend on the result of the first.
       
  1121   \begin{verbatim}
       
  1122   >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'
       
  1123   || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src
       
  1124   \end{verbatim}
       
  1125   \texttt{p >> f} applies a function \texttt{f} to the result of a parse.
       
  1126   
       
  1127   \texttt{||} tries a second parsing function if the first one
       
  1128   fails by raising an exception of the form \texttt{FAIL \_}.
       
  1129   
       
  1130   \begin{verbatim}
       
  1131   succeed : 'res -> ('src -> 'res * 'src) ;
       
  1132   fail : ('src -> 'res_src) ;
       
  1133   !! : ('src * string option -> string) -> 
       
  1134   ('src -> 'res_src) -> ('src -> 'res_src) ;
       
  1135   \end{verbatim}
       
  1136   \texttt{succeed r} returns \texttt{r}, with the input unchanged.
       
  1137   \texttt{fail} always fails, raising exception \texttt{FAIL NONE}.
       
  1138   \texttt{!! f} only affects the failure mode, turning a failure that
       
  1139   raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
       
  1140   This is used to prevent recovery from the failure ---
       
  1141   thus, in \texttt{!! parse1 || parse2}, if \texttt{parse1} fails, 
       
  1142   it won't recover by trying \texttt{parse2}.
       
  1143 
       
  1144   \begin{verbatim}
       
  1145   one : ('si -> bool) -> ('si list -> 'si * 'si list) ;
       
  1146   some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;
       
  1147   \end{verbatim}
       
  1148   These require the input to be a list of items:
       
  1149   they fail, raising \texttt{MORE NONE} if the list is empty.
       
  1150   On other failures they raise \texttt{FAIL NONE} 
       
  1151 
       
  1152   \texttt{one p} takes the first
       
  1153   item from the list if it satisfies \texttt{p}, otherwise fails.
       
  1154 
       
  1155   \texttt{some f} takes the first
       
  1156   item from the list and applies \texttt{f} to it, failing if this returns
       
  1157   \texttt{NONE}.  
       
  1158 
       
  1159   \begin{verbatim}
       
  1160   many : ('si -> bool) -> 'si list -> 'si list * 'si list ; 
       
  1161   \end{verbatim}
       
  1162   \texttt{many p} takes items from the input until it encounters one 
       
  1163   which does not satisfy \texttt{p}.  If it reaches the end of the input
       
  1164   it fails, raising \texttt{MORE NONE}.
       
  1165 
       
  1166   \texttt{many1} (with the same type) fails if the first item 
       
  1167   does not satisfy \texttt{p}.  
       
  1168 
       
  1169   \begin{verbatim}
       
  1170   option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)
       
  1171   optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)
       
  1172   \end{verbatim}
       
  1173   \texttt{option}: 
       
  1174   where the parser \texttt{f} succeeds with result \texttt{r} 
       
  1175   or raises \texttt{FAIL \_},
       
  1176   \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}.
       
  1177   
       
  1178   \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},
       
  1179   \texttt{optional f default} provides the result \texttt{default}.
       
  1180 
       
  1181   \begin{verbatim}
       
  1182   repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
       
  1183   repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
       
  1184   bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src 
       
  1185   \end{verbatim}
       
  1186   \texttt{repeat f} repeatedly parses an item off the remaining input until 
       
  1187   \texttt{f} fails with \texttt{FAIL \_}
       
  1188 
       
  1189   \texttt{repeat1} is as for \texttt{repeat}, but requires at least one
       
  1190   successful parse.
       
  1191 
       
  1192   \begin{verbatim}
       
  1193   lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))
       
  1194   \end{verbatim}
       
  1195   \texttt{lift} changes the source type of a parser by putting in an extra
       
  1196   component \texttt{'ex}, which is ignored in the parsing.
       
  1197 
       
  1198   The \texttt{Scan} structure also provides the type \texttt{lexicon}, 
       
  1199   HOW DO THEY WORK ?? TO BE COMPLETED
       
  1200   \begin{verbatim}
       
  1201   dest_lexicon: lexicon -> string list ;
       
  1202   make_lexicon: string list list -> lexicon ;
       
  1203   empty_lexicon: lexicon ;
       
  1204   extend_lexicon: string list list -> lexicon -> lexicon ;
       
  1205   merge_lexicons: lexicon -> lexicon -> lexicon ;
       
  1206   is_literal: lexicon -> string list -> bool ;
       
  1207   literal: lexicon -> string list -> string list * string list ;
       
  1208   \end{verbatim}
       
  1209   Two lexicons, for the commands and keywords, are stored and can be retrieved
       
  1210   by:
       
  1211   \begin{verbatim}
       
  1212   val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;
       
  1213   val commands = Scan.dest_lexicon command_lexicon ;
       
  1214   val keywords = Scan.dest_lexicon keyword_lexicon ;
       
  1215   \end{verbatim}
       
  1216 *}
       
  1217 
       
  1218 section{* The \texttt{OuterLex} structure *}
       
  1219 
       
  1220 text {*
       
  1221   The source file is @{text "src/Pure/Isar/outer_lex.ML"}.
       
  1222   In some other source files its name is abbreviated:
       
  1223   \begin{verbatim}
       
  1224   structure T = OuterLex;
       
  1225   \end{verbatim}
       
  1226   This structure defines the type \texttt{token}.
       
  1227   (The types
       
  1228   \texttt{OuterLex.token},
       
  1229   \texttt{OuterParse.token} and
       
  1230   \texttt{SpecParse.token} are all the same).
       
  1231   
       
  1232   Input text is split up into tokens, and the input source type for many parsing
       
  1233   functions is \texttt{token list}.
       
  1234 
       
  1235   The datatype definition (which is not published in the signature) is
       
  1236   \begin{verbatim}
       
  1237   datatype token = Token of Position.T * (token_kind * string);
       
  1238   \end{verbatim}
       
  1239   but here are some runnable examples for viewing tokens: 
       
  1240 
       
  1241 *}
       
  1242 
       
  1243 
       
  1244 
       
  1245 
       
  1246 ML{*
       
  1247   val toks = OuterSyntax.scan Position.none
       
  1248    "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
       
  1249 *}
       
  1250 
       
  1251 ML{*
       
  1252   print_depth 20 ;
       
  1253 *}
       
  1254 
       
  1255 ML{*
       
  1256   map OuterLex.text_of toks ;
       
  1257 *}
       
  1258 
       
  1259 ML{*
       
  1260   val proper_toks = filter OuterLex.is_proper toks ;
       
  1261 *}  
       
  1262 
       
  1263 ML{*
       
  1264   map OuterLex.kind_of proper_toks 
       
  1265 *}
       
  1266 
       
  1267 ML{*
       
  1268   map OuterLex.unparse proper_toks ;
       
  1269 *}
       
  1270 
       
  1271 ML{*
       
  1272   OuterLex.stopper
       
  1273 *}
       
  1274 
       
  1275 text {*
       
  1276 
       
  1277   The function \texttt{is\_proper : token -> bool} identifies tokens which are
       
  1278   not white space or comments: many parsing functions assume require spaces or
       
  1279   comments to have been filtered out.
       
  1280   
       
  1281   There is a special end-of-file token:
       
  1282   \begin{verbatim}
       
  1283   val (tok_eof : token, is_eof : token -> bool) = T.stopper ; 
       
  1284   (* end of file token *)
       
  1285   \end{verbatim}
       
  1286 
       
  1287 *}
       
  1288 
       
  1289 section {* The \texttt{OuterParse} structure *}
       
  1290 
       
  1291 text {*
       
  1292   The source file is \texttt{src/Pure/Isar/outer\_parse.ML}.
       
  1293   In some other source files its name is abbreviated:
       
  1294   \begin{verbatim}
       
  1295   structure P = OuterParse;
       
  1296   \end{verbatim}
       
  1297   Here the parsers use \texttt{token list} as the input source type. 
       
  1298   
       
  1299   Some of the parsers simply select the first token, provided that it is of the
       
  1300   right kind (as returned by \texttt{T.kind\_of}): these are 
       
  1301   \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var,
       
  1302   type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}
       
  1303   Others select the first token, provided that it is one of several kinds,
       
  1304   (eg, \texttt{name, xname, text, typ}).
       
  1305 
       
  1306   \begin{verbatim}
       
  1307   type 'a tlp = token list -> 'a * token list ; (* token list parser *)
       
  1308   $$$ : string -> string tlp
       
  1309   nat : int tlp ;
       
  1310   maybe : 'a tlp -> 'a option tlp ;
       
  1311   \end{verbatim}
       
  1312 
       
  1313   \texttt{\$\$\$ s} returns the first token,
       
  1314   if it equals \texttt{s} \emph{and} \texttt{s} is a keyword.
       
  1315 
       
  1316   \texttt{nat} returns the first token, if it is a number, and evaluates it.
       
  1317 
       
  1318   \texttt{maybe}: if \texttt{p} returns \texttt{r}, 
       
  1319   then \texttt{maybe p} returns \texttt{SOME r} ;
       
  1320   if the first token is an underscore, it returns \texttt{NONE}.
       
  1321 
       
  1322   A few examples:
       
  1323   \begin{verbatim}
       
  1324   P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)
       
  1325   P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)
       
  1326   val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ;
       
  1327   val proper_toks = List.filter T.is_proper toks ;
       
  1328   P.list P.nat toks ; (* OK, doesn't recognize white space *)
       
  1329   P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)
       
  1330   P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)
       
  1331   P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)
       
  1332   val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ;
       
  1333   P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)
       
  1334   \end{verbatim}
       
  1335 
       
  1336   The following code helps run examples:
       
  1337   \begin{verbatim}
       
  1338   fun parse_str tlp str = 
       
  1339   let val toks : token list = OuterSyntax.scan str ;
       
  1340   val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;
       
  1341   val (res, rem_toks) = tlp proper_toks ;
       
  1342   val rem_str = String.concat
       
  1343   (Library.separate " " (List.map T.unparse rem_toks)) ;
       
  1344   in (res, rem_str) end ;
       
  1345   \end{verbatim}
       
  1346 
       
  1347   Some examples from \texttt{src/Pure/Isar/outer\_parse.ML}
       
  1348   \begin{verbatim}
       
  1349   val type_args =
       
  1350   type_ident >> Library.single ||
       
  1351   $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") ||
       
  1352   Scan.succeed [];
       
  1353   \end{verbatim}
       
  1354   There are three ways parsing a list of type arguments can succeed.
       
  1355   The first line reads a single type argument, and turns it into a singleton
       
  1356   list.
       
  1357   The second line reads "(", and then the remainder, ignoring the "(" ;
       
  1358   the remainder consists of a list of type identifiers (at least one),
       
  1359   and then a ")" which is also ignored.
       
  1360   The \texttt{!!!} ensures that if the parsing proceeds this far and then fails,
       
  1361   it won't try the third line (see the description of \texttt{Scan.!!}).
       
  1362   The third line consumes no input and returns the empty list.
       
  1363 
       
  1364   \begin{verbatim}
       
  1365   fun triple2 (x, (y, z)) = (x, y, z);
       
  1366   val arity = xname -- ($$$ "::" |-- !!! (
       
  1367   Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) []
       
  1368   -- sort)) >> triple2;
       
  1369   \end{verbatim}
       
  1370   The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is
       
  1371   ignored), then optionally a list $ss$ of sorts and then another sort $s$.
       
  1372   The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$.
       
  1373   The second line reads the optional list of sorts:
       
  1374   it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored,
       
  1375   and between them a comma-separated list of sorts.
       
  1376   If this list is absent, the default \texttt{[]} provides the list of sorts.
       
  1377 
       
  1378   \begin{verbatim}
       
  1379   parse_str P.type_args "('a, 'b) ntyp" ;
       
  1380   parse_str P.type_args "'a ntyp" ;
       
  1381   parse_str P.type_args "ntyp" ;
       
  1382   parse_str P.arity "ty :: tycl" ;
       
  1383   parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;
       
  1384   \end{verbatim}
       
  1385 
       
  1386 *}
       
  1387 
       
  1388 section {* The \texttt{SpecParse} structure *}
       
  1389 
       
  1390 text {*
       
  1391   The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.
       
  1392   This structure contains token list parsers for more complicated values.
       
  1393   For example, 
       
  1394   \begin{verbatim}
       
  1395   open SpecParse ;
       
  1396   attrib : Attrib.src tok_rdr ; 
       
  1397   attribs : Attrib.src list tok_rdr ;
       
  1398   opt_attribs : Attrib.src list tok_rdr ;
       
  1399   xthm : (thmref * Attrib.src list) tok_rdr ;
       
  1400   xthms1 : (thmref * Attrib.src list) list tok_rdr ;
       
  1401   
       
  1402   parse_str attrib "simp" ;
       
  1403   parse_str opt_attribs "hello" ;
       
  1404   val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;
       
  1405   map Args.dest_src ass ;
       
  1406   val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;
       
  1407   
       
  1408   parse_str xthm "mythm [attr]" ;
       
  1409   parse_str xthms1 "thm1 [attr] thms2" ;
       
  1410   \end{verbatim}
       
  1411   
       
  1412   As you can see, attributes are described using types of the \texttt{Args}
       
  1413   structure, described below.
       
  1414 *}
       
  1415 
       
  1416 section{* The \texttt{Args} structure *}
       
  1417 
       
  1418 text {*
       
  1419   The source file is \texttt{src/Pure/Isar/args.ML}.
       
  1420   The primary type of this structure is the \texttt{src} datatype;
       
  1421   the single constructors not published in the signature, but 
       
  1422   \texttt{Args.src} and \texttt{Args.dest\_src}
       
  1423   are in fact the constructor and destructor functions.
       
  1424   Note that the types \texttt{Attrib.src} and \texttt{Method.src}
       
  1425   are in fact \texttt{Args.src}.
       
  1426 
       
  1427   \begin{verbatim}
       
  1428   src : (string * Args.T list) * Position.T -> Args.src ;
       
  1429   dest_src : Args.src -> (string * Args.T list) * Position.T ;
       
  1430   Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;
       
  1431   fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;
       
  1432 
       
  1433   val thy = ML_Context.the_context () ;
       
  1434   val ctxt = ProofContext.init thy ;
       
  1435   map (pr_src ctxt) ass ;
       
  1436   \end{verbatim}
       
  1437 
       
  1438   So an \texttt{Args.src} consists of the first word, then a list of further 
       
  1439   ``arguments'', of type \texttt{Args.T}, with information about position in the
       
  1440   input.
       
  1441   \begin{verbatim}
       
  1442   (* how an Args.src is parsed *)
       
  1443   P.position : 'a tlp -> ('a * Position.T) tlp ;
       
  1444   P.arguments : Args.T list tlp ;
       
  1445 
       
  1446   val parse_src : Args.src tlp =
       
  1447   P.position (P.xname -- P.arguments) >> Args.src ;
       
  1448   \end{verbatim}
       
  1449 
       
  1450   \begin{verbatim}
       
  1451   val ((first_word, args), pos) = Args.dest_src asrc ;
       
  1452   map Args.string_of args ;
       
  1453   \end{verbatim}
       
  1454 
       
  1455   The \texttt{Args} structure contains more parsers and parser transformers 
       
  1456   for which the input source type is \texttt{Args.T list}.  For example,
       
  1457   \begin{verbatim}
       
  1458   type 'a atlp = Args.T list -> 'a * Args.T list ;
       
  1459   open Args ;
       
  1460   nat : int atlp ; (* also Args.int *)
       
  1461   thm_sel : PureThy.interval list atlp ;
       
  1462   list : 'a atlp -> 'a list atlp ;
       
  1463   attribs : (string -> string) -> Args.src list atlp ;
       
  1464   opt_attribs : (string -> string) -> Args.src list atlp ;
       
  1465   
       
  1466   (* parse_atl_str : 'a atlp -> (string -> 'a * string) ;
       
  1467   given an Args.T list parser, to get a string parser *)
       
  1468   fun parse_atl_str atlp str = 
       
  1469   let val (ats, rem_str) = parse_str P.arguments str ;
       
  1470   val (res, rem_ats) = atlp ats ;
       
  1471   in (res, String.concat (Library.separate " "
       
  1472   (List.map Args.string_of rem_ats @ [rem_str]))) end ;
       
  1473 
       
  1474   parse_atl_str Args.int "-1-," ;
       
  1475   parse_atl_str (Scan.option Args.int) "x1-," ;
       
  1476   parse_atl_str Args.thm_sel "(1-,4,13-22)" ;
       
  1477 
       
  1478   val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)
       
  1479   "[THEN trans [THEN sym], simp, OF sym]" ;
       
  1480   \end{verbatim}
       
  1481 
       
  1482   From here, an attribute is interpreted using \texttt{Attrib.attribute}.
       
  1483 
       
  1484   \texttt{Args} has a large number of functions which parse an \texttt{Args.src}
       
  1485   and also refer to a generic context.  
       
  1486   Note the use of \texttt{Scan.lift} for this.
       
  1487   (as does \texttt{Attrib} - RETHINK THIS)
       
  1488   
       
  1489   (\texttt{Args.syntax} shown below has type specialised)
       
  1490 
       
  1491   \begin{verbatim}
       
  1492   type ('res, 'src) parse_fn = 'src -> 'res * 'src ;
       
  1493   type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ;
       
  1494   Scan.lift : 'a atlp -> 'a cgatlp ;
       
  1495   term : term cgatlp ;
       
  1496   typ : typ cgatlp ;
       
  1497   
       
  1498   Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ;
       
  1499   Attrib.thm : thm cgatlp ;
       
  1500   Attrib.thms : thm list cgatlp ;
       
  1501   Attrib.multi_thm : thm list cgatlp ;
       
  1502   
       
  1503   (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;
       
  1504   given a (Context.generic * Args.T list) parser, to get a string parser *)
       
  1505   fun parse_cgatl_str cgatlp str = 
       
  1506   let 
       
  1507     (* use the current generic context *)
       
  1508     val generic = Context.Theory thy ;
       
  1509     val (ats, rem_str) = parse_str P.arguments str ;
       
  1510     (* ignore any change to the generic context *)
       
  1511     val (res, (_, rem_ats)) = cgatlp (generic, ats) ;
       
  1512   in (res, String.concat (Library.separate " "
       
  1513       (List.map Args.string_of rem_ats @ [rem_str]))) end ;
       
  1514   \end{verbatim}
       
  1515 *}
       
  1516 
       
  1517 section{* Attributes, and the \texttt{Attrib} structure *}
       
  1518 
       
  1519 text {*
       
  1520   The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}.
       
  1521   The source file for the \texttt{Attrib} structure is
       
  1522   \texttt{src/Pure/Isar/attrib.ML}.
       
  1523   Most attributes use a theorem to change a generic context (for example, 
       
  1524   by declaring that the theorem should be used, by default, in simplification),
       
  1525   or change a theorem (which most often involves referring to the current
       
  1526   theory). 
       
  1527   The functions \texttt{Thm.rule\_attribute} and
       
  1528   \texttt{Thm.declaration\_attribute} create attributes of these kinds.
       
  1529 
       
  1530   \begin{verbatim}
       
  1531   type attribute = Context.generic * thm -> Context.generic * thm;
       
  1532   type 'a trf = 'a -> 'a ; (* transformer of a given type *)
       
  1533   Thm.rule_attribute  : (Context.generic -> thm -> thm) -> attribute ;
       
  1534   Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;
       
  1535 
       
  1536   Attrib.print_attributes : theory -> unit ;
       
  1537   Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;
       
  1538 
       
  1539   List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;
       
  1540   \end{verbatim}
       
  1541 
       
  1542   An attribute is stored in a theory as indicated by:
       
  1543   \begin{verbatim}
       
  1544   Attrib.add_attributes : 
       
  1545   (bstring * (src -> attribute) * string) list -> theory trf ; 
       
  1546   (*
       
  1547   Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ;
       
  1548   *)
       
  1549   \end{verbatim}
       
  1550   where the first and third arguments are name and description of the attribute,
       
  1551   and the second is a function which parses the attribute input text 
       
  1552   (including the attribute name, which has necessarily already been parsed).
       
  1553   Here, \texttt{THEN\_att} is a function declared in the code for the
       
  1554   structure \texttt{Attrib}, but not published in its signature.
       
  1555   The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of 
       
  1556   \texttt{Attrib.add\_attributes} to add a number of attributes.
       
  1557 
       
  1558   \begin{verbatim}
       
  1559   FullAttrib.THEN_att : src -> attribute ;
       
  1560   FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;
       
  1561   FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;
       
  1562   \end{verbatim}
       
  1563 
       
  1564   \begin{verbatim}
       
  1565   Attrib.syntax : attribute cgatlp -> src -> attribute ;
       
  1566   Attrib.no_args : attribute -> src -> attribute ;
       
  1567   \end{verbatim}
       
  1568   When this is called as \texttt{syntax scan src (gc, th)}
       
  1569   the generic context \texttt{gc} is used 
       
  1570   (and potentially changed to \texttt{gc'})
       
  1571   by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would
       
  1572   then be applied to \texttt{(gc', th)}.
       
  1573   The source for parsing the attribute is the arguments part of \texttt{src},
       
  1574   which must all be consumed by the parse.
       
  1575 
       
  1576   For example, for \texttt{Attrib.no\_args attr src}, the attribute parser 
       
  1577   simply returns \texttt{attr}, requiring that the arguments part of
       
  1578   \texttt{src} must be empty.
       
  1579 
       
  1580   Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified:
       
  1581   \begin{verbatim}
       
  1582   fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;
       
  1583   rot_att_n : int -> attribute ;
       
  1584   val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;
       
  1585   val rotated_att : src -> attribute =
       
  1586   Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;
       
  1587   
       
  1588   val THEN_arg : int cgatlp = Scan.lift 
       
  1589   (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;
       
  1590 
       
  1591   Attrib.thm : thm cgatlp ;
       
  1592 
       
  1593   THEN_arg -- Attrib.thm : (int * thm) cgatlp ;
       
  1594 
       
  1595   fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;
       
  1596   THEN_att_n : int * thm -> attribute ;
       
  1597 
       
  1598   val THEN_att : src -> attribute = Attrib.syntax
       
  1599   (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);
       
  1600   \end{verbatim}
       
  1601   The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg}
       
  1602   read an optional argument, which for \texttt{rotated} is an integer, 
       
  1603   and for \texttt{THEN} is a natural enclosed in square brackets;
       
  1604   the default, if the argument is absent, is 1 in each case.
       
  1605   Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into
       
  1606   attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is
       
  1607   parsed by \texttt{Attrib.thm}.  
       
  1608   Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}.
       
  1609 
       
  1610 *}
       
  1611 
       
  1612 section{* Methods, and the \texttt{Method} structure *}
       
  1613 
       
  1614 text {*
       
  1615   The source file is \texttt{src/Pure/Isar/method.ML}.
       
  1616   The type \texttt{method} is defined by the datatype declaration
       
  1617   \begin{verbatim}
       
  1618   (* datatype method = Meth of thm list -> cases_tactic; *)
       
  1619   RuleCases.NO_CASES : tactic -> cases_tactic ;
       
  1620   \end{verbatim}
       
  1621   In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
       
  1622   \texttt{Meth}.
       
  1623   A \texttt{cases\_tactic} is an elaborated version of a tactic.
       
  1624   \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of a
       
  1625   \texttt{cases\_tactic} without any further case information.
       
  1626   For further details see the description of structure \texttt{RuleCases} below.
       
  1627   The list of theorems to be passed to a method consists of the current
       
  1628   \emph{facts} in the proof.
       
  1629   
       
  1630   \begin{verbatim}
       
  1631   RAW_METHOD : (thm list -> tactic) -> method ;
       
  1632   METHOD : (thm list -> tactic) -> method ;
       
  1633   
       
  1634   SIMPLE_METHOD : tactic -> method ;
       
  1635   SIMPLE_METHOD' : (int -> tactic) -> method ;
       
  1636   SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;
       
  1637 
       
  1638   RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;
       
  1639   METHOD_CASES : (thm list -> cases_tactic) -> method ;
       
  1640   \end{verbatim}
       
  1641   A method is, in its simplest form, a tactic; applying the method is to apply
       
  1642   the tactic to the current goal state.
       
  1643 
       
  1644   Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying 
       
  1645   \texttt{tacf} to the current {facts}, and applying that tactic to the
       
  1646   goal state.
       
  1647 
       
  1648   \texttt{METHOD} is similar but also first applies
       
  1649   \texttt{Goal.conjunction\_tac} to all subgoals.
       
  1650 
       
  1651   \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
       
  1652   applies \texttt{tacf}.
       
  1653 
       
  1654   \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then
       
  1655   applies \texttt{tacf} to subgoal 1.
       
  1656 
       
  1657   \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by
       
  1658   \texttt{quant}, which may be, for example,
       
  1659   \texttt{ALLGOALS} (all subgoals),
       
  1660   \texttt{TRYALL} (try all subgoals, failure is OK),
       
  1661   \texttt{FIRSTGOAL} (try subgoals until it succeeds once), 
       
  1662   \texttt{(fn tacf => tacf 4)} (subgoal 4), etc
       
  1663   (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}).
       
  1664 
       
  1665   A method is stored in a theory as indicated by:
       
  1666   \begin{verbatim}
       
  1667   Method.add_method : 
       
  1668   (bstring * (src -> Proof.context -> method) * string) -> theory trf ; 
       
  1669   ( *
       
  1670   * )
       
  1671   \end{verbatim}
       
  1672   where the first and third arguments are name and description of the method,
       
  1673   and the second is a function which parses the method input text 
       
  1674   (including the method name, which has necessarily already been parsed).
       
  1675 
       
  1676   Here, \texttt{xxx} is a function declared in the code for the
       
  1677   structure \texttt{Method}, but not published in its signature.
       
  1678   The source file \texttt{src/Pure/Isar/method.ML} shows the use of 
       
  1679   \texttt{Method.add\_method} to add a number of methods.
       
  1680 
       
  1681 
       
  1682 *}
       
  1683 (*>*)
       
  1684 end