CookBook/Parsing.thy
author Christian Urban <urbanc@in.tum.de>
Thu, 29 Jan 2009 17:10:13 +0000
changeset 92 4e3f262a459d
parent 86 fdb9ea86c2a3
child 101 123401a5c8e9
permissions -rw-r--r--
tuned

theory Parsing
imports Base 

begin


chapter {* Parsing *}

text {*

  Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
  Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
  on, belong to the outer syntax, whereas items inside double quotation marks, such 
  as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
  Isabelle uses a rather general and sophisticated algorithm due to Earley, which 
  is driven by priority grammars. Parsers for outer syntax are built up by functional
  parsing combinators. These combinators are a well-established technique for parsing, 
  which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
  Isabelle developers are usually concerned with writing these outer syntax parsers, 
  either for new definitional packages or for calling tactics with specific arguments. 

  \begin{readmore}
  The library 
  for writing parser combinators is split up, roughly, into two parts. 
  The first part consists of a collection of generic parser combinators defined
  in the structure @{ML_struct Scan} in the file 
  @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
  combinators for dealing with specific token types, which are defined in the 
  structure @{ML_struct OuterParse} in the file 
  @{ML_file "Pure/Isar/outer_parse.ML"}.
  \end{readmore}

*}

section {* Building Generic Parsers *}

text {*

  Let us first have a look at parsing strings using generic parsing combinators. 
  The function @{ML "(op $$)"} takes a string as argument and will ``consume'' this string from 
  a given input list of strings. ``Consume'' in this context means that it will 
  return a pair consisting of this string and the rest of the input list. 
  For example:

  @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

  @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}

  This function will either succeed (as in the two examples above) or raise the exception 
  @{text "FAIL"} if no string can be consumed. For example trying to parse

  @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
                               "Exception FAIL raised"}
  
  will raise the exception @{text "FAIL"}.
  There are three exceptions used in the parsing combinators:

  \begin{itemize}
  \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
  might be explored. 
  \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
  in @{text "($$ \"h\") []"}.
  \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
  It is used for example in the function @{ML "(op !!)"} (see below).
  \end{itemize}

  However, note that these exceptions are private to the parser and cannot be accessed
  by the programmer (for example to handle them).
  
  Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
  Scan.one}, in that it takes a predicate as argument and then parses exactly
  one item from the input list satisfying this predicate. For example the
  following parser either consumes an @{text [quotes] "h"} or a @{text
  [quotes] "w"}:


@{ML_response [display,gray] 
"let 
  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
    (hw input1, hw input2)
end"
    "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

  Two parser can be connected in sequence by using the function @{ML "(op --)"}. 
  For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
  sequence can be achieved by

  @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
                          "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

  Note how the result of consumed strings builds up on the left as nested pairs.  

  If, as in the previous example, one wants to parse a particular string, 
  then one should use the function @{ML Scan.this_string}:

  @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
                          "(\"hell\", [\"o\"])"}

  Parsers that explore alternatives can be constructed using the function @{ML
  "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
  result of @{text "p"}, in case it succeeds, otherwise it returns the
  result of @{text "q"}. For example


@{ML_response [display,gray] 
"let 
  val hw = ($$ \"h\") || ($$ \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (hw input1, hw input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

  The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function 
  for parsers, except that they discard the item being parsed by the first (respectively second)
  parser. For example
  
@{ML_response [display,gray]
"let 
  val just_e = ($$ \"h\") |-- ($$ \"e\") 
  val just_h = ($$ \"h\") --| ($$ \"e\") 
  val input = (explode \"hello\")  
in 
  (just_e input, just_h input)
end"
  "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}

  The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
  @{text "p"}, if it succeeds; otherwise it returns 
  the default value @{text "x"}. For example

@{ML_response [display,gray]
"let 
  val p = Scan.optional ($$ \"h\") \"x\"
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")  
in 
  (p input1, p input2)
end" 
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

  The function @{ML Scan.option} works similarly, except no default value can
  be given. Instead, the result is wrapped as an @{text "option"}-type. For example:

@{ML_response [display,gray]
"let 
  val p = Scan.option ($$ \"h\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")  
in 
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

  The function @{ML "(op !!)"} helps to produce appropriate error messages
  during parsing. For example if one wants to parse that @{text p} is immediately 
  followed by @{text q}, or start a completely different parser @{text r},
  one might write

  @{ML [display,gray] "(p -- q) || r" for p q r}

  However, this parser is problematic for producing an appropriate error message, in case
  the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses the information 
  that @{text p} should be followed by @{text q}. To see this consider the case in
  which @{text p} 
  is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail 
  and the 
  alternative parser @{text r} will be tried. However in many circumstance this will be the wrong
  parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then 
  caused by the
  failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation
  can be avoided when using the function @{ML "(op !!)"}. This function aborts the whole process of
  parsing in case of a failure and prints an error message. For example if we invoke the parser
  
  @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}

  on @{text [quotes] "hello"}, the parsing succeeds

  @{ML_response [display,gray] 
                "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
                "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

  but if we invoke it on @{text [quotes] "world"}
  
  @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
                               "Exception ABORT raised"}

  then the parsing aborts and the error message @{text "foo"} is printed out. In order to
  see the error message, we need to prefix the parser with the function 
  @{ML "Scan.error"}. For example

  @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
                               "Exception Error \"foo\" raised"}

  This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
  (FIXME: give reference to later place). 
  
  Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want
  to generate the correct error message for p-followed-by-q, then
  we have to write:
*}

ML{*fun p_followed_by_q p q r =
  let 
     val err_msg = (fn _ => p ^ " is not followed by " ^ q)
  in
    ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
  end *}


text {*
  Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and 
  the input @{text [quotes] "holle"} 

  @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
                               "Exception ERROR \"h is not followed by e\" raised"} 

  produces the correct error message. Running it with
 
  @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
                          "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
  
  yields the expected parsing. 

  The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
  often as it succeeds. For example
  
  @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
  
  Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
  @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
  succeeds at least once.

  Also note that the parser would have aborted with the exception @{text MORE}, if
  we had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
  the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
  them we can write
  
  @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [])"}

  @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
  other stoppers need to be used when parsing token, for example. However, this kind of 
  manually wrapping is often already done by the surrounding infrastructure. 

  The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
  string as in

  @{ML_response [display,gray] 
"let 
   val p = Scan.repeat (Scan.one Symbol.not_eof)
   val input = (explode \"foo bar foo\") 
in
   Scan.finite Symbol.stopper p input
end" 
"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}

  where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
  end of the input string (i.e.~stopper symbol).

  The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
  parse the input, then the whole parser fails; if not, then the second is tried. Therefore
  
  @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
                               "Exception FAIL raised"}

  fails, while

  @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
                          "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}

  succeeds. 

  The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
  input until a certain marker symbol is reached. In the example below the marker
  symbol is a @{text [quotes] "*"}.

  @{ML_response [display,gray]
"let 
  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
  val input1 = (explode \"fooooo\")
  val input2 = (explode \"foo*ooo\")
in
  (Scan.finite Symbol.stopper p input1, 
   Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}

  After parsing is done, one nearly always wants to apply a function on the parsed 
  items. To do this the function @{ML "(p >> f)" for p f} can be employed, which runs 
  first the parser @{text p} and upon successful completion applies the 
  function @{text f} to the result. For example

@{ML_response [display,gray]
"let 
  fun double (x,y) = (x^x,y^y) 
in
  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
end"
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}

  doubles the two parsed input strings. Or

  @{ML_response [display,gray] 
"let 
   val p = Scan.repeat (Scan.one Symbol.not_eof) >> implode
   val input = (explode \"foo bar foo\") 
in
   Scan.finite Symbol.stopper p input
end" 
"(\"foo bar foo\",[])"}

  where the single-character strings in the parsed output are transformed
  back into one string.
 
  \begin{exercise}\label{ex:scancmts}
  Write a parser that parses an input string so that any comment enclosed
  inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
  @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
  function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
  "s1^s^s2" for s1 s2 s}.
  \end{exercise}


  The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
  the given parser to the second component of the pair and leaves the  first component 
  untouched. For example

@{ML_response [display,gray]
"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}

  (FIXME: In which situations is this useful? Give examples.) 
*}

section {* Parsing Theory Syntax *}

text {*
  Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
  This is because the parsers for the theory syntax, as well as the parsers for the 
  argument syntax of proof methods and attributes use the type 
  @{ML_type OuterLex.token} (which is identical to the type @{ML_type OuterParse.token}).
  There are also parsers for ML-expressions and ML-files.

  \begin{readmore}
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
  \end{readmore}

  The structure @{ML_struct OuterLex} defines several kinds of tokens (for example 
  @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
  @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
  kind of tokens.
*}  

text {*
  For the examples below, we can generate a token list out of a string using
  the function @{ML "OuterSyntax.scan"}, which we give below @{ML
  "Position.none"} as argument since, at the moment, we are not interested in
  generating precise error messages. The following code


@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  produces three tokens where the first and the last are identifiers, since
  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
  other syntactic category.\footnote{Note that because of a possible a bug in
  the PolyML runtime system the result is printed as @{text "?"}, instead of
  the tokens.} The second indicates a space.

  Many parsing functions later on will require spaces, comments and the like
  to have already been filtered out.  So from now on we are going to use the 
  functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example


@{ML_response_fake [display,gray]
"let
   val input = OuterSyntax.scan Position.none \"hello world\"
in
   filter OuterLex.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

  For convenience we define the function

*}

ML{*fun filtered_input str = 
       filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}

text {*

  If we parse

@{ML_response_fake [display,gray] 
"filtered_input \"inductive | for\"" 
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
 Token (\<dots>,(Keyword, \"|\"),\<dots>), 
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

  we obtain a list consisting of only a command and two keyword tokens.
  If you want to see which keywords and commands are currently known, type in
  the following code (you might have to adjust the @{ML print_depth} in order to
  see the complete list):

@{ML_response_fake [display,gray] 
"let 
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in 
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end" 
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

  Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example

@{ML_response [display,gray]
"let 
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in 
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>),(\"|\",\<dots>))"}

  Like before, we can sequentially connect parsers with @{ML "(op --)"}. For example 

@{ML_response [display,gray]
"let 
  val input = filtered_input \"| in\"
in 
  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
end"
"((\"|\",\"in\"),[])"}

  The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
  list of items recognised by the parser @{text p}, where the items being parsed
  are separated by the string @{text s}. For example

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in foo\"
in 
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([\"in\",\"in\",\"in\"],[\<dots>])"}

  @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
  be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the end
  of the parsed string, otherwise the parser would have consumed all tokens
  and then failed with the exception @{text "MORE"}. Like in the previous
  section, we can avoid this exception using the wrapper @{ML
  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
  OuterLex.stopper}. We can write

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in\"
in 
  Scan.finite OuterLex.stopper 
         (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([\"in\",\"in\",\"in\"],[])"}

  The following function will help to run examples.

*}

ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}

text {*

  The function @{ML "OuterParse.!!!"} can be used to force termination of the
  parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
  except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
  with a relatively precise description of the failure. For example:

@{ML_response_fake [display,gray]
"let 
  val input = filtered_input \"in |\"
  val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
in 
  parse (OuterParse.!!! parse_bar_then_in) input 
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
but keyword in was found\" raised"
}

  \begin{exercise} (FIXME)
  A type-identifier, for example @{typ "'a"}, is a token of 
  kind @{ML "Keyword" in OuterLex}. It can be parsed using 
  the function @{ML OuterParse.type_ident}.
  \end{exercise}


*}





section {* Positional Information *}

text {*

  @{ML OuterParse.position}

*}

ML{*
  OuterParse.position
*}


section {* Parsing Inner Syntax *}

ML{*
let 
  val input = OuterSyntax.scan Position.none "0"
in 
  OuterParse.prop input
end 

*}

ML{*
  OuterParse.opt_target
*}

text {* (FIXME funny output for a proposition) *}

ML{*
  OuterParse.opt_target --
  OuterParse.fixes -- 
  OuterParse.for_fixes --
  Scan.optional (OuterParse.$$$ "where" |--
    OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []

*}

ML{* OuterSyntax.command *}

section {* New Commands and Keyword Files *}

text {*
  Often new commands, for example for providing new definitional principles,
  need to be implemented. While this is not difficult on the ML-level,
  new commands, in order to be useful, need to be recognised by
  ProofGeneral. This results in some subtle configuration issues, which we
  will explain in this section.

  To keep things simple, let us start with a ``silly'' command that does nothing 
  at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
  defined as
*}

ML{*let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command "foobar" "description of foobar" kind do_nothing
end *}

text {*
  The crucial function @{ML OuterSyntax.command} expects a name for the command, a
  short description, a kind indicator (which we will explain later on more thoroughly) and a
  parser producing a top-level transition function (its purpose will also explained
  later). 

  While this is everything we have to do on the ML-level, we need a keyword
  file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
  recognise \isacommand{foobar} as a command. Such a keyword file can be
  generated with the command-line:

  @{text [display] "$ isabelle keywords -k foobar some_log_files"}

  The option @{text "-k foobar"} indicates which postfix the name of the keyword file 
  will be assigned. In the case above the file will be named @{text
  "isar-keywords-foobar.el"}. This command requires log files to be
  present (in order to extract the keywords from them). To generate these log
  files, we first package the code above into a separate theory file named
  @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
  complete code.


  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \begin{figure}[t]
  \begin{graybox}\small
  \isacommand{theory}~@{text Command}\\
  \isacommand{imports}~@{text Main}\\
  \isacommand{begin}\\
  \isacommand{ML}~@{text "\<verbopen>"}\\
  @{ML
"let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing
end"}\\
  @{text "\<verbclose>"}\\
  \isacommand{end}
  \end{graybox}
  \caption{\small The file @{text "Command.thy"} is necessary for generating a log 
  file. This log file enables Isabelle to generate a keyword file containing 
  the command \isacommand{foobar}.\label{fig:commandtheory}}
  \end{figure}
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  For our purposes it is sufficient to use the log files of the theories
  @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
  the log file for the theory @{text "Command.thy"}, which contains the new
  \isacommand{foobar}-command. If you target other logics besides HOL, such
  as Nominal or ZF, then you need to adapt the log files appropriately.
  @{text Pure} and @{text HOL} are usually compiled during the installation of
  Isabelle. So log files for them should be already available. If not, then
  they can be conveniently compiled with the help of the build-script from the Isabelle
  distribution


  @{text [display] 
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}
  
  The @{text "Pure-ProofGeneral"} theory needs to be compiled with

  @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

  For the theory @{text "Command.thy"}, we first create a ``managed'' subdirectory 
  with

  @{text [display] "$ isabelle mkdir FoobarCommand"}

  This generates a directory containing the files 

  @{text [display] 
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
./FoobarCommand/document/root.tex"}


  We need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
  and add the line 

  @{text [display] "use_thy \"Command\";"} 
  
  to the file @{text "./FoobarCommand/ROOT.ML"}. We can now compile the theory by just typing

  @{text [display] "$ isabelle make"}

  We created finally all the necessary log files. They are stored 
  in the directory 
  
  @{text [display]  "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"}

  or something similar depending on your Isabelle distribution and architecture.
  One quick way to assign a shell variable to this directory is by typing

  @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
 
  on the Unix prompt. The directory should include the files

  @{text [display] 
"Pure.gz
HOL.gz
Pure-ProofGeneral.gz
HOL-FoobarCommand.gz"} 

  From them we create the keyword files. Assuming the name 
  of the directory is in  @{text "$ISABELLE_LOGS"},
  then the Unix command for creating the keyword file is:

@{text [display]
"$ isabelle keywords -k foobar 
   $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}

  The result is the file @{text "isar-keywords-foobar.el"}. It should contain
  the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
  that @{text "grep foobar"} on this file returns something
  non-empty.}  This keyword file needs to
  be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
  aware of this keyword file, we have to start Isabelle with the option @{text
  "-k foobar"}, i.e.


  @{text [display] "$ isabelle -k foobar a_theory_file"}

  If we now build a theory on top of @{text "Command.thy"}, 
  then we can make use of the command \isacommand{foobar}. 
  Similarly with any other new command. 


  At the moment \isacommand{foobar} is not very useful. Let us refine it a bit 
  next by taking a proposition as argument and printing this proposition inside
  the tracing buffer. 

  The crucial part of a command is the function that determines the behaviour
  of the command. In the code above we used a ``do-nothing''-function, which
  because of @{ML Scan.succeed} does not parse any argument, but immediately
  returns the simple toplevel function @{ML "Toplevel.theory I"}. We can
  replace this code by a function that first parses a proposition (using the
  parser @{ML OuterParse.prop}), then prints out the tracing
  information (using a new top-level function @{text trace_top_lvl}) and 
  finally does nothing. For this we can write

*}

ML{*let
  fun trace_top_lvl str = 
     Toplevel.theory (fn thy => (tracing str; thy))

  val trace_prop = OuterParse.prop >> trace_top_lvl

  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command "foobar" "traces a proposition" kind trace_prop
end *}

text {*
  Now we can type

  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> False"}\\
  @{text "> \"True \<and> False\""}
  \end{isabelle}
  
  and see the proposition in the tracing buffer.  

  Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator
  for the command.  This means that the command finishes as soon as the
  arguments are processed. Examples of this kind of commands are
  \isacommand{definition} and \isacommand{declare}.  In other cases,
  commands are expected to parse some arguments, for example a proposition,
  and then ``open up'' a proof in order to prove the proposition (for example
  \isacommand{lemma}) or prove some other properties (for example
  \isacommand{function}). To achieve this kind of behaviour, we have to use the kind
  indicator @{ML thy_goal in OuterKeyword}.

  Below we change \isacommand{foobar} so that it takes a proposition as
  argument and then starts a proof in order to prove it. Therefore in Line 13
  below, we set the kind indicator to @{ML thy_goal in OuterKeyword}.
*}

ML%linenumbers{*let
  fun set_up_thm str ctxt =
    let
      val prop = Syntax.read_prop ctxt str
    in
      Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt
    end;
  
  val prove_prop = OuterParse.prop >>  
      (fn str => Toplevel.print o 
                    Toplevel.local_theory_to_proof NONE (set_up_thm str))
  
  val kind = OuterKeyword.thy_goal
in
  OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
end *}

text {*
  The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
  proved) and a context.  The context is necessary in order to be able to use
  @{ML Syntax.read_prop}, which converts a string into a proper proposition.
  In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
  proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  omit); the argument @{ML "(K I)"} stands for a function that determines what
  should be done with the theorem once it is proved (we chose to just forget
  about it). In Lines 9 to 11 contain the parser for the proposition.

  If we now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, we obtain the following 
  proof state:
 
  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
  @{text "goal (1 subgoal)"}\\
  @{text "1. True \<and> True"}
  \end{isabelle}

  and we can build the proof

  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
  \isacommand{apply}@{text "(rule conjI)"}\\
  \isacommand{apply}@{text "(rule TrueI)+"}\\
  \isacommand{done}
  \end{isabelle}

  Similarly for the other function composition combinators.

  
  (FIXME What does @{text "Toplevel.theory"} @{text "Toplevel.print"}?)

  (FIXME read a name and show how to store theorems)

  (FIXME possibly also read a locale)
*}

(*<*)

chapter {* Parsing *}

text {*

  Lots of Standard ML code is given in this document, for various reasons,
  including:
  \begin{itemize}
  \item direct quotation of code found in the Isabelle source files,
  or simplified versions of such code
  \item identifiers found in the Isabelle source code, with their types 
  (or specialisations of their types)
  \item code examples, which can be run by the reader, to help illustrate the
  behaviour of functions found in the Isabelle source code
  \item ancillary functions, not from the Isabelle source code, 
  which enable the reader to run relevant code examples
  \item type abbreviations, which help explain the uses of certain functions
  \end{itemize}

*}

section {* Parsing Isar input *}

text {*

  The typical parsing function has the type
  \texttt{'src -> 'res * 'src}, with input  
  of type \texttt{'src}, returning a result 
  of type \texttt{'res}, which is (or is derived from) the first part of the
  input, and also returning the remainder of the input.
  (In the common case, when it is clear what the ``remainder of the input''
  means, we will just say that the functions ``returns'' the
  value of type \texttt{'res}). 
  An exception is raised if an appropriate value 
  cannot be produced from the input.
  A range of exceptions can be used to identify different reasons 
  for the failure of a parse.
  
  This contrasts the standard parsing function in Standard ML,
  which is of type 
  \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option};
  (for example, \texttt{List.getItem} and \texttt{Substring.getc}).
  However, much of the discussion at 
  FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html
  is relevant.

  Naturally one may convert between the two different sorts of parsing functions
  as follows:
  \begin{verbatim}
  open StringCvt ;
  type ('res, 'src) ex_reader = 'src -> 'res * 'src
  ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader 
  fun ex_reader rdr src = Option.valOf (rdr src) ;
  reader : ('res, 'src) ex_reader -> ('res, 'src) reader 
  fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;
  \end{verbatim}
  
*}

section{* The \texttt{Scan} structure *}

text {* 
  The source file is \texttt{src/General/scan.ML}.
  This structure provides functions for using and combining parsing functions
  of the type \texttt{'src -> 'res * 'src}.
  Three exceptions are used:
  \begin{verbatim}
  exception MORE of string option;  (*need more input (prompt)*)
  exception FAIL of string option;  (*try alternatives (reason of failure)*)
  exception ABORT of string;        (*dead end*)
  \end{verbatim}
  Many functions in this structure (generally those with names composed of
  symbols) are declared as infix.

  Some functions from that structure are
  \begin{verbatim}
  |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
  'src -> 'res2 * 'src''
  --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
  'src -> 'res1 * 'src''
  -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
  'src -> ('res1 * 'res2) * 'src''
  ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->
  'src -> string * 'src''
  \end{verbatim}
  These functions parse a result off the input source twice.

  \texttt{|--} and \texttt{--|} 
  return the first result and the second result, respectively.

  \texttt{--} returns both.

  \verb|^^| returns the result of concatenating the two results
  (which must be strings).

  Note how, although the types 
  \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,
  the types as shown help suggest the behaviour of the functions.
  \begin{verbatim}
  :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
  'src -> ('res1 * 'res2) * 'src''
  :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
  'src -> 'res2 * 'src''
  \end{verbatim}
  These are similar to \texttt{|--} and \texttt{--|},
  except that the second parsing function can depend on the result of the first.
  \begin{verbatim}
  >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'
  || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src
  \end{verbatim}
  \texttt{p >> f} applies a function \texttt{f} to the result of a parse.
  
  \texttt{||} tries a second parsing function if the first one
  fails by raising an exception of the form \texttt{FAIL \_}.
  
  \begin{verbatim}
  succeed : 'res -> ('src -> 'res * 'src) ;
  fail : ('src -> 'res_src) ;
  !! : ('src * string option -> string) -> 
  ('src -> 'res_src) -> ('src -> 'res_src) ;
  \end{verbatim}
  \texttt{succeed r} returns \texttt{r}, with the input unchanged.
  \texttt{fail} always fails, raising exception \texttt{FAIL NONE}.
  \texttt{!! f} only affects the failure mode, turning a failure that
  raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
  This is used to prevent recovery from the failure ---
  thus, in \texttt{!! parse1 || parse2}, if \texttt{parse1} fails, 
  it won't recover by trying \texttt{parse2}.

  \begin{verbatim}
  one : ('si -> bool) -> ('si list -> 'si * 'si list) ;
  some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;
  \end{verbatim}
  These require the input to be a list of items:
  they fail, raising \texttt{MORE NONE} if the list is empty.
  On other failures they raise \texttt{FAIL NONE} 

  \texttt{one p} takes the first
  item from the list if it satisfies \texttt{p}, otherwise fails.

  \texttt{some f} takes the first
  item from the list and applies \texttt{f} to it, failing if this returns
  \texttt{NONE}.  

  \begin{verbatim}
  many : ('si -> bool) -> 'si list -> 'si list * 'si list ; 
  \end{verbatim}
  \texttt{many p} takes items from the input until it encounters one 
  which does not satisfy \texttt{p}.  If it reaches the end of the input
  it fails, raising \texttt{MORE NONE}.

  \texttt{many1} (with the same type) fails if the first item 
  does not satisfy \texttt{p}.  

  \begin{verbatim}
  option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)
  optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)
  \end{verbatim}
  \texttt{option}: 
  where the parser \texttt{f} succeeds with result \texttt{r} 
  or raises \texttt{FAIL \_},
  \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}.
  
  \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},
  \texttt{optional f default} provides the result \texttt{default}.

  \begin{verbatim}
  repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
  repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
  bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src 
  \end{verbatim}
  \texttt{repeat f} repeatedly parses an item off the remaining input until 
  \texttt{f} fails with \texttt{FAIL \_}

  \texttt{repeat1} is as for \texttt{repeat}, but requires at least one
  successful parse.

  \begin{verbatim}
  lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))
  \end{verbatim}
  \texttt{lift} changes the source type of a parser by putting in an extra
  component \texttt{'ex}, which is ignored in the parsing.

  The \texttt{Scan} structure also provides the type \texttt{lexicon}, 
  HOW DO THEY WORK ?? TO BE COMPLETED
  \begin{verbatim}
  dest_lexicon: lexicon -> string list ;
  make_lexicon: string list list -> lexicon ;
  empty_lexicon: lexicon ;
  extend_lexicon: string list list -> lexicon -> lexicon ;
  merge_lexicons: lexicon -> lexicon -> lexicon ;
  is_literal: lexicon -> string list -> bool ;
  literal: lexicon -> string list -> string list * string list ;
  \end{verbatim}
  Two lexicons, for the commands and keywords, are stored and can be retrieved
  by:
  \begin{verbatim}
  val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;
  val commands = Scan.dest_lexicon command_lexicon ;
  val keywords = Scan.dest_lexicon keyword_lexicon ;
  \end{verbatim}
*}

section{* The \texttt{OuterLex} structure *}

text {*
  The source file is @{text "src/Pure/Isar/outer_lex.ML"}.
  In some other source files its name is abbreviated:
  \begin{verbatim}
  structure T = OuterLex;
  \end{verbatim}
  This structure defines the type \texttt{token}.
  (The types
  \texttt{OuterLex.token},
  \texttt{OuterParse.token} and
  \texttt{SpecParse.token} are all the same).
  
  Input text is split up into tokens, and the input source type for many parsing
  functions is \texttt{token list}.

  The datatype definition (which is not published in the signature) is
  \begin{verbatim}
  datatype token = Token of Position.T * (token_kind * string);
  \end{verbatim}
  but here are some runnable examples for viewing tokens: 

*}




ML{*
  val toks = OuterSyntax.scan Position.none
   "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
*}

ML{*
  print_depth 20 ;
*}

ML{*
  map OuterLex.text_of toks ;
*}

ML{*
  val proper_toks = filter OuterLex.is_proper toks ;
*}  

ML{*
  map OuterLex.kind_of proper_toks 
*}

ML{*
  map OuterLex.unparse proper_toks ;
*}

ML{*
  OuterLex.stopper
*}

text {*

  The function \texttt{is\_proper : token -> bool} identifies tokens which are
  not white space or comments: many parsing functions assume require spaces or
  comments to have been filtered out.
  
  There is a special end-of-file token:
  \begin{verbatim}
  val (tok_eof : token, is_eof : token -> bool) = T.stopper ; 
  (* end of file token *)
  \end{verbatim}

*}

section {* The \texttt{OuterParse} structure *}

text {*
  The source file is \texttt{src/Pure/Isar/outer\_parse.ML}.
  In some other source files its name is abbreviated:
  \begin{verbatim}
  structure P = OuterParse;
  \end{verbatim}
  Here the parsers use \texttt{token list} as the input source type. 
  
  Some of the parsers simply select the first token, provided that it is of the
  right kind (as returned by \texttt{T.kind\_of}): these are 
  \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var,
  type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}
  Others select the first token, provided that it is one of several kinds,
  (eg, \texttt{name, xname, text, typ}).

  \begin{verbatim}
  type 'a tlp = token list -> 'a * token list ; (* token list parser *)
  $$$ : string -> string tlp
  nat : int tlp ;
  maybe : 'a tlp -> 'a option tlp ;
  \end{verbatim}

  \texttt{\$\$\$ s} returns the first token,
  if it equals \texttt{s} \emph{and} \texttt{s} is a keyword.

  \texttt{nat} returns the first token, if it is a number, and evaluates it.

  \texttt{maybe}: if \texttt{p} returns \texttt{r}, 
  then \texttt{maybe p} returns \texttt{SOME r} ;
  if the first token is an underscore, it returns \texttt{NONE}.

  A few examples:
  \begin{verbatim}
  P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)
  P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)
  val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ;
  val proper_toks = List.filter T.is_proper toks ;
  P.list P.nat toks ; (* OK, doesn't recognize white space *)
  P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)
  P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)
  P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)
  val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ;
  P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)
  \end{verbatim}

  The following code helps run examples:
  \begin{verbatim}
  fun parse_str tlp str = 
  let val toks : token list = OuterSyntax.scan str ;
  val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;
  val (res, rem_toks) = tlp proper_toks ;
  val rem_str = String.concat
  (Library.separate " " (List.map T.unparse rem_toks)) ;
  in (res, rem_str) end ;
  \end{verbatim}

  Some examples from \texttt{src/Pure/Isar/outer\_parse.ML}
  \begin{verbatim}
  val type_args =
  type_ident >> Library.single ||
  $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") ||
  Scan.succeed [];
  \end{verbatim}
  There are three ways parsing a list of type arguments can succeed.
  The first line reads a single type argument, and turns it into a singleton
  list.
  The second line reads "(", and then the remainder, ignoring the "(" ;
  the remainder consists of a list of type identifiers (at least one),
  and then a ")" which is also ignored.
  The \texttt{!!!} ensures that if the parsing proceeds this far and then fails,
  it won't try the third line (see the description of \texttt{Scan.!!}).
  The third line consumes no input and returns the empty list.

  \begin{verbatim}
  fun triple2 (x, (y, z)) = (x, y, z);
  val arity = xname -- ($$$ "::" |-- !!! (
  Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) []
  -- sort)) >> triple2;
  \end{verbatim}
  The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is
  ignored), then optionally a list $ss$ of sorts and then another sort $s$.
  The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$.
  The second line reads the optional list of sorts:
  it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored,
  and between them a comma-separated list of sorts.
  If this list is absent, the default \texttt{[]} provides the list of sorts.

  \begin{verbatim}
  parse_str P.type_args "('a, 'b) ntyp" ;
  parse_str P.type_args "'a ntyp" ;
  parse_str P.type_args "ntyp" ;
  parse_str P.arity "ty :: tycl" ;
  parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;
  \end{verbatim}

*}

section {* The \texttt{SpecParse} structure *}

text {*
  The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.
  This structure contains token list parsers for more complicated values.
  For example, 
  \begin{verbatim}
  open SpecParse ;
  attrib : Attrib.src tok_rdr ; 
  attribs : Attrib.src list tok_rdr ;
  opt_attribs : Attrib.src list tok_rdr ;
  xthm : (thmref * Attrib.src list) tok_rdr ;
  xthms1 : (thmref * Attrib.src list) list tok_rdr ;
  
  parse_str attrib "simp" ;
  parse_str opt_attribs "hello" ;
  val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;
  map Args.dest_src ass ;
  val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;
  
  parse_str xthm "mythm [attr]" ;
  parse_str xthms1 "thm1 [attr] thms2" ;
  \end{verbatim}
  
  As you can see, attributes are described using types of the \texttt{Args}
  structure, described below.
*}

section{* The \texttt{Args} structure *}

text {*
  The source file is \texttt{src/Pure/Isar/args.ML}.
  The primary type of this structure is the \texttt{src} datatype;
  the single constructors not published in the signature, but 
  \texttt{Args.src} and \texttt{Args.dest\_src}
  are in fact the constructor and destructor functions.
  Note that the types \texttt{Attrib.src} and \texttt{Method.src}
  are in fact \texttt{Args.src}.

  \begin{verbatim}
  src : (string * Args.T list) * Position.T -> Args.src ;
  dest_src : Args.src -> (string * Args.T list) * Position.T ;
  Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;
  fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;

  val thy = ML_Context.the_context () ;
  val ctxt = ProofContext.init thy ;
  map (pr_src ctxt) ass ;
  \end{verbatim}

  So an \texttt{Args.src} consists of the first word, then a list of further 
  ``arguments'', of type \texttt{Args.T}, with information about position in the
  input.
  \begin{verbatim}
  (* how an Args.src is parsed *)
  P.position : 'a tlp -> ('a * Position.T) tlp ;
  P.arguments : Args.T list tlp ;

  val parse_src : Args.src tlp =
  P.position (P.xname -- P.arguments) >> Args.src ;
  \end{verbatim}

  \begin{verbatim}
  val ((first_word, args), pos) = Args.dest_src asrc ;
  map Args.string_of args ;
  \end{verbatim}

  The \texttt{Args} structure contains more parsers and parser transformers 
  for which the input source type is \texttt{Args.T list}.  For example,
  \begin{verbatim}
  type 'a atlp = Args.T list -> 'a * Args.T list ;
  open Args ;
  nat : int atlp ; (* also Args.int *)
  thm_sel : PureThy.interval list atlp ;
  list : 'a atlp -> 'a list atlp ;
  attribs : (string -> string) -> Args.src list atlp ;
  opt_attribs : (string -> string) -> Args.src list atlp ;
  
  (* parse_atl_str : 'a atlp -> (string -> 'a * string) ;
  given an Args.T list parser, to get a string parser *)
  fun parse_atl_str atlp str = 
  let val (ats, rem_str) = parse_str P.arguments str ;
  val (res, rem_ats) = atlp ats ;
  in (res, String.concat (Library.separate " "
  (List.map Args.string_of rem_ats @ [rem_str]))) end ;

  parse_atl_str Args.int "-1-," ;
  parse_atl_str (Scan.option Args.int) "x1-," ;
  parse_atl_str Args.thm_sel "(1-,4,13-22)" ;

  val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)
  "[THEN trans [THEN sym], simp, OF sym]" ;
  \end{verbatim}

  From here, an attribute is interpreted using \texttt{Attrib.attribute}.

  \texttt{Args} has a large number of functions which parse an \texttt{Args.src}
  and also refer to a generic context.  
  Note the use of \texttt{Scan.lift} for this.
  (as does \texttt{Attrib} - RETHINK THIS)
  
  (\texttt{Args.syntax} shown below has type specialised)

  \begin{verbatim}
  type ('res, 'src) parse_fn = 'src -> 'res * 'src ;
  type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ;
  Scan.lift : 'a atlp -> 'a cgatlp ;
  term : term cgatlp ;
  typ : typ cgatlp ;
  
  Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ;
  Attrib.thm : thm cgatlp ;
  Attrib.thms : thm list cgatlp ;
  Attrib.multi_thm : thm list cgatlp ;
  
  (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;
  given a (Context.generic * Args.T list) parser, to get a string parser *)
  fun parse_cgatl_str cgatlp str = 
  let 
    (* use the current generic context *)
    val generic = Context.Theory thy ;
    val (ats, rem_str) = parse_str P.arguments str ;
    (* ignore any change to the generic context *)
    val (res, (_, rem_ats)) = cgatlp (generic, ats) ;
  in (res, String.concat (Library.separate " "
      (List.map Args.string_of rem_ats @ [rem_str]))) end ;
  \end{verbatim}
*}

section{* Attributes, and the \texttt{Attrib} structure *}

text {*
  The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}.
  The source file for the \texttt{Attrib} structure is
  \texttt{src/Pure/Isar/attrib.ML}.
  Most attributes use a theorem to change a generic context (for example, 
  by declaring that the theorem should be used, by default, in simplification),
  or change a theorem (which most often involves referring to the current
  theory). 
  The functions \texttt{Thm.rule\_attribute} and
  \texttt{Thm.declaration\_attribute} create attributes of these kinds.

  \begin{verbatim}
  type attribute = Context.generic * thm -> Context.generic * thm;
  type 'a trf = 'a -> 'a ; (* transformer of a given type *)
  Thm.rule_attribute  : (Context.generic -> thm -> thm) -> attribute ;
  Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;

  Attrib.print_attributes : theory -> unit ;
  Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;

  List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;
  \end{verbatim}

  An attribute is stored in a theory as indicated by:
  \begin{verbatim}
  Attrib.add_attributes : 
  (bstring * (src -> attribute) * string) list -> theory trf ; 
  (*
  Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ;
  *)
  \end{verbatim}
  where the first and third arguments are name and description of the attribute,
  and the second is a function which parses the attribute input text 
  (including the attribute name, which has necessarily already been parsed).
  Here, \texttt{THEN\_att} is a function declared in the code for the
  structure \texttt{Attrib}, but not published in its signature.
  The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of 
  \texttt{Attrib.add\_attributes} to add a number of attributes.

  \begin{verbatim}
  FullAttrib.THEN_att : src -> attribute ;
  FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;
  FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;
  \end{verbatim}

  \begin{verbatim}
  Attrib.syntax : attribute cgatlp -> src -> attribute ;
  Attrib.no_args : attribute -> src -> attribute ;
  \end{verbatim}
  When this is called as \texttt{syntax scan src (gc, th)}
  the generic context \texttt{gc} is used 
  (and potentially changed to \texttt{gc'})
  by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would
  then be applied to \texttt{(gc', th)}.
  The source for parsing the attribute is the arguments part of \texttt{src},
  which must all be consumed by the parse.

  For example, for \texttt{Attrib.no\_args attr src}, the attribute parser 
  simply returns \texttt{attr}, requiring that the arguments part of
  \texttt{src} must be empty.

  Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified:
  \begin{verbatim}
  fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;
  rot_att_n : int -> attribute ;
  val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;
  val rotated_att : src -> attribute =
  Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;
  
  val THEN_arg : int cgatlp = Scan.lift 
  (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;

  Attrib.thm : thm cgatlp ;

  THEN_arg -- Attrib.thm : (int * thm) cgatlp ;

  fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;
  THEN_att_n : int * thm -> attribute ;

  val THEN_att : src -> attribute = Attrib.syntax
  (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);
  \end{verbatim}
  The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg}
  read an optional argument, which for \texttt{rotated} is an integer, 
  and for \texttt{THEN} is a natural enclosed in square brackets;
  the default, if the argument is absent, is 1 in each case.
  Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into
  attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is
  parsed by \texttt{Attrib.thm}.  
  Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}.

*}

section{* Methods, and the \texttt{Method} structure *}

text {*
  The source file is \texttt{src/Pure/Isar/method.ML}.
  The type \texttt{method} is defined by the datatype declaration
  \begin{verbatim}
  (* datatype method = Meth of thm list -> cases_tactic; *)
  RuleCases.NO_CASES : tactic -> cases_tactic ;
  \end{verbatim}
  In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
  \texttt{Meth}.
  A \texttt{cases\_tactic} is an elaborated version of a tactic.
  \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of a
  \texttt{cases\_tactic} without any further case information.
  For further details see the description of structure \texttt{RuleCases} below.
  The list of theorems to be passed to a method consists of the current
  \emph{facts} in the proof.
  
  \begin{verbatim}
  RAW_METHOD : (thm list -> tactic) -> method ;
  METHOD : (thm list -> tactic) -> method ;
  
  SIMPLE_METHOD : tactic -> method ;
  SIMPLE_METHOD' : (int -> tactic) -> method ;
  SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;

  RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;
  METHOD_CASES : (thm list -> cases_tactic) -> method ;
  \end{verbatim}
  A method is, in its simplest form, a tactic; applying the method is to apply
  the tactic to the current goal state.

  Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying 
  \texttt{tacf} to the current {facts}, and applying that tactic to the
  goal state.

  \texttt{METHOD} is similar but also first applies
  \texttt{Goal.conjunction\_tac} to all subgoals.

  \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
  applies \texttt{tacf}.

  \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then
  applies \texttt{tacf} to subgoal 1.

  \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by
  \texttt{quant}, which may be, for example,
  \texttt{ALLGOALS} (all subgoals),
  \texttt{TRYALL} (try all subgoals, failure is OK),
  \texttt{FIRSTGOAL} (try subgoals until it succeeds once), 
  \texttt{(fn tacf => tacf 4)} (subgoal 4), etc
  (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}).

  A method is stored in a theory as indicated by:
  \begin{verbatim}
  Method.add_method : 
  (bstring * (src -> Proof.context -> method) * string) -> theory trf ; 
  ( *
  * )
  \end{verbatim}
  where the first and third arguments are name and description of the method,
  and the second is a function which parses the method input text 
  (including the method name, which has necessarily already been parsed).

  Here, \texttt{xxx} is a function declared in the code for the
  structure \texttt{Method}, but not published in its signature.
  The source file \texttt{src/Pure/Isar/method.ML} shows the use of 
  \texttt{Method.add\_method} to add a number of methods.


*}
(*>*)
end