ProgTutorial/Parsing.thy
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Sun, 06 Apr 2014 12:45:54 +0100
changeset 555 2c34c69236ce
parent 553 c53d74b34123
child 556 3c214b215f7e
permissions -rw-r--r--
updated to new isabelle

theory Parsing
imports Base "Helper/Command/Command" "Package/Simple_Inductive_Package"
begin

chapter {* Parsing\label{chp:parsing} *}


text {*
  \begin{flushright}
  {\em An important principle underlying the success and popularity of Unix\\ is
  the philosophy of building on the work of others.} \\[1ex]
  Tony Travis in an email about the\\ ``LINUX is obsolete'' debate 
  \end{flushright}

  \begin{flushright}
  {\em
  Document your code as if someone else might have to take it over at any 
  moment and know what to do with it. That person might actually be you -- 
  how often have you had to revisit your own code and thought to yourself, 
  what was I trying to do here?} \\[1ex]
  Phil Chu in ``Seven Habits of Highly Effective Programmers'' 
  \end{flushright}

  Isabelle distinguishes between \emph{outer} and \emph{inner}
  syntax. Commands, such as \isacommand{definition}, \isacommand{inductive}
  and so on, belong to the outer syntax, whereas terms, types and so on belong
  to the inner syntax. For parsing inner syntax, Isabelle uses a rather
  general and sophisticated algorithm, which is driven by priority
  grammars. Parsers for outer syntax are built up by functional parsing
  combinators. These combinators are a well-established technique for parsing,
  which has, for example, been described in Paulson's classic ML-book
  \cite{paulson-ml2}.  Isabelle developers are usually concerned with writing
  these outer syntax parsers, either for new definitional packages or for
  calling methods with specific arguments.

  \begin{readmore}
  The library for writing parser combinators is split up, roughly, into two
  parts: The first part consists of a collection of generic parser combinators
  defined in the structure @{ML_struct Scan} in the file @{ML_file
  "Pure/General/scan.ML"}. The second part of the library consists of
  combinators for dealing with specific token types, which are defined in the
  structure @{ML_struct Parse} in the file @{ML_file
  "Pure/Isar/parse.ML"}. In addition specific parsers for packages are 
  defined in @{ML_file "Pure/Isar/parse_spec.ML"}. Parsers for method arguments 
  are defined in @{ML_file "Pure/Isar/args.ML"}.
  \end{readmore}

*}

section {* Building Generic Parsers *}

text {*

  Let us first have a look at parsing strings using generic parsing
  combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will
  ``consume'' this string from a given input list of strings. ``Consume'' in
  this context means that it will return a pair consisting of this string and
  the rest of the input list. For example:

  @{ML_response [display,gray] 
  "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

  @{ML_response [display,gray] 
  "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}

  The function @{ML "$$"} will either succeed (as in the two examples above)
  or raise the exception @{text "FAIL"} if no string can be consumed. For
  example trying to parse

  @{ML_response_fake [display,gray] 
  "($$ \"x\") (Symbol.explode \"world\")" 
  "Exception FAIL raised"}
  
  will raise the exception @{text "FAIL"}.  There are three exceptions used in
  the parsing combinators:

  \begin{itemize}
  \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
  might be explored. 
  \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
  in @{text "($$ \"h\") []"}.
  \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
  It is used for example in the function @{ML "!!"} (see below).
  \end{itemize}

  However, note that these exceptions are private to the parser and cannot be accessed
  by the programmer (for example to handle them).

  In the examples above we use the function @{ML_ind explode in Symbol} from the
  structure @{ML_struct Symbol}, instead of the more standard library function
  @{ML_ind explode in String}, for obtaining an input list for the parser. The reason is
  that @{ML explode in Symbol} is aware of character
  sequences, for example @{text "\<foo>"}, that have a special meaning in
  Isabelle. To see the difference consider

@{ML_response_fake [display,gray]
"let 
  val input = \"\<foo> bar\"
in
  (String.explode input, Symbol.explode input)
end"
"([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}

  Slightly more general than the parser @{ML "$$"} is the function 
  @{ML_ind one in Scan}, in that it takes a predicate as argument and 
  then parses exactly
  one item from the input list satisfying this predicate. For example the
  following parser either consumes an @{text [quotes] "h"} or a @{text
  [quotes] "w"}:

@{ML_response [display,gray] 
"let 
  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"
in
  (hw input1, hw input2)
end"
    "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

  Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. 
  For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
  order) you can achieve by:

  @{ML_response [display,gray] 
  "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
  "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

  Note how the result of consumed strings builds up on the left as nested pairs.  

  If, as in the previous example, you want to parse a particular string, 
  then you can use the function @{ML_ind this_string in Scan}.

  @{ML_response [display,gray] 
  "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
  "(\"hell\", [\"o\"])"}

  Parsers that explore alternatives can be constructed using the function 
  @{ML_ind  "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the
  result of @{text "p"}, in case it succeeds, otherwise it returns the
  result of @{text "q"}. For example:


@{ML_response [display,gray] 
"let 
  val hw = $$ \"h\" || $$ \"w\"
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"
in
  (hw input1, hw input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

  The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing
  function for parsers, except that they discard the item being parsed by the
  first (respectively second) parser. That means the item being dropped is the 
  one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away.
  For example:
  
@{ML_response [display,gray]
"let 
  val just_e = $$ \"h\" |-- $$ \"e\" 
  val just_h = $$ \"h\" --| $$ \"e\" 
  val input = Symbol.explode \"hello\"  
in 
  (just_e input, just_h input)
end"
  "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}

  The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
  @{text "p"}, if it succeeds; otherwise it returns 
  the default value @{text "x"}. For example:

@{ML_response [display,gray]
"let 
  val p = Scan.optional ($$ \"h\") \"x\"
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"  
in 
  (p input1, p input2)
end" 
 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

  The function @{ML_ind option in Scan} works similarly, except no default value can
  be given. Instead, the result is wrapped as an @{text "option"}-type. For example:

@{ML_response [display,gray]
"let 
  val p = Scan.option ($$ \"h\")
  val input1 = Symbol.explode \"hello\"
  val input2 = Symbol.explode \"world\"  
in 
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

  The function @{ML_ind ahead in Scan} parses some input, but leaves the original
  input unchanged. For example:

  @{ML_response [display,gray]
  "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
  "(\"foo\", [\"f\", \"o\", \"o\"])"} 

  The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages
  during parsing. For example if you want to parse @{text p} immediately 
  followed by @{text q}, or start a completely different parser @{text r},
  you might write:

  @{ML [display,gray] "(p -- q) || r" for p q r}

  However, this parser is problematic for producing a useful error
  message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the
  parser above you lose the information that @{text p} should be followed by @{text q}.
  To see this assume that @{text p} is present in the input, but it is not
  followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and
  hence the alternative parser @{text r} will be tried. However, in many
  circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something''
  and therefore will also fail. The error message is then caused by the failure
  of @{text r}, not by the absence of @{text q} in the input. This kind of
  situation can be avoided when using the function @{ML "!!"}.  This function
  aborts the whole process of parsing in case of a failure and prints an error
  message. For example if you invoke the parser


  @{ML [display,gray] "!! (fn _ => fn _ =>\"foo\") ($$ \"h\")"}
*}
text {*
  on @{text [quotes] "hello"}, the parsing succeeds

  @{ML_response [display,gray] 
  "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" 
  "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

  but if you invoke it on @{text [quotes] "world"}

  @{ML_response_fake [display,gray] "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
                               "Exception ABORT raised"}

  then the parsing aborts and the error message @{text "foo"} is printed. In order to
  see the error message properly, you need to prefix the parser with the function 
  @{ML_ind error in Scan}. For example:

  @{ML_response_fake [display,gray] 
  "Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\"))"
  "Exception Error \"foo\" raised"}

  This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in Outer_Syntax} 
  (see Section~\ref{sec:newcommand} which explains this function in more detail). 
  
  Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
  r}. If you want to generate the correct error message for failure
  of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
*}

ML %grayML{*fun p_followed_by_q p q r =
let 
  val err_msg = fn _ => p ^ " is not followed by " ^ q
in
  ($$ p -- (!! (fn _ => err_msg) ($$ q))) || ($$ r -- $$ r)
end *}


text {*
  Running this parser with the arguments
  @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and 
  the input @{text [quotes] "holle"} 

  @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")"
                               "Exception ERROR \"h is not followed by e\" raised"} 

  produces the correct error message. Running it with
 
  @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")"
                          "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
  
  yields the expected parsing. 

  The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
  often as it succeeds. For example:
  
  @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
  
  Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function
  @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
  succeeds at least once.

  Also note that the parser would have aborted with the exception @{text MORE}, if
  you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
  the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' 
  @{ML_ind stopper in Symbol}. With them you can write:
  
  @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [])"}

  The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
  other stoppers need to be used when parsing, for example, tokens. However, this kind of 
  manually wrapping is often already done by the surrounding infrastructure. 

  The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any 
  string as in

  @{ML_response [display,gray] 
"let 
   val p = Scan.repeat (Scan.one Symbol.not_eof)
   val input = Symbol.explode \"foo bar foo\"
in
   Scan.finite Symbol.stopper p input
end" 
"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}

  where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the 
  end of the input string (i.e.~stopper symbol).

  The function @{ML_ind unless in Scan} takes two parsers: if the first one can 
  parse the input, then the whole parser fails; if not, then the second is tried. Therefore
  
  @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
                               "Exception FAIL raised"}

  fails, while

  @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
                          "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}

  succeeds. 

  The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can 
  be combined to read any input until a certain marker symbol is reached. In the 
  example below the marker symbol is a @{text [quotes] "*"}.

  @{ML_response [display,gray]
"let 
  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
  val input1 = Symbol.explode \"fooooo\"
  val input2 = Symbol.explode \"foo*ooo\"
in
  (Scan.finite Symbol.stopper p input1, 
   Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}

  
  After parsing is done, you almost always want to apply a function to the parsed 
  items. One way to do this is the function @{ML_ind ">>" in Scan} where 
  @{ML "(p >> f)" for p f} runs 
  first the parser @{text p} and upon successful completion applies the 
  function @{text f} to the result. For example

@{ML_response [display,gray]
"let 
  fun double (x, y) = (x ^ x, y ^ y) 
  val parser = $$ \"h\" -- $$ \"e\"
in
  (parser >> double) (Symbol.explode \"hello\")
end"
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}

  doubles the two parsed input strings; or

  @{ML_response [display,gray] 
"let 
   val p = Scan.repeat (Scan.one Symbol.not_eof)
   val input = Symbol.explode \"foo bar foo\" 
in
   Scan.finite Symbol.stopper (p >> implode) input
end" 
"(\"foo bar foo\",[])"}

  where the single-character strings in the parsed output are transformed
  back into one string.

  The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies
  the given parser to the second component of the pair and leaves the  first component 
  untouched. For example

@{ML_response [display,gray]
"Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}

  \footnote{\bf FIXME: In which situations is @{text "lift"} useful? Give examples.} 

  Be aware of recursive parsers. Suppose you want to read strings separated by
  commas and by parentheses into a tree datastructure; for example, generating
  the tree corresponding to the string @{text [quotes] "(A, A), (A, A)"} where
  the @{text "A"}s will be the leaves.  We assume the trees are represented by the
  datatype:
*}

ML %grayML{*datatype tree = 
    Lf of string 
  | Br of tree * tree*}

text {*
  Since nested parentheses should be treated in a meaningful way---for example
  the string @{text [quotes] "((A))"} should be read into a single 
  leaf---you might implement the following parser.
*}

ML %grayML{*fun parse_basic s = 
  $$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")"  

and parse_tree s = 
  parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s*}

text {*
  This parser corresponds to the grammar:

  \begin{center}
  \begin{tabular}{lcl}
  @{text "<Basic>"}  & @{text "::="} & @{text "<String> | (<Tree>)"}\\
  @{text "<Tree>"}   & @{text "::="} & @{text "<Basic>, <Tree> | <Basic>"}\\
  \end{tabular}
  \end{center}

  The parameter @{text "s"} is the string over which the tree is parsed. The
  parser @{ML parse_basic} reads either a leaf or a tree enclosed in
  parentheses. The parser @{ML parse_tree} reads either a pair of trees
  separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
  because of the mutual recursion, this parser will immediately run into a
  loop, even if it is called without any input. For example

  @{ML_response_fake_both [display, gray]
  "parse_tree \"A\""
  "*** Exception- TOPLEVEL_ERROR raised"}

  raises an exception indicating that the stack limit is reached. Such
  looping parser are not useful, because of ML's strict evaluation of
  arguments. Therefore we need to delay the execution of the
  parser until an input is given. This can be done by adding the parsed
  string as an explicit argument. So the parser above should be implemented
  as follows.
*}

ML %grayML{*fun parse_basic s xs = 
  ($$ s >> Lf || $$ "(" |-- parse_tree s --| $$ ")") xs 

and parse_tree s xs = 
  (parse_basic s --| $$ "," -- parse_tree s >> Br || parse_basic s) xs*}

text {*
  While the type of the parser is unchanged by the addition, its behaviour 
  changed: with this version of the parser the execution is delayed until 
  some string is  applied for the argument @{text "xs"}. This gives us 
  exactly the parser what we wanted. An example is as follows:

  @{ML_response [display, gray]
  "let
  val input = Symbol.explode \"(A,((A))),A\"
in
  Scan.finite Symbol.stopper (parse_tree \"A\") input
end"
  "(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"}


  \begin{exercise}\label{ex:scancmts}
  Write a parser that parses an input string so that any comment enclosed
  within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within
  @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
  function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
  "s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper 
  nesting of comments.
  \end{exercise}
*}

section {* Parsing Theory Syntax *}

text {*
  Most of the time, however, Isabelle developers have to deal with parsing
  tokens, not strings.  These token parsers have the type:
*}
  
ML %grayML{*type 'a parser = Token.T list -> 'a * Token.T list*}

text {*
  {\bf REDO!!}


  The reason for using token parsers is that theory syntax, as well as the
  parsers for the arguments of proof methods, use the type @{ML_type
  Token.T}.

  \begin{readmore}
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct Parse} defined in the file @{ML_file  "Pure/Isar/parse.ML"}.
  The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}.
  \end{readmore}

  The structure @{ML_struct  Token} defines several kinds of tokens (for
  example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in
  Token} for keywords and @{ML_ind Command in Token} for commands). Some
  token parsers take into account the kind of tokens. The first example shows
  how to generate a token list out of a string using the function 
  @{ML_ind scan in Outer_Syntax}. It is given the argument 
  @{ML "Position.none"} since,
  at the moment, we are not interested in generating precise error
  messages. The following code


@{ML_response_fake [display,gray] "Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  produces three tokens where the first and the last are identifiers, since
  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
  other syntactic category. The second indicates a space.

  We can easily change what is recognised as a keyword with the function
  @{ML_ind define in Keyword}. For example calling it with 
*}

ML %grayML{*val _ = Keyword.define ("hello", NONE) *}

text {*
  then lexing @{text [quotes] "hello world"} will produce

  @{ML_response_fake [display,gray] "Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Keyword, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  Many parsing functions later on will require white space, comments and the like
  to have already been filtered out.  So from now on we are going to use the 
  functions @{ML filter} and @{ML_ind is_proper in Token} to do this. 
  For example:

@{ML_response_fake [display,gray]
"let
   val input = Outer_Syntax.scan Position.none \"hello world\"
in
   filter Token.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

  For convenience we define the function:
*}

ML %grayML{*fun filtered_input str = 
  filter Token.is_proper (Outer_Syntax.scan Position.none str) *}

text {* 
  If you now parse

@{ML_response_fake [display,gray] 
"filtered_input \"inductive | for\"" 
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
 Token (\<dots>,(Keyword, \"|\"),\<dots>), 
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

  you obtain a list consisting of only one command and two keyword tokens.
  If you want to see which keywords and commands are currently known to Isabelle, 
  use the function @{ML_ind get_lexicons in Keyword}:

@{ML_response_fake [display,gray] 
"let 
  val (keywords, commands) = Keyword.get_lexicons ()
in 
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end" 
"([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}

  You might have to adjust the @{ML_ind default_print_depth} in order to
  see the complete list.

  The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example:

@{ML_response [display,gray]
"let 
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in 
  (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>), (\"|\",\<dots>))"}

  Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}.
  For example:

  @{ML_response [display,gray]
"let 
  val p = Parse.reserved \"bar\"
  val input = filtered_input \"bar\"
in
  p input
end"
  "(\"bar\",[])"}

  Like before, you can sequentially connect parsers with @{ML "--"}. For example: 

@{ML_response [display,gray]
"let 
  val input = filtered_input \"| in\"
in 
  (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input
end"
"((\"|\", \"in\"), [])"}

  The parser @{ML "Parse.enum s p" for s p} parses a possibly empty 
  list of items recognised by the parser @{text p}, where the items being parsed
  are separated by the string @{text s}. For example:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in foo\"
in 
  (Parse.enum \"|\" (Parse.$$$ \"in\")) input
end" 
"([\"in\", \"in\", \"in\"], [\<dots>])"}

  The function @{ML_ind enum1 in Parse} works similarly, except that the
  parsed list must be non-empty. Note that we had to add a string @{text
  [quotes] "foo"} at the end of the parsed string, otherwise the parser would
  have consumed all tokens and then failed with the exception @{text
  "MORE"}. Like in the previous section, we can avoid this exception using the
  wrapper @{ML Scan.finite}. This time, however, we have to use the
  ``stopper-token'' @{ML Token.stopper}. We can write:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in\"
  val p = Parse.enum \"|\" (Parse.$$$ \"in\")
in 
  Scan.finite Token.stopper p input
end" 
"([\"in\", \"in\", \"in\"], [])"}

  The following function will help to run examples.
*}

ML %grayML{*fun parse p input = Scan.finite Token.stopper (Scan.error p) input *}

text {*
  The function @{ML_ind "!!!" in Parse} can be used to force termination
  of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
  section). A difference, however, is that the error message of @{ML
  "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
  together with a relatively precise description of the failure. For example:

@{ML_response_fake [display,gray]
"let 
  val input = filtered_input \"in |\"
  val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\"
in 
  parse (Parse.!!! parse_bar_then_in) input 
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
but keyword in was found\" raised"
}

  \begin{exercise} (FIXME)
  A type-identifier, for example @{typ "'a"}, is a token of 
  kind @{ML_ind Keyword in Token}. It can be parsed using 
  the function @{ML type_ident in Parse}.
  \end{exercise}

  (FIXME: or give parser for numbers)

  Whenever there is a possibility that the processing of user input can fail, 
  it is a good idea to give all available information about where the error 
  occurred. For this Isabelle can attach positional information to tokens
  and then thread this information up the ``processing chain''. To see this,
  modify the function @{ML filtered_input}, described earlier, as follows 
*}

ML %grayML{*fun filtered_input' str = 
       filter Token.is_proper (Outer_Syntax.scan (Position.line 7) str) *}

text {*
  where we pretend the parsed string starts on line 7. An example is

@{ML_response_fake [display,gray]
"filtered_input' \"foo \\n bar\""
"[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>),
 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}

  in which the @{text [quotes] "\\n"} causes the second token to be in 
  line 8.

  By using the parser @{ML position in Parse} you can access the token 
  position and return it as part of the parser result. For example

@{ML_response_fake [display,gray]
"let
  val input = filtered_input' \"where\"
in 
  parse (Parse.position (Parse.$$$ \"where\")) input
end"
"((\"where\", {line=7, end_line=7}), [])"}

  \begin{readmore}
  The functions related to positions are implemented in the file
  @{ML_file "Pure/General/position.ML"}.
  \end{readmore}

  \begin{exercise}\label{ex:contextfree}
  Write a parser for the context-free grammar representing arithmetic 
  expressions with addition and multiplication. As usual, multiplication 
  binds stronger than addition, and both of them nest to the right. 
  The context-free grammar is defined as:

  \begin{center}
  \begin{tabular}{lcl}
  @{text "<Basic>"}  & @{text "::="} & @{text "<Number> | (<Expr>)"}\\
  @{text "<Factor>"} & @{text "::="} & @{text "<Basic> * <Factor> | <Basic>"}\\
  @{text "<Expr>"}   & @{text "::="} & @{text "<Factor> + <Expr>  | <Factor>"}\\
  \end{tabular}
  \end{center}

  Hint: Be careful with recursive parsers.
  \end{exercise}
*}

section {* Parsers for ML-Code (TBD) *}

text {*
  @{ML_ind ML_source in Parse}
*}

section {* Context Parser (TBD) *}

text {*
  @{ML_ind Args.context}
*}
(*
ML {*
let
   val parser = Args.context -- Scan.lift Args.name_inner_syntax
   
  fun term_pat (ctxt, str) =
      str |> Syntax.read_prop ctxt
in
  (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)")
  |> fst
end
*}
*)

text {*
  @{ML_ind Args.context}

  Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.
*}

section {* Argument and Attribute Parsers (TBD) *}

section {* Parsing Inner Syntax *}

text {*
  There is usually no need to write your own parser for parsing inner syntax, that is 
  for terms and  types: you can just call the predefined parsers. Terms can 
  be parsed using the function @{ML_ind term in Parse}. For example:

@{ML_response_fake [display,gray]
"let 
  val input = Outer_Syntax.scan Position.none \"foo\"
in 
  Parse.term input
end"
"(\"<markup>\", [])"}


  The function @{ML_ind prop in Parse} is similar, except that it gives a different
  error message, when parsing fails. As you can see, the parser not just returns 
  the parsed string, but also some markup information. You can decode the
  information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. 
  The result of the decoding is an XML-tree. You can see better what is going on if
  you replace @{ML Position.none} by @{ML "Position.line 42"}, say:

@{ML_response_fake [display,gray]
"let 
  val input = Outer_Syntax.scan (Position.line 42) \"foo\"
in 
  YXML.parse (fst (Parse.term input))
end"
"Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}

  The positional information is stored as part of an XML-tree so that code 
  called later on will be able to give more precise error messages. 

  \begin{readmore}
  The functions to do with input and output of XML and YXML are defined 
  in @{ML_file "Pure/PIDE/xml.ML"} and @{ML_file "Pure/PIDE/yxml.ML"}.
  \end{readmore}
  
  FIXME:
  @{ML_ind parse_term in Syntax} @{ML_ind check_term in Syntax}
  @{ML_ind parse_typ in Syntax} @{ML_ind check_typ in Syntax}
  @{ML_ind read_term in Syntax} @{ML_ind read_term in Syntax}


*}

section {* Parsing Specifications\label{sec:parsingspecs} *}


text {*
  There are a number of special purpose parsers that help with parsing
  specifications of function definitions, inductive predicates and so on. In
  Chapter~\ref{chp:package}, for example, we will need to parse specifications
  for inductive predicates of the form:
*}


simple_inductive
  even and odd
where
  even0: "even 0"
| evenS: "odd n \<Longrightarrow> even (Suc n)"
| oddS: "even n \<Longrightarrow> odd (Suc n)"

text {*
  For this we are going to use the parser:
*}

ML %linenosgray{*val spec_parser = 
     Parse.fixes -- 
     Scan.optional 
       (Parse.$$$ "where" |--
          Parse.!!! 
            (Parse.enum1 "|" 
               (Parse_Spec.opt_thm_name ":" -- Parse.prop))) []*}

text {*
  Note that the parser must not parse the keyword \simpleinductive, even if it is
  meant to process definitions as shown above. The parser of the keyword 
  will be given by the infrastructure that will eventually call @{ML spec_parser}.
  

  To see what the parser returns, let us parse the string corresponding to the 
  definition of @{term even} and @{term odd}:

@{ML_response [display,gray]
"let
  val input = filtered_input
     (\"even and odd \" ^  
      \"where \" ^
      \"  even0[intro]: \\\"even 0\\\" \" ^ 
      \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ 
      \"| oddS[intro]:  \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
in
  parse spec_parser input
end"
"(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
   [((even0,\<dots>),\<dots>),
    ((evenS,\<dots>),\<dots>),
    ((oddS,\<dots>),\<dots>)]), [])"}

  As you see, the result is a pair consisting of a list of
  variables with optional type-annotation and syntax-annotation, and a list of
  rules where every rule has optionally a name and an attribute.

  The function @{ML_ind "fixes" in Parse} in Line 2 of the parser reads an 
  \isacommand{and}-separated 
  list of variables that can include optional type annotations and syntax translations. 
  For example:\footnote{Note that in the code we need to write 
  @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
  in the compound type.}

@{ML_response [display,gray]
"let
  val input = filtered_input 
        \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
in
  parse Parse.fixes input
end"
"([(foo, SOME \<dots>, NoSyn), 
  (bar, SOME \<dots>, Mixfix (\"BAR\", [], 100)), 
  (blonk, NONE, NoSyn)],[])"}  
*}

text {*
  Whenever types are given, they are stored in the @{ML SOME}s. The types are
  not yet used to type the variables: this must be done by type-inference later
  on. Since types are part of the inner syntax they are strings with some
  encoded information (see previous section). If a mixfix-syntax is
  present for a variable, then it is stored in the 
  @{ML Mixfix} data structure; no syntax translation is indicated by @{ML_ind NoSyn in Syntax}.

  \begin{readmore}
  The data structure for mixfix annotations are implemented in 
  @{ML_file "Pure/Syntax/mixfix.ML"} and @{ML_file "Pure/Syntax/syntax.ML"}.
  \end{readmore}

  Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
  list of introduction rules, that is propositions with theorem annotations
  such as rule names and attributes. The introduction rules are propositions
  parsed by @{ML_ind prop in Parse}. However, they can include an optional
  theorem name plus some attributes. For example

@{ML_response [display,gray] "let 
  val input = filtered_input \"foo_lemma[intro,dest!]:\"
  val ((name, attrib), _) = parse (Parse_Spec.thm_name \":\") input 
in 
  (name, map Args.name_of_src attrib)
end" "(foo_lemma, [(\"intro\", \<dots>), (\"dest\", \<dots>)])"}
 
  The function @{ML_ind opt_thm_name in Parse_Spec} is the ``optional'' variant of
  @{ML_ind thm_name in Parse_Spec}. Theorem names can contain attributes. The name 
  has to end with @{text [quotes] ":"}---see the argument of 
  the function @{ML Parse_Spec.opt_thm_name} in Line 7.

  \begin{readmore}
  Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
  and @{ML_file "Pure/Isar/args.ML"}.
  \end{readmore}
*}

text_raw {*
  \begin{exercise}
  Have a look at how the parser @{ML Parse_Spec.where_alt_specs} is implemented
  in file @{ML_file "Pure/Isar/parse_spec.ML"}. This parser corresponds
  to the ``where-part'' of the introduction rules given above. Below
  we paraphrase the code of @{ML_ind where_alt_specs in Parse_Spec} adapted to our
  purposes. 
  \begin{isabelle}
*}
ML %linenosgray{*val spec_parser' = 
     Parse.fixes -- 
     Scan.optional
     (Parse.$$$ "where" |-- 
        Parse.!!! 
          (Parse.enum1 "|" 
             ((Parse_Spec.opt_thm_name ":" -- Parse.prop) --| 
                  Scan.option (Scan.ahead (Parse.name || 
                  Parse.$$$ "[") -- 
                  Parse.!!! (Parse.$$$ "|"))))) [] *}
text_raw {*
  \end{isabelle}
  Both parsers accept the same input% that's not true:
  % spec_parser accepts input that is refuted by spec_parser'
  , but if you look closely, you can notice 
  an additional  ``tail'' (Lines 8 to 10) in @{ML spec_parser'}. What is the purpose of 
  this additional ``tail''?
  \end{exercise}
*}

text {*
  (FIXME: @{ML Parse.type_args}, @{ML Parse.typ}, @{ML Parse.opt_mixfix})
*}


section {* New Commands\label{sec:newcommand} *}

text {*
  Often new commands, for example for providing new definitional principles,
  need to be implemented. While this is not difficult on the ML-level and for
  jEdit, in order to be backwards compatible, new commands need also to be recognised 
  by Proof-General. This results in some subtle configuration issues, which we will 
  explain in the next section. Here we just describe how to define new commands
  to work with jEdit.

  Let us start with a ``silly'' command that does nothing at all. We
  shall name this command \isacommand{foobar}.  Before you can
  implement any new command, you have to ``announce'' it in the
  \isacommand{keywords}-section of your theory header. For \isacommand{foobar}
  we need to write something like

  \begin{graybox}
  \isacommand{theory}~@{text Foo}\\
  \isacommand{imports}~@{text Main}\\
  \isacommand{keywords} @{text [quotes] "foobar"} @{text "::"} @{text "thy_decl"}\\
  ...
  \end{graybox}

  whereby @{ML_ind "thy_decl" in Keyword} indicates the kind of the
  command.  Another possible kind is @{text "thy_goal"}, or you can
  also omit the kind entirely, in which case you declare a keyword
  (something that is part of a command).

  Now you can implement \isacommand{foobar} as follows.
*}

ML %grayML{*let
  val do_nothing = Scan.succeed (Local_Theory.background_theory I)
in
  Outer_Syntax.local_theory @{command_spec "foobar"} 
    "description of foobar" 
      do_nothing
end *}

text {*
  The crucial function @{ML_ind local_theory in Outer_Syntax} expects
  the name for the command, a kind indicator, a short description and 
  a parser producing a local theory transition (explained later). For the
  name and the kind, you can use the ML-antiquotation @{text "@{command_spec ...}"}.
  You can now write in your theory 
*}

foobar

text {*
  but of course you will not see anything since \isacommand{foobar} is
  not intended to do anything.  Remember, however, that this only
  works in jEdit. In order to enable also Proof-General recognise this
  command, a keyword file needs to be generated (see next section).

  As it stands, the command \isacommand{foobar} is not very useful. Let
  us refine it a bit next by letting it take a proposition as argument
  and printing this proposition inside the tracing buffer. We announce
  the command \isacommand{foobar\_trace} in the theory header as

  \begin{graybox}
  \isacommand{keywords} @{text [quotes] "foobar_trace"} @{text "::"} @{text "thy_decl"}
  \end{graybox}

  The crucial part of a command is the function that determines the
  behaviour of the command. In the code above we used a
  ``do-nothing''-function, which because of the parser @{ML_ind succeed in Scan}
  does not parse any argument, but immediately returns the simple
  function @{ML "Local_Theory.background_theory I"}. We can replace
  this code by a function that first parses a proposition (using the
  parser @{ML Parse.prop}), then prints out some tracing information
  (using the function @{text trace_prop}) and finally does
  nothing. For this you can write:
*}

ML %grayML{*let
  fun trace_prop str = 
    Local_Theory.background_theory (fn ctxt => (tracing str; ctxt))
in
  Outer_Syntax.local_theory @{command_spec "foobar_trace"} 
    "traces a proposition" 
      (Parse.prop >> trace_prop)
end *}

text {*
  This command can now be used to 
  see the proposition in the tracing buffer.
*}

foobar_trace "True \<and> False"

text {*
  Note that so far we used @{ML_ind thy_decl in Keyword} as the kind
  indicator for the new command.  This means that the command finishes as soon as
  the arguments are processed. Examples of this kind of commands are
  \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  are expected to parse some arguments, for example a proposition, and then
  ``open up'' a proof in order to prove the proposition (for example
  \isacommand{lemma}) or prove some other properties (for example
  \isacommand{function}). To achieve this kind of behaviour, you have to use
  the kind indicator @{ML_ind thy_goal in Keyword} and the function @{ML
  "local_theory_to_proof" in Outer_Syntax} to set up the command. 
  Below we show the command \isacommand{foobar\_goal} which takes a
  proposition as argument and then starts a proof in order to prove
  it. Therefore, we need to announce this command in the header 
  as @{text "thy_goal"}.

  \begin{graybox}
  \isacommand{keywords} @{text [quotes] "foobar_goal"} @{text "::"} @{text "thy_goal"}
  \end{graybox}

  Then we can write:
*}

ML%linenosgray{*let
  fun goal_prop str ctxt =
    let
      val prop = Syntax.read_prop ctxt str
    in
      Proof.theorem NONE (K I) [[(prop, [])]] ctxt
    end
in
  Outer_Syntax.local_theory_to_proof @{command_spec "foobar_goal"}
    "proves a proposition" 
      (Parse.prop >> goal_prop)
end *}

text {*
  The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be
  proved) and a context as argument.  The context is necessary in order to be able to use
  @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition.
  In Line 6 the function @{ML_ind theorem in Proof} starts the proof for the
  proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  omit); the argument @{ML "(K I)"} stands for a function that determines what
  should be done with the theorem once it is proved (we chose to just forget
  about it). 

  If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"},
  you obtain the following proof state:
*}

foobar_goal "True \<and> True"
txt {*
  \begin{minipage}{\textwidth}
  @{subgoals [display]}
  \end{minipage}\medskip

  and can prove the proposition as follows.
*}
apply(rule conjI)
apply(rule TrueI)+
done

text {*
  The last command we describe here is
  \isacommand{foobar\_proof}. Like \isacommand{foobar\_goal}, its purpose is
  to take a proposition and open a corresponding proof-state that
  allows us to give a proof for it. However, unlike
  \isacommand{foobar\_goal}, the proposition will be given as a
  ML-value. Such a command is quite useful during development
  when you generate a goal on the ML-level and want to see
  whether it is provable. In addition we want to allow the proved 
  proposition to have a name that can be referenced later on.

  The first problem for \isacommand{foobar\_proof} is to parse some
  text as ML-source and then interpret it as an Isabelle term using
  the ML-runtime.  For the parsing part, we can use the function
  @{ML_ind "ML_source" in Parse} in the structure @{ML_struct
  Parse}. For running the ML-interpreter we need the following
  scaffolding code.
*}

ML %grayML{*
structure Result = Proof_Data 
  (type T = unit -> term
   fun init thy () = error "Result")

val result_cookie = (Result.get, Result.put, "Result.put") *}

text {*
  With this in place, we can implement the code for \isacommand{foobar\_prove} 
  as follows.
*}

ML %linenosgray{*let
   fun after_qed thm_name thms lthy =
        Local_Theory.note (thm_name, (flat thms)) lthy |> snd

   fun setup_proof (thm_name, {text, ...}) lthy =
     let
       val trm = Code_Runtime.value lthy result_cookie ("", text)
     in
       Proof.theorem NONE (after_qed thm_name) [[(trm, [])]] lthy
     end

   val parser = Parse_Spec.opt_thm_name ":" -- Parse.ML_source
in
   Outer_Syntax.local_theory_to_proof @{command_spec "foobar_prove"}
     "proving a proposition" 
       (parser >> setup_proof)
end*}

text {*
  In Line 12, we implement a parser that first reads in an optional lemma name (terminated 
  by ``:'') and then some ML-code. The function in Lines 5 to 10 takes the ML-text
  and lets the ML-runtime evaluate it using the function @{ML_ind value in Code_Runtime}
  in the structure @{ML_struct Code_Runtime}.  Once the ML-text has been turned into a term, 
  the function @{ML theorem in Proof} opens a corresponding proof-state. This function takes the
  function @{text "after_qed"} as argument, whose purpose is to store the theorem
  (once it is proven) under the given name @{text "thm_name"}.

  You can now define a term, for example
*}

ML %grayML{*val prop_true = @{prop "True"}*}

text {*
  and give it a proof using \isacommand{foobar\_prove}:
*}

foobar_prove test: prop_true
apply(rule TrueI)
done

text {*
  Finally you can test whether the lemma has been stored under the given name.
  
  \begin{isabelle}
  \isacommand{thm}~@{text "test"}\\
  @{text "> "}~@{thm TrueI}
  \end{isabelle}

  While this is everything you have to do for a new command when using jEdit, 
  things are not as simple when using Emacs and ProofGeneral. We explain the details 
  next.
*}


section {* Proof-General and Keyword Files *}

text {*
  In order to use a new command in Emacs and Proof-General, you need a keyword
  file that can be loaded by ProofGeneral. To keep things simple we take as
  running example the command \isacommand{foobar} from the previous section. 

  A keyword file can be generated with the command-line:

  @{text [display] "$ isabelle keywords -k foobar some_log_files"}

  The option @{text "-k foobar"} indicates which postfix the name of the keyword file 
  will be assigned. In the case above the file will be named @{text
  "isar-keywords-foobar.el"}. This command requires log files to be
  present (in order to extract the keywords from them). To generate these log
  files, you first need to package the code above into a separate theory file named
  @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
  complete code.


  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \begin{figure}[t]
  \begin{graybox}\small
  \isacommand{theory}~@{text Command}\\
  \isacommand{imports}~@{text Main}\\
  \isacommand{keywords} @{text [quotes] "foobar"} @{text "::"} @{text "thy_decl"}\\
  \isacommand{begin}\\
  \isacommand{ML}~@{text "\<verbopen>"}\\
  @{ML
"let
  val do_nothing = Scan.succeed (Local_Theory.background_theory I)
in
  Outer_Syntax.local_theory @{command_spec \"foobar\"}
    \"description of foobar\" 
       do_nothing
end"}\\
  @{text "\<verbclose>"}\\
  \isacommand{end}
  \end{graybox}
  \caption{This file can be used to generate a log file. This log file in turn can
  be used to generate a keyword file containing the command \isacommand{foobar}.
  \label{fig:commandtheory}}
  \end{figure}
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  For our purposes it is sufficient to use the log files of the theories
  @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
  the log file for the theory @{text "Command.thy"}, which contains the new
  \isacommand{foobar}-command. If you target other logics besides HOL, such
  as Nominal or ZF, then you need to adapt the log files appropriately.
  
  @{text Pure} and @{text HOL} are usually compiled during the installation of
  Isabelle. So log files for them should be already available. If not, then
  they can be conveniently compiled with the help of the build-script from the Isabelle
  distribution.

  @{text [display] 
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}
  
  The @{text "Pure-ProofGeneral"} theory needs to be compiled with:

  @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

  For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory 
  with:

  @{text [display] "$ isabelle mkdir FoobarCommand"}

  This generates a directory containing the files: 

  @{text [display] 
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
./FoobarCommand/document/root.tex"}


  You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
  and add the line 

  @{text [display] "no_document use_thy \"Command\";"} 
  
  to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:

  @{text [display] "$ isabelle make"}

  If the compilation succeeds, you have finally created all the necessary log files. 
  They are stored in the directory 
  
  @{text [display]  "~/.isabelle/heaps/Isabelle2012/polyml-5.2.1_x86-linux/log"}

  or something similar depending on your Isabelle distribution and architecture.
  One quick way to assign a shell variable to this directory is by typing

  @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
 
  on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the 
  directory should include the files:

  @{text [display] 
"Pure.gz
HOL.gz
Pure-ProofGeneral.gz
HOL-FoobarCommand.gz"} 

  From them you can create the keyword files. Assuming the name 
  of the directory is in  @{text "$ISABELLE_LOGS"},
  then the Unix command for creating the keyword file is:

@{text [display]
"$ isabelle keywords -k foobar 
   $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}

  The result is the file @{text "isar-keywords-foobar.el"}. It should contain
  the string @{text "foobar"} twice.\footnote{To see whether things are fine,
  check that @{text "grep foobar"} on this file returns something non-empty.}
  This keyword file needs to be copied into the directory @{text
  "~/.isabelle/etc"}. To make ProofGeneral aware of it, you have to start
  Isabelle with the option @{text "-k foobar"}, that is:


  @{text [display] "$ isabelle emacs -k foobar a_theory_file"}

  If you now build a theory on top of @{text "Command.thy"}, 
  then you can now use the command \isacommand{foobar}
  in Proof-General

  A similar procedure has to be done with any 
  other new command, and also any new keyword that is introduced with 
  the function @{ML_ind define in Keyword}. For example:
*}

ML %grayML{*val _ = Keyword.define ("blink", NONE) *}


text {*
  Also if the kind of a command changes, from @{text "thy_decl"} to 
  @{text "thy_goal"} say,  you need to recreate the keyword file.
*}




text {*
  {\bf TBD below}

 *}





(*
ML {*
structure TacticData = ProofDataFun
(
  type T = thm list -> tactic;
  fun init _ = undefined;
)

val set_tactic = TacticData.put;
*}

ML {*
  TacticData.get @{context}
*}

ML {* Method.set_tactic  *}
ML {* fun tactic (facts: thm list) : tactic = (atac 1) *}
ML {* Context.map_proof *}
ML {* ML_Context.expression *}
ML {* METHOD *}


ML {* 
fun myexpression pos bind body txt =
let
  val _ = tracing ("bind)" ^ bind)
  val _ = tracing ("body)" ^ body)
  val _ = tracing ("txt)"  ^ txt)
  val _ = tracing ("result) " ^ "Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^
      " end (ML_Context.the_generic_context ())));")
in
  ML_Context.exec (fn () => ML_Context.eval false pos
    ("Context.set_thread_data (SOME (let " ^ bind ^ " = " ^ txt ^ " in " ^ body ^
      " end (ML_Context.the_generic_context ())));"))
end
*}


ML {*
fun ml_tactic (txt, pos) ctxt =
let
  val ctxt' = ctxt |> Context.proof_map
      (myexpression pos
        "fun tactic (facts: thm list) : tactic"
        "Context.map_proof (Method.set_tactic tactic)" txt);
in 
  Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt')
end;
*}

ML {*
fun tactic3 (txt, pos) ctxt = 
  let
    val _ = tracing ("1) " ^ txt )
  in 
   METHOD (ml_tactic (txt, pos) ctxt; K (atac 1))
  end
*}

setup {*
Method.setup (Binding.name "tactic3") (Scan.lift (Parse.position Args.name) 
  >> tactic3)
    "ML tactic as proof method"
*}

lemma "A \<Longrightarrow> A"
apply(tactic3 {* (atac 1)  *})
done

ML {* 
(ML_Context.the_generic_context ())
*}

ML {*
Context.set_thread_data;
ML_Context.the_generic_context
*}

lemma "A \<Longrightarrow> A"
ML_prf {*
Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic =  (atac 1)   in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));
*}

ML {*
Context.set_thread_data (SOME ((let fun tactic (facts: thm list) : tactic =  (atac 1)   in 3 end) (ML_Context.the_generic_context ())));
*}

ML {*
Context.set_thread_data (SOME (let 
  fun tactic (facts: thm list) : tactic =  (atac 1) 
in 
  Context.map_proof (Method.set_tactic tactic) 
end 
  (ML_Context.the_generic_context ())));
*}


ML {*
let 
  fun tactic (facts: thm list) : tactic = atac
in
  Context.map_proof (Method.set_tactic tactic)
end *}

end *}

ML {* Toplevel.program (fn () => 
(ML_Context.expression Position.none "val plus : int" "3 + 4" "1" (Context.Proof @{context})))*}


ML {*
fun ml_tactic (txt, pos) ctxt =
  let
    val ctxt' = ctxt |> Context.proof_map
      (ML_Context.expression pos
        "fun tactic (facts: thm list) : tactic"
        "Context.map_proof (Method.set_tactic tactic)" txt);
  in Context.setmp_thread_data (SOME (Context.Proof ctxt)) (TacticData.get ctxt') end;

*}

ML {*
Context.set_thread_data (SOME (let fun tactic (facts: thm list) : tactic =  (atac 1)   in Context.map_proof (Method.set_tactic tactic) end (ML_Context.the_generic_context ())));
*}
*)

section {* Methods (TBD) *}

text {*
  (FIXME: maybe move to after the tactic section)

  Methods are central to Isabelle. They are the ones you use for example
  in \isacommand{apply}. To print out all currently known methods you can use the 
  Isabelle command:

  \begin{isabelle}
  \isacommand{print\_methods}\\
  @{text "> methods:"}\\
  @{text ">   -:  do nothing (insert current facts only)"}\\
  @{text ">   HOL.default:  apply some intro/elim rule (potentially classical)"}\\
  @{text ">   ..."}
  \end{isabelle}

  An example of a very simple method is:
*}

method_setup %gray foo = 
 {* Scan.succeed
      (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *}
         "foo method for conjE and conjI"

text {*
  It defines the method @{text foo}, which takes no arguments (therefore the
  parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  @{ML_ind SIMPLE_METHOD in Method}
  turns such a tactic into a method. The method @{text "foo"} can be used as follows
*}

lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
apply(foo)
txt {*
  where it results in the goal state

  \begin{minipage}{\textwidth}
  @{subgoals}
  \end{minipage} *}
(*<*)oops(*>*)

method_setup test = {* 
  Scan.lift (Scan.succeed (K Method.succeed)) *} {* bla *}

lemma "True"
apply(test)
oops

method_setup joker = {* 
  Scan.lift (Scan.succeed (fn ctxt => Method.cheating ctxt true)) *} {* bla *}

lemma "False"
apply(joker)
oops

text {* if true is set then always works *}

ML {* atac *} 

method_setup first_atac = {* Scan.lift (Scan.succeed (K (SIMPLE_METHOD (atac 1)))) *} {* bla *}

ML {* HEADGOAL *}

lemma "A \<Longrightarrow> A"
apply(first_atac)
oops

method_setup my_atac = {* Scan.lift (Scan.succeed (K (SIMPLE_METHOD' atac))) *} {* bla *}

lemma "A \<Longrightarrow> A"
apply(my_atac)
oops







(*
ML {* SIMPLE_METHOD *}
ML {* METHOD *}
ML {* K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1)) *}
ML {* Scan.succeed  *}
*)

ML {* resolve_tac *}

method_setup myrule =
  {* Scan.lift (Scan.succeed (K (METHOD (fn thms => resolve_tac thms 1)))) *}
  {* bla *}

lemma
  assumes a: "A \<Longrightarrow> B \<Longrightarrow> C"
  shows "C"
using a
apply(myrule)
oops



text {*
  (********************************************************)
  (FIXME: explain a version of rule-tac)
*}

end