CookBook/Parsing.thy
changeset 189 069d525f8f1d
parent 188 8939b8fd8603
child 190 ca0ac2e75f6d
--- a/CookBook/Parsing.thy	Wed Mar 18 23:52:51 2009 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,1684 +0,0 @@
-theory Parsing
-imports Base "Package/Simple_Inductive_Package"
-begin
-
-
-chapter {* Parsing *}
-
-text {*
-
-  Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
-  Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
-  on, belong to the outer syntax, whereas items inside double quotation marks, such 
-  as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
-  Isabelle uses a rather general and sophisticated algorithm, which 
-  is driven by priority grammars. Parsers for outer syntax are built up by functional
-  parsing combinators. These combinators are a well-established technique for parsing, 
-  which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
-  Isabelle developers are usually concerned with writing these outer syntax parsers, 
-  either for new definitional packages or for calling tactics with specific arguments. 
-
-  \begin{readmore}
-  The library 
-  for writing parser combinators is split up, roughly, into two parts. 
-  The first part consists of a collection of generic parser combinators defined
-  in the structure @{ML_struct Scan} in the file 
-  @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
-  combinators for dealing with specific token types, which are defined in the 
-  structure @{ML_struct OuterParse} in the file 
-  @{ML_file "Pure/Isar/outer_parse.ML"}.
-  \end{readmore}
-
-*}
-
-section {* Building Generic Parsers *}
-
-text {*
-
-  Let us first have a look at parsing strings using generic parsing combinators. 
-  The function @{ML "$$"} takes a string as argument and will ``consume'' this string from 
-  a given input list of strings. ``Consume'' in this context means that it will 
-  return a pair consisting of this string and the rest of the input list. 
-  For example:
-
-  @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
-
-  @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
-
-  The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception 
-  @{text "FAIL"} if no string can be consumed. For example, trying to parse
-
-  @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
-                               "Exception FAIL raised"}
-  
-  will raise the exception @{text "FAIL"}.
-  There are three exceptions used in the parsing combinators:
-
-  \begin{itemize}
-  \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
-  might be explored. 
-  \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
-  in @{text "($$ \"h\") []"}.
-  \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
-  It is used for example in the function @{ML "!!"} (see below).
-  \end{itemize}
-
-  However, note that these exceptions are private to the parser and cannot be accessed
-  by the programmer (for example to handle them).
-  
-  Slightly more general than the parser @{ML "$$"} is the function @{ML
-  Scan.one}, in that it takes a predicate as argument and then parses exactly
-  one item from the input list satisfying this predicate. For example the
-  following parser either consumes an @{text [quotes] "h"} or a @{text
-  [quotes] "w"}:
-
-
-@{ML_response [display,gray] 
-"let 
-  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
-  val input1 = (explode \"hello\")
-  val input2 = (explode \"world\")
-in
-    (hw input1, hw input2)
-end"
-    "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
-
-  Two parsers can be connected in sequence by using the function @{ML "--"}. 
-  For example, you can parse @{text "h"}, @{text "e"} and @{text "l"} in this 
-  sequence with:
-
-  @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
-                          "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
-
-  Note how the result of consumed strings builds up on the left as nested pairs.  
-
-  If, as in the previous example, you want to parse a particular string, 
-  then you should use the function @{ML Scan.this_string}:
-
-  @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
-                          "(\"hell\", [\"o\"])"}
-
-  Parsers that explore alternatives can be constructed using the function @{ML
-  "||"}. For example, the parser @{ML "(p || q)" for p q} returns the
-  result of @{text "p"}, in case it succeeds, otherwise it returns the
-  result of @{text "q"}. For example:
-
-
-@{ML_response [display,gray] 
-"let 
-  val hw = ($$ \"h\") || ($$ \"w\")
-  val input1 = (explode \"hello\")
-  val input2 = (explode \"world\")
-in
-  (hw input1, hw input2)
-end"
-  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
-
-  The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function 
-  for parsers, except that they discard the item being parsed by the first (respectively second)
-  parser. For example:
-  
-@{ML_response [display,gray]
-"let 
-  val just_e = ($$ \"h\") |-- ($$ \"e\") 
-  val just_h = ($$ \"h\") --| ($$ \"e\") 
-  val input = (explode \"hello\")  
-in 
-  (just_e input, just_h input)
-end"
-  "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
-
-  The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
-  @{text "p"}, if it succeeds; otherwise it returns 
-  the default value @{text "x"}. For example:
-
-@{ML_response [display,gray]
-"let 
-  val p = Scan.optional ($$ \"h\") \"x\"
-  val input1 = (explode \"hello\")
-  val input2 = (explode \"world\")  
-in 
-  (p input1, p input2)
-end" 
- "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
-
-  The function @{ML Scan.option} works similarly, except no default value can
-  be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
-
-@{ML_response [display,gray]
-"let 
-  val p = Scan.option ($$ \"h\")
-  val input1 = (explode \"hello\")
-  val input2 = (explode \"world\")  
-in 
-  (p input1, p input2)
-end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
-
-  The function @{ML "!!"} helps to produce appropriate error messages
-  during parsing. For example, if you want to parse @{text p} immediately 
-  followed by @{text q}, or otherwise start a completely different parser @{text r},
-  you might write:
-
-  @{ML [display,gray] "(p -- q) || r" for p q r}
-
-  However, this parser makes it hard to produce an appropriate error
-  message in case the parsing of @{ML "(p -- q)" for p q} fails, because in
-  that case you lose the information that @{text p} should be followed by
-  @{text q}. To see this, consider the case in which @{text p} is present in
-  the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
-  and the alternative parser @{text r} will be tried. However, in many
-  circumstances this will be the wrong parser for the input ``p-followed-by-q''
-  and therefore will also fail. The error message is then caused by the
-  failure of @{text r}, not by the absence of @{text q} in the input. This
-  kind of situation can be avoided by using the function @{ML "!!"}. 
-  This function aborts the whole process of parsing in case of a
-  failure and prints an error message. For example if you invoke the parser
-
-  
-  @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
-
-  on @{text [quotes] "hello"}, the parsing succeeds
-
-  @{ML_response [display,gray] 
-                "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
-                "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
-
-  but if you invoke it on @{text [quotes] "world"}
-  
-  @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
-                               "Exception ABORT raised"}
-
-  then the parsing aborts and the error message @{text "foo"} is printed. In order to
-  see the error message properly, you need to prefix the parser with the function 
-  @{ML "Scan.error"}. For example:
-
-  @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
-                               "Exception ERROR \"foo\" raised"}
-
-  This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
-  (see Section~\ref{sec:newcommand} which explains this function in more detail). 
-  
-  Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
-  r}. If you want to generate the correct error message for p-followed-by-q,
-  then you have to write:
-*}
-
-ML{*fun p_followed_by_q p q r =
-let 
-  val err_msg = (fn _ => p ^ " is not followed by " ^ q)
-in
-  ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
-end *}
-
-
-text {*
-  Running this parser with the arguments @{text [quotes] "h"}, @{text [quotes] "e"} and 
-  @{text [quotes] "w"}, and the input @{text [quotes] "holle"} 
-
-  @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
-                               "Exception ERROR \"h is not followed by e\" raised"} 
-
-  produces the correct error message. Running it with
- 
-  @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
-                          "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
-  
-  yields the expected parsing. 
-
-  The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
-  often as it succeeds. For example:
-  
-  @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
-                "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
-  
-  Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
-  @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
-  succeeds at least once.
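-
-  As a small illustration of the difference (an example added as a sketch; the
-  expected result follows from the description above), applying 
-  @{ML "Scan.repeat1"} to an input that starts with two @{text [quotes] "h"}s 
-  should give:
-
-  @{ML_response [display,gray] "Scan.repeat1 ($$ \"h\") (explode \"hhello\")" 
-                "([\"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}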
-
-  Also note that the @{ML Scan.repeat} example above would have aborted with the 
-  exception @{text MORE}, if you had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
-  the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
-  them you can write:
-  
-  @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
-                "([\"h\", \"h\", \"h\", \"h\"], [])"}
-
-  @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
-  other stoppers need to be used when parsing, for example, tokens. However, this kind of 
-  manual wrapping is often already done by the surrounding infrastructure. 
-
-  The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
-  string as in
-
-  @{ML_response [display,gray] 
-"let 
-   val p = Scan.repeat (Scan.one Symbol.not_eof)
-   val input = (explode \"foo bar foo\") 
-in
-   Scan.finite Symbol.stopper p input
-end" 
-"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
-
-  where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
-  end of the input string (i.e.~stopper symbol).
-
-  The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
-  parse the input, then the whole parser fails; if not, then the second is tried. Therefore
-  
-  @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
-                               "Exception FAIL raised"}
-
-  fails, while
-
-  @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
-                          "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
-
-  succeeds. 
-
-  The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
-  input until a certain marker symbol is reached. In the example below the marker
-  symbol is a @{text [quotes] "*"}.
-
-  @{ML_response [display,gray]
-"let 
-  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
-  val input1 = (explode \"fooooo\")
-  val input2 = (explode \"foo*ooo\")
-in
-  (Scan.finite Symbol.stopper p input1, 
-   Scan.finite Symbol.stopper p input2)
-end"
-"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
- ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
-
-  After parsing is done, you nearly always want to apply a function to the parsed 
-  items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
-  first the parser @{text p} and upon successful completion applies the 
-  function @{text f} to the result. For example
-
-@{ML_response [display,gray]
-"let 
-  fun double (x,y) = (x ^ x, y ^ y) 
-in
-  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
-end"
-"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
-
-  doubles the two parsed input strings; or
-
-  @{ML_response [display,gray] 
-"let 
-   val p = Scan.repeat (Scan.one Symbol.not_eof)
-   val input = (explode \"foo bar foo\") 
-in
-   Scan.finite Symbol.stopper (p >> implode) input
-end" 
-"(\"foo bar foo\",[])"}
-
-  where the single-character strings in the parsed output are transformed
-  back into one string.
- 
-  The function @{ML Scan.ahead} parses some input, but leaves the original
-  input unchanged. For example:
-
-  @{ML_response [display,gray]
-  "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" 
-  "(\"foo\", [\"f\", \"o\", \"o\"])"} 
-
-  The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
-  the given parser to the second component of the pair and leaves the  first component 
-  untouched. For example
-
-@{ML_response [display,gray]
-"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
-"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
-
-  (FIXME: In which situations is this useful? Give examples.) 
-
-  \begin{exercise}\label{ex:scancmts}
-  Write a parser that parses an input string so that any comment enclosed
-  inside @{text "(*\<dots>*)"} is replaced by the same comment but enclosed inside
-  @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
-  function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
-  "s1 ^ s ^ s2" for s1 s2 s}.
-  \end{exercise}
-*}
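-
-text {*
-  To make the behaviour of the combinators from this section more concrete, the
-  following is a minimal sketch of how a few of them could be defined. It is
-  \emph{not} the code from @{ML_file "Pure/General/scan.ML"}: the structure name
-  @{text "Scan_Sketch"}, its exception and its function names are made up for
-  illustration, and the treatment of @{text "MORE"} and @{text "ABORT"} is left out.
-*}
-
-ML{*structure Scan_Sketch =
-struct
-  (* hypothetical stand-in for the private exception FAIL *)
-  exception Fail_Sketch
-
-  (* consume one item satisfying pred, in the spirit of Scan.one;
-     this sketch simply fails on empty input instead of raising MORE *)
-  fun one _ [] = raise Fail_Sketch
-    | one pred (x :: xs) = if pred x then (x, xs) else raise Fail_Sketch
-
-  (* sequencing, in the spirit of p -- q: the results build up as a pair *)
-  fun seq p q xs =
-    let 
-      val (a, ys) = p xs
-      val (b, zs) = q ys
-    in ((a, b), zs) end
-
-  (* alternatives, in the spirit of p || q: q is tried only if p fails *)
-  fun alt p q xs = p xs handle Fail_Sketch => q xs
-
-  (* post-processing, in the spirit of p >> f *)
-  fun map_result p f xs =
-    let val (a, ys) = p xs in (f a, ys) end
-end *}
-
-text {*
-  In this sketch, a parser is just a function from an input list to a pair of 
-  a result and the remaining input. Composing two @{text "one"}-parsers for 
-  @{text [quotes] "h"} and @{text [quotes] "e"} with @{text "seq"} and applying 
-  the result to the exploded string @{text [quotes] "hello"} returns the pair 
-  of both strings together with the remaining three characters, mirroring the 
-  example for @{ML "--"} given earlier.
-*}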
-
-section {* Parsing Theory Syntax *}
-
-text {*
-  (FIXME: context parser)
-
-  Most of the time, however, Isabelle developers have to deal with parsing
-  tokens, not strings.  These token parsers have the type:
-*}
-  
-ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
-
-text {*
-  The reason for using token parsers is that theory syntax, as well as the
-  parsers for the arguments of proof methods, use the type @{ML_type
-  OuterLex.token} (which is identical to the type @{ML_type
-  OuterParse.token}).  However, there are also handy parsers for
-  ML-expressions and ML-files.
-
-  \begin{readmore}
-  The parser functions for the theory syntax are contained in the structure
-  @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
-  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
-  \end{readmore}
-
-  The structure @{ML_struct OuterLex} defines several kinds of tokens (for example 
-  @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
-  @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
-  kind of tokens.
-*}  
-
-text {*
-  The first example shows how to generate a token list out of a string using
-  the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"}
-  since, at the moment, we are not interested in generating
-  precise error messages. The following code
-
-@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
-"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
- Token (\<dots>,(Space, \" \"),\<dots>), 
- Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
-
-  produces three tokens where the first and the last are identifiers, since
-  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
-  other syntactic category.\footnote{Note that because of a possible bug in
-  the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
-  the tokens.} The second indicates a space.
-
-  Many parsing functions later on will require spaces, comments and the like
-  to have already been filtered out.  So from now on we are going to use the 
-  functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:
-
-@{ML_response_fake [display,gray]
-"let
-   val input = OuterSyntax.scan Position.none \"hello world\"
-in
-   filter OuterLex.is_proper input
-end" 
-"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
-
-  For convenience we define the function:
-
-*}
-
-ML{*fun filtered_input str = 
-  filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}
-
-text {*
-
-  If you now parse
-
-@{ML_response_fake [display,gray] 
-"filtered_input \"inductive | for\"" 
-"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
- Token (\<dots>,(Keyword, \"|\"),\<dots>), 
- Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
-
-  you obtain a list consisting of only a command and two keyword tokens.
-  If you want to see which keywords and commands are currently known to Isabelle, type in
-  the following code (you might have to adjust the @{ML print_depth} in order to
-  see the complete list):
-
-@{ML_response_fake [display,gray] 
-"let 
-  val (keywords, commands) = OuterKeyword.get_lexicons ()
-in 
-  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
-end" 
-"([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
-
-  The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:
-
-@{ML_response [display,gray]
-"let 
-  val input1 = filtered_input \"where for\"
-  val input2 = filtered_input \"| in\"
-in 
-  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
-end"
-"((\"where\",\<dots>), (\"|\",\<dots>))"}
-
-  Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
-
-@{ML_response [display,gray]
-"let 
-  val input = filtered_input \"| in\"
-in 
-  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
-end"
-"((\"|\", \"in\"), [])"}
-
-  The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
-  list of items recognised by the parser @{text p}, where the items being parsed
-  are separated by the string @{text s}. For example:
-
-@{ML_response [display,gray]
-"let 
-  val input = filtered_input \"in | in | in foo\"
-in 
-  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
-end" 
-"([\"in\", \"in\", \"in\"], [\<dots>])"}
-
-  @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
-  be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
-  end of the parsed string, otherwise the parser would have consumed all
-  tokens and then failed with the exception @{text "MORE"}. Like in the
-  previous section, we can avoid this exception using the wrapper @{ML
-  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
-  OuterLex.stopper}. We can write:
-
-@{ML_response [display,gray]
-"let 
-  val input = filtered_input \"in | in | in\"
-in 
-  Scan.finite OuterLex.stopper 
-         (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
-end" 
-"([\"in\", \"in\", \"in\"], [])"}
-
-  The following function will help to run examples.
-
-*}
-
-ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
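-
-text {*
-  As a quick check of this helper (a sketch that reuses the parser and the input 
-  from the previous example, so the same result is expected):
-
-@{ML_response [display,gray]
-"let 
-  val input = filtered_input \"in | in | in\"
-in 
-  parse (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
-end" 
-"([\"in\", \"in\", \"in\"], [])"}
-*}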
-
-text {*
-
-  The function @{ML "OuterParse.!!!"} can be used to force termination of the
-  parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
-  except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
-  with a relatively precise description of the failure. For example:
-
-@{ML_response_fake [display,gray]
-"let 
-  val input = filtered_input \"in |\"
-  val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
-in 
-  parse (OuterParse.!!! parse_bar_then_in) input 
-end"
-"Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
-but keyword in was found\" raised"
-}
-
-  \begin{exercise} (FIXME)
-  A type-identifier, for example @{typ "'a"}, is a token of 
-  kind @{ML "TypeIdent" in OuterLex}. It can be parsed using 
-  the function @{ML OuterParse.type_ident}.
-  \end{exercise}
-
-  (FIXME: or give parser for numbers)
-
-  Whenever there is a possibility that the processing of user input can fail, 
-  it is a good idea to give as much information as possible about where the error 
-  occurred. For this, Isabelle can attach positional information to tokens
-  and then thread this information up the processing chain. To see this,
-  modify the function @{ML filtered_input} described earlier to 
-*}
-
-ML{*fun filtered_input' str = 
-       filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
-
-text {*
-  where we pretend the parsed string starts on line 7. An example is
-
-@{ML_response_fake [display,gray]
-"filtered_input' \"foo \\n bar\""
-"[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>),
- Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}
-
-  in which the @{text [quotes] "\\n"} causes the second token to be in 
-  line 8.
-
-  By using the parser @{ML OuterParse.position} you can decode the positional
-  information and return it as part of the parsed input. For example
-
-@{ML_response_fake [display,gray]
-"let
-  val input = (filtered_input' \"where\")
-in 
-  parse (OuterParse.position (OuterParse.$$$ \"where\")) input
-end"
-"((\"where\", {line=7, end_line=7}), [])"}
-
-  \begin{readmore}
-  The functions related to positions are implemented in the file
-  @{ML_file "Pure/General/position.ML"}.
-  \end{readmore}
-
-*}
-
-section {* Parsing Inner Syntax *}
-
-text {*
-  There is usually no need to write your own parser for parsing inner syntax, that is 
-  for terms and  types: you can just call the pre-defined parsers. Terms can 
-  be parsed using the function @{ML OuterParse.term}. For example:
-
-@{ML_response [display,gray]
-"let 
-  val input = OuterSyntax.scan Position.none \"foo\"
-in 
-  OuterParse.term input
-end"
-"(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
-
-  The function @{ML OuterParse.prop} is similar, except that it gives a different
-  error message when parsing fails. As you can see, the parser does not just return 
-  the parsed string, but also some encoded information. You can decode the
-  information with the function @{ML YXML.parse}. For example
-
-  @{ML_response [display,gray]
-  "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
-  "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
-
-  The result of the decoding is an XML-tree. You can see better what is going on if
-  you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
-
-@{ML_response [display,gray]
-"let 
-  val input = OuterSyntax.scan (Position.line 42) \"foo\"
-in 
-  YXML.parse (fst (OuterParse.term input))
-end"
-"XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
-  
-  The positional information is stored as part of an XML-tree so that code 
-  called later on will be able to give more precise error messages. 
-
-  \begin{readmore}
-  The functions to do with input and output of XML and YXML are defined 
-  in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
-  \end{readmore}
-  
-*}
-
-section {* Parsing Specifications\label{sec:parsingspecs} *}
-
-text {*
-  There are a number of special purpose parsers that help with parsing
-  specifications of function definitions, inductive predicates and so on. In
-  Chapter~\ref{chp:package}, for example, we will need to parse specifications
-  for inductive predicates of the form:
-*}
-
-simple_inductive
-  even and odd
-where
-  even0: "even 0"
-| evenS: "odd n \<Longrightarrow> even (Suc n)"
-| oddS: "even n \<Longrightarrow> odd (Suc n)"
-
-text {*
-  For this we are going to use the parser:
-*}
-
-ML %linenosgray{*val spec_parser = 
-     OuterParse.fixes -- 
-     Scan.optional 
-       (OuterParse.$$$ "where" |--
-          OuterParse.!!! 
-            (OuterParse.enum1 "|" 
-               (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
-
-text {*
-  Note that the parser does not parse the keyword \simpleinductive, even if it is
-  meant to process definitions as shown above. The parser of the keyword 
-  will be given by the infrastructure that will eventually call @{ML spec_parser}.
-  
-
-  To see what the parser returns, let us parse the string corresponding to the 
-  definition of @{term even} and @{term odd}:
-
-@{ML_response [display,gray]
-"let
-  val input = filtered_input
-     (\"even and odd \" ^  
-      \"where \" ^
-      \"  even0[intro]: \\\"even 0\\\" \" ^ 
-      \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ 
-      \"| oddS[intro]:  \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
-in
-  parse spec_parser input
-end"
-"(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
-     [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),
-      ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"),
-      ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"}
-
-  As you see, the result is a pair consisting of a list of
-  variables with optional type-annotation and syntax-annotation, and a list of
-  rules where every rule has optionally a name and an attribute.
-
-  The function @{ML OuterParse.fixes} in Line 2 of the parser reads an 
-  \isacommand{and}-separated 
-  list of variables that can include optional type annotations and syntax translations. 
-  For example:\footnote{Note that in the code we need to write 
-  @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
-  in the compound type.}
-
-@{ML_response [display,gray]
-"let
-  val input = filtered_input 
-        \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
-in
-   parse OuterParse.fixes input
-end"
-"([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn), 
-  (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
-  (blonk, NONE, NoSyn)],[])"}  
-*}
-
-text {*
-  Whenever types are given, they are stored in the @{ML SOME}s. The types are
-  not yet used to type the variables: this must be done by type-inference later
-  on. Since types are part of the inner syntax they are strings with some
-  encoded information (see previous section). If a syntax translation is
-  present for a variable, then it is stored in the @{ML Mixfix} datastructure;
-  no syntax translation is indicated by @{ML NoSyn}.
-
-  \begin{readmore}
-  The datastructure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
-  \end{readmore}
-
-  Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
-  list of introduction rules, that is propositions with theorem
-  annotations. The introduction rules are propositions parsed by @{ML
-  OuterParse.prop}. However, they can include an optional theorem name plus
-  some attributes. For example
-
-@{ML_response [display,gray] "let 
-  val input = filtered_input \"foo_lemma[intro,dest!]:\"
-  val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
-in 
-  (name, map Args.dest_src attrib)
-end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
- 
-  The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
-  @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name 
-  has to end with @{text [quotes] ":"}---see the argument of 
-  the function @{ML SpecParse.opt_thm_name} in Line 7.
-
-  \begin{readmore}
-  Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
-  and @{ML_file "Pure/Isar/args.ML"}.
-  \end{readmore}
-*}
-
-section {* New Commands and Keyword Files\label{sec:newcommand} *}
-
-text {*
-  (FIXME: update to the right command setup)
-
-  Often new commands, for example for providing new definitional principles,
-  need to be implemented. While this is not difficult on the ML-level,
-  new commands, in order to be useful, need to be recognised by
-  ProofGeneral. This results in some subtle configuration issues, which we
-  will explain in this section.
-
-  To keep things simple, let us start with a ``silly'' command that does nothing 
-  at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
-  defined as:
-*}
-
-ML{*let
-  val do_nothing = Scan.succeed (Toplevel.theory I)
-  val kind = OuterKeyword.thy_decl
-in
-  OuterSyntax.command "foobar" "description of foobar" kind do_nothing
-end *}
-
-text {*
-  The crucial function @{ML OuterSyntax.command} expects a name for the command, a
-  short description, a kind indicator (which we will explain later on more thoroughly) and a
-  parser producing a top-level transition function (its purpose will also be explained
-  later). 
-
-  While this is everything you have to do on the ML-level, you need a keyword
-  file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
-  recognise \isacommand{foobar} as a command. Such a keyword file can be
-  generated with the command-line:
-
-  @{text [display] "$ isabelle keywords -k foobar some_log_files"}
-
-  The option @{text "-k foobar"} determines the postfix of the keyword file's 
-  name. In the case above the file will be named @{text
-  "isar-keywords-foobar.el"}. This command requires log files to be
-  present (in order to extract the keywords from them). To generate these log
-  files, you first need to package the code above into a separate theory file named
-  @{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
-  complete code.
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-  \begin{figure}[t]
-  \begin{graybox}\small
-  \isacommand{theory}~@{text Command}\\
-  \isacommand{imports}~@{text Main}\\
-  \isacommand{begin}\\
-  \isacommand{ML}~@{text "\<verbopen>"}\\
-  @{ML
-"let
-  val do_nothing = Scan.succeed (Toplevel.theory I)
-  val kind = OuterKeyword.thy_decl
-in
-  OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing
-end"}\\
-  @{text "\<verbclose>"}\\
-  \isacommand{end}
-  \end{graybox}
-  \caption{\small The file @{text "Command.thy"} is necessary for generating a log 
-  file. This log file enables Isabelle to generate a keyword file containing 
-  the command \isacommand{foobar}.\label{fig:commandtheory}}
-  \end{figure}
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-  For our purposes it is sufficient to use the log files of the theories
-  @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
-  the log file for the theory @{text "Command.thy"}, which contains the new
-  \isacommand{foobar}-command. If you target other logics besides HOL, such
-  as Nominal or ZF, then you need to adapt the log files appropriately.
-  
-  @{text Pure} and @{text HOL} are usually compiled during the installation of
-  Isabelle. So log files for them should be already available. If not, then
-  they can be conveniently compiled with the help of the build-script from the Isabelle
-  distribution.
-
-  @{text [display] 
-"$ ./build -m \"Pure\"
-$ ./build -m \"HOL\""}
-  
-  The @{text "Pure-ProofGeneral"} theory needs to be compiled with:
-
-  @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}
-
-  For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory 
-  with:
-
-  @{text [display] "$ isabelle mkdir FoobarCommand"}
-
-  This generates a directory containing the files: 
-
-  @{text [display] 
-"./IsaMakefile
-./FoobarCommand/ROOT.ML
-./FoobarCommand/document
-./FoobarCommand/document/root.tex"}
-
-
-  You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
-  and add the line 
-
-  @{text [display] "use_thy \"Command\";"} 
-  
-  to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:
-
-  @{text [display] "$ isabelle make"}
-
-  If the compilation succeeds, you have finally created all the necessary log files. 
-  They are stored in the directory 
-  
-  @{text [display]  "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"}
-
-  or something similar depending on your Isabelle distribution and architecture.
-  One quick way to store this directory in a shell variable is to type
-
-  @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
- 
-  at the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the 
-  directory should include the files:
-
-  @{text [display] 
-"Pure.gz
-HOL.gz
-Pure-ProofGeneral.gz
-HOL-FoobarCommand.gz"} 
-
-  From them you can create the keyword files. Assuming the name 
-  of the directory is in  @{text "$ISABELLE_LOGS"},
-  then the Unix command for creating the keyword file is:
-
-@{text [display]
-"$ isabelle keywords -k foobar 
-   $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}
-
-  The result is the file @{text "isar-keywords-foobar.el"}. It should contain
-  the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
-  that @{text "grep foobar"} on this file returns something
-  non-empty.}  This keyword file needs to
-  be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
-  aware of this keyword file, you have to start Isabelle with the option @{text
-  "-k foobar"}, that is:
-
-
-  @{text [display] "$ isabelle emacs -k foobar a_theory_file"}
-
-  If you now build a theory on top of @{text "Command.thy"}, 
-  then the command \isacommand{foobar} can be used. 
-  The same procedure works for any other new command. 
-
-
-  At the moment \isacommand{foobar} is not very useful. Let us refine it a bit 
-  next by letting it take a proposition as argument and printing this proposition 
-  inside the tracing buffer. 
-
-  The crucial part of a command is the function that determines the behaviour
-  of the command. In the code above we used a ``do-nothing''-function, which
-  because of @{ML Scan.succeed} does not parse any argument, but immediately
-  returns the simple toplevel function @{ML "Toplevel.theory I"}. We can
-  replace this code by a function that first parses a proposition (using the
-  parser @{ML OuterParse.prop}), then prints out the tracing
-  information (using a new top-level function @{text trace_top_lvl}) and 
-  finally does nothing. For this you can write:
-*}
-
-ML{*let
-  fun trace_top_lvl str = 
-     Toplevel.theory (fn thy => (tracing str; thy))
-
-  val trace_prop = OuterParse.prop >> trace_top_lvl
-
-  val kind = OuterKeyword.thy_decl
-in
-  OuterSyntax.command "foobar" "traces a proposition" kind trace_prop
-end *}
-
-text {*
-  Now you can type
-
-  \begin{isabelle}
-  \isacommand{foobar}~@{text [quotes] "True \<and> False"}\\
-  @{text "> \"True \<and> False\""}
-  \end{isabelle}
-  
-  and see the proposition in the tracing buffer.  
-
-  Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator
-  for the command.  This means that the command finishes as soon as the
-  arguments are processed. Examples of this kind of command are
-  \isacommand{definition} and \isacommand{declare}.  In other cases,
-  commands are expected to parse some arguments, for example a proposition,
-  and then ``open up'' a proof in order to prove the proposition (for example
-  \isacommand{lemma}) or prove some other properties (for example
-  \isacommand{function}). To achieve this kind of behaviour, you have to use the kind
-  indicator @{ML thy_goal in OuterKeyword}.  Note, however, that once you change the 
-  ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword}, 
-  the keyword file needs to be re-created!
-
-  Below we change \isacommand{foobar} so that it takes a proposition as
-  argument and then starts a proof in order to prove it. Therefore in Line 13, 
-  we set the kind indicator to @{ML thy_goal in OuterKeyword}.
-*}
-
-ML%linenosgray{*let
-  fun set_up_thm str ctxt =
-    let
-      val prop = Syntax.read_prop ctxt str
-    in
-      Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt
-    end;
-  
-  val prove_prop = OuterParse.prop >>  
-      (fn str => Toplevel.print o 
-                    Toplevel.local_theory_to_proof NONE (set_up_thm str))
-  
-  val kind = OuterKeyword.thy_goal
-in
-  OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
-end *}
-
-text {*
-  The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
-  proved) and a context as arguments.  The context is necessary in order to be able to use
-  @{ML Syntax.read_prop}, which converts a string into a proper proposition.
-  In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
-  proposition. Its argument @{ML NONE} stands for a locale (which we chose to
-  omit); the argument @{ML "(K I)"} stands for a function that determines what
-  should be done with the theorem once it is proved (we chose to just forget
-  about it). Lines 9 to 11 contain the parser for the proposition.
-
-  If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following 
-  proof state:
-
-  \begin{isabelle}
-  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
-  @{text "goal (1 subgoal):"}\\
-  @{text "1. True \<and> True"}
-  \end{isabelle}
-
-  and you can build the proof
-
-  \begin{isabelle}
-  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
-  \isacommand{apply}@{text "(rule conjI)"}\\
-  \isacommand{apply}@{text "(rule TrueI)+"}\\
-  \isacommand{done}
-  \end{isabelle}
-
- 
-  
-  (FIXME What do @{ML "Toplevel.theory"} 
-  @{ML "Toplevel.print"} 
-  @{ML Toplevel.local_theory} do?)
-
-  (FIXME read a name and show how to store theorems)
-
-*}
-
-section {* Methods *}
-
-text {*
-  Methods are a central concept in Isabelle. You use them, for example,
-  in \isacommand{apply}. To print out all currently known methods, you can use the 
-  following Isabelle command:
-*}
-
-print_methods
-
-text {*
-  An example of a very simple method is the following code.
-*}
-
-method_setup %gray foobar_meth = 
- {* Scan.succeed
-      (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *}
-         "foobar method"
-
-text {*
-  It defines the method @{text foobar_meth}, which takes no arguments (therefore the
-  parser @{ML Scan.succeed}) and 
-  only applies, to the first subgoal, the theorem @{thm [source] conjE} as an
-  elimination rule and then @{thm [source] conjI} as an introduction rule.
-  This method can be used in the following proof
-*}
-
-lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
-apply(foobar_meth)
-txt {*
-  where it results in the goal state
-
-  \begin{minipage}{\textwidth}
-  @{subgoals}
-  \end{minipage} *}
-(*<*)oops(*>*)
-
-text {*
-  (FIXME: explain a version of rule-tac)
-*}
-
-(*<*)
-
-chapter {* Parsing *}
-
-text {*
-
-  Lots of Standard ML code is given in this document, for various reasons,
-  including:
-  \begin{itemize}
-  \item direct quotation of code found in the Isabelle source files,
-  or simplified versions of such code
-  \item identifiers found in the Isabelle source code, with their types 
-  (or specialisations of their types)
-  \item code examples, which can be run by the reader, to help illustrate the
-  behaviour of functions found in the Isabelle source code
-  \item ancillary functions, not from the Isabelle source code, 
-  which enable the reader to run relevant code examples
-  \item type abbreviations, which help explain the uses of certain functions
-  \end{itemize}
-
-*}
-
-section {* Parsing Isar input *}
-
-text {*
-
-  The typical parsing function has the type
-  \texttt{'src -> 'res * 'src}, with input  
-  of type \texttt{'src}, returning a result 
-  of type \texttt{'res}, which is (or is derived from) the first part of the
-  input, and also returning the remainder of the input.
-  (In the common case, when it is clear what the ``remainder of the input''
-  means, we will just say that the function ``returns'' the
-  value of type \texttt{'res}). 
-  An exception is raised if an appropriate value 
-  cannot be produced from the input.
-  A range of exceptions can be used to identify different reasons 
-  for the failure of a parse.
-  
-  This contrasts with the standard parsing function in Standard ML,
-  which is of type 
-  \texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option};
-  (for example, \texttt{List.getItem} and \texttt{Substring.getc}).
-  However, much of the discussion at 
-  FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html
-  is relevant.
-
-  Naturally one may convert between the two different sorts of parsing functions
-  as follows:
-  \begin{verbatim}
-  open StringCvt ;
-  type ('res, 'src) ex_reader = 'src -> 'res * 'src ;
-  (* ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader *)
-  fun ex_reader rdr src = Option.valOf (rdr src) ;
-  (* reader : ('res, 'src) ex_reader -> ('res, 'src) reader *)
-  fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;
-  \end{verbatim}
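-
-  As a small illustration of this conversion, here is a sketch in plain
-  Standard ML. It assumes only the Basis Library function
-  \texttt{Substring.getc}; the names \texttt{get}, \texttt{c}, \texttt{rest}
-  and \texttt{none} are made up for the example.
-  \begin{verbatim}
-  (* illustrative sketch, not taken from the Isabelle sources *)
-  fun ex_reader rdr src = Option.valOf (rdr src) ;
-  fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;
-  (* turn the option-returning reader into an exception-raising one *)
-  val get = ex_reader Substring.getc ;
-  val (c, rest) = get (Substring.full "ab") ;
-  (* converting back: on empty input the exception is caught again *)
-  val none = reader get (Substring.full "") ;
-  \end{verbatim}
-  Here \texttt{c} should be bound to the character \texttt{a} and
-  \texttt{none} to \texttt{NONE}.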
-  
-*}
-
-section{* The \texttt{Scan} structure *}
-
-text {* 
-  The source file is \texttt{src/Pure/General/scan.ML}.
-  This structure provides functions for using and combining parsing functions
-  of the type \texttt{'src -> 'res * 'src}.
-  Three exceptions are used:
-  \begin{verbatim}
-  exception MORE of string option;  (*need more input (prompt)*)
-  exception FAIL of string option;  (*try alternatives (reason of failure)*)
-  exception ABORT of string;        (*dead end*)
-  \end{verbatim}
-  Many functions in this structure (generally those with names composed of
-  symbols) are declared as infix.
-
-  Some functions from that structure are
-  \begin{verbatim}
-  |-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
-  'src -> 'res2 * 'src''
-  --| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
-  'src -> 'res1 * 'src''
-  -- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
-  'src -> ('res1 * 'res2) * 'src''
-  ^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->
-  'src -> string * 'src''
-  \end{verbatim}
-  These functions parse a result off the input source twice.
-
-  \texttt{|--} and \texttt{--|} 
-  return the second result and the first result, respectively.
-
-  \texttt{--} returns both.
-
-  \verb|^^| returns the result of concatenating the two results
-  (which must be strings).
-
-  Note how, although the types 
-  \texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,
-  the types as shown help suggest the behaviour of the functions.
-  \begin{verbatim}
-  :-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
-  'src -> ('res1 * 'res2) * 'src''
-  :|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
-  'src -> 'res2 * 'src''
-  \end{verbatim}
-  These are similar to \texttt{--} and \texttt{|--}, respectively,
-  except that the second parsing function can depend on the result of the first.
-  \begin{verbatim}
-  >> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'
-  || : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src
-  \end{verbatim}
-  \texttt{p >> f} applies a function \texttt{f} to the result of a parse.
-  
-  \texttt{||} tries a second parsing function if the first one
-  fails by raising an exception of the form \texttt{FAIL \_}.
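-
-  The behaviour of these combinators can be illustrated by the following
-  simplified re-implementations in plain Standard ML. This is only a sketch:
-  it is not the code of \texttt{scan.ML}, the fixity declarations are an
-  assumption, and \texttt{item} is a helper made up for the example.
-  \begin{verbatim}
-  exception FAIL of string option ;
-  infix 5 -- |-- --| ;
-  infix 3 >> ;
-  infix 0 || ;
-  (* run p, then q on the remaining input; keep both results *)
-  fun (p -- q) src =
-    let val (x, src') = p src ; val (y, src'') = q src'
-    in ((x, y), src'') end ;
-  (* as --, but keep only the second, respectively the first, result *)
-  fun (p |-- q) src = let val ((_, y), src') = (p -- q) src in (y, src') end ;
-  fun (p --| q) src = let val ((x, _), src') = (p -- q) src in (x, src') end ;
-  (* apply f to the result of p *)
-  fun (p >> f) src = let val (x, src') = p src in (f x, src') end ;
-  (* try q on the same input if p raises FAIL *)
-  fun (p || q) src = p src handle FAIL _ => q src ;
-  (* hypothetical helper: accept exactly the item c *)
-  fun item c (x :: xs) = if x = c then (x, xs) else raise FAIL NONE
-    | item _ [] = raise FAIL NONE ;
-  val r1 = (item "a" |-- item "b") ["a", "b", "c"] ;
-  val r2 = ((item "x" || item "a") >> String.size) ["a", "b"] ;
-  \end{verbatim}
-  With these definitions \texttt{r1} should be \verb|("b", ["c"])| and
-  \texttt{r2} should be \verb|(1, ["b"])|.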
-  
-  \begin{verbatim}
-  succeed : 'res -> ('src -> 'res * 'src) ;
-  fail : ('src -> 'res_src) ;
-  !! : ('src * string option -> string) -> 
-  ('src -> 'res_src) -> ('src -> 'res_src) ;
-  \end{verbatim}
-  \texttt{succeed r} returns \texttt{r}, with the input unchanged.
-  \texttt{fail} always fails, raising exception \texttt{FAIL NONE}.
-  \texttt{!! f} only affects the failure mode, turning a failure that
-  raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
-  This is used to prevent recovery from the failure ---
-  thus, in \texttt{!! parse1 || parse2}, if \texttt{parse1} fails, 
-  it won't recover by trying \texttt{parse2}.
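-
-  The effect of \texttt{!!} on recovery can be sketched in plain Standard ML
-  as follows. These are simplified definitions written for this example, not
-  the ones in \texttt{scan.ML}; in particular the fixity of \texttt{||} is an
-  assumption.
-  \begin{verbatim}
-  exception FAIL of string option ;
-  exception ABORT of string ;
-  infix 0 || ;
-  fun (p || q) src = p src handle FAIL _ => q src ;
-  fun succeed r src = (r, src) ;
-  fun fail _ = raise FAIL NONE ;
-  (* turn a FAIL of p into an ABORT, using err to build the message *)
-  fun !! err p src = p src handle FAIL msg => raise ABORT (err (src, msg)) ;
-  (* FAIL is caught by ||, so the alternative is tried *)
-  val r1 = (fail || succeed 0) [1, 2] ;
-  (* ABORT is not caught by ||, so it reaches the outer handler *)
-  val r2 = (!! (fn _ => "dead end") fail || succeed 0) [1, 2]
-    handle ABORT _ => (~1, []) ;
-  \end{verbatim}
-  Here \texttt{r1} should be \verb|(0, [1, 2])|, whereas \texttt{r2} should be
-  \verb|(~1, [])|, because the second alternative is never tried.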
-
-  \begin{verbatim}
-  one : ('si -> bool) -> ('si list -> 'si * 'si list) ;
-  some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;
-  \end{verbatim}
-  These require the input to be a list of items:
-  they fail, raising \texttt{MORE NONE} if the list is empty.
-  On other failures they raise \texttt{FAIL NONE}.
-
-  \texttt{one p} takes the first
-  item from the list if it satisfies \texttt{p}, otherwise fails.
-
-  \texttt{some f} takes the first
-  item from the list and applies \texttt{f} to it, failing if this returns
-  \texttt{NONE}.  
-
-  \begin{verbatim}
-  many : ('si -> bool) -> 'si list -> 'si list * 'si list ; 
-  \end{verbatim}
-  \texttt{many p} takes items from the input until it encounters one 
-  which does not satisfy \texttt{p}.  If it reaches the end of the input
-  it fails, raising \texttt{MORE NONE}.
-
-  \texttt{many1} (with the same type) fails if the first item 
-  does not satisfy \texttt{p}.  
-
-  \begin{verbatim}
-  option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)
-  optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)
-  \end{verbatim}
-  \texttt{option}: 
-  where the parser \texttt{f} succeeds with result \texttt{r} 
-  or raises \texttt{FAIL \_},
-  \texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}.
-  
-  \texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},
-  \texttt{optional f default} provides the result \texttt{default}.
-
-  \begin{verbatim}
-  repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
-  repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
-  bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src 
-  \end{verbatim}
-  \texttt{repeat f} repeatedly parses an item off the remaining input until 
-  \texttt{f} fails with \texttt{FAIL \_}.
-
-  \texttt{repeat1} is as for \texttt{repeat}, but requires at least one
-  successful parse.
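-
-  A minimal sketch of \texttt{optional}, \texttt{repeat} and \texttt{repeat1}
-  in plain Standard ML might look as follows. The definitions are simplified
-  for illustration and are not the ones in \texttt{scan.ML}; the parser
-  \texttt{digit} is made up for the example and, unlike \texttt{Scan.one},
-  raises \texttt{FAIL} rather than \texttt{MORE} on empty input.
-  \begin{verbatim}
-  exception FAIL of string option ;
-  fun optional p default src = p src handle FAIL _ => (default, src) ;
-  (* collect results until p raises FAIL; the failing input is kept *)
-  fun repeat p src =
-    (case (SOME (p src) handle FAIL _ => NONE) of
-       NONE => ([], src)
-     | SOME (x, src') =>
-         let val (xs, src'') = repeat p src' in (x :: xs, src'') end) ;
-  (* as repeat, but the first parse must succeed *)
-  fun repeat1 p src =
-    let val (x, src') = p src ; val (xs, src'') = repeat p src'
-    in (x :: xs, src'') end ;
-  (* hypothetical item parser for the example *)
-  fun digit (c :: cs) = if Char.isDigit c then (c, cs) else raise FAIL NONE
-    | digit [] = raise FAIL NONE ;
-  val r1 = repeat digit (String.explode "12a") ;
-  val r2 = optional digit #"0" (String.explode "x") ;
-  \end{verbatim}
-  Here \texttt{r1} should consist of the list of the digits 1 and 2 together
-  with the remaining input \texttt{a}, and \texttt{r2} of the default digit 0
-  together with the unchanged input \texttt{x}.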
-
-  \begin{verbatim}
-  lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))
-  \end{verbatim}
-  \texttt{lift} changes the source type of a parser by putting in an extra
-  component \texttt{'ex}, which is ignored in the parsing.
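-
-  The type of \texttt{lift} suggests a definition along the following lines.
-  This is a sketch in plain Standard ML, not the code of \texttt{scan.ML};
-  the parser \texttt{first} is made up for the example.
-  \begin{verbatim}
-  (* pass the extra component through unchanged *)
-  fun lift p (ex, src) = let val (res, src') = p src in (res, (ex, src')) end ;
-  (* hypothetical parser: take the first item of a list *)
-  fun first (x :: xs) = (x, xs)
-    | first [] = raise Empty ;
-  val r = lift first ("extra", [1, 2, 3]) ;
-  \end{verbatim}
-  Here \texttt{r} should be \verb|(1, ("extra", [2, 3]))|.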
-
-  The \texttt{Scan} structure also provides the type \texttt{lexicon}, 
-  HOW DO THEY WORK ?? TO BE COMPLETED
-  \begin{verbatim}
-  dest_lexicon: lexicon -> string list ;
-  make_lexicon: string list list -> lexicon ;
-  empty_lexicon: lexicon ;
-  extend_lexicon: string list list -> lexicon -> lexicon ;
-  merge_lexicons: lexicon -> lexicon -> lexicon ;
-  is_literal: lexicon -> string list -> bool ;
-  literal: lexicon -> string list -> string list * string list ;
-  \end{verbatim}
-  Two lexicons, for the commands and keywords, are stored and can be retrieved
-  by:
-  \begin{verbatim}
-  val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;
-  val commands = Scan.dest_lexicon command_lexicon ;
-  val keywords = Scan.dest_lexicon keyword_lexicon ;
-  \end{verbatim}
-*}
-
-section{* The \texttt{OuterLex} structure *}
-
-text {*
-  The source file is @{text "src/Pure/Isar/outer_lex.ML"}.
-  In some other source files its name is abbreviated:
-  \begin{verbatim}
-  structure T = OuterLex;
-  \end{verbatim}
-  This structure defines the type \texttt{token}.
-  (The types
-  \texttt{OuterLex.token},
-  \texttt{OuterParse.token} and
-  \texttt{SpecParse.token} are all the same).
-  
-  Input text is split up into tokens, and the input source type for many parsing
-  functions is \texttt{token list}.
-
-  The datatype definition (which is not published in the signature) is
-  \begin{verbatim}
-  datatype token = Token of Position.T * (token_kind * string);
-  \end{verbatim}
-  but here are some runnable examples for viewing tokens: 
-
-*}
-
-
-
-
-ML{*
-  val toks = OuterSyntax.scan Position.none
-   "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
-*}
-
-ML{*
-  print_depth 20 ;
-*}
-
-ML{*
-  map OuterLex.text_of toks ;
-*}
-
-ML{*
-  val proper_toks = filter OuterLex.is_proper toks ;
-*}  
-
-ML{*
-  map OuterLex.kind_of proper_toks 
-*}
-
-ML{*
-  map OuterLex.unparse proper_toks ;
-*}
-
-ML{*
-  OuterLex.stopper
-*}
-
-text {*
-
-  The function \texttt{is\_proper : token -> bool} identifies tokens which are
-  not white space or comments: many parsing functions require white space and
-  comments to have been filtered out.
-  
-  There is a special end-of-file token:
-  \begin{verbatim}
-  val (tok_eof : token, is_eof : token -> bool) = T.stopper ; 
-  (* end of file token *)
-  \end{verbatim}
-
-*}
-
-section {* The \texttt{OuterParse} structure *}
-
-text {*
-  The source file is \texttt{src/Pure/Isar/outer\_parse.ML}.
-  In some other source files its name is abbreviated:
-  \begin{verbatim}
-  structure P = OuterParse;
-  \end{verbatim}
-  Here the parsers use \texttt{token list} as the input source type. 
-  
-  Some of the parsers simply select the first token, provided that it is of the
-  right kind (as returned by \texttt{T.kind\_of}): these are 
-  \texttt{ command, keyword, short\_ident, long\_ident, sym\_ident, term\_var,
-  type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}.
-  Others select the first token, provided that it is one of several kinds
-  (e.g., \texttt{name, xname, text, typ}).
-
-  \begin{verbatim}
-  type 'a tlp = token list -> 'a * token list ; (* token list parser *)
-  $$$ : string -> string tlp
-  nat : int tlp ;
-  maybe : 'a tlp -> 'a option tlp ;
-  \end{verbatim}
-
-  \texttt{\$\$\$ s} returns the first token,
-  if it equals \texttt{s} \emph{and} \texttt{s} is a keyword.
-
-  \texttt{nat} returns the first token, if it is a number, and evaluates it.
-
-  \texttt{maybe}: if \texttt{p} returns \texttt{r}, 
-  then \texttt{maybe p} returns \texttt{SOME r} ;
-  if the first token is an underscore, it returns \texttt{NONE}.
-
-  A few examples:
-  \begin{verbatim}
-  P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)
-  P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)
-  val toks : token list = OuterSyntax.scan Position.none "44 ,_, 66,77" ;
-  val proper_toks = List.filter T.is_proper toks ;
-  P.list P.nat toks ; (* OK, doesn't recognize white space *)
-  P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)
-  P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)
-  P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)
-  val toks : token list = OuterSyntax.scan Position.none "44 and 55 and 66 and 77" ;
-  P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)
-  \end{verbatim}
-
-  The following code helps run examples:
-  \begin{verbatim}
-  fun parse_str tlp str = 
-  let val toks : token list = OuterSyntax.scan Position.none str ;
-  val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;
-  val (res, rem_toks) = tlp proper_toks ;
-  val rem_str = String.concat
-  (Library.separate " " (List.map T.unparse rem_toks)) ;
-  in (res, rem_str) end ;
-  \end{verbatim}
-
-  Some examples from \texttt{src/Pure/Isar/outer\_parse.ML}
-  \begin{verbatim}
-  val type_args =
-  type_ident >> Library.single ||
-  $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") ||
-  Scan.succeed [];
-  \end{verbatim}
-  There are three ways parsing a list of type arguments can succeed.
-  The first line reads a single type argument, and turns it into a singleton
-  list.
-  The second line reads "(", and then the remainder, ignoring the "(" ;
-  the remainder consists of a list of type identifiers (at least one),
-  and then a ")" which is also ignored.
-  The \texttt{!!!} ensures that if the parsing proceeds this far and then fails,
-  it won't try the third line (see the description of \texttt{Scan.!!}).
-  The third line consumes no input and returns the empty list.
-
-  \begin{verbatim}
-  fun triple2 (x, (y, z)) = (x, y, z);
-  val arity = xname -- ($$$ "::" |-- !!! (
-  Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) []
-  -- sort)) >> triple2;
-  \end{verbatim}
-  The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is
-  ignored), then optionally a list $ss$ of sorts and then another sort $s$.
-  The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$.
-  The second line reads the optional list of sorts:
-  it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored,
-  and between them a comma-separated list of sorts.
-  If this list is absent, the default \texttt{[]} provides the list of sorts.
-
-  \begin{verbatim}
-  parse_str P.type_args "('a, 'b) ntyp" ;
-  parse_str P.type_args "'a ntyp" ;
-  parse_str P.type_args "ntyp" ;
-  parse_str P.arity "ty :: tycl" ;
-  parse_str P.arity "ty :: (tycl1, tycl2) tycl" ;
-  \end{verbatim}
-
-*}
-
-section {* The \texttt{SpecParse} structure *}
-
-text {*
-  The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.
-  This structure contains token list parsers for more complicated values.
-  For example, 
-  \begin{verbatim}
-  open SpecParse ;
-  attrib : Attrib.src tlp ; 
-  attribs : Attrib.src list tlp ;
-  opt_attribs : Attrib.src list tlp ;
-  xthm : (thmref * Attrib.src list) tlp ;
-  xthms1 : (thmref * Attrib.src list) list tlp ;
-  
-  parse_str attrib "simp" ;
-  parse_str opt_attribs "hello" ;
-  val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;
-  map Args.dest_src ass ;
-  val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;
-  
-  parse_str xthm "mythm [attr]" ;
-  parse_str xthms1 "thm1 [attr] thms2" ;
-  \end{verbatim}
-  
-  As you can see, attributes are described using types of the \texttt{Args}
-  structure, described below.
-*}
-
-section{* The \texttt{Args} structure *}
-
-text {*
-  The source file is \texttt{src/Pure/Isar/args.ML}.
-  The primary type of this structure is the \texttt{src} datatype;
-  its single constructor is not published in the signature, but 
-  \texttt{Args.src} and \texttt{Args.dest\_src}
-  are in fact the constructor and destructor functions.
-  Note that the types \texttt{Attrib.src} and \texttt{Method.src}
-  are in fact \texttt{Args.src}.
-
-  \begin{verbatim}
-  src : (string * Args.T list) * Position.T -> Args.src ;
-  dest_src : Args.src -> (string * Args.T list) * Position.T ;
-  Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;
-  fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;
-
-  val thy = ML_Context.the_context () ;
-  val ctxt = ProofContext.init thy ;
-  map (pr_src ctxt) ass ;
-  \end{verbatim}
-
-  So an \texttt{Args.src} consists of the first word, then a list of further 
-  ``arguments'', of type \texttt{Args.T}, with information about position in the
-  input.
-  \begin{verbatim}
-  (* how an Args.src is parsed *)
-  P.position : 'a tlp -> ('a * Position.T) tlp ;
-  P.arguments : Args.T list tlp ;
-
-  val parse_src : Args.src tlp =
-  P.position (P.xname -- P.arguments) >> Args.src ;
-  \end{verbatim}
-
-  \begin{verbatim}
-  val ((first_word, args), pos) = Args.dest_src asrc ;
-  map Args.string_of args ;
-  \end{verbatim}
-
-  The \texttt{Args} structure contains more parsers and parser transformers 
-  for which the input source type is \texttt{Args.T list}.  For example,
-  \begin{verbatim}
-  type 'a atlp = Args.T list -> 'a * Args.T list ;
-  open Args ;
-  nat : int atlp ; (* also Args.int *)
-  thm_sel : PureThy.interval list atlp ;
-  list : 'a atlp -> 'a list atlp ;
-  attribs : (string -> string) -> Args.src list atlp ;
-  opt_attribs : (string -> string) -> Args.src list atlp ;
-  
-  (* parse_atl_str : 'a atlp -> (string -> 'a * string) ;
-  given an Args.T list parser, to get a string parser *)
-  fun parse_atl_str atlp str = 
-  let val (ats, rem_str) = parse_str P.arguments str ;
-  val (res, rem_ats) = atlp ats ;
-  in (res, String.concat (Library.separate " "
-  (List.map Args.string_of rem_ats @ [rem_str]))) end ;
-
-  parse_atl_str Args.int "-1-," ;
-  parse_atl_str (Scan.option Args.int) "x1-," ;
-  parse_atl_str Args.thm_sel "(1-,4,13-22)" ;
-
-  val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)
-  "[THEN trans [THEN sym], simp, OF sym]" ;
-  \end{verbatim}
-
-  From here, an attribute is interpreted using \texttt{Attrib.attribute}.
-
-  \texttt{Args} has a large number of functions which parse an \texttt{Args.src}
-  and also refer to a generic context.  
-  Note the use of \texttt{Scan.lift} for this.
-  (as does \texttt{Attrib} - RETHINK THIS)
-  
-  (\texttt{Args.syntax} shown below has type specialised)
-
-  \begin{verbatim}
-  type ('res, 'src) parse_fn = 'src -> 'res * 'src ;
-  type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ;
-  Scan.lift : 'a atlp -> 'a cgatlp ;
-  term : term cgatlp ;
-  typ : typ cgatlp ;
-  
-  Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ;
-  Attrib.thm : thm cgatlp ;
-  Attrib.thms : thm list cgatlp ;
-  Attrib.multi_thm : thm list cgatlp ;
-  
-  (* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;
-  given a (Context.generic * Args.T list) parser, to get a string parser *)
-  fun parse_cgatl_str cgatlp str = 
-  let 
-    (* use the current generic context *)
-    val generic = Context.Theory thy ;
-    val (ats, rem_str) = parse_str P.arguments str ;
-    (* ignore any change to the generic context *)
-    val (res, (_, rem_ats)) = cgatlp (generic, ats) ;
-  in (res, String.concat (Library.separate " "
-      (List.map Args.string_of rem_ats @ [rem_str]))) end ;
-  \end{verbatim}
-*}
-
-section{* Attributes, and the \texttt{Attrib} structure *}
-
-text {*
-  The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}.
-  The source file for the \texttt{Attrib} structure is
-  \texttt{src/Pure/Isar/attrib.ML}.
-  Most attributes use a theorem to change a generic context (for example, 
-  by declaring that the theorem should be used, by default, in simplification),
-  or change a theorem (which most often involves referring to the current
-  theory). 
-  The functions \texttt{Thm.rule\_attribute} and
-  \texttt{Thm.declaration\_attribute} create attributes of these kinds.
-
-  \begin{verbatim}
-  type attribute = Context.generic * thm -> Context.generic * thm;
-  type 'a trf = 'a -> 'a ; (* transformer of a given type *)
-  Thm.rule_attribute  : (Context.generic -> thm -> thm) -> attribute ;
-  Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;
-
-  Attrib.print_attributes : theory -> unit ;
-  Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;
-
-  List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;
-  \end{verbatim}
-
-  An attribute is stored in a theory as indicated by:
-  \begin{verbatim}
-  Attrib.add_attributes : 
-  (bstring * (src -> attribute) * string) list -> theory trf ; 
-  (*
-  Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ;
-  *)
-  \end{verbatim}
-  where the first and third arguments are name and description of the attribute,
-  and the second is a function which parses the attribute input text 
-  (including the attribute name, which has necessarily already been parsed).
-  Here, \texttt{THEN\_att} is a function declared in the code for the
-  structure \texttt{Attrib}, but not published in its signature.
-  The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of 
-  \texttt{Attrib.add\_attributes} to add a number of attributes.
-
-  \begin{verbatim}
-  FullAttrib.THEN_att : src -> attribute ;
-  FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;
-  FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;
-  \end{verbatim}
-
-  \begin{verbatim}
-  Attrib.syntax : attribute cgatlp -> src -> attribute ;
-  Attrib.no_args : attribute -> src -> attribute ;
-  \end{verbatim}
-  When this is called as \texttt{syntax scan src (gc, th)}
-  the generic context \texttt{gc} is used 
-  (and potentially changed to \texttt{gc'})
-  by \texttt{scan} in parsing to obtain an attribute \texttt{attr} which would
-  then be applied to \texttt{(gc', th)}.
-  The source for parsing the attribute is the arguments part of \texttt{src},
-  which must all be consumed by the parse.
-
-  For example, for \texttt{Attrib.no\_args attr src}, the attribute parser 
-  simply returns \texttt{attr}, requiring that the arguments part of
-  \texttt{src} must be empty.
-
-  Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified:
-  \begin{verbatim}
-  fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;
-  rot_att_n : int -> attribute ;
-  val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;
-  val rotated_att : src -> attribute =
-  Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;
-  
-  val THEN_arg : int cgatlp = Scan.lift 
-  (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;
-
-  Attrib.thm : thm cgatlp ;
-
-  THEN_arg -- Attrib.thm : (int * thm) cgatlp ;
-
-  fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;
-  THEN_att_n : int * thm -> attribute ;
-
-  val THEN_att : src -> attribute = Attrib.syntax
-  (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);
-  \end{verbatim}
-  The functions I've called \texttt{rot\_arg} and \texttt{THEN\_arg}
-  read an optional argument, which for \texttt{rotated} is an integer, 
-  and for \texttt{THEN} is a natural enclosed in square brackets;
-  the default, if the argument is absent, is 1 in each case.
-  Functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into
-  attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is
-  parsed by \texttt{Attrib.thm}.  
-  Infix operators \texttt{--} and \texttt{>>} are in the structure \texttt{Scan}.
-
-*}
-
-section{* Methods, and the \texttt{Method} structure *}
-
-text {*
-  The source file is \texttt{src/Pure/Isar/method.ML}.
-  The type \texttt{method} is defined by the datatype declaration
-  \begin{verbatim}
-  (* datatype method = Meth of thm list -> cases_tactic; *)
-  RuleCases.NO_CASES : tactic -> cases_tactic ;
-  \end{verbatim}
-  In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
-  \texttt{Meth}.
-  A \texttt{cases\_tactic} is an elaborated version of a tactic.
-  \texttt{NO\_CASES tac} is a \texttt{cases\_tactic} which consists of the
-  tactic \texttt{tac} without any further case information.
-  For further details see the description of structure \texttt{RuleCases} below.
-  The list of theorems to be passed to a method consists of the current
-  \emph{facts} in the proof.
-  
-  \begin{verbatim}
-  RAW_METHOD : (thm list -> tactic) -> method ;
-  METHOD : (thm list -> tactic) -> method ;
-  
-  SIMPLE_METHOD : tactic -> method ;
-  SIMPLE_METHOD' : (int -> tactic) -> method ;
-  SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;
-
-  RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;
-  METHOD_CASES : (thm list -> cases_tactic) -> method ;
-  \end{verbatim}
-  A method is, in its simplest form, a tactic; applying the method is to apply
-  the tactic to the current goal state.
-
-  Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying 
-  \texttt{tacf} to the current \emph{facts}, and applying that tactic to the
-  goal state.
-
-  \texttt{METHOD} is similar but also first applies
-  \texttt{Goal.conjunction\_tac} to all subgoals.
-
-  \texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
-  applies \texttt{tac}.
-
-  \texttt{SIMPLE\_METHOD' tacf} inserts the facts and then
-  applies \texttt{tacf} to subgoal 1.
-
-  \texttt{SIMPLE\_METHOD'' quant tacf} does this for subgoal(s) selected by
-  \texttt{quant}, which may be, for example,
-  \texttt{ALLGOALS} (all subgoals),
-  \texttt{TRYALL} (try all subgoals, failure is OK),
-  \texttt{FIRSTGOAL} (try subgoals until it succeeds once), 
-  \texttt{(fn tacf => tacf 4)} (subgoal 4), etc
-  (see the \texttt{Tactical} structure, FIXME) %%\cite[Chapter 4]{ref}).
-
-  A method is stored in a theory as indicated by:
-  \begin{verbatim}
-  Method.add_method : 
-  (bstring * (src -> Proof.context -> method) * string) -> theory trf ; 
-  ( *
-  * )
-  \end{verbatim}
-  where the first and third arguments are name and description of the method,
-  and the second is a function which parses the method input text 
-  (including the method name, which has necessarily already been parsed).
-
-  Here, \texttt{xxx} is a function declared in the code for the
-  structure \texttt{Method}, but not published in its signature.
-  The source file \texttt{src/Pure/Isar/method.ML} shows the use of 
-  \texttt{Method.add\_method} to add a number of methods.
-
-
-*}
-(*>*)
-end
\ No newline at end of file