theory Ind_Interface
imports "../Base" Simple_Inductive_Package
begin

section {* The Interface \label{sec:ind-interface} *}

text {*

  In order to add a new inductive predicate to a theory with the help of our
  package, the user must \emph{invoke} it. For every package, there are
  essentially two different ways of invoking it, which we will refer to as
  \emph{external} and \emph{internal}. By external invocation we mean that the
  package is called from within a theory document. In this case, the type of
  the inductive predicate, as well as its introduction rules, are given as
  strings by the user. Before the package can actually make the definition,
  the type and introduction rules have to be parsed. In contrast, internal
  invocation means that the package is called by some other package. For
  example, the function definition package \cite{Krauss-IJCAR06} calls the
  inductive definition package to define the graph of the function. However,
  it is not a good idea for the function definition package to pass the
  introduction rules for the function graph to the inductive definition
  package as strings. In this case, it is better to pass the rules directly to
  the package as a list of terms, which is more robust than handling strings
  that lack the structure of terms. These two ways of
  invoking the package are reflected in its ML programming interface, which
  consists of two functions:

  @{ML_chunk [display] SIMPLE_INDUCTIVE_PACKAGE}
*}

text {*
  The function for external invocation of the package is called @{ML
  add_inductive in SimpleInductivePackage}, whereas the one for internal
  invocation is called @{ML add_inductive_i in SimpleInductivePackage}. Both
  of these functions take as arguments the names and types of the inductive
  predicates, the names and types of their parameters, the actual introduction
  rules and a \emph{local theory}.  They return a local theory containing the
  definition, together with a tuple containing the introduction and induction
  rules, which are stored in the local theory, too.  In contrast to an
  ordinary theory, which simply consists of a type signature, as well as
  tables for constants, axioms and theorems, a local theory also contains
  additional context information, such as locally fixed variables and local
  assumptions that may be used by the package. The type @{ML_type
  local_theory} is identical to the type of \emph{proof contexts} @{ML_type
  "Proof.context"}, although not every proof context constitutes a valid local
  theory.  Note that @{ML add_inductive_i in SimpleInductivePackage} expects
  the types of the predicates and parameters to be specified using the
  datatype @{ML_type typ} of Isabelle's logical framework, whereas @{ML
  add_inductive in SimpleInductivePackage} expects them to be given as
  optional strings. If no string is given for a particular predicate or
  parameter, this means that the type should be inferred by the
  package. Additional \emph{mixfix syntax} may be associated with the
  predicates and parameters as well. Note that @{ML add_inductive_i in
  SimpleInductivePackage} does not allow mixfix syntax to be associated with
  parameters, since mixfix syntax is only relevant for parsing. The names of the
  predicates, parameters and rules are represented by the type @{ML_type
  Binding.binding}. Strings can be turned into elements of the type @{ML_type
  Binding.binding} using the function @{ML [display] "Binding.name : string ->
  Binding.binding"} Each introduction rule is given as a tuple containing its
  name, a list of \emph{attributes} and a logical formula. Note that the type
  @{ML_type Attrib.binding} used in the list of introduction rules is just a
  shorthand for the type @{ML_type "Binding.binding * Attrib.src list"}.  The
  function @{ML add_inductive_i in SimpleInductivePackage} expects the formula
  to be specified using the datatype @{ML_type term}, whereas @{ML
  add_inductive in SimpleInductivePackage} expects it to be given as a string.
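  For example, the @{text base} rule of the transitive closure predicate
  defined below would be passed to @{ML add_inductive in
  SimpleInductivePackage} as the tuple
  @{ML [display] "((Binding.name \"base\", []), \"trcl r x x\")"}
  where the empty list indicates that no attributes are attached to the rule.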
  An attribute specifies additional actions and transformations that should be
  applied to a theorem, such as storing it in the rule databases used by
  automatic tactics like the simplifier. The code of the package, which will
  be described in the following section, will mostly treat attributes as a
  black box and just forward them to other functions for storing theorems in
  local theories.  The implementation of the function @{ML add_inductive in
  SimpleInductivePackage} for external invocation of the package is quite
  simple. Essentially, it just parses the introduction rules and then passes
  them on to @{ML add_inductive_i in SimpleInductivePackage}:

  @{ML_chunk [display] add_inductive}

  For parsing and type checking the introduction rules, we use the function
  
  @{ML [display] "Specification.read_specification:
  (Binding.binding * string option * mixfix) list ->  (*{variables}*)
  (Attrib.binding * string list) list list ->  (*{rules}*)
  local_theory ->
  (((Binding.binding * typ) * mixfix) list *
   (Attrib.binding * term list) list) *
  local_theory"}
*}

text {*
  During parsing, both predicates and parameters are treated as variables, so
  the lists \verb!preds_syn! and \verb!params_syn! are just appended
  before being passed to @{ML read_specification in Specification}. Note that the format
  for rules supported by @{ML read_specification in Specification} is more general than
  what is required for our package. It allows several rules to be associated
  with one name, and the list of rules can be partitioned into several
  sublists. In order for the list \verb!intro_srcs! of introduction rules
  to be acceptable as an input for @{ML read_specification in Specification}, we first
  have to turn it into a list of singleton lists. This transformation
  has to be reversed later on by applying the function
  @{ML [display] "the_single: 'a list -> 'a"}
  to the list \verb!specs! containing the parsed introduction rules.
  The function @{ML read_specification in Specification} also returns the list \verb!vars!
  of predicates and parameters that contains the inferred types as well.
  This list has to be chopped into the two lists \verb!preds_syn'! and
  \verb!params_syn'! for predicates and parameters, respectively.
  All variables occurring in a rule but not in the list of variables passed to
  @{ML read_specification in Specification} will be bound by a meta-level universal
  quantifier.
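
  To illustrate these list manipulations with plain strings: the library
  function @{ML single} wraps an element into a singleton list, so
  @{ML "map single [\"base\", \"step\"]"} yields @{ML "[[\"base\"], [\"step\"]]"},
  and applying @{ML the_single} to each element of the result recovers the
  original list. The chopping can be done with the library function @{ML chop};
  for instance, @{ML "chop 1 [\"trcl\", \"r\"]"} yields @{ML "([\"trcl\"], [\"r\"])"}.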
*}

text {*
  Finally, @{ML read_specification in Specification} also returns another local theory,
  but we can safely discard it. As an example, let us look at how we can use this
  function to parse the introduction rules of the @{text trcl} predicate:

  @{ML_response [display]
"Specification.read_specification
  [(Binding.name \"trcl\", NONE, NoSyn),
   (Binding.name \"r\", SOME \"'a \<Rightarrow> 'a \<Rightarrow> bool\", NoSyn)]
  [[((Binding.name \"base\", []), [\"trcl r x x\"])],
   [((Binding.name \"step\", []), [\"trcl r x y \<Longrightarrow> r y z \<Longrightarrow> trcl r x z\"])]]
  @{context}"
"((\<dots>,
  [(\<dots>,
    [Const (\"all\", \<dots>) $ Abs (\"x\", TFree (\"'a\", \<dots>),
       Const (\"Trueprop\", \<dots>) $
         (Free (\"trcl\", \<dots>) $ Free (\"r\", \<dots>) $ Bound 0 $ Bound 0))]),
   (\<dots>,
    [Const (\"all\", \<dots>) $ Abs (\"x\", TFree (\"'a\", \<dots>),
       Const (\"all\", \<dots>) $ Abs (\"y\", TFree (\"'a\", \<dots>),
         Const (\"all\", \<dots>) $ Abs (\"z\", TFree (\"'a\", \<dots>),
           Const (\"==>\", \<dots>) $
             (Const (\"Trueprop\", \<dots>) $
               (Free (\"trcl\", \<dots>) $ Free (\"r\", \<dots>) $ Bound 2 $ Bound 1)) $
             (Const (\"==>\", \<dots>) $ \<dots> $ \<dots>))))])]),
 \<dots>)
: (((Binding.binding * typ) * mixfix) list *
   (Attrib.binding * term list) list) * local_theory"}

  In the list of variables passed to @{ML read_specification in Specification}, we have
  used the mixfix annotation @{ML NoSyn} to indicate that we do not want to associate any
  mixfix syntax with the variable. Moreover, we have only specified the type of \texttt{r},
  whereas the type of \texttt{trcl} is computed using type inference.
  The local variables \texttt{x}, \texttt{y} and \texttt{z} of the introduction rules
  are turned into bound variables with de Bruijn indices,
  whereas \texttt{trcl} and \texttt{r} remain free variables.

*}

text {*

  \paragraph{Parsers for theory syntax}

  Although the function @{ML add_inductive in SimpleInductivePackage} parses terms and types, it still
  cannot be used to invoke the package directly from within a theory document.
  In order to do this, we have to write another parser. Before we describe
  the process of writing parsers for theory syntax in more detail, we first
  show some examples of how we would like to use the inductive definition
  package.


  The definition of the transitive closure should look as follows:
*}

simple_inductive
  trcl for r :: "'a \<Rightarrow> 'a \<Rightarrow> bool"
where
  base: "trcl r x x"
| step: "trcl r x y \<Longrightarrow> r y z \<Longrightarrow> trcl r x z"
(*<*)
thm trcl_def
thm trcl.induct
thm base
thm step
thm trcl.intros

lemma trcl_strong_induct:
  assumes trcl: "trcl r x y"
  and I1: "\<And>x. P x x"
  and I2: "\<And>x y z. P x y \<Longrightarrow> trcl r x y \<Longrightarrow> r y z \<Longrightarrow> P x z"
  shows "P x y" 
proof -
  from trcl
  have "P x y \<and> trcl r x y"
  proof induct
    case (base x)
    from I1 and trcl.base show ?case ..
  next
    case (step x y z)
    then have "P x y" and "trcl r x y" by simp_all
    from `P x y` `trcl r x y` `r y z` have "P x z"
      by (rule I2)
    moreover from `trcl r x y` `r y z` have "trcl r x z"
      by (rule trcl.step)
    ultimately show ?case ..
  qed
  then show ?thesis ..
qed
(*>*)

text {* Even and odd numbers can be defined by *}

simple_inductive
  even and odd
where
  even0: "even 0"
| evenS: "odd n \<Longrightarrow> even (Suc n)"
| oddS: "even n \<Longrightarrow> odd (Suc n)"
(*<*)
thm even_def odd_def
thm even.induct odd.induct
thm even0
thm evenS
thm oddS
thm even_odd.intros
(*>*)

text {* The accessible part of a relation can be introduced as follows: *}

simple_inductive
  accpart for r :: "'a \<Rightarrow> 'a \<Rightarrow> bool"
where
  accpartI: "(\<And>y. r y x \<Longrightarrow> accpart r y) \<Longrightarrow> accpart r x"
(*<*)
thm accpart_def
thm accpart.induct
thm accpartI
(*>*)

text {*
  Moreover, it should also be possible to define the accessible part
  inside a locale fixing the relation @{text r}:
*}

locale rel =
  fixes r :: "'a \<Rightarrow> 'a \<Rightarrow> bool"

simple_inductive (in rel) accpart'
where
  accpartI': "\<And>x. (\<And>y. r y x \<Longrightarrow> accpart' y) \<Longrightarrow> accpart' x"
(*<*)
context rel
begin

thm accpartI'
thm accpart'.induct

end

thm rel.accpartI'
thm rel.accpart'.induct

ML{*val (result, lthy) = SimpleInductivePackage.add_inductive
  [(Binding.name "trcl'", NONE, NoSyn)] [(Binding.name "r", SOME "'a \<Rightarrow> 'a \<Rightarrow> bool", NoSyn)]
  [((Binding.name "base", []), "\<And>x. trcl' r x x"), ((Binding.name "step", []), "\<And>x y z. trcl' r x y \<Longrightarrow> r y z \<Longrightarrow> trcl' r x z")]
  (TheoryTarget.init NONE @{theory})
*}
(*>*)

text {*

  In this context, it is important to note that Isabelle distinguishes
  between \emph{outer} and \emph{inner} syntax. Theory commands such as
  \isa{\isacommand{simple{\isacharunderscore}inductive} $\ldots$ \isacommand{for} $\ldots$ \isacommand{where} $\ldots$}
  belong to the outer syntax, whereas items in quotation marks, in particular
  terms such as @{text [source] "trcl r x x"} and types such as
  @{text [source] "'a \<Rightarrow> 'a \<Rightarrow> bool"} belong to the inner syntax.
  Separating the two layers of outer and inner syntax greatly simplifies
  matters, because the parser for terms and types does not have to know
  anything about the possible syntax of theory commands, and the parser
  for theory commands need not be concerned about the syntactic structure
  of terms and types.

  \medskip
  \noindent
  The syntax of the \isa{\isacommand{simple{\isacharunderscore}inductive}} command
  can be described by the following railroad diagram:
  \begin{rail}
  'simple\_inductive' target? fixes ('for' fixes)? \\
  ('where' (thmdecl? prop + '|'))?
  ;
  \end{rail}

  \paragraph{Functional parsers}

  For parsing terms and types, Isabelle uses a rather general and sophisticated
  algorithm due to Earley, which is driven by \emph{priority grammars}.
  In contrast, parsers for theory syntax are built up using a set of combinators.
  Functional parsing using combinators is a well-established technique, which
  has been described by many authors, including Paulson \cite{paulson-ML-91}
  and Wadler \cite{Wadler-AFP95}. 
  The central idea is that a parser is a function of type @{ML_type "'a list -> 'b * 'a list"},
  where @{ML_type "'a"} is a type of \emph{tokens}, and @{ML_type "'b"} is a type for
  encoding items that the parser has recognized. When a parser is applied to a
  list of tokens whose prefix it can recognize, it returns an encoding of the
  prefix as an element of type @{ML_type "'b"}, together with the suffix of the list
  containing the remaining tokens. Otherwise, the parser raises an exception
  indicating a syntax error. The library for writing functional parsers in
  Isabelle can roughly be split up into two parts. The first part consists of a
  collection of generic parser combinators that are contained in the structure
  @{ML_struct Scan} defined in the file @{ML_file "Pure/General/scan.ML"} in the Isabelle
  sources. While these combinators do not make any assumptions about the concrete
  structure of the tokens used, the second part of the library consists of combinators
  for dealing with specific token types.
  The following is an excerpt from the signature of @{ML_struct Scan}:

  \begin{table}
  @{ML "|| : ('a -> 'b) * ('a -> 'b) -> 'a -> 'b"} \\
  @{ML "-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> ('b * 'd) * 'e"} \\
  @{ML "|-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'd * 'e"} \\
  @{ML "--| : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'b * 'e"} \\
  @{ML "optional: ('a -> 'b * 'a) -> 'b -> 'a -> 'b * 'a" in Scan} \\
  @{ML "repeat: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\
  @{ML "repeat1: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\
  @{ML ">> : ('a -> 'b * 'c) * ('b -> 'd) -> 'a -> 'd * 'c"} \\
  @{ML "!! : ('a * string option -> string) -> ('a -> 'b) -> 'a -> 'b"}
  \end{table}

  Interestingly, the functions shown above are so generic that they do not
  even rely on the input and output of the parser being a list of tokens.
  If \texttt{p} succeeds, i.e.\ does not raise an exception, the parser
  @{ML "p || q" for p q} returns the result of \texttt{p}, otherwise it returns
  the result of \texttt{q}. The parser @{ML "p -- q" for p q} first parses an
  item of type @{ML_type "'b"} using \texttt{p}, then passes the remaining tokens
  of type @{ML_type "'c"} to \texttt{q}, which parses an item of type @{ML_type "'d"}
  and returns the remaining tokens of type @{ML_type "'e"}, which are finally
  returned together with a pair of type @{ML_type "'b * 'd"} containing the two
  parsed items. The parsers @{ML "p |-- q" for p q} and @{ML "p --| q" for p q}
  work in a similar way, except that they
  discard the item parsed by the first and the second parser, respectively.
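  For example, using the parser @{ML "$$"}, which will be introduced below and
  recognizes a single given character:
  @{ML_response [display]
  "($$ \"h\" |-- $$ \"e\") [\"h\", \"e\", \"y\"]"
  "(\"e\", [\"y\"]) : string * string list"}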
  If \texttt{p} succeeds, the parser @{ML "optional p x" for p x in Scan} returns the result
  of \texttt{p}; otherwise it returns the default value \texttt{x}. The parser
  @{ML "repeat p" for p in Scan} applies \texttt{p} as often as it can, returning a possibly
  empty list of parsed items. The parser @{ML "repeat1 p" for p in Scan} is similar,
  but requires \texttt{p} to succeed at least once. The parser
  @{ML "p >> f" for p f} uses \texttt{p} to parse an item of type @{ML_type "'b"}, to which
  it applies the function \texttt{f} yielding a value of type @{ML_type "'d"}, which
  is returned together with the remaining tokens of type @{ML_type "'c"}.
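  For example, the following parser recognizes the prefix ``\texttt{he}'' and
  immediately concatenates the two parsed characters into a single string:
  @{ML_response [display]
  "($$ \"h\" -- $$ \"e\" >> (fn (a, b) => a ^ b)) [\"h\", \"e\", \"y\"]"
  "(\"he\", [\"y\"]) : string * string list"}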
  Finally, @{ML "!!"} is used for transforming exceptions produced by parsers.
  If \texttt{p} raises an exception indicating that it cannot parse a given input,
  then an enclosing parser such as
  @{ML [display] "q -- p || r" for p q r}
  will try the alternative parser \texttt{r}. By writing
  @{ML [display] "q -- !! err p || r" for err p q r}
  instead, one can ensure that a failure of \texttt{p} aborts the whole parser.
  The @{ML "!!"} operator is similar to the \emph{cut} operator in Prolog, which prevents
  the interpreter from backtracking. The \texttt{err} function supplied as an argument
  to @{ML "!!"} can be used to produce an error message depending on the current
  state of the parser, as well as the optional error message returned by \texttt{p}.
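  For example, in the hypothetical parser
  @{ML_text [display]
  "$$ \"h\" -- !! (fn _ => \"e expected after h\") ($$ \"e\")"}
  a successfully parsed ``\texttt{h}'' that is not followed by an
  ``\texttt{e}'' produces the error message \texttt{e expected after h},
  rather than allowing an enclosing parser to try an alternative.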
  
  So far, we have only looked at combinators that construct more complex parsers
  from simpler parsers. In order for these combinators to be useful, we also need
  some basic parsers. As an example, we consider the following two parsers
  defined in @{ML_struct Scan}:
  
  \begin{table}
  @{ML "one: ('a -> bool) -> 'a list -> 'a * 'a list" in Scan} \\
  @{ML "$$ : string -> string list -> string * string list"}
  \end{table}
  
  The parser @{ML "one pred" for pred in Scan} parses exactly one token that
  satisfies the predicate \texttt{pred}, whereas @{ML "$$ s" for s} only
  accepts a token that equals the string \texttt{s}. Note that we can easily
  express @{ML "$$ s" for s} using @{ML "one" in Scan}:
  @{ML [display] "one (fn s' => s' = s)" for s in Scan}
  As an example, let us look at how we can use @{ML "$$"} and @{ML "--"} to parse
  the prefix ``\texttt{hello}'' of the character list ``\texttt{hello world}'':
  
  @{ML_response [display]
  "($$ \"h\" -- $$ \"e\" -- $$ \"l\" -- $$ \"l\" -- $$ \"o\")
  [\"h\", \"e\", \"l\", \"l\", \"o\", \" \", \"w\", \"o\", \"r\", \"l\", \"d\"]"
  "(((((\"h\", \"e\"), \"l\"), \"l\"), \"o\"), [\" \", \"w\", \"o\", \"r\", \"l\", \"d\"])
  : ((((string * string) * string) * string) * string) * string list"}
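
  The parser @{ML "repeat" in Scan} can be demonstrated in the same setting:

  @{ML_response [display]
  "Scan.repeat ($$ \"l\") [\"l\", \"l\", \"o\"]"
  "([\"l\", \"l\"], [\"o\"]) : string list * string list"}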

  Most of the time, however, we will have to deal with tokens that are not just strings.
  The parsers for the theory syntax, as well as the parsers for the argument syntax
  of proof methods and attributes, use the token type @{ML_type OuterParse.token},
  which is identical to @{ML_type OuterLex.token}.
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
  In our parser, we will use the following functions:
  
  \begin{table}
  @{ML "$$$ : string -> token list -> string * token list" in OuterParse} \\
  @{ML "enum1: string -> (token list -> 'a * token list) -> token list ->
  'a list * token list" in OuterParse} \\
  @{ML "prop: token list -> string * token list" in OuterParse} \\
  @{ML "opt_target: token list -> string option * token list" in OuterParse} \\
  @{ML "fixes: token list ->
  (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\
  @{ML "for_fixes: token list ->
  (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\
  @{ML "!!! : (token list -> 'a) -> token list -> 'a" in OuterParse}
  \end{table}

  The parsers @{ML "$$$" in OuterParse} and @{ML "!!!" in OuterParse} are
  defined using the parsers @{ML "one" in Scan} and @{ML "!!"} from
  @{ML_struct Scan}.
  The parser @{ML "enum1 s p" for s p in OuterParse} parses a non-emtpy list of items
  recognized by the parser \texttt{p}, where the items are separated by \texttt{s}.
  A proposition can be parsed using the function @{ML prop in OuterParse}.
  Essentially, a proposition is just a string or an identifier, but using the
  specific parser function @{ML prop in OuterParse} leads to more instructive
  error messages, since the parser will complain that a proposition was expected
  when something other than a string or identifier is found.
  An optional locale target specification of the form \isa{(\isacommand{in}\ $\ldots$)}
  can be parsed using @{ML opt_target in OuterParse}.
  The lists of names of the predicates and parameters, together with optional
  types and syntax, are parsed using the functions @{ML "fixes" in OuterParse}
  and @{ML for_fixes in OuterParse}, respectively.
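  As a rough sketch of how these parsers fit together (assuming a scanner
  such as @{ML_text "OuterSyntax.scan"} that turns a string into a list of
  tokens, and @{ML_text "OuterLex.is_proper"} for filtering out white space
  tokens), @{ML "enum1" in OuterParse} and @{ML prop in OuterParse} could be
  exercised as follows:

  @{ML_text [display]
  "let
  val input = filter OuterLex.is_proper (OuterSyntax.scan \"A | B | C\")
in
  OuterParse.enum1 \"|\" OuterParse.prop input
end"}

  This should return the three propositions @{text A}, @{text B} and @{text C}
  as strings, together with the empty remainder of the input.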
  In addition, the following function from @{ML_struct SpecParse} for parsing
  an optional theorem name and attribute, followed by a delimiter, will be useful:
  
  \begin{table}
  @{ML "opt_thm_name:
  string -> token list -> Attrib.binding * token list" in SpecParse}
  \end{table}

  We now have all the necessary tools to write the parser for our
  \isa{\isacommand{simple{\isacharunderscore}inductive}} command:
  
  @{ML_chunk [display] syntax}

  The definition of the parser \verb!ind_decl! closely follows the railroad
  diagram shown above. In order to make the code more readable, the structures
  @{ML_struct OuterParse} and @{ML_struct OuterKeyword} are abbreviated by
  \texttt{P} and \texttt{K}, respectively. Note how the parser combinator
  @{ML "!!!" in OuterParse} is used: once the keyword \texttt{where}
  has been parsed, a non-empty list of introduction rules must follow.
  Had we not used the combinator @{ML "!!!" in OuterParse}, a
  \texttt{where} not followed by a list of rules would have caused the parser
  to respond with the somewhat misleading error message

  \begin{verbatim}
  Outer syntax error: end of input expected, but keyword where was found
  \end{verbatim}

  rather than with the more instructive message

  \begin{verbatim}
  Outer syntax error: proposition expected, but terminator was found
  \end{verbatim}

  Once all arguments of the command have been parsed, we apply the function
  @{ML add_inductive in SimpleInductivePackage}, which yields a local theory
  transformer of type @{ML_type "local_theory -> local_theory"}. Commands in
  Isabelle/Isar are realized by transition transformers of type
  @{ML_type [display] "Toplevel.transition -> Toplevel.transition"}.
  We can turn a local theory transformer into a transition transformer by using
  the function

  @{ML [display] "Toplevel.local_theory : string option ->
  (local_theory -> local_theory) ->
  Toplevel.transition -> Toplevel.transition"}
 
  which, apart from the local theory transformer, takes an optional name of a locale
  to be used as a basis for the local theory. 

  (FIXME : needs to be adjusted to new parser type)

  {\it
  The whole parser for our command has type
  @{ML_text [display] "OuterLex.token list ->
  (Toplevel.transition -> Toplevel.transition) * OuterLex.token list"}
  which is abbreviated by @{ML_text OuterSyntax.parser_fn}. The new command can be added
  to the system via the function
  @{ML_text [display] "OuterSyntax.command :
  string -> string -> OuterKeyword.T -> OuterSyntax.parser_fn -> unit"}
  which imperatively updates the parser table behind the scenes. }

  In addition to the parser, this
  function takes two strings representing the name of the command and a short description,
  as well as an element of type @{ML_type OuterKeyword.T} describing which \emph{kind} of
  command we intend to add. Since we want to add a command for declaring new concepts,
  we choose the kind @{ML "OuterKeyword.thy_decl"}. Other kinds include
  @{ML "OuterKeyword.thy_goal"}, which is similar to @{ML thy_decl in OuterKeyword},
  but requires the user to prove a goal before making the declaration, or
  @{ML "OuterKeyword.diag"}, which corresponds to a purely diagnostic command that does
  not change the context. For example, the @{ML thy_goal in OuterKeyword} kind is used
  by the \isa{\isacommand{function}} command \cite{Krauss-IJCAR06}, which requires the user
  to prove that a given set of equations is non-overlapping and covers all cases. The kind
  of the command should be chosen with care, since selecting the wrong one can cause strange
  behaviour of the user interface, such as failure of the undo mechanism.
*}

(*<*)
end
(*>*)