added more to the unification section
authorChristian Urban <urbanc@in.tum.de>
Tue, 10 Nov 2009 04:57:42 +0100
changeset 381 97518188ef0e
parent 380 0dc727055c11
child 382 3f153aa4f231
added more to the unification section
ProgTutorial/General.thy
progtutorial.pdf
--- a/ProgTutorial/General.thy	Mon Nov 09 10:07:08 2009 +0100
+++ b/ProgTutorial/General.thy	Tue Nov 10 04:57:42 2009 +0100
@@ -677,25 +677,25 @@
 section {* Unification and Matching (TBD) *}
 
 text {* 
-  Isabelle's terms and types contain schematic term variables
+  Isabelle's terms and types may contain schematic term variables
   (term-constructor @{ML Var}) and schematic type variables (term-constructor
-  @{ML TVar}). Both are printed out as variables with a leading question
-  mark. They stand for unknown terms and types, which can be made more concrete
-  by instantiations. Such instantiations can be calculated by unification or
-  matching. While in case of types, unification and matching is relatively
-  straightforward, in case of terms the algorithms are substantially more
-  complicated, because terms need higher-order versions of the unification and
-  matching algorithms. Below we shall use the antiquotation @{text "@{typ_pat \<dots>}"} and
-  @{text "@{term_pat \<dots>}"} from Section~\ref{sec:antiquote} in order to
-  construct examples involving schematic variables.
-
-  Let us begin with describing unification and matching of types.  Both
-  matching and unification of types return a type environment, ML-type
-  @{ML_type "Type.tyenv"}, which maps schematic type variables to types (and
-  sorts). Below we use the function @{ML_ind typ_unify in Sign} from the 
-  structure @{ML_struct Sign} for unifying
-  the types @{text "?'a * ?'b"} and @{text "?'b list * nat"}. This will
-  produce the mapping @{text "[?'a := ?'b list, ?'b := nat]"}.
+  @{ML TVar}). These variables stand for unknown entities, which can be made
+  more concrete by instantiations. Such instantiations might be a result of
+  unification or matching. While in case of types, unification and matching is
+  relatively straightforward, in case of terms the algorithms are
+  substantially more complicated, because terms need higher-order versions of
+  the unification and matching algorithms. Below we shall use the
+  antiquotations @{text "@{typ_pat \<dots>}"} and @{text "@{term_pat \<dots>}"} from
+  Section~\ref{sec:antiquote} in order to construct examples involving
+  schematic variables.
+
+  Let us begin with describing the unification and the matching function for
+  types.  Both return type environments, ML-type @{ML_type "Type.tyenv"},
+  which map schematic type variables to types (and sorts). Below we use the
+  function @{ML_ind typ_unify in Sign} from the structure @{ML_struct Sign}
+  for unifying the types @{text "?'a * ?'b"} and @{text "?'b list *
+  nat"}. This will produce the mapping, or type environment, @{text "[?'a :=
+  ?'b list, ?'b := nat]"}.
 *}
 
 ML{*val (tyenv_unif, _) = let
@@ -706,49 +706,124 @@
 end*}
 
 text {* 
-  The value @{ML_ind empty in Vartab} stands for the empty type environment, 
-  which is the value for starting the unification without any 
-  instantiations.\footnote{\bf FIXME: what is 0?}  In case of a failure 
-  @{ML typ_unify in Sign} will throw the exception @{ML_text TUNIFY}.
+  The environment @{ML_ind "Vartab.empty"} stands for the empty type environment, 
+  which is needed for starting the unification without any 
+  (pre)instantiations.  In case of a failure 
+  @{ML typ_unify in Sign} will throw the exception @{text TUNIFY}.
   We can print out the resulting type environment @{ML tyenv_unif} 
-  with the built-in function @{ML_ind dest in Vartab}.
+  with the built-in function @{ML_ind dest in Vartab} from the structure
+  @{ML_struct Vartab}.
 
   @{ML_response_fake [display,gray]
   "Vartab.dest tyenv_unif"
   "[((\"'a\", 0), ([\"HOL.type\"], \"?'b List.list\")), 
- ((\"'b\", 0), ([\"HOL.type\"], \"nat\"))]"}
-
-  For what follows we will use the following pretty-printing function
-  for @{ML_type "Type.tyenv"}s
+ ((\"'b\", 0), ([\"HOL.type\"], \"nat\"))]"} 
 *}
 
-ML{*fun pretty_tyenv ctxt tyenv =
+text_raw {*
+  \begin{figure}[t]
+  \begin{minipage}{\textwidth}
+  \begin{isabelle}*}
+ML{*fun pretty_helper f env =
+  env |> Vartab.dest  
+      |> map f 
+      |> commas 
+      |> enclose "[" "]" 
+      |> tracing
+
+fun pretty_tyenv ctxt tyenv =
 let
   fun aux (v, (s, T)) =
         Syntax.string_of_typ ctxt (TVar (v, s)) 
-        ^ " := " ^ Syntax.string_of_typ ctxt T
-in
-  tyenv |> Vartab.dest
-        |> map aux 
-        |> commas
-        |> enclose "[" "]"
-        |> tracing
+          ^ " := " ^ Syntax.string_of_typ ctxt T
+in 
+  pretty_helper aux tyenv
 end*}
+text_raw {*
+  \end{isabelle}
+  \end{minipage}
+  \caption{A pretty printing function for type environments, which are 
+  produced by unification and matching.\label{fig:prettyenv}}
+  \end{figure}
+*}
 
 text {*
-  which prints for the type environment above
+  The first components in this list stand for the schematic type variables and
+  the second are the associated sorts and types. In what follows we will use
+  the pretty-printing function in Figure~\ref{fig:prettyenv} for @{ML_type "Type.tyenv"}s.
+  For the type environment from the example this function prints out the 
+  more legible: 
 
   @{ML_response_fake [display, gray]
   "pretty_tyenv @{context} tyenv_unif"
   "[?'a := ?'b list, ?'b := nat]"}
 
-  Observe that the type environment that the function @{ML typ_unify in
+  The index @{text 0} in the example above is the maximal index of the schematic 
+  type variables occuring in @{text ty1} and @{text ty2}. It will be increased
+  whenever a new schematic type variable is introduced during unification.
+  This is for example the case when two schematic type variables have different, 
+  incomparable sorts. Then a new schematic type variable is introduced with 
+  the combined sorts. To show this let us use two sorts, @{text "s1"} and @{text "s2"}, 
+  which we attach to the schematic type variables @{text "?'a"} and @{text "?'b"}. 
+*}
+
+ML{*val (tyenv, index) = let
+  val ty2 = @{typ_pat "?'a::s1"}
+  val ty1 = @{typ_pat "?'b::s2"}
+in
+  Sign.typ_unify @{theory} (ty1, ty2) (Vartab.empty, 0) 
+end*}
+
+text {*
+  To print out the calculated type environment we also switch on the printing 
+  of sort information.
+*}
+
+ML{*show_sorts := true*}
+
+text {*
+  Now we can see
+
+  @{ML_response_fake [display,gray]
+  "pretty_tyenv @{context} tyenv"
+  "[?'a::s1 := ?'a1::{s1,s2}, ?'b::s2 := ?'a1::{s1,s2}]"}
+
+  the type variables @{text "?'a"} and @{text "?'b"} are instantiated
+  with a new type variable @{text "?'a1"} with sort @{text "{s1,s2}"}. Since a new
+  type variable has been introduced the @{ML index}, originally being @{text 0}, 
+  has been increased by @{text 1}.
+
+  @{ML_response [display,gray]
+  "index"
+  "1"}
+*}(*<*)ML %linenos{*show_sorts := false*}(*>*)
+
+text {*
+  The way the unification function @{ML typ_unify in Sign} is implemented 
+  using an initial type environment and initial index makes it easy to
+  unifying of more than two terms. For example 
+*}
+
+ML{*val (tyenvs, _) = let
+  val tys1 = (@{typ_pat "?'a"}, @{typ_pat "?'b list"})
+  val tys2 = (@{typ_pat "?'b"}, @{typ_pat "nat"})
+in
+  fold (Sign.typ_unify @{theory}) [tys1, tys2] (Vartab.empty, 0) 
+end*}
+
+text {*
+  Let us now return again to the unifier from the first example.
+
+  @{ML_response_fake [display, gray]
+  "pretty_tyenv @{context} tyenv_unif"
+  "[?'a := ?'b list, ?'b := nat]"}
+
+  Observe that the type environment which the function @{ML typ_unify in
   Sign} returns is in \emph{not} an instantiation in fully solved form: while @{text
   "?'b"} is instantiated to @{typ nat}, this is not propagated to the
   instantiation for @{text "?'a"}.  In unification theory, this is often
-  called an instantiation in \emph{triangular form}. These instantiations
-  are used because of performance reasons.
-
+  called an instantiation in \emph{triangular form}. These not fully solved 
+  instantiations are used because of performance reasons.
 
   To apply a type environment in triangular form to a type, say @{text "?'a *
   ?'b"}, you should use the function @{ML_ind norm_type in Envir}:
@@ -756,12 +831,14 @@
   @{ML_response_fake [display, gray]
   "Envir.norm_type tyenv_unif @{typ_pat \"?'a * ?'b\"}"
   "nat list * nat"}
-*}
-
-text {* 
+
+  With this function the type variables @{text "?'a"} and @{text "?'b"} are 
+  correctly instantiated.
+
   Matching of types can be done with the function @{ML_ind typ_match in Sign}
   from the structure @{ML_struct Sign}. It also returns a @{ML_type
-  Type.tyenv}. For example
+  Type.tyenv} and might raise the exception @{text TYPE_MATCH} in case
+  of failure. For example
 *}
 
 ML{*val tyenv_match = let
@@ -772,23 +849,24 @@
 end*}
 
 text {*
-  In case of failure, @{ML typ_match in Sign} will throw the exception 
-  @{ML_text TYPE_MATCH}. Printing out the calculated matcher gives
+  Printing out the calculated matcher gives
 
   @{ML_response_fake [display,gray]
   "pretty_tyenv @{context} tyenv_match"
   "[?'a := bool list, ?'b := nat]"}
   
-  Applying the matcher to a type needs to be done with the function
+  Unlike the unification, which uses the function @{ML norm_type in Envir}, 
+  applying the matcher to a type needs to be done with the function
   @{ML_ind subst_type in Envir}.
 
   @{ML_response_fake [display, gray]
   "Envir.subst_type tyenv_match @{typ_pat \"?'a * ?'b\"}"
   "bool list * nat"}
 
-  Be careful to observe the difference: use @{ML subst_type in Envir} 
-  for matchers and @{ML norm_type in Envir} for unifiers. To make the 
-  difference explicit let us calculate the following matcher.
+  Be careful to observe the difference: use always 
+  @{ML subst_type in Envir} for matchers and @{ML norm_type in Envir} 
+  for unifiers. To make the difference explicit, let us calculate the 
+  following matcher:
 *}
 
 ML{*val tyenv_match' = let
@@ -799,17 +877,15 @@
 end*}
 
 text {*
-  Now @{ML tyenv_unif} is equal to @{ML tyenv_match'}. Consider
-  again the type @{text "?'a * ?'b"}, which we used in the unification
-  and matching problems. If we apply @{ML norm_type in Envir} to 
-  the matcher we obtain
+  Now @{ML tyenv_unif} is equal to @{ML tyenv_match'}. If we apply 
+  @{ML norm_type in Envir} to the matcher we obtain
 
   @{ML_response_fake [display, gray]
   "Envir.norm_type tyenv_match' @{typ_pat \"?'a * ?'b\"}"
   "nat list * nat"}
 
   which is not a matcher for the second matching problem, and if
-  we apply @{ML subst_type in Envir} to the unifier we obtain
+  we apply @{ML subst_type in Envir} to the unifyier we obtain
 
   @{ML_response_fake [display, gray]
   "Envir.subst_type tyenv_unif @{typ_pat \"?'a * ?'b\"}"
@@ -829,16 +905,129 @@
 *}
 
 text {*
-  TBD below
-
-
+  Unification and matching of terms is substantially more complicated than the
+  type-case. The reason is that terms have abstractions and unification
+  or matching modulo plain equality is often not meaningful in this
+  context. Nevertheless, Isabelle implements the function @{ML_ind
+  first_order_match in Pattern} for terms.  This matching function returns a
+  type environment and a term environment.  To pretty print the latter we use
+  the function @{text "pretty_env"}:
+*}
+
+ML{*fun pretty_env ctxt env =
+let
+  fun aux (v, (T, t)) =
+        string_of_term ctxt (Var (v, T)) ^ " := " ^ string_of_term ctxt t
+in
+  pretty_helper aux env 
+end*}
+
+text {*
+  As can be seen from the @{text "aux"}-function, a term environment associates 
+  a schematic term variable with a type and a term.  An example of a first-order 
+  matching problem is the term @{term "P (\<lambda>a b. Q b a)"} and the pattern 
+  @{text "?X ?Y"}.
+*}
+
+ML{*val (_, fo_env) = let
+  val fo_pat = @{term_pat "(?X::(nat\<Rightarrow>nat\<Rightarrow>nat)\<Rightarrow>bool) ?Y"}
+  val trm = @{term "P::(nat\<Rightarrow>nat\<Rightarrow>nat)\<Rightarrow>bool"} 
+              $ @{term "\<lambda>a b. (Q::nat\<Rightarrow>nat\<Rightarrow>nat) b a"}
+  val init = (Vartab.empty, Vartab.empty) 
+in
+  Pattern.first_order_match @{theory} (fo_pat, trm) init
+end *}
+
+text {*
+  In this example we annotated explicitly types because then 
+  the type environment is empty and can be ignored. The 
+  resulting term environment is:
+
+  @{ML_response_fake [display, gray]
+  "pretty_env @{context} fo_env"
+  "[?X := P, ?Y := \<lambda>a b. Q b a]"}
+
+  The matcher can be applied to a term using the function @{ML_ind subst_term
+  in Envir} (remember the same convention for types applies to terms: @{ML
+  subst_term in Envir} is for matchers and @{ML norm_term in Envir} is for
+  unifyiers). The function @{ML subst_term in Envir} expects a type environment,
+  which is set to empty in the example below, and a term environment.
+
+  @{ML_response_fake [display, gray]
+  "let
+  val trm = @{term_pat \"(?X::(nat\<Rightarrow>nat\<Rightarrow>nat)\<Rightarrow>bool) ?Y\"}
+in
+  Envir.subst_term (Vartab.empty, fo_env) trm
+  |> string_of_term @{context}
+  |> tracing
+end"
+  "P (\<lambda>a b. Q b a)"}
+
+  First-order matching is useful for matching against applications and
+  variables. It can deal also with abstractions and a limited form of
+  alpha-equivalence, but this kind of matching should be used with care, since
+  it is not clear whether the result is meaningful. A working example is
+  matching @{text "\<lambda>x. P x"} against the pattern @{text "\<lambda>y. ?X y"}. In this
+  case first-order matching produces @{text "[?X := P]"}.
+
+  @{ML_response_fake [display, gray]
+  "let
+  val fo_pat = @{term_pat \"\<lambda>y. (?X::nat\<Rightarrow>bool) y\"}
+  val trm = @{term \"\<lambda>x. (P::nat\<Rightarrow>bool) x\"} 
+  val init = (Vartab.empty, Vartab.empty) 
+in
+  Pattern.first_order_match @{theory} (fo_pat, trm) init
+  |> snd 
+  |> pretty_env @{context} 
+end"
+  "[?X := P]"}
+*}
+
+text {*
+  In the context of unifying abstractions, more thouroughly studied is
+  higher-order pattern unification and higher-order pattern matching.  A
+  \emph{\index*{pattern}} is a lambda term whose ``head symbol'' (that is the
+  first symbol under abstractions) is either a constant, or a schematic or a free
+  variable. If it is a schematic variable then it can be only applied by
+  distinct bound variables. This excludes that a schematic variable is an
+  argument of a schematic variable and that a schematic variable is applied
+  twice to the same bound variable. The function @{ML_ind pattern in Pattern}
+  in the structure @{ML_struct Pattern} tests whether a term satisfies these
+  restrictions.
+
+
+  @{ML_response [display, gray]
+  "let
+  val trm_list = 
+        [@{term_pat \"?X\"},              @{term_pat \"a\"},
+         @{term_pat \"\<lambda>a b. ?X a b\"},    @{term_pat \"\<lambda>a b. (op +) a b\"},
+         @{term_pat \"\<lambda>a. (op +) a ?Y\"}, @{term_pat \"?X ?Y\"},
+         @{term_pat \"\<lambda>a b. ?X a b ?Y\"}, @{term_pat \"\<lambda>a. ?X a a\"}] 
+in
+  map Pattern.pattern trm_list
+end"
+  "[true, true, true, true, true, false, false, false]"}
+
+  The point of restricting unification and matching to patterns is that
+  it is decidable and produces most general unifiers. In this way 
+  matching and unify can be implemented so that they produce a type
+  and term environment (the latter actually produces only a term 
+  environment).
+*}
+
+
+text {*
   \begin{readmore}
   For term, unification and matching of higher-order patterns is implemented in 
-  @{ML_file "Pure/pattern.ML"}. Full higher-order unification is implemented
+  @{ML_file "Pure/pattern.ML"}. This file also contains a first-order matcher
+  for terms. Full higher-order unification is implemented
   in @{ML_file "Pure/unify.ML"}. 
   \end{readmore}
 *}
 
+
+
+
 section {* Type-Checking\label{sec:typechecking} *}
 
 text {* 
Binary file progtutorial.pdf has changed