diff -r 0dc727055c11 -r 97518188ef0e ProgTutorial/General.thy --- a/ProgTutorial/General.thy Mon Nov 09 10:07:08 2009 +0100 +++ b/ProgTutorial/General.thy Tue Nov 10 04:57:42 2009 +0100 @@ -677,25 +677,25 @@ section {* Unification and Matching (TBD) *} text {* - Isabelle's terms and types contain schematic term variables + Isabelle's terms and types may contain schematic term variables (term-constructor @{ML Var}) and schematic type variables (term-constructor - @{ML TVar}). Both are printed out as variables with a leading question - mark. They stand for unknown terms and types, which can be made more concrete - by instantiations. Such instantiations can be calculated by unification or - matching. While in case of types, unification and matching is relatively - straightforward, in case of terms the algorithms are substantially more - complicated, because terms need higher-order versions of the unification and - matching algorithms. Below we shall use the antiquotation @{text "@{typ_pat \}"} and - @{text "@{term_pat \}"} from Section~\ref{sec:antiquote} in order to - construct examples involving schematic variables. - - Let us begin with describing unification and matching of types. Both - matching and unification of types return a type environment, ML-type - @{ML_type "Type.tyenv"}, which maps schematic type variables to types (and - sorts). Below we use the function @{ML_ind typ_unify in Sign} from the - structure @{ML_struct Sign} for unifying - the types @{text "?'a * ?'b"} and @{text "?'b list * nat"}. This will - produce the mapping @{text "[?'a := ?'b list, ?'b := nat]"}. + @{ML TVar}). These variables stand for unknown entities, which can be made + more concrete by instantiations. Such instantiations might be a result of + unification or matching. While in case of types, unification and matching is + relatively straightforward, in case of terms the algorithms are + substantially more complicated, because terms need higher-order versions of + the unification and matching algorithms. Below we shall use the + antiquotations @{text "@{typ_pat \}"} and @{text "@{term_pat \}"} from + Section~\ref{sec:antiquote} in order to construct examples involving + schematic variables. + + Let us begin with describing the unification and the matching function for + types. Both return type environments, ML-type @{ML_type "Type.tyenv"}, + which map schematic type variables to types (and sorts). Below we use the + function @{ML_ind typ_unify in Sign} from the structure @{ML_struct Sign} + for unifying the types @{text "?'a * ?'b"} and @{text "?'b list * + nat"}. This will produce the mapping, or type environment, @{text "[?'a := + ?'b list, ?'b := nat]"}. *} ML{*val (tyenv_unif, _) = let @@ -706,49 +706,124 @@ end*} text {* - The value @{ML_ind empty in Vartab} stands for the empty type environment, - which is the value for starting the unification without any - instantiations.\footnote{\bf FIXME: what is 0?} In case of a failure - @{ML typ_unify in Sign} will throw the exception @{ML_text TUNIFY}. + The environment @{ML_ind "Vartab.empty"} stands for the empty type environment, + which is needed for starting the unification without any + (pre)instantiations. In case of a failure + @{ML typ_unify in Sign} will throw the exception @{text TUNIFY}. We can print out the resulting type environment @{ML tyenv_unif} - with the built-in function @{ML_ind dest in Vartab}. + with the built-in function @{ML_ind dest in Vartab} from the structure + @{ML_struct Vartab}. @{ML_response_fake [display,gray] "Vartab.dest tyenv_unif" "[((\"'a\", 0), ([\"HOL.type\"], \"?'b List.list\")), - ((\"'b\", 0), ([\"HOL.type\"], \"nat\"))]"} - - For what follows we will use the following pretty-printing function - for @{ML_type "Type.tyenv"}s + ((\"'b\", 0), ([\"HOL.type\"], \"nat\"))]"} *} -ML{*fun pretty_tyenv ctxt tyenv = +text_raw {* + \begin{figure}[t] + \begin{minipage}{\textwidth} + \begin{isabelle}*} +ML{*fun pretty_helper f env = + env |> Vartab.dest + |> map f + |> commas + |> enclose "[" "]" + |> tracing + +fun pretty_tyenv ctxt tyenv = let fun aux (v, (s, T)) = Syntax.string_of_typ ctxt (TVar (v, s)) - ^ " := " ^ Syntax.string_of_typ ctxt T -in - tyenv |> Vartab.dest - |> map aux - |> commas - |> enclose "[" "]" - |> tracing + ^ " := " ^ Syntax.string_of_typ ctxt T +in + pretty_helper aux tyenv end*} +text_raw {* + \end{isabelle} + \end{minipage} + \caption{A pretty printing function for type environments, which are + produced by unification and matching.\label{fig:prettyenv}} + \end{figure} +*} text {* - which prints for the type environment above + The first components in this list stand for the schematic type variables and + the second are the associated sorts and types. In what follows we will use + the pretty-printing function in Figure~\ref{fig:prettyenv} for @{ML_type "Type.tyenv"}s. + For the type environment from the example this function prints out the + more legible: @{ML_response_fake [display, gray] "pretty_tyenv @{context} tyenv_unif" "[?'a := ?'b list, ?'b := nat]"} - Observe that the type environment that the function @{ML typ_unify in + The index @{text 0} in the example above is the maximal index of the schematic + type variables occuring in @{text ty1} and @{text ty2}. It will be increased + whenever a new schematic type variable is introduced during unification. + This is for example the case when two schematic type variables have different, + incomparable sorts. Then a new schematic type variable is introduced with + the combined sorts. To show this let us use two sorts, @{text "s1"} and @{text "s2"}, + which we attach to the schematic type variables @{text "?'a"} and @{text "?'b"}. +*} + +ML{*val (tyenv, index) = let + val ty2 = @{typ_pat "?'a::s1"} + val ty1 = @{typ_pat "?'b::s2"} +in + Sign.typ_unify @{theory} (ty1, ty2) (Vartab.empty, 0) +end*} + +text {* + To print out the calculated type environment we also switch on the printing + of sort information. +*} + +ML{*show_sorts := true*} + +text {* + Now we can see + + @{ML_response_fake [display,gray] + "pretty_tyenv @{context} tyenv" + "[?'a::s1 := ?'a1::{s1,s2}, ?'b::s2 := ?'a1::{s1,s2}]"} + + the type variables @{text "?'a"} and @{text "?'b"} are instantiated + with a new type variable @{text "?'a1"} with sort @{text "{s1,s2}"}. Since a new + type variable has been introduced the @{ML index}, originally being @{text 0}, + has been increased by @{text 1}. + + @{ML_response [display,gray] + "index" + "1"} +*}(*<*)ML %linenos{*show_sorts := false*}(*>*) + +text {* + The way the unification function @{ML typ_unify in Sign} is implemented + using an initial type environment and initial index makes it easy to + unifying of more than two terms. For example +*} + +ML{*val (tyenvs, _) = let + val tys1 = (@{typ_pat "?'a"}, @{typ_pat "?'b list"}) + val tys2 = (@{typ_pat "?'b"}, @{typ_pat "nat"}) +in + fold (Sign.typ_unify @{theory}) [tys1, tys2] (Vartab.empty, 0) +end*} + +text {* + Let us now return again to the unifier from the first example. + + @{ML_response_fake [display, gray] + "pretty_tyenv @{context} tyenv_unif" + "[?'a := ?'b list, ?'b := nat]"} + + Observe that the type environment which the function @{ML typ_unify in Sign} returns is in \emph{not} an instantiation in fully solved form: while @{text "?'b"} is instantiated to @{typ nat}, this is not propagated to the instantiation for @{text "?'a"}. In unification theory, this is often - called an instantiation in \emph{triangular form}. These instantiations - are used because of performance reasons. - + called an instantiation in \emph{triangular form}. These not fully solved + instantiations are used because of performance reasons. To apply a type environment in triangular form to a type, say @{text "?'a * ?'b"}, you should use the function @{ML_ind norm_type in Envir}: @@ -756,12 +831,14 @@ @{ML_response_fake [display, gray] "Envir.norm_type tyenv_unif @{typ_pat \"?'a * ?'b\"}" "nat list * nat"} -*} - -text {* + + With this function the type variables @{text "?'a"} and @{text "?'b"} are + correctly instantiated. + Matching of types can be done with the function @{ML_ind typ_match in Sign} from the structure @{ML_struct Sign}. It also returns a @{ML_type - Type.tyenv}. For example + Type.tyenv} and might raise the exception @{text TYPE_MATCH} in case + of failure. For example *} ML{*val tyenv_match = let @@ -772,23 +849,24 @@ end*} text {* - In case of failure, @{ML typ_match in Sign} will throw the exception - @{ML_text TYPE_MATCH}. Printing out the calculated matcher gives + Printing out the calculated matcher gives @{ML_response_fake [display,gray] "pretty_tyenv @{context} tyenv_match" "[?'a := bool list, ?'b := nat]"} - Applying the matcher to a type needs to be done with the function + Unlike the unification, which uses the function @{ML norm_type in Envir}, + applying the matcher to a type needs to be done with the function @{ML_ind subst_type in Envir}. @{ML_response_fake [display, gray] "Envir.subst_type tyenv_match @{typ_pat \"?'a * ?'b\"}" "bool list * nat"} - Be careful to observe the difference: use @{ML subst_type in Envir} - for matchers and @{ML norm_type in Envir} for unifiers. To make the - difference explicit let us calculate the following matcher. + Be careful to observe the difference: use always + @{ML subst_type in Envir} for matchers and @{ML norm_type in Envir} + for unifiers. To make the difference explicit, let us calculate the + following matcher: *} ML{*val tyenv_match' = let @@ -799,17 +877,15 @@ end*} text {* - Now @{ML tyenv_unif} is equal to @{ML tyenv_match'}. Consider - again the type @{text "?'a * ?'b"}, which we used in the unification - and matching problems. If we apply @{ML norm_type in Envir} to - the matcher we obtain + Now @{ML tyenv_unif} is equal to @{ML tyenv_match'}. If we apply + @{ML norm_type in Envir} to the matcher we obtain @{ML_response_fake [display, gray] "Envir.norm_type tyenv_match' @{typ_pat \"?'a * ?'b\"}" "nat list * nat"} which is not a matcher for the second matching problem, and if - we apply @{ML subst_type in Envir} to the unifier we obtain + we apply @{ML subst_type in Envir} to the unifyier we obtain @{ML_response_fake [display, gray] "Envir.subst_type tyenv_unif @{typ_pat \"?'a * ?'b\"}" @@ -829,16 +905,129 @@ *} text {* - TBD below - - + Unification and matching of terms is substantially more complicated than the + type-case. The reason is that terms have abstractions and unification + or matching modulo plain equality is often not meaningful in this + context. Nevertheless, Isabelle implements the function @{ML_ind + first_order_match in Pattern} for terms. This matching function returns a + type environment and a term environment. To pretty print the latter we use + the function @{text "pretty_env"}: +*} + +ML{*fun pretty_env ctxt env = +let + fun aux (v, (T, t)) = + string_of_term ctxt (Var (v, T)) ^ " := " ^ string_of_term ctxt t +in + pretty_helper aux env +end*} + +text {* + As can be seen from the @{text "aux"}-function, a term environment associates + a schematic term variable with a type and a term. An example of a first-order + matching problem is the term @{term "P (\a b. Q b a)"} and the pattern + @{text "?X ?Y"}. +*} + +ML{*val (_, fo_env) = let + val fo_pat = @{term_pat "(?X::(nat\nat\nat)\bool) ?Y"} + val trm = @{term "P::(nat\nat\nat)\bool"} + $ @{term "\a b. (Q::nat\nat\nat) b a"} + val init = (Vartab.empty, Vartab.empty) +in + Pattern.first_order_match @{theory} (fo_pat, trm) init +end *} + +text {* + In this example we annotated explicitly types because then + the type environment is empty and can be ignored. The + resulting term environment is: + + @{ML_response_fake [display, gray] + "pretty_env @{context} fo_env" + "[?X := P, ?Y := \a b. Q b a]"} + + The matcher can be applied to a term using the function @{ML_ind subst_term + in Envir} (remember the same convention for types applies to terms: @{ML + subst_term in Envir} is for matchers and @{ML norm_term in Envir} is for + unifyiers). The function @{ML subst_term in Envir} expects a type environment, + which is set to empty in the example below, and a term environment. + + @{ML_response_fake [display, gray] + "let + val trm = @{term_pat \"(?X::(nat\nat\nat)\bool) ?Y\"} +in + Envir.subst_term (Vartab.empty, fo_env) trm + |> string_of_term @{context} + |> tracing +end" + "P (\a b. Q b a)"} + + First-order matching is useful for matching against applications and + variables. It can deal also with abstractions and a limited form of + alpha-equivalence, but this kind of matching should be used with care, since + it is not clear whether the result is meaningful. A working example is + matching @{text "\x. P x"} against the pattern @{text "\y. ?X y"}. In this + case first-order matching produces @{text "[?X := P]"}. + + @{ML_response_fake [display, gray] + "let + val fo_pat = @{term_pat \"\y. (?X::nat\bool) y\"} + val trm = @{term \"\x. (P::nat\bool) x\"} + val init = (Vartab.empty, Vartab.empty) +in + Pattern.first_order_match @{theory} (fo_pat, trm) init + |> snd + |> pretty_env @{context} +end" + "[?X := P]"} +*} + +text {* + In the context of unifying abstractions, more thouroughly studied is + higher-order pattern unification and higher-order pattern matching. A + \emph{\index*{pattern}} is a lambda term whose ``head symbol'' (that is the + first symbol under abstractions) is either a constant, or a schematic or a free + variable. If it is a schematic variable then it can be only applied by + distinct bound variables. This excludes that a schematic variable is an + argument of a schematic variable and that a schematic variable is applied + twice to the same bound variable. The function @{ML_ind pattern in Pattern} + in the structure @{ML_struct Pattern} tests whether a term satisfies these + restrictions. + + + @{ML_response [display, gray] + "let + val trm_list = + [@{term_pat \"?X\"}, @{term_pat \"a\"}, + @{term_pat \"\a b. ?X a b\"}, @{term_pat \"\a b. (op +) a b\"}, + @{term_pat \"\a. (op +) a ?Y\"}, @{term_pat \"?X ?Y\"}, + @{term_pat \"\a b. ?X a b ?Y\"}, @{term_pat \"\a. ?X a a\"}] +in + map Pattern.pattern trm_list +end" + "[true, true, true, true, true, false, false, false]"} + + The point of restricting unification and matching to patterns is that + it is decidable and produces most general unifiers. In this way + matching and unify can be implemented so that they produce a type + and term environment (the latter actually produces only a term + environment). +*} + + +text {* \begin{readmore} For term, unification and matching of higher-order patterns is implemented in - @{ML_file "Pure/pattern.ML"}. Full higher-order unification is implemented + @{ML_file "Pure/pattern.ML"}. This file also contains a first-order matcher + for terms. Full higher-order unification is implemented in @{ML_file "Pure/unify.ML"}. \end{readmore} *} + + + section {* Type-Checking\label{sec:typechecking} *} text {*