diff -r 07a269d9642b -r 9f46a9571e37 Journal/Paper.thy --- a/Journal/Paper.thy Wed Aug 03 17:08:31 2011 +0000 +++ b/Journal/Paper.thy Fri Aug 05 05:34:11 2011 +0000 @@ -34,7 +34,7 @@ abbreviation "ATOM \ Atom" abbreviation "PLUS \ Plus" abbreviation "TIMES \ Times" -abbreviation "TIMESS \ Times_set" +abbreviation "TIMESS \ Timess" abbreviation "STAR \ Star" @@ -143,19 +143,18 @@ wide range of textbooks on this subject, many of which are aimed at students and contain very detailed `pencil-and-paper' proofs (e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by - formalising the theorems and by verifying formally the algorithms. A - popular choice for a theorem prover would be one based on Higher-Order Logic - (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development + formalising the theorems and by verifying formally the algorithms. + + A popular choice for a theorem prover would be one based on Higher-Order + Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development presented in this paper we will use the latter. HOL is a predicate calculus that allows quantification over predicate variables. Its type system is - based on Church's Simple Theory of Types \cite{Church40}. Although - many mathematical concepts can be conveniently expressed in HOL, there are some + based on Church's Simple Theory of Types \cite{Church40}. Although many + mathematical concepts can be conveniently expressed in HOL, there are some limitations that hurt badly, if one attempts a simple-minded formalisation - of regular languages in it. - - The typical approach to regular languages is to introduce finite automata - and then define everything in terms of them \cite{Kozen97}. For example, - a regular language is normally defined as: + of regular languages in it. The typical approach to regular languages is to + introduce finite automata and then define everything in terms of them + \cite{Kozen97}. For example, a regular language is normally defined as: \begin{dfntn}\label{baddef} A language @{text A} is \emph{regular}, provided there is a @@ -252,7 +251,8 @@ language as one for which there exists an automaton that recognises all its strings. This is because we cannot make a definition in HOL that is polymorphic in the state type and there is no type quantification available in HOL (unlike - in Coq, for example). + in Coq, for example).\footnote{Slind already pointed out this problem in an email + to the HOL4 mailing list on 21st April 2005.} An alternative, which provides us with a single type for automata, is to give every state node an identity, for example a natural @@ -266,8 +266,8 @@ problems as with graphs. Composing, for example, two non-deterministic automata in parallel requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98} dismisses for this the option of using identities, because it leads according to - him to ``messy proofs''. Since he does not need to define what a regular - language is, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes + him to ``messy proofs''. Since he does not need to define what regular + languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes \begin{quote} \it% @@ -317,7 +317,7 @@ of automata \cite{Brzozowski10}). 
Such side-conditions mean that if we define a regular language as one for which there exists \emph{a} finite automaton that recognises all its strings (see Def.~\ref{baddef}), then we need a lemma which - ensures that another equivalent can be found satisfying the + ensures that another equivalent one can be found satisfying the side-condition. Unfortunately, such `little' and `obvious' lemmas make a formalisation of automata theory a hair-pulling experience. @@ -360,11 +360,12 @@ certain tagging-functions, and another indirect proof using Antimirov's partial derivatives \cite{Antimirov95}. Again to our best knowledge, the tagging-functions have not been used before to establish the Myhill-Nerode - theorem. Derivatives of regular expressions have been recently used quite - widely in the literature about regular expressions. However, partial - derivatives are more suitable in the context of the Myhill-Nerode theorem, - since it is easier to establish formally their finiteness result. - + theorem. Derivatives of regular expressions have been used recently quite + widely in the literature; partial derivatives, in contrast, have attracted much + less attention. However, partial derivatives are more suitable in the + context of the Myhill-Nerode theorem, since it is easier to establish + formally their finiteness result. We have not found any proof that uses + either of them in order to prove the Myhill-Nerode theorem. *} section {* Preliminaries *} @@ -398,10 +399,13 @@ we will make use of the following properties of these constructions. \begin{prpstn}\label{langprops}\mbox{}\\ - \begin{tabular}{@ {}ll} + \begin{tabular}{@ {}lp{10cm}} (i) & @{thm star_unfold_left} \\ (ii) & @{thm[mode=IfThen] pow_length}\\ (iii) & @{thm conc_Union_left} \\ + (iv) & If @{thm (prem 1) star_decom} and @{thm (prem 2) star_decom} then + there exist an @{text "x\<^isub>p"} and an @{text "x\<^isub>s"} with @{text "x = x\<^isub>p @ x\<^isub>s"} + and @{term "x\<^isub>p \ []"} such that @{term "x\<^isub>p \ A"} and @{term "x\<^isub>s \ A\"}. \end{tabular} \end{prpstn} @@ -1167,8 +1171,8 @@ \noindent that means we take the image of @{text f} w.r.t.~all elements in the domain. With this we will be able to infer that the tagging-functions, seen - as relations, give rise to finitely many equivalence classes of @{const - UNIV}. Finally we will show that the tagging-relations are more refined than + as relations, give rise to finitely many equivalence classes. + Finally we will show that the tagging-relations are more refined than @{term "\(lang r)"}, which implies that @{term "UNIV // \(lang r)"} must also be finite. We formally define the notion of a \emph{tagging-relation} as follows. @@ -1249,7 +1253,7 @@ is that we need to establish that @{term "=(tag_Plus A B)="} refines @{term "\(A \ B)"}. This amounts to showing @{term "x \A y"} or @{term "x \B y"} under the assumption @{term "x"}~@{term "=(tag_Plus A B)="}~@{term y}. As we shall see, this definition will - provide us just the right assumptions in order to get the proof through. + provide us with just the right assumptions in order to get the proof through. \begin{proof}[@{const "PLUS"}-Case] We can show in general, if @{term "finite (UNIV // \A)"} and @{term "finite @@ -1264,7 +1268,7 @@ \noindent The @{const TIMES}-case is slightly more complicated. We first prove the - following lemma, which will aid the refinement-proofs. + following lemma, which will aid the proofs about refinement. 
\begin{lmm}\label{refinement} The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\A"}, provided for @@ -1274,15 +1278,21 @@ \noindent - We therefore can clean information from how the strings @{text "x @ z"} are in @{text A} - and construct appropriate tagging-functions to infer that @{term "y @ z \ A"}. - For the @{const Times}-case we additionally need the notion of the set of all - possible partitions of a string + We can therefore analyse how the strings @{text "x @ z"} are in the language + @{text A} and then construct an appropriate tagging-function to infer that + @{term "y @ z"} is also in @{text A}. For this we will need the notion of + the set of all possible \emph{partitions} of a string \begin{equation} @{thm Partitions_def} \end{equation} + \noindent + If we know that @{text "(x\<^isub>p, x\<^isub>s) \ Partitions x"}, we will + refer to @{text "x\<^isub>p"} as the \emph{prefix} of the string @{text x}, + and to @{text "x\<^isub>s"} as the \emph{suffix}. + + Now assuming @{term "x @ z \ A \ B"} there are only two possible ways of how to `split' this string to be in @{term "A \ B"}: % @@ -1350,12 +1360,13 @@ (second picture). In both cases we have to show that @{term "y @ z \ A \ B"}. The first case we will only go through if we know that @{term "x \A y"} holds @{text "(*)"}. Because then we can infer from @{term "x @ z\<^isub>p \ A"} that @{term "y @ z\<^isub>p \ A"} holds for all @{text "z\<^isub>p"}. - In the second case we only know that @{text "x\<^isub>p"} and @{text "x\<^isub>s"} is a possible partition - of the string @{text x}. So we have to know that both @{text "x\<^isub>p"} and the + In the second case we only know that @{text "x\<^isub>p"} and @{text "x\<^isub>s"} form one possible partition + of the string @{text x}. We have to know that both @{text "x\<^isub>p"} and the corresponding partition @{text "y\<^isub>p"} are in @{text "A"}, and that @{text "x\<^isub>s"} is `@{text B}-related' to @{text "y\<^isub>s"} @{text "(**)"}. From the latter fact we can infer that @{text "y\<^isub>s @ z \ B"}. + This will solve the second case. Taking the two requirements, @{text "(*)"} and @{text "(**)"}, together we define the - tagging-function as: + tagging-function in the @{const Times}-case as: \begin{center} @{thm (lhs) tag_Times_def[where ?A="A" and ?B="B"]}~@{text "\"}~ @@ -1363,27 +1374,32 @@ \end{center} \noindent - With this definition in place, we can discharge the @{const "Times"}-case as follows. + We have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do + not know anything about how the string @{term x} is partitioned. + With this definition in place, let us prove the @{const "Times"}-case. \begin{proof}[@{const TIMES}-Case] If @{term "finite (UNIV // \A)"} and @{term "finite (UNIV // \B)"} then @{term "finite ((UNIV // \A) \ (Pow (UNIV // \B)))"} holds. The range of @{term "tag_Times A B"} is a subset of this product set, and therefore finite. - By Lemma \ref{refinement} we know + For the refinement of @{term "\(A \ B)"} and @{text "\<^raw:$\threesim$>\<^bsub>\tag A B\<^esub>"}, + we have by Lemma \ref{refinement} \begin{center} @{term "tag_Times A B x = tag_Times A B y"} \end{center} \noindent - and @{term "x @ z \ A \ B"}, and have to establish @{term "y @ z \ A \ B"}. As shown - above, there are two cases to be considered (see pictures above). First, - there exists a @{text "z\<^isub>p"} and @{text "z\<^isub>s"} such that @{term "x @ z\<^isub>p \ A"} and @{text "z\<^isub>s \ B"}. 
- By the assumption about @{term "tag_Times A B"} we have - @{term "\A `` {x} = \A `` {y}"} and thus @{term "x \A y"}. Which means by the Myhill-Nerode - relation that @{term "y @ z\<^isub>p \ A"} holds. Using @{text "z\<^isub>s \ B"}, we can conclude in this case - with @{term "y @ z \ A \ B"}. + and @{term "x @ z \ A \ B"}, and have to establish @{term "y @ z \ A \ + B"}. As shown in the pictures above, there are two cases to be + considered. First, there exists a @{text "z\<^isub>p"} and @{text + "z\<^isub>s"} such that @{term "x @ z\<^isub>p \ A"} and @{text "z\<^isub>s + \ B"}. By the assumption about @{term "tag_Times A B"} we have @{term "\A + `` {x} = \A `` {y}"} and thus @{term "x \A y"}. Hence by the Myhill-Nerode + relation @{term "y @ z\<^isub>p \ A"} holds. Using @{text "z\<^isub>s \ B"}, + we can conclude in this case with @{term "y @ z \ A \ B"} (recall @{text + "z\<^isub>p @ z\<^isub>s = z"}). Second there exists a partition @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with @{text "x\<^isub>p \ A"} and @{text "x\<^isub>s @ z \ B"}. We therefore have @@ -1403,7 +1419,7 @@ This means there must be a partition @{text "y\<^isub>p"} and @{text "y\<^isub>s"} such that @{term "y\<^isub>p \ A"} and @{term "\B `` {x\<^isub>s} = \B `` {y\<^isub>s}"}. Unfolding the Myhill-Nerode relation and together with the - facts that @{text "x\<^isub>p \ A"} and @{text "x\<^isub>s @ z \ B"}, we + facts that @{text "x\<^isub>p \ A"} and \mbox{@{text "x\<^isub>s @ z \ B"}}, we obtain @{term "y\<^isub>p \ A"} and @{text "y\<^isub>s @ z \ B"}, as needed in this case. We again can complete the @{const TIMES}-case by setting @{text A} to @{term "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}. @@ -1411,7 +1427,7 @@ \noindent The case for @{const Star} is similar to @{const TIMES}, but poses a few - extra challenges. To deal with them we define first the notion of a \emph{string + extra challenges. To deal with them, we define first the notion of a \emph{string prefix} and a \emph{strict string prefix}: \begin{center} @@ -1421,8 +1437,7 @@ \end{tabular} \end{center} - \noindent - When we analyse the case of @{text "x @ z"} being an element in @{term "A\"} + When analysing the case of @{text "x @ z"} being an element in @{term "A\"} and @{text x} is not the empty string, we have the following picture: \begin{center} @@ -1466,13 +1481,13 @@ \noindent We can find a strict prefix @{text "x\<^isub>p"} of @{text x} such that @{term "x\<^isub>p \ A\"}, @{text "x\<^isub>p < x"} and the rest @{term "x\<^isub>s @ z \ A\"}. For example the empty string - @{text "[]"} would do. + @{text "[]"} would do (recall @{term "x \ []"}). There are potentially many such prefixes, but there can only be finitely many of them (the string @{text x} is finite). Let us therefore choose the longest one and call it @{text "x\<^bsub>pmax\<^esub>"}. Now for the rest of the string @{text "x\<^isub>s @ z"} we - know it is in @{term "A\"} and it is not the empty string. By Prop.~\ref{langprops}@{text "(i)"}, + know it is in @{term "A\"} and cannot be the empty string. By Prop.~\ref{langprops}@{text "(iv)"}, we can separate - this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \ A"} + this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \ []"}, @{text "a \ A"} and @{term "b \ A\"}. Now @{text a} must be strictly longer than @{text "x\<^isub>s"}, otherwise @{text "x\<^bsub>pmax\<^esub>"} is not the longest prefix. 
That means @{text a} `overlaps' with @{text z}, splitting it into two components @{text "z\<^isub>a"} and @@ -1497,8 +1512,9 @@ that @{term "x @ z \ A\"} implies @{term "y @ z \ A\"}. We first need to consider the case that @{text x} is the empty string. - From the assumption we can infer @{text y} is the empty string and - clearly have @{term "y @ z \ A\"}. In case @{text x} is not the empty + From the assumption about strict prefixes in @{text "\<^raw:$\threesim$>\<^bsub>\tag A\<^esub>"}, we + can infer @{text y} is the empty string and + then clearly have @{term "y @ z \ A\"}. In case @{text x} is not the empty string, we can divide the string @{text "x @ z"} as shown in the picture above. By the tagging-function and the facts @{text "x\<^bsub>pmax\<^esub> \ A\<^isup>\"} and @{text "x\<^bsub>pmax\<^esub> < x"}, we have @@ -1515,17 +1531,17 @@ \end{center} \noindent - and we know there exist partitions @{text "y\<^isub>p"} and @{text + From this we know there exist partitions @{text "y\<^isub>p"} and @{text "y\<^isub>s"} with @{term "y\<^isub>p \ A\"} and also @{term "x\<^isub>s \A y\<^isub>s"}. Unfolding the Myhill-Nerode relation we know @{term "y\<^isub>s @ z\<^isub>a \ A"}. We also know that @{term "z\<^isub>b \ A\"}. Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \ A\"}, which means @{term "y @ z \ A\"}. As the last step we have to set - @{text "A"} to @{term "lang r"} and complete the proof. + @{text "A"} to @{term "lang r"} and thus complete the proof. \end{proof} *} -section {* Second Part based on Partial Derivatives *} +section {* Second Part proved using Partial Derivatives *} text {* \noindent @@ -1534,10 +1550,12 @@ a more refined relation than @{term "\(lang r)"} for which we can show that there are only finitely many equivalence classes. So far we showed this by induction on @{text "r"}. However, there is also - an indirect method to come up with such a refined relation. Assume - the following two definitions for a left-quotient of a language, which - we write as @{term "Der c A"} and @{term "Ders s A"} where - @{text c} is a character and @{text s} a string: + an indirect method to come up with such a refined relation based on + derivatives of regular expressions \cite{Brzozowski64}. 
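+
+  The indirect method rests on the notion of a \emph{left-quotient} of a
+  language, defined formally in a moment. To fix intuitions, left-quotients
+  are directly executable once a language is represented as a predicate on
+  strings. The following Haskell fragment is a minimal sketch of this
+  idea---it is not part of the formalisation, and the names @{text der},
+  @{text ders} and the example language are ours:
+
+  \begin{verbatim}
+  -- a language, represented as a predicate on strings (assumed encoding)
+  type Lang = String -> Bool
+
+  -- left-quotient by a character:  der c A  contains s iff  c:s  is in A
+  der :: Char -> Lang -> Lang
+  der c a = \s -> a (c : s)
+
+  -- left-quotient by a string: fold der over the string's characters
+  ders :: String -> Lang -> Lang
+  ders s a = foldl (flip der) a s
+
+  -- example: for A = {"ab", "ac"} the quotient by 'a' is {"b", "c"}
+  exampleA :: Lang
+  exampleA s = s `elem` ["ab", "ac"]
+
+  test1, test2 :: Bool
+  test1 = der 'a' exampleA "b"   -- True, since "ab" is in A
+  test2 = ders "ac" exampleA ""  -- True, since "ac" is in A
+  \end{verbatim}
+
+  \noindent
+  The fold in @{text ders} corresponds to the fact that a string-quotient can
+  be computed by iterated character-quotients, one of the properties listed
+  below.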
+ + Assume the following two definitions for a \emph{left-quotient} of a language, + which we write as @{term "Der c A"} and @{term "Ders s A"} where @{text c} + is a character and @{text s} a string: \begin{center} \begin{tabular}{r@ {\hspace{1mm}}c@ {\hspace{2mm}}l} @@ -1547,6 +1565,14 @@ \end{center} \noindent + In order to aid readability, we shall also make use of the following abbreviation: + + \begin{center} + @{abbrev "Derss s A"} + \end{center} + + + \noindent Clearly we have the following relation between the Myhill-Nerode relation (Def.~\ref{myhillneroderel}) and left-quotients @@ -1555,20 +1581,20 @@ \end{equation} \noindent - It is realtively easy to establish the following properties for left-quotients: + It is straightforward to establish the following properties for left-quotients: \begin{equation} \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l} - @{thm (lhs) Der_zero} & $=$ & @{thm (rhs) Der_zero}\\ - @{thm (lhs) Der_one} & $=$ & @{thm (rhs) Der_one}\\ - @{thm (lhs) Der_atom} & $=$ & @{thm (rhs) Der_atom}\\ - @{thm (lhs) Der_union} & $=$ & @{thm (rhs) Der_union}\\ + @{thm (lhs) Der_simps(1)} & $=$ & @{thm (rhs) Der_simps(1)}\\ + @{thm (lhs) Der_simps(2)} & $=$ & @{thm (rhs) Der_simps(2)}\\ + @{thm (lhs) Der_simps(3)} & $=$ & @{thm (rhs) Der_simps(3)}\\ + @{thm (lhs) Der_simps(4)} & $=$ & @{thm (rhs) Der_simps(4)}\\ @{thm (lhs) Der_conc} & $=$ & @{thm (rhs) Der_conc}\\ @{thm (lhs) Der_star} & $=$ & @{thm (rhs) Der_star}\\ - @{thm (lhs) Ders_nil} & $=$ & @{thm (rhs) Ders_nil}\\ - @{thm (lhs) Ders_cons} & $=$ & @{thm (rhs) Ders_cons}\\ - %@{thm (lhs) Ders_append[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]} & $=$ - % & @{thm (rhs) Ders_append[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}\\ + @{thm (lhs) Ders_simps(1)} & $=$ & @{thm (rhs) Ders_simps(1)}\\ + @{thm (lhs) Ders_simps(2)} & $=$ & @{thm (rhs) Ders_simps(2)}\\ + %@{thm (lhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]} & $=$ - % & @{thm (rhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}\\ \end{tabular}} \end{equation} @@ -1603,10 +1629,10 @@ \end{center} \noindent - The last two lines extend @{const der} to strings (list of characters) where - list-cons is written \mbox{@{text "_ :: _"}}. The function @{term "nullable r"} needed - in the @{const Times}-case tests whether a regular expression can recognise - the empty string: + The last two clauses extend derivatives from characters to strings (lists of + characters). The list-cons operator is written \mbox{@{text "_ :: _"}}. The + function @{term "nullable r"} needed in the @{const Times}-case tests + whether a regular expression can recognise the empty string: \begin{center} \begin{tabular}{c@ {\hspace{10mm}}c} @@ -1639,8 +1665,8 @@ \noindent The importance in the context of the Myhill-Nerode theorem is that - we can use \eqref{mhders} and the equations above in order to - establish that @{term "x \(lang r) y"} if and only if + we can use \eqref{mhders} and \eqref{Dersders} in order to + establish that @{term "x \(lang r) y"} is equivalent to @{term "lang (ders x r) = lang (ders y r)"}. From this we obtain \begin{equation} @@ -1649,29 +1675,35 @@ \noindent - Consequently, we can use as the tagging relation @{text - "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"}, since it refines - @{term "\(lang r)"}. However, in order to be useful in the Myhill-Nerode - theorem, we have to show that for a language there are only finitely many - derivatives. 
Unfortunately, this is not true in general: Sakarovitch gives - an example with infinitely many derivatives - \cite[Page~141]{Sakarovitch09}. What Brzozowski \cite{Brzozowski64} proved is - that for every language there \emph{are} finitely `dissimilar' derivatives for a - regular expression. Two regular expressions are said to be \emph{similar} - provided they can be identified using the using the @{text "ACI"}-identities: + which means the right-hand side (seen as a relation) refines the + Myhill-Nerode relation. Consequently, we can use + @{text "\<^raw:$\threesim$>\<^bsub>(\x. ders x r)\<^esub>"} as a potential tagging-relation + for the regular expression @{text r}. However, in + order to be useful in the Myhill-Nerode theorem, we also have to show that + for the corresponding language there are only finitely many derivatives---ensuring + that there are only finitely many equivalence classes. Unfortunately, this + is not true in general. Sakarovitch gives an example where a regular + expression has infinitely many derivatives w.r.t.~a language + \cite[Page~141]{Sakarovitch09}. What Brzozowski \cite{Brzozowski64} proved + is that for every language there \emph{are} only finitely many `dissimilar' + derivatives for a regular expression. Two regular expressions are said to be + \emph{similar} provided they can be identified using the @{text + "ACI"}-identities: - \begin{center} - \begin{tabular}{cl} + \begin{equation}\label{ACI} + \mbox{\begin{tabular}{cl} (@{text A}) & @{term "Plus (Plus r\<^isub>1 r\<^isub>2) r\<^isub>3"} $\equiv$ @{term "Plus r\<^isub>1 (Plus r\<^isub>2 r\<^isub>3)"}\\ (@{text C}) & @{term "Plus r\<^isub>1 r\<^isub>2"} $\equiv$ @{term "Plus r\<^isub>2 r\<^isub>1"}\\ (@{text I}) & @{term "Plus r r"} $\equiv$ @{term "r"}\\ - \end{tabular} - \end{center} + \end{tabular}} + \end{equation} \noindent - Carrying this idea through menas we now have to reasoning modulo @{text "ACI"}. - This can be done, but it is very painful in a theorem prover (since there is - no direct characterisation of the set of dissimlar derivatives). + Carrying this idea through, we must not consider the set of all derivatives, + but the ones modulo @{text "ACI"}. In principle, this can be formally + defined, but it is very painful in a theorem prover (since there is no + direct characterisation of the set of dissimilar derivatives). + Fortunately, there is a much simpler approach using \emph{partial derivatives}. 
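+
+  Before turning to partial derivatives, the finiteness problem can be made
+  concrete with a small executable sketch of the `simple' derivatives from
+  above (again plain Haskell and not part of the formalisation; the datatype
+  mirrors our regular expressions, the equations are the standard ones, and
+  the size experiment is only our illustration):
+
+  \begin{verbatim}
+  import Data.List (foldl')
+
+  -- regular expressions, mirroring the constructors used in the paper
+  data Rexp = Zero | One | Atom Char
+            | Plus Rexp Rexp | Times Rexp Rexp | Star Rexp
+
+  -- does a regular expression match the empty string?
+  nullable :: Rexp -> Bool
+  nullable Zero          = False
+  nullable One           = True
+  nullable (Atom _)      = False
+  nullable (Plus r1 r2)  = nullable r1 || nullable r2
+  nullable (Times r1 r2) = nullable r1 && nullable r2
+  nullable (Star _)      = True
+
+  -- Brzozowski derivative w.r.t. a character (standard equations)
+  der :: Char -> Rexp -> Rexp
+  der _ Zero         = Zero
+  der _ One          = Zero
+  der c (Atom c')    = if c == c' then One else Zero
+  der c (Plus r1 r2) = Plus (der c r1) (der c r2)
+  der c (Times r1 r2)
+    | nullable r1    = Plus (Times (der c r1) r2) (der c r2)
+    | otherwise      = Times (der c r1) r2
+  der c (Star r)     = Times (der c r) (Star r)
+
+  -- extension to strings
+  ders :: String -> Rexp -> Rexp
+  ders s r = foldl' (flip der) r s
+
+  -- number of constructors in a regular expression
+  size :: Rexp -> Int
+  size (Plus r1 r2)  = 1 + size r1 + size r2
+  size (Times r1 r2) = 1 + size r1 + size r2
+  size (Star r)      = 1 + size r
+  size _             = 1
+
+  -- without ACI-normalisation the derivatives of (a + aa)* grow
+  -- steadily in size, although only finitely many ACI-equivalence
+  -- classes are ever reached
+  main :: IO ()
+  main = mapM_ (print . size . (\n -> ders (replicate n 'a') r)) [0,2,4,8]
+    where r = Star (Plus (Atom 'a') (Times (Atom 'a') (Atom 'a')))
+  \end{verbatim}
+
+  \noindent
+  Establishing formally that only finitely many @{text "ACI"}-equivalence
+  classes arise would force us to reason modulo \eqref{ACI} throughout;
+  partial derivatives avoid this problem, since they work with finite sets
+  of regular expressions from the start.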
They were introduced by Antimirov \cite{Antimirov95} and can be defined @@ -1686,49 +1718,97 @@ & @{text "\"} & @{thm (rhs) pder.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\ @{thm (lhs) pder.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]} & @{text "\"} & @{text "if"}~@{term "nullable r\<^isub>1"}~@{text "then"}~% - @{term "(Times_set (pder c r\<^isub>1) r\<^isub>2) \ (pder c r\<^isub>2)"}\\ + @{term "(Timess (pder c r\<^isub>1) r\<^isub>2) \ (pder c r\<^isub>2)"}\\ & & \phantom{@{text "if"}~@{term "nullable r\<^isub>1"}~}@{text "else"}~% - @{term "Times_set (pder c r\<^isub>1) r\<^isub>2"}\\ + @{term "Timess (pder c r\<^isub>1) r\<^isub>2"}\\ @{thm (lhs) pder.simps(6)} & @{text "\"} & @{thm (rhs) pder.simps(6)}\smallskip\\ @{thm (lhs) pders.simps(1)} & @{text "\"} & @{thm (rhs) pders.simps(1)}\\ - @{thm (lhs) pders.simps(2)} & @{text "\"} & @{thm (rhs) pders.simps(2)}\\ + @{thm (lhs) pders.simps(2)} & @{text "\"} & @{text "\ (pders s) ` (pder c r)"}\\ \end{tabular} \end{center} \noindent - Unlike `simple' derivatives, these functions return a set of regular - expressions. In the @{const Times} and @{const Star} cases we use + Again the last two clauses extend partial derivatives from characters to strings. + Unlike `simple' derivatives, the functions for partial derivatives return sets of regular + expressions. In the @{const Times} and @{const Star} cases we therefore use the + auxiliary definition \begin{center} @{text "TIMESS rs r \ {TIMES r' r | r' \ rs}"} \end{center} \noindent - in order to `sequence' a regular expressions with respect to a set of regular - expresions. Note that in the last case we first build the set of partial derivatives - w.r.t~@{text c}, then build the image of this set under the function @{term "pders s"} - and finally `union up' all resulting sets. Note also that in some sense, partial - derivatives have the @{text "ACI"}-identities already build in. Antimirov - \cite{Antimirov95} - showed + in order to `sequence' a regular expression with a set of regular + expressions. Note that in the last clause we first build the set of partial + derivatives w.r.t.~the character @{text c}, then build the image of this set under the + function @{term "pders s"} and finally `union up' all resulting sets. It will be + convenient to introduce the following abbreviation + + \begin{center} + @{abbrev "pderss s A"} + \end{center} + + \noindent + which simplifies the last clause of @{const "pders"} to + + \begin{center} + \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}} + @{thm (lhs) pders.simps(2)} & @{text "\"} & @{thm (rhs) pders.simps(2)}\\ + \end{tabular} + \end{center} + + Partial derivatives can be seen as having the @{text "ACI"}-identities already built in: + taking the partial derivatives of the + regular expressions in \eqref{ACI} gives us in each case + equal sets. Antimirov \cite{Antimirov95} showed a similar result to + \eqref{Dersders} for partial derivatives: \begin{equation} - \mbox{\begin{tabular}{c} - @{thm Der_pder}\\ - @{thm Ders_pders} + \mbox{\begin{tabular}{lc} + @{text "(i)"} & @{thm Der_pder}\\ + @{text "(ii)"} & @{thm Ders_pders} \end{tabular}} - \end{equation} + \end{equation} + + \begin{proof} + The first fact is by a simple induction on @{text r}. For the second we slightly + modify Antimirov's proof by performing an induction on @{text s} where we + generalise over all @{text r}. That means in the @{text "cons"}-case the + induction hypothesis is + + \begin{center} + @{text "(IH)"}\hspace{3mm}@{term "\r. 
Ders s (lang r) = \ lang ` (pders s r)"} + \end{center} \noindent - and proved that for every language and regular expression there are only finitely + With this we can establish + + \begin{center} + \begin{tabular}{r@ {\hspace{1.5mm}}c@ {\hspace{1.5mm}}ll} + @{term "Ders (c # s) (lang r)"} + & @{text "="} & @{term "Ders s (Der c (lang r))"} & by def.\\ + & @{text "="} & @{term "Ders s (\ lang ` (pder c r))"} & by @{text "(i)"}\\ + & @{text "="} & @{term "\ (Ders s) ` (lang ` (pder c r))"} & by def.~of @{text "Ders"}\\ + & @{text "="} & @{term "\ lang ` (\ pders s ` (pder c r))"} & by IH\\ + & @{text "="} & @{term "\ lang ` (pders (c # s) r)"} & by def.\\ + \end{tabular} + \end{center} + + \noindent + In order to apply the induction hypothesis in the fourth step, we need the generalisation + over all regular expressions @{text r}. The case for the empty string is routine and omitted. + \end{proof} + + Antimirov also proved that for every language and regular expression there are only finitely many partial derivatives. *} section {* Closure Properties *} text {* - The beautiful characteristics of regular languages is that they are closed - under many operations. Closure under union, sequencencing and Kleene-star + \noindent + The real beauty of regular languages is that they are closed + under almost all set operations. Closure under union, concatenation and Kleene-star are trivial to establish given our definition of regularity (Def.~\ref{regular}). More interesting is the closure under complement, because it seems difficult to construct a regular expression for the complement