diff -r cf1c17431dab -r b300f2c5d51d Journal/Paper.thy
--- a/Journal/Paper.thy	Mon Aug 15 22:36:26 2011 +0000
+++ b/Journal/Paper.thy	Tue Aug 16 10:21:14 2011 +0000
@@ -74,9 +74,9 @@
   Der ("Der _ _" [1000, 1000] 100) and
   ONE ("ONE" 999) and
   ZERO ("ZERO" 999) and
-  pders_lang ("pderl") and
+  pders_lang ("pdersl") and
   UNIV1 ("UNIV\<^isup>+") and
-  Ders_lang ("Derl")
+  Ders_lang ("Dersl")
 
 lemma meta_eq_app:
   shows "f \<equiv> \<lambda>x. g x \<Longrightarrow> f x \<equiv> g x"
@@ -159,7 +159,8 @@
 
   The typical approach to regular languages is to
   introduce finite automata and then define everything in terms of them
-  \cite{Kozen97}.  For example, a regular language is normally defined as:
+  \cite{ HopcroftUllman69,Kozen97}.  For example, a regular language is 
+  normally defined as:
 
   \begin{dfntn}\label{baddef}
   A language @{text A} is \emph{regular}, provided there is a 
@@ -251,22 +252,21 @@
   \noindent
   changes the type---the disjoint union is not a set, but a set of
   pairs. Using this definition for disjoint union means we do not have a
-  single type for automata. As a result we will not be able to define a
+  single type for the states of automata. As a result we will not be able to define a
   regular language as one for which there exists an automaton that recognises
   all its strings (Definition~\ref{baddef}). This is because we cannot make a
-  definition in HOL that is polymorphic in the state type and there is no type
+  definition in HOL that is only polymorphic in the state type and there is no type
   quantification available in HOL (unlike in Coq, for example).\footnote{Slind
   already pointed out this problem in an email to the HOL4 mailing list on
   21st April 2005.}
 
-
-  An alternative, which provides us with a single type for automata, is to give every 
-  state node an identity, for example a natural
-  number, and then be careful to rename these identities apart whenever
-  connecting two automata. This results in clunky proofs
-  establishing that properties are invariant under renaming. Similarly,
-  connecting two automata represented as matrices results in very adhoc
-  constructions, which are not pleasant to reason about.
+  An alternative, which provides us with a single type for states of automata,
+  is to give every state node an identity, for example a natural number, and
+  then be careful to rename these identities apart whenever connecting two
+  automata. This results in clunky proofs establishing that properties are
+  invariant under renaming. Similarly, connecting two automata represented as
+  matrices results in very adhoc constructions, which are not pleasant to
+  reason about.
 
   Functions are much better supported in Isabelle/HOL, but they still lead to similar
   problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
@@ -434,7 +434,7 @@
 
   Central to our proof will be the solution of equational systems
   involving equivalence classes of languages. For this we will use Arden's Lemma 
-  (see \cite[Page 100]{Sakarovitch09}),
+  (see for example \cite[Page 100]{Sakarovitch09}),
   which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided
   @{term "[] \<notin> A"}. However we will need the following `reverse' 
   version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A \<cdot> X"} to
@@ -448,7 +448,7 @@
 
   \begin{proof}
   For the right-to-left direction we assume @{thm (rhs) arden} and show
-  that @{thm (lhs) arden} holds. From Prop.~\ref{langprops}@{text "(i)"} 
+  that @{thm (lhs) arden} holds. From Property~\ref{langprops}@{text "(i)"} 
   we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"},
   which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both 
   sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side
@@ -467,12 +467,12 @@
   of @{text "\<star>"}.
   For the inclusion in the other direction we assume a string @{text s}
   with length @{text k} is an element in @{text X}. Since @{thm (prem 1) arden}
-  we know by Prop.~\ref{langprops}@{text "(ii)"} that 
+  we know by Property~\ref{langprops}@{text "(ii)"} that 
   @{term "s \<notin> X \<cdot> (A \<up> Suc k)"} since its length is only @{text k}
   (the strings in @{term "X \<cdot> (A \<up> Suc k)"} are all longer). 
   From @{text "(*)"} it follows then that
   @{term s} must be an element in @{term "(\<Union>m\<in>{0..k}. B \<cdot> (A \<up> m))"}. This in turn
-  implies that @{term s} is in @{term "(\<Union>n. B \<cdot> (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} 
+  implies that @{term s} is in @{term "(\<Union>n. B \<cdot> (A \<up> n))"}. Using Property~\ref{langprops}@{text "(iii)"} 
   this is equal to @{term "B \<cdot> A\<star>"}, as we needed to show.
   \end{proof}
 
@@ -601,7 +601,7 @@
   that matches every string in @{text A}.
 
 
-  Our proof of Thm.~\ref{myhillnerodeone} relies on a method that can calculate a
+  Our proof of Theorem~\ref{myhillnerodeone} relies on a method that can calculate a
   regular expression for \emph{every} equivalence class, not just the ones 
   in @{term "finals A"}. We
   first define the notion of \emph{one-character-transition} between 
@@ -733,9 +733,9 @@
   \end{lmm}
 
   \noindent
-  Our proof of Thm.~\ref{myhillnerodeone} will proceed by transforming the
+  Our proof of Theorem~\ref{myhillnerodeone} will proceed by transforming the
   initial equational system into one in \emph{solved form} maintaining the invariant
-  in Lem.~\ref{inv}. From the solved form we will be able to read
+  in Lemma~\ref{inv}. From the solved form we will be able to read
   off the regular expressions. 
 
   In order to transform an equational system into solved form, we have two 
@@ -891,16 +891,15 @@
   \end{center}
 
   \noindent
-  We are not concerned here with the definition of this operator
-  (see Berghofer and Nipkow \cite{BerghoferNipkow00}), but note that we eliminate
-  in each @{const Iter}-step a single equation, and therefore 
-  have a well-founded termination order by taking the cardinality 
-  of the equational system @{text ES}. This enables us to prove
-  properties about our definition of @{const Solve} when we `call' it with
-  the equivalence class @{text X} and the initial equational system 
-  @{term "Init (UNIV // \<approx>A)"} from
+  We are not concerned here with the definition of this operator (see
+  Berghofer and Nipkow \cite{BerghoferNipkow00} for example), but note that we
+  eliminate in each @{const Iter}-step a single equation, and therefore have a
+  well-founded termination order by taking the cardinality of the equational
+  system @{text ES}. This enables us to prove properties about our definition
+  of @{const Solve} when we `call' it with the equivalence class @{text X} and
+  the initial equational system @{term "Init (UNIV // \<approx>A)"} from
   \eqref{initcs} using the principle:
-  %
+
   \begin{equation}\label{whileprinciple}
   \mbox{\begin{tabular}{l}
   @{term "invariant (Init (UNIV // \<approx>A))"} \\
@@ -964,7 +963,7 @@
 
   \begin{proof}
   Finiteness is given by the assumption and the way how we set up the 
-  initial equational system. Soundness is proved in Lem.~\ref{inv}. Distinctness
+  initial equational system. Soundness is proved in Lemma~\ref{inv}. Distinctness
   follows from the fact that the equivalence classes are disjoint. The @{text ardenable}
   property also follows from the setup of the initial equational system, as does 
   validity.
@@ -993,7 +992,7 @@
   \noindent
   Finiteness is straightforward, as the @{const Subst} and @{const Arden} operations 
   keep the equational system finite. These operations also preserve soundness 
-  and distinctness (we proved soundness for @{const Arden} in Lem.~\ref{ardenable}).
+  and distinctness (we proved soundness for @{const Arden} in Lemma~\ref{ardenable}).
   The property @{text ardenable} is clearly preserved because the append-operation
   cannot make a regular expression to match the empty string. Validity is
   given because @{const Arden} removes an equivalence class from @{text yrhs}
@@ -1032,16 +1031,16 @@
 
   \begin{proof} 
   In order to prove this lemma using \eqref{whileprinciple}, we have to use a slightly
-  stronger invariant since Lem.~\ref{iterone} and \ref{itertwo} have the precondition 
+  stronger invariant since Lemma~\ref{iterone} and \ref{itertwo} have the precondition 
   that @{term "(X, rhs) \<in> ES"} for some @{text rhs}. This precondition is needed
   in order to choose in the @{const Iter}-step an equation that is not \mbox{@{term "X = rhs"}}.
   Therefore our invariant cannot be just @{term "invariant ES"}, but must be 
   @{term "invariant ES \<and> (\<exists>rhs. (X, rhs) \<in> ES)"}. By assumption 
-  @{thm (prem 2) Solve} and Lem.~\ref{invzero}, the more general invariant holds for
+  @{thm (prem 2) Solve} and Lemma~\ref{invzero}, the more general invariant holds for
   the initial equational system. This is premise 1 of~\eqref{whileprinciple}.
-  Premise 2 is given by Lem.~\ref{iterone} and the fact that @{const Iter} might
+  Premise 2 is given by Lemma~\ref{iterone} and the fact that @{const Iter} might
   modify the @{text rhs} in the equation @{term "X = rhs"}, but does not remove it.
-  Premise 3 of~\eqref{whileprinciple} is by Lem.~\ref{itertwo}. Now in premise 4
+  Premise 3 of~\eqref{whileprinciple} is by Lemma~\ref{itertwo}. Now in premise 4
   we like to show that there exists a @{text rhs} such that @{term "ES = {(X, rhs)}"}
   and that @{text "invariant {(X, rhs)}"} holds, provided the condition @{text "Cond"}
   does not holds. By the stronger invariant we know there exists such a @{text "rhs"}
@@ -1066,7 +1065,7 @@
   and that the invariant holds for this equation. That means we 
   know @{text "X = \<Union>\<calL> ` rhs"}. We further know that
   this is equal to \mbox{@{text "\<Union>\<calL> ` (Arden X rhs)"}} using the properties of the 
-  invariant and Lem.~\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
+  invariant and Lemma~\ref{ardenable}. Using the validity property for the equation @{text "X = rhs"},
   we can infer that @{term "rhss rhs \<subseteq> {X}"} and because the @{text Arden} operation
   removes that @{text X} from @{text rhs}, that @{term "rhss (Arden X rhs) = {}"}.
   This means the right-hand side @{term "Arden X rhs"} can only consist of terms of the form @{term "Lam r"}.
@@ -1075,11 +1074,11 @@
   \end{proof}
 
   \noindent
-  Lem.~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
+  Lemma~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
   of the Myhill-Nerode theorem.
 
-  \begin{proof}[of Thm.~\ref{myhillnerodeone}]
-  By Lem.~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
+  \begin{proof}[Proof of Theorem~\ref{myhillnerodeone}]
+  By Lemma~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
   every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
   a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
   in @{term "finals A"} there exists a regular expression. Moreover by assumption 
@@ -1175,11 +1174,10 @@
   \end{dfntn}
 
   \noindent
-  For constructing @{text R} will
-  rely on some \emph{tagging-functions} defined over strings. Given the
-  inductive hypothesis, it will be easy to prove that the \emph{range} of
-  these tagging-functions is finite. The range of a function @{text f} is
-  defined as
+  For constructing @{text R}, will rely on some \emph{tagging-functions}
+  defined over strings. Given the inductive hypothesis, it will be easy to
+  prove that the \emph{range} of these tagging-functions is finite. The range
+  of a function @{text f} is defined as
 
   \begin{center}
   @{text "range f \<equiv> f ` UNIV"}
@@ -1225,10 +1223,11 @@
   finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"},
   and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the
   assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}.
-  From the assumptions we can obtain @{text "x \<in> X"} and @{text "y \<in> Y"} with 
+  From the assumptions we obtain \mbox{@{text "x \<in> X"}} and @{text "y \<in> Y"} with 
   @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in 
   turn means that the equivalence classes @{text X}
-  and @{text Y} must be equal.
+  and @{text Y} must be equal. Therefore \eqref{finiteimageD} allows us to conclude
+  with @{thm (concl) finite_eq_tag_rel}.
   \end{proof}
 
   \begin{lmm}\label{fintwo} 
@@ -1256,7 +1255,7 @@
   \end{proof}
 
   \noindent
-  Chaining Lem.~\ref{finone} and \ref{fintwo} together, means in order to show
+  Chaining Lemma~\ref{finone} and \ref{fintwo} together, means in order to show
   that @{term "UNIV // \<approx>(lang r)"} is finite, we have to construct a tagging-function whose
   range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(lang r)"}.
   Let us attempt the @{const PLUS}-case first. We take as tagging-function 
@@ -1310,7 +1309,7 @@
   and respectively to @{text "x\<^isub>s"} as the \emph{suffix}.
 
 
-  Now assuming  @{term "x @ z \<in> A \<cdot> B"} there are only two possible ways of how to `split' 
+  Now assuming  @{term "x @ z \<in> A \<cdot> B"}, there are only two possible ways of how to `split' 
   this string to be in @{term "A \<cdot> B"}:
   %
   \begin{center}
@@ -1391,7 +1390,7 @@
   \end{center}
 
   \noindent
-  We have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do 
+  Note that we have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do 
   not know anything about how the string @{term x} is partitioned.
   With this definition in place, let us prove the @{const "Times"}-case.
 
@@ -1502,7 +1501,7 @@
   There are potentially many such prefixes, but there can only be finitely many of them (the 
   string @{text x} is finite). Let us therefore choose the longest one and call it 
   @{text "x\<^bsub>pmax\<^esub>"}. Now for the rest of the string @{text "x\<^isub>s @ z"} we
-  know it is in @{term "A\<star>"} and cannot be the empty string. By Prop.~\ref{langprops}@{text "(iv)"}, 
+  know it is in @{term "A\<star>"} and cannot be the empty string. By Property~\ref{langprops}@{text "(iv)"}, 
   we can separate
   this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \<noteq> []"}, @{text "a \<in> A"}
   and @{term "b \<in> A\<star>"}. Now @{text a} must be strictly longer than @{text "x\<^isub>s"},
@@ -1616,16 +1615,19 @@
   \end{equation}
 
   \noindent
-  where @{text "\<Delta>"} in the fifth line is a function that tests whether the empty string
-  is in the language and returns @{term "{[]}"} or @{term "{}"}, accordingly.
-  The only interesting case is the last one where we use Prop.~\ref{langprops}@{text "(i)"}
-  in order to infer that @{term "Der c (A\<star>) = Der c (A \<cdot> A\<star>)"}. We can 
-  then complete the proof by noting that @{term "Delta A \<cdot> Der c (A\<star>) \<subseteq> (Der c A) \<cdot> A\<star>"}.
-  
-  Brzozowski observed that the left-quotients for languages of regular
-  expressions can be calculated directly using the notion of \emph{derivatives
-  of a regular expression} \cite{Brzozowski64}. We define this notion in 
-  Isabelle/HOL as follows:
+  where @{text "\<Delta>"} in the fifth line is a function that tests whether the
+  empty string is in the language and returns @{term "{[]}"} or @{term "{}"},
+  accordingly.  In the last equation we use the list-cons operator written
+  \mbox{@{text "_ :: _"}}.  The only interesting case is the @{text "A\<star>"}-case
+  where we use Property~\ref{langprops}@{text "(i)"} in order to infer that
+  @{term "Der c (A\<star>) = Der c (A \<cdot> A\<star>)"}. We can then complete the proof by
+  using the fifth equation and noting that @{term "Delta A \<cdot> Der c (A\<star>) \<subseteq> (Der
+  c A) \<cdot> A\<star>"}. 
+
+  Brzozowski observed that the left-quotients for languages of
+  regular expressions can be calculated directly using the notion of
+  \emph{derivatives of a regular expression} \cite{Brzozowski64}. We define
+  this notion in Isabelle/HOL as follows:
 
   \begin{center}
   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
@@ -1646,8 +1648,7 @@
   \end{center}
 
   \noindent
-  The last two clauses extend derivatives from characters to strings---i.e.~list of
-  characters. The list-cons operator is written \mbox{@{text "_ :: _"}}. The
+  The last two clauses extend derivatives from characters to strings. The
   boolean function @{term "nullable r"} needed in the @{const Times}-case tests
   whether a regular expression can recognise the empty string. It can be defined as 
   follows.
@@ -1672,7 +1673,7 @@
   \noindent
   By induction on the regular expression @{text r}, respectively on the string @{text s}, 
   one can easily show that left-quotients and derivatives of regular expressions 
-  relate as follows (for example \cite{Sakarovitch09}):
+  relate as follows (see for example~\cite{Sakarovitch09}):
 
   \begin{equation}\label{Dersders}
   \mbox{\begin{tabular}{c}
@@ -1703,7 +1704,7 @@
   classes. Unfortunately, this is not true in general. Sakarovitch gives an
   example where a regular expression has infinitely many derivatives
   w.r.t.~the language \mbox{@{term "({a} \<cdot> {b})\<star> \<union> ({a} \<cdot> {b})\<star> \<cdot> {a}"}}
-  \cite[Page~141]{Sakarovitch09}.
+  (see \cite[Page~141]{Sakarovitch09}).
 
 
   What Brzozowski \cite{Brzozowski64} established is that for every language there
@@ -1783,7 +1784,7 @@
   taking the partial derivatives of the
   regular expressions in \eqref{ACI} gives us in each case
   equal sets.  Antimirov \cite{Antimirov95} showed a similar result to
-  \eqref{Dersders} for partial derivatives:
+  \eqref{Dersders} for partial derivatives, namely
 
   \begin{equation}\label{Derspders}
   \mbox{\begin{tabular}{lc}
@@ -1855,7 +1856,7 @@
   Antimirov's argument first shows this theorem for the language @{term UNIV1}, 
   which is the set of all non-empty strings. For this he proves:
 
-  \begin{equation}
+  \begin{equation}\label{pdersunivone}
   \mbox{\begin{tabular}{l}
   @{thm pders_lang_Zero}\\
   @{thm pders_lang_One}\\
@@ -1883,10 +1884,11 @@
   \noindent
   and for all languages @{text "A"}, @{thm pders_lang_subset[where B="UNIV",
   simplified]} holds.  Since we follow Antimirov's proof quite closely in our
-  formalisation, we omit the details.
+  formalisation (only the last two cases of \eqref{pdersunivone} involve some
+  non-routine induction argument), we omit the details.
 
 
-  Let us return to our proof of the second direction in the Myhill-Nerode
+  Let us now return to our proof about the second direction in the Myhill-Nerode
   theorem. The point of the above calculations is to use 
   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
 
@@ -1915,7 +1917,8 @@
   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
   which we know is finite by Theorem~\ref{antimirov}. This means there 
   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
-  and we can again conclude the second part of the Myhill-Nerode theorem.
+  which refines @{term "\<approx>(lang r)"}, and consequently we can again conclude the 
+  second part of the Myhill-Nerode theorem.
   \end{proof}
 *}
 
@@ -1929,7 +1932,7 @@
   More interesting is the closure under complement, because it seems difficult
   to construct a regular expression for the complement language by direct
   means. However the existence of such a regular expression can now be easily
-  proved using the Myhill-Nerode theorem since
+  proved using both parts of the Myhill-Nerode theorem, since
 
   \begin{center}
   @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
@@ -1939,7 +1942,7 @@
   holds for any strings @{text "s\<^isub>1"} and @{text
   "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
   give rise to the same partitions. So if one is finite, the other is too, and
-  the other way around. Proving the existence of such a regular expression via
+  vice versa. Proving the existence of such a regular expression via
   automata using the standard method would be quite involved. It includes the
   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
@@ -2004,7 +2007,7 @@
   a regular expression @{text r} such that @{term "A = lang r"}. We also know
   that @{term "pders_lang B r"} is finite for every language @{text B} and 
   regular expression @{text r} (recall Theorem~\ref{antimirov}). By definition 
-  and Lemma~\ref{Derspders} therefore
+  and \eqref{Derspders} therefore
 
   
   \begin{equation}\label{eqq}
@@ -2013,8 +2016,8 @@
 
   \noindent
   Since there are only finitely many regular expressions in @{term "pders_lang
-  B r"}, we know by \eqref{uplus} that there exists a regular expression that
-  the right-hand side of \eqref{eqq} is equal to \mbox{@{term "lang (\<Uplus>(pders_lang B
+  B r"}, we know by \eqref{uplus} that there exists a regular expression so that
+  the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pders_lang B
   r))"}}. Thus the regular expression @{term "\<Uplus>(pders_lang B r)"} verifies that
   @{term "Ders_lang B A"} is regular.
 *}
@@ -2040,7 +2043,7 @@
   far. Using this theorem we can obviously prove when a language is \emph{not}
   regular---by establishing that it has infinitely many equivalence classes
   generated by the Myhill-Nerode relation (this is usually the purpose of the
-  pumping lemma \cite{Kozen97}).  We can also use it to establish the standard
+  Pumping Lemma \cite{Kozen97}).  We can also use it to establish the standard
   textbook results about closure properties of regular languages. Interesting
   is the case of closure under complement, because it seems difficult to
   construct a regular expression for the complement language by direct
@@ -2048,13 +2051,13 @@
   proved using the Myhill-Nerode theorem.  
 
   Our insistence on regular expressions for proving the Myhill-Nerode theorem
-  arose from the limitations of HOL on which the popular theorem provers HOL4,
+  arose from the limitations of HOL, on which the popular theorem provers HOL4,
   HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
-  formalisations can only extend HOL by definitions that introduce a notion in
-  terms of already existing concepts. A convenient definition for automata
+  formalisations can only extend HOL by definitions that introduce a new concept in
+  terms of already existing notions. A convenient definition for automata
   (based on graphs) uses a polymorphic type for the state nodes. This allows
   us to use the standard operation of disjoint union in order to compose two
-  automata. Unfortunately, we cannot use such a polymorphic definition of
+  automata. Unfortunately, we cannot use such a polymorphic definition
   in HOL as part of the definition for regularity of a language (a
   set of strings).  Consider the following attempt
 
@@ -2065,17 +2068,18 @@
   \noindent
   which means the definiens is polymorphic in the type of the automata @{text
   "M"}, but the definiendum @{text "is_regular"} is not. Such definitions are
-  excluded in HOL, because they lead easily to inconsistencies (see
+  excluded in HOL, because they can lead easily to inconsistencies (see
   \cite{PittsHOL4} for a simple example). Also HOL does not contain
   type-quantifiers which would allow us to get rid of the polymorphism by
   quantifying over the type-variable @{text "\<alpha>"}. Therefore when defining
-  regularity in terms of automata, the only natural way out in HOL is to use state
-  nodes with an identity, for example a natural number. Unfortunatly, the
-  consequence is that we have to be careful when combining two automata so
+  regularity in terms of automata, the only natural way out in HOL is to use
+  state nodes with an identity, for example a natural number. Unfortunatly,
+  the consequence is that we have to be careful when combining two automata so
   that there is no clash between two such states. This makes formalisations
-  quite fiddly and unpleasant. Regular expressions proved much more convenient
-  for reasoning in HOL and we showed they can be used for establishing the
-  Myhill-Nerode theorem.
+  quite fiddly and rather unpleasant. Regular expressions proved much more
+  convenient for reasoning in HOL and we showed they can be used for
+  establishing the central result in regular language theory: the Myhill-Nerode 
+  theorem.
 
   While regular expressions are convenient, they have some limitations. One is
   that there seems to be no method of calculating a minimal regular expression
@@ -2086,32 +2090,6 @@
   been formally established, we refer the reader to Owens and Slind
   \cite{OwensSlind08}.
 
-  Our formalisation consists of 780 lines of Isabelle/Isar code for the first
-  direction and 460 for the second, plus around 300 lines of standard material
-  about regular languages. The formalisation about derivatives and partial
-  derivatives shown in Section~\ref{derivatives} consists of 390 lines of
-  code.  The algorithm for solving equational systems, which we used in the
-  first direction, is conceptually not that complicated. Still the use of sets
-  over which the algorithm operates, means it is not as easy to formalise as
-  one might wish. It seems sets cannot be avoided since the `input' of the
-  algorithm consists of equivalence classes and we cannot see how to
-  reformulate the theory so that we can use lists, which are usually easier to
-  reason about in a theorem prover.
-
-  While our formalisation might be seen large, it should be seen
-  in the context of the work done by Constable at al \cite{Constable00} who
-  formalised the Myhill-Nerode theorem in Nuprl using automata. They write
-  that their four-member team needed something on the magnitude of 18 months
-  for their formalisation. The estimate for our formalisation is that we
-  needed approximately 3 months and this included the time to find our proof
-  arguments. Unlike Constable et al, who were able to follow the proofs from
-  \cite{HopcroftUllman69}, we had to find our own arguments.  So for us the
-  formalisation was not the bottleneck. It is hard to gauge the size of a
-  formalisation in Nurpl, but from what is shown in the Nuprl Math Library
-  about their development it seems substantially larger than ours. The code of
-  ours can be found in the Mercurial Repository at
-  \mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.
-
   Our proof of the first direction is very much inspired by \emph{Brzozowski's
   algebraic method} used to convert a finite automaton to a regular expression
   \cite{Brzozowski64}. The close connection can be seen by considering the
@@ -2130,31 +2108,63 @@
 
   We presented two proofs for the second direction of the Myhill-Nerode
   theorem. One direct proof using tagging-functions and another using partial
-  derivatives. These proofs is where our method shines, because we can
-  completely side-step the standard argument \cite{Kozen97} where automata
-  need to be composed. However, it is also the direction where we had to spend
-  most of the `conceptual' time, as our first proof based on
-  tagging-functions is new for establishing the Myhill-Nerode theorem. All
-  standard proofs of this direction proceed by arguments over automata.
+  derivatives. This part of our work is where our method using regular
+  expressions shines, because we can completely side-step the standard
+  argument \cite{Kozen97} where automata need to be composed. However, it is
+  also the direction where we had to spend most of the `conceptual' time, as
+  our first proof based on tagging-functions is new for establishing the
+  Myhill-Nerode theorem. All standard proofs of this direction proceed by
+  arguments over automata.
   
-  Our indirect proof for the second direction arose from the interest in
+  The indirect proof for the second direction arose from our interest in
   Brzozowski's derivatives for regular expression matching. A corresponding
-  regular expression matcher has been formalised in HOL4 in
-  \cite{OwensSlind08}. In our opinion, this formalisation is considerably
+  regular expression matcher has been formalised by Owens and Slind in HOL4
+  \cite{OwensSlind08}. In our opinion, their formalisation is considerably
   slicker than for example the approach to regular expression matching taken
-  in \cite{Harper99} and \cite{Yi06}. While Brzozowski's derivatives lead to
-  simple regular expression matchers and he proved that there are only
+  in \cite{Harper99} and \cite{Yi06}. While Brzozowski's derivatives lead to a
+  simple regular expression matcher and he established that there are only
   finitely many dissimilar derivatives for every regular expression, this
   result is not as straightforward to formalise in a theorem prover. The
   reason is that the set of dissimilar derivatives is not defined inductively,
-  but in terms of an ACI-equivalence relation.
-
-  
+  but in terms of an ACI-equivalence relation. This difficulty prevented for
+  example Krauss and Nipkow to prove termination of their equivalence checker
+  for regular expressions \cite{KraussNipkow11}. Their checker is based on
+  derivatives and for their argument the lack of a formal proof of termination
+  is not crucial (it merely lets them ``sleep better'' \cite{KraussNipkow11}).
+  We expect that their development simplifies by using partial derivatives,
+  instead of derivatives, and that termination of the algorithm can be
+  formally established. However, since partial derivatives use sets of regular
+  expressions, one needs to carefully analyse whether the resulting algorithm
+  is still executable. Given the existing infrastructure for executable sets
+  in Isabelle/HOL, it should.
 
-  \medskip
+  Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
+  Isabelle/Isar code for the first direction and 460 for the second (the one
+  based on tagging functions), plus around 300 lines of standard material
+  about regular languages. The formalisation about derivatives and partial
+  derivatives shown in Section~\ref{derivatives} consists of 390 lines of
+  code.  The algorithm for solving equational systems, which we used in the
+  first direction, is conceptually relatively simple. Still the use of sets
+  over which the algorithm operates means it is not as easy to formalise as
+  one might hope. However, it seems sets cannot be avoided since the `input'
+  of the algorithm consists of equivalence classes and we cannot see how to
+  reformulate the theory so that we can use lists. Lists would be much easier
+  to reason about, since we can define function over them by recursion. For
+  sets we have to use set-comprehensions.
 
-  We expect that the development of Krauss \& Nipkow gets easier by
-  using partial derivatives.\medskip
+  While our formalisation might be seen large, it should be seen
+  in the context of the work done by Constable at al \cite{Constable00} who
+  formalised the Myhill-Nerode theorem in Nuprl using automata. They write
+  that their four-member team needed something on the magnitude of 18 months
+  for their formalisation. The estimate for our formalisation is that we
+  needed approximately 3 months and this included the time to find our proof
+  arguments. Unlike Constable et al, who were able to follow the proofs from
+  \cite{HopcroftUllman69}, we had to find our own arguments.  So for us the
+  formalisation was not the bottleneck. It is hard to gauge the size of a
+  formalisation in Nurpl, but from what is shown in the Nuprl Math Library
+  about their development it seems substantially larger than ours. The code of
+  ours can be found in the Mercurial Repository at
+  \mbox{\url{http://www4.in.tum.de/~urbanc/regexp.html}}.\medskip
   
   \noindent
   {\bf Acknowledgements:}