diff -r 13de6a49294e -r 17aa8c8fbe7d Journal/Paper.thy
--- a/Journal/Paper.thy	Mon Sep 05 14:15:32 2011 +0000
+++ b/Journal/Paper.thy	Mon Sep 05 15:42:29 2011 +0000
@@ -89,8 +89,8 @@
   pderiv ("pder") and
   pderivs ("pders") and
   pderivs_set ("pderss") and
-  SUBSEQ ("Subseq") and
-  SUPSEQ ("Supseq") and
+  SUBSEQ ("Sub") and
+  SUPSEQ ("Sup") and
   UP ("'(_')\<up>") and
   ALLS ("ALL")
   
@@ -425,7 +425,7 @@
   establish the usual closure properties, including complementation, for
   regular languages. We also use in one example the continuation lemma, which
   is based on Myhill-Nerode, for establishing non-regularity of languages 
-  \cite{rosenberg06}.\medskip
+  \cite{Rosenberg06}.\medskip
   
   \noindent 
   {\bf Contributions:} There is an extensive literature on regular languages.
@@ -2015,7 +2015,7 @@
   The beauty of regular languages is that they are closed under many set
   operations. Closure under union, concatenation and Kleene-star are trivial
   to establish given our definition of regularity (recall Definition~\ref{regular}).
-  More interesting is the closure under complement, because it seems difficult
+  More interesting in our setting is the closure under complement, because it seems difficult
   to construct a regular expression for the complement language by direct
   means. However the existence of such a regular expression can now be easily
   proved using both parts of the Myhill-Nerode theorem, since
@@ -2029,8 +2029,8 @@
   "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
   give rise to the same partitions. So if one is finite, the other is too, and
   vice versa. As noted earlier, our algorithm for solving equational systems
-  actually calculates the regular expression. Calculating this regular expression 
-  via
+  actually calculates the regular expression for the complement language. 
+  Calculating this regular expression via
   automata using the standard method would be quite involved. It includes the
   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
@@ -2112,7 +2112,7 @@
 
   Even more surprising is the fact that for \emph{every} language @{text A}, the language
   consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. 
-  A substring can be obtained
+  A \emph{substring} can be obtained
   by striking out zero or more characters from a string. This can be defined 
   inductively in Isabelle/HOL by the following three rules:
 
@@ -2144,8 +2144,8 @@
 
   \noindent
   Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
-  Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's
-  Lemma allows us to infer that every set @{text A} of antichains, namely
+  Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}. 
+  Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying
 
   \begin{equation}\label{higman}
   @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
@@ -2154,7 +2154,8 @@
   \noindent
   is finite.
 
-  The first step in our proof is to establish the following properties for @{term SUPSEQ}
+  The first step in our proof of Lemma~\ref{subseqreg} is to establish the 
+  following properties for @{term SUPSEQ}
 
   \begin{equation}\label{supseqprops}
   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
@@ -2177,7 +2178,7 @@
 
   \begin{proof}
   Since our alphabet is finite, we have a regular expression, written @{text ALL}, that
-  matches every string. With this regular expression we can inductively define
+  matches every string. Using this regular expression we can inductively define
   the operation @{text "r\<up>"} 
 
   \begin{center}
@@ -2221,7 +2222,7 @@
   that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, 
   except with itself.
   It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
-  the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we can obtain 
+  the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we obtain 
   a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
   the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
   be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
@@ -2229,7 +2230,7 @@
   given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
   for reasoning about well-foundedness). Since @{term "z"} is
   minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
-  From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in 
+  From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in 
   @{term "SUPSEQ M"}, as required.
   \end{proof}
 
@@ -2259,9 +2260,48 @@
   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
   must be regular. 
   \end{proof}
+
+  Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing 
+  non-regularity of languages. For this we use the following version of the Continuation
+  Lemma (see for example~\cite{Rosenberg06}).
+
+  \begin{lmm}[Continuation Lemma]
+  If the language @{text A} is regular and the set @{text B} is infinite,
+  then there exist two distinct strings @{text x} and @{text y} in @{text B} 
+  such that @{term "x \<approx>A y"}.
+  \end{lmm}
+
+  \noindent
+  This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole
+  Principle: Since @{text A} is regular, there can be only finitely many 
+  equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain 
+  at least two strings that are in the same equivalence class, that is
+  they need to be related by the Myhill-Nerode relation.
+
+  Using this lemma, it is straightforward to establish that the language 
+  \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands
+  for the strings consisting of @{text n} times the character a, is not
+  regular. For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
+
+  \begin{lmm}
+  No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}.
+  \end{lmm} 
+
+  \begin{proof}
+  After unfolding the definitions, we need to establish that for @{term "i \<noteq> j"},
+  the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case
+  if we test that the two strings have the same amount of @{text a}'s and @{text b}'s;
+  the string on the right-hand side satisfies this property, but not the one on
+  the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
+  \end{proof}
+
+  \noindent
+  To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A}
+  is regular. Therefore the language @{text A} is not regular, as we wanted to show.
 *}
 
 
+
 section {* Conclusion and Related Work *}
 
 text {*