--- a/Journal/Paper.thy Mon Sep 05 14:15:32 2011 +0000
+++ b/Journal/Paper.thy Mon Sep 05 15:42:29 2011 +0000
@@ -89,8 +89,8 @@
pderiv ("pder") and
pderivs ("pders") and
pderivs_set ("pderss") and
- SUBSEQ ("Subseq") and
- SUPSEQ ("Supseq") and
+ SUBSEQ ("Sub") and
+ SUPSEQ ("Sup") and
UP ("'(_')\<up>") and
ALLS ("ALL")
@@ -425,7 +425,7 @@
establish the usual closure properties, including complementation, for
regular languages. We also use in one example the continuation lemma, which
is based on Myhill-Nerode, for establishing non-regularity of languages
- \cite{rosenberg06}.\medskip
+ \cite{Rosenberg06}.\medskip
\noindent
{\bf Contributions:} There is an extensive literature on regular languages.
@@ -2015,7 +2015,7 @@
The beauty of regular languages is that they are closed under many set
operations. Closure under union, concatenation and Kleene-star are trivial
to establish given our definition of regularity (recall Definition~\ref{regular}).
- More interesting is the closure under complement, because it seems difficult
+ More interesting in our setting is the closure under complement, because it seems difficult
to construct a regular expression for the complement language by direct
means. However the existence of such a regular expression can now be easily
proved using both parts of the Myhill-Nerode theorem, since
@@ -2029,8 +2029,8 @@
"s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
give rise to the same partitions. So if one is finite, the other is too, and
vice versa. As noted earlier, our algorithm for solving equational systems
- actually calculates the regular expression. Calculating this regular expression
- via
+ actually calculates the regular expression for the complement language.
+ Calculating this regular expression via
automata using the standard method would be quite involved. It includes the
steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
"\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
@@ -2112,7 +2112,7 @@
Even more surprising is the fact that for \emph{every} language @{text A}, the language
consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}.
- A substring can be obtained
+ A \emph{substring} can be obtained
by striking out zero or more characters from a string. This can be defined
inductively in Isabelle/HOL by the following three rules:
@@ -2144,8 +2144,8 @@
\noindent
Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
- Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's
- Lemma allows us to infer that every set @{text A} of antichains, namely
+ Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}.
+ Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying
\begin{equation}\label{higman}
@{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
@@ -2154,7 +2154,8 @@
\noindent
is finite.
- The first step in our proof is to establish the following properties for @{term SUPSEQ}
+ The first step in our proof of Lemma~\ref{subseqreg} is to establish the
+ following properties for @{term SUPSEQ}
\begin{equation}\label{supseqprops}
\mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
@@ -2177,7 +2178,7 @@
\begin{proof}
Since our alphabet is finite, we have a regular expression, written @{text ALL}, that
- matches every string. With this regular expression we can inductively define
+ matches every string. Using this regular expression we can inductively define
the operation @{text "r\<up>"}
\begin{center}
@@ -2221,7 +2222,7 @@
that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable,
except with itself.
It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
- the other direction we have @{term "x \<in> SUPSEQ A"}. From this we can obtain
+ the other direction we have @{term "x \<in> SUPSEQ A"}. From this we obtain
a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"},
@@ -2229,7 +2230,7 @@
given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
for reasoning about well-foundedness). Since @{term "z"} is
minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
- From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in
+ From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in
@{term "SUPSEQ M"}, as required.
\end{proof}
@@ -2259,9 +2260,48 @@
we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
must be regular.
\end{proof}
+
+ Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing
+ non-regularity of languages. For this we use the following version of the Continuation
+ Lemma (see for example~\cite{Rosenberg06}).
+
+ \begin{lmm}[Continuation Lemma]
+ If the language @{text A} is regular and the set @{text B} is infinite,
+ then there exist two distinct strings @{text x} and @{text y} in @{text B}
+ such that @{term "x \<approx>A y"}.
+ \end{lmm}
+
+ \noindent
+ This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole
+ Principle: Since @{text A} is regular, there can be only finitely many
+ equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain
+ at least two strings that are in the same equivalence class, that is
+ they need to be related by the Myhill-Nerode relation.
+
+ Using this lemma, it is straightforward to establish that the language
+ \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands
+ for the strings consisting of @{text n} times the character a, is not
+ regular. For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
+
+ \begin{lmm}
+ No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}.
+ \end{lmm}
+
+ \begin{proof}
+ After unfolding the definitions, we need to establish that for @{term "i \<noteq> j"},
+ the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case
+ if we test that the two strings have the same amount of @{text a}'s and @{text b}'s;
+ the string on the right-hand side satisfies this property, but not the one on
+ the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
+ \end{proof}
+
+ \noindent
+ To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A}
+ is regular. Therefore the language @{text A} is not regular, as we wanted to show.
*}
+
section {* Conclusion and Related Work *}
text {*