diff -r 13de6a49294e -r 17aa8c8fbe7d Journal/Paper.thy --- a/Journal/Paper.thy Mon Sep 05 14:15:32 2011 +0000 +++ b/Journal/Paper.thy Mon Sep 05 15:42:29 2011 +0000 @@ -89,8 +89,8 @@ pderiv ("pder") and pderivs ("pders") and pderivs_set ("pderss") and - SUBSEQ ("Subseq") and - SUPSEQ ("Supseq") and + SUBSEQ ("Sub") and + SUPSEQ ("Sup") and UP ("'(_')\") and ALLS ("ALL") @@ -425,7 +425,7 @@ establish the usual closure properties, including complementation, for regular languages. We also use in one example the continuation lemma, which is based on Myhill-Nerode, for establishing non-regularity of languages - \cite{rosenberg06}.\medskip + \cite{Rosenberg06}.\medskip \noindent {\bf Contributions:} There is an extensive literature on regular languages. @@ -2015,7 +2015,7 @@ The beauty of regular languages is that they are closed under many set operations. Closure under union, concatenation and Kleene-star are trivial to establish given our definition of regularity (recall Definition~\ref{regular}). - More interesting is the closure under complement, because it seems difficult + More interesting in our setting is the closure under complement, because it seems difficult to construct a regular expression for the complement language by direct means. However the existence of such a regular expression can now be easily proved using both parts of the Myhill-Nerode theorem, since @@ -2029,8 +2029,8 @@ "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} give rise to the same partitions. So if one is finite, the other is too, and vice versa. As noted earlier, our algorithm for solving equational systems - actually calculates the regular expression. Calculating this regular expression - via + actually calculates the regular expression for the complement language. + Calculating this regular expression via automata using the standard method would be quite involved. It includes the steps: regular expression @{text "\"} non-deterministic automaton @{text "\"} deterministic automaton @{text "\"} complement automaton @{text "\"} @@ -2112,7 +2112,7 @@ Even more surprising is the fact that for \emph{every} language @{text A}, the language consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. - A substring can be obtained + A \emph{substring} can be obtained by striking out zero or more characters from a string. This can be defined inductively in Isabelle/HOL by the following three rules: @@ -2144,8 +2144,8 @@ \noindent Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use - Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's - Lemma allows us to infer that every set @{text A} of antichains, namely + Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}. + Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying \begin{equation}\label{higman} @{text "\x, y \ A."}~@{term "x \ y \ \(x \ y) \ \(y \ x)"} @@ -2154,7 +2154,8 @@ \noindent is finite. - The first step in our proof is to establish the following properties for @{term SUPSEQ} + The first step in our proof of Lemma~\ref{subseqreg} is to establish the + following properties for @{term SUPSEQ} \begin{equation}\label{supseqprops} \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} @@ -2177,7 +2178,7 @@ \begin{proof} Since our alphabet is finite, we have a regular expression, written @{text ALL}, that - matches every string. With this regular expression we can inductively define + matches every string. Using this regular expression we can inductively define the operation @{text "r\"} \begin{center} @@ -2221,7 +2222,7 @@ that @{term "M \ {x \ A. minimal x A}"} is finite, since every minimal element is incomparable, except with itself. It is also straightforward to show that @{term "SUPSEQ M \ SUPSEQ A"}. For - the other direction we have @{term "x \ SUPSEQ A"}. From this we can obtain + the other direction we have @{term "x \ SUPSEQ A"}. From this we obtain a @{text y} such that @{term "y \ A"} and @{term "y \ x"}. Since we know that the relation \mbox{@{term "{(y, x). y \ x \ x \ y}"}} is well-founded, there must be a minimal element @{text "z"} such that @{term "z \ A"} and @{term "z \ y"}, @@ -2229,7 +2230,7 @@ given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure for reasoning about well-foundedness). Since @{term "z"} is minimal and an element in @{term A}, we also know that @{term z} is in @{term M}. - From this together with @{term "z \ x"}, we can infer that @{term x} is in + From this together with \mbox{@{term "z \ x"}}, we can infer that @{term x} is in @{term "SUPSEQ M"}, as required. \end{proof} @@ -2259,9 +2260,48 @@ we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} must be regular. \end{proof} + + Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing + non-regularity of languages. For this we use the following version of the Continuation + Lemma (see for example~\cite{Rosenberg06}). + + \begin{lmm}[Continuation Lemma] + If the language @{text A} is regular and the set @{text B} is infinite, + then there exist two distinct strings @{text x} and @{text y} in @{text B} + such that @{term "x \A y"}. + \end{lmm} + + \noindent + This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole + Principle: Since @{text A} is regular, there can be only finitely many + equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain + at least two strings that are in the same equivalence class, that is + they need to be related by the Myhill-Nerode relation. + + Using this lemma, it is straightforward to establish that the language + \mbox{@{text "A \ \\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands + for the strings consisting of @{text n} times the character a, is not + regular. For this consider the infinite set @{text "B \ \\<^isub>n a\<^sup>n"}. + + \begin{lmm} + No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}. + \end{lmm} + + \begin{proof} + After unfolding the definitions, we need to establish that for @{term "i \ j"}, + the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case + if we test that the two strings have the same amount of @{text a}'s and @{text b}'s; + the string on the right-hand side satisfies this property, but not the one on + the left-hand side. Therefore the strings cannot be equal and we have a contradiction. + \end{proof} + + \noindent + To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A} + is regular. Therefore the language @{text A} is not regular, as we wanted to show. *} + section {* Conclusion and Related Work *} text {*