regexp: comparison Journal/Paper.thy


-:13de6a49294e
+:17aa8c8fbe7d
 deriv ("der") and
 derivs ("ders") and
 pderiv ("pder") and
 pderivs ("pders") and
 pderivs_set ("pderss") and
-SUBSEQ ("Subseq") and
+SUBSEQ ("Sub") and
-SUPSEQ ("Supseq") and
+SUPSEQ ("Sup") and
 UP ("'(_')\<up>") and
 ALLS ("ALL")
 lemmas Deriv_simps = Deriv_empty Deriv_epsilon Deriv_char Deriv_union
 regular expressions. This theorem gives necessary and sufficient conditions
 for when a language is regular. As a corollary of this theorem we can easily
 establish the usual closure properties, including complementation, for
 regular languages. We also use in one example the continuation lemma, which
 is based on Myhill-Nerode, for establishing non-regularity of languages
-\cite{rosenberg06}.\medskip
+\cite{Rosenberg06}.\medskip
 \noindent
 {\bf Contributions:} There is an extensive literature on regular languages.
 To the best of our knowledge, our proof of the Myhill-Nerode theorem is the first
 that is based on regular expressions, only. The part of this theorem stating
 text {*
 \noindent
 The beauty of regular languages is that they are closed under many set
 operations. Closure under union, concatenation and Kleene-star are trivial
 to establish given our definition of regularity (recall Definition~\ref{regular}).
-More interesting is the closure under complement, because it seems difficult
+More interesting in our setting is the closure under complement, because it seems difficult
 to construct a regular expression for the complement language by direct
 means. However the existence of such a regular expression can now be easily
 proved using both parts of the Myhill-Nerode theorem, since
 \begin{center}
 \noindent
 holds for any strings @{text "s\<^isub>1"} and @{text
 "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
 give rise to the same partitions. So if one is finite, the other is too, and
 vice versa. As noted earlier, our algorithm for solving equational systems
-actually calculates the regular expression. Calculating this regular expression
+actually calculates the regular expression for the complement language.
-via
+Calculating this regular expression via
 automata using the standard method would be quite involved. It includes the
 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
 regular expression. Clearly not something you want to formalise in a theorem
 prover in which it is cumbersome to reason about automata.
 r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
 @{term "Deriv_lang B A"} is regular.
 Even more surprising is the fact that for \emph{every} language @{text A}, the language
 consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}.
-A substring can be obtained
+A \emph{substring} can be obtained
 by striking out zero or more characters from a string. This can be defined
 inductively in Isabelle/HOL by the following three rules:
 \begin{center}
 @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm}
 are regular.
 \end{lmm}
 \noindent
 Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
-Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's
+Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}.
-Lemma allows us to infer that every set @{text A} of antichains, namely
+Higman's Lemma allows us to infer that every antichain @{text A}, that is every set satisfying
 \begin{equation}\label{higman}
 @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
 \end{equation}
 \noindent
 is finite.
-The first step in our proof is to establish the following properties for @{term SUPSEQ}
+The first step in our proof of Lemma~\ref{subseqreg} is to establish the
+following properties for @{term SUPSEQ}
 \begin{equation}\label{supseqprops}
 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
 @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\
 @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\
 If @{text A} is regular, then also @{term "SUPSEQ A"}.
 \end{lmm}
 \begin{proof}
 Since our alphabet is finite, we have a regular expression, written @{text ALL}, that
-matches every string. With this regular expression we can inductively define
+matches every string. Using this regular expression we can inductively define
 the operation @{text "r\<up>"}
 \begin{center}
 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
 @{thm (lhs) UP.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(1)}\\
 \noindent
 By Higman's Lemma \eqref{higman} we know
 that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable,
 except with itself.
 It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
-the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we can obtain
+the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we obtain
 a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
 the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
 be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"},
 and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument
 given in \cite{Shallit08}, because Isabelle/HOL already provides an extensive infrastructure
 for reasoning about well-foundedness). Since @{term "z"} is
 minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
-From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in
+From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in
 @{term "SUPSEQ M"}, as required.
 \end{proof}
 \noindent
 This lemma allows us to establish the second part of Lemma~\ref{subseqreg}.
 By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl}
 is regular, which means @{term "- SUBSEQ A"} is regular. But since
 we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
 must be regular.
 \end{proof}
+Finally we would like to show that the Myhill-Nerode theorem is also convenient for establishing
+non-regularity of languages. For this we use the following version of the Continuation
+Lemma (see for example~\cite{Rosenberg06}).
+\begin{lmm}[Continuation Lemma]
+If the language @{text A} is regular and the set @{text B} is infinite,
+then there exist two distinct strings @{text x} and @{text y} in @{text B}
+such that @{term "x \<approx>A y"}.
+\end{lmm}
+\noindent
+This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole
+Principle: Since @{text A} is regular, there can be only finitely many
+equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain
+at least two strings that are in the same equivalence class, that is
+they need to be related by the Myhill-Nerode relation.
+Using this lemma, it is straightforward to establish that the language
+\mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands
+for the string consisting of @{text n} copies of the character @{text a}, is not
+regular. For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
+\begin{lmm}
+No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}.
+\end{lmm}
+\begin{proof}
+After unfolding the definitions, we need to establish that for @{term "i \<noteq> j"},
+the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case
+if we test whether a string contains as many @{text a}'s as @{text b}'s;
+the string on the right-hand side satisfies this property, but not the one on
+the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
+\end{proof}
+\noindent
+To conclude, this lemma and the Continuation Lemma lead to a contradiction if we assume that @{text A}
+is regular. Therefore the language @{text A} is not regular, as we wanted to show.
 *}
 section {* Conclusion and Related Work *}
 text {*

changeset 240	17aa8c8fbe7d
parent 239	13de6a49294e
child 242	093e45c44d91
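
The non-regularity argument added in this changeset can be checked on small cases with a short script. The following is only an illustrative sketch in Python, not part of the Isabelle formalisation: the names `in_A`, `distinguishing_suffix` and `related_on` are hypothetical helpers introduced here. It shows that for `i ≠ j` the suffix `b^i` separates `a^i` from `a^j` with respect to `A = { a^n b^n }`, so no two distinct strings in `B = { a^n }` can be Myhill-Nerode related by `A`.

```python
def in_A(s: str) -> bool:
    """Membership test for A = { a^n b^n | n >= 0 } (hypothetical helper)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def distinguishing_suffix(i: int, j: int) -> str:
    """For i != j, return a suffix z such that a^i z is in A but a^j z is not."""
    if i == j:
        raise ValueError("a^i and a^j are the same string when i == j")
    return "b" * i


def related_on(x: str, y: str, suffixes) -> bool:
    """Finite approximation of the Myhill-Nerode relation: x and y agree on
    membership in A for every suffix in the given finite collection."""
    return all(in_A(x + z) == in_A(y + z) for z in suffixes)


if __name__ == "__main__":
    # Check all pairs of distinct exponents below a small bound.
    for i in range(6):
        for j in range(6):
            if i != j:
                z = distinguishing_suffix(i, j)
                assert in_A("a" * i + z) and not in_A("a" * j + z)
    print("no two distinct strings a^i, a^j (i, j < 6) are related by A")
```

The check is bounded and therefore proves nothing in general; it only mirrors the counting argument in the proof, where `a^i @ b^j` with `i ≠ j` cannot equal any `a^n @ b^n`.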