423 regular expressions. This theorem gives necessary and sufficient conditions |
423 regular expressions. This theorem gives necessary and sufficient conditions |
424 for when a language is regular. As a corollary of this theorem we can easily |
424 for when a language is regular. As a corollary of this theorem we can easily |
425 establish the usual closure properties, including complementation, for |
425 establish the usual closure properties, including complementation, for |
426 regular languages. We also use in one example the continuation lemma, which |
426 regular languages. We also use in one example the continuation lemma, which |
427 is based on Myhill-Nerode, for establishing non-regularity of languages |
427 is based on Myhill-Nerode, for establishing non-regularity of languages |
428 \cite{rosenberg06}.\medskip |
428 \cite{Rosenberg06}.\medskip |
429 |
429 |
430 \noindent |
430 \noindent |
431 {\bf Contributions:} There is an extensive literature on regular languages. |
431 {\bf Contributions:} There is an extensive literature on regular languages. |
432 To our best knowledge, our proof of the Myhill-Nerode theorem is the first |
432 To our best knowledge, our proof of the Myhill-Nerode theorem is the first |
433 that is based on regular expressions, only. The part of this theorem stating |
433 that is based on regular expressions, only. The part of this theorem stating |
2013 text {* |
2013 text {* |
2014 \noindent |
2014 \noindent |
2015 The beauty of regular languages is that they are closed under many set |
2015 The beauty of regular languages is that they are closed under many set |
2016 operations. Closure under union, concatenation and Kleene-star are trivial |
2016 operations. Closure under union, concatenation and Kleene-star are trivial |
2017 to establish given our definition of regularity (recall Definition~\ref{regular}). |
2017 to establish given our definition of regularity (recall Definition~\ref{regular}). |
2018 More interesting is the closure under complement, because it seems difficult |
2018 More interesting in our setting is the closure under complement, because it seems difficult |
2019 to construct a regular expression for the complement language by direct |
2019 to construct a regular expression for the complement language by direct |
2020 means. However the existence of such a regular expression can now be easily |
2020 means. However the existence of such a regular expression can now be easily |
2021 proved using both parts of the Myhill-Nerode theorem, since |
2021 proved using both parts of the Myhill-Nerode theorem, since |
2022 |
2022 |
2023 \begin{center} |
2023 \begin{center} |
2027 \noindent |
2027 \noindent |
2028 holds for any strings @{text "s\<^isub>1"} and @{text |
2028 holds for any strings @{text "s\<^isub>1"} and @{text |
2029 "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} |
2029 "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"} |
2030 give rise to the same partitions. So if one is finite, the other is too, and |
2030 give rise to the same partitions. So if one is finite, the other is too, and |
2031 vice versa. As noted earlier, our algorithm for solving equational systems |
2031 vice versa. As noted earlier, our algorithm for solving equational systems |
2032 actually calculates the regular expression. Calculating this regular expression |
2032 actually calculates the regular expression for the complement language. |
2033 via |
2033 Calculating this regular expression via |
2034 automata using the standard method would be quite involved. It includes the |
2034 automata using the standard method would be quite involved. It includes the |
2035 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
2035 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
2036 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
2036 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
2037 regular expression. Clearly not something you want to formalise in a theorem |
2037 regular expression. Clearly not something you want to formalise in a theorem |
2038 prover in which it is cumbersome to reason about automata. |
2038 prover in which it is cumbersome to reason about automata. |
2110 r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that |
2110 r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that |
2111 @{term "Deriv_lang B A"} is regular. |
2111 @{term "Deriv_lang B A"} is regular. |
2112 |
2112 |
2113 Even more surprising is the fact that for \emph{every} language @{text A}, the language |
2113 Even more surprising is the fact that for \emph{every} language @{text A}, the language |
2114 consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. |
2114 consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. |
2115 A substring can be obtained |
2115 A \emph{substring} can be obtained |
2116 by striking out zero or more characters from a string. This can be defined |
2116 by striking out zero or more characters from a string. This can be defined |
2117 inductively in Isabelle/HOL by the following three rules: |
2117 inductively in Isabelle/HOL by the following three rules: |
2118 |
2118 |
2119 \begin{center} |
2119 \begin{center} |
2120 @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} |
2120 @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} |
2142 are regular. |
2142 are regular. |
2143 \end{lmm} |
2143 \end{lmm} |
2144 |
2144 |
2145 \noindent |
2145 \noindent |
2146 Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use |
2146 Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use |
2147 Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's |
2147 Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}. |
2148 Lemma allows us to infer that every set @{text A} of antichains, namely |
2148 Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying |
2149 |
2149 |
2150 \begin{equation}\label{higman} |
2150 \begin{equation}\label{higman} |
2151 @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"} |
2151 @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"} |
2152 \end{equation} |
2152 \end{equation} |
2153 |
2153 |
2154 \noindent |
2154 \noindent |
2155 is finite. |
2155 is finite. |
2156 |
2156 |
2157 The first step in our proof is to establish the following properties for @{term SUPSEQ} |
2157 The first step in our proof of Lemma~\ref{subseqreg} is to establish the |
|
2158 following properties for @{term SUPSEQ} |
2158 |
2159 |
2159 \begin{equation}\label{supseqprops} |
2160 \begin{equation}\label{supseqprops} |
2160 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2161 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2161 @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\ |
2162 @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\ |
2162 @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ |
2163 @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ |
2175 If @{text A} is regular, then also @{term "SUPSEQ A"}. |
2176 If @{text A} is regular, then also @{term "SUPSEQ A"}. |
2176 \end{lmm} |
2177 \end{lmm} |
2177 |
2178 |
2178 \begin{proof} |
2179 \begin{proof} |
2179 Since our alphabet is finite, we have a regular expression, written @{text ALL}, that |
2180 Since our alphabet is finite, we have a regular expression, written @{text ALL}, that |
2180 matches every string. With this regular expression we can inductively define |
2181 matches every string. Using this regular expression we can inductively define |
2181 the operation @{text "r\<up>"} |
2182 the operation @{text "r\<up>"} |
2182 |
2183 |
2183 \begin{center} |
2184 \begin{center} |
2184 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2185 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2185 @{thm (lhs) UP.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(1)}\\ |
2186 @{thm (lhs) UP.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(1)}\\ |
2219 \noindent |
2220 \noindent |
2220 By Higman's Lemma \eqref{higman} we know |
2221 By Higman's Lemma \eqref{higman} we know |
2221 that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, |
2222 that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, |
2222 except with itself. |
2223 except with itself. |
2223 It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For |
2224 It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For |
2224 the other direction we have @{term "x \<in> SUPSEQ A"}. From this we can obtain |
2225 the other direction we have @{term "x \<in> SUPSEQ A"}. From this we obtain |
2225 a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that |
2226 a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that |
2226 the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must |
2227 the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must |
2227 be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, |
2228 be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, |
2228 and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument |
2229 and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument |
2229 given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure |
2230 given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure |
2230 for reasoning about well-foundedness). Since @{term "z"} is |
2231 for reasoning about well-foundedness). Since @{term "z"} is |
2231 minimal and an element in @{term A}, we also know that @{term z} is in @{term M}. |
2232 minimal and an element in @{term A}, we also know that @{term z} is in @{term M}. |
2232 From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in |
2233 From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in |
2233 @{term "SUPSEQ M"}, as required. |
2234 @{term "SUPSEQ M"}, as required. |
2234 \end{proof} |
2235 \end{proof} |
2235 |
2236 |
2236 \noindent |
2237 \noindent |
2237 This lemma allows us to establish the second part of Lemma~\ref{subseqreg}. |
2238 This lemma allows us to establish the second part of Lemma~\ref{subseqreg}. |
2257 By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl} |
2258 By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl} |
2258 is regular, which means @{term "- SUBSEQ A"} is regular. But since |
2259 is regular, which means @{term "- SUBSEQ A"} is regular. But since |
2259 we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} |
2260 we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} |
2260 must be regular. |
2261 must be regular. |
2261 \end{proof} |
2262 \end{proof} |
|
2263 |
|
2264 Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing |
|
2265 non-regularity of languages. For this we use the following version of the Continuation |
|
2266 Lemma (see for example~\cite{Rosenberg06}). |
|
2267 |
|
2268 \begin{lmm}[Continuation Lemma] |
|
2269 If the language @{text A} is regular and the set @{text B} is infinite, |
|
2270 then there exist two distinct strings @{text x} and @{text y} in @{text B} |
|
2271 such that @{term "x \<approx>A y"}. |
|
2272 \end{lmm} |
|
2273 |
|
2274 \noindent |
|
2275 This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole |
|
2276 Principle: Since @{text A} is regular, there can be only finitely many |
|
2277 equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain |
|
2278 at least two strings that are in the same equivalence class, that is |
|
2279 they need to be related by the Myhill-Nerode relation. |
|
2280 |
|
2281 Using this lemma, it is straightforward to establish that the language |
|
2282 \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands |
|
2283 for the strings consisting of @{text n} times the character a, is not |
|
2284 regular. For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}. |
|
2285 |
|
2286 \begin{lmm} |
|
2287 No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}. |
|
2288 \end{lmm} |
|
2289 |
|
2290 \begin{proof} |
|
2291 After unfolding the definitions, we need to establish that for @{term "i \<noteq> j"}, |
|
2292 the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case |
|
2293 if we test that the two strings have the same amount of @{text a}'s and @{text b}'s; |
|
2294 the string on the right-hand side satisfies this property, but not the one on |
|
2295 the left-hand side. Therefore the strings cannot be equal and we have a contradiction. |
|
2296 \end{proof} |
|
2297 |
|
2298 \noindent |
|
2299 To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A} |
|
2300 is regular. Therefore the language @{text A} is not regular, as we wanted to show. |
2262 *} |
2301 *} |
|
2302 |
2263 |
2303 |
2264 |
2304 |
2265 section {* Conclusion and Related Work *} |
2305 section {* Conclusion and Related Work *} |
2266 |
2306 |
2267 text {* |
2307 text {* |