Journal/Paper.thy
changeset 240 17aa8c8fbe7d
parent 239 13de6a49294e
child 242 093e45c44d91
equal deleted inserted replaced
239:13de6a49294e 240:17aa8c8fbe7d
    87   deriv ("der") and
    87   deriv ("der") and
    88   derivs ("ders") and
    88   derivs ("ders") and
    89   pderiv ("pder") and
    89   pderiv ("pder") and
    90   pderivs ("pders") and
    90   pderivs ("pders") and
    91   pderivs_set ("pderss") and
    91   pderivs_set ("pderss") and
    92   SUBSEQ ("Subseq") and
    92   SUBSEQ ("Sub") and
    93   SUPSEQ ("Supseq") and
    93   SUPSEQ ("Sup") and
    94   UP ("'(_')\<up>") and
    94   UP ("'(_')\<up>") and
    95   ALLS ("ALL")
    95   ALLS ("ALL")
    96   
    96   
    97   
    97   
    98 lemmas Deriv_simps = Deriv_empty Deriv_epsilon Deriv_char Deriv_union
    98 lemmas Deriv_simps = Deriv_empty Deriv_epsilon Deriv_char Deriv_union
   423   regular expressions. This theorem gives necessary and sufficient conditions
   423   regular expressions. This theorem gives necessary and sufficient conditions
   424   for when a language is regular. As a corollary of this theorem we can easily
   424   for when a language is regular. As a corollary of this theorem we can easily
   425   establish the usual closure properties, including complementation, for
   425   establish the usual closure properties, including complementation, for
   426   regular languages. We also use in one example the continuation lemma, which
   426   regular languages. We also use in one example the continuation lemma, which
   427   is based on Myhill-Nerode, for establishing non-regularity of languages 
   427   is based on Myhill-Nerode, for establishing non-regularity of languages 
   428   \cite{rosenberg06}.\medskip
   428   \cite{Rosenberg06}.\medskip
   429   
   429   
   430   \noindent 
   430   \noindent 
   431   {\bf Contributions:} There is an extensive literature on regular languages.
   431   {\bf Contributions:} There is an extensive literature on regular languages.
   432   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   432   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   433   that is based on regular expressions, only. The part of this theorem stating
   433   that is based on regular expressions, only. The part of this theorem stating
  2013 text {*
  2013 text {*
  2014   \noindent
  2014   \noindent
  2015   The beauty of regular languages is that they are closed under many set
  2015   The beauty of regular languages is that they are closed under many set
  2016   operations. Closure under union, concatenation and Kleene-star are trivial
  2016   operations. Closure under union, concatenation and Kleene-star are trivial
  2017   to establish given our definition of regularity (recall Definition~\ref{regular}).
  2017   to establish given our definition of regularity (recall Definition~\ref{regular}).
  2018   More interesting is the closure under complement, because it seems difficult
  2018   More interesting in our setting is the closure under complement, because it seems difficult
  2019   to construct a regular expression for the complement language by direct
  2019   to construct a regular expression for the complement language by direct
  2020   means. However the existence of such a regular expression can now be easily
  2020   means. However the existence of such a regular expression can now be easily
  2021   proved using both parts of the Myhill-Nerode theorem, since
  2021   proved using both parts of the Myhill-Nerode theorem, since
  2022 
  2022 
  2023   \begin{center}
  2023   \begin{center}
  2027   \noindent
  2027   \noindent
  2028   holds for any strings @{text "s\<^isub>1"} and @{text
  2028   holds for any strings @{text "s\<^isub>1"} and @{text
  2029   "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
  2029   "s\<^isub>2"}. Therefore @{text A} and the complement language @{term "-A"}
  2030   give rise to the same partitions. So if one is finite, the other is too, and
  2030   give rise to the same partitions. So if one is finite, the other is too, and
  2031   vice versa. As noted earlier, our algorithm for solving equational systems
  2031   vice versa. As noted earlier, our algorithm for solving equational systems
  2032   actually calculates the regular expression. Calculating this regular expression 
  2032   actually calculates the regular expression for the complement language. 
  2033   via
  2033   Calculating this regular expression via
  2034   automata using the standard method would be quite involved. It includes the
  2034   automata using the standard method would be quite involved. It includes the
  2035   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
  2035   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
  2036   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
  2036   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
  2037   regular expression. Clearly not something you want to formalise in a theorem
  2037   regular expression. Clearly not something you want to formalise in a theorem
  2038   prover in which it is cumbersome to reason about automata.
  2038   prover in which it is cumbersome to reason about automata.
  2110   r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
  2110   r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
  2111   @{term "Deriv_lang B A"} is regular.
  2111   @{term "Deriv_lang B A"} is regular.
  2112 
  2112 
  2113   Even more surprising is the fact that for \emph{every} language @{text A}, the language
  2113   Even more surprising is the fact that for \emph{every} language @{text A}, the language
  2114   consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. 
  2114   consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. 
  2115   A substring can be obtained
  2115   A \emph{substring} can be obtained
  2116   by striking out zero or more characters from a string. This can be defined 
  2116   by striking out zero or more characters from a string. This can be defined 
  2117   inductively in Isabelle/HOL by the following three rules:
  2117   inductively in Isabelle/HOL by the following three rules:
  2118 
  2118 
  2119   \begin{center}
  2119   \begin{center}
  2120   @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} 
  2120   @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} 
  2142   are regular.
  2142   are regular.
  2143   \end{lmm}
  2143   \end{lmm}
  2144 
  2144 
  2145   \noindent
  2145   \noindent
  2146   Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
  2146   Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
  2147   Higman's Lemma, which is already proved in the Isabelle/HOL library. Higman's
  2147   Higman's Lemma, which is already proved in the Isabelle/HOL library \cite{Berghofer03}. 
  2148   Lemma allows us to infer that every set @{text A} of antichains, namely
  2148   Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying
  2149 
  2149 
  2150   \begin{equation}\label{higman}
  2150   \begin{equation}\label{higman}
  2151   @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
  2151   @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
  2152   \end{equation} 
  2152   \end{equation} 
  2153 
  2153 
  2154   \noindent
  2154   \noindent
  2155   is finite.
  2155   is finite.
  2156 
  2156 
  2157   The first step in our proof is to establish the following properties for @{term SUPSEQ}
  2157   The first step in our proof of Lemma~\ref{subseqreg} is to establish the 
       
  2158   following properties for @{term SUPSEQ}
  2158 
  2159 
  2159   \begin{equation}\label{supseqprops}
  2160   \begin{equation}\label{supseqprops}
  2160   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2161   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2161   @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\  
  2162   @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\  
  2162   @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ 
  2163   @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ 
  2175   If @{text A} is regular, then also @{term "SUPSEQ A"}.
  2176   If @{text A} is regular, then also @{term "SUPSEQ A"}.
  2176   \end{lmm}
  2177   \end{lmm}
  2177 
  2178 
  2178   \begin{proof}
  2179   \begin{proof}
  2179   Since our alphabet is finite, we have a regular expression, written @{text ALL}, that
  2180   Since our alphabet is finite, we have a regular expression, written @{text ALL}, that
  2180   matches every string. With this regular expression we can inductively define
  2181   matches every string. Using this regular expression we can inductively define
  2181   the operation @{text "r\<up>"} 
  2182   the operation @{text "r\<up>"} 
  2182 
  2183 
  2183   \begin{center}
  2184   \begin{center}
  2184   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2185   \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2185   @{thm (lhs) UP.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(1)}\\  
  2186   @{thm (lhs) UP.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(1)}\\  
  2219   \noindent
  2220   \noindent
  2220   By Higman's Lemma \eqref{higman} we know
  2221   By Higman's Lemma \eqref{higman} we know
  2221   that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, 
  2222   that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, 
  2222   except with itself.
  2223   except with itself.
  2223   It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
  2224   It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
  2224   the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we can obtain 
  2225   the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we obtain 
  2225   a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
  2226   a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
  2226   the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
  2227   the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
  2227   be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
  2228   be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
  2228   and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument 
  2229   and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument 
  2229   given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
  2230   given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
  2230   for reasoning about well-foundedness). Since @{term "z"} is
  2231   for reasoning about well-foundedness). Since @{term "z"} is
  2231   minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
  2232   minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
  2232   From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in 
  2233   From this together with \mbox{@{term "z \<preceq> x"}}, we can infer that @{term x} is in 
  2233   @{term "SUPSEQ M"}, as required.
  2234   @{term "SUPSEQ M"}, as required.
  2234   \end{proof}
  2235   \end{proof}
  2235 
  2236 
  2236   \noindent
  2237   \noindent
  2237   This lemma allows us to establish the second part of Lemma~\ref{subseqreg}.
  2238   This lemma allows us to establish the second part of Lemma~\ref{subseqreg}.
  2257   By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl}
  2258   By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl}
  2258   is regular, which means @{term "- SUBSEQ A"} is regular. But since
  2259   is regular, which means @{term "- SUBSEQ A"} is regular. But since
  2259   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2260   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2260   must be regular. 
  2261   must be regular. 
  2261   \end{proof}
  2262   \end{proof}
       
  2263 
       
  2264   Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing 
       
  2265   non-regularity of languages. For this we use the following version of the Continuation
       
  2266   Lemma (see for example~\cite{Rosenberg06}).
       
  2267 
       
  2268   \begin{lmm}[Continuation Lemma]
       
  2269   If the language @{text A} is regular and the set @{text B} is infinite,
       
  2270   then there exist two distinct strings @{text x} and @{text y} in @{text B} 
       
  2271   such that @{term "x \<approx>A y"}.
       
  2272   \end{lmm}
       
  2273 
       
  2274   \noindent
       
  2275   This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole
       
  2276   Principle: Since @{text A} is regular, there can be only finitely many 
       
  2277   equivalence classes by the Myhill-Nerode relation. Hence an infinite set must contain 
       
  2278   at least two strings that are in the same equivalence class, that is
       
  2279   they need to be related by the Myhill-Nerode relation.
       
  2280 
       
  2281   Using this lemma, it is straightforward to establish that the language 
       
  2282   \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}}, where @{text "a\<^sup>n"} stands
       
  2283   for the strings consisting of @{text n} times the character a, is not
       
  2284   regular. For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
       
  2285 
       
  2286   \begin{lmm}
       
  2287   No two distinct strings in @{text "B"} are Myhill-Nerode related by @{text A}.
       
  2288   \end{lmm} 
       
  2289 
       
  2290   \begin{proof}
       
  2291   After unfolding the definitions, we need to establish that for @{term "i \<noteq> j"},
       
  2292   the equality \mbox{@{text "a\<^sup>i @ b\<^sup>j = a\<^sup>n @ b\<^sup>n"}} leads to a contradiction. This is clearly the case
       
  2293   if we test that the two strings have the same amount of @{text a}'s and @{text b}'s;
       
  2294   the string on the right-hand side satisfies this property, but not the one on
       
  2295   the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
       
  2296   \end{proof}
       
  2297 
       
  2298   \noindent
       
  2299   To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A}
       
  2300   is regular. Therefore the language @{text A} is not regular, as we wanted to show.
  2262 *}
  2301 *}
       
  2302 
  2263 
  2303 
  2264 
  2304 
  2265 section {* Conclusion and Related Work *}
  2305 section {* Conclusion and Related Work *}
  2266 
  2306 
  2267 text {*
  2307 text {*