Journal/Paper.thy
changeset 237 e02155fe8136
parent 233 e2dc11e12e0b
child 238 a6513d0b16fc
equal deleted inserted replaced
236:2d4f1334b5ca 237:e02155fe8136
  2107   B r"}, we know by \eqref{uplus} that there exists a regular expression so that
  2107   B r"}, we know by \eqref{uplus} that there exists a regular expression so that
  2108   the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pderivs_lang B
  2108   the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pderivs_lang B
  2109   r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
  2109   r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that
  2110   @{term "Deriv_lang B A"} is regular.
  2110   @{term "Deriv_lang B A"} is regular.
  2111 
  2111 
  2112   Even more surprising is the fact that for every language @{text A}, the language
  2112   Even more surprising is the fact that for \emph{every} language @{text A}, the language
  2113   consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. 
  2113   consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. 
  2114   A substring can be obtained
  2114   A substring can be obtained
  2115   by striking out zero or more characters from a string. This can be defined 
  2115   by striking out zero or more characters from a string. This can be defined 
  2116   inductively by the following three rules:
  2116   inductively in Isabelle/HOL by the following three rules:
  2117 
  2117 
  2118   \begin{center}
  2118   \begin{center}
  2119   @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} 
  2119   @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} 
  2120   @{thm[mode=Rule] emb1[where as="x" and b="c" and bs="y"]}\hspace{10mm} 
  2120   @{thm[mode=Rule] emb1[where as="x" and b="c" and bs="y"]}\hspace{10mm} 
  2121   @{thm[mode=Rule] emb2[where as="x" and a="c" and bs="y"]}
  2121   @{thm[mode=Rule] emb2[where as="x" and a="c" and bs="y"]}
  2122   \end{center}
  2122   \end{center}
  2123 
  2123 
  2124   \noindent
  2124   \noindent
  2125   It can be easily proved that @{text "\<preceq>"} is a partial order. Now define the 
  2125   It can be easily proved that @{text "\<preceq>"} is a partial order. Now define the 
  2126   language of substrings and superstrings of a language @{text A}:
  2126   \emph{language of substrings} and \emph{superstrings} of a language @{text A} 
       
  2127   respectively as
  2127 
  2128 
  2128   \begin{center}
  2129   \begin{center}
  2129   \begin{tabular}{l}
  2130   \begin{tabular}{l}
  2130   @{thm SUBSEQ_def}\\
  2131   @{thm SUBSEQ_def}\\
  2131   @{thm SUPSEQ_def}
  2132   @{thm SUPSEQ_def}
  2134   
  2135   
  2135   \noindent
  2136   \noindent
  2136   We like to establish
  2137   We like to establish
  2137 
  2138 
  2138   \begin{lmm}\label{subseqreg}
  2139   \begin{lmm}\label{subseqreg}
  2139   For any language @{text A}, the languages @{term "SUBSEQ A"} and @{term "SUPSEQ A"}
  2140   For every language @{text A}, the languages @{term "SUBSEQ A"} and @{term "SUPSEQ A"}
  2140   are regular.
  2141   are regular.
  2141   \end{lmm}
  2142   \end{lmm}
  2142 
  2143 
  2143   \noindent
  2144   \noindent
  2144   Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
  2145   Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use
  2145   Higman's lemma, which is already proved in the Isabelle/HOL library. Higman's
  2146   Higman's lemma, which is already proved in the Isabelle/HOL library. Higman's
  2146   lemma allows us to prove that every set @{text A} of antichains, namely
  2147   lemma allows us to infer that every set @{text A} of antichains, namely
  2147 
  2148 
  2148   \begin{equation}\label{higman}
  2149   \begin{equation}\label{higman}
  2149   @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
  2150   @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"}
  2150   \end{equation} 
  2151   \end{equation} 
  2151 
  2152 
  2152   \noindent
  2153   \noindent
  2153   is finite.
  2154   is finite.
  2154 
  2155 
  2155   It is straightforward to prove the following properties of @{term SUPSEQ}
  2156   The first step in our proof is to establish the following properties for @{term SUPSEQ}
  2156 
  2157 
  2157   \begin{equation}\label{supseqprops}
  2158   \begin{equation}\label{supseqprops}
  2158   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2159   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  2159   @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\  
  2160   @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\  
  2160   @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ 
  2161   @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ 
  2168   \noindent
  2169   \noindent
  2169   whereby the last equation follows from the fact that @{term "A\<star>"} contains the
  2170   whereby the last equation follows from the fact that @{term "A\<star>"} contains the
  2170   empty string. With these properties at our disposal we can establish the lemma
  2171   empty string. With these properties at our disposal we can establish the lemma
  2171 
  2172 
  2172   \begin{lmm}
  2173   \begin{lmm}
  2173   If @{text A} is regular, then also @{term "SUBSEQ A"}.
  2174   If @{text A} is regular, then also @{term "SUPSEQ A"}.
  2174   \end{lmm}
  2175   \end{lmm}
  2175 
  2176 
  2176   \begin{proof}
  2177   \begin{proof}
  2177   Since our alphabet is finite, we can find a regular expression, written @{const Allreg}, that
  2178   Since our alphabet is finite, we can find a regular expression, written @{const Allreg}, that
  2178   matches every single-character string. With this regular expression we can inductively define
  2179   matches every single-character string. With this regular expression we can inductively define
  2190   @{thm (lhs) UP.simps(6)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(6)}
  2191   @{thm (lhs) UP.simps(6)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(6)}
  2191   \end{tabular}
  2192   \end{tabular}
  2192   \end{center}
  2193   \end{center}
  2193 
  2194 
  2194   \noindent
  2195   \noindent
  2195   and using \eqref{supseqprops} establish that @{thm lang_UP}. This shows
  2196   and use \eqref{supseqprops} establish that @{thm lang_UP} holds. This shows
  2196   that @{term "SUBSEQ A"} is regular provided @{text A} is.
  2197   that @{term "SUBSEQ A"} is regular provided @{text A} is.
  2197   \end{proof}
  2198   \end{proof}
  2198 
  2199 
  2199   \noindent
  2200   \noindent
  2200   Now we can prove the main lemma, namely
  2201   Now we can prove our main lemma, namely
  2201 
  2202 
  2202   \begin{lmm}\label{mset}
  2203   \begin{lmm}\label{mset}
  2203   For every language @{text A}, there exists a finite language @{text M} such that
  2204   For every language @{text A}, there exists a finite language @{text M} such that
  2204   \begin{center}
  2205   \begin{center}
  2205   \mbox{@{term "SUPSEQ M = SUPSEQ A"}}\;.
  2206   \mbox{@{term "SUPSEQ M = SUPSEQ A"}}\;.
  2206   \end{center}
  2207   \end{center}
  2207   \end{lmm}
  2208   \end{lmm}
  2208 
  2209 
  2209   \begin{proof}
  2210   \begin{proof}
  2210   For @{text M} we take the set of all minimal elements of @{text A}. An element @{text x} 
  2211   For @{text M} we take the set of all minimal elements of @{text A}. An element @{text x} 
  2211   is minimal in @{text A} provided
  2212   is said to be \emph{minimal} in @{text A} provided
  2212 
  2213 
  2213   \begin{center}
  2214   \begin{center}
  2214   @{thm minimal_def}
  2215   @{thm minimal_def}
  2215   \end{center}
  2216   \end{center}
  2216 
  2217 
  2217   \noindent
  2218   \noindent
  2218   By Higman's lemma \eqref{higman} we know
  2219   By Higman's lemma \eqref{higman} we know
  2219   that @{text "M"} is finite, since every minimal element is incomparable, 
  2220   that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, 
  2220   except with itself.
  2221   except with itself.
  2221   It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
  2222   It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For
  2222   the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we can obtain 
  2223   the other direction we have  @{term "x \<in> SUPSEQ A"}. From this we can obtain 
  2223   a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
  2224   a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that
  2224   the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
  2225   the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must
  2225   be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
  2226   be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, 
  2226   and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument 
  2227   and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument 
  2227   given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
  2228   given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure
  2228   for reasoning about well-foundedness). Since @{term "z"} is
  2229   for reasoning about well-foundedness). Since @{term "z"} is
  2229   minimal and in @{term A}, we also know that @{term z} is in @{term M}.
  2230   minimal and an element in @{term A}, we also know that @{term z} is in @{term M}.
  2230   From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in 
  2231   From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in 
  2231   @{term "SUPSEQ M"} as required.
  2232   @{term "SUPSEQ M"}, as required.
  2232   \end{proof}
  2233   \end{proof}
  2233 
  2234 
  2234   \noindent
  2235   \noindent
  2235   This lemma allows us to establish the second part of Lemma~\ref{subseqreg}.
  2236   This lemma allows us to establish the second part of Lemma~\ref{subseqreg}.
  2236 
  2237 
  2239   a finite, and thus regular, language @{text M}. We further have @{term "SUPSEQ M = SUPSEQ A"},
  2240   a finite, and thus regular, language @{text M}. We further have @{term "SUPSEQ M = SUPSEQ A"},
  2240   which establishes the second part.    
  2241   which establishes the second part.    
  2241   \end{proof}
  2242   \end{proof}
  2242 
  2243 
  2243   \noindent
  2244   \noindent
  2244   In order to establish the first part of Lemma~\ref{subseqreg} we use the
  2245   In order to establish the first part of Lemma~\ref{subseqreg}, we use the
  2245   property proved in \cite{Shallit08}
  2246   property proved in \cite{Shallit08}, namely that
  2246 
  2247 
  2247   \begin{equation}\label{compl}
  2248   \begin{equation}\label{compl}
  2248   @{thm SUBSEQ_complement}
  2249   @{thm SUBSEQ_complement}
  2249   \end{equation}
  2250   \end{equation}
  2250 
  2251 
       
  2252   \noindent
       
  2253   holds. Now the first part is a simple consequence of the second part.
       
  2254 
  2251   \begin{proof}[The First Part of Lemma~\ref{subseqreg}]
  2255   \begin{proof}[The First Part of Lemma~\ref{subseqreg}]
  2252   By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl}
  2256   By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl}
  2253   is regular, which means the complement of @{term "SUBSEQ A"} is regular. But since
  2257   is regular, which means @{term "- SUBSEQ A"} is regular. But since
  2254   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2258   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2255   must be regular. 
  2259   must be regular. 
  2256   \end{proof}
  2260   \end{proof}
  2257 *}
  2261 *}
  2258 
  2262