2107 B r"}, we know by \eqref{uplus} that there exists a regular expression so that |
2107 B r"}, we know by \eqref{uplus} that there exists a regular expression so that |
2108 the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pderivs_lang B |
2108 the right-hand side of \eqref{eqq} is equal to the language \mbox{@{term "lang (\<Uplus>(pderivs_lang B |
2109 r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that |
2109 r))"}}. Thus the regular expression @{term "\<Uplus>(pderivs_lang B r)"} verifies that |
2110 @{term "Deriv_lang B A"} is regular. |
2110 @{term "Deriv_lang B A"} is regular. |
2111 |
2111 |
2112 Even more surprising is the fact that for every language @{text A}, the language |
2112 Even more surprising is the fact that for \emph{every} language @{text A}, the language |
2113 consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. |
2113 consisting of all substrings of @{text A} is regular \cite{Shallit08, Gasarch09}. |
2114 A substring can be obtained |
2114 A substring can be obtained |
2115 by striking out zero or more characters from a string. This can be defined |
2115 by striking out zero or more characters from a string. This can be defined |
2116 inductively by the following three rules: |
2116 inductively in Isabelle/HOL by the following three rules: |
2117 |
2117 |
2118 \begin{center} |
2118 \begin{center} |
2119 @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} |
2119 @{thm[mode=Axiom] emb0[where bs="x"]}\hspace{10mm} |
2120 @{thm[mode=Rule] emb1[where as="x" and b="c" and bs="y"]}\hspace{10mm} |
2120 @{thm[mode=Rule] emb1[where as="x" and b="c" and bs="y"]}\hspace{10mm} |
2121 @{thm[mode=Rule] emb2[where as="x" and a="c" and bs="y"]} |
2121 @{thm[mode=Rule] emb2[where as="x" and a="c" and bs="y"]} |
2122 \end{center} |
2122 \end{center} |
2123 |
2123 |
2124 \noindent |
2124 \noindent |
2125 It can be easily proved that @{text "\<preceq>"} is a partial order. Now define the |
2125 It can be easily proved that @{text "\<preceq>"} is a partial order. Now define the |
2126 language of substrings and superstrings of a language @{text A}: |
2126 \emph{language of substrings} and \emph{superstrings} of a language @{text A} |
|
2127 respectively as |
2127 |
2128 |
2128 \begin{center} |
2129 \begin{center} |
2129 \begin{tabular}{l} |
2130 \begin{tabular}{l} |
2130 @{thm SUBSEQ_def}\\ |
2131 @{thm SUBSEQ_def}\\ |
2131 @{thm SUPSEQ_def} |
2132 @{thm SUPSEQ_def} |
2134 |
2135 |
2135 \noindent |
2136 \noindent |
2136 We like to establish |
2137 We like to establish |
2137 |
2138 |
2138 \begin{lmm}\label{subseqreg} |
2139 \begin{lmm}\label{subseqreg} |
2139 For any language @{text A}, the languages @{term "SUBSEQ A"} and @{term "SUPSEQ A"} |
2140 For every language @{text A}, the languages @{term "SUBSEQ A"} and @{term "SUPSEQ A"} |
2140 are regular. |
2141 are regular. |
2141 \end{lmm} |
2142 \end{lmm} |
2142 |
2143 |
2143 \noindent |
2144 \noindent |
2144 Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use |
2145 Our proof follows the one given in \cite[92--95]{Shallit08}, except that we use |
2145 Higman's lemma, which is already proved in the Isabelle/HOL library. Higman's |
2146 Higman's lemma, which is already proved in the Isabelle/HOL library. Higman's |
2146 lemma allows us to prove that every set @{text A} of antichains, namely |
2147 lemma allows us to infer that every set @{text A} of antichains, namely |
2147 |
2148 |
2148 \begin{equation}\label{higman} |
2149 \begin{equation}\label{higman} |
2149 @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"} |
2150 @{text "\<forall>x, y \<in> A."}~@{term "x \<noteq> y \<longrightarrow> \<not>(x \<preceq> y) \<and> \<not>(y \<preceq> x)"} |
2150 \end{equation} |
2151 \end{equation} |
2151 |
2152 |
2152 \noindent |
2153 \noindent |
2153 is finite. |
2154 is finite. |
2154 |
2155 |
2155 It is straightforward to prove the following properties of @{term SUPSEQ} |
2156 The first step in our proof is to establish the following properties for @{term SUPSEQ} |
2156 |
2157 |
2157 \begin{equation}\label{supseqprops} |
2158 \begin{equation}\label{supseqprops} |
2158 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2159 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
2159 @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\ |
2160 @{thm (lhs) SUPSEQ_simps(1)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(1)}\\ |
2160 @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ |
2161 @{thm (lhs) SUPSEQ_simps(2)} & @{text "\<equiv>"} & @{thm (rhs) SUPSEQ_simps(2)}\\ |
2168 \noindent |
2169 \noindent |
2169 whereby the last equation follows from the fact that @{term "A\<star>"} contains the |
2170 whereby the last equation follows from the fact that @{term "A\<star>"} contains the |
2170 empty string. With these properties at our disposal we can establish the lemma |
2171 empty string. With these properties at our disposal we can establish the lemma |
2171 |
2172 |
2172 \begin{lmm} |
2173 \begin{lmm} |
2173 If @{text A} is regular, then also @{term "SUBSEQ A"}. |
2174 If @{text A} is regular, then also @{term "SUPSEQ A"}. |
2174 \end{lmm} |
2175 \end{lmm} |
2175 |
2176 |
2176 \begin{proof} |
2177 \begin{proof} |
2177 Since our alphabet is finite, we can find a regular expression, written @{const Allreg}, that |
2178 Since our alphabet is finite, we can find a regular expression, written @{const Allreg}, that |
2178 matches every single-character string. With this regular expression we can inductively define |
2179 matches every single-character string. With this regular expression we can inductively define |
2190 @{thm (lhs) UP.simps(6)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(6)} |
2191 @{thm (lhs) UP.simps(6)} & @{text "\<equiv>"} & @{thm (rhs) UP.simps(6)} |
2191 \end{tabular} |
2192 \end{tabular} |
2192 \end{center} |
2193 \end{center} |
2193 |
2194 |
2194 \noindent |
2195 \noindent |
2195 and using \eqref{supseqprops} establish that @{thm lang_UP}. This shows |
2196 and use \eqref{supseqprops} establish that @{thm lang_UP} holds. This shows |
2196 that @{term "SUBSEQ A"} is regular provided @{text A} is. |
2197 that @{term "SUBSEQ A"} is regular provided @{text A} is. |
2197 \end{proof} |
2198 \end{proof} |
2198 |
2199 |
2199 \noindent |
2200 \noindent |
2200 Now we can prove the main lemma, namely |
2201 Now we can prove our main lemma, namely |
2201 |
2202 |
2202 \begin{lmm}\label{mset} |
2203 \begin{lmm}\label{mset} |
2203 For every language @{text A}, there exists a finite language @{text M} such that |
2204 For every language @{text A}, there exists a finite language @{text M} such that |
2204 \begin{center} |
2205 \begin{center} |
2205 \mbox{@{term "SUPSEQ M = SUPSEQ A"}}\;. |
2206 \mbox{@{term "SUPSEQ M = SUPSEQ A"}}\;. |
2206 \end{center} |
2207 \end{center} |
2207 \end{lmm} |
2208 \end{lmm} |
2208 |
2209 |
2209 \begin{proof} |
2210 \begin{proof} |
2210 For @{text M} we take the set of all minimal elements of @{text A}. An element @{text x} |
2211 For @{text M} we take the set of all minimal elements of @{text A}. An element @{text x} |
2211 is minimal in @{text A} provided |
2212 is said to be \emph{minimal} in @{text A} provided |
2212 |
2213 |
2213 \begin{center} |
2214 \begin{center} |
2214 @{thm minimal_def} |
2215 @{thm minimal_def} |
2215 \end{center} |
2216 \end{center} |
2216 |
2217 |
2217 \noindent |
2218 \noindent |
2218 By Higman's lemma \eqref{higman} we know |
2219 By Higman's lemma \eqref{higman} we know |
2219 that @{text "M"} is finite, since every minimal element is incomparable, |
2220 that @{term "M \<equiv> {x \<in> A. minimal x A}"} is finite, since every minimal element is incomparable, |
2220 except with itself. |
2221 except with itself. |
2221 It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For |
2222 It is also straightforward to show that @{term "SUPSEQ M \<subseteq> SUPSEQ A"}. For |
2222 the other direction we have @{term "x \<in> SUPSEQ A"}. From this we can obtain |
2223 the other direction we have @{term "x \<in> SUPSEQ A"}. From this we can obtain |
2223 a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that |
2224 a @{text y} such that @{term "y \<in> A"} and @{term "y \<preceq> x"}. Since we know that |
2224 the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must |
2225 the relation \mbox{@{term "{(y, x). y \<preceq> x \<and> x \<noteq> y}"}} is well-founded, there must |
2225 be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, |
2226 be a minimal element @{text "z"} such that @{term "z \<in> A"} and @{term "z \<preceq> y"}, |
2226 and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument |
2227 and hence by transitivity also \mbox{@{term "z \<preceq> x"}} (here we deviate from the argument |
2227 given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure |
2228 given in \cite{Shallit08}, because Isabelle/HOL provides already an extensive infrastructure |
2228 for reasoning about well-foundedness). Since @{term "z"} is |
2229 for reasoning about well-foundedness). Since @{term "z"} is |
2229 minimal and in @{term A}, we also know that @{term z} is in @{term M}. |
2230 minimal and an element in @{term A}, we also know that @{term z} is in @{term M}. |
2230 From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in |
2231 From this together with @{term "z \<preceq> x"}, we can infer that @{term x} is in |
2231 @{term "SUPSEQ M"} as required. |
2232 @{term "SUPSEQ M"}, as required. |
2232 \end{proof} |
2233 \end{proof} |
2233 |
2234 |
2234 \noindent |
2235 \noindent |
2235 This lemma allows us to establish the second part of Lemma~\ref{subseqreg}. |
2236 This lemma allows us to establish the second part of Lemma~\ref{subseqreg}. |
2236 |
2237 |
2239 a finite, and thus regular, language @{text M}. We further have @{term "SUPSEQ M = SUPSEQ A"}, |
2240 a finite, and thus regular, language @{text M}. We further have @{term "SUPSEQ M = SUPSEQ A"}, |
2240 which establishes the second part. |
2241 which establishes the second part. |
2241 \end{proof} |
2242 \end{proof} |
2242 |
2243 |
2243 \noindent |
2244 \noindent |
2244 In order to establish the first part of Lemma~\ref{subseqreg} we use the |
2245 In order to establish the first part of Lemma~\ref{subseqreg}, we use the |
2245 property proved in \cite{Shallit08} |
2246 property proved in \cite{Shallit08}, namely that |
2246 |
2247 |
2247 \begin{equation}\label{compl} |
2248 \begin{equation}\label{compl} |
2248 @{thm SUBSEQ_complement} |
2249 @{thm SUBSEQ_complement} |
2249 \end{equation} |
2250 \end{equation} |
2250 |
2251 |
|
2252 \noindent |
|
2253 holds. Now the first part is a simple consequence of the second part. |
|
2254 |
2251 \begin{proof}[The First Part of Lemma~\ref{subseqreg}] |
2255 \begin{proof}[The First Part of Lemma~\ref{subseqreg}] |
2252 By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl} |
2256 By the second part of Lemma~\ref{subseqreg}, we know the right-hand side of \eqref{compl} |
2253 is regular, which means the complement of @{term "SUBSEQ A"} is regular. But since |
2257 is regular, which means @{term "- SUBSEQ A"} is regular. But since |
2254 we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} |
2258 we established already that regularity is preserved under complement, also @{term "SUBSEQ A"} |
2255 must be regular. |
2259 must be regular. |
2256 \end{proof} |
2260 \end{proof} |
2257 *} |
2261 *} |
2258 |
2262 |