24 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and |
24 UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and |
25 L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and |
25 L ("\<^raw:\ensuremath{\cal{L}}>'(_')" [0] 101) and |
26 Lam ("\<lambda>'(_')" [100] 100) and |
26 Lam ("\<lambda>'(_')" [100] 100) and |
27 Trn ("_, _" [100, 100] 100) and |
27 Trn ("_, _" [100, 100] 100) and |
28 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and |
28 EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and |
29 transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) |
29 transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longmapsto}}> _" [100, 100, 100] 100) and |
|
30 Setalt ("\<^raw:\ensuremath{\bigplus}>_" [1000] 999) |
30 (*>*) |
31 (*>*) |
31 |
32 |
32 |
33 |
33 section {* Introduction *} |
34 section {* Introduction *} |
34 |
35 |
114 \end{tabular} |
115 \end{tabular} |
115 \end{center} |
116 \end{center} |
116 |
117 |
117 \noindent |
118 \noindent |
118 On ``paper'' we can define the corresponding graph in terms of the disjoint |
119 On ``paper'' we can define the corresponding graph in terms of the disjoint |
119 union of the state nodes. Unfortunately in HOL, the definition for disjoint |
120 union of the state nodes. Unfortunately in HOL, the standard definition for disjoint |
120 union, namely |
121 union, namely |
121 % |
122 % |
122 \begin{equation}\label{disjointunion} |
123 \begin{equation}\label{disjointunion} |
123 @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"} |
124 @{term "UPLUS A\<^isub>1 A\<^isub>2 \<equiv> {(1, x) | x. x \<in> A\<^isub>1} \<union> {(2, y) | y. y \<in> A\<^isub>2}"} |
124 \end{equation} |
125 \end{equation} |
135 establishing that properties are invariant under renaming. Similarly, |
136 establishing that properties are invariant under renaming. Similarly, |
136 connecting two automata represented as matrices results in very adhoc |
137 connecting two automata represented as matrices results in very adhoc |
137 constructions, which are not pleasant to reason about. |
138 constructions, which are not pleasant to reason about. |
138 |
139 |
139 Functions are much better supported in Isabelle/HOL, but they still lead to similar |
140 Functions are much better supported in Isabelle/HOL, but they still lead to similar |
140 problems as with graphs. Composing two non-deterministic automata in parallel |
141 problems as with graphs. Composing, for example, two non-deterministic automata in parallel |
141 poses still the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98} |
142 poses again the problem of how to implement disjoint unions. Nipkow \cite{Nipkow98} |
142 dismisses the option using identities, because it leads to messy proofs. He |
143 dismisses the option of using identities, because it leads to ``messy proofs''. He |
143 opts for a variant of \eqref{disjointunion}, but writes |
144 opts for a variant of \eqref{disjointunion}, but writes |
144 |
145 |
145 \begin{quote} |
146 \begin{quote} |
146 \it ``If the reader finds the above treatment in terms of bit lists revoltingly |
147 \it ``If the reader finds the above treatment in terms of bit lists revoltingly |
147 concrete, I cannot disagree.'' |
148 concrete, I cannot disagree.'' |
170 A language @{text A} is \emph{regular}, provided there is a regular expression that matches all |
171 A language @{text A} is \emph{regular}, provided there is a regular expression that matches all |
171 strings of @{text "A"}. |
172 strings of @{text "A"}. |
172 \end{definition} |
173 \end{definition} |
173 |
174 |
174 \noindent |
175 \noindent |
175 The reason is that regular expressions, unlike graphs and matrices, can |
176 The reason is that regular expressions, unlike graphs, matrices and functons, can |
176 be easily defined as inductive datatype. Consequently a corresponding reasoning |
177 be easily defined as inductive datatype. Consequently a corresponding reasoning |
177 infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation |
178 infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation |
178 of regular expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to |
179 of regular expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to |
179 show that a central result about regular languages---the Myhill-Nerode theorem---can |
180 show that a central result about regular languages---the Myhill-Nerode theorem---can |
180 be recreated by only using regular expressions. This theorem gives necessary |
181 be recreated by only using regular expressions. This theorem gives necessary |
181 and sufficient conditions for when a language is regular. As a corollary of this |
182 and sufficient conditions for when a language is regular. As a corollary of this |
182 theorem we can easily establish the usual closure properties, including |
183 theorem we can easily establish the usual closure properties, including |
183 complementation, for regular languages.\smallskip |
184 complementation, for regular languages.\smallskip |
184 |
185 |
185 \noindent |
186 \noindent |
186 {\bf Contributions:} To our knowledge, our proof of the Myhill-Nerode theorem is the |
187 {\bf Contributions:} |
|
188 There is an extensive literature on regular languages. |
|
189 To our knowledge, our proof of the Myhill-Nerode theorem is the |
187 first that is based on regular expressions, only. We prove the part of this theorem |
190 first that is based on regular expressions, only. We prove the part of this theorem |
188 stating that a regular expression has only finitely many partitions using certain |
191 stating that a regular expression has only finitely many partitions using certain |
189 tagging-functions. Again to our best knowledge, these tagging functions have |
192 tagging-functions. Again to our best knowledge, these tagging functions have |
190 not been used before to establish the Myhill-Nerode theorem. |
193 not been used before to establish the Myhill-Nerode theorem. |
191 *} |
194 *} |
209 \end{center} |
212 \end{center} |
210 |
213 |
211 \noindent |
214 \noindent |
212 where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A} |
215 where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A} |
213 is defined as the union over all powers, namely @{thm Star_def}. In the paper |
216 is defined as the union over all powers, namely @{thm Star_def}. In the paper |
214 we will often make use of the following properties. |
217 we will make use of the following properties of these constructions. |
215 |
218 |
216 \begin{proposition}\label{langprops}\mbox{}\\ |
219 \begin{proposition}\label{langprops}\mbox{}\\ |
217 \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll} |
220 \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll} |
218 (i) & @{thm star_cases} & (ii) & @{thm[mode=IfThen] pow_length}\\ |
221 (i) & @{thm star_cases} & (ii) & @{thm[mode=IfThen] pow_length}\\ |
219 (iii) & @{thm seq_Union_left} & |
222 (iii) & @{thm seq_Union_left} & |
220 \end{tabular} |
223 \end{tabular} |
221 \end{proposition} |
224 \end{proposition} |
222 |
225 |
223 \noindent |
226 \noindent |
224 We omit the proofs of these properties, but invite the reader to consult |
227 We omit the proofs, but invite the reader to consult |
225 our formalisation.\footnote{Available at ???} |
228 our formalisation.\footnote{Available at ???} |
226 |
229 |
227 |
230 |
228 The notation for the quotient of a language @{text A} according to an |
231 The notation for the quotient of a language @{text A} according to an |
229 equivalence relation @{term REL} is @{term "A // REL"}. We will write |
232 equivalence relation @{term REL} is in Isabelle/HOL @{term "A // REL"}. We will write |
230 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined |
233 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined |
231 as @{text "{y | y \<approx> x}"}. |
234 as @{text "{y | y \<approx> x}"}. |
232 |
235 |
233 |
236 |
234 Central to our proof will be the solution of equational systems |
237 Central to our proof will be the solution of equational systems |
272 implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} |
275 implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}@{text "(iii)"} |
273 this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed |
276 this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed |
274 \end{proof} |
277 \end{proof} |
275 |
278 |
276 \noindent |
279 \noindent |
277 Regular expressions are defined as the following inductive datatype |
280 Regular expressions are defined as the inductive datatype |
278 |
281 |
279 \begin{center} |
282 \begin{center} |
280 @{text r} @{text "::="} |
283 @{text r} @{text "::="} |
281 @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} |
284 @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} |
282 @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} |
285 @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} |
306 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\ |
309 @{thm (rhs) L_rexp.simps(6)[where r="r"]}\\ |
307 \end{tabular} |
310 \end{tabular} |
308 \end{tabular} |
311 \end{tabular} |
309 \end{center} |
312 \end{center} |
310 |
313 |
311 |
314 \noindent |
|
315 Given a set or regular expressions @{text rs}, we will need the operation of generating |
|
316 a regular expressions that matches all languages of @{text rs}. We only need the existence |
|
317 of such an regular expressions therefore we use Isabelle's @{const "fold_graph"} and Hilbert's |
|
318 @{text "\<epsilon>"} to define @{term "\<Uplus>rs"} which, roughly speaking, folds @{const ALT} over the |
|
319 set @{text rs} with @{const NULL} for the empty set. We can prove that for finite sets @{text rs} |
|
320 |
|
321 \begin{center} |
|
322 @{thm (lhs) folds_alt_simp}@{text "= \<Union> (\<calL> ` rs)"} |
|
323 \end{center} |
|
324 |
|
325 \noindent |
|
326 holds. (whereby @{text "\<calL> ` rs"} stands for the |
|
327 image of the set @{text rs} under function @{text "\<calL>"}). |
|
328 |
312 |
329 |
313 *} |
330 *} |
314 |
331 |
315 section {* Finite Partitions Imply Regularity of a Language *} |
332 section {* Finite Partitions Imply Regularity of a Language *} |
316 |
333 |