diff -r 8ab3a06577cf -r 426070e68b21 Paper/Paper.thy --- a/Paper/Paper.thy Sun Feb 06 11:21:12 2011 +0000 +++ b/Paper/Paper.thy Mon Feb 07 10:23:23 2011 +0000 @@ -22,7 +22,8 @@ REL ("\") and UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and L ("L '(_')" [0] 101) and - EClass ("\_\\<^bsub>_\<^esub>" [100, 100] 100) + EClass ("\_\\<^bsub>_\<^esub>" [100, 100] 100) and + transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longrightarrow}}> _" [100, 100, 100] 100) (*>*) @@ -40,7 +41,7 @@ introduce finite automata and then define everything in terms of them. For example, a regular language is normally defined as one whose strings are recognised by a finite deterministic automaton. This approach has many - benefits. Among them is that it is easy to convince oneself from the fact that + benefits. Among them is the fact that it is easy to convince oneself that regular languages are closed under complementation: one just has to exchange the accepting and non-accepting states in the corresponding automaton to obtain an automaton for the complement language. The problem, however, lies with @@ -139,7 +140,7 @@ to be no substantial formalisation of automata theory and regular languages carried out in a HOL-based theorem prover. We are only aware of the large formalisation of automata theory in Nuprl \cite{Constable00} and - some smaller formalisations in Coq, for example \cite{Filliatre97}. + some smaller formalisations in Coq (for example \cite{Filliatre97}). In this paper, we will not attempt to formalise automata theory, but take a completely different approach to regular languages. Instead of defining a regular language as one @@ -153,12 +154,12 @@ \noindent The reason is that regular expressions, unlike graphs and matrices, can - be easily defined as inductive datatype. Therefore a corresponding reasoning - infrastructure comes for free. This has recently been used in HOL4 for formalising regular - expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to - show that a central result about regular languages, the Myhill-Nerode theorem, - can be recreated by only using regular expressions. This theorem gives a necessary - and sufficient condition for when a language is regular. As a corollary of this + be easily defined as inductive datatype. Consequently a corresponding reasoning + infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation + of regular expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to + show that a central result about regular languages---the Myhill-Nerode theorem---can + be recreated by only using regular expressions. This theorem gives necessary + and sufficient conditions for when a language is regular. As a corollary of this theorem we can easily establish the usual closure properties, including complementation, for regular languages.\smallskip @@ -176,11 +177,9 @@ Strings in Isabelle/HOL are lists of characters with the \emph{empty string} being represented by the empty list, written @{term "[]"}. \emph{Languages} are sets of strings. The language containing all strings is written in - Isabelle/HOL as @{term "UNIV::string set"}. The notation for the quotient - of a language @{text A} according to a relation @{term REL} is @{term "A // - REL"}. The concatenation of two languages is written @{term "A ;; B"}; a - language raised to the power $n$ is written @{term "A \ n"}. Both concepts - are defined as + Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages + is written @{term "A ;; B"} and a language raised to the power $n$ is written + @{term "A \ n"}. Their definitions are \begin{center} @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]} @@ -192,10 +191,31 @@ \noindent where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A} - is defined as the union over all powers, namely @{thm Star_def}. + is defined as the union over all powers, namely @{thm Star_def}. In the paper + we will often make use of the following properties. + \begin{proposition}\label{langprops}\mbox{}\\ + \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll} + (i) & @{thm star_cases} & (ii) & @{thm[mode=IfThen] pow_length}\\ + (iii) & @{thm seq_Union_left} & + \end{tabular} + \end{proposition} + + \noindent + We omit the proofs of these properties, but invite the reader to consult + our formalisation.\footnote{Available at ???} + + + The notation for the quotient of a language @{text A} according to an + equivalence relation @{term REL} is @{term "A // REL"}. We will write + @{text "\x\\<^isub>\"} for the equivalence class defined + as @{text "{y | y \ x}"}. + + Central to our proof will be the solution of equational systems - involving regular expressions. For this we will use the following ``reverse'' + involving regular expressions. For this we will use Arden's lemma \cite{} + which solves equations of the form @{term "X = A ;; X \ B"} provided + @{term "[] \ A"}. However we will need the following ``reverse'' version of Arden's lemma. \begin{lemma}[Reverse Arden's Lemma]\mbox{}\\ @@ -206,7 +226,8 @@ \begin{proof} For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show - that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\ = {[]} \ A ;; A\"}, + that @{thm (lhs) ardens_revised} holds. From Prop.~\ref{langprops}$(i)$ + we have @{term "A\ = {[]} \ A ;; A\"}, which is equal to @{term "A\ = {[]} \ A\ ;; A"}. Adding @{text B} to both sides gives @{term "B ;; A\ = B ;; ({[]} \ A\ ;; A)"}, whose right-hand side is equal to @{term "(B ;; A\) ;; A \ B"}. This completes this direction. @@ -220,15 +241,17 @@ \noindent Using this property we can show that @{term "B ;; (A \ n) \ X"} holds for - all @{text n}. From this we can infer @{term "B ;; A\ \ X"} using Lemma ???. + all @{text n}. From this we can infer @{term "B ;; A\ \ X"} using the definition + of @{text "\"}. For the inclusion in the other direction we assume a string @{text s} with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised} - we know that @{term "s \ X ;; (A \ Suc k)"} since its length is only @{text k} + we know by Prop.~\ref{langprops}$(ii)$ that + @{term "s \ X ;; (A \ Suc k)"} since its length is only @{text k} (the strings in @{term "X ;; (A \ Suc k)"} are all longer). From @{text "(*)"} it follows then that @{term s} must be element in @{term "(\m\{0..k}. B ;; (A \ m))"}. This in turn - implies that @{term s} is in @{term "(\n. B ;; (A \ n))"}. Using Lemma ??? this - is equal to @{term "B ;; A\"}, as we needed to show.\qed + implies that @{term s} is in @{term "(\n. B ;; (A \ n))"}. Using Prop.~\ref{langprops}$(iii)$ + this is equal to @{term "B ;; A\"}, as we needed to show.\qed \end{proof} \noindent @@ -275,13 +298,27 @@ @{thm str_eq_rel_def[simplified]} \end{definition} - \begin{definition} @{text "finals A"} are the equivalence classes that contain - strings from @{text A}\\ + \noindent + It is easy to see that @{term "\A"} is an equivalence relation, which partitions + the set of all string, @{text "UNIV"}, into a set of equivalence classed. We define + the set @{term "finals A"} as those equivalence classes that contain strings of + @{text A}, namely + + \begin{equation} @{thm finals_def} - \end{definition} + \end{equation} + + \noindent + It is easy to show that @{thm lang_is_union_of_finals} holds. We can also define + a notion of \emph{transition} between equivalence classes as - @{thm lang_is_union_of_finals} + \begin{equation} + @{thm transition_def} + \end{equation} + \noindent + which means if we add the character @{text c} to all strings in the equivalence + class @{text Y} HERE \begin{theorem} Given a language @{text A}.