Paper/Paper.thy
changeset 71 426070e68b21
parent 70 8ab3a06577cf
child 75 d63baacbdb16
--- a/Paper/Paper.thy	Sun Feb 06 11:21:12 2011 +0000
+++ b/Paper/Paper.thy	Mon Feb 07 10:23:23 2011 +0000
@@ -22,7 +22,8 @@
   REL ("\<approx>") and
   UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
   L ("L '(_')" [0] 101) and
-  EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) 
+  EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
+  transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longrightarrow}}> _" [100, 100, 100] 100)
 (*>*)
 
 
@@ -40,7 +41,7 @@
   introduce finite automata and then define everything in terms of them.  For
   example, a regular language is normally defined as one whose strings are
   recognised by a finite deterministic automaton. This approach has many
-  benefits. Among them is that it is easy to convince oneself from the fact that
+  benefits. Among them is the fact that it is easy to convince oneself that
   regular languages are closed under complementation: one just has to exchange
   the accepting and non-accepting states in the corresponding automaton to
   obtain an automaton for the complement language.  The problem, however, lies with
@@ -139,7 +140,7 @@
   to be no substantial formalisation of automata theory and regular languages 
   carried out in a HOL-based theorem prover. We are only aware of the 
   large formalisation of automata theory in Nuprl \cite{Constable00} and 
-  some smaller formalisations in Coq, for example \cite{Filliatre97}.
+  some smaller formalisations in Coq (for example \cite{Filliatre97}).
   
   In this paper, we will not attempt to formalise automata theory, but take a completely 
   different approach to regular languages. Instead of defining a regular language as one 
@@ -153,12 +154,12 @@
   
   \noindent
   The reason is that regular expressions, unlike graphs and matrices, can
-  be easily defined as inductive datatype. Therefore a corresponding reasoning 
-  infrastructure comes for free. This has recently been used in HOL4 for formalising regular 
-  expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
-  show that a central result about regular languages, the Myhill-Nerode theorem,  
-  can be recreated by only using regular expressions. This theorem gives a necessary
-  and sufficient condition for when a language is regular. As a corollary of this
+  be easily defined as inductive datatype. Consequently a corresponding reasoning 
+  infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
+  of regular expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
+  show that a central result about regular languages---the Myhill-Nerode theorem---can 
+  be recreated by only using regular expressions. This theorem gives necessary
+  and sufficient conditions for when a language is regular. As a corollary of this
   theorem we can easily establish the usual closure properties, including 
   complementation, for regular languages.\smallskip
   
@@ -176,11 +177,9 @@
   Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
   being represented by the empty list, written @{term "[]"}. \emph{Languages}
   are sets of strings. The language containing all strings is written in
-  Isabelle/HOL as @{term "UNIV::string set"}.  The notation for the quotient
-  of a language @{text A} according to a relation @{term REL} is @{term "A //
-  REL"}. The concatenation of two languages is written @{term "A ;; B"}; a
-  language raised to the power $n$ is written @{term "A \<up> n"}. Both concepts
-  are defined as
+  Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages 
+  is written @{term "A ;; B"} and a language raised to the power $n$ is written 
+  @{term "A \<up> n"}. Their definitions are
 
   \begin{center}
   @{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
@@ -192,10 +191,31 @@
 
   \noindent
   where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
-  is defined as the union over all powers, namely @{thm Star_def}.
+  is defined as the union over all powers, namely @{thm Star_def}. In the paper
+  we will often make use of the following properties.
   
+  \begin{proposition}\label{langprops}\mbox{}\\
+  \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll}
+  (i)   & @{thm star_cases}      & (ii)  & @{thm[mode=IfThen] pow_length}\\
+  (iii) & @{thm seq_Union_left}  & 
+  \end{tabular}
+  \end{proposition}
+
+  \noindent
+  We omit the proofs of these properties, but invite the reader to consult
+  our formalisation.\footnote{Available at ???}
+
+
+  The notation for the quotient of a language @{text A} according to an 
+  equivalence relation @{term REL} is @{term "A // REL"}. We will write 
+  @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined 
+  as @{text "{y | y \<approx> x}"}.
+
+
   Central to our proof will be the solution of equational systems
-  involving regular expressions. For this we will use the following ``reverse'' 
+  involving regular expressions. For this we will use Arden's lemma \cite{}
+  which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+  @{term "[] \<notin> A"}. However we will need the following ``reverse'' 
   version of Arden's lemma.
 
   \begin{lemma}[Reverse Arden's Lemma]\mbox{}\\
@@ -206,7 +226,8 @@
 
   \begin{proof}
   For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show
-  that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
+  that @{thm (lhs) ardens_revised} holds. From Prop.~\ref{langprops}$(i)$ 
+  we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
   which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both 
   sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
   is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction. 
@@ -220,15 +241,17 @@
   
   \noindent
   Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
-  all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using Lemma ???.
+  all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using the definition
+  of @{text "\<star>"}.
   For the inclusion in the other direction we assume a string @{text s}
   with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised}
-  we know that @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
+  we know by Prop.~\ref{langprops}$(ii)$ that 
+  @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
   (the strings in @{term "X ;; (A \<up> Suc k)"} are all longer). 
   From @{text "(*)"} it follows then that
   @{term s} must be element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
-  implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Lemma ??? this
-  is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
+  implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}$(iii)$ 
+  this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
   \end{proof}
 
   \noindent
@@ -275,13 +298,27 @@
   @{thm str_eq_rel_def[simplified]}
   \end{definition}
 
-  \begin{definition} @{text "finals A"} are the equivalence classes that contain
-  strings from @{text A}\\
+  \noindent
+  It is easy to see that @{term "\<approx>A"} is an equivalence relation, which partitions
+  the set of all string, @{text "UNIV"}, into a set of equivalence classed. We define
+  the set @{term "finals A"} as those equivalence classes that contain strings of 
+  @{text A}, namely
+
+  \begin{equation} 
   @{thm finals_def}
-  \end{definition}
+  \end{equation}
+
+  \noindent
+  It is easy to show that @{thm lang_is_union_of_finals} holds. We can also define 
+  a notion of \emph{transition} between equivalence classes as
 
-  @{thm lang_is_union_of_finals}
+  \begin{equation} 
+  @{thm transition_def}
+  \end{equation}
 
+  \noindent
+  which means if we add the character @{text c} to all strings in the equivalence
+  class @{text Y} HERE
 
   \begin{theorem}
   Given a language @{text A}.