--- a/Paper/Paper.thy Sun Feb 06 11:21:12 2011 +0000
+++ b/Paper/Paper.thy Mon Feb 07 10:23:23 2011 +0000
@@ -22,7 +22,8 @@
REL ("\<approx>") and
UPLUS ("_ \<^raw:\ensuremath{\uplus}> _" [90, 90] 90) and
L ("L '(_')" [0] 101) and
- EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100)
+ EClass ("\<lbrakk>_\<rbrakk>\<^bsub>_\<^esub>" [100, 100] 100) and
+ transition ("_ \<^raw:\ensuremath{\stackrel{\text{>_\<^raw:}}{\Longrightarrow}}> _" [100, 100, 100] 100)
(*>*)
@@ -40,7 +41,7 @@
introduce finite automata and then define everything in terms of them. For
example, a regular language is normally defined as one whose strings are
recognised by a finite deterministic automaton. This approach has many
- benefits. Among them is that it is easy to convince oneself from the fact that
+ benefits. Among them is the fact that it is easy to convince oneself that
regular languages are closed under complementation: one just has to exchange
the accepting and non-accepting states in the corresponding automaton to
obtain an automaton for the complement language. The problem, however, lies with
@@ -139,7 +140,7 @@
to be no substantial formalisation of automata theory and regular languages
carried out in a HOL-based theorem prover. We are only aware of the
large formalisation of automata theory in Nuprl \cite{Constable00} and
- some smaller formalisations in Coq, for example \cite{Filliatre97}.
+ some smaller formalisations in Coq (for example \cite{Filliatre97}).
In this paper, we will not attempt to formalise automata theory, but take a completely
different approach to regular languages. Instead of defining a regular language as one
@@ -153,12 +154,12 @@
\noindent
The reason is that regular expressions, unlike graphs and matrices, can
- be easily defined as inductive datatype. Therefore a corresponding reasoning
- infrastructure comes for free. This has recently been used in HOL4 for formalising regular
- expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to
- show that a central result about regular languages, the Myhill-Nerode theorem,
- can be recreated by only using regular expressions. This theorem gives a necessary
- and sufficient condition for when a language is regular. As a corollary of this
+ be easily defined as inductive datatype. Consequently a corresponding reasoning
+ infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
+ of regular expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to
+ show that a central result about regular languages---the Myhill-Nerode theorem---can
+ be recreated by only using regular expressions. This theorem gives necessary
+ and sufficient conditions for when a language is regular. As a corollary of this
theorem we can easily establish the usual closure properties, including
complementation, for regular languages.\smallskip
@@ -176,11 +177,9 @@
Strings in Isabelle/HOL are lists of characters with the \emph{empty string}
being represented by the empty list, written @{term "[]"}. \emph{Languages}
are sets of strings. The language containing all strings is written in
- Isabelle/HOL as @{term "UNIV::string set"}. The notation for the quotient
- of a language @{text A} according to a relation @{term REL} is @{term "A //
- REL"}. The concatenation of two languages is written @{term "A ;; B"}; a
- language raised to the power $n$ is written @{term "A \<up> n"}. Both concepts
- are defined as
+ Isabelle/HOL as @{term "UNIV::string set"}. The concatenation of two languages
+ is written @{term "A ;; B"} and a language raised to the power $n$ is written
+ @{term "A \<up> n"}. Their definitions are
\begin{center}
@{thm Seq_def[THEN eq_reflection, where A1="A" and B1="B"]}
@@ -192,10 +191,31 @@
\noindent
where @{text "@"} is the usual list-append operation. The Kleene-star of a language @{text A}
- is defined as the union over all powers, namely @{thm Star_def}.
+ is defined as the union over all powers, namely @{thm Star_def}. In the paper
+ we will often make use of the following properties.
+ \begin{proposition}\label{langprops}\mbox{}\\
+ \begin{tabular}{@ {}ll@ {\hspace{10mm}}ll}
+ (i) & @{thm star_cases} & (ii) & @{thm[mode=IfThen] pow_length}\\
+ (iii) & @{thm seq_Union_left} &
+ \end{tabular}
+ \end{proposition}
+
+ \noindent
+ We omit the proofs of these properties, but invite the reader to consult
+ our formalisation.\footnote{Available at ???}
+
+
+ The notation for the quotient of a language @{text A} according to an
+ equivalence relation @{term REL} is @{term "A // REL"}. We will write
+ @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
+ as @{text "{y | y \<approx> x}"}.
+
+
Central to our proof will be the solution of equational systems
- involving regular expressions. For this we will use the following ``reverse''
+ involving regular expressions. For this we will use Arden's lemma \cite{}
+ which solves equations of the form @{term "X = A ;; X \<union> B"} provided
+ @{term "[] \<notin> A"}. However we will need the following ``reverse''
version of Arden's lemma.
\begin{lemma}[Reverse Arden's Lemma]\mbox{}\\
@@ -206,7 +226,8 @@
\begin{proof}
For the right-to-left direction we assume @{thm (rhs) ardens_revised} and show
- that @{thm (lhs) ardens_revised} holds. From Lemma ??? we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
+ that @{thm (lhs) ardens_revised} holds. From Prop.~\ref{langprops}$(i)$
+ we have @{term "A\<star> = {[]} \<union> A ;; A\<star>"},
which is equal to @{term "A\<star> = {[]} \<union> A\<star> ;; A"}. Adding @{text B} to both
sides gives @{term "B ;; A\<star> = B ;; ({[]} \<union> A\<star> ;; A)"}, whose right-hand side
is equal to @{term "(B ;; A\<star>) ;; A \<union> B"}. This completes this direction.
@@ -220,15 +241,17 @@
\noindent
Using this property we can show that @{term "B ;; (A \<up> n) \<subseteq> X"} holds for
- all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using Lemma ???.
+ all @{text n}. From this we can infer @{term "B ;; A\<star> \<subseteq> X"} using the definition
+ of @{text "\<star>"}.
For the inclusion in the other direction we assume a string @{text s}
with length @{text k} is element in @{text X}. Since @{thm (prem 1) ardens_revised}
- we know that @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
+ we know by Prop.~\ref{langprops}$(ii)$ that
+ @{term "s \<notin> X ;; (A \<up> Suc k)"} since its length is only @{text k}
(the strings in @{term "X ;; (A \<up> Suc k)"} are all longer).
From @{text "(*)"} it follows then that
@{term s} must be element in @{term "(\<Union>m\<in>{0..k}. B ;; (A \<up> m))"}. This in turn
- implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Lemma ??? this
- is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
+ implies that @{term s} is in @{term "(\<Union>n. B ;; (A \<up> n))"}. Using Prop.~\ref{langprops}$(iii)$
+ this is equal to @{term "B ;; A\<star>"}, as we needed to show.\qed
\end{proof}
\noindent
@@ -275,13 +298,27 @@
@{thm str_eq_rel_def[simplified]}
\end{definition}
- \begin{definition} @{text "finals A"} are the equivalence classes that contain
- strings from @{text A}\\
+ \noindent
+ It is easy to see that @{term "\<approx>A"} is an equivalence relation, which partitions
+ the set of all string, @{text "UNIV"}, into a set of equivalence classed. We define
+ the set @{term "finals A"} as those equivalence classes that contain strings of
+ @{text A}, namely
+
+ \begin{equation}
@{thm finals_def}
- \end{definition}
+ \end{equation}
+
+ \noindent
+ It is easy to show that @{thm lang_is_union_of_finals} holds. We can also define
+ a notion of \emph{transition} between equivalence classes as
- @{thm lang_is_union_of_finals}
+ \begin{equation}
+ @{thm transition_def}
+ \end{equation}
+ \noindent
+ which means if we add the character @{text c} to all strings in the equivalence
+ class @{text Y} HERE
\begin{theorem}
Given a language @{text A}.