--- a/Paper/Paper.thy Mon Feb 14 07:42:16 2011 +0000
+++ b/Paper/Paper.thy Mon Feb 14 09:38:18 2011 +0000
@@ -46,7 +46,7 @@
wide range of textbooks on this subject, many of which are aimed at students
and contain very detailed ``pencil-and-paper'' proofs
(e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
- formalising these theorems and by verifying formally the algorithms.
+ formalising the theorems and by verifying formally the algorithms.
There is however a problem: the typical approach to regular languages is to
introduce finite automata and then define everything in terms of them. For
@@ -147,19 +147,30 @@
Functions are much better supported in Isabelle/HOL, but they still lead to similar
problems as with graphs. Composing, for example, two non-deterministic automata in parallel
requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98}
- dismisses the option of using identities, because it leads according to him to ``messy proofs''. He
- opts for a variant of \eqref{disjointunion} using bitlists, but writes
+ dismisses for this the option of using identities, because it leads according to
+ him to ``messy proofs''. He
+ opts for a variant of \eqref{disjointunion} using bitlists, but writes
\begin{quote}
\it%
\begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
+ `` & All lemmas appear obvious given a picture of the composition of automata\ldots
+ Yet their proofs require a painful amount of detail.''
+ \end{tabular}
+ \end{quote}
+
+ \noindent
+ and
+
+ \begin{quote}
+ \it%
+ \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
`` & If the reader finds the above treatment in terms of bit lists revoltingly
- concrete, I cannot disagree. A more abstract approach is clearly desirable.''\smallskip\\
- `` & All lemmas appear obvious given a picture of the composition of automata\ldots
- Yet their proofs require a painful amount of detail.''
+ concrete, I cannot disagree. A more abstract approach is clearly desirable.''
\end{tabular}
\end{quote}
-
+
+
\noindent
Moreover, it is not so clear how to conveniently impose a finiteness condition
upon functions in order to represent \emph{finite} automata. The best is
@@ -190,7 +201,9 @@
The reason is that regular expressions, unlike graphs, matrices and functons, can
be easily defined as inductive datatype. Consequently a corresponding reasoning
infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
- of regular expression matching based on derivatives \cite{OwensSlind08}. The purpose of this paper is to
+ of regular expression matching based on derivatives \cite{OwensSlind08} and
+ with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.
+ The purpose of this paper is to
show that a central result about regular languages---the Myhill-Nerode theorem---can
be recreated by only using regular expressions. This theorem gives necessary
and sufficient conditions for when a language is regular. As a corollary of this
@@ -252,7 +265,7 @@
Central to our proof will be the solution of equational systems
- involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
+ involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}
which solves equations of the form @{term "X = A ;; X \<union> B"} provided
@{term "[] \<notin> A"}. However we will need the following ``reverse''
version of Arden's lemma.
@@ -358,9 +371,10 @@
\noindent
It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
- equivalence classes. Let us give an example: consider the regular language containing just
+ equivalence classes. To illustrate this quotient construction, let us give an
+ example: consider the regular language containing just
the string @{text "[c]"}. The relation @{term "\<approx>({[c]})"} partitions @{text UNIV}
- into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"}
+ into three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and @{text "X\<^isub>3"}
as follows
\begin{center}
@@ -467,15 +481,15 @@
\end{equation}
\noindent
- The reason for adding the @{text \<lambda>}-marker to our equational system is
- to obtain this equation: it only holds in this form since none of
+ The reason for adding the @{text \<lambda>}-marker to our initial equational system is
+ to obtain this equation: it only holds with the marker since none of
the other terms contain the empty string.
- Our represeantation of the equations are pairs,
+ Our representation for the equations in Isabelle/HOL are pairs,
where the first component is an equivalence class and the second component
- is a set of terms standing for the right-hand side. Given a set of equivalence
+ is a set of terms. Given a set of equivalence
classes @{text CS}, our initial equational system @{term "Init CS"} is thus
- defined as
+ formally defined as
\begin{center}
\begin{tabular}{rcl}
@@ -491,7 +505,7 @@
\noindent
Because we use sets of terms
- for representing the right-hand sides in the equational system we can
+ for representing the right-hand sides of equations, we can
prove \eqref{inv1} and \eqref{inv2} more concisely as
%
\begin{lemma}\label{inv}
@@ -520,7 +534,7 @@
\noindent
which we also lift to entire right-hand sides of equations, written as
@{thm (lhs) append_rhs_rexp_def[where rexp="r"]}. With this we can define
- the \emph{arden-operation} for an equation of the form @{text "X = rhs"}:
+ the \emph{arden-operation} for an equation of the form @{text "X = rhs"} as:
\begin{center}
\begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
@@ -532,7 +546,7 @@
\end{center}
\noindent
- We first delete all terms of the form @{text "(X, r)"} from @{text rhs};
+ In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
then we calculate the combinded regular expressions for all @{text r} coming
from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
finally we append this regular expression to @{text rhs'}. It can be easily seen
@@ -572,13 +586,13 @@
\noindent
Finially, we can define how an equational system should be solved. For this
- we will iterate the elimination of an equation until only one equation
+ we will need to iterate the elimination of an equation until only one equation
will be left in the system. However, we not just want to have any equation
as being the last one, but the one for which we want to calculate the regular
expression. Therefore we define the iteration step so that it chooses an
equation with an equivalence class that is not @{text X}. This allows us to
- control, which equation will be the last. We use again Hilbert's choice operator,
- written @{text SOME}, to chose an equation in a equational system @{text ES}.
+ control, which equation will be the last. We use Hilbert's choice operator,
+ written @{text SOME}, to chose an equation to be eliminated in @{text ES}.
\begin{center}
\begin{tabular}{rc@ {\hspace{4mm}}r@ {\hspace{1mm}}l}
@@ -589,12 +603,18 @@
\end{center}
\noindent
- The last definition in our
-
+ The last definition applies @{term Iter} over and over again a condition
+ @{text COND} is \emph{not} satisfied anymore. The condition states that there
+ are more than one equation in the equational system @{text ES}. The
+ iteration is defined in terms of HOL's @{text while}-operator as follows:
+
\begin{center}
@{thm Solve_def}
\end{center}
+ \noindent
+ We are not concerned here with the definition of this operator in this paper
+ (see \cite{BerghoferNipkow00}).
\begin{center}
@{thm while_rule}