regexp: comparison Paper/Paper.thy

equal deleted inserted replaced

-:2409827d8eb8
+:d3fe0597080a
 Regular languages are an important and well-understood subject in Computer
 Science, with many beautiful theorems and many useful algorithms. There is a
 wide range of textbooks on this subject, many of which are aimed at students
 and contain very detailed ``pencil-and-paper'' proofs
 (e.g.~\cite{Kozen97}). It seems natural to exercise theorem provers by
-formalising these theorems and by verifying formally the algorithms.
+formalising the theorems and by verifying formally the algorithms.
 There is however a problem: the typical approach to regular languages is to
 introduce finite automata and then define everything in terms of them.  For
 example, a regular language is normally defined as one whose strings are
 recognised by a finite deterministic automaton. This approach has many
 constructions, which are not pleasant to reason about.
 Functions are much better supported in Isabelle/HOL, but they still lead to similar
 problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
 requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98}
-dismisses the option of using identities, because it leads according to him to ``messy proofs''. He
+dismisses for this the option of using identities, because it leads according to
-opts for a variant of \eqref{disjointunion} using bitlists, but writes
+him to ``messy proofs''. He
+opts for a variant of \eqref{disjointunion} using bitlists, but writes
+\begin{quote}
+\it%
+\begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
+`` & All lemmas appear obvious given a picture of the composition of automata\ldots
+Yet their proofs require a painful amount of detail.''
+\end{tabular}
+\end{quote}
+\noindent
+and
 \begin{quote}
 \it%
 \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
 `` & If the reader finds the above treatment in terms of bit lists revoltingly
-concrete, I cannot disagree. A more abstract approach is clearly desirable.''\smallskip\\
+concrete, I cannot disagree. A more abstract approach is clearly desirable.''
-`` & All lemmas appear obvious given a picture of the composition of automata\ldots
-Yet their proofs require a painful amount of detail.''
 \end{tabular}
 \end{quote}
 \noindent
 Moreover, it is not so clear how to conveniently impose a finiteness condition
 upon functions in order to represent \emph{finite} automata. The best is
 probably to resort to more advanced reasoning frameworks, such as \emph{locales}
 or \emph{type classes},
 \noindent
 The reason is that regular expressions, unlike graphs, matrices and functons, can
 be easily defined as inductive datatype. Consequently a corresponding reasoning
 infrastructure comes for free. This has recently been exploited in HOL4 with a formalisation
-of regular expression matching based on derivatives \cite{OwensSlind08}.  The purpose of this paper is to
+of regular expression matching based on derivatives \cite{OwensSlind08} and
+with an equivalence checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.
+The purpose of this paper is to
 show that a central result about regular languages---the Myhill-Nerode theorem---can
 be recreated by only using regular expressions. This theorem gives necessary
 and sufficient conditions for when a language is regular. As a corollary of this
 theorem we can easily establish the usual closure properties, including
 complementation, for regular languages.\smallskip
 @{text "\<lbrakk>x\<rbrakk>\<^isub>\<approx>"} for the equivalence class defined
 as @{text "{y | y \<approx> x}"}.
 Central to our proof will be the solution of equational systems
-involving sets of languages. For this we will use Arden's lemma \cite{Brzozowski64}
+involving equivalence classes of languages. For this we will use Arden's lemma \cite{Brzozowski64}
 which solves equations of the form @{term "X = A ;; X \<union> B"} provided
 @{term "[] \<notin> A"}. However we will need the following ``reverse''
 version of Arden's lemma.
 \begin{lemma}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
 \end{definition}
 \noindent
 It is easy to see that @{term "\<approx>A"} is an equivalence relation, which
 partitions the set of all strings, @{text "UNIV"}, into a set of disjoint
-equivalence classes. Let us give an example: consider the regular language containing just
+equivalence classes. To illustrate this quotient construction, let us give an
+example: consider the regular language containing just
 the string @{text "[c]"}. The relation @{term "\<approx>({[c]})"} partitions @{text UNIV}
-into the three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and  @{text "X\<^isub>3"}
+into three equivalence classes @{text "X\<^isub>1"}, @{text "X\<^isub>2"} and  @{text "X\<^isub>3"}
 as follows
 \begin{center}
 @{text "X\<^isub>1 = {[]}"}\hspace{5mm}
 @{text "X\<^isub>2 = {[c]}"}\hspace{5mm}
 \begin{equation}\label{inv2}
 @{text "X\<^isub>1 = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>p, CHAR c\<^isub>i\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
 \end{equation}
 \noindent
-The reason for adding the @{text \<lambda>}-marker to our equational system is
+The reason for adding the @{text \<lambda>}-marker to our initial equational system is
-to obtain this equation: it only holds in this form since none of
+to obtain this equation: it only holds with the marker since none of
 the other terms contain the empty string.
-Our represeantation of the equations are pairs,
+Our representation for the equations in Isabelle/HOL are pairs,
 where the first component is an equivalence class and the second component
-is a set of terms standing for the right-hand side. Given a set of equivalence
+is a set of terms. Given a set of equivalence
 classes @{text CS}, our initial equational system @{term "Init CS"} is thus
-defined as
+formally defined as
 \begin{center}
 \begin{tabular}{rcl}
 @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} &
 @{text "if"}~@{term "[] \<in> X"}\\
 \noindent
 Because we use sets of terms
-for representing the right-hand sides in the equational system we can
+for representing the right-hand sides of equations, we can
 prove \eqref{inv1} and \eqref{inv2} more concisely as
 %
 \begin{lemma}\label{inv}
 If @{thm (prem 1) test} then @{text "X = \<Union> \<calL> ` rhs"}.
 \end{lemma}
 \end{center}
 \noindent
 which we also lift to entire right-hand sides of equations, written as
 @{thm (lhs) append_rhs_rexp_def[where rexp="r"]}. With this we can define
-the \emph{arden-operation} for an equation of the form @{text "X = rhs"}:
+the \emph{arden-operation} for an equation of the form @{text "X = rhs"} as:
 \begin{center}
 \begin{tabular}{rc@ {\hspace{2mm}}r@ {\hspace{1mm}}l}
 @{thm (lhs) Arden_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "let"}}\\
 & & @{text "rhs' ="} & @{term "rhs - {Trn X r | r. Trn X r \<in> rhs}"} \\
 & &  \multicolumn{2}{@ {\hspace{-2mm}}l}{@{text "in"}~~@{term "append_rhs_rexp rhs' r'"}}\\
 \end{tabular}
 \end{center}
 \noindent
-We first delete all terms of the form @{text "(X, r)"} from @{text rhs};
+In this definition, we first delete all terms of the form @{text "(X, r)"} from @{text rhs};
 then we calculate the combinded regular expressions for all @{text r} coming
 from the deleted @{text "(X, r)"}, and take the @{const STAR} of it;
 finally we append this regular expression to @{text rhs'}. It can be easily seen
 that this operation mimics Arden's lemma on the level of equations.
 The \emph{substituion-operation} takes an equation
 \end{tabular}
 \end{center}
 \noindent
 Finially, we can define how an equational system should be solved. For this
-we will iterate the elimination of an equation until only one equation
+we will need to iterate the elimination of an equation until only one equation
 will be left in the system. However, we not just want to have any equation
 as being the last one, but the one for which we want to calculate the regular
 expression. Therefore we define the iteration step so that it chooses an
 equation with an equivalence class that is not @{text X}. This allows us to
-control, which equation will be the last. We use again Hilbert's choice operator,
+control, which equation will be the last. We use Hilbert's choice operator,
-written @{text SOME}, to chose an equation in a equational system @{text ES}.
+written @{text SOME}, to chose an equation to be eliminated in @{text ES}.
 \begin{center}
 \begin{tabular}{rc@ {\hspace{4mm}}r@ {\hspace{1mm}}l}
 @{thm (lhs) Iter_def} & @{text "\<equiv>"}~~\mbox{} & \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "let"}}\\
 & & @{text "(Y, yrhs) ="} & @{term "SOME (Y, yrhs). (Y, yrhs) \<in> ES \<and> X \<noteq> Y"} \\
 & &  \multicolumn{2}{@ {\hspace{-4mm}}l}{@{text "in"}~~@{term "Remove ES Y yrhs"}}\\
 \end{tabular}
 \end{center}
 \noindent
-The last definition in our
+The last definition applies @{term Iter} over and over again a condition
+@{text COND} is \emph{not} satisfied anymore. The condition states that there
+are more than one equation in the equational system @{text ES}. The
+iteration is defined in terms of HOL's @{text while}-operator as follows:
 \begin{center}
 @{thm Solve_def}
 \end{center}
+\noindent
+We are not concerned here with the definition of this operator in this paper
+(see \cite{BerghoferNipkow00}).
 \begin{center}
 @{thm while_rule}
 \end{center}
 *}

changeset 101	d3fe0597080a
parent 100	2409827d8eb8
child 103	f460d5f75cb5