Journal/Paper.thy
changeset 187 9f46a9571e37
parent 186 07a269d9642b
child 190 b73478aaf33e
abbreviation "ZERO \<equiv> Zero"
abbreviation "ONE \<equiv> One"
abbreviation "ATOM \<equiv> Atom"
abbreviation "PLUS \<equiv> Plus"
abbreviation "TIMES \<equiv> Times"
abbreviation "TIMESS \<equiv> Timess"
abbreviation "STAR \<equiv> Star"


notation (latex output)
  str_eq ("\<approx>\<^bsub>_\<^esub>") and
  Regular languages are an important and well-understood subject in Computer
  Science, with many beautiful theorems and many useful algorithms. There is a
  wide range of textbooks on this subject, many of which are aimed at students
  and contain very detailed `pencil-and-paper' proofs (e.g.~\cite{Kozen97,
  HopcroftUllman69}). It seems natural to exercise theorem provers by
  formalising the theorems and by verifying formally the algorithms.

  A popular choice for a theorem prover would be one based on Higher-Order
  Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development
  presented in this paper we will use the latter. HOL is a predicate calculus
  that allows quantification over predicate variables. Its type system is
  based on Church's Simple Theory of Types \cite{Church40}.  Although many
  mathematical concepts can be conveniently expressed in HOL, there are some
  limitations that hurt badly if one attempts a simple-minded formalisation
  of regular languages in it.  The typical approach to regular languages is to
  introduce finite automata and then define everything in terms of them
  \cite{Kozen97}.  For example, a regular language is normally defined as:

  \begin{dfntn}\label{baddef}
  A language @{text A} is \emph{regular}, provided there is a
  finite deterministic automaton that recognises all strings of @{text "A"}.
  \end{dfntn}
  pairs. Using this definition for disjoint union means we do not have a
  single type for automata. As a result we will not be able to define a regular
  language as one for which there exists an automaton that recognises all its
  strings. This is because we cannot make a definition in HOL that is polymorphic in
  the state type and there is no type quantification available in HOL (unlike
  in Coq, for example).\footnote{Slind already pointed out this problem in an email
  to the HOL4 mailing list on 21st April 2005.}

  An alternative, which provides us with a single type for automata, is to give every
  state node an identity, for example a natural
  number, and then be careful to rename these identities apart whenever
  connecting two automata. This results in clunky proofs

  Functions are much better supported in Isabelle/HOL, but they still lead to similar
  problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
  also requires the formalisation of disjoint unions. For this, Nipkow \cite{Nipkow98}
  dismisses the option of using identities, because according to
  him it leads to ``messy proofs''. Since he does not need to define what regular
  languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes

  \begin{quote}
  \it%
  \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
  `` & All lemmas appear obvious given a picture of the composition of automata\ldots
  `sink' state from which there is no connection to a final state (Brzozowski
  mentions this side-condition in the context of state complexity
  of automata \cite{Brzozowski10}). Such side-conditions mean that if we define a regular
  language as one for which there exists \emph{a} finite automaton that
  recognises all its strings (see Def.~\ref{baddef}), then we need a lemma which
  ensures that another equivalent one can be found satisfying the
  side-condition. Unfortunately, such `little' and `obvious' lemmas make
  a formalisation of automata theory a hair-pulling experience.


  In this paper, we will not attempt to formalise automata theory in
  an argument about solving equational systems.  This argument appears to be
  folklore. For the other part, we give two proofs: one direct proof using
  certain tagging-functions, and another indirect proof using Antimirov's
  partial derivatives \cite{Antimirov95}. Again, to the best of our knowledge, the
  tagging-functions have not been used before to establish the Myhill-Nerode
  theorem. Derivatives of regular expressions have been used recently quite
  widely in the literature; partial derivatives, in contrast, have attracted much
  less attention. However, partial derivatives are more suitable in the
  context of the Myhill-Nerode theorem, since it is easier to establish
  their finiteness result formally. We have not found any proof that uses
  either of them in order to prove the Myhill-Nerode theorem.
*}

section {* Preliminaries *}

text {*
  where @{text "@"} is the list-append operation. The Kleene-star of a language @{text A}
  is defined as the union over all powers, namely @{thm star_def}. In the paper
  we will make use of the following properties of these constructions.

  \begin{prpstn}\label{langprops}\mbox{}\\
  \begin{tabular}{@ {}lp{10cm}}
  (i)   & @{thm star_unfold_left}     \\
  (ii)  & @{thm[mode=IfThen] pow_length}\\
  (iii) & @{thm conc_Union_left} \\
  (iv)  & If @{thm (prem 1) star_decom} and @{thm (prem 2) star_decom} then
          there exists an @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with @{text "x = x\<^isub>p @ x\<^isub>s"}
          and @{term "x\<^isub>p \<noteq> []"} such that @{term "x\<^isub>p \<in> A"} and @{term "x\<^isub>s \<in> A\<star>"}.
  \end{tabular}
  \end{prpstn}
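
  \noindent
  Property @{text "(iv)"} is the decomposition fact used later in the
  @{const Star}-case. Spelled out as a plain Isabelle lemma it might read as
  follows (our sketch of the @{text star_decom} fact referenced above,
  assuming the @{text "\<star>"}-notation for the Kleene-star; the exact form in
  the theory may differ):

    lemma star_decom:
      assumes "x \<in> A\<star>" and "x \<noteq> []"
      shows "\<exists>xp xs. x = xp @ xs \<and> xp \<noteq> [] \<and> xp \<in> A \<and> xs \<in> A\<star>"
      sorry (* statement only; the proof is elided in this sketch *)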

  \noindent
  In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
  \end{center}

  \noindent
  that means we take the image of @{text f} w.r.t.~all elements in the
  domain. With this we will be able to infer that the tagging-functions, seen
  as relations, give rise to finitely many equivalence classes.
  Finally we will show that the tagging-relations are more refined than
  @{term "\<approx>(lang r)"}, which implies that @{term "UNIV // \<approx>(lang r)"} must
  also be finite.  We formally define the notion of a \emph{tagging-relation}
  as follows.

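  \noindent
  The formal definition is elided in this excerpt. As a rough reconstruction
  (names and concrete syntax are our sketch, not necessarily the theory's
  exact form), a tagging-function @{text tag} induces the relation that
  identifies two strings whenever the tagging-function cannot distinguish
  them:

    definition tag_eq :: "('a list \<Rightarrow> 'b) \<Rightarrow> ('a list \<times> 'a list) set"
    where "tag_eq tag \<equiv> {(x, y). tag x = tag y}"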
  \noindent
  where @{text "A"} and @{text "B"} are some arbitrary languages. The reason for this choice
  is that we need to establish that @{term "=(tag_Plus A B)="} refines @{term "\<approx>(A \<union> B)"}.
  This amounts to showing @{term "x \<approx>A y"} or @{term "x \<approx>B y"} under the assumption
  @{term "x"}~@{term "=(tag_Plus A B)="}~@{term y}. As we shall see, this definition will
  provide us with just the right assumptions in order to get the proof through.
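
  \noindent
  The elided definition of @{term "tag_Plus A B"} simply pairs the two
  equivalence classes of a string. A minimal sketch (our reconstruction,
  writing @{text "str_eq A"} for the Myhill-Nerode relation @{term "\<approx>A"} and
  assuming languages are sets of strings) is:

    type_synonym 'a lang = "'a list set"  (* assumed representation of languages *)

    definition tag_Plus :: "'a lang \<Rightarrow> 'a lang \<Rightarrow> 'a list \<Rightarrow> ('a lang \<times> 'a lang)"
    where "tag_Plus A B \<equiv> (\<lambda>x. (str_eq A `` {x}, str_eq B `` {x}))"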

  \begin{proof}[@{const "PLUS"}-Case]
  We can show in general that if @{term "finite (UNIV // \<approx>A)"} and @{term "finite
  (UNIV // \<approx>B)"} then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"}
  holds. The range of @{term "tag_Plus A B"} is a subset of this product
  1262   "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1266   "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1263   \end{proof}
  1267   \end{proof}
  1264 
  1268 
  1265   \noindent
  1269   \noindent
  1266   The @{const TIMES}-case is slightly more complicated. We first prove the
  1270   The @{const TIMES}-case is slightly more complicated. We first prove the
  1267   following lemma, which will aid the refinement-proofs.
  1271   following lemma, which will aid the proof about refinement.
  1268 
  1272 
  1269   \begin{lmm}\label{refinement}
  1273   \begin{lmm}\label{refinement}
  1270   The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\<approx>A"}, provided for
  1274   The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\<approx>A"}, provided for
  1271   all strings @{text x}, @{text y} and @{text z} we have \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
  1275   all strings @{text x}, @{text y} and @{text z} we have \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
  1272   and @{term "x @ z \<in> A"} imply @{text "y @ z \<in> A"}.
  1276   and @{term "x @ z \<in> A"} imply @{text "y @ z \<in> A"}.
  1273   \end{lmm}
  1277   \end{lmm}
  1274 
  1278 
  1275 
  1279 
  1276   \noindent
  1280   \noindent
  1277   We therefore can clean information from how the strings @{text "x @ z"} are in @{text A}
  1281   We therefore can analyse how the strings @{text "x @ z"} are in the language
  1278   and construct appropriate tagging-functions to infer that @{term "y @ z \<in> A"}.
  1282   @{text A} and then construct an appropriate tagging-function to infer that
  1279   For the @{const Times}-case we additionally need the notion of the set of all
  1283   @{term "y @ z"} are also in @{text A}.  For this we sill need the notion of
  1280   possible partitions of a string
  1284   the set of all possible \emph{partitions} of a string
  1281 
  1285 
  1282   \begin{equation}
  1286   \begin{equation}
  1283   @{thm Partitions_def}
  1287   @{thm Partitions_def}
  1284   \end{equation}
  1288   \end{equation}
       
  1289 
       
  1290   \noindent
       
  1291   If we know that @{text "(x\<^isub>p, x\<^isub>s) \<in> Partitions x"}, we will
       
  1292   refer to @{text "x\<^isub>p"} as the \emph{prefix} of the string @{text x},
       
  1293   respectively to @{text "x\<^isub>s"} as the \emph{suffix}.
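
  \noindent
  With strings represented as lists, a minimal Isabelle sketch of this
  definition (our reconstruction of the @{thm Partitions_def} display above)
  is:

    definition Partitions :: "'a list \<Rightarrow> ('a list \<times> 'a list) set"
    where "Partitions x \<equiv> {(xp, xs). xp @ xs = x}"  (* all ways to split x *)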
       

  Now assuming @{term "x @ z \<in> A \<cdot> B"}, there are only two possible ways to `split'
  this string so that it is in @{term "A \<cdot> B"}:
  %
  \begin{center}
  Either @{text x} and a prefix of @{text "z"} is in @{text A} and the rest in @{text B}
  (first picture) or there is a prefix of @{text x} in @{text A} and the rest is in @{text B}
  (second picture). In both cases we have to show that @{term "y @ z \<in> A \<cdot> B"}. The first case
  will only go through if we know that @{term "x \<approx>A y"} holds @{text "(*)"}. Because then
  we can infer from @{term "x @ z\<^isub>p \<in> A"} that @{term "y @ z\<^isub>p \<in> A"} holds for all @{text "z\<^isub>p"}.
  In the second case we only know that @{text "x\<^isub>p"} and @{text "x\<^isub>s"} is one possible partition
  of the string @{text x}. We have to know that both @{text "x\<^isub>p"} and the
  corresponding partition @{text "y\<^isub>p"} are in @{text "A"}, and that @{text "x\<^isub>s"} is `@{text B}-related'
  to @{text "y\<^isub>s"} @{text "(**)"}. From the latter fact we can infer that @{text "y\<^isub>s @ z \<in> B"}.
  This will solve the second case.
  Taking the two requirements, @{text "(*)"} and @{text "(**)"}, together we define the
  tagging-function in the @{const Times}-case as:

  \begin{center}
  @{thm (lhs) tag_Times_def[where ?A="A" and ?B="B"]}~@{text "\<equiv>"}~
  @{text "(\<lbrakk>x\<rbrakk>\<^bsub>\<approx>A\<^esub>, {\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>B\<^esub> | x\<^isub>p \<in> A \<and> (x\<^isub>p, x\<^isub>s) \<in> Partitions x})"}
  \end{center}

  \noindent
  We have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do
  not know anything about how the string @{term x} is partitioned.
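
  \noindent
  In the same sketch-notation as before (our reconstruction, not the
  theory's verbatim text), this tagging-function can be written as:

    definition tag_Times :: "'a lang \<Rightarrow> 'a lang \<Rightarrow> 'a list \<Rightarrow> ('a lang \<times> 'a lang set)"
    where "tag_Times A B \<equiv> (\<lambda>x. (str_eq A `` {x},
             {str_eq B `` {xs} | xp xs. xp \<in> A \<and> (xp, xs) \<in> Partitions x}))"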
       
  With this definition in place, let us prove the @{const "Times"}-case.


  \begin{proof}[@{const TIMES}-Case]
  If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
  @{term "tag_Times A B"} is a subset of this product set, and therefore finite.
  For the refinement of @{term "\<approx>(A \<cdot> B)"} and @{text "\<^raw:$\threesim$>\<^bsub>\<times>tag A B\<^esub>"},
  we have by Lemma \ref{refinement}

  \begin{center}
   @{term "tag_Times A B x = tag_Times A B y"}
  \end{center}

  \noindent
  and @{term "x @ z \<in> A \<cdot> B"}, and have to establish @{term "y @ z \<in> A \<cdot>
  B"}. As shown in the pictures above, there are two cases to be
  considered. First, there exists a @{text "z\<^isub>p"} and @{text
  "z\<^isub>s"} such that @{term "x @ z\<^isub>p \<in> A"} and @{text "z\<^isub>s
  \<in> B"}.  By the assumption about @{term "tag_Times A B"} we have @{term "\<approx>A
  `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Hence by the Myhill-Nerode
  relation @{term "y @ z\<^isub>p \<in> A"} holds. Using @{text "z\<^isub>s \<in> B"},
  we can conclude in this case with @{term "y @ z \<in> A \<cdot> B"} (recall @{text
  "z\<^isub>p @ z\<^isub>s = z"}).

  Second, there exists a partition @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with
  @{text "x\<^isub>p \<in> A"} and @{text "x\<^isub>s @ z \<in> B"}. We therefore have

  \begin{center}

  \noindent
  This means there must be a partition @{text "y\<^isub>p"} and @{text "y\<^isub>s"}
  such that @{term "y\<^isub>p \<in> A"} and @{term "\<approx>B `` {x\<^isub>s} = \<approx>B ``
  {y\<^isub>s}"}. Unfolding the Myhill-Nerode relation and together with the
  facts that @{text "x\<^isub>p \<in> A"} and \mbox{@{text "x\<^isub>s @ z \<in> B"}}, we
  obtain @{term "y\<^isub>p \<in> A"} and @{text "y\<^isub>s @ z \<in> B"}, as needed in
  this case.  We can again complete the @{const TIMES}-case by setting @{text
  A} to @{term "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  \end{proof}

  \noindent
  The case for @{const Star} is similar to @{const TIMES}, but poses a few
  extra challenges.  To deal with them, we first define the notion of a \emph{string
  prefix} and a \emph{strict string prefix}:

  \begin{center}
  \begin{tabular}{l}
  @{text "x \<le> y \<equiv> \<exists>z. y = x @ z"}\\
  @{text "x < y \<equiv> x \<le> y \<and> x \<noteq> y"}
  \end{tabular}
  \end{center}

  When analysing the case of @{text "x @ z"} being an element in @{term "A\<star>"}
  and @{text x} is not the empty string, we have the following picture:

  \begin{center}
  \scalebox{1}{
  \begin{tikzpicture}
  \end{center}
  %
  \noindent
  We can find a strict prefix @{text "x\<^isub>p"} of @{text x} such that @{term "x\<^isub>p \<in> A\<star>"},
  @{text "x\<^isub>p < x"} and the rest @{term "x\<^isub>s @ z \<in> A\<star>"}. For example the empty string
  @{text "[]"} would do (recall @{term "x \<noteq> []"}).
  There are potentially many such prefixes, but there can only be finitely many of them (the
  string @{text x} is finite). Let us therefore choose the longest one and call it
  @{text "x\<^bsub>pmax\<^esub>"}. Now for the rest of the string @{text "x\<^isub>s @ z"} we
  know it is in @{term "A\<star>"} and cannot be the empty string. By Prop.~\ref{langprops}@{text "(iv)"},
  we can separate
  this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \<noteq> []"}, @{text "a \<in> A"}
  and @{term "b \<in> A\<star>"}. Now @{text a} must be strictly longer than @{text "x\<^isub>s"},
  otherwise @{text "x\<^bsub>pmax\<^esub>"} is not the longest prefix. That means @{text a}
  `overlaps' with @{text z}, splitting it into two components @{text "z\<^isub>a"} and
  @{text "z\<^isub>b"}. For this we know that @{text "x\<^isub>s @ z\<^isub>a \<in> A"} and
  @{term "z\<^isub>b \<in> A\<star>"}. To cut a story short, we have divided @{term "x @ z \<in> A\<star>"}
  @{term "tag_Star A"} is a subset of this set, and therefore finite.
  Again we have to show under the assumption @{term "x"}~@{term "=(tag_Star A)="}~@{term y}
  that @{term "x @ z \<in> A\<star>"} implies @{term "y @ z \<in> A\<star>"}.
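
  \noindent
  The definition of @{term "tag_Star A"} is elided in this excerpt; following
  the description above, a rough sketch (our reconstruction, with @{text "<"}
  being the strict string prefix defined earlier) is:

    definition tag_Star :: "'a lang \<Rightarrow> 'a list \<Rightarrow> 'a lang set"
    where "tag_Star A \<equiv> (\<lambda>x. {str_eq A `` {xs} | xp xs.
             xp < x \<and> xp \<in> A\<star> \<and> (xp, xs) \<in> Partitions x})"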

  We first need to consider the case that @{text x} is the empty string.
  From the assumption about strict prefixes in @{text "\<^raw:$\threesim$>\<^bsub>\<star>tag A\<^esub>"}, we
  can infer @{text y} is the empty string and
  then clearly have @{term "y @ z \<in> A\<star>"}. In case @{text x} is not the empty
  string, we can divide the string @{text "x @ z"} as shown in the picture
  above. By the tagging-function and the facts @{text "x\<^bsub>pmax\<^esub> \<in> A\<^isup>\<star>"} and @{text "x\<^bsub>pmax\<^esub> < x"},
  we have

  \begin{center}
  \begin{center}
  @{text "\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> \<in> {\<lbrakk>y\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> | y\<^bsub>p\<^esub> < y \<and> y\<^bsub>p\<^esub> \<in> A\<^isup>\<star> \<and> (y\<^bsub>p\<^esub>, y\<^isub>s) \<in> Partitions y}"}
  \end{center}

  \noindent
  From this we know there exist partitions @{text "y\<^isub>p"} and @{text
  "y\<^isub>s"} with @{term "y\<^isub>p \<in> A\<star>"} and also @{term "x\<^isub>s \<approx>A
  y\<^isub>s"}. Unfolding the Myhill-Nerode relation we know @{term
  "y\<^isub>s @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
  Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \<in>
  A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. As the last step we have to set
  @{text "A"} to @{term "lang r"} and thus complete the proof.
  \end{proof}
*}

section {* Second Part proved using Partial Derivatives *}

text {*
  \noindent
  As we have seen in the previous section, in order to establish
  the second direction of the Myhill-Nerode theorem, we need to find
  a more refined relation than @{term "\<approx>(lang r)"} for which we can
  show that there are only finitely many equivalence classes. So far we
  have shown this by induction on @{text "r"}. However, there is also
  an indirect method to come up with such a refined relation based on
  derivatives of regular expressions \cite{Brzozowski64}.

  Assume the following two definitions for a \emph{left-quotient} of a language,
  which we write as @{term "Der c A"} and @{term "Ders s A"} where @{text c}
  is a character and @{text s} a string:

  \begin{center}
  \begin{tabular}{r@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  @{thm (lhs) Der_def}  & @{text "\<equiv>"} & @{thm (rhs) Der_def}\\
  @{thm (lhs) Ders_def} & @{text "\<equiv>"} & @{thm (rhs) Ders_def}\\
  \end{tabular}
  \end{center}
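
  \noindent
  Concretely, with languages as sets of strings, these left-quotients can be
  defined as follows (a sketch matching the two displays above; the concrete
  syntax is our reconstruction):

    definition Der :: "'a \<Rightarrow> 'a lang \<Rightarrow> 'a lang"
    where "Der c A \<equiv> {s. c # s \<in> A}"    (* all strings s with c # s in A *)

    definition Ders :: "'a list \<Rightarrow> 'a lang \<Rightarrow> 'a lang"
    where "Ders s A \<equiv> {s'. s @ s' \<in> A}"  (* all strings s' with s @ s' in A *)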

  \noindent
  In order to aid readability, we shall also make use of the following abbreviation:

  \begin{center}
  @{abbrev "Derss s A"}
  \end{center}


  \noindent
  Clearly we have the following relation between the Myhill-Nerode relation
  (Def.~\ref{myhillneroderel}) and left-quotients

  \begin{equation}\label{mhders}
  @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Ders x A = Ders y A"}
  \end{equation}

  \noindent
  It is straightforward to establish the following properties of left-quotients:

  \begin{equation}
  \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  @{thm (lhs) Der_simps(1)} & $=$ & @{thm (rhs) Der_simps(1)}\\
  @{thm (lhs) Der_simps(2)} & $=$ & @{thm (rhs) Der_simps(2)}\\
  @{thm (lhs) Der_simps(3)} & $=$ & @{thm (rhs) Der_simps(3)}\\
  @{thm (lhs) Der_simps(4)} & $=$ & @{thm (rhs) Der_simps(4)}\\
  @{thm (lhs) Der_conc}  & $=$ & @{thm (rhs) Der_conc}\\
  @{thm (lhs) Der_star}  & $=$ & @{thm (rhs) Der_star}\\
  @{thm (lhs) Ders_simps(1)} & $=$ & @{thm (rhs) Ders_simps(1)}\\
  @{thm (lhs) Ders_simps(2)} & $=$ & @{thm (rhs) Ders_simps(2)}\\
  %@{thm (lhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}  & $=$
  %   & @{thm (rhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}\\
  \end{tabular}}
  \end{equation}

  \noindent
  where @{text "\<Delta>"} is a function that tests whether the empty string
  @{thm (lhs) ders.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) ders.simps(2)}\\
  \end{tabular}
  \end{center}

  \noindent
  The last two clauses extend derivatives for characters to strings (lists of
  characters). The list-cons operator is written \mbox{@{text "_ :: _"}}. The
  function @{term "nullable r"} needed in the @{const Times}-case tests
  whether a regular expression can recognise the empty string:

  \begin{center}
  \begin{tabular}{c@ {\hspace{10mm}}c}
  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  \end{tabular}}
  \end{equation}
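
  \noindent
  Since the clauses above are rendered from antiquotations, the following
  self-contained sketch shows how the derivative functions might be defined
  over a datatype of regular expressions (our reconstruction; constructor
  names follow the abbreviations at the top of this theory):

    datatype 'a rexp = Zero | One | Atom 'a
                     | Plus "'a rexp" "'a rexp" | Times "'a rexp" "'a rexp" | Star "'a rexp"

    (* a regular expression is nullable iff its language contains the empty string *)
    fun nullable :: "'a rexp \<Rightarrow> bool"
    where
      "nullable Zero = False"
    | "nullable One = True"
    | "nullable (Atom c) = False"
    | "nullable (Plus r1 r2) = (nullable r1 \<or> nullable r2)"
    | "nullable (Times r1 r2) = (nullable r1 \<and> nullable r2)"
    | "nullable (Star r) = True"

    (* Brzozowski derivative: der c r matches s iff r matches c # s *)
    fun der :: "'a \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp"
    where
      "der c Zero = Zero"
    | "der c One = Zero"
    | "der c (Atom c') = (if c = c' then One else Zero)"
    | "der c (Plus r1 r2) = Plus (der c r1) (der c r2)"
    | "der c (Times r1 r2) =
         (if nullable r1 then Plus (Times (der c r1) r2) (der c r2)
          else Times (der c r1) r2)"
    | "der c (Star r) = Times (der c r) (Star r)"

    (* extension to strings, consuming characters from the left *)
    fun ders :: "'a list \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp"
    where
      "ders [] r = r"
    | "ders (c # s) r = ders s (der c r)"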

  \noindent
  The importance in the context of the Myhill-Nerode theorem is that
  we can use \eqref{mhders} and \eqref{Dersders} in order to
  establish that @{term "x \<approx>(lang r) y"} is equivalent to
  @{term "lang (ders x r) = lang (ders y r)"}. From this we obtain

  \begin{equation}
  @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  \end{equation}


  \noindent
  which means the right-hand side (seen as a relation) refines the
  Myhill-Nerode relation.  Consequently, we can use
  @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a potential tagging-relation
  for the regular expression @{text r}. However, in
  order to be useful in the Myhill-Nerode theorem, we also have to show that
  for the corresponding language there are only finitely many derivatives---ensuring
  that there are only finitely many equivalence classes. Unfortunately, this
  is not true in general. Sakarovitch gives an example where a regular
  expression has infinitely many derivatives w.r.t.~a language
  \cite[Page~141]{Sakarovitch09}. What Brzozowski \cite{Brzozowski64} proved
  is that for every language there \emph{are} only finitely many `dissimilar'
  derivatives for a regular expression. Two regular expressions are said to be
  \emph{similar} provided they can be identified using the @{text
  "ACI"}-identities:

  \begin{equation}\label{ACI}
  \mbox{\begin{tabular}{cl}
  (@{text A}) & @{term "Plus (Plus r\<^isub>1 r\<^isub>2) r\<^isub>3"} $\equiv$ @{term "Plus r\<^isub>1 (Plus r\<^isub>2 r\<^isub>3)"}\\
  (@{text C}) & @{term "Plus r\<^isub>1 r\<^isub>2"} $\equiv$ @{term "Plus r\<^isub>2 r\<^isub>1"}\\
  (@{text I}) & @{term "Plus r r"} $\equiv$ @{term "r"}\\
  \end{tabular}}
  \end{equation}

  \noindent
  Carrying this idea through, we must not consider the set of all derivatives,
  but the ones modulo @{text "ACI"}.  In principle, this can be formally
  defined, but it is very painful in a theorem prover (since there is no
  direct characterisation of the set of dissimilar derivatives).


  Fortunately, there is a much simpler approach using \emph{partial
  derivatives}. They were introduced by Antimirov \cite{Antimirov95} and can be defined
  in Isabelle/HOL as follows:

  @{thm (lhs) pder.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) pder.simps(3)[where c'="d"]}\\
  @{thm (lhs) pder.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
     & @{text "\<equiv>"} & @{thm (rhs) pder.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  @{thm (lhs) pder.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
     & @{text "\<equiv>"} & @{text "if"}~@{term "nullable r\<^isub>1"}~@{text "then"}~%
       @{term "(Timess (pder c r\<^isub>1) r\<^isub>2) \<union> (pder c r\<^isub>2)"}\\
     & & \phantom{@{text "if"}~@{term "nullable r\<^isub>1"}~}@{text "else"}~%
                    @{term "Timess (pder c r\<^isub>1) r\<^isub>2"}\\
  @{thm (lhs) pder.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) pder.simps(6)}\smallskip\\
  @{thm (lhs) pders.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pders.simps(1)}\\
  @{thm (lhs) pders.simps(2)}  & @{text "\<equiv>"} & @{text "\<Union> (pders s) ` (pder c r)"}\\
  \end{tabular}
  \end{center}

  \noindent
  Again the last two clauses extend partial derivatives from characters to strings.
  Unlike `simple' derivatives, the functions for partial derivatives return sets of regular
  expressions. In the @{const Times} and @{const Star} cases we therefore use the
  auxiliary definition

  \begin{center}
  @{text "TIMESS rs r \<equiv> {TIMES r' r | r' \<in> rs}"}
  \end{center}

  \noindent
  in order to `sequence' a regular expression with a set of regular
  expressions. Note that in the last clause we first build the set of partial
  derivatives w.r.t.~the character @{text c}, then build the image of this set under the
  function @{term "pders s"} and finally `union up' all resulting sets. It will be
  convenient to introduce the following abbreviation

  \begin{center}
  @{abbrev "pderss s A"}
  \end{center}

  \noindent
  which simplifies the last clause of @{const "pders"} to

  \begin{center}
  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  @{thm (lhs) pders.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) pders.simps(2)}\\
  \end{tabular}
  \end{center}
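
  \noindent
  Mirroring the sketch given for @{const der} earlier, the partial-derivative
  functions could be written as follows (again our reconstruction, not the
  theory's verbatim text):

    (* 'sequence' every expression in rs with r *)
    definition Timess :: "'a rexp set \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where "Timess rs r \<equiv> (\<lambda>r'. Times r' r) ` rs"

    (* partial derivatives return sets of regular expressions *)
    fun pder :: "'a \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where
      "pder c Zero = {}"
    | "pder c One = {}"
    | "pder c (Atom c') = (if c = c' then {One} else {})"
    | "pder c (Plus r1 r2) = pder c r1 \<union> pder c r2"
    | "pder c (Times r1 r2) =
         (if nullable r1 then Timess (pder c r1) r2 \<union> pder c r2
          else Timess (pder c r1) r2)"
    | "pder c (Star r) = Timess (pder c r) (Star r)"

    (* extension to strings: union over all partial derivatives of the results *)
    fun pders :: "'a list \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where
      "pders [] r = {r}"
    | "pders (c # s) r = \<Union> (pders s ` pder c r)"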

  Partial derivatives can be seen as having the @{text "ACI"}-identities already built in:
  taking the partial derivatives of the
  regular expressions in \eqref{ACI} gives us in each case
  equal sets.  Antimirov \cite{Antimirov95} showed a similar result to
  \eqref{Dersders} for partial derivatives:

  \begin{equation}
  \mbox{\begin{tabular}{lc}
  @{text "(i)"}  & @{thm Der_pder}\\
  @{text "(ii)"} & @{thm Ders_pders}
  \end{tabular}}
  \end{equation}

  \begin{proof}
  The first fact is by a simple induction on @{text r}. For the second we slightly
  modify Antimirov's proof by performing an induction on @{text s} where we
  generalise over all @{text r}. That means in the @{text "cons"}-case the
  induction hypothesis is

  \begin{center}
  @{text "(IH)"}\hspace{3mm}@{term "\<forall>r. Ders s (lang r) = \<Union> lang ` (pders s r)"}
  \end{center}

  \noindent
  With this we can establish

  \begin{center}
  \begin{tabular}{r@ {\hspace{1.5mm}}c@ {\hspace{1.5mm}}ll}
  @{term "Ders (c # s) (lang r)"}
    & @{text "="} & @{term "Ders s (Der c (lang r))"} & by def.\\
    & @{text "="} & @{term "Ders s (\<Union> lang ` (pder c r))"} & by @{text "(i)"}\\
    & @{text "="} & @{term "\<Union> (Ders s) ` (lang ` (pder c r))"} & by def.~of @{text "Ders"}\\
    & @{text "="} & @{term "\<Union> lang ` (\<Union> pders s ` (pder c r))"} & by IH\\
    & @{text "="} & @{term "\<Union> lang ` (pders (c # s) r)"} & by def.\\
  \end{tabular}
  \end{center}

  \noindent
  In order to apply the induction hypothesis in the fourth step, we need the generalisation
  over all regular expressions @{text r}. The case for the empty string is routine and omitted.
  \end{proof}

  Antimirov also proved that for every language and regular expression there are only finitely
  many partial derivatives.
*}

section {* Closure Properties *}

text {*
  \noindent
  The real beauty of regular languages is that they are closed
  under almost all set operations. Closure under union, concatenation and Kleene-star
  are trivial to establish given our definition of regularity (Def.~\ref{regular}).
  More interesting is the closure under complement, because
  it seems difficult to construct a regular expression for the complement
  language by direct means. However, the existence of such a regular expression
  can now be easily proved using the Myhill-Nerode theorem since