--- a/Journal/Paper.thy	Tue Aug 09 22:14:41 2011 +0000
+++ b/Journal/Paper.thy	Tue Aug 09 22:15:11 2011 +0000
@@ -249,7 +249,7 @@
   pairs. Using this definition for disjoint union means we do not have a
   single type for automata. As a result we will not be able to define a regular
   language as one for which there exists an automaton that recognises all its
-  strings. This is because we cannot make a definition in HOL that is polymorphic in 
+  strings (Definition~\ref{baddef}). This is because we cannot make a definition in HOL that is polymorphic in 
   the state type and there is no type quantification available in HOL (unlike 
   in Coq, for example).\footnote{Slind already pointed out this problem in an email 
   to the HOL4 mailing list on 21st April 2005.}
@@ -305,21 +305,21 @@
   larger formalisations of automata theory are carried out in Nuprl
   \cite{Constable00} and in Coq \cite{Filliatre97}.
 
-  Also one might consider automata theory as a well-worn stock subject where
-  everything is crystal clear. However, paper proofs about automata often
-  involve subtle side-conditions which are easily overlooked, but which make
-  formal reasoning rather painful. For example Kozen's proof of the
-  Myhill-Nerode theorem requires that automata do not have inaccessible
+  Also one might consider automata theory and regular languages as a well-worn
+  stock subject where everything is crystal clear. However, paper proofs about
+  automata often involve subtle side-conditions which are easily overlooked,
+  but which make formal reasoning rather painful. For example Kozen's proof of
+  the Myhill-Nerode theorem requires that automata do not have inaccessible
   states \cite{Kozen97}. Another subtle side-condition is completeness of
-  automata, that is automata need to have total transition functions and at most one
-  `sink' state from which there is no connection to a final state (Brzozowski
-  mentions this side-condition in the context of state complexity
-  of automata \cite{Brzozowski10}). Such side-conditions mean that if we define a regular
-  language as one for which there exists \emph{a} finite automaton that
-  recognises all its strings (see Def.~\ref{baddef}), then we need a lemma which
-  ensures that another equivalent one can be found satisfying the
-  side-condition. Unfortunately, such `little' and `obvious' lemmas make
-  a formalisation of automata theory a hair-pulling experience.
+  automata, that is automata need to have total transition functions and at
+  most one `sink' state from which there is no connection to a final state
+  (Brzozowski mentions this side-condition in the context of state complexity
+  of automata \cite{Brzozowski10}). Such side-conditions mean that if we
+  define a regular language as one for which there exists \emph{a} finite
+  automaton that recognises all its strings (see Def.~\ref{baddef}), then we
+  need a lemma which ensures that another equivalent one can be found
+  satisfying the side-condition. Unfortunately, such `little' and `obvious'
+  lemmas make a formalisation of automata theory a hair-pulling experience.
 
 
   In this paper, we will not attempt to formalise automata theory in
@@ -361,11 +361,11 @@
   partial derivatives \cite{Antimirov95}. Again to our best knowledge, the
   tagging-functions have not been used before to establish the Myhill-Nerode
   theorem. Derivatives of regular expressions have been used recently quite
-  widely in the literature; partial derivatives, in contrast, attracted much
+  widely in the literature; partial derivatives, in contrast, attract much
   less attention. However, partial derivatives are more suitable in the
   context of the Myhill-Nerode theorem, since it is easier to establish
-  formally their finiteness result. We have not found any proof that uses
-  either of them in order to prove the Myhill-Nerode theorem.
+  formally their finiteness result. We are not aware of any proof that uses
+  either of them for proving the Myhill-Nerode theorem.
 *}
 
 section {* Preliminaries *}
@@ -412,7 +412,10 @@
   \noindent
   In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
   string; this property states that if \mbox{@{term "[] \<notin> A"}} then the lengths of
-  the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}.  We omit
+  the strings in @{term "A \<up> (Suc n)"} must be longer than @{text n}.  
+  Property @{text "(iv)"} states that a non-empty string in @{term "A\<star>"} can
+  always be split up into a non-empty prefix belonging to @{text "A"} and the
+  rest being in @{term "A\<star>"}. We omit
   the proofs for these properties, but invite the reader to consult our
   formalisation.\footnote{Available at \url{http://www4.in.tum.de/~urbanc/regexp.html}}
 
@@ -511,7 +514,15 @@
 
   \noindent
   holds, whereby @{text "\<calL> ` rs"} stands for the 
-  image of the set @{text rs} under function @{text "\<calL>"}.
+  image of the set @{text rs} under function @{text "\<calL>"} defined as
+
+  \begin{center}
+  @{term "lang ` rs \<equiv> {lang r | r. r \<in> rs}"}
+  \end{center}
+
+  \noindent
+  In what follows we shall use this convenient short-hand notation for images of sets 
+  also with other functions.
 *}
 
 
@@ -1147,7 +1158,7 @@
   \end{equation}
 
   \noindent
-  The relation @{term "\<approx>(lang r)"} partitions the set of all strings into some
+  The relation @{term "\<approx>(lang r)"} partitions the set of all strings, @{term UNIV}, into some
   equivalence classes. To show that there are only finitely many of them, it
   suffices to show in each induction step that another relation, say @{text
   R}, has finitely many equivalence classes and refines @{term "\<approx>(lang r)"}. 
@@ -1272,7 +1283,7 @@
 
   \begin{lmm}\label{refinement}
   The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\<approx>A"}, provided for
-  all strings @{text x}, @{text y} and @{text z} we have \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
+  all strings @{text x}, @{text y} and @{text z} we have that \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
   and @{term "x @ z \<in> A"} imply @{text "y @ z \<in> A"}.
   \end{lmm}
 
@@ -1280,8 +1291,8 @@
   \noindent
   We therefore can analyse how the strings @{text "x @ z"} are in the language
   @{text A} and then construct an appropriate tagging-function to infer that
-  @{term "y @ z"} are also in @{text A}.  For this we sill need the notion of
-  the set of all possible \emph{partitions} of a string
+  @{term "y @ z"} are also in @{text A}.  For this we will use the notion of
+  the set of all possible \emph{partitions} of a string:
 
   \begin{equation}
   @{thm Partitions_def}
@@ -1290,7 +1301,7 @@
   \noindent
   If we know that @{text "(x\<^isub>p, x\<^isub>s) \<in> Partitions x"}, we will
   refer to @{text "x\<^isub>p"} as the \emph{prefix} of the string @{text x},
-  respectively to @{text "x\<^isub>s"} as the \emph{suffix}.
+  and respectively to @{text "x\<^isub>s"} as the \emph{suffix}.
 
 
   Now assuming  @{term "x @ z \<in> A \<cdot> B"} there are only two possible ways of how to `split' 
@@ -1531,12 +1542,12 @@
   \end{center}
   
   \noindent 
-  From this we know there exist partitions @{text "y\<^isub>p"} and @{text
+  From this we know there exist a partition @{text "y\<^isub>p"} and @{text
   "y\<^isub>s"} with @{term "y\<^isub>p \<in> A\<star>"} and also @{term "x\<^isub>s \<approx>A
   y\<^isub>s"}. Unfolding the Myhill-Nerode relation we know @{term
   "y\<^isub>s @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
   Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \<in>
-  A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. As the last step we have to set
+  A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. The last step is to set
   @{text "A"} to @{term "lang r"} and thus complete the proof.
   \end{proof}
 *}
@@ -1550,7 +1561,7 @@
   a more refined relation than @{term "\<approx>(lang r)"} for which we can
   show that there are only finitely many equivalence classes. So far we 
   showed this by induction on @{text "r"}. However, there is also 
-  an indirect method to come up with such a refined relation based on
+  an indirect method to come up with such a refined relation by using
   derivatives of regular expressions \cite{Brzozowski64}. 
 
   Assume the following two definitions for a \emph{left-quotient} of a language,
@@ -1565,14 +1576,14 @@
   \end{center}
 
   \noindent
-  In order to aid readability, we shall also make use of the following abbreviation:
+  In order to aid readability, we shall also make use of the following abbreviation
 
   \begin{center}
-  @{abbrev "Derss s A"}
+  @{abbrev "Derss s As"}
   \end{center}
   
-
   \noindent
+  where we apply the left-quotient to a set of languages and then combine the results.
   Clearly we have the following relation between the Myhill-Nerode relation
   (Def.~\ref{myhillneroderel}) and left-quotients
 
@@ -1581,7 +1592,7 @@
   \end{equation}
 
   \noindent
-  It is straightforward to establish the following properties for left-quotients:
+  It is also straightforward to establish the following properties for left-quotients:
   
   \begin{equation}
   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
@@ -1600,8 +1611,8 @@
 
   \noindent
   where @{text "\<Delta>"} is a function that tests whether the empty string
-  is in the language and returns @{term "{[]}"} or @{term "{}"}, respectively.
-  The only interesting case above is the last one where we use Prop.~\ref{langprops}
+  is in the language and returns @{term "{[]}"} or @{term "{}"}, accordingly.
+  The only interesting case above is the last one where we use Prop.~\ref{langprops}@{text "(i)"}
   in order to infer that @{term "Der c (A\<star>) = Der c (A \<cdot> A\<star>)"}. We can 
   then complete the proof by observing that @{term "Delta A \<cdot> Der c (A\<star>) \<subseteq> (Der c A) \<cdot> A\<star>"}.
   
@@ -1629,8 +1640,8 @@
   \end{center}
 
   \noindent
-  The last two clauses extend derivatives for characters to strings (list of
-  characters). The list-cons operator is written \mbox{@{text "_ :: _"}}. The
+  The last two clauses extend derivatives from characters to strings---i.e.~list of
+  characters. The list-cons operator is written \mbox{@{text "_ :: _"}}. The
   function @{term "nullable r"} needed in the @{const Times}-case tests
   whether a regular expression can recognise the empty string:
 
@@ -1652,7 +1663,7 @@
   \end{center}
 
   \noindent
-  By induction on the regular expression @{text r}, respectively the string @{text s}, 
+  By induction on the regular expression @{text r}, respectively on the string @{text s}, 
   one can easily show that left-quotients and derivatives relate as follows 
   \cite{Sakarovitch09}:
 
@@ -1667,7 +1678,7 @@
   The importance in the context of the Myhill-Nerode theorem is that 
   we can use \eqref{mhders} and \eqref{Dersders} in order to 
   establish that @{term "x \<approx>(lang r) y"} is equivalent to
-  @{term "lang (ders x r) = lang (ders y r)"}. From this we obtain
+  @{term "lang (ders x r) = lang (ders y r)"}. Hence
 
   \begin{equation}
   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
@@ -1677,10 +1688,10 @@
   \noindent
   which means the right-hand side (seen as relation) refines the
   Myhill-Nerode relation.  Consequently, we can use 
-  @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a potential tagging-relation
+  @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a tagging-relation
   for the regular expression @{text r}. However, in
-  order to be useful in the Myhill-Nerode theorem, we also have to show that
-  for the corresponding language there are only finitely many derivatives---ensuring
+  order to be useful for teh second part of the Myhill-Nerode theorem, we also have to show that
+  for the corresponding language there are only finitely many derivatives---thus ensuring
   that there are only finitely many equivalence classes. Unfortunately, this
   is not true in general. Sakarovitch gives an example where a regular
   expression  has infinitely many derivatives w.r.t.~a language
@@ -1700,8 +1711,8 @@
 
   \noindent
   Carrying this idea through, we must not consider the set of all derivatives,
-  but the ones modulo @{text "ACI"}.  In principle, this can be formally
-  defined, but it is very painful in a theorem prover (since there is no
+  but the ones modulo @{text "ACI"}.  In principle, this can be done formally, 
+  but it is very painful in a theorem prover (since there is no
   direct characterisation of the set of dissimlar derivatives).
 
 
@@ -1742,7 +1753,7 @@
   expressions. Note that in the last clause we first build the set of partial
   derivatives w.r.t~the character @{text c}, then build the image of this set under the
   function @{term "pders s"} and finally `union up' all resulting sets. It will be
-  convenient to introduce the following abbreviation
+  convenient to introduce for this the following abbreviation
 
   \begin{center}
   @{abbrev "pderss s A"}
@@ -1763,7 +1774,7 @@
   equal sets.  Antimirov \cite{Antimirov95} showed a similar result to
   \eqref{Dersders} for partial derivatives:
 
-  \begin{equation}
+  \begin{equation}\label{Derspders}
   \mbox{\begin{tabular}{lc}
   @{text "(i)"}  & @{thm Der_pder}\\
   @{text "(ii)"} & @{thm Ders_pders}
@@ -1787,7 +1798,7 @@
   \begin{tabular}{r@ {\hspace{1.5mm}}c@ {\hspace{1.5mm}}ll}
   @{term "Ders (c # s) (lang r)"} 
     & @{text "="} & @{term "Ders s (Der c (lang r))"} & by def.\\
-    & @{text "="} & @{term "Ders s (\<Union> lang ` (pder c r))"} & by @{text "(i)"}\\
+    & @{text "="} & @{term "Ders s (\<Union> lang ` (pder c r))"} & by @{text "("}\ref{Derspders}@{text ".i)"}\\
     & @{text "="} & @{term "\<Union> (Ders s) ` (lang ` (pder c r))"} & by def.~of @{text "Ders"}\\
     & @{text "="} & @{term "\<Union> lang ` (\<Union> pders s ` (pder c r))"} & by IH\\
     & @{text "="} & @{term "\<Union> lang ` (pders (c # s) r)"} & by def.\\
@@ -1795,12 +1806,29 @@
   \end{center}
   
   \noindent
-  In order to apply the induction hypothesis in the fourth step, we need the generalisation
-  over all regular expressions @{text r}. The case for the empty string is routine and omitted.
+  Note that in order to apply the induction hypothesis in the fourth equation, we
+  need the generalisation over all regular expressions @{text r}. The case for
+  the empty string is routine and omitted.
   \end{proof}
 
-  Antimirov also proved that for every language and regular expression there are only finitely
-  many partial derivatives.
+  \noindent
+  Taking \eqref{Dersders} and \eqref{Derspders} together gives the relationship 
+  between languages of derivatives and partial derivatives
+
+  \begin{equation}
+  \mbox{\begin{tabular}{lc}
+  @{text "(i)"}  & @{thm der_pder[symmetric]}\\
+  @{text "(ii)"} & @{thm ders_pders[symmetric]}
+  \end{tabular}}
+  \end{equation} 
+
+  \noindent
+  These two properties confirm the observation made earlier 
+  that by using sets, partial derivatives have the @{text "ACI"}-identidies
+  of derivatives already built in. 
+
+  Antimirov also proved that for every language and regular expression 
+  there are only finitely many partial derivatives.
 *}
 
 section {* Closure Properties *}
changeset 190	b73478aaf33e
parent 187	9f46a9571e37
child 193	2a5ac68db24b