diff -r 11c3c302fa2e -r 204856ef5573 Journal/Paper.thy
--- a/Journal/Paper.thy	Wed Aug 17 07:43:09 2011 +0000
+++ b/Journal/Paper.thy	Wed Aug 17 17:36:19 2011 +0000
@@ -157,10 +157,10 @@
   limitations that hurt badly, if one attempts a simple-minded formalisation
   of regular languages in it. 
 
-  The typical approach to regular languages is to
-  introduce finite automata and then define everything in terms of them
-  \cite{ HopcroftUllman69,Kozen97}.  For example, a regular language is 
-  normally defined as:
+  The typical approach to regular languages is to introduce finite
+  deterministic automata and then define everything in terms of them \cite{
+  HopcroftUllman69,Kozen97}.  For example, a regular language is normally
+  defined as:
 
   \begin{dfntn}\label{baddef}
   A language @{text A} is \emph{regular}, provided there is a 
@@ -252,13 +252,14 @@
   \noindent
   changes the type---the disjoint union is not a set, but a set of
   pairs. Using this definition for disjoint union means we do not have a
-  single type for the states of automata. As a result we will not be able to define a
-  regular language as one for which there exists an automaton that recognises
-  all its strings (Definition~\ref{baddef}). This is because we cannot make a
-  definition in HOL that is only polymorphic in the state type and there is no type
-  quantification available in HOL (unlike in Coq, for example).\footnote{Slind
-  already pointed out this problem in an email to the HOL4 mailing list on
-  21st April 2005.}
+  single type for the states of automata. As a result we will not be able to
+  define in our fomalisation a regular language as one for which there exists
+  an automaton that recognises all its strings (Definition~\ref{baddef}). This
+  is because we cannot make a definition in HOL that is only polymorphic in
+  the state type, but not in the predicate for regularity; and there is no
+  type quantification available in HOL (unlike in Coq, for
+  example).\footnote{Slind already pointed out this problem in an email to the
+  HOL4 mailing list on 21st April 2005.}
 
   An alternative, which provides us with a single type for states of automata,
   is to give every state node an identity, for example a natural number, and
@@ -268,12 +269,15 @@
   matrices results in very adhoc constructions, which are not pleasant to
   reason about.
 
-  Functions are much better supported in Isabelle/HOL, but they still lead to similar
-  problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
-  requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98} 
-  dismisses for this the option of using identities, because it leads according to 
-  him to ``messy proofs''. Since he does not need to define what regular
-  languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes 
+  Functions are much better supported in Isabelle/HOL, but they still lead to
+  similar problems as with graphs.  Composing, for example, two
+  non-deterministic automata in parallel requires also the formalisation of
+  disjoint unions. Nipkow \cite{Nipkow98} dismisses for this the option of
+  using identities, because it leads according to him to ``messy
+  proofs''. Since he does not need to define what regular languages are,
+  Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but
+  writes\smallskip
+
 
   \begin{quote}
   \it%
@@ -281,10 +285,10 @@
   `` & All lemmas appear obvious given a picture of the composition of automata\ldots
        Yet their proofs require a painful amount of detail.''
   \end{tabular}
-  \end{quote}
+  \end{quote}\smallskip
 
   \noindent
-  and
+  and\smallskip
   
   \begin{quote}
   \it%
@@ -292,7 +296,7 @@
   `` & If the reader finds the above treatment in terms of bit lists revoltingly
        concrete, I cannot disagree. A more abstract approach is clearly desirable.''
   \end{tabular}
-  \end{quote}
+  \end{quote}\smallskip
 
 
   \noindent
@@ -1315,10 +1319,10 @@
   \begin{center}
   \begin{tabular}{c}
   \scalebox{1}{
-  \begin{tikzpicture}
-    \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$  };
+  \begin{tikzpicture}[fill=gray!20]
+    \node[draw,minimum height=3.8ex, fill] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of x, fill] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$  };
 
     \draw[decoration={brace,transform={yscale=3}},decorate]
            (x.north west) -- ($(za.north west)+(0em,0em)$)
@@ -1342,10 +1346,10 @@
   \end{tikzpicture}}
   \\[2mm]
   \scalebox{1}{
-  \begin{tikzpicture}
-    \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
+  \begin{tikzpicture}[fill=gray!20]
+    \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
 
     \draw[decoration={brace,transform={yscale=3}},decorate]
            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
@@ -1458,11 +1462,11 @@
 
   \begin{center}
   \scalebox{1}{
-  \begin{tikzpicture}
-    \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
-    \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
+  \begin{tikzpicture}[fill=gray!20]
+    \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
+    \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
 
     \draw[decoration={brace,transform={yscale=3}},decorate]
            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
@@ -1654,13 +1658,11 @@
   follows.
 
   \begin{center}
-  \begin{tabular}{c@ {\hspace{10mm}}c}
+  \begin{tabular}{c}
   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
   @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
   @{thm (lhs) nullable.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
   @{thm (lhs) nullable.simps(3)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
-  \end{tabular} &
-  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
   @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
   @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
@@ -1685,8 +1687,15 @@
   \noindent
   The importance of this fact in the context of the Myhill-Nerode theorem is that 
   we can use \eqref{mhders} and \eqref{Dersders} in order to 
-  establish that @{term "x \<approx>(lang r) y"} is equivalent to
-  \mbox{@{term "lang (ders x r) = lang (ders y r)"}}. Hence
+  establish that 
+
+  \begin{center}
+  @{term "x \<approx>(lang r) y"} \hspace{4mm}if and only if\hspace{4mm}
+  @{term "lang (ders x r) = lang (ders y r)"}. 
+  \end{center}
+
+  \noindent
+  holds and hence
 
   \begin{equation}
   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
@@ -1694,7 +1703,7 @@
 
 
   \noindent
-  which means the right-hand side (seen as a relation) refines the Myhill-Nerode
+  This means the right-hand side (seen as a relation) refines the Myhill-Nerode
   relation.  Consequently, we can use @{text
   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
   tagging-relation. However, in order to be useful for the second part of the
@@ -1840,7 +1849,7 @@
   of derivatives already built in. 
 
   Antimirov also proved that for every language and regular expression 
-  there are only finitely many partial derivatives, whereby the partial
+  there are only finitely many partial derivatives, whereby the set of partial
   derivatives of @{text r} w.r.t.~a language @{text A} is defined as
 
   \begin{equation}\label{Pdersdef}
@@ -1853,7 +1862,7 @@
   \end{thrm}
 
   \noindent
-  Antimirov's proof first shows this theorem for the language @{term UNIV1}, 
+  Antimirov's proof first establishes this theorem for the language @{term UNIV1}, 
   which is the set of all non-empty strings. For this he proves:
 
   \begin{equation}\label{pdersunivone}
@@ -1885,10 +1894,10 @@
   and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of
   @{term "pders_lang UNIV r"}.  Since we follow Antimirov's proof quite
   closely in our formalisation (only the last two cases of
-  \eqref{pdersunivone} involve some non-routine induction argument), we omit
+  \eqref{pdersunivone} involve some non-routine induction arguments), we omit
   the details.
 
-  Let us now return to our proof about the second direction in the Myhill-Nerode
+  Let us now return to our proof for the second direction in the Myhill-Nerode
   theorem. The point of the above calculations is to use 
   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
 
@@ -1915,14 +1924,14 @@
 
   \noindent
   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
-  which we know is finite by Theorem~\ref{antimirov}. This means there 
+  which we know is finite by Theorem~\ref{antimirov}. Consequently there 
   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
-  which refines @{term "\<approx>(lang r)"}, and consequently we can again conclude the 
+  which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
   second part of the Myhill-Nerode theorem.
   \end{proof}
 *}
 
-section {* Closure Properties of Regular Languages *}
+section {* Closure Properties of Regular Languages\label{closure} *}
 
 text {*
   \noindent
@@ -1960,7 +1969,8 @@
   \end{center}
 
   \noindent
-  Closure of regular languages under reversal, that is
+  Since all finite languages are regular, then by closure under complement also
+  all co-finite languages. Closure of regular languages under reversal, that is
 
   \begin{center}
   @{text "A\<^bsup>-1\<^esup> \<equiv> {s\<^bsup>-1\<^esup> | s \<in> A}"}
@@ -2051,10 +2061,10 @@
   proved using the Myhill-Nerode theorem.  
 
   Our insistence on regular expressions for proving the Myhill-Nerode theorem
-  arose from the limitations of HOL, on which the popular theorem provers HOL4,
-  HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
-  formalisations can extend HOL with definitions that introduce a new concept in
-  only terms of already existing notions. A convenient definition for automata
+  arose from the limitations of HOL, used in the popular theorem provers HOL4,
+  HOLlight and Isabelle/HOL. In order to guarantee consistency,
+  formalisations in HOL can only extend the logic with definitions that introduce a new concept in
+  terms of already existing notions. A convenient definition for automata
   (based on graphs) uses a polymorphic type for the state nodes. This allows
   us to use the standard operation for disjoint union whenever we need to compose two
   automata. Unfortunately, we cannot use such a polymorphic definition
@@ -2062,14 +2072,14 @@
   over set of strings).  Consider the following attempt:
 
   \begin{center}
-  @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_finite_automata (M) \<and> \<calL>(M) = A"}
+  @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
   \end{center}
 
   \noindent
   In this definifion, the definiens is polymorphic in the type of the automata
-  @{text "M"} (indicated by the @{text "\<alpha>"}), but the definiendum @{text
-  "is_regular"} is not. Such definitions are excluded in HOL, because they can
-  lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
+  @{text "M"} (indicated by dependency on @{text "\<alpha>"}), but the definiendum
+  @{text "is_regular"} is not. Such definitions are excluded in HOL, because
+  they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
   example). Also HOL does not contain type-quantifiers which would allow us to
   get rid of the polymorphism by quantifying over the type-variable @{text
   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
@@ -2079,8 +2089,10 @@
   such states. This makes formalisations quite fiddly and rather
   unpleasant. Regular expressions proved much more convenient for reasoning in
   HOL since they can be defined as inductive datatype and a reasoning
-  infrastructure comes for free. We showed they can be used for establishing
-  the central result in regular language theory---the Myhill-Nerode theorem.
+  infrastructure comes for free. The definition of regularity in terms of
+  regular expressions poses no problem at all for HOL.  We showed in this
+  paper that they can be used for establishing the central result in regular
+  language theory---the Myhill-Nerode theorem.
 
   While regular expressions are convenient, they have some limitations. One is
   that there seems to be no method of calculating a minimal regular expression
@@ -2141,20 +2153,21 @@
   algorithm is still executable. Given the existing infrastructure for
   executable sets in Isabelle/HOL, it should.
 
-
   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
   Isabelle/Isar code for the first direction and 460 for the second (the one
-  based on tagging functions), plus around 300 lines of standard material
+  based on tagging-functions), plus around 300 lines of standard material
   about regular languages. The formalisation of derivatives and partial
   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
-  code.  The algorithm for solving equational systems, which we used in the
-  first direction, is conceptually relatively simple. Still the use of sets
-  over which the algorithm operates means it is not as easy to formalise as
-  one might hope. However, it seems sets cannot be avoided since the `input'
-  of the algorithm consists of equivalence classes and we cannot see how to
-  reformulate the theory so that we can use lists. Lists would be much easier
-  to reason about, since we can define functions over them by recursion. For
-  sets we have to use set-comprehensions, which is slightly unwieldy.
+  code.  The closure properties in Section~\ref{closure} can be established in
+  190 lines of code.  The algorithm for solving equational systems, which we
+  used in the first direction, is conceptually relatively simple. Still the
+  use of sets over which the algorithm operates means it is not as easy to
+  formalise as one might hope. However, it seems sets cannot be avoided since
+  the `input' of the algorithm consists of equivalence classes and we cannot
+  see how to reformulate the theory so that we can use lists. Lists would be
+  much easier to reason about, since we can define functions over them by
+  recursion. For sets we have to use set-comprehensions, which is slightly
+  unwieldy.
 
   While our formalisation might appear large, it should be seen
   in the context of the work done by Constable at al \cite{Constable00} who