--- a/Journal/Paper.thy	Fri Apr 20 11:27:49 2012 +0000
+++ b/Journal/Paper.thy	Fri Apr 20 11:45:06 2012 +0000
@@ -211,8 +211,8 @@
   Regular languages are an important and well-understood subject in Computer
   Science, with many beautiful theorems and many useful algorithms. There is a
   wide range of textbooks on this subject, many of which are aimed at students
-  and contain very detailed `pencil-and-paper' proofs (e.g.~\cite{Kozen97,
-  HopcroftUllman69}). It seems natural to exercise theorem provers by
+  and contain very detailed `pencil-and-paper' proofs 
+  (e.g.~\cite{HopcroftUllman69,Kozen97}). It seems natural to exercise theorem provers by
   formalising the theorems and by verifying formally the algorithms.  
 
   A popular choice for a theorem prover would be one based on Higher-Order
@@ -239,15 +239,15 @@
   convince oneself that regular languages are closed under complementation:
   one just has to exchange the accepting and non-accepting states in the
   corresponding automaton to obtain an automaton for the complement language.
-  The problem, however, lies with formalising such reasoning in a HOL-based
-  theorem prover. Automata are built up from states and transitions that need
-  to be represented as graphs, matrices or functions, none of which can be
+  The problem, however, lies with formalising such reasoning in a 
+  theorem prover. Automata are built up from states and transitions that are 
+  usually represented as graphs, matrices or functions, none of which can be
   defined as an inductive datatype.
 
   In case of graphs and matrices, this means we have to build our own
   reasoning infrastructure for them, as neither Isabelle/HOL nor HOL4 nor
   HOLlight support them with libraries. Even worse, reasoning about graphs and
-  matrices can be a real hassle in HOL-based theorem provers, because
+  matrices can be a real hassle in theorem provers, because
   we have to be able to combine automata.  Consider for
   example the operation of sequencing two automata, say $A_1$ and $A_2$, by
   connecting the accepting states of $A_1$ to the initial state of $A_2$:
@@ -381,7 +381,7 @@
   lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
   working over bit strings in the context of Presburger arithmetic.  The only
   larger formalisations of automata theory are carried out in Nuprl
-  \cite{Constable00} and in Coq, e.g.~\cite{Filliatre97,Almeidaetal10}.
+  \cite{Constable00} and in Coq, e.g.~\cite{Almeidaetal10,Filliatre97}.
 
   Also, one might consider automata as just convenient `vehicles' for
   establishing properties about regular languages.  However, paper proofs
@@ -517,45 +517,14 @@
   which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided
   @{term "[] \<notin> A"}. However we will need the following `reversed' 
   version of Arden's Lemma (`reversed' in the sense of changing the order of @{term "A \<cdot> X"} to
-  \mbox{@{term "X \<cdot> A"}}).
+  \mbox{@{term "X \<cdot> A"}}).\footnote{The details of its proof are given in the Appendix.}
 
-  \begin{lmm}[Reversed Arden's Lemma]\label{arden}\mbox{}\\
+  \begin{lmm}[(Reversed Arden's Lemma)]\label{arden}\mbox{}\\
   If @{thm (prem 1) reversed_Arden} then
   @{thm (lhs) reversed_Arden} if and only if
   @{thm (rhs) reversed_Arden}.
   \end{lmm}
 
-  \begin{proof}
-  For the right-to-left direction we assume @{thm (rhs) reversed_Arden} and show
-  that @{thm (lhs) reversed_Arden} holds. From Property~\ref{langprops}@{text "(i)"} 
-  we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"},
-  which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both 
-  sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side
-  is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. Applying the assumed equation 
-  completes this direction. 
-
-  For the other direction we assume @{thm (lhs) reversed_Arden}. By a simple induction
-  on @{text n}, we can establish the property
-
-  \begin{center}
-  @{text "(*)"}\hspace{5mm} @{thm (concl) reversed_arden_helper}
-  \end{center}
-  
-  \noindent
-  Using this property we can show that @{term "B \<cdot> (A \<up> n) \<subseteq> X"} holds for
-  all @{text n}. From this we can infer @{term "B \<cdot> A\<star> \<subseteq> X"} using the definition
-  of @{text "\<star>"}.
-  For the inclusion in the other direction we assume a string @{text s}
-  with length @{text k} is an element in @{text X}. Since @{thm (prem 1) reversed_Arden}
-  we know by Property~\ref{langprops}@{text "(ii)"} that 
-  @{term "s \<notin> X \<cdot> (A \<up> Suc k)"} since its length is only @{text k}
-  (the strings in @{term "X \<cdot> (A \<up> Suc k)"} are all longer). 
-  From @{text "(*)"} it follows then that
-  @{term s} must be an element in @{term "(\<Union>m\<le>k. B \<cdot> (A \<up> m))"}. This in turn
-  implies that @{term s} is in @{term "(\<Union>n. B \<cdot> (A \<up> n))"}. Using Property~\ref{langprops}@{text "(iii)"} 
-  this is equal to @{term "B \<cdot> A\<star>"}, as we needed to show.
-  \end{proof}
-
   \noindent
   Regular expressions are defined as the inductive datatype
 
@@ -588,9 +557,11 @@
   \end{center}
 
   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
-  a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
+  a regular expression that matches the union of all languages of @{text rs}. 
+  This definion is not trivial in a theorem prover, but since 
+  we only need to know the 
   existence
-  of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
+  of such a regular expression, we can use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const PLUS} over the 
   set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
   %
@@ -620,7 +591,7 @@
   strings are related, provided there is no distinguishing extension in this
   language. This can be defined as a ternary relation.
 
-  \begin{dfntn}[Myhill-Nerode Relation]\label{myhillneroderel} 
+  \begin{dfntn}[(Myhill-Nerode Relation)]\label{myhillneroderel} 
   Given a language @{text A}, two strings @{text x} and
   @{text y} are Myhill-Nerode related provided
   \begin{center}
@@ -1087,7 +1058,7 @@
   \noindent
   We also need the fact that @{text Iter} decreases the termination measure.
 
-  \begin{lmm}\label{itertwo}
+  \begin{lmm}\label{itertwo}\mbox{}\\
   @{thm[mode=IfThen] iteration_step_measure[simplified (no_asm), where xrhs="rhs"]}
   \end{lmm}
 
@@ -1158,7 +1129,7 @@
   Lemma~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
   of the Myhill-Nerode Theorem.
 
-  \begin{proof}[Proof of Theorem~\ref{myhillnerodeone}]
+  \begin{proof}[of Theorem~\ref{myhillnerodeone}]
   By Lemma~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
   every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
   a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
@@ -1198,7 +1169,7 @@
   the base cases are straightforward.
 
 
-  \begin{proof}[Base Cases]
+  \begin{proof}[(Base Cases)]
   The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because 
   we can easily establish that
 
@@ -1282,7 +1253,7 @@
   as follows.
 
 
-  \begin{dfntn}[Tagging-Relation] Given a tagging-function @{text tag}, then two strings @{text x}
+  \begin{dfntn}[(Tagging-Relation)] Given a tagging-function @{text tag}, then two strings @{text x}
   and @{text y} are \emph{tag-related} provided
   \begin{center}
   @{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y \<equiv> tag x = tag y"}\;.
@@ -1360,7 +1331,7 @@
   @{term "x"}~@{term "=(tag_Plus A B)="}~@{term y}. As we shall see, this definition will 
   provide us with just the right assumptions in order to get the proof through.
 
- \begin{proof}[@{const "PLUS"}-Case]
+ \begin{proof}[(@{const "PLUS"}-Case)]
   We can show in general, if @{term "finite (UNIV // \<approx>A)"} and @{term "finite
   (UNIV // \<approx>B)"} then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"}
   holds. The range of @{term "tag_Plus A B"} is a subset of this product
@@ -1404,7 +1375,7 @@
   \begin{center}
   \begin{tabular}{c}
   \scalebox{1}{
-  \begin{tikzpicture}[fill=gray!20]
+  \begin{tikzpicture}[scale=0.8,fill=gray!20]
     \node[draw,minimum height=3.8ex, fill] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of x, fill] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$  };
@@ -1431,7 +1402,7 @@
   \end{tikzpicture}}
   \\[2mm]
   \scalebox{1}{
-  \begin{tikzpicture}[fill=gray!20]
+  \begin{tikzpicture}[scale=0.8,fill=gray!20]
     \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
@@ -1484,7 +1455,7 @@
   With this definition in place, let us prove the @{const "Times"}-case.
 
 
-  \begin{proof}[@{const TIMES}-Case]
+  \begin{proof}[(@{const TIMES}-Case)]
   If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
   then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
   @{term "tag_Times A B"} is a subset of this product set, and therefore finite.
@@ -1547,7 +1518,7 @@
 
   \begin{center}
   \scalebox{1}{
-  \begin{tikzpicture}[fill=gray!20]
+  \begin{tikzpicture}[scale=0.8,fill=gray!20]
     \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
     \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
@@ -1609,7 +1580,7 @@
   @{text "{\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> | x\<^isub>p < x \<and> x\<^isub>p \<in> A\<^isup>\<star> \<and> (x\<^isub>p, x\<^isub>s) \<in> Partitions x}"}
   \end{center}
 
-  \begin{proof}[@{const Star}-Case]
+  \begin{proof}[(@{const Star}-Case)]
   If @{term "finite (UNIV // \<approx>A)"} 
   then @{term "finite (Pow (UNIV // \<approx>A))"} holds. The range of
   @{term "tag_Star A"} is a subset of this set, and therefore finite.
@@ -1646,9 +1617,10 @@
   \end{proof}
 *}
 
-section {* Second Part proved using Partial Derivatives\label{derivatives} *}
+section {* Second Part proved using Partial Derivatives *}
 
 text {*
+  \label{derivatives} 
   \noindent
   As we have seen in the previous section, in order to establish
   the second direction of the Myhill-Nerode Theorem, it is sufficient to find 
@@ -1717,7 +1689,7 @@
   this notion in Isabelle/HOL as follows:
 
   \begin{center}
-  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
+  \begin{longtable}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
   @{thm (lhs) deriv.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(1)}\\
   @{thm (lhs) deriv.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(2)}\\
   @{thm (lhs) deriv.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(3)[where c'="d"]}\\
@@ -1730,8 +1702,8 @@
                     @{term "Times (deriv c r\<^isub>1) r\<^isub>2"}\\ 
   @{thm (lhs) deriv.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) deriv.simps(6)}\smallskip\\
   @{thm (lhs) derivs.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) derivs.simps(1)}\\
-  @{thm (lhs) derivs.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) derivs.simps(2)}\\
-  \end{tabular}
+  @{thm (lhs) derivs.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) derivs.simps(2)}
+  \end{longtable}
   \end{center}
 
   \noindent
@@ -1940,7 +1912,7 @@
   @{thm pderivs_lang_def}
   \end{equation}
 
-  \begin{thrm}[Antimirov \cite{Antimirov95}]\label{antimirov}
+  \begin{thrm}[(Antimirov \cite{Antimirov95})]\label{antimirov}
   For every language @{text A} and every regular expression @{text r}, 
   \mbox{@{thm finite_pderivs_lang}}.
   \end{thrm}
@@ -1986,7 +1958,7 @@
   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} as tagging-relation. 
 
 
-  \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
+  \begin{proof}[of Theorem~\ref{myhillnerodetwo} (second version)]
   Using \eqref{mhders}
   and \eqref{Derspders} we can easily infer that
 
@@ -2015,9 +1987,10 @@
   \end{proof}
 *}
 
-section {* Closure Properties of Regular Languages\label{closure} *}
+section {* Closure Properties of Regular Languages *}
 
 text {*
+  \label{closure} 
   \noindent
   The beauty of regular languages is that they are closed under many set
   operations. Closure under union, concatenation and Kleene-star are trivial
@@ -2042,7 +2015,7 @@
   steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text
   "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"}
   regular expression. Clearly not something you want to formalise in a theorem
-  prover in which it is cumbersome to reason about automata.
+  prover if it is cumbersome to reason about automata.
 
   Once closure under complement is established, closure under intersection
   and set difference is also easy, because
@@ -2119,7 +2092,7 @@
 
   Even more surprising is the fact that for \emph{every} language @{text A}, the language
   consisting of all (scattered) substrings of @{text A} is regular \cite{Haines69} (see also 
-  \cite{Shallit08, Gasarch09}). 
+  \cite{Shallit08,Gasarch09}). 
   A \emph{(scattered) substring} can be obtained
   by striking out zero or more characters from a string. This can be defined 
   inductively in Isabelle/HOL by the following three rules:
@@ -2145,7 +2118,7 @@
   \noindent
   We like to establish
 
-  \begin{thrm}[Haines \cite{Haines69}]\label{subseqreg}
+  \begin{thrm}[(Haines \cite{Haines69})]\label{subseqreg}
   For every language @{text A}, the languages @{text "(i)"} @{term "SUBSEQ A"} and 
   @{text "(ii)"} @{term "SUPSEQ A"}
   are regular.
@@ -2250,7 +2223,7 @@
   \noindent
   This lemma allows us to establish the second part of Theorem~\ref{subseqreg}.
 
-  \begin{proof}[Proof of the Second Part of Theorem~\ref{subseqreg}]
+  \begin{proof}[of the Second Part of Theorem~\ref{subseqreg}]
   Given any language @{text A}, by Lemma~\ref{mset} we know there exists
   a finite, and thus regular, language @{text M}. We further have @{term "SUPSEQ M = SUPSEQ A"},
   which establishes the second part.    
@@ -2267,7 +2240,7 @@
   \noindent
   holds. Now the first part of Theorem~\ref{subseqreg} is a simple consequence of the second part.
 
-  \begin{proof}[Proof of the First Part of Theorem~\ref{subseqreg}]
+  \begin{proof}[of the First Part of Theorem~\ref{subseqreg}]
   By the second part, we know the right-hand side of \eqref{compl}
   is regular, which means @{term "- SUBSEQ A"} is regular. But since
   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
@@ -2278,7 +2251,7 @@
   the non-regularity of languages. For this we use the following version of the Continuation
   Lemma (see for example~\cite{Rosenberg06}).
 
-  \begin{lmm}[Continuation Lemma]
+  \begin{lmm}[(Continuation Lemma)]
   If a language @{text A} is regular and a set of strings @{text B} is infinite,
   then there exist two distinct strings @{text x} and @{text y} in @{text B} 
   such that @{term "x \<approx>A y"}.
@@ -2324,12 +2297,12 @@
   \noindent
   In this paper we took the view that a regular language is one where there
   exists a regular expression that matches all of its strings. Regular
-  expressions can conveniently be defined as a datatype in HOL-based theorem
+  expressions can conveniently be defined as a datatype in theorem
   provers. For us it was therefore interesting to find out how far we can push
   this point of view. We have established in Isabelle/HOL both directions 
   of the Myhill-Nerode Theorem.
   %
-  \begin{thrm}[The Myhill-Nerode Theorem]\mbox{}\\
+  \begin{thrm}[(The Myhill-Nerode Theorem)]\mbox{}\\
   A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
   \end{thrm}
   
@@ -2346,8 +2319,8 @@
   proved using the Myhill-Nerode Theorem.  
 
   Our insistence on regular expressions for proving the Myhill-Nerode Theorem
-  arose from the limitations of HOL, which is the logic underlying the popular theorem provers HOL4,
-  HOLlight and Isabelle/HOL. In order to guarantee consistency,
+  arose from the problem of defining formally the regularity of a language.
+  In order to guarantee consistency,
   formalisations in HOL can only extend the logic with definitions that introduce a new concept in
   terms of already existing notions. A convenient definition for automata
   (based on graphs) uses a polymorphic type for the state nodes. This allows
@@ -2367,7 +2340,7 @@
   they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
   example). Also HOL does not contain type-quantifiers which would allow us to
   get rid of the polymorphism by quantifying over the type-variable @{text
-  "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
+  "\<alpha>"}. Therefore when defining regularity in terms of automata, the
   natural way out in HOL is to resort to state nodes with an identity, for
   example a natural number. Unfortunatly, the consequence is that we have to
   be careful when combining two automata so that there is no clash between two
@@ -2473,7 +2446,12 @@
   main result is the
   correctness of Mirkin's construction of an automaton from a regular
   expression using partial derivatives. This took approximately 10600 lines
-  of code.  In terms of time, the estimate for our formalisation is that we
+  of code.  Also Braibant formalised a large part of regular language
+  theory and Kleene algebras in Coq \cite{Braibant12}. While he is mainly interested
+  in implementing decision procedures for Kleene algebras, his library
+  includes a proof of the Myhill-Nerode theorem. He reckons that our 
+  ``development is more concise'' than his one based on matrices \cite[Page 67]{Braibant12}.
+  In terms of time, the estimate for our formalisation is that we
   needed approximately 3 months and this included the time to find our proof
   arguments. Unlike Constable et al, who were able to follow the Myhill-Nerode 
   proof from \cite{HopcroftUllman69}, we had to find our own arguments.  So for us the
@@ -2487,6 +2465,49 @@
   We are grateful for the comments we received from Larry Paulson.  Tobias
   Nipkow made us aware of the properties in Theorem~\ref{subseqreg} and Tjark
   Weber helped us with proving them.
+
+  \bibliographystyle{plain}
+  \bibliography{root}
+
+  \newpage
+  \begin{appendix}
+  \section{Appendix$^\star$}
+
+  \renewcommand{\thefootnote}{\mbox{$\star$}}
+  \footnotetext{If the reviewers deem more suitable, the authors are
+  prepared to drop material or move it to an electronic appendix.}
+  
+  \begin{proof}[of Lemma~\ref{arden}]
+  For the right-to-left direction we assume @{thm (rhs) reversed_Arden} and show
+  that @{thm (lhs) reversed_Arden} holds. From Property~\ref{langprops}@{text "(i)"} 
+  we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"},
+  which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both 
+  sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side
+  is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. Applying the assumed equation 
+  completes this direction. 
+
+  For the other direction we assume @{thm (lhs) reversed_Arden}. By a simple induction
+  on @{text n}, we can establish the property
+
+  \begin{center}
+  @{text "(*)"}\hspace{5mm} @{thm (concl) reversed_arden_helper}
+  \end{center}
+  
+  \noindent
+  Using this property we can show that @{term "B \<cdot> (A \<up> n) \<subseteq> X"} holds for
+  all @{text n}. From this we can infer @{term "B \<cdot> A\<star> \<subseteq> X"} using the definition
+  of @{text "\<star>"}.
+  For the inclusion in the other direction we assume a string @{text s}
+  with length @{text k} is an element in @{text X}. Since @{thm (prem 1) reversed_Arden}
+  we know by Property~\ref{langprops}@{text "(ii)"} that 
+  @{term "s \<notin> X \<cdot> (A \<up> Suc k)"} since its length is only @{text k}
+  (the strings in @{term "X \<cdot> (A \<up> Suc k)"} are all longer). 
+  From @{text "(*)"} it follows then that
+  @{term s} must be an element in @{term "(\<Union>m\<le>k. B \<cdot> (A \<up> m))"}. This in turn
+  implies that @{term s} is in @{term "(\<Union>n. B \<cdot> (A \<up> n))"}. Using Property~\ref{langprops}@{text "(iii)"} 
+  this is equal to @{term "B \<cdot> A\<star>"}, as we needed to show.
+  \end{proof}
+  \end{appendix}
 *}
changeset 348	bea94f1e6771
parent 338	e7504bfdbd50
child 350	8ce9a432680b