# HG changeset patch
# User urbanc
# Date 1311677906 0
# Node ID d371536861bc63fdd1e9fb14a6d4d9445eb63642
# Parent  21ee3a852a02d3844cb751f3e749537ef6542f2c
more on the introduction of the journal paper

diff -r 21ee3a852a02 -r d371536861bc Journal/Paper.thy
--- a/Journal/Paper.thy	Mon Jul 25 18:00:52 2011 +0000
+++ b/Journal/Paper.thy	Tue Jul 26 10:58:26 2011 +0000
@@ -29,16 +29,14 @@
   "_Collect p P"      <= "{p|xs. P}"
   "_CollectIn p A P"  <= "{p : A. P}"
 
-abbreviation "NULL \<equiv> Zero"
-abbreviation "EMPTY \<equiv> One"
-abbreviation "CHAR \<equiv> Atom"
-abbreviation "ALT \<equiv> Plus"
-abbreviation "SEQ \<equiv> Times"
+abbreviation "ZERO \<equiv> Zero"
+abbreviation "ONE \<equiv> One"
+abbreviation "ATOM \<equiv> Atom"
+abbreviation "PLUS \<equiv> Plus"
+abbreviation "TIMES \<equiv> Times"
 abbreviation "STAR \<equiv> Star"
 
 
-ML {* @{term "op ^^"} *}
-
 notation (latex output)
   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
   str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
@@ -132,11 +130,11 @@
   (e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by
   formalising the theorems and by verifying formally the algorithms.  A
   popular choice for a theorem prover would be one based on Higher-Order Logic
-  (HOL), such as HOL4, HOLlight and Isabelle/HOL. For our development we will
-  use the latter theorem prover. One distinguishing feature of HOL is it's
-  type system based on Church's Simple Theory of Types \cite{Church40}.  The
-  limitations of this type system are one of the main motivations
-  behind the work presented in this paper.
+  (HOL), for example HOL4, HOLlight and Isabelle/HOL. For our development 
+  we will use the latter. One distinguishing feature of HOL is it's
+  type system, which is based on Church's Simple Theory of Types \cite{Church40}.  The
+  limitations of this type system are one of the underlying motivations for the 
+  work presented in this paper.
 
   The typical approach to regular languages is to
   introduce finite automata and then define everything in terms of them.  For
@@ -225,12 +223,13 @@
   \end{equation}
 
   \noindent
-  changes the type---the disjoint union is not a set, but a set of pairs. 
-  Using this definition for disjoint union means we do not have a single type for automata
-  and hence will not be able to state certain properties about \emph{all}
-  automata, since there is no type quantification available in HOL (unlike in Coq, for example). 
-  As a result, we would not be able to define a language being regular
-  in terms of the existence of an automata.
+  changes the type---the disjoint union is not a set, but a set of
+  pairs. Using this definition for disjoint union means we do not have a
+  single type for automata. As a result we will not be able to define a regular
+  language as one for which there exists an automaton that recognises all its
+  strings. Similarly, we cannot state properties about \emph{all} automata,
+  since there is no type quantification available in HOL (unlike in Coq, for
+  example).
 
   An alternative, which provides us with a single type for automata, is to give every 
   state node an identity, for example a natural
@@ -283,24 +282,24 @@
   larger formalisations of automata theory are carried out in Nuprl
   \cite{Constable00} and in Coq \cite{Filliatre97}.
 
-  Also one might consider the Myhill-Nerode theorem as well-worn stock 
-  material for a computer scientists, but paper proofs of this theorem contain some 
-  subtle side-conditions which are easily overlooked and which make formal reasoning 
-  painful. For example in Kozen's proof \cite{Kozen97} it is not sufficient to
-  have just any automata recognising a regular language, but one which does
-  not have inaccessible states. This means if we define a regular language
-  for which \emph{any} finite automaton exists, then one has to ensure that
-  another equivalent can be found satisfying the side-conditions. Similarly
-  Brozowski mentiones in \cite{Brozowski10} side-conditions of finite automata
-  in connection of state-complexity: there it is required that automata
-  must be complete in the sense of having a total transition function. 
-  Furthermore, if a `sink' state is present which accepts no word, then there
-  must be only one such state. . . .
-  Such 'little' lemmas make formalisation of these results in a theroem
-  prover hair-pulling experiences.
+  One might also consider the Myhill-Nerode theorem as well-worn stock
+  material where everything is clear. However, paper proofs of this theorem
+  often involve subtle side-conditions which are easily overlooked, but which
+  make formal reasoning rather painful. For example Kozen's proof requires
+  that the automata do not have inaccessible states \cite{Kozen97}. Another
+  subtle side-condition is completeness of automata: 
+  automata need to have total transition functions and at most one `sink'
+  state from which there is no connection to a final state (Brozowski mentions
+  this side-condition in connection with state complexity
+  \cite{Brozowski10}). Such side-conditions mean that if we define a regular
+  language as one for which there exists \emph{any} finite automaton, then we
+  need a lemma which ensures that another equivalent can be found satisfying the
+  side-condition. Unfortunately, such `little' and `obvious' lemmas make 
+  formalisations of results in automata theory hair-pulling experiences.
+
   
   In this paper, we will not attempt to formalise automata theory in
-  Isabelle/HOL nor attempt to formalise any automata proof from the
+  Isabelle/HOL nor will we attempt to formalise automata proofs from the
   literature, but take a different approach to regular languages than is
   usually taken. Instead of defining a regular language as one where there
   exists an automaton that recognises all strings of the language, we define a
@@ -313,9 +312,10 @@
   
   \noindent
   The reason is that regular expressions, unlike graphs, matrices and
-  functions, can be easily defined as inductive datatype. Consequently a
-  corresponding reasoning infrastructure (like induction or recursion) comes
-  for free. This has recently been exploited in HOL4 with a formalisation of
+  functions, can be easily defined as an inductive datatype. No side-conditions
+  will be needed for regular expressions. Moreover, a reasoning infrastructure 
+  (like induction and recursion) comes for free in HOL-based theorem provers. 
+  This has recently been exploited in HOL4 with a formalisation of
   regular expression matching based on derivatives \cite{OwensSlind08} and
   with an equivalence checker for regular expressions in Isabelle/HOL
   \cite{KraussNipkow11}.  The purpose of this paper is to show that a central
@@ -329,13 +329,13 @@
   {\bf Contributions:} There is an extensive literature on regular languages.
   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   that is based on regular expressions, only. The part of this theorem stating
-  that finitely many partitions of a language w.r.t.~to the Myhill-Nerode
-  relation imply that the language is regular is proved by a folklore argument
-  of solving equational sytems.  For the other part we give two proofs: a
+  that finitely many partitions imply regularity of the language is proved by 
+  an argument about solving equational sytems.  This argument seems to be folklore.
+  For the other part, we give two proofs: a
   direct proof using certain tagging-functions, and an indirect proof using
   Antimirov's partial derivatives \cite{Antimirov95} (also earlier russion work). 
   Again to our best knowledge, the tagging-functions have not been used before to
-  establish the Myhill-Nerode theorem.
+  establish the Myhill-Nerode theorem. 
 
 *}
 
@@ -432,11 +432,11 @@
 
   \begin{center}
   @{text r} @{text "::="}
-  @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
-  @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
-  @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
-  @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
-  @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term ZERO}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term ONE}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "ATOM c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "TIMES r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
+  @{term "PLUS r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   @{term "STAR r"}
   \end{center}
 
@@ -466,8 +466,8 @@
   a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
   existence
   of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
-  @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the 
-  set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
+  @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const PLUS} over the 
+  set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
   %
   \begin{equation}\label{uplus}
   \mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
@@ -567,15 +567,15 @@
   
   \begin{center}
   \begin{tabular}{rcl}
-  @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
-  @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
+  @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) + \<lambda>(ONE)"} \\
+  @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, ATOM c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, ATOM c\<^isub>2\<^isub>o)"} \\
   & $\vdots$ \\
-  @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
+  @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, ATOM c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, ATOM c\<^isub>n\<^isub>q)"}\\
   \end{tabular}
   \end{center}
 
   \noindent
-  where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
+  where the terms @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"}
   stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
   X\<^isub>i"}. 
   %The intuition behind the equational system is that every 
@@ -584,10 +584,10 @@
   %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these 
   %predecessor states to @{text X\<^isub>i}. 
   There can only be
-  finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
+  finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side 
   since by assumption there are only finitely many
   equivalence classes and only finitely many characters.
-  The term @{text "\<lambda>(EMPTY)"} in the first equation acts as a marker for the initial state, that
+  The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
   is the equivalence class
   containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   single `initial' state in the equational system, which is different from
@@ -598,7 +598,7 @@
   be reached by adding a character to the end of @{text Y}. This is also the
   reason why we have to use our reverse version of Arden's Lemma.}
   %In our initial equation system there can only be
-  %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
+  %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side 
   %since by assumption there are only finitely many
   %equivalence classes and only finitely many characters. 
   Overloading the function @{text \<calL>} for the two kinds of terms in the
@@ -614,14 +614,14 @@
   and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
   %
   \begin{equation}\label{inv1}
-  @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
+  @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, ATOM c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, ATOM c\<^isub>i\<^isub>q)"}.
   \end{equation}
 
   \noindent
   hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
   %
   \begin{equation}\label{inv2}
-  @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
+  @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(ONE))"}.
   \end{equation}
 
   \noindent
@@ -641,8 +641,8 @@
   \mbox{\begin{tabular}{rcl}     
   @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} & 
   @{text "if"}~@{term "[] \<in> X"}\\
-  & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
-  & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
+  & & @{text "then"}~@{term "{Trn Y (ATOM c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam ONE}"}\\
+  & & @{text "else"}~@{term "{Trn Y (ATOM c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
   @{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
   \end{tabular}}
   \end{equation}
@@ -986,6 +986,7 @@
 section {* Myhill-Nerode, Second Part *}
 
 text {*
+  \noindent
   We will prove in this section the second part of the Myhill-Nerode
   theorem. It can be formulated in our setting as follows:
 
@@ -999,7 +1000,7 @@
 
 
   \begin{proof}[Base Cases]
-  The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because 
+  The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because 
   we can easily establish that
 
   \begin{center}
@@ -1093,9 +1094,9 @@
   Chaining Lem.~\ref{finone} and \ref{fintwo} together, means in order to show
   that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
   range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
-  Let us attempt the @{const ALT}-case first.
+  Let us attempt the @{const PLUS}-case first.
 
-  \begin{proof}[@{const "ALT"}-Case] 
+  \begin{proof}[@{const "PLUS"}-Case] 
   We take as tagging-function 
   %
   \begin{center}
@@ -1106,7 +1107,7 @@
   where @{text "A"} and @{text "B"} are some arbitrary languages.
   We can show in general, if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
   then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of
-  @{term "tag_str_ALT A B"} is a subset of this product set---so finite. It remains to be shown
+  @{term "tag_str_PLUS A B"} is a subset of this product set---so finite. It remains to be shown
   that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to 
   showing
   %
@@ -1134,7 +1135,7 @@
 
   \noindent
   The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
-  they are slightly more complicated. In the @{const SEQ}-case we essentially have
+  they are slightly more complicated. In the @{const TIMES}-case we essentially have
   to be able to infer that 
   %
   \begin{center}
@@ -1239,14 +1240,14 @@
   with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
   is in the language @{text B}.
 
-  \begin{proof}[@{const SEQ}-Case]
+  \begin{proof}[@{const TIMES}-Case]
   If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
   then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
-  @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite.
+  @{term "tag_str_TIMES A B"} is a subset of this product set, and therefore finite.
   We have to show injectivity of this tagging-function as
   %
   \begin{center}
-  @{term "\<forall>z. tag_str_SEQ A B x = tag_str_SEQ A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
+  @{term "\<forall>z. tag_str_TIMES A B x = tag_str_TIMES A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
   \end{center}
   %
   \noindent
@@ -1259,7 +1260,7 @@
   \end{center}
   %
   \noindent
-  and by the assumption about @{term "tag_str_SEQ A B"} also 
+  and by the assumption about @{term "tag_str_TIMES A B"} also 
   %
   \begin{center}
   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
@@ -1274,15 +1275,15 @@
   @{term "y @ z \<in> A \<cdot> B"}, as needed in this case.
 
   Second, there exists a @{text "z'"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.
-  By the assumption about @{term "tag_str_SEQ A B"} we have
+  By the assumption about @{term "tag_str_TIMES A B"} we have
   @{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Which means by the Myhill-Nerode
   relation that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case
-  with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const SEQ}-case
+  with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const TIMES}-case
   by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed 
   \end{proof}
   
   \noindent
-  The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When
+  The case for @{const STAR} is similar to @{const TIMES}, but poses a few extra challenges. When
   we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the 
   empty string, we 
   have the following picture:
@@ -1390,8 +1391,14 @@
 section {* Second Part based on Partial Derivatives *}
 
 text {*
-   We briefly considered using the method Brzozowski presented in the
-  Appendix of~\cite{Brzozowski64} in order to prove the second
+  \noindent
+  As we have seen in the previous section, in order to establish
+  the second direction of the Myhill-Nerode theorem, we need to find 
+  a more refined relation (more refined than ??) for which we can
+  show that there are only finitely many equivalence classes.
+  Brzozowski presented in the Appendix of~\cite{Brzozowski64} 
+
+  in order to prove the second
   direction of the Myhill-Nerode theorem. There he calculates the
   derivatives for regular expressions and shows that for every
   language there can be only finitely many of them %derivations (if
@@ -1404,6 +1411,9 @@
   of derivatives modulo ACI. Therefore we preferred our direct method
   of using tagging-functions. 
 
+  The problem of finiteness modulo ACI can be avoided by using partial
+  derivatives introduced by Antimirov \cite{Antimirov}.
+
 *}
 
 section {* Closure Properties *}
@@ -1501,11 +1511,12 @@
   establishing the Myhill-Nerode theorem. All standard proofs of this
   direction proceed by arguments over automata.\medskip
 
-  We expect that the development of Krauss \& Nipkoe gets easier by
-  using partial derivatives.
+  We expect that the development of Krauss \& Nipkow gets easier by
+  using partial derivatives.\medskip
   
   \noindent
-  {\bf Acknowledgements:} We are grateful for the comments we received from Larry
+  {\bf Acknowledgements:}
+  We are grateful for the comments we received from Larry
   Paulson.
 
 *}