Journal/Paper.thy
changeset 173 d371536861bc
parent 172 21ee3a852a02
child 174 2b414a8a7132
equal deleted inserted replaced
172:21ee3a852a02 173:d371536861bc
    27 translations
    27 translations
    28   "_Collect p P"      <= "{p. P}"
    28   "_Collect p P"      <= "{p. P}"
    29   "_Collect p P"      <= "{p|xs. P}"
    29   "_Collect p P"      <= "{p|xs. P}"
    30   "_CollectIn p A P"  <= "{p : A. P}"
    30   "_CollectIn p A P"  <= "{p : A. P}"
    31 
    31 
    32 abbreviation "NULL \<equiv> Zero"
    32 abbreviation "ZERO \<equiv> Zero"
    33 abbreviation "EMPTY \<equiv> One"
    33 abbreviation "ONE \<equiv> One"
    34 abbreviation "CHAR \<equiv> Atom"
    34 abbreviation "ATOM \<equiv> Atom"
    35 abbreviation "ALT \<equiv> Plus"
    35 abbreviation "PLUS \<equiv> Plus"
    36 abbreviation "SEQ \<equiv> Times"
    36 abbreviation "TIMES \<equiv> Times"
    37 abbreviation "STAR \<equiv> Star"
    37 abbreviation "STAR \<equiv> Star"
    38 
    38 
    39 
       
    40 ML {* @{term "op ^^"} *}
       
    41 
    39 
    42 notation (latex output)
    40 notation (latex output)
    43   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
    41   str_eq_rel ("\<approx>\<^bsub>_\<^esub>") and
    44   str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
    42   str_eq ("_ \<approx>\<^bsub>_\<^esub> _") and
    45   conc (infixr "\<cdot>" 100) and
    43   conc (infixr "\<cdot>" 100) and
   130   wide range of textbooks on this subject, many of which are aimed at students
   128   wide range of textbooks on this subject, many of which are aimed at students
   131   and contain very detailed `pencil-and-paper' proofs
   129   and contain very detailed `pencil-and-paper' proofs
   132   (e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by
   130   (e.g.~\cite{Kozen97, HopcroftUllman69}). It seems natural to exercise theorem provers by
   133   formalising the theorems and by verifying formally the algorithms.  A
   131   formalising the theorems and by verifying formally the algorithms.  A
   134   popular choice for a theorem prover would be one based on Higher-Order Logic
   132   popular choice for a theorem prover would be one based on Higher-Order Logic
   135   (HOL), such as HOL4, HOLlight and Isabelle/HOL. For our development we will
   133   (HOL), for example HOL4, HOLlight and Isabelle/HOL. For our development 
   136   use the latter theorem prover. One distinguishing feature of HOL is it's
   134   we will use the latter. One distinguishing feature of HOL is its
   137   type system based on Church's Simple Theory of Types \cite{Church40}.  The
   135   type system, which is based on Church's Simple Theory of Types \cite{Church40}.  The
   138   limitations of this type system are one of the main motivations
   136   limitations of this type system are one of the underlying motivations for the 
   139   behind the work presented in this paper.
   137   work presented in this paper.
   140 
   138 
   141   The typical approach to regular languages is to
   139   The typical approach to regular languages is to
   142   introduce finite automata and then define everything in terms of them.  For
   140   introduce finite automata and then define everything in terms of them.  For
   143   example, a regular language is normally defined as one whose strings are
   141   example, a regular language is normally defined as one whose strings are
   144   recognised by a finite deterministic automaton. This approach has many
   142   recognised by a finite deterministic automaton. This approach has many
   223   \begin{equation}\label{disjointunion}
   221   \begin{equation}\label{disjointunion}
   224   @{text "A\<^isub>1 \<uplus> A\<^isub>2 \<equiv> {(1, x) | x \<in> A\<^isub>1} \<union> {(2, y) | y \<in> A\<^isub>2}"}
   222   @{text "A\<^isub>1 \<uplus> A\<^isub>2 \<equiv> {(1, x) | x \<in> A\<^isub>1} \<union> {(2, y) | y \<in> A\<^isub>2}"}
   225   \end{equation}
   223   \end{equation}
   226 
   224 
   227   \noindent
   225   \noindent
   228   changes the type---the disjoint union is not a set, but a set of pairs. 
   226   changes the type---the disjoint union is not a set, but a set of
   229   Using this definition for disjoint union means we do not have a single type for automata
   227   pairs. Using this definition for disjoint union means we do not have a
   230   and hence will not be able to state certain properties about \emph{all}
   228   single type for automata. As a result we will not be able to define a regular
   231   automata, since there is no type quantification available in HOL (unlike in Coq, for example). 
   229   language as one for which there exists an automaton that recognises all its
   232   As a result, we would not be able to define a language being regular
   230   strings. Similarly, we cannot state properties about \emph{all} automata,
   233   in terms of the existence of an automata.
   231   since there is no type quantification available in HOL (unlike in Coq, for
       
   232   example).
   234 
   233 
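  To make the type change concrete, such a disjoint union could be written in
  Isabelle/HOL roughly as follows; this is only a sketch and the name
  @{text "dunion"} is hypothetical, not part of the formalisation:

      (* the result type is (nat \<times> 'a) set rather than 'a set, which is
         exactly the change of type described above *)
      definition dunion :: "'a set \<Rightarrow> 'a set \<Rightarrow> (nat \<times> 'a) set" where
        "dunion A B \<equiv> {(1, x) | x. x \<in> A} \<union> {(2, y) | y. y \<in> B}"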
   235   An alternative, which provides us with a single type for automata, is to give every 
   234   An alternative, which provides us with a single type for automata, is to give every 
   236   state node an identity, for example a natural
   235   state node an identity, for example a natural
   237   number, and then be careful to rename these identities apart whenever
   236   number, and then be careful to rename these identities apart whenever
   238   connecting two automata. This results in clunky proofs
   237   connecting two automata. This results in clunky proofs
   281   lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
   280   lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
   282   working over bit strings in the context of Presburger arithmetic.  The only
   281   working over bit strings in the context of Presburger arithmetic.  The only
   283   larger formalisations of automata theory are carried out in Nuprl
   282   larger formalisations of automata theory are carried out in Nuprl
   284   \cite{Constable00} and in Coq \cite{Filliatre97}.
   283   \cite{Constable00} and in Coq \cite{Filliatre97}.
   285 
   284 
   286   Also one might consider the Myhill-Nerode theorem as well-worn stock 
   285   One might also consider the Myhill-Nerode theorem as well-worn stock
   287   material for a computer scientists, but paper proofs of this theorem contain some 
   286   material where everything is clear. However, paper proofs of this theorem
   288   subtle side-conditions which are easily overlooked and which make formal reasoning 
   287   often involve subtle side-conditions which are easily overlooked, but which
   289   painful. For example in Kozen's proof \cite{Kozen97} it is not sufficient to
   288   make formal reasoning rather painful. For example Kozen's proof requires
   290   have just any automata recognising a regular language, but one which does
   289   that the automata do not have inaccessible states \cite{Kozen97}. Another
   291   not have inaccessible states. This means if we define a regular language
   290   subtle side-condition is completeness of automata: 
   292   for which \emph{any} finite automaton exists, then one has to ensure that
   291   automata need to have total transition functions and at most one `sink'
   293   another equivalent can be found satisfying the side-conditions. Similarly
   292   state from which there is no connection to a final state (Brzozowski mentions
   294   Brozowski mentiones in \cite{Brozowski10} side-conditions of finite automata
   293   this side-condition in connection with state complexity
   295   in connection of state-complexity: there it is required that automata
   294   \cite{Brozowski10}). Such side-conditions mean that if we define a regular
   296   must be complete in the sense of having a total transition function. 
   295   language as one for which there exists \emph{any} finite automaton, then we
   297   Furthermore, if a `sink' state is present which accepts no word, then there
   296   need a lemma which ensures that another equivalent can be found satisfying the
   298   must be only one such state. . . .
   297   side-condition. Unfortunately, such `little' and `obvious' lemmas make 
   299   Such 'little' lemmas make formalisation of these results in a theroem
   298   formalisations of results in automata theory hair-pulling experiences.
   300   prover hair-pulling experiences.
   299 
   301   
   300   
   302   In this paper, we will not attempt to formalise automata theory in
   301   In this paper, we will not attempt to formalise automata theory in
   303   Isabelle/HOL nor attempt to formalise any automata proof from the
   302   Isabelle/HOL nor will we attempt to formalise automata proofs from the
   304   literature, but take a different approach to regular languages than is
   303   literature, but take a different approach to regular languages than is
   305   usually taken. Instead of defining a regular language as one where there
   304   usually taken. Instead of defining a regular language as one where there
   306   exists an automaton that recognises all strings of the language, we define a
   305   exists an automaton that recognises all strings of the language, we define a
   307   regular language as:
   306   regular language as:
   308 
   307 
   311   strings of @{text "A"}.
   310   strings of @{text "A"}.
   312   \end{dfntn}
   311   \end{dfntn}
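  Spelled out in Isabelle/HOL, this definition amounts to roughly the following
  sketch, where the name @{text "regular"} is hypothetical and @{text "lang"}
  stands for the function assigning to a regular expression the language it
  matches (defined later in the paper):

      definition regular :: "'a list set \<Rightarrow> bool" where
        "regular A \<equiv> \<exists>r. lang r = A"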
   313   
   312   
   314   \noindent
   313   \noindent
   315   The reason is that regular expressions, unlike graphs, matrices and
   314   The reason is that regular expressions, unlike graphs, matrices and
   316   functions, can be easily defined as inductive datatype. Consequently a
   315   functions, can be easily defined as an inductive datatype. No side-conditions
   317   corresponding reasoning infrastructure (like induction or recursion) comes
   316   will be needed for regular expressions. Moreover, a reasoning infrastructure 
   318   for free. This has recently been exploited in HOL4 with a formalisation of
   317   (like induction and recursion) comes for free in HOL-based theorem provers. 
       
   318   This has recently been exploited in HOL4 with a formalisation of
   319   regular expression matching based on derivatives \cite{OwensSlind08} and
   319   regular expression matching based on derivatives \cite{OwensSlind08} and
   320   with an equivalence checker for regular expressions in Isabelle/HOL
   320   with an equivalence checker for regular expressions in Isabelle/HOL
   321   \cite{KraussNipkow11}.  The purpose of this paper is to show that a central
   321   \cite{KraussNipkow11}.  The purpose of this paper is to show that a central
   322   result about regular languages---the Myhill-Nerode theorem---can be
   322   result about regular languages---the Myhill-Nerode theorem---can be
   323   recreated by only using regular expressions. This theorem gives necessary
   323   recreated by only using regular expressions. This theorem gives necessary
   327   
   327   
   328   \noindent
   328   \noindent
   329   {\bf Contributions:} There is an extensive literature on regular languages.
   329   {\bf Contributions:} There is an extensive literature on regular languages.
   330   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   330   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   331   that is based only on regular expressions. The part of this theorem stating
   331   that is based only on regular expressions. The part of this theorem stating
   332   that finitely many partitions of a language w.r.t.~to the Myhill-Nerode
   332   that finitely many partitions imply regularity of the language is proved by 
   333   relation imply that the language is regular is proved by a folklore argument
   333   an argument about solving equational systems.  This argument seems to be folklore.
   334   of solving equational sytems.  For the other part we give two proofs: a
   334   For the other part, we give two proofs: a
   335   direct proof using certain tagging-functions, and an indirect proof using
   335   direct proof using certain tagging-functions, and an indirect proof using
   336   Antimirov's partial derivatives \cite{Antimirov95} (also earlier Russian work). 
   336   Antimirov's partial derivatives \cite{Antimirov95} (also earlier Russian work). 
   337   Again to our best knowledge, the tagging-functions have not been used before to
   337   Again to our best knowledge, the tagging-functions have not been used before to
   338   establish the Myhill-Nerode theorem.
   338   establish the Myhill-Nerode theorem. 
   339 
   339 
   340 *}
   340 *}
   341 
   341 
   342 section {* Preliminaries *}
   342 section {* Preliminaries *}
   343 
   343 
   430   \noindent
   430   \noindent
   431   Regular expressions are defined as the inductive datatype
   431   Regular expressions are defined as the inductive datatype
   432 
   432 
   433   \begin{center}
   433   \begin{center}
   434   @{text r} @{text "::="}
   434   @{text r} @{text "::="}
   435   @{term NULL}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   435   @{term ZERO}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   436   @{term EMPTY}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   436   @{term ONE}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   437   @{term "CHAR c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   437   @{term "ATOM c"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   438   @{term "SEQ r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   438   @{term "TIMES r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   439   @{term "ALT r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   439   @{term "PLUS r r"}\hspace{1.5mm}@{text"|"}\hspace{1.5mm} 
   440   @{term "STAR r"}
   440   @{term "STAR r"}
   441   \end{center}
   441   \end{center}
   442 
   442 
   443   \noindent
   443   \noindent
   444   and the language matched by a regular expression is defined as
   444   and the language matched by a regular expression is defined as
   464 
   464 
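  For reference, such a datatype together with a language function could be set
  up along the following lines; this is only a sketch with hypothetical names
  (@{text "rexp"}, @{text "lang"}), and it assumes an operation @{text "conc"}
  for language concatenation and @{text "star"} for Kleene iteration:

      datatype 'a rexp =
          Zero
        | One
        | Atom 'a
        | Plus "'a rexp" "'a rexp"
        | Times "'a rexp" "'a rexp"
        | Star "'a rexp"

      (* the language matched by a regular expression *)
      primrec lang :: "'a rexp \<Rightarrow> 'a list set" where
        "lang Zero = {}"
      | "lang One = {[]}"
      | "lang (Atom a) = {[a]}"
      | "lang (Plus r s) = lang r \<union> lang s"
      | "lang (Times r s) = conc (lang r) (lang s)"
      | "lang (Star r) = star (lang r)"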
   465   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
   465   Given a finite set of regular expressions @{text rs}, we will make use of the operation of generating 
   466   a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
   466   a regular expression that matches the union of all languages of @{text rs}. We only need to know the 
   467   existence
   467   existence
   468   of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
   468   of such a regular expression and therefore we use Isabelle/HOL's @{const "fold_graph"} and Hilbert's
   469   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const ALT} over the 
   469   @{text "\<epsilon>"} to define @{term "\<Uplus>rs"}. This operation, roughly speaking, folds @{const PLUS} over the 
   470   set @{text rs} with @{const NULL} for the empty set. We can prove that for a finite set @{text rs}
   470   set @{text rs} with @{const ZERO} for the empty set. We can prove that for a finite set @{text rs}
   471   %
   471   %
   472   \begin{equation}\label{uplus}
   472   \begin{equation}\label{uplus}
   473   \mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
   473   \mbox{@{thm (lhs) folds_alt_simp} @{text "= \<Union> (\<calL> ` rs)"}}
   474   \end{equation}
   474   \end{equation}
   475 
   475 
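  To convey the idea behind @{term "\<Uplus>rs"}, here is a list-based analogue with
  the hypothetical name @{text "bigplus"}; the formalisation itself works on
  finite sets via @{const "fold_graph"} and Hilbert's @{text "\<epsilon>"}, since only
  the existence of such a regular expression matters:

      (* fold PLUS over the list, starting from ZERO; for every list rs one
         then expects lang (bigplus rs) = \<Union> (lang ` set rs), which mirrors
         \eqref{uplus} above *)
      definition bigplus :: "'a rexp list \<Rightarrow> 'a rexp" where
        "bigplus rs \<equiv> foldr Plus rs Zero"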
   565   contains the empty string @{text "[]"} (since equivalence classes are disjoint).
   565   contains the empty string @{text "[]"} (since equivalence classes are disjoint).
   566   Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
   566   Let us assume @{text "[] \<in> X\<^isub>1"}. We build the following equational system
   567   
   567   
   568   \begin{center}
   568   \begin{center}
   569   \begin{tabular}{rcl}
   569   \begin{tabular}{rcl}
   570   @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) + \<lambda>(EMPTY)"} \\
   570   @{text "X\<^isub>1"} & @{text "="} & @{text "(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) + \<dots> + (Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) + \<lambda>(ONE)"} \\
   571   @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, CHAR c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, CHAR c\<^isub>2\<^isub>o)"} \\
   571   @{text "X\<^isub>2"} & @{text "="} & @{text "(Y\<^isub>2\<^isub>1, ATOM c\<^isub>2\<^isub>1) + \<dots> + (Y\<^isub>2\<^isub>o, ATOM c\<^isub>2\<^isub>o)"} \\
   572   & $\vdots$ \\
   572   & $\vdots$ \\
   573   @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, CHAR c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, CHAR c\<^isub>n\<^isub>q)"}\\
   573   @{text "X\<^isub>n"} & @{text "="} & @{text "(Y\<^isub>n\<^isub>1, ATOM c\<^isub>n\<^isub>1) + \<dots> + (Y\<^isub>n\<^isub>q, ATOM c\<^isub>n\<^isub>q)"}\\
   574   \end{tabular}
   574   \end{tabular}
   575   \end{center}
   575   \end{center}
   576 
   576 
   577   \noindent
   577   \noindent
   578   where the terms @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"}
   578   where the terms @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"}
   579   stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
   579   stand for all transitions @{term "Y\<^isub>i\<^isub>j \<Turnstile>c\<^isub>i\<^isub>j\<Rightarrow>
   580   X\<^isub>i"}. 
   580   X\<^isub>i"}. 
   581   %The intuition behind the equational system is that every 
   581   %The intuition behind the equational system is that every 
   582   %equation @{text "X\<^isub>i = rhs\<^isub>i"} in this system
   582   %equation @{text "X\<^isub>i = rhs\<^isub>i"} in this system
   583   %corresponds roughly to a state of an automaton whose name is @{text X\<^isub>i} and its predecessor states 
   583   %corresponds roughly to a state of an automaton whose name is @{text X\<^isub>i} and its predecessor states 
   584   %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these 
   584   %are the @{text "Y\<^isub>i\<^isub>j"}; the @{text "c\<^isub>i\<^isub>j"} are the labels of the transitions from these 
   585   %predecessor states to @{text X\<^isub>i}. 
   585   %predecessor states to @{text X\<^isub>i}. 
   586   There can only be
   586   There can only be
   587   finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
   587   finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side 
   588   since by assumption there are only finitely many
   588   since by assumption there are only finitely many
   589   equivalence classes and only finitely many characters.
   589   equivalence classes and only finitely many characters.
   590   The term @{text "\<lambda>(EMPTY)"} in the first equation acts as a marker for the initial state, that
   590   The term @{text "\<lambda>(ONE)"} in the first equation acts as a marker for the initial state, that
   591   is the equivalence class
   591   is the equivalence class
   592   containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   592   containing @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   593   single `initial' state in the equational system, which is different from
   593   single `initial' state in the equational system, which is different from
   594   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   594   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   595   `terminal' states. We are forced to set up the equational system in our
   595   `terminal' states. We are forced to set up the equational system in our
   596   way, because the Myhill-Nerode relation determines the `direction' of the
   596   way, because the Myhill-Nerode relation determines the `direction' of the
   597   transitions---the successor `state' of an equivalence class @{text Y} can
   597   transitions---the successor `state' of an equivalence class @{text Y} can
   598   be reached by adding a character to the end of @{text Y}. This is also the
   598   be reached by adding a character to the end of @{text Y}. This is also the
   599   reason why we have to use our reverse version of Arden's Lemma.}
   599   reason why we have to use our reverse version of Arden's Lemma.}
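  As a small, purely illustrative instance of this construction (not taken from
  the formalisation), consider the language @{text "A"} of all strings over a
  two-letter alphabet @{text "{a, b}"} that end in @{text a}. The Myhill-Nerode
  relation has two equivalence classes: @{text "X\<^isub>1"}, the strings not ending
  in @{text a} (among them @{text "[]"}), and @{text "X\<^isub>2"}, the strings ending
  in @{text a}. The construction above yields the equational system

  \begin{center}
  \begin{tabular}{rcl}
  @{text "X\<^isub>1"} & @{text "="} & @{text "(X\<^isub>1, ATOM b) + (X\<^isub>2, ATOM b) + \<lambda>(ONE)"}\\
  @{text "X\<^isub>2"} & @{text "="} & @{text "(X\<^isub>1, ATOM a) + (X\<^isub>2, ATOM a)"}
  \end{tabular}
  \end{center}

  \noindent
  in which, as described above, only the equation for the class containing
  @{text "[]"} carries the @{text "\<lambda>(ONE)"}-marker.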
   600   %In our initial equation system there can only be
   600   %In our initial equation system there can only be
   601   %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, CHAR c\<^isub>i\<^isub>j)"} in a right-hand side 
   601   %finitely many terms of the form @{text "(Y\<^isub>i\<^isub>j, ATOM c\<^isub>i\<^isub>j)"} in a right-hand side 
   602   %since by assumption there are only finitely many
   602   %since by assumption there are only finitely many
   603   %equivalence classes and only finitely many characters. 
   603   %equivalence classes and only finitely many characters. 
   604   Overloading the function @{text \<calL>} for the two kinds of terms in the
   604   Overloading the function @{text \<calL>} for the two kinds of terms in the
   605   equational system, we have
   605   equational system, we have
   606   
   606   
   612 
   612 
   613   \noindent
   613   \noindent
   614   and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
   614   and we can prove for @{text "X\<^isub>2\<^isub>.\<^isub>.\<^isub>n"} that the following equations
   615   %
   615   %
   616   \begin{equation}\label{inv1}
   616   \begin{equation}\label{inv1}
   617   @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, CHAR c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, CHAR c\<^isub>i\<^isub>q)"}.
   617   @{text "X\<^isub>i = \<calL>(Y\<^isub>i\<^isub>1, ATOM c\<^isub>i\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>i\<^isub>q, ATOM c\<^isub>i\<^isub>q)"}.
   618   \end{equation}
   618   \end{equation}
   619 
   619 
   620   \noindent
   620   \noindent
   621   hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
   621   hold. Similarly for @{text "X\<^isub>1"} we can show the following equation
   622   %
   622   %
   623   \begin{equation}\label{inv2}
   623   \begin{equation}\label{inv2}
   624   @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, CHAR c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, CHAR c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(EMPTY))"}.
   624   @{text "X\<^isub>1 = \<calL>(Y\<^isub>1\<^isub>1, ATOM c\<^isub>1\<^isub>1) \<union> \<dots> \<union> \<calL>(Y\<^isub>1\<^isub>p, ATOM c\<^isub>1\<^isub>p) \<union> \<calL>(\<lambda>(ONE))"}.
   625   \end{equation}
   625   \end{equation}
   626 
   626 
   627   \noindent
   627   \noindent
   628   holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is 
   628   holds. The reason for adding the @{text \<lambda>}-marker to our initial equational system is 
   629   to obtain this equation: it only holds with the marker, since none of 
   629   to obtain this equation: it only holds with the marker, since none of 
   639   %
   639   %
   640   \begin{equation}\label{initcs}
   640   \begin{equation}\label{initcs}
   641   \mbox{\begin{tabular}{rcl}     
   641   \mbox{\begin{tabular}{rcl}     
   642   @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} & 
   642   @{thm (lhs) Init_rhs_def} & @{text "\<equiv>"} & 
   643   @{text "if"}~@{term "[] \<in> X"}\\
   643   @{text "if"}~@{term "[] \<in> X"}\\
   644   & & @{text "then"}~@{term "{Trn Y (CHAR c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam EMPTY}"}\\
   644   & & @{text "then"}~@{term "{Trn Y (ATOM c) | Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X} \<union> {Lam ONE}"}\\
   645   & & @{text "else"}~@{term "{Trn Y (CHAR c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
   645   & & @{text "else"}~@{term "{Trn Y (ATOM c)| Y c. Y \<in> CS \<and> Y \<Turnstile>c\<Rightarrow> X}"}\\
   646   @{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
   646   @{thm (lhs) Init_def}     & @{text "\<equiv>"} & @{thm (rhs) Init_def}
   647   \end{tabular}}
   647   \end{tabular}}
   648   \end{equation}
   648   \end{equation}
   649 
   649 
   650   
   650   
   984 
   984 
   985 
   985 
   986 section {* Myhill-Nerode, Second Part *}
   986 section {* Myhill-Nerode, Second Part *}
   987 
   987 
   988 text {*
   988 text {*
       
   989   \noindent
   989   We will prove in this section the second part of the Myhill-Nerode
   990   We will prove in this section the second part of the Myhill-Nerode
   990   theorem. It can be formulated in our setting as follows:
   991   theorem. It can be formulated in our setting as follows:
   991 
   992 
   992   \begin{thrm}
   993   \begin{thrm}
   993   If @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
   994   If @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
   997   The proof will be by induction on the structure of @{text r}. It turns out
   998   The proof will be by induction on the structure of @{text r}. It turns out
   998   the base cases are straightforward.
   999   the base cases are straightforward.
   999 
  1000 
  1000 
  1001 
  1001   \begin{proof}[Base Cases]
  1002   \begin{proof}[Base Cases]
  1002   The cases for @{const NULL}, @{const EMPTY} and @{const CHAR} are routine, because 
  1003   The cases for @{const ZERO}, @{const ONE} and @{const ATOM} are routine, because 
  1003   we can easily establish that
  1004   we can easily establish that
  1004 
  1005 
  1005   \begin{center}
  1006   \begin{center}
  1006   \begin{tabular}{l}
  1007   \begin{tabular}{l}
  1007   @{thm quot_zero_eq}\\
  1008   @{thm quot_zero_eq}\\
  1091 
  1092 
  1092   \noindent
  1093   \noindent
  1093   Chaining Lem.~\ref{finone} and \ref{fintwo} together means that, in order to show
  1094   Chaining Lem.~\ref{finone} and \ref{fintwo} together means that, in order to show
  1094   that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
  1095   that @{term "UNIV // \<approx>(L r)"} is finite, we have to find a tagging-function whose
  1095   range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
  1096   range can be shown to be finite and whose tagging-relation refines @{term "\<approx>(L r)"}.
  1096   Let us attempt the @{const ALT}-case first.
  1097   Let us attempt the @{const PLUS}-case first.
  1097 
  1098 
  1098   \begin{proof}[@{const "ALT"}-Case] 
  1099   \begin{proof}[@{const "PLUS"}-Case] 
  1099   We take as tagging-function 
  1100   We take as tagging-function 
  1100   %
  1101   %
  1101   \begin{center}
  1102   \begin{center}
  1102   @{thm tag_str_Plus_def[where A="A" and B="B", THEN meta_eq_app]}
  1103   @{thm tag_str_Plus_def[where A="A" and B="B", THEN meta_eq_app]}
  1103   \end{center}
  1104   \end{center}
  1104 
  1105 
  1105   \noindent
  1106   \noindent
  1106   where @{text "A"} and @{text "B"} are some arbitrary languages.
  1107   where @{text "A"} and @{text "B"} are some arbitrary languages.
  1107   We can show in general that if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  1108   We can show in general that if @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  1108   then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of
  1109   then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"} holds. The range of
  1109   @{term "tag_str_ALT A B"} is a subset of this product set---so finite. It remains to be shown
  1110   @{term "tag_str_PLUS A B"} is a subset of this product set---so finite. It remains to be shown
  1110   that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to 
  1111   that @{text "=tag\<^isub>A\<^isub>L\<^isub>T A B="} refines @{term "\<approx>(A \<union> B)"}. This amounts to 
  1111   showing
  1112   showing
  1112   %
  1113   %
  1113   \begin{center}
  1114   \begin{center}
  1114   @{term "tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<longrightarrow> x \<approx>(A \<union> B) y"}
  1115   @{term "tag\<^isub>A\<^isub>L\<^isub>T A B x = tag\<^isub>A\<^isub>L\<^isub>T A B y \<longrightarrow> x \<approx>(A \<union> B) y"}
  1132   \end{proof}
  1133   \end{proof}
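  To give a flavour of what such a tagging-function looks like as a definition,
  here is a sketch in which @{text "lang_equiv"} plays the role of the
  Myhill-Nerode relation @{text "\<approx>A"} and @{text "tag_Plus"} the role of the
  tagging-function above; both names are hypothetical and the formalisation's
  definitions may differ in detail:

      (* lang_equiv A models the Myhill-Nerode relation on strings *)
      definition lang_equiv :: "'a list set \<Rightarrow> ('a list \<times> 'a list) set" where
        "lang_equiv A \<equiv> {(x, y). \<forall>z. (x @ z \<in> A) = (y @ z \<in> A)}"

      (* pairs up the equivalence classes of x w.r.t. A and B, so the range
         is contained in (UNIV // \<approx>A) \<times> (UNIV // \<approx>B) *)
      definition tag_Plus ::
        "'a list set \<Rightarrow> 'a list set \<Rightarrow> 'a list \<Rightarrow> 'a list set \<times> 'a list set" where
        "tag_Plus A B x \<equiv> (lang_equiv A `` {x}, lang_equiv B `` {x})"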
  1133 
  1134 
  1134 
  1135 
  1135   \noindent
  1136   \noindent
  1136   The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
  1137   The pattern in \eqref{pattern} is repeated for the other two cases. Unfortunately,
  1137   they are slightly more complicated. In the @{const SEQ}-case we essentially have
  1138   they are slightly more complicated. In the @{const TIMES}-case we essentially have
  1138   to be able to infer that 
  1139   to be able to infer that 
  1139   %
  1140   %
  1140   \begin{center}
  1141   \begin{center}
  1141   @{text "\<dots>"}@{term "x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
  1142   @{text "\<dots>"}@{term "x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
  1142   \end{center}
  1143   \end{center}
  1237 
  1238 
  1238   \noindent
  1239   \noindent
  1239   with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
  1240   with the idea that in the first split we have to make sure that @{text "(x - x') @ z"}
  1240   is in the language @{text B}.
  1241   is in the language @{text B}.
  1241 
  1242 
  1242   \begin{proof}[@{const SEQ}-Case]
  1243   \begin{proof}[@{const TIMES}-Case]
  1243   If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  1244   If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  1244   then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
  1245   then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
  1245   @{term "tag_str_SEQ A B"} is a subset of this product set, and therefore finite.
  1246   @{term "tag_str_TIMES A B"} is a subset of this product set, and therefore finite.
  1246   We have to show injectivity of this tagging-function as
  1247   We have to show injectivity of this tagging-function as
  1247   %
  1248   %
  1248   \begin{center}
  1249   \begin{center}
  1249   @{term "\<forall>z. tag_str_SEQ A B x = tag_str_SEQ A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
  1250   @{term "\<forall>z. tag_str_TIMES A B x = tag_str_TIMES A B y \<and> x @ z \<in> A \<cdot> B \<longrightarrow> y @ z \<in> A \<cdot> B"}
  1250   \end{center}
  1251   \end{center}
  1251   %
  1252   %
  1252   \noindent
  1253   \noindent
  1253   There are two cases to be considered (see pictures above). First, there exists 
  1254   There are two cases to be considered (see pictures above). First, there exists 
  1254   an @{text "x'"} such that
  1255   an @{text "x'"} such that
  1257   \begin{center}
  1258   \begin{center}
  1258   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {x - x'} |x'. x' \<le> x \<and> x' \<in> A})"}
  1259   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {x - x'} |x'. x' \<le> x \<and> x' \<in> A})"}
  1259   \end{center}
  1260   \end{center}
  1260   %
  1261   %
  1261   \noindent
  1262   \noindent
  1262   and by the assumption about @{term "tag_str_SEQ A B"} also 
  1263   and by the assumption about @{term "tag_str_TIMES A B"} also 
  1263   %
  1264   %
  1264   \begin{center}
  1265   \begin{center}
  1265   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
  1266   @{term "(\<approx>B `` {x - x'}) \<in> ({\<approx>B `` {y - y'} |y'. y' \<le> y \<and> y' \<in> A})"}
  1266   \end{center}
  1267   \end{center}
  1267   %
  1268   %
  1272   relation and together with the fact that @{text "(x - x') @ z \<in> B"}, we
  1273   relation and together with the fact that @{text "(x - x') @ z \<in> B"}, we
  1273   have @{text "(y - y') @ z \<in> B"}. We already know @{text "y' \<in> A"}, therefore
  1274   have @{text "(y - y') @ z \<in> B"}. We already know @{text "y' \<in> A"}, therefore
  1274   @{term "y @ z \<in> A \<cdot> B"}, as needed in this case.
  1275   @{term "y @ z \<in> A \<cdot> B"}, as needed in this case.
  1275 
  1276 
  1276   Second, there exists a @{text "z'"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.
  1277   Second, there exists a @{text "z'"} such that @{term "x @ z' \<in> A"} and @{text "z - z' \<in> B"}.
  1277   By the assumption about @{term "tag_str_SEQ A B"} we have
  1278   By the assumption about @{term "tag_str_TIMES A B"} we have
  1278   @{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}, which means by the Myhill-Nerode
  1279   @{term "\<approx>A `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}, which means by the Myhill-Nerode
  1279   relation that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case
  1280   relation that @{term "y @ z' \<in> A"} holds. Using @{text "z - z' \<in> B"}, we can conclude also in this case
  1280   with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const SEQ}-case
  1281   with @{term "y @ z \<in> A \<cdot> B"}. We again can complete the @{const TIMES}-case
  1281   by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed 
  1282   by setting @{text A} to @{term "L r\<^isub>1"} and @{text B} to @{term "L r\<^isub>2"}.\qed 
  1282   \end{proof}
  1283   \end{proof}
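  In the same illustrative spirit, a tagging-function for the @{const TIMES}-case
  can be sketched as follows, using an explicit split @{text "x = x\<^isub>1 @ x\<^isub>2"} in
  place of the notation @{text "x' \<le> x"} and @{text "x - x'"} used above; the
  name @{text "tag_Times"} is again hypothetical:

      (* the second component collects the \<approx>B-classes of all suffixes of x
         whose corresponding prefix lies in A, so the range is contained in
         (UNIV // \<approx>A) \<times> Pow (UNIV // \<approx>B) *)
      definition tag_Times ::
        "'a list set \<Rightarrow> 'a list set \<Rightarrow> 'a list \<Rightarrow> 'a list set \<times> 'a list set set" where
        "tag_Times A B x \<equiv>
           (lang_equiv A `` {x}, {lang_equiv B `` {x2} | x1 x2. x = x1 @ x2 \<and> x1 \<in> A})"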
  1283   
  1284   
  1284   \noindent
  1285   \noindent
  1285   The case for @{const STAR} is similar to @{const SEQ}, but poses a few extra challenges. When
  1286   The case for @{const STAR} is similar to @{const TIMES}, but poses a few extra challenges. When
  1286   we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the 
  1287   we analyse the case that @{text "x @ z"} is an element in @{term "A\<star>"} and @{text x} is not the 
  1287   empty string, we 
  1288   empty string, we 
  1288   have the following picture:
  1289   have the following picture:
  1289   %
  1290   %
  1290   \begin{center}
  1291   \begin{center}
  1388 *}
  1389 *}
  1389 
  1390 
  1390 section {* Second Part based on Partial Derivatives *}
  1391 section {* Second Part based on Partial Derivatives *}
  1391 
  1392 
  1392 text {*
  1393 text {*
  1393    We briefly considered using the method Brzozowski presented in the
  1394   \noindent
  1394   Appendix of~\cite{Brzozowski64} in order to prove the second
  1395   As we have seen in the previous section, in order to establish
       
  1396   the second direction of the Myhill-Nerode theorem, we need to find 
       
  1397   a more refined relation (more refined than the Myhill-Nerode relation) for which we can
       
  1398   show that there are only finitely many equivalence classes.
       
  1399   Brzozowski presented a method in the Appendix
       
  1400   of~\cite{Brzozowski64} that can be used to prove
       
  1401   the second
  1395   direction of the Myhill-Nerode theorem. There he calculates the
  1402   direction of the Myhill-Nerode theorem. There he calculates the
  1396   derivatives for regular expressions and shows that for every
  1403   derivatives for regular expressions and shows that for every
  1397   language there can be only finitely many of them %derivations (if
  1404   language there can be only finitely many of them %derivations (if
  1398   regarded equal modulo ACI). We could have used as tagging-function
  1405   regarded equal modulo ACI). We could have used as tagging-function
  1399   the set of derivatives of a regular expression with respect to a
  1406   the set of derivatives of a regular expression with respect to a
  1401   whenever their derivative is the same, together with the fact that
  1408   whenever their derivative is the same, together with the fact that
  1402   there are only finitely many such derivatives would give us a similar
  1409   there are only finitely many such derivatives would give us a similar
  1403   argument to ours. However, it seems not so easy to calculate the set
  1410   argument to ours. However, it seems not so easy to calculate the set
  1404   of derivatives modulo ACI. Therefore we preferred our direct method
  1411   of derivatives modulo ACI. Therefore we preferred our direct method
  1405   of using tagging-functions. 
  1412   of using tagging-functions. 
       
  1413 
       
  1414   The problem of finiteness modulo ACI can be avoided by using partial
       
  1415   derivatives introduced by Antimirov \cite{Antimirov95}.
  1406 
  1416 
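  For illustration, Antimirov-style partial derivatives can be defined over the
  regular-expression datatype sketched in the preliminaries roughly as follows;
  the names @{text "pderiv"}, @{text "nullable"} and @{text "Timess"} are
  hypothetical and details may differ from existing formalisations:

      (* does a regular expression match the empty string? *)
      primrec nullable :: "'a rexp \<Rightarrow> bool" where
        "nullable Zero = False"
      | "nullable One = True"
      | "nullable (Atom a) = False"
      | "nullable (Plus r s) = (nullable r \<or> nullable s)"
      | "nullable (Times r s) = (nullable r \<and> nullable s)"
      | "nullable (Star r) = True"

      (* attach a fixed regular expression to the right of every element *)
      definition Timess :: "'a rexp set \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set" where
        "Timess rs s \<equiv> {Times r s | r. r \<in> rs}"

      (* partial derivative of a regular expression w.r.t. a character:
         a finite set of regular expressions rather than a single one *)
      primrec pderiv :: "'a \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set" where
        "pderiv c Zero = {}"
      | "pderiv c One = {}"
      | "pderiv c (Atom a) = (if c = a then {One} else {})"
      | "pderiv c (Plus r s) = pderiv c r \<union> pderiv c s"
      | "pderiv c (Times r s) =
           (if nullable r then Timess (pderiv c r) s \<union> pderiv c s
            else Timess (pderiv c r) s)"
      | "pderiv c (Star r) = Timess (pderiv c r) (Star r)"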
  1407 *}
  1417 *}
  1408 
  1418 
  1409 section {* Closure Properties *}
  1419 section {* Closure Properties *}
  1410 
  1420 
  1499   direction where we had to spend most of the `conceptual' time, as
  1509   direction where we had to spend most of the `conceptual' time, as
  1500   our proof-argument based on tagging-functions is new for
  1510   our proof-argument based on tagging-functions is new for
  1501   establishing the Myhill-Nerode theorem. All standard proofs of this
  1511   establishing the Myhill-Nerode theorem. All standard proofs of this
  1502   direction proceed by arguments over automata.\medskip
  1512   direction proceed by arguments over automata.\medskip
  1503 
  1513 
  1504   We expect that the development of Krauss \& Nipkoe gets easier by
  1514   We expect that the development of Krauss \& Nipkow gets easier by
  1505   using partial derivatives.
  1515   using partial derivatives.\medskip
  1506   
  1516   
  1507   \noindent
  1517   \noindent
  1508   {\bf Acknowledgements:} We are grateful for the comments we received from Larry
  1518   {\bf Acknowledgements:}
       
  1519   We are grateful for the comments we received from Larry
  1509   Paulson.
  1520   Paulson.
  1510 
  1521 
  1511 *}
  1522 *}
  1512 
  1523 
  1513 
  1524