Journal/Paper.thy
changeset 248 47446f111550
parent 247 087e6c255e33
child 249 061b32d78471
equal deleted inserted replaced
247:087e6c255e33 248:47446f111550
   318 
   318 
   319   \noindent
   319   \noindent
   320   changes the type---the disjoint union is not a set, but a set of
   320   changes the type---the disjoint union is not a set, but a set of
   321   pairs. Using this definition for disjoint union means we do not have a
   321   pairs. Using this definition for disjoint union means we do not have a
   322   single type for the states of automata. As a result we will not be able to
   322   single type for the states of automata. As a result we will not be able to
   323   define in our fomalisation a regular language as one for which there exists
   323   define a regular language as one for which there exists
   324   an automaton that recognises all its strings (Definition~\ref{baddef}). This
   324   an automaton that recognises all its strings (Definition~\ref{baddef}). This
   325   is because we cannot make a definition in HOL that is only polymorphic in
   325   is because we cannot make a definition in HOL that is only polymorphic in
   326   the state type, but not in the predicate for regularity; and there is no
   326   the state type, but not in the predicate for regularity; and there is no
   327   type quantification available in HOL (unlike in Coq, for
   327   type quantification available in HOL (unlike in Coq, for
   328   example).\footnote{Slind already pointed out this problem in an email to the
   328   example).\footnote{Slind already pointed out this problem in an email to the
   384 
   384 
   385   Also, one might consider automata as just convenient `vehicles' for
   385   Also, one might consider automata as just convenient `vehicles' for
   386   establishing properties about regular languages.  However, paper proofs
   386   establishing properties about regular languages.  However, paper proofs
   387   about automata often involve subtle side-conditions which are easily
   387   about automata often involve subtle side-conditions which are easily
   388   overlooked, but which make formal reasoning rather painful. For example
   388   overlooked, but which make formal reasoning rather painful. For example
   389   Kozen's proof of the Myhill-Nerode theorem requires that automata do not
   389   Kozen's proof of the Myhill-Nerode Theorem requires that automata do not
   390   have inaccessible states \cite{Kozen97}. Another subtle side-condition is
   390   have inaccessible states \cite{Kozen97}. Another subtle side-condition is
   391   completeness of automata, that is automata need to have total transition
   391   completeness of automata, that is automata need to have total transition
   392   functions and at most one `sink' state from which there is no connection to
   392   functions and at most one `sink' state from which there is no connection to
   393   a final state (Brzozowski mentions this side-condition in the context of
   393   a final state (Brzozowski mentions this side-condition in the context of
   394   state complexity of automata \cite{Brzozowski10}). Such side-conditions mean
   394   state complexity of automata \cite{Brzozowski10}). Such side-conditions mean
   421   like we need for automata. This convenience of regular expressions has
   421   like we need for automata. This convenience of regular expressions has
   422   recently been exploited in HOL4 with a formalisation of regular expression
   422   recently been exploited in HOL4 with a formalisation of regular expression
   423   matching based on derivatives \cite{OwensSlind08} and with an equivalence
   423   matching based on derivatives \cite{OwensSlind08} and with an equivalence
   424   checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  The
   424   checker for regular expressions in Isabelle/HOL \cite{KraussNipkow11}.  The
   425   main purpose of this paper is to show that a central result about regular
   425   main purpose of this paper is to show that a central result about regular
   426   languages---the Myhill-Nerode theorem---can be recreated by only using
   426   languages---the Myhill-Nerode Theorem---can be recreated by only using
   427   regular expressions. This theorem gives necessary and sufficient conditions
   427   regular expressions. This theorem gives necessary and sufficient conditions
   428   for when a language is regular. As a corollary of this theorem we can easily
   428   for when a language is regular. As a corollary of this theorem we can easily
   429   establish the usual closure properties, including complementation, for
   429   establish the usual closure properties, including complementation, for
   430   regular languages. We use the continuation lemma \cite{Rosenberg06}, 
   430   regular languages. We use the continuation lemma \cite{Rosenberg06}, 
   431   which is also a corollary of the Myhill-Nerode theorem, for establishing 
   431   which is also a corollary of the Myhill-Nerode Theorem, for establishing 
   432   the non-regularity of the language @{text "a\<^isup>nb\<^isup>n"}.\medskip
   432   the non-regularity of the language @{text "a\<^isup>nb\<^isup>n"}.\medskip
   433   
   433   
   434   \noindent 
   434   \noindent 
   435   {\bf Contributions:} There is an extensive literature on regular languages.
   435   {\bf Contributions:} There is an extensive literature on regular languages.
   436   To our best knowledge, our proof of the Myhill-Nerode theorem is the first
   436   To our best knowledge, our proof of the Myhill-Nerode Theorem is the first
   437   that is based on regular expressions, only. The part of this theorem stating
   437   that is based on regular expressions, only. The part of this theorem stating
   438   that finitely many partitions imply regularity of the language is proved by
   438   that finitely many partitions imply regularity of the language is proved by
   439   an argument about solving equational systems.  This argument appears to be
   439   an argument about solving equational systems.  This argument appears to be
   440   folklore. For the other part, we give two proofs: one direct proof using
   440   folklore. For the other part, we give two proofs: one direct proof using
   441   certain tagging-functions, and another indirect proof using Antimirov's
   441   certain tagging-functions, and another indirect proof using Antimirov's
   442   partial derivatives \cite{Antimirov95}. Again to our best knowledge, the
   442   partial derivatives \cite{Antimirov95}. Again to our best knowledge, the
   443   tagging-functions have not been used before for establishing the Myhill-Nerode
   443   tagging-functions have not been used before for establishing the Myhill-Nerode
   444   theorem. Derivatives of regular expressions have been used recently quite
   444   Theorem. Derivatives of regular expressions have been used recently quite
   445   widely in the literature; partial derivatives, in contrast, attract much
   445   widely in the literature; partial derivatives, in contrast, attract much
   446   less attention. However, partial derivatives are more suitable in the
   446   less attention. However, partial derivatives are more suitable in the
   447   context of the Myhill-Nerode theorem, since it is easier to establish
   447   context of the Myhill-Nerode Theorem, since it is easier to establish
   448   formally their finiteness result. We are not aware of any proof that uses
   448   formally their finiteness result. We are not aware of any proof that uses
   449   either of them for proving the Myhill-Nerode theorem.
   449   either of them for proving the Myhill-Nerode Theorem.
   450 *}
   450 *}
   451 
   451 
   452 section {* Preliminaries *}
   452 section {* Preliminaries *}
   453 
   453 
   454 text {*
   454 text {*
   612 
   612 
   613 section {* The Myhill-Nerode Theorem, First Part *}
   613 section {* The Myhill-Nerode Theorem, First Part *}
   614 
   614 
   615 text {*
   615 text {*
   616   \noindent
   616   \noindent
   617   The key definition in the Myhill-Nerode theorem is the
   617   The key definition in the Myhill-Nerode Theorem is the
   618   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   618   \emph{Myhill-Nerode Relation}, which states that w.r.t.~a language two 
   619   strings are related, provided there is no distinguishing extension in this
   619   strings are related, provided there is no distinguishing extension in this
   620   language. This can be defined as a tertiary relation.
   620   language. This can be defined as a tertiary relation.
   621 
   621 
   622   \begin{dfntn}[Myhill-Nerode Relation]\label{myhillneroderel} 
   622   \begin{dfntn}[Myhill-Nerode Relation]\label{myhillneroderel} 
   623   Given a language @{text A}, two strings @{text x} and
   623   Given a language @{text A}, two strings @{text x} and
   642   @{text "X\<^isub>2 = {[c]}"}\\
   642   @{text "X\<^isub>2 = {[c]}"}\\
   643   @{text "X\<^isub>3 = UNIV - {[], [c]}"}
   643   @{text "X\<^isub>3 = UNIV - {[], [c]}"}
   644   \end{tabular}
   644   \end{tabular}
   645   \end{center}
   645   \end{center}
   646 
   646 
   647   One direction of the Myhill-Nerode theorem establishes 
   647   One direction of the Myhill-Nerode Theorem establishes 
   648   that if there are finitely many equivalence classes, like in the example above, then 
   648   that if there are finitely many equivalence classes, like in the example above, then 
   649   the language is regular. In our setting we therefore have to show:
   649   the language is regular. In our setting we therefore have to show:
   650   
   650   
   651   \begin{thrm}\label{myhillnerodeone}
   651   \begin{thrm}\label{myhillnerodeone}
   652   @{thm[mode=IfThen] Myhill_Nerode1}
   652   @{thm[mode=IfThen] Myhill_Nerode1}
   730   is the equivalence class
   730   is the equivalence class
   731   containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   731   containing the empty string @{text "[]"}.\footnote{Note that we mark, roughly speaking, the
   732   single `initial' state in the equational system, which is different from
   732   single `initial' state in the equational system, which is different from
   733   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   733   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   734   `terminal' states. We are forced to set up the equational system in our
   734   `terminal' states. We are forced to set up the equational system in our
   735   way, because the Myhill-Nerode relation determines the `direction' of the
   735   way, because the Myhill-Nerode Relation determines the `direction' of the
   736   transitions---the successor `state' of an equivalence class @{text Y} can
   736   transitions---the successor `state' of an equivalence class @{text Y} can
   737   be reached by adding a character to the end of @{text Y}. This is also the
   737   be reached by adding a character to the end of @{text Y}. This is also the
   738   reason why we have to use our reversed version of Arden's Lemma.}
   738   reason why we have to use our reversed version of Arden's Lemma.}
   739   In our running example we have the initial equational system
   739   In our running example we have the initial equational system
   740 
   740 
  1150   With this we can conclude the proof.
  1150   With this we can conclude the proof.
  1151   \end{proof}
  1151   \end{proof}
  1152 
  1152 
  1153   \noindent
  1153   \noindent
  1154   Lemma~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
  1154   Lemma~\ref{every_eqcl_has_reg} allows us to finally give a proof for the first direction
  1155   of the Myhill-Nerode theorem.
  1155   of the Myhill-Nerode Theorem.
  1156 
  1156 
  1157   \begin{proof}[Proof of Theorem~\ref{myhillnerodeone}]
  1157   \begin{proof}[Proof of Theorem~\ref{myhillnerodeone}]
  1158   By Lemma~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
  1158   By Lemma~\ref{every_eqcl_has_reg} we know that there exists a regular expression for
  1159   every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
  1159   every equivalence class in @{term "UNIV // \<approx>A"}. Since @{text "finals A"} is
  1160   a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
  1160   a subset of  @{term "UNIV // \<approx>A"}, we also know that for every equivalence class
  1181 section {* Myhill-Nerode, Second Part *}
  1181 section {* Myhill-Nerode, Second Part *}
  1182 
  1182 
  1183 text {*
  1183 text {*
  1184   \noindent
  1184   \noindent
  1185   In this section we will give a proof for establishing the second 
  1185   In this section we will give a proof for establishing the second 
  1186   part of the Myhill-Nerode theorem. It can be formulated in our setting as follows:
  1186   part of the Myhill-Nerode Theorem. It can be formulated in our setting as follows:
  1187 
  1187 
  1188   \begin{thrm}\label{myhillnerodetwo}
  1188   \begin{thrm}\label{myhillnerodetwo}
  1189   Given @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
  1189   Given @{text "r"} is a regular expression, then @{thm Myhill_Nerode2}.
  1190   \end{thrm}  
  1190   \end{thrm}  
  1191 
  1191 
  1257   A relation @{text "R\<^isub>1"} \emph{refines} @{text "R\<^isub>2"}
  1257   A relation @{text "R\<^isub>1"} \emph{refines} @{text "R\<^isub>2"}
  1258   provided @{text "R\<^isub>1 \<subseteq> R\<^isub>2"}. 
  1258   provided @{text "R\<^isub>1 \<subseteq> R\<^isub>2"}. 
  1259   \end{dfntn}
  1259   \end{dfntn}
  1260 
  1260 
  1261   \noindent
  1261   \noindent
  1262   For constructing @{text R}, will rely on some \emph{tagging-functions}
  1262   For constructing @{text R}, we will rely on some \emph{tagging-functions}
  1263   defined over strings. Given the inductive hypothesis, it will be easy to
  1263   defined over strings. Given the inductive hypothesis, it will be easy to
  1264   prove that the \emph{range} of these tagging-functions is finite. The range
  1264   prove that the \emph{range} of these tagging-functions is finite. The range
  1265   of a function @{text f} is defined as
  1265   of a function @{text f} is defined as
  1266 
  1266 
  1267   \begin{center}
  1267   \begin{center}
  1496   B"}. As shown in the pictures above, there are two cases to be
  1496   B"}. As shown in the pictures above, there are two cases to be
  1497   considered. First, there exists a @{text "z\<^isub>p"} and @{text
  1497   considered. First, there exists a @{text "z\<^isub>p"} and @{text
  1498   "z\<^isub>s"} such that @{term "x @ z\<^isub>p \<in> A"} and @{text "z\<^isub>s
  1498   "z\<^isub>s"} such that @{term "x @ z\<^isub>p \<in> A"} and @{text "z\<^isub>s
  1499   \<in> B"}.  By the assumption about @{term "tag_Times A B"} we have @{term "\<approx>A
  1499   \<in> B"}.  By the assumption about @{term "tag_Times A B"} we have @{term "\<approx>A
  1500   `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Hence by the Myhill-Nerode
  1500   `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Hence by the Myhill-Nerode
  1501   relation @{term "y @ z\<^isub>p \<in> A"} holds. Using @{text "z\<^isub>s \<in> B"},
  1501   Relation @{term "y @ z\<^isub>p \<in> A"} holds. Using @{text "z\<^isub>s \<in> B"},
  1502   we can conclude in this case with @{term "y @ z \<in> A \<cdot> B"} (recall @{text
  1502   we can conclude in this case with @{term "y @ z \<in> A \<cdot> B"} (recall @{text
  1503   "z\<^isub>p @ z\<^isub>s = z"}).
  1503   "z\<^isub>p @ z\<^isub>s = z"}).
  1504 
  1504 
  1505   Second there exists a partition @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with 
  1505   Second there exists a partition @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with 
  1506   @{text "x\<^isub>p \<in> A"} and @{text "x\<^isub>s @ z \<in> B"}. We therefore have
  1506   @{text "x\<^isub>p \<in> A"} and @{text "x\<^isub>s @ z \<in> B"}. We therefore have
  1517   \end{center}
  1517   \end{center}
  1518   
  1518   
  1519   \noindent
  1519   \noindent
  1520   This means there must be a partition @{text "y\<^isub>p"} and @{text "y\<^isub>s"}
  1520   This means there must be a partition @{text "y\<^isub>p"} and @{text "y\<^isub>s"}
  1521   such that @{term "y\<^isub>p \<in> A"} and @{term "\<approx>B `` {x\<^isub>s} = \<approx>B ``
  1521   such that @{term "y\<^isub>p \<in> A"} and @{term "\<approx>B `` {x\<^isub>s} = \<approx>B ``
  1522   {y\<^isub>s}"}. Unfolding the Myhill-Nerode relation and together with the
  1522   {y\<^isub>s}"}. Unfolding the Myhill-Nerode Relation and together with the
  1523   facts that @{text "x\<^isub>p \<in> A"} and \mbox{@{text "x\<^isub>s @ z \<in> B"}}, we
  1523   facts that @{text "x\<^isub>p \<in> A"} and \mbox{@{text "x\<^isub>s @ z \<in> B"}}, we
  1524   obtain @{term "y\<^isub>p \<in> A"} and @{text "y\<^isub>s @ z \<in> B"}, as needed in
  1524   obtain @{term "y\<^isub>p \<in> A"} and @{text "y\<^isub>s @ z \<in> B"}, as needed in
  1525   this case.  We again can complete the @{const TIMES}-case by setting @{text
  1525   this case.  We again can complete the @{const TIMES}-case by setting @{text
  1526   A} to @{term "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1526   A} to @{term "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1527   \end{proof}
  1527   \end{proof}
  1632   \end{center}
  1632   \end{center}
  1633   
  1633   
  1634   \noindent 
  1634   \noindent 
  1635   From this we know there exist a partition @{text "y\<^isub>p"} and @{text
  1635   From this we know there exist a partition @{text "y\<^isub>p"} and @{text
  1636   "y\<^isub>s"} with @{term "y\<^isub>p \<in> A\<star>"} and also @{term "x\<^isub>s \<approx>A
  1636   "y\<^isub>s"} with @{term "y\<^isub>p \<in> A\<star>"} and also @{term "x\<^isub>s \<approx>A
  1637   y\<^isub>s"}. Unfolding the Myhill-Nerode relation we know @{term
  1637   y\<^isub>s"}. Unfolding the Myhill-Nerode Relation we know @{term
  1638   "y\<^isub>s @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
  1638   "y\<^isub>s @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
  1639   Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \<in>
  1639   Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \<in>
  1640   A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. The last step is to set
  1640   A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. The last step is to set
  1641   @{text "A"} to @{term "lang r"} and thus complete the proof.
  1641   @{text "A"} to @{term "lang r"} and thus complete the proof.
  1642   \end{proof}
  1642   \end{proof}
  1645 section {* Second Part proved using Partial Derivatives\label{derivatives} *}
  1645 section {* Second Part proved using Partial Derivatives\label{derivatives} *}
  1646 
  1646 
  1647 text {*
  1647 text {*
  1648   \noindent
  1648   \noindent
  1649   As we have seen in the previous section, in order to establish
  1649   As we have seen in the previous section, in order to establish
  1650   the second direction of the Myhill-Nerode theorem, it is sufficient to find 
  1650   the second direction of the Myhill-Nerode Theorem, it is sufficient to find 
  1651   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1651   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1652   show that there are only finitely many equivalence classes. So far we 
  1652   show that there are only finitely many equivalence classes. So far we 
  1653   showed this directly by induction on @{text "r"} using tagging-functions. 
  1653   showed this directly by induction on @{text "r"} using tagging-functions. 
  1654   However, there is also  an indirect method to come up with such a refined 
  1654   However, there is also  an indirect method to come up with such a refined 
  1655   relation by using derivatives of regular expressions~\cite{Brzozowski64}. 
  1655   relation by using derivatives of regular expressions~\cite{Brzozowski64}. 
  1672   @{abbrev "Derivss s As"}
  1672   @{abbrev "Derivss s As"}
  1673   \end{center}
  1673   \end{center}
  1674   
  1674   
  1675   \noindent
  1675   \noindent
  1676   where we apply the left-quotient to a set of languages and then combine the results.
  1676   where we apply the left-quotient to a set of languages and then combine the results.
  1677   Clearly we have the following equivalence between the Myhill-Nerode relation
  1677   Clearly we have the following equivalence between the Myhill-Nerode Relation
  1678   (Definition~\ref{myhillneroderel}) and left-quotients
  1678   (Definition~\ref{myhillneroderel}) and left-quotients
  1679 
  1679 
  1680   \begin{equation}\label{mhders}
  1680   \begin{equation}\label{mhders}
  1681   @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Derivs x A = Derivs y A"}
  1681   @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Derivs x A = Derivs y A"}
  1682   \end{equation}
  1682   \end{equation}
  1762   @{thm Derivs_derivs}
  1762   @{thm Derivs_derivs}
  1763   \end{tabular}}
  1763   \end{tabular}}
  1764   \end{equation}
  1764   \end{equation}
  1765 
  1765 
  1766   \noindent
  1766   \noindent
  1767   The importance of this fact in the context of the Myhill-Nerode theorem is that 
  1767   The importance of this fact in the context of the Myhill-Nerode Theorem is that 
  1768   we can use \eqref{mhders} and \eqref{Dersders} in order to 
  1768   we can use \eqref{mhders} and \eqref{Dersders} in order to 
  1769   establish that 
  1769   establish that 
  1770 
  1770 
  1771   \begin{center}
  1771   \begin{center}
  1772   @{term "x \<approx>(lang r) y"} \hspace{4mm}if and only if\hspace{4mm}
  1772   @{term "x \<approx>(lang r) y"} \hspace{4mm}if and only if\hspace{4mm}
  1781   \end{equation}
  1781   \end{equation}
  1782 
  1782 
  1783 
  1783 
  1784   \noindent
  1784   \noindent
  1785   This means the right-hand side (seen as a relation) refines the Myhill-Nerode
  1785   This means the right-hand side (seen as a relation) refines the Myhill-Nerode
  1786   relation.  Consequently, we can use @{text
  1786   Relation.  Consequently, we can use @{text
  1787   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1787   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1788   tagging-relation. However, in order to be useful for the second part of the
  1788   tagging-relation. However, in order to be useful for the second part of the
  1789   Myhill-Nerode theorem, we have to be able to establish that for the
  1789   Myhill-Nerode Theorem, we have to be able to establish that for the
  1790   corresponding language there are only finitely many derivatives---thus
  1790   corresponding language there are only finitely many derivatives---thus
  1791   ensuring that there are only finitely many equivalence
  1791   ensuring that there are only finitely many equivalence
  1792   classes. Unfortunately, this is not true in general. Sakarovitch gives an
  1792   classes. Unfortunately, this is not true in general. Sakarovitch gives an
  1793   example where a regular expression has infinitely many derivatives
  1793   example where a regular expression has infinitely many derivatives
  1794   w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally 
  1794   w.r.t.~the language @{text "(ab)\<^isup>\<star> \<union> (ab)\<^isup>\<star>a"}, which is formally 
  1976   closely in our formalisation (only the last two cases of
  1976   closely in our formalisation (only the last two cases of
  1977   \eqref{pdersunivone} involve some non-routine induction arguments), we omit
  1977   \eqref{pdersunivone} involve some non-routine induction arguments), we omit
  1978   the details.
  1978   the details.
  1979 
  1979 
  1980   Let us now return to our proof for the second direction in the Myhill-Nerode
  1980   Let us now return to our proof for the second direction in the Myhill-Nerode
  1981   theorem. The point of the above calculations is to use 
  1981   Theorem. The point of the above calculations is to use 
  1982   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} as tagging-relation. 
  1982   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} as tagging-relation. 
  1983 
  1983 
  1984 
  1984 
  1985   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1985   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1986   Using \eqref{mhders}
  1986   Using \eqref{mhders}
  2005   \noindent
  2005   \noindent
  2006   Now the range of @{term "\<lambda>x. pderivs x r"} is a subset of @{term "Pow (pderivs_lang UNIV r)"},
  2006   Now the range of @{term "\<lambda>x. pderivs x r"} is a subset of @{term "Pow (pderivs_lang UNIV r)"},
  2007   which we know is finite by Theorem~\ref{antimirov}. Consequently there 
  2007   which we know is finite by Theorem~\ref{antimirov}. Consequently there 
  2008   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"}.
  2008   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"}.
  2009   This relation refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
  2009   This relation refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
  2010   second part of the Myhill-Nerode theorem.
  2010   second part of the Myhill-Nerode Theorem.
  2011   \end{proof}
  2011   \end{proof}
  2012 *}
  2012 *}
  2013 
  2013 
  2014 section {* Closure Properties of Regular Languages\label{closure} *}
  2014 section {* Closure Properties of Regular Languages\label{closure} *}
  2015 
  2015 
  2019   operations. Closure under union, concatenation and Kleene-star are trivial
  2019   operations. Closure under union, concatenation and Kleene-star are trivial
  2020   to establish given our definition of regularity (recall Definition~\ref{regular}).
  2020   to establish given our definition of regularity (recall Definition~\ref{regular}).
  2021   More interesting in our setting is the closure under complement, because it seems difficult
  2021   More interesting in our setting is the closure under complement, because it seems difficult
  2022   to construct a regular expression for the complement language by direct
  2022   to construct a regular expression for the complement language by direct
  2023   means. However the existence of such a regular expression can now be easily
  2023   means. However the existence of such a regular expression can now be easily
  2024   proved using both parts of the Myhill-Nerode theorem, since
  2024   proved using both parts of the Myhill-Nerode Theorem, since
  2025 
  2025 
  2026   \begin{center}
  2026   \begin{center}
  2027   @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
  2027   @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"}
  2028   \end{center}
  2028   \end{center}
  2029   
  2029   
  2148   \end{thrm}
  2148   \end{thrm}
  2149 
  2149 
  2150   \noindent
  2150   \noindent
  2151   Our proof follows the one given in \cite[Pages 92--95]{Shallit08}, except that we use
  2151   Our proof follows the one given in \cite[Pages 92--95]{Shallit08}, except that we use
  2152   Higman's Lemma, which is already proved in the Isabelle/HOL library 
  2152   Higman's Lemma, which is already proved in the Isabelle/HOL library 
  2153   \cite{Berghofer03}.\footnote{Unfortunately, Berghofer's formalisation of Higman's lemma 
  2153   \cite{Berghofer03}.\footnote{Unfortunately, Berghofer's formalisation of Higman's Lemma 
  2154   is restricted to 2-letter alphabets,
  2154   is restricted to 2-letter alphabets,
  2155   which means also our formalisation of Theorem~\ref{subseqreg} is `tainted' with 
  2155   which means also our formalisation of Theorem~\ref{subseqreg} is `tainted' with 
  2156   this constraint. However our methodology is applicable to any alphabet of finite size.} 
  2156   this constraint. However our methodology is applicable to any alphabet of finite size.} 
  2157   Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying
  2157   Higman's Lemma allows us to infer that every set @{text A} of antichains, satisfying
  2158 
  2158 
  2268   is regular, which means @{term "- SUBSEQ A"} is regular. But since
  2268   is regular, which means @{term "- SUBSEQ A"} is regular. But since
  2269   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2269   we established already that regularity is preserved under complement, also @{term "SUBSEQ A"}
  2270   must be regular. 
  2270   must be regular. 
  2271   \end{proof}
  2271   \end{proof}
  2272 
  2272 
  2273   Finally we like to show that the Myhill-Nerode theorem is also convenient for establishing 
  2273   Finally we like to show that the Myhill-Nerode Theorem is also convenient for establishing 
  2274   the non-regularity of languages. For this we use the following version of the Continuation
  2274   the non-regularity of languages. For this we use the following version of the Continuation
  2275   Lemma (see for example~\cite{Rosenberg06}).
  2275   Lemma (see for example~\cite{Rosenberg06}).
  2276 
  2276 
  2277   \begin{lmm}[Continuation Lemma]
  2277   \begin{lmm}[Continuation Lemma]
  2278   If a language @{text A} is regular and a set @{text B} is infinite,
  2278   If a language @{text A} is regular and a set @{text B} is infinite,
  2279   then there exist two distinct strings @{text x} and @{text y} in @{text B} 
  2279   then there exist two distinct strings @{text x} and @{text y} in @{text B} 
  2280   such that @{term "x \<approx>A y"}.
  2280   such that @{term "x \<approx>A y"}.
  2281   \end{lmm}
  2281   \end{lmm}
  2282 
  2282 
  2283   \noindent
  2283   \noindent
  2284   This lemma can be easily deduced from the Myhill-Nerode theorem and the Pigeonhole
  2284   This lemma can be easily deduced from the Myhill-Nerode Theorem and the Pigeonhole
  2285   Principle: Since @{text A} is regular, there can be only finitely many 
  2285   Principle: Since @{text A} is regular, there can be only finitely many 
  2286   equivalence classes. Hence an infinite set must contain 
  2286   equivalence classes. Hence an infinite set must contain 
  2287   at least two strings that are in the same equivalence class, that is
  2287   at least two strings that are in the same equivalence class, that is
  2288   they need to be related by the Myhill-Nerode relation.
  2288   they need to be related by the Myhill-Nerode Relation.
  2289 
  2289 
  2290   Using this lemma, it is straightforward to establish that the language 
  2290   Using this lemma, it is straightforward to establish that the language 
  2291   \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}} is not regular (@{text "a\<^sup>n"} stands
  2291   \mbox{@{text "A \<equiv> \<Union>\<^isub>n a\<^sup>n @ b\<^sup>n"}} is not regular (@{text "a\<^sup>n"} stands
  2292   for the strings consisting of @{text n} times the character a; similarly for
  2292   for the strings consisting of @{text n} times the character a; similarly for
  2293   @{text "b\<^isup>n"}). For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
  2293   @{text "b\<^isup>n"}). For this consider the infinite set @{text "B \<equiv> \<Union>\<^isub>n a\<^sup>n"}.
  2303   the string on the right-hand side satisfies this property, but not the one on
  2303   the string on the right-hand side satisfies this property, but not the one on
  2304   the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
  2304   the left-hand side. Therefore the strings cannot be equal and we have a contradiction.
  2305   \end{proof}
  2305   \end{proof}
  2306 
  2306 
  2307   \noindent
  2307   \noindent
  2308   To conclude, this lemma and the Continuation Lemma leads to a contradiction assuming @{text A}
  2308   To conclude the proof on non-regularity of language @{text A}, the
  2309   is regular. Therefore the language @{text A} is not regular, as we wanted to show.
  2309   Continuation Lemma and the lemma above lead to a contradiction assuming
       
  2310   @{text A} is regular. Therefore the language @{text A} is not regular, as we
       
  2311   wanted to show.
  2310 *}
  2312 *}
  2311 
  2313 
  2312 
  2314 
  2313 
  2315 
  2314 section {* Conclusion and Related Work *}
  2316 section {* Conclusion and Related Work *}
  2318   In this paper we took the view that a regular language is one where there
  2320   In this paper we took the view that a regular language is one where there
  2319   exists a regular expression that matches all of its strings. Regular
  2321   exists a regular expression that matches all of its strings. Regular
  2320   expressions can conveniently be defined as a datatype in HOL-based theorem
  2322   expressions can conveniently be defined as a datatype in HOL-based theorem
  2321   provers. For us it was therefore interesting to find out how far we can push
  2323   provers. For us it was therefore interesting to find out how far we can push
  2322   this point of view. We have established in Isabelle/HOL both directions 
  2324   this point of view. We have established in Isabelle/HOL both directions 
  2323   of the Myhill-Nerode theorem.
  2325   of the Myhill-Nerode Theorem.
  2324   %
  2326   %
  2325   \begin{thrm}[The Myhill-Nerode Theorem]\mbox{}\\
  2327   \begin{thrm}[The Myhill-Nerode Theorem]\mbox{}\\
  2326   A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
  2328   A language @{text A} is regular if and only if @{thm (rhs) Myhill_Nerode}.
  2327   \end{thrm}
  2329   \end{thrm}
  2328   
  2330   
  2329   \noindent
  2331   \noindent
  2330   Having formalised this theorem means we pushed our point of view quite
  2332   Having formalised this theorem means we pushed our point of view quite
  2331   far. Using this theorem we can obviously prove when a language is \emph{not}
  2333   far. Using this theorem we can obviously prove when a language is \emph{not}
  2332   regular---by establishing that it has infinitely many equivalence classes
  2334   regular---by establishing that it has infinitely many equivalence classes
  2333   generated by the Myhill-Nerode relation (this is usually the purpose of the
  2335   generated by the Myhill-Nerode Relation (this is usually the purpose of the
  2334   Pumping Lemma \cite{Kozen97}).  We can also use it to establish the standard
  2336   Pumping Lemma \cite{Kozen97}).  We can also use it to establish the standard
  2335   textbook results about closure properties of regular languages. Interesting
  2337   textbook results about closure properties of regular languages. Interesting
  2336   is the case of closure under complement, because it seems difficult to
  2338   is the case of closure under complement, because it seems difficult to
  2337   construct a regular expression for the complement language by direct
  2339   construct a regular expression for the complement language by direct
  2338   means. However the existence of such a regular expression can be easily
  2340   means. However the existence of such a regular expression can be easily
  2339   proved using the Myhill-Nerode theorem.  
  2341   proved using the Myhill-Nerode Theorem.  
  2340 
  2342 
  2341   Our insistence on regular expressions for proving the Myhill-Nerode theorem
  2343   Our insistence on regular expressions for proving the Myhill-Nerode Theorem
  2342   arose from the limitations of HOL, used in the popular theorem provers HOL4,
  2344   arose from the limitations of HOL, which is the logic underlying the popular theorem provers HOL4,
  2343   HOLlight and Isabelle/HOL. In order to guarantee consistency,
  2345   HOLlight and Isabelle/HOL. In order to guarantee consistency,
  2344   formalisations in HOL can only extend the logic with definitions that introduce a new concept in
  2346   formalisations in HOL can only extend the logic with definitions that introduce a new concept in
  2345   terms of already existing notions. A convenient definition for automata
  2347   terms of already existing notions. A convenient definition for automata
  2346   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2348   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2347   us to use the standard operation for disjoint union whenever we need to compose two
  2349   us to use the standard operation for disjoint union whenever we need to compose two
  2348   automata. Unfortunately, we cannot use such a polymorphic definition
  2350   automata. Unfortunately, we cannot use such a polymorphic definition
  2349   in HOL as part of the definition for regularity of a language (a predicate
  2351   in HOL as part of the definition for regularity of a language (a predicate
  2350   over set of strings).  Consider for example the following attempt:
  2352   over sets of strings).  Consider for example the following attempt:
  2351 
  2353 
  2352   \begin{center}
  2354   \begin{center}
  2353   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
  2355   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
  2354   \end{center}
  2356   \end{center}
  2355 
  2357 
  2356   \noindent
  2358   \noindent
  2357   In this definifion, the definiens is polymorphic in the type of the automata
  2359   In this definifion, the definiens is polymorphic in the type of the automata
  2358   @{text "M"} (indicated by dependency on @{text "\<alpha>"}), but the definiendum
  2360   @{text "M"} (indicated by dependency on the type-variable @{text "\<alpha>"}), but the definiendum
  2359   @{text "is_regular"} is not. Such definitions are excluded in HOL, because
  2361   @{text "is_regular"} is not. Such definitions are excluded from HOL, because
  2360   they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
  2362   they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
  2361   example). Also HOL does not contain type-quantifiers which would allow us to
  2363   example). Also HOL does not contain type-quantifiers which would allow us to
  2362   get rid of the polymorphism by quantifying over the type-variable @{text
  2364   get rid of the polymorphism by quantifying over the type-variable @{text
  2363   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
  2365   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
  2364   natural way out in HOL is to resort to state nodes with an identity, for
  2366   natural way out in HOL is to resort to state nodes with an identity, for
  2368   unpleasant. Regular expressions proved much more convenient for reasoning in
  2370   unpleasant. Regular expressions proved much more convenient for reasoning in
  2369   HOL since they can be defined as inductive datatype and a reasoning
  2371   HOL since they can be defined as inductive datatype and a reasoning
  2370   infrastructure comes for free. The definition of regularity in terms of
  2372   infrastructure comes for free. The definition of regularity in terms of
  2371   regular expressions poses no problem at all for HOL.  We showed in this
  2373   regular expressions poses no problem at all for HOL.  We showed in this
  2372   paper that they can be used for establishing the central result in regular
  2374   paper that they can be used for establishing the central result in regular
  2373   language theory---the Myhill-Nerode theorem.
  2375   language theory---the Myhill-Nerode Theorem.
  2374 
  2376 
  2375   While regular expressions are convenient, they have some limitations. One is
  2377   While regular expressions are convenient, they have some limitations. One is
  2376   that there seems to be no method of calculating a minimal regular expression
  2378   that there seems to be no method of calculating a minimal regular expression
  2377   (for example in terms of length) for a regular language, like there is for
  2379   (for example in terms of length) for a regular language, like there is for
  2378   automata. On the other hand, efficient regular expression matching, without
  2380   automata. On the other hand, efficient regular expression matching, without
  2386   Our proof of the first direction is very much inspired by \emph{Brzozowski's
  2388   Our proof of the first direction is very much inspired by \emph{Brzozowski's
  2387   algebraic method} used to convert a finite automaton to a regular expression
  2389   algebraic method} used to convert a finite automaton to a regular expression
  2388   \cite{Brzozowski64}. The close connection can be seen by considering the
  2390   \cite{Brzozowski64}. The close connection can be seen by considering the
  2389   equivalence classes as the states of the minimal automaton for the regular
  2391   equivalence classes as the states of the minimal automaton for the regular
  2390   language.  However there are some subtle differences. Because our equivalence 
  2392   language.  However there are some subtle differences. Because our equivalence 
  2391   classes (or correspondingly states) arise from the Myhil-Nerode Relation, the most natural
  2393   classes (or correspondingly states) arise from the Myhill-Nerode Relation, the most natural
  2392   choice is to characterise each state with the set of strings starting from
  2394   choice is to characterise each state with the set of strings starting from
  2393   the initial state leading up to that state. Usually, however, the states are
  2395   the initial state leading up to that state. Usually, however, the states are
  2394   characterised as the strings starting from that state leading to the
  2396   characterised as the strings starting from that state leading to the
  2395   terminal states.  The first choice has consequences about how the initial
  2397   terminal states.  The first choice has consequences about how the initial
  2396   equational system is set up. We have the $\lambda$-term on our `initial
  2398   equational system is set up. We have the $\lambda$-term on our `initial
  2397   state', while Brzozowski has it on the terminal states. This means we also
  2399   state', while Brzozowski has it on the terminal states. This means we also
  2398   need to reverse the direction of Arden's Lemma. We have not found anything
  2400   need to reverse the direction of Arden's Lemma. We have not found anything
  2399   in the literature about our way of proving the first direction of the
  2401   in the literature about our way of proving the first direction of the
  2400   Myhill-Nerode theorem, but it appears to be folklore.
  2402   Myhill-Nerode Theorem, but it appears to be folklore.
  2401 
  2403 
  2402   We presented two proofs for the second direction of the Myhill-Nerode
  2404   We presented two proofs for the second direction of the Myhill-Nerode
  2403   theorem. One direct proof using tagging-functions and another using partial
  2405   Theorem. One direct proof using tagging-functions and another using partial
  2404   derivatives. This part of our work is where our method using regular
  2406   derivatives. This part of our work is where our method using regular
  2405   expressions shines, because we can completely side-step the standard
  2407   expressions shines, because we can completely side-step the standard
  2406   argument \cite{Kozen97} where automata need to be composed. However, it is
  2408   argument \cite{Kozen97} where automata need to be composed. However, it is
  2407   also the direction where we had to spend most of the `conceptual' time, as
  2409   also the direction where we had to spend most of the `conceptual' time, as
  2408   our first proof based on tagging-functions is new for establishing the
  2410   our first proof based on tagging-functions is new for establishing the
  2409   Myhill-Nerode theorem. All standard proofs of this direction proceed by
  2411   Myhill-Nerode Theorem. All standard proofs of this direction proceed by
  2410   arguments over automata.
  2412   arguments over automata.
  2411   
  2413   
  2412   The indirect proof for the second direction arose from our interest in
  2414   The indirect proof for the second direction arose from our interest in
  2413   Brzozowski's derivatives for regular expression matching.  While Brzozowski 
  2415   Brzozowski's derivatives for regular expression matching.  While Brzozowski 
  2414   already established that there are only
  2416   already established that there are only
  2427   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2429   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2428   regular expressions, one needs to carefully analyse whether the resulting
  2430   regular expressions, one needs to carefully analyse whether the resulting
  2429   algorithm is still executable. Given the existing infrastructure for
  2431   algorithm is still executable. Given the existing infrastructure for
  2430   executable sets in Isabelle/HOL \cite{Haftmann09}, it should.
  2432   executable sets in Isabelle/HOL \cite{Haftmann09}, it should.
  2431 
  2433 
  2432   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2434   Our formalisation of the Myhill-Nerode Theorem consists of 780 lines of
  2433   Isabelle/Isar code for the first direction and 460 for the second (the one
  2435   Isabelle/Isar code for the first direction and 460 for the second (the one
  2434   based on tagging-functions), plus around 300 lines of standard material
  2436   based on tagging-functions), plus around 300 lines of standard material
  2435   about regular languages. The formalisation of derivatives and partial
  2437   about regular languages. The formalisation of derivatives and partial
  2436   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2438   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2437   code.  The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg}) 
  2439   code.  The closure properties in Section~\ref{closure} (except Theorem~\ref{subseqreg}) 
  2441   The algorithm for solving equational systems, which we
  2443   The algorithm for solving equational systems, which we
  2442   used in the first direction, is conceptually relatively simple. Still the
  2444   used in the first direction, is conceptually relatively simple. Still the
  2443   use of sets over which the algorithm operates means it is not as easy to
  2445   use of sets over which the algorithm operates means it is not as easy to
  2444   formalise as one might hope. However, it seems sets cannot be avoided since
  2446   formalise as one might hope. However, it seems sets cannot be avoided since
  2445   the `input' of the algorithm consists of equivalence classes and we cannot
  2447   the `input' of the algorithm consists of equivalence classes and we cannot
  2446   see how to reformulate the theory so that we can use lists. Lists would be
  2448   see how to reformulate the theory so that we can use lists or matrices. Lists would be
  2447   much easier to reason about, since we can define functions over them by
  2449   much easier to reason about, since we can define functions over them by
  2448   recursion. For sets we have to use set-comprehensions, which is slightly
  2450   recursion. For sets we have to use set-comprehensions, which is slightly
  2449   unwieldy.
  2451   unwieldy. Matrices would allow us to use the slick formalisation by 
       
  2452   Nipkow of the Gauss-Jordan algorithm \cite{Nipkow11}.
  2450 
  2453 
  2451   While our formalisation might appear large, it should be seen
  2454   While our formalisation might appear large, it should be seen
  2452   in the context of the work done by Constable at al \cite{Constable00} who
  2455   in the context of the work done by Constable at al \cite{Constable00} who
  2453   formalised the Myhill-Nerode theorem in Nuprl using automata. They write
  2456   formalised the Myhill-Nerode Theorem in Nuprl using automata. They write
  2454   that their four-member team needed something on the magnitude of 18 months
  2457   that their four-member team needed something on the magnitude of 18 months
  2455   for their formalisation. It is hard to gauge the size of a
  2458   for their formalisation. It is hard to gauge the size of a
  2456   formalisation in Nurpl, but from what is shown in the Nuprl Math Library
  2459   formalisation in Nurpl, but from what is shown in the Nuprl Math Library
  2457   about their development it seems substantially larger than ours. We attribute
  2460   about their development it seems substantially larger than ours. We attribute
  2458   this to our use of regular expressions, which meant we did not need to `fight' 
  2461   this to our use of regular expressions, which meant we did not need to `fight'