Journal/Paper.thy
changeset 187 9f46a9571e37
parent 186 07a269d9642b
child 190 b73478aaf33e
abbreviation "ZERO \<equiv> Zero"
abbreviation "ONE \<equiv> One"
abbreviation "ATOM \<equiv> Atom"
abbreviation "PLUS \<equiv> Plus"
abbreviation "TIMES \<equiv> Times"
abbreviation "TIMESS \<equiv> Timess"
abbreviation "STAR \<equiv> Star"


notation (latex output)
  str_eq ("\<approx>\<^bsub>_\<^esub>") and
  Regular languages are an important and well-understood subject in Computer
  Science, with many beautiful theorems and many useful algorithms. There is a
  wide range of textbooks on this subject, many of which are aimed at students
  and contain very detailed `pencil-and-paper' proofs (e.g.~\cite{Kozen97,
  HopcroftUllman69}). It seems natural to exercise theorem provers by
  formalising the theorems and by verifying formally the algorithms.

  A popular choice for a theorem prover would be one based on Higher-Order
  Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development
  presented in this paper we will use the latter. HOL is a predicate calculus
  that allows quantification over predicate variables. Its type system is
  based on Church's Simple Theory of Types \cite{Church40}.  Although many
  mathematical concepts can be conveniently expressed in HOL, there are some
  limitations that hurt badly if one attempts a simple-minded formalisation
  of regular languages in it.  The typical approach to regular languages is to
  introduce finite automata and then define everything in terms of them
  \cite{Kozen97}.  For example, a regular language is normally defined as:

  \begin{dfntn}\label{baddef}
  A language @{text A} is \emph{regular}, provided there is a
  finite deterministic automaton that recognises all strings of @{text "A"}.
  \end{dfntn}
  pairs. Using this definition for disjoint union means we do not have a
  single type for automata. As a result we will not be able to define a regular
  language as one for which there exists an automaton that recognises all its
  strings. This is because we cannot make a definition in HOL that is polymorphic in
  the state type and there is no type quantification available in HOL (unlike
  in Coq, for example).\footnote{Slind already pointed out this problem in an email
  to the HOL4 mailing list on 21st April 2005.}

  An alternative, which provides us with a single type for automata, is to give every
  state node an identity, for example a natural
  number, and then be careful to rename these identities apart whenever
  connecting two automata. This results in clunky proofs

  Functions are much better supported in Isabelle/HOL, but they still lead to similar
  problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
  also requires the formalisation of disjoint unions. For this, Nipkow \cite{Nipkow98}
  dismisses the option of using identities, because according to
  him it leads to ``messy proofs''. Since he does not need to define what regular
  languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes

  \begin{quote}
  \it%
  \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
  `` & All lemmas appear obvious given a picture of the composition of automata\ldots
  `sink' state from which there is no connection to a final state (Brzozowski
  mentions this side-condition in the context of state complexity
  of automata \cite{Brzozowski10}). Such side-conditions mean that if we define a regular
  language as one for which there exists \emph{a} finite automaton that
  recognises all its strings (see Def.~\ref{baddef}), then we need a lemma which
  ensures that another equivalent one can be found satisfying the
  side-condition. Unfortunately, such `little' and `obvious' lemmas make
  a formalisation of automata theory a hair-pulling experience.


  In this paper, we will not attempt to formalise automata theory in
  an argument about solving equational systems.  This argument appears to be
  folklore. For the other part, we give two proofs: one direct proof using
  certain tagging-functions, and another indirect proof using Antimirov's
  partial derivatives \cite{Antimirov95}. Again, to the best of our knowledge, the
  tagging-functions have not been used before to establish the Myhill-Nerode
  theorem. Derivatives of regular expressions have been used recently quite
  widely in the literature; partial derivatives, in contrast, have attracted much
  less attention. However, partial derivatives are more suitable in the
  context of the Myhill-Nerode theorem, since it is easier to establish
  their finiteness result formally. We have not found any proof that uses
  either of them in order to prove the Myhill-Nerode theorem.
*}

section {* Preliminaries *}

text {*
  where @{text "@"} is the list-append operation. The Kleene-star of a language @{text A}
  is defined as the union over all powers, namely @{thm star_def}. In the paper
  we will make use of the following properties of these constructions.

  \begin{prpstn}\label{langprops}\mbox{}\\
  \begin{tabular}{@ {}lp{10cm}}
  (i)   & @{thm star_unfold_left}     \\
  (ii)  & @{thm[mode=IfThen] pow_length}\\
  (iii) & @{thm conc_Union_left} \\
  (iv)  & If @{thm (prem 1) star_decom} and @{thm (prem 2) star_decom} then
          there exists an @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with @{text "x = x\<^isub>p @ x\<^isub>s"}
          and @{term "x\<^isub>p \<noteq> []"} such that @{term "x\<^isub>p \<in> A"} and @{term "x\<^isub>s \<in> A\<star>"}.
  \end{tabular}
  \end{prpstn}
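
  \noindent
  Property @{text "(iv)"} is the decomposition fact used later in the
  @{const Star}-case. Spelled out as a plain Isabelle lemma it might read as
  follows (our sketch of the @{text star_decom} fact referenced above,
  assuming the @{text "\<star>"}-notation for the Kleene-star; the exact form in
  the theory may differ):

    lemma star_decom:
      assumes "x \<in> A\<star>" and "x \<noteq> []"
      shows "\<exists>xp xs. x = xp @ xs \<and> xp \<noteq> [] \<and> xp \<in> A \<and> xs \<in> A\<star>"
      sorry (* statement only; the proof is elided in this sketch *)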

  \noindent
  In @{text "(ii)"} we use the notation @{term "length s"} for the length of a
  \end{center}

  \noindent
  that means we take the image of @{text f} w.r.t.~all elements in the
  domain. With this we will be able to infer that the tagging-functions, seen
  as relations, give rise to finitely many equivalence classes.
  Finally we will show that the tagging-relations are more refined than
  @{term "\<approx>(lang r)"}, which implies that @{term "UNIV // \<approx>(lang r)"} must
  also be finite.  We formally define the notion of a \emph{tagging-relation}
  as follows.

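  \noindent
  The formal definition is elided in this excerpt. As a rough reconstruction
  (names and concrete syntax are our sketch, not necessarily the theory's
  exact form), a tagging-function @{text tag} induces the relation that
  identifies two strings whenever the tagging-function cannot distinguish
  them:

    definition tag_eq :: "('a list \<Rightarrow> 'b) \<Rightarrow> ('a list \<times> 'a list) set"
    where "tag_eq tag \<equiv> {(x, y). tag x = tag y}"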
  \noindent
  where @{text "A"} and @{text "B"} are some arbitrary languages. The reason for this choice
  is that we need to establish that @{term "=(tag_Plus A B)="} refines @{term "\<approx>(A \<union> B)"}.
  This amounts to showing @{term "x \<approx>A y"} or @{term "x \<approx>B y"} under the assumption
  @{term "x"}~@{term "=(tag_Plus A B)="}~@{term y}. As we shall see, this definition will
  provide us with just the right assumptions in order to get the proof through.
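
  \noindent
  The elided definition of @{term "tag_Plus A B"} simply pairs the two
  equivalence classes of a string. A minimal sketch (our reconstruction,
  writing @{text "str_eq A"} for the Myhill-Nerode relation @{term "\<approx>A"} and
  assuming languages are sets of strings) is:

    type_synonym 'a lang = "'a list set"  (* assumed representation of languages *)

    definition tag_Plus :: "'a lang \<Rightarrow> 'a lang \<Rightarrow> 'a list \<Rightarrow> ('a lang \<times> 'a lang)"
    where "tag_Plus A B \<equiv> (\<lambda>x. (str_eq A `` {x}, str_eq B `` {x}))"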

  \begin{proof}[@{const "PLUS"}-Case]
  We can show in general that if @{term "finite (UNIV // \<approx>A)"} and @{term "finite
  (UNIV // \<approx>B)"} then @{term "finite ((UNIV // \<approx>A) \<times> (UNIV // \<approx>B))"}
  holds. The range of @{term "tag_Plus A B"} is a subset of this product
  1262   "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1266   "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  1263   \end{proof}
  1267   \end{proof}
  1264 
  1268 
  1265   \noindent
  1269   \noindent
  1266   The @{const TIMES}-case is slightly more complicated. We first prove the
  1270   The @{const TIMES}-case is slightly more complicated. We first prove the
  1267   following lemma, which will aid the refinement-proofs.
  1271   following lemma, which will aid the proof about refinement.
  1268 
  1272 
  1269   \begin{lmm}\label{refinement}
  1273   \begin{lmm}\label{refinement}
  1270   The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\<approx>A"}, provided for
  1274   The relation @{text "\<^raw:$\threesim$>\<^bsub>tag\<^esub>"} refines @{term "\<approx>A"}, provided for
  1271   all strings @{text x}, @{text y} and @{text z} we have \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
  1275   all strings @{text x}, @{text y} and @{text z} we have \mbox{@{text "x \<^raw:$\threesim$>\<^bsub>tag\<^esub> y"}}
  1272   and @{term "x @ z \<in> A"} imply @{text "y @ z \<in> A"}.
  1276   and @{term "x @ z \<in> A"} imply @{text "y @ z \<in> A"}.
  1273   \end{lmm}
  1277   \end{lmm}
  1274 
  1278 
  1275 
  1279 
  1276   \noindent
  1280   \noindent
  1277   We therefore can clean information from how the strings @{text "x @ z"} are in @{text A}
  1281   We therefore can analyse how the strings @{text "x @ z"} are in the language
  1278   and construct appropriate tagging-functions to infer that @{term "y @ z \<in> A"}.
  1282   @{text A} and then construct an appropriate tagging-function to infer that
  1279   For the @{const Times}-case we additionally need the notion of the set of all
  1283   @{term "y @ z"} are also in @{text A}.  For this we sill need the notion of
  1280   possible partitions of a string
  1284   the set of all possible \emph{partitions} of a string
  1281 
  1285 
  1282   \begin{equation}
  1286   \begin{equation}
  1283   @{thm Partitions_def}
  1287   @{thm Partitions_def}
  1284   \end{equation}
  1288   \end{equation}
       
  1289 
       
  1290   \noindent
       
  1291   If we know that @{text "(x\<^isub>p, x\<^isub>s) \<in> Partitions x"}, we will
       
  1292   refer to @{text "x\<^isub>p"} as the \emph{prefix} of the string @{text x},
       
  1293   respectively to @{text "x\<^isub>s"} as the \emph{suffix}.
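
  \noindent
  With strings represented as lists, a minimal Isabelle sketch of this
  definition (our reconstruction of the @{thm Partitions_def} display above)
  is:

    definition Partitions :: "'a list \<Rightarrow> ('a list \<times> 'a list) set"
    where "Partitions x \<equiv> {(xp, xs). xp @ xs = x}"  (* all ways to split x *)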
       

  Now assuming @{term "x @ z \<in> A \<cdot> B"}, there are only two possible ways to `split'
  this string so that it is in @{term "A \<cdot> B"}:
  %
  \begin{center}
  Either @{text x} and a prefix of @{text "z"} is in @{text A} and the rest in @{text B}
  (first picture) or there is a prefix of @{text x} in @{text A} and the rest is in @{text B}
  (second picture). In both cases we have to show that @{term "y @ z \<in> A \<cdot> B"}. The first case
  will only go through if we know that @{term "x \<approx>A y"} holds @{text "(*)"}. Because then
  we can infer from @{term "x @ z\<^isub>p \<in> A"} that @{term "y @ z\<^isub>p \<in> A"} holds for all @{text "z\<^isub>p"}.
  In the second case we only know that @{text "x\<^isub>p"} and @{text "x\<^isub>s"} is one possible partition
  of the string @{text x}. We have to know that both @{text "x\<^isub>p"} and the
  corresponding partition @{text "y\<^isub>p"} are in @{text "A"}, and that @{text "x\<^isub>s"} is `@{text B}-related'
  to @{text "y\<^isub>s"} @{text "(**)"}. From the latter fact we can infer that @{text "y\<^isub>s @ z \<in> B"}.
  This will solve the second case.
  Taking the two requirements, @{text "(*)"} and @{text "(**)"}, together we define the
  tagging-function in the @{const Times}-case as:

  \begin{center}
  @{thm (lhs) tag_Times_def[where ?A="A" and ?B="B"]}~@{text "\<equiv>"}~
  @{text "(\<lbrakk>x\<rbrakk>\<^bsub>\<approx>A\<^esub>, {\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>B\<^esub> | x\<^isub>p \<in> A \<and> (x\<^isub>p, x\<^isub>s) \<in> Partitions x})"}
  \end{center}

  \noindent
  We have to make the assumption for all suffixes @{text "x\<^isub>s"}, since we do
  not know anything about how the string @{term x} is partitioned.
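
  \noindent
  In the same sketch-notation as before (our reconstruction, not the
  theory's verbatim text), this tagging-function can be written as:

    definition tag_Times :: "'a lang \<Rightarrow> 'a lang \<Rightarrow> 'a list \<Rightarrow> ('a lang \<times> 'a lang set)"
    where "tag_Times A B \<equiv> (\<lambda>x. (str_eq A `` {x},
             {str_eq B `` {xs} | xp xs. xp \<in> A \<and> (xp, xs) \<in> Partitions x}))"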
       
  With this definition in place, let us prove the @{const "Times"}-case.


  \begin{proof}[@{const TIMES}-Case]
  If @{term "finite (UNIV // \<approx>A)"} and @{term "finite (UNIV // \<approx>B)"}
  then @{term "finite ((UNIV // \<approx>A) \<times> (Pow (UNIV // \<approx>B)))"} holds. The range of
  @{term "tag_Times A B"} is a subset of this product set, and therefore finite.
  For the refinement of @{term "\<approx>(A \<cdot> B)"} and @{text "\<^raw:$\threesim$>\<^bsub>\<times>tag A B\<^esub>"},
  we have by Lemma \ref{refinement}

  \begin{center}
   @{term "tag_Times A B x = tag_Times A B y"}
  \end{center}

  \noindent
  and @{term "x @ z \<in> A \<cdot> B"}, and have to establish @{term "y @ z \<in> A \<cdot>
  B"}. As shown in the pictures above, there are two cases to be
  considered. First, there exists a @{text "z\<^isub>p"} and @{text
  "z\<^isub>s"} such that @{term "x @ z\<^isub>p \<in> A"} and @{text "z\<^isub>s
  \<in> B"}.  By the assumption about @{term "tag_Times A B"} we have @{term "\<approx>A
  `` {x} = \<approx>A `` {y}"} and thus @{term "x \<approx>A y"}. Hence by the Myhill-Nerode
  relation @{term "y @ z\<^isub>p \<in> A"} holds. Using @{text "z\<^isub>s \<in> B"},
  we can conclude in this case with @{term "y @ z \<in> A \<cdot> B"} (recall @{text
  "z\<^isub>p @ z\<^isub>s = z"}).

  Second, there exists a partition @{text "x\<^isub>p"} and @{text "x\<^isub>s"} with
  @{text "x\<^isub>p \<in> A"} and @{text "x\<^isub>s @ z \<in> B"}. We therefore have

  \begin{center}

  \noindent
  This means there must be a partition @{text "y\<^isub>p"} and @{text "y\<^isub>s"}
  such that @{term "y\<^isub>p \<in> A"} and @{term "\<approx>B `` {x\<^isub>s} = \<approx>B ``
  {y\<^isub>s}"}. Unfolding the Myhill-Nerode relation and together with the
  facts that @{text "x\<^isub>p \<in> A"} and \mbox{@{text "x\<^isub>s @ z \<in> B"}}, we
  obtain @{term "y\<^isub>p \<in> A"} and @{text "y\<^isub>s @ z \<in> B"}, as needed in
  this case.  We can again complete the @{const TIMES}-case by setting @{text
  A} to @{term "lang r\<^isub>1"} and @{text B} to @{term "lang r\<^isub>2"}.
  \end{proof}

  \noindent
  The case for @{const Star} is similar to @{const TIMES}, but poses a few
  extra challenges.  To deal with them, we first define the notion of a \emph{string
  prefix} and a \emph{strict string prefix}:

  \begin{center}
  \begin{tabular}{l}
  @{text "x \<le> y \<equiv> \<exists>z. y = x @ z"}\\
  @{text "x < y \<equiv> x \<le> y \<and> x \<noteq> y"}
  \end{tabular}
  \end{center}

  When analysing the case of @{text "x @ z"} being an element in @{term "A\<star>"}
  and @{text x} is not the empty string, we have the following picture:

  \begin{center}
  \scalebox{1}{
  \begin{tikzpicture}
  \end{center}
  %
  \noindent
  We can find a strict prefix @{text "x\<^isub>p"} of @{text x} such that @{term "x\<^isub>p \<in> A\<star>"},
  @{text "x\<^isub>p < x"} and the rest @{term "x\<^isub>s @ z \<in> A\<star>"}. For example the empty string
  @{text "[]"} would do (recall @{term "x \<noteq> []"}).
  There are potentially many such prefixes, but there can only be finitely many of them (the
  string @{text x} is finite). Let us therefore choose the longest one and call it
  @{text "x\<^bsub>pmax\<^esub>"}. Now for the rest of the string @{text "x\<^isub>s @ z"} we
  know it is in @{term "A\<star>"} and cannot be the empty string. By Prop.~\ref{langprops}@{text "(iv)"},
  we can separate
  this string into two parts, say @{text "a"} and @{text "b"}, such that @{text "a \<noteq> []"}, @{text "a \<in> A"}
  and @{term "b \<in> A\<star>"}. Now @{text a} must be strictly longer than @{text "x\<^isub>s"},
  otherwise @{text "x\<^bsub>pmax\<^esub>"} is not the longest prefix. That means @{text a}
  `overlaps' with @{text z}, splitting it into two components @{text "z\<^isub>a"} and
  @{text "z\<^isub>b"}. For this we know that @{text "x\<^isub>s @ z\<^isub>a \<in> A"} and
  @{term "z\<^isub>b \<in> A\<star>"}. To cut a story short, we have divided @{term "x @ z \<in> A\<star>"}
  @{term "tag_Star A"} is a subset of this set, and therefore finite.
  Again we have to show under the assumption @{term "x"}~@{term "=(tag_Star A)="}~@{term y}
  that @{term "x @ z \<in> A\<star>"} implies @{term "y @ z \<in> A\<star>"}.
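
  \noindent
  The definition of @{term "tag_Star A"} is elided in this excerpt; following
  the description above, a rough sketch (our reconstruction, with @{text "<"}
  being the strict string prefix defined earlier) is:

    definition tag_Star :: "'a lang \<Rightarrow> 'a list \<Rightarrow> 'a lang set"
    where "tag_Star A \<equiv> (\<lambda>x. {str_eq A `` {xs} | xp xs.
             xp < x \<and> xp \<in> A\<star> \<and> (xp, xs) \<in> Partitions x})"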

  We first need to consider the case that @{text x} is the empty string.
  From the assumption about strict prefixes in @{text "\<^raw:$\threesim$>\<^bsub>\<star>tag A\<^esub>"}, we
  can infer @{text y} is the empty string and
  then clearly have @{term "y @ z \<in> A\<star>"}. In case @{text x} is not the empty
  string, we can divide the string @{text "x @ z"} as shown in the picture
  above. By the tagging-function and the facts @{text "x\<^bsub>pmax\<^esub> \<in> A\<^isup>\<star>"} and @{text "x\<^bsub>pmax\<^esub> < x"},
  we have

  \begin{center}
  \begin{center}
  @{text "\<lbrakk>x\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> \<in> {\<lbrakk>y\<^isub>s\<rbrakk>\<^bsub>\<approx>A\<^esub> | y\<^bsub>p\<^esub> < y \<and> y\<^bsub>p\<^esub> \<in> A\<^isup>\<star> \<and> (y\<^bsub>p\<^esub>, y\<^isub>s) \<in> Partitions y}"}
  \end{center}

  \noindent
  From this we know there exist partitions @{text "y\<^isub>p"} and @{text
  "y\<^isub>s"} with @{term "y\<^isub>p \<in> A\<star>"} and also @{term "x\<^isub>s \<approx>A
  y\<^isub>s"}. Unfolding the Myhill-Nerode relation we know @{term
  "y\<^isub>s @ z\<^isub>a \<in> A"}. We also know that @{term "z\<^isub>b \<in> A\<star>"}.
  Therefore @{term "y\<^isub>p @ (y\<^isub>s @ z\<^isub>a) @ z\<^isub>b \<in>
  A\<star>"}, which means @{term "y @ z \<in> A\<star>"}. As the last step we have to set
  @{text "A"} to @{term "lang r"} and thus complete the proof.
  \end{proof}
*}

section {* Second Part proved using Partial Derivatives *}

text {*
  \noindent
  As we have seen in the previous section, in order to establish
  the second direction of the Myhill-Nerode theorem, we need to find
  a more refined relation than @{term "\<approx>(lang r)"} for which we can
  show that there are only finitely many equivalence classes. So far we
  have shown this by induction on @{text "r"}. However, there is also
  an indirect method to come up with such a refined relation based on
  derivatives of regular expressions \cite{Brzozowski64}.

  Assume the following two definitions for a \emph{left-quotient} of a language,
  which we write as @{term "Der c A"} and @{term "Ders s A"} where @{text c}
  is a character and @{text s} a string:

  \begin{center}
  \begin{tabular}{r@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  @{thm (lhs) Der_def}  & @{text "\<equiv>"} & @{thm (rhs) Der_def}\\
  @{thm (lhs) Ders_def} & @{text "\<equiv>"} & @{thm (rhs) Ders_def}\\
  \end{tabular}
  \end{center}
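
  \noindent
  Concretely, with languages as sets of strings, these left-quotients can be
  defined as follows (a sketch matching the two displays above; the concrete
  syntax is our reconstruction):

    definition Der :: "'a \<Rightarrow> 'a lang \<Rightarrow> 'a lang"
    where "Der c A \<equiv> {s. c # s \<in> A}"    (* all strings s with c # s in A *)

    definition Ders :: "'a list \<Rightarrow> 'a lang \<Rightarrow> 'a lang"
    where "Ders s A \<equiv> {s'. s @ s' \<in> A}"  (* all strings s' with s @ s' in A *)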

  \noindent
  In order to aid readability, we shall also make use of the following abbreviation:

  \begin{center}
  @{abbrev "Derss s A"}
  \end{center}


  \noindent
  Clearly we have the following relation between the Myhill-Nerode relation
  (Def.~\ref{myhillneroderel}) and left-quotients

  \begin{equation}\label{mhders}
  @{term "x \<approx>A y"} \hspace{4mm}\text{if and only if}\hspace{4mm} @{term "Ders x A = Ders y A"}
  \end{equation}

  \noindent
  It is straightforward to establish the following properties of left-quotients:

  \begin{equation}
  \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{2mm}}l}
  @{thm (lhs) Der_simps(1)} & $=$ & @{thm (rhs) Der_simps(1)}\\
  @{thm (lhs) Der_simps(2)} & $=$ & @{thm (rhs) Der_simps(2)}\\
  @{thm (lhs) Der_simps(3)} & $=$ & @{thm (rhs) Der_simps(3)}\\
  @{thm (lhs) Der_simps(4)} & $=$ & @{thm (rhs) Der_simps(4)}\\
  @{thm (lhs) Der_conc}  & $=$ & @{thm (rhs) Der_conc}\\
  @{thm (lhs) Der_star}  & $=$ & @{thm (rhs) Der_star}\\
  @{thm (lhs) Ders_simps(1)} & $=$ & @{thm (rhs) Ders_simps(1)}\\
  @{thm (lhs) Ders_simps(2)} & $=$ & @{thm (rhs) Ders_simps(2)}\\
  %@{thm (lhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}  & $=$
  %   & @{thm (rhs) Ders_simps(3)[where ?s1.0="s\<^isub>1" and ?s2.0="s\<^isub>2"]}\\
  \end{tabular}}
  \end{equation}

  \noindent
  where @{text "\<Delta>"} is a function that tests whether the empty string
  @{thm (lhs) ders.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) ders.simps(2)}\\
  \end{tabular}
  \end{center}

  \noindent
  The last two clauses extend derivatives for characters to strings (lists of
  characters). The list-cons operator is written \mbox{@{text "_ :: _"}}. The
  function @{term "nullable r"} needed in the @{const Times}-case tests
  whether a regular expression can recognise the empty string:

  \begin{center}
  \begin{tabular}{c@ {\hspace{10mm}}c}
  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  \end{tabular}}
  \end{equation}
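
  \noindent
  Since the clauses above are rendered from antiquotations, the following
  self-contained sketch shows how the derivative functions might be defined
  over a datatype of regular expressions (our reconstruction; constructor
  names follow the abbreviations at the top of this theory):

    datatype 'a rexp = Zero | One | Atom 'a
                     | Plus "'a rexp" "'a rexp" | Times "'a rexp" "'a rexp" | Star "'a rexp"

    (* a regular expression is nullable iff its language contains the empty string *)
    fun nullable :: "'a rexp \<Rightarrow> bool"
    where
      "nullable Zero = False"
    | "nullable One = True"
    | "nullable (Atom c) = False"
    | "nullable (Plus r1 r2) = (nullable r1 \<or> nullable r2)"
    | "nullable (Times r1 r2) = (nullable r1 \<and> nullable r2)"
    | "nullable (Star r) = True"

    (* Brzozowski derivative: der c r matches s iff r matches c # s *)
    fun der :: "'a \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp"
    where
      "der c Zero = Zero"
    | "der c One = Zero"
    | "der c (Atom c') = (if c = c' then One else Zero)"
    | "der c (Plus r1 r2) = Plus (der c r1) (der c r2)"
    | "der c (Times r1 r2) =
         (if nullable r1 then Plus (Times (der c r1) r2) (der c r2)
          else Times (der c r1) r2)"
    | "der c (Star r) = Times (der c r) (Star r)"

    (* extension to strings, consuming characters from the left *)
    fun ders :: "'a list \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp"
    where
      "ders [] r = r"
    | "ders (c # s) r = ders s (der c r)"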

  \noindent
  The importance in the context of the Myhill-Nerode theorem is that
  we can use \eqref{mhders} and \eqref{Dersders} in order to
  establish that @{term "x \<approx>(lang r) y"} is equivalent to
  @{term "lang (ders x r) = lang (ders y r)"}. From this we obtain

  \begin{equation}
  @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  \end{equation}


  \noindent
  which means the right-hand side (seen as a relation) refines the
  Myhill-Nerode relation.  Consequently, we can use
  @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a potential tagging-relation
  for the regular expression @{text r}. However, in
  order to be useful in the Myhill-Nerode theorem, we also have to show that
  for the corresponding language there are only finitely many derivatives---ensuring
  that there are only finitely many equivalence classes. Unfortunately, this
  is not true in general. Sakarovitch gives an example where a regular
  expression has infinitely many derivatives w.r.t.~a language
  \cite[Page~141]{Sakarovitch09}. What Brzozowski \cite{Brzozowski64} proved
  is that for every language there \emph{are} only finitely many `dissimilar'
  derivatives for a regular expression. Two regular expressions are said to be
  \emph{similar} provided they can be identified using the @{text
  "ACI"}-identities:

  \begin{equation}\label{ACI}
  \mbox{\begin{tabular}{cl}
  (@{text A}) & @{term "Plus (Plus r\<^isub>1 r\<^isub>2) r\<^isub>3"} $\equiv$ @{term "Plus r\<^isub>1 (Plus r\<^isub>2 r\<^isub>3)"}\\
  (@{text C}) & @{term "Plus r\<^isub>1 r\<^isub>2"} $\equiv$ @{term "Plus r\<^isub>2 r\<^isub>1"}\\
  (@{text I}) & @{term "Plus r r"} $\equiv$ @{term "r"}\\
  \end{tabular}}
  \end{equation}

  \noindent
  Carrying this idea through, we must not consider the set of all derivatives,
  but the ones modulo @{text "ACI"}.  In principle, this can be formally
  defined, but it is very painful in a theorem prover (since there is no
  direct characterisation of the set of dissimilar derivatives).


  Fortunately, there is a much simpler approach using \emph{partial
  derivatives}. They were introduced by Antimirov \cite{Antimirov95} and can be defined
  in Isabelle/HOL as follows:

  @{thm (lhs) pder.simps(3)[where c'="d"]}  & @{text "\<equiv>"} & @{thm (rhs) pder.simps(3)[where c'="d"]}\\
  @{thm (lhs) pder.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
     & @{text "\<equiv>"} & @{thm (rhs) pder.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  @{thm (lhs) pder.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}
     & @{text "\<equiv>"} & @{text "if"}~@{term "nullable r\<^isub>1"}~@{text "then"}~%
       @{term "(Timess (pder c r\<^isub>1) r\<^isub>2) \<union> (pder c r\<^isub>2)"}\\
     & & \phantom{@{text "if"}~@{term "nullable r\<^isub>1"}~}@{text "else"}~%
                    @{term "Timess (pder c r\<^isub>1) r\<^isub>2"}\\
  @{thm (lhs) pder.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) pder.simps(6)}\smallskip\\
  @{thm (lhs) pders.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) pders.simps(1)}\\
  @{thm (lhs) pders.simps(2)}  & @{text "\<equiv>"} & @{text "\<Union> (pders s) ` (pder c r)"}\\
  \end{tabular}
  \end{center}

  \noindent
  Again the last two clauses extend partial derivatives from characters to strings.
  Unlike `simple' derivatives, the functions for partial derivatives return sets of regular
  expressions. In the @{const Times} and @{const Star} cases we therefore use the
  auxiliary definition

  \begin{center}
  @{text "TIMESS rs r \<equiv> {TIMES r' r | r' \<in> rs}"}
  \end{center}

  \noindent
  in order to `sequence' a regular expression with a set of regular
  expressions. Note that in the last clause we first build the set of partial
  derivatives w.r.t.~the character @{text c}, then build the image of this set under the
  function @{term "pders s"} and finally `union up' all resulting sets. It will be
  convenient to introduce the following abbreviation

  \begin{center}
  @{abbrev "pderss s A"}
  \end{center}

  \noindent
  which simplifies the last clause of @{const "pders"} to

  \begin{center}
  \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  @{thm (lhs) pders.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) pders.simps(2)}\\
  \end{tabular}
  \end{center}
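
  \noindent
  Mirroring the sketch given for @{const der} earlier, the partial-derivative
  functions could be written as follows (again our reconstruction, not the
  theory's verbatim text):

    (* 'sequence' every expression in rs with r *)
    definition Timess :: "'a rexp set \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where "Timess rs r \<equiv> (\<lambda>r'. Times r' r) ` rs"

    (* partial derivatives return sets of regular expressions *)
    fun pder :: "'a \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where
      "pder c Zero = {}"
    | "pder c One = {}"
    | "pder c (Atom c') = (if c = c' then {One} else {})"
    | "pder c (Plus r1 r2) = pder c r1 \<union> pder c r2"
    | "pder c (Times r1 r2) =
         (if nullable r1 then Timess (pder c r1) r2 \<union> pder c r2
          else Timess (pder c r1) r2)"
    | "pder c (Star r) = Timess (pder c r) (Star r)"

    (* extension to strings: union over all partial derivatives of the results *)
    fun pders :: "'a list \<Rightarrow> 'a rexp \<Rightarrow> 'a rexp set"
    where
      "pders [] r = {r}"
    | "pders (c # s) r = \<Union> (pders s ` pder c r)"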

  Partial derivatives can be seen as having the @{text "ACI"}-identities already built in:
  taking the partial derivatives of the
  regular expressions in \eqref{ACI} gives us in each case
  equal sets.  Antimirov \cite{Antimirov95} showed a similar result to
  \eqref{Dersders} for partial derivatives:

  \begin{equation}
  \mbox{\begin{tabular}{lc}
  @{text "(i)"}  & @{thm Der_pder}\\
  @{text "(ii)"} & @{thm Ders_pders}
  \end{tabular}}
  \end{equation}

  \begin{proof}
  The first fact is by a simple induction on @{text r}. For the second we slightly
  modify Antimirov's proof by performing an induction on @{text s} where we
  generalise over all @{text r}. That means in the @{text "cons"}-case the
  induction hypothesis is

  \begin{center}
  @{text "(IH)"}\hspace{3mm}@{term "\<forall>r. Ders s (lang r) = \<Union> lang ` (pders s r)"}
  \end{center}

  \noindent
  With this we can establish

  \begin{center}
  \begin{tabular}{r@ {\hspace{1.5mm}}c@ {\hspace{1.5mm}}ll}
  @{term "Ders (c # s) (lang r)"}
    & @{text "="} & @{term "Ders s (Der c (lang r))"} & by def.\\
    & @{text "="} & @{term "Ders s (\<Union> lang ` (pder c r))"} & by @{text "(i)"}\\
    & @{text "="} & @{term "\<Union> (Ders s) ` (lang ` (pder c r))"} & by def.~of @{text "Ders"}\\
    & @{text "="} & @{term "\<Union> lang ` (\<Union> pders s ` (pder c r))"} & by IH\\
    & @{text "="} & @{term "\<Union> lang ` (pders (c # s) r)"} & by def.\\
  \end{tabular}
  \end{center}

  \noindent
  In order to apply the induction hypothesis in the fourth step, we need the generalisation
  over all regular expressions @{text r}. The case for the empty string is routine and omitted.
  \end{proof}

  Antimirov also proved that for every language and regular expression there are only finitely
  many partial derivatives.
*}

section {* Closure Properties *}

text {*
  \noindent
  The real beauty of regular languages is that they are closed
  under almost all set operations. Closure under union, concatenation and Kleene-star
  are trivial to establish given our definition of regularity (Def.~\ref{regular}).
  More interesting is the closure under complement, because
  it seems difficult to construct a regular expression for the complement
  language by direct means. However, the existence of such a regular expression
  can now be easily proved using the Myhill-Nerode theorem since