Journal/Paper.thy
changeset 200 204856ef5573
parent 199 11c3c302fa2e
child 201 9fbf6d9f85ae
equal deleted inserted replaced
199:11c3c302fa2e 200:204856ef5573
   155   based on Church's Simple Theory of Types \cite{Church40}.  Although many
   155   based on Church's Simple Theory of Types \cite{Church40}.  Although many
   156   mathematical concepts can be conveniently expressed in HOL, there are some
   156   mathematical concepts can be conveniently expressed in HOL, there are some
   157   limitations that hurt badly, if one attempts a simple-minded formalisation
   157   limitations that hurt badly, if one attempts a simple-minded formalisation
   158   of regular languages in it. 
   158   of regular languages in it. 
   159 
   159 
   160   The typical approach to regular languages is to
   160   The typical approach to regular languages is to introduce finite
   161   introduce finite automata and then define everything in terms of them
   161   deterministic automata and then define everything in terms of them \cite{
   162   \cite{ HopcroftUllman69,Kozen97}.  For example, a regular language is 
   162   HopcroftUllman69,Kozen97}.  For example, a regular language is normally
   163   normally defined as:
   163   defined as:
   164 
   164 
   165   \begin{dfntn}\label{baddef}
   165   \begin{dfntn}\label{baddef}
   166   A language @{text A} is \emph{regular}, provided there is a 
   166   A language @{text A} is \emph{regular}, provided there is a 
   167   finite deterministic automaton that recognises all strings of @{text "A"}.
   167   finite deterministic automaton that recognises all strings of @{text "A"}.
   168   \end{dfntn}
   168   \end{dfntn}
   250   \end{equation}
   250   \end{equation}
   251 
   251 
   252   \noindent
   252   \noindent
   253   changes the type---the disjoint union is not a set, but a set of
   253   changes the type---the disjoint union is not a set, but a set of
   254   pairs. Using this definition for disjoint union means we do not have a
   254   pairs. Using this definition for disjoint union means we do not have a
   255   single type for the states of automata. As a result we will not be able to define a
   255   single type for the states of automata. As a result we will not be able to
   256   regular language as one for which there exists an automaton that recognises
   256   define in our fomalisation a regular language as one for which there exists
   257   all its strings (Definition~\ref{baddef}). This is because we cannot make a
   257   an automaton that recognises all its strings (Definition~\ref{baddef}). This
   258   definition in HOL that is only polymorphic in the state type and there is no type
   258   is because we cannot make a definition in HOL that is only polymorphic in
   259   quantification available in HOL (unlike in Coq, for example).\footnote{Slind
   259   the state type, but not in the predicate for regularity; and there is no
   260   already pointed out this problem in an email to the HOL4 mailing list on
   260   type quantification available in HOL (unlike in Coq, for
   261   21st April 2005.}
   261   example).\footnote{Slind already pointed out this problem in an email to the
       
   262   HOL4 mailing list on 21st April 2005.}
   262 
   263 
   263   An alternative, which provides us with a single type for states of automata,
   264   An alternative, which provides us with a single type for states of automata,
   264   is to give every state node an identity, for example a natural number, and
   265   is to give every state node an identity, for example a natural number, and
   265   then be careful to rename these identities apart whenever connecting two
   266   then be careful to rename these identities apart whenever connecting two
   266   automata. This results in clunky proofs establishing that properties are
   267   automata. This results in clunky proofs establishing that properties are
   267   invariant under renaming. Similarly, connecting two automata represented as
   268   invariant under renaming. Similarly, connecting two automata represented as
   268   matrices results in very adhoc constructions, which are not pleasant to
   269   matrices results in very adhoc constructions, which are not pleasant to
   269   reason about.
   270   reason about.
   270 
   271 
   271   Functions are much better supported in Isabelle/HOL, but they still lead to similar
   272   Functions are much better supported in Isabelle/HOL, but they still lead to
   272   problems as with graphs.  Composing, for example, two non-deterministic automata in parallel
   273   similar problems as with graphs.  Composing, for example, two
   273   requires also the formalisation of disjoint unions. Nipkow \cite{Nipkow98} 
   274   non-deterministic automata in parallel requires also the formalisation of
   274   dismisses for this the option of using identities, because it leads according to 
   275   disjoint unions. Nipkow \cite{Nipkow98} dismisses for this the option of
   275   him to ``messy proofs''. Since he does not need to define what regular
   276   using identities, because it leads according to him to ``messy
   276   languages are, Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but writes 
   277   proofs''. Since he does not need to define what regular languages are,
       
   278   Nipkow opts for a variant of \eqref{disjointunion} using bit lists, but
       
   279   writes\smallskip
       
   280 
   277 
   281 
   278   \begin{quote}
   282   \begin{quote}
   279   \it%
   283   \it%
   280   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
   284   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
   281   `` & All lemmas appear obvious given a picture of the composition of automata\ldots
   285   `` & All lemmas appear obvious given a picture of the composition of automata\ldots
   282        Yet their proofs require a painful amount of detail.''
   286        Yet their proofs require a painful amount of detail.''
   283   \end{tabular}
   287   \end{tabular}
   284   \end{quote}
   288   \end{quote}\smallskip
   285 
   289 
   286   \noindent
   290   \noindent
   287   and
   291   and\smallskip
   288   
   292   
   289   \begin{quote}
   293   \begin{quote}
   290   \it%
   294   \it%
   291   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
   295   \begin{tabular}{@ {}l@ {}p{0.88\textwidth}@ {}}
   292   `` & If the reader finds the above treatment in terms of bit lists revoltingly
   296   `` & If the reader finds the above treatment in terms of bit lists revoltingly
   293        concrete, I cannot disagree. A more abstract approach is clearly desirable.''
   297        concrete, I cannot disagree. A more abstract approach is clearly desirable.''
   294   \end{tabular}
   298   \end{tabular}
   295   \end{quote}
   299   \end{quote}\smallskip
   296 
   300 
   297 
   301 
   298   \noindent
   302   \noindent
   299   Moreover, it is not so clear how to conveniently impose a finiteness
   303   Moreover, it is not so clear how to conveniently impose a finiteness
   300   condition upon functions in order to represent \emph{finite} automata. The
   304   condition upon functions in order to represent \emph{finite} automata. The
  1313   this string to be in @{term "A \<cdot> B"}:
  1317   this string to be in @{term "A \<cdot> B"}:
  1314   %
  1318   %
  1315   \begin{center}
  1319   \begin{center}
  1316   \begin{tabular}{c}
  1320   \begin{tabular}{c}
  1317   \scalebox{1}{
  1321   \scalebox{1}{
  1318   \begin{tikzpicture}
  1322   \begin{tikzpicture}[fill=gray!20]
  1319     \node[draw,minimum height=3.8ex] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
  1323     \node[draw,minimum height=3.8ex, fill] (x) { $\hspace{4.8em}@{text x}\hspace{4.8em}$ };
  1320     \node[draw,minimum height=3.8ex, right=-0.03em of x] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
  1324     \node[draw,minimum height=3.8ex, right=-0.03em of x, fill] (za) { $\hspace{0.6em}@{text "z\<^isub>p"}\hspace{0.6em}$ };
  1321     \node[draw,minimum height=3.8ex, right=-0.03em of za] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$  };
  1325     \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zza) { $\hspace{2.6em}@{text "z\<^isub>s"}\hspace{2.6em}$  };
  1322 
  1326 
  1323     \draw[decoration={brace,transform={yscale=3}},decorate]
  1327     \draw[decoration={brace,transform={yscale=3}},decorate]
  1324            (x.north west) -- ($(za.north west)+(0em,0em)$)
  1328            (x.north west) -- ($(za.north west)+(0em,0em)$)
  1325                node[midway, above=0.5em]{@{text x}};
  1329                node[midway, above=0.5em]{@{text x}};
  1326 
  1330 
  1340            ($(zza.south east)+(0em,0ex)$) -- ($(za.south east)+(0em,0ex)$)
  1344            ($(zza.south east)+(0em,0ex)$) -- ($(za.south east)+(0em,0ex)$)
  1341                node[midway, below=0.5em]{@{text "z\<^isub>s \<in> B"}};
  1345                node[midway, below=0.5em]{@{text "z\<^isub>s \<in> B"}};
  1342   \end{tikzpicture}}
  1346   \end{tikzpicture}}
  1343   \\[2mm]
  1347   \\[2mm]
  1344   \scalebox{1}{
  1348   \scalebox{1}{
  1345   \begin{tikzpicture}
  1349   \begin{tikzpicture}[fill=gray!20]
  1346     \node[draw,minimum height=3.8ex] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
  1350     \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{3em}@{text "x\<^isub>p"}\hspace{3em}$ };
  1347     \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
  1351     \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.2em}@{text "x\<^isub>s"}\hspace{0.2em}$ };
  1348     \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
  1352     \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (z) { $\hspace{5em}@{text z}\hspace{5em}$ };
  1349 
  1353 
  1350     \draw[decoration={brace,transform={yscale=3}},decorate]
  1354     \draw[decoration={brace,transform={yscale=3}},decorate]
  1351            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
  1355            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
  1352                node[midway, above=0.5em]{@{text x}};
  1356                node[midway, above=0.5em]{@{text x}};
  1353 
  1357 
  1456   When analysing the case of @{text "x @ z"} being an element in @{term "A\<star>"}
  1460   When analysing the case of @{text "x @ z"} being an element in @{term "A\<star>"}
  1457   and @{text x} is not the empty string, we have the following picture:
  1461   and @{text x} is not the empty string, we have the following picture:
  1458 
  1462 
  1459   \begin{center}
  1463   \begin{center}
  1460   \scalebox{1}{
  1464   \scalebox{1}{
  1461   \begin{tikzpicture}
  1465   \begin{tikzpicture}[fill=gray!20]
  1462     \node[draw,minimum height=3.8ex] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
  1466     \node[draw,minimum height=3.8ex, fill] (xa) { $\hspace{4em}@{text "x\<^bsub>pmax\<^esub>"}\hspace{4em}$ };
  1463     \node[draw,minimum height=3.8ex, right=-0.03em of xa] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
  1467     \node[draw,minimum height=3.8ex, right=-0.03em of xa, fill] (xxa) { $\hspace{0.5em}@{text "x\<^bsub>s\<^esub>"}\hspace{0.5em}$ };
  1464     \node[draw,minimum height=3.8ex, right=-0.03em of xxa] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
  1468     \node[draw,minimum height=3.8ex, right=-0.03em of xxa, fill] (za) { $\hspace{2em}@{text "z\<^isub>a"}\hspace{2em}$ };
  1465     \node[draw,minimum height=3.8ex, right=-0.03em of za] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
  1469     \node[draw,minimum height=3.8ex, right=-0.03em of za, fill] (zb) { $\hspace{7em}@{text "z\<^isub>b"}\hspace{7em}$ };
  1466 
  1470 
  1467     \draw[decoration={brace,transform={yscale=3}},decorate]
  1471     \draw[decoration={brace,transform={yscale=3}},decorate]
  1468            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
  1472            (xa.north west) -- ($(xxa.north east)+(0em,0em)$)
  1469                node[midway, above=0.5em]{@{text x}};
  1473                node[midway, above=0.5em]{@{text x}};
  1470 
  1474 
  1652   boolean function @{term "nullable r"} needed in the @{const Times}-case tests
  1656   boolean function @{term "nullable r"} needed in the @{const Times}-case tests
  1653   whether a regular expression can recognise the empty string. It can be defined as 
  1657   whether a regular expression can recognise the empty string. It can be defined as 
  1654   follows.
  1658   follows.
  1655 
  1659 
  1656   \begin{center}
  1660   \begin{center}
  1657   \begin{tabular}{c@ {\hspace{10mm}}c}
  1661   \begin{tabular}{c}
  1658   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1662   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
  1659   @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  1663   @{thm (lhs) nullable.simps(1)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(1)}\\
  1660   @{thm (lhs) nullable.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
  1664   @{thm (lhs) nullable.simps(2)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(2)}\\
  1661   @{thm (lhs) nullable.simps(3)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
  1665   @{thm (lhs) nullable.simps(3)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(3)}\\
  1662   \end{tabular} &
       
  1663   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1.5mm}}l@ {}}
       
  1664   @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1666   @{thm (lhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1665      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1667      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(4)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1666   @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1668   @{thm (lhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}  
  1667      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1669      & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(5)[where ?r1.0="r\<^isub>1" and ?r2.0="r\<^isub>2"]}\\
  1668   @{thm (lhs) nullable.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(6)}\\
  1670   @{thm (lhs) nullable.simps(6)}  & @{text "\<equiv>"} & @{thm (rhs) nullable.simps(6)}\\
  1683   \end{equation}
  1685   \end{equation}
  1684 
  1686 
  1685   \noindent
  1687   \noindent
  1686   The importance of this fact in the context of the Myhill-Nerode theorem is that 
  1688   The importance of this fact in the context of the Myhill-Nerode theorem is that 
  1687   we can use \eqref{mhders} and \eqref{Dersders} in order to 
  1689   we can use \eqref{mhders} and \eqref{Dersders} in order to 
  1688   establish that @{term "x \<approx>(lang r) y"} is equivalent to
  1690   establish that 
  1689   \mbox{@{term "lang (ders x r) = lang (ders y r)"}}. Hence
  1691 
       
  1692   \begin{center}
       
  1693   @{term "x \<approx>(lang r) y"} \hspace{4mm}if and only if\hspace{4mm}
       
  1694   @{term "lang (ders x r) = lang (ders y r)"}. 
       
  1695   \end{center}
       
  1696 
       
  1697   \noindent
       
  1698   holds and hence
  1690 
  1699 
  1691   \begin{equation}
  1700   \begin{equation}
  1692   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  1701   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  1693   \end{equation}
  1702   \end{equation}
  1694 
  1703 
  1695 
  1704 
  1696   \noindent
  1705   \noindent
  1697   which means the right-hand side (seen as a relation) refines the Myhill-Nerode
  1706   This means the right-hand side (seen as a relation) refines the Myhill-Nerode
  1698   relation.  Consequently, we can use @{text
  1707   relation.  Consequently, we can use @{text
  1699   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1708   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1700   tagging-relation. However, in order to be useful for the second part of the
  1709   tagging-relation. However, in order to be useful for the second part of the
  1701   Myhill-Nerode theorem, we have to be able to establish that for the
  1710   Myhill-Nerode theorem, we have to be able to establish that for the
  1702   corresponding language there are only finitely many derivatives---thus
  1711   corresponding language there are only finitely many derivatives---thus
  1838   These two properties confirm the observation made earlier 
  1847   These two properties confirm the observation made earlier 
  1839   that by using sets, partial derivatives have the @{text "ACI"}-identities
  1848   that by using sets, partial derivatives have the @{text "ACI"}-identities
  1840   of derivatives already built in. 
  1849   of derivatives already built in. 
  1841 
  1850 
  1842   Antimirov also proved that for every language and regular expression 
  1851   Antimirov also proved that for every language and regular expression 
  1843   there are only finitely many partial derivatives, whereby the partial
  1852   there are only finitely many partial derivatives, whereby the set of partial
  1844   derivatives of @{text r} w.r.t.~a language @{text A} is defined as
  1853   derivatives of @{text r} w.r.t.~a language @{text A} is defined as
  1845 
  1854 
  1846   \begin{equation}\label{Pdersdef}
  1855   \begin{equation}\label{Pdersdef}
  1847   @{thm pders_lang_def}
  1856   @{thm pders_lang_def}
  1848   \end{equation}
  1857   \end{equation}
  1851   For every language @{text A} and every regular expression @{text r}, 
  1860   For every language @{text A} and every regular expression @{text r}, 
  1852   \mbox{@{thm finite_pders_lang}}.
  1861   \mbox{@{thm finite_pders_lang}}.
  1853   \end{thrm}
  1862   \end{thrm}
  1854 
  1863 
  1855   \noindent
  1864   \noindent
  1856   Antimirov's proof first shows this theorem for the language @{term UNIV1}, 
  1865   Antimirov's proof first establishes this theorem for the language @{term UNIV1}, 
  1857   which is the set of all non-empty strings. For this he proves:
  1866   which is the set of all non-empty strings. For this he proves:
  1858 
  1867 
  1859   \begin{equation}\label{pdersunivone}
  1868   \begin{equation}\label{pdersunivone}
  1860   \mbox{\begin{tabular}{l}
  1869   \mbox{\begin{tabular}{l}
  1861   @{thm pders_lang_Zero}\\
  1870   @{thm pders_lang_Zero}\\
  1883 
  1892 
  1884   \noindent
  1893   \noindent
  1885   and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of
  1894   and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of
  1886   @{term "pders_lang UNIV r"}.  Since we follow Antimirov's proof quite
  1895   @{term "pders_lang UNIV r"}.  Since we follow Antimirov's proof quite
  1887   closely in our formalisation (only the last two cases of
  1896   closely in our formalisation (only the last two cases of
  1888   \eqref{pdersunivone} involve some non-routine induction argument), we omit
  1897   \eqref{pdersunivone} involve some non-routine induction arguments), we omit
  1889   the details.
  1898   the details.
  1890 
  1899 
  1891   Let us now return to our proof about the second direction in the Myhill-Nerode
  1900   Let us now return to our proof for the second direction in the Myhill-Nerode
  1892   theorem. The point of the above calculations is to use 
  1901   theorem. The point of the above calculations is to use 
  1893   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
  1902   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
  1894 
  1903 
  1895 
  1904 
  1896   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1905   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1913   @{thm pders_lang_def[where A="UNIV", simplified]}
  1922   @{thm pders_lang_def[where A="UNIV", simplified]}
  1914   \end{center}
  1923   \end{center}
  1915 
  1924 
  1916   \noindent
  1925   \noindent
  1917   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
  1926   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
  1918   which we know is finite by Theorem~\ref{antimirov}. This means there 
  1927   which we know is finite by Theorem~\ref{antimirov}. Consequently there 
  1919   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
  1928   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
  1920   which refines @{term "\<approx>(lang r)"}, and consequently we can again conclude the 
  1929   which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
  1921   second part of the Myhill-Nerode theorem.
  1930   second part of the Myhill-Nerode theorem.
  1922   \end{proof}
  1931   \end{proof}
  1923 *}
  1932 *}
  1924 
  1933 
  1925 section {* Closure Properties of Regular Languages *}
  1934 section {* Closure Properties of Regular Languages\label{closure} *}
  1926 
  1935 
  1927 text {*
  1936 text {*
  1928   \noindent
  1937   \noindent
  1929   The beauty of regular languages is that they are closed under many set
  1938   The beauty of regular languages is that they are closed under many set
  1930   operations. Closure under union, concatenation and Kleene-star are trivial
  1939   operations. Closure under union, concatenation and Kleene-star are trivial
  1958   @{term "A - B = - (- A \<union> B)"}
  1967   @{term "A - B = - (- A \<union> B)"}
  1959   \end{tabular}
  1968   \end{tabular}
  1960   \end{center}
  1969   \end{center}
  1961 
  1970 
  1962   \noindent
  1971   \noindent
  1963   Closure of regular languages under reversal, that is
  1972   Since all finite languages are regular, then by closure under complement also
       
  1973   all co-finite languages. Closure of regular languages under reversal, that is
  1964 
  1974 
  1965   \begin{center}
  1975   \begin{center}
  1966   @{text "A\<^bsup>-1\<^esup> \<equiv> {s\<^bsup>-1\<^esup> | s \<in> A}"}
  1976   @{text "A\<^bsup>-1\<^esup> \<equiv> {s\<^bsup>-1\<^esup> | s \<in> A}"}
  1967   \end{center}
  1977   \end{center}
  1968 
  1978 
  2049   construct a regular expression for the complement language by direct
  2059   construct a regular expression for the complement language by direct
  2050   means. However the existence of such a regular expression can be easily
  2060   means. However the existence of such a regular expression can be easily
  2051   proved using the Myhill-Nerode theorem.  
  2061   proved using the Myhill-Nerode theorem.  
  2052 
  2062 
  2053   Our insistence on regular expressions for proving the Myhill-Nerode theorem
  2063   Our insistence on regular expressions for proving the Myhill-Nerode theorem
  2054   arose from the limitations of HOL, on which the popular theorem provers HOL4,
  2064   arose from the limitations of HOL, used in the popular theorem provers HOL4,
  2055   HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
  2065   HOLlight and Isabelle/HOL. In order to guarantee consistency,
  2056   formalisations can extend HOL with definitions that introduce a new concept in
  2066   formalisations in HOL can only extend the logic with definitions that introduce a new concept in
  2057   only terms of already existing notions. A convenient definition for automata
  2067   terms of already existing notions. A convenient definition for automata
  2058   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2068   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2059   us to use the standard operation for disjoint union whenever we need to compose two
  2069   us to use the standard operation for disjoint union whenever we need to compose two
  2060   automata. Unfortunately, we cannot use such a polymorphic definition
  2070   automata. Unfortunately, we cannot use such a polymorphic definition
  2061   in HOL as part of the definition for regularity of a language (a predicate
  2071   in HOL as part of the definition for regularity of a language (a predicate
  2062   over set of strings).  Consider the following attempt:
  2072   over set of strings).  Consider the following attempt:
  2063 
  2073 
  2064   \begin{center}
  2074   \begin{center}
  2065   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_finite_automata (M) \<and> \<calL>(M) = A"}
  2075   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_dfa (M) \<and> \<calL>(M) = A"}
  2066   \end{center}
  2076   \end{center}
  2067 
  2077 
  2068   \noindent
  2078   \noindent
  2069   In this definifion, the definiens is polymorphic in the type of the automata
  2079   In this definifion, the definiens is polymorphic in the type of the automata
  2070   @{text "M"} (indicated by the @{text "\<alpha>"}), but the definiendum @{text
  2080   @{text "M"} (indicated by dependency on @{text "\<alpha>"}), but the definiendum
  2071   "is_regular"} is not. Such definitions are excluded in HOL, because they can
  2081   @{text "is_regular"} is not. Such definitions are excluded in HOL, because
  2072   lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
  2082   they can lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
  2073   example). Also HOL does not contain type-quantifiers which would allow us to
  2083   example). Also HOL does not contain type-quantifiers which would allow us to
  2074   get rid of the polymorphism by quantifying over the type-variable @{text
  2084   get rid of the polymorphism by quantifying over the type-variable @{text
  2075   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
  2085   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
  2076   natural way out in HOL is to resort to state nodes with an identity, for
  2086   natural way out in HOL is to resort to state nodes with an identity, for
  2077   example a natural number. Unfortunatly, the consequence is that we have to
  2087   example a natural number. Unfortunatly, the consequence is that we have to
  2078   be careful when combining two automata so that there is no clash between two
  2088   be careful when combining two automata so that there is no clash between two
  2079   such states. This makes formalisations quite fiddly and rather
  2089   such states. This makes formalisations quite fiddly and rather
  2080   unpleasant. Regular expressions proved much more convenient for reasoning in
  2090   unpleasant. Regular expressions proved much more convenient for reasoning in
  2081   HOL since they can be defined as inductive datatype and a reasoning
  2091   HOL since they can be defined as inductive datatype and a reasoning
  2082   infrastructure comes for free. We showed they can be used for establishing
  2092   infrastructure comes for free. The definition of regularity in terms of
  2083   the central result in regular language theory---the Myhill-Nerode theorem.
  2093   regular expressions poses no problem at all for HOL.  We showed in this
       
  2094   paper that they can be used for establishing the central result in regular
       
  2095   language theory---the Myhill-Nerode theorem.
  2084 
  2096 
  2085   While regular expressions are convenient, they have some limitations. One is
  2097   While regular expressions are convenient, they have some limitations. One is
  2086   that there seems to be no method of calculating a minimal regular expression
  2098   that there seems to be no method of calculating a minimal regular expression
  2087   (for example in terms of length) for a regular language, like there is for
  2099   (for example in terms of length) for a regular language, like there is for
  2088   automata. On the other hand, efficient regular expression matching, without
  2100   automata. On the other hand, efficient regular expression matching, without
  2139   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2151   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2140   regular expressions, one needs to carefully analyse whether the resulting
  2152   regular expressions, one needs to carefully analyse whether the resulting
  2141   algorithm is still executable. Given the existing infrastructure for
  2153   algorithm is still executable. Given the existing infrastructure for
  2142   executable sets in Isabelle/HOL, it should.
  2154   executable sets in Isabelle/HOL, it should.
  2143 
  2155 
  2144 
       
  2145   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2156   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2146   Isabelle/Isar code for the first direction and 460 for the second (the one
  2157   Isabelle/Isar code for the first direction and 460 for the second (the one
  2147   based on tagging functions), plus around 300 lines of standard material
  2158   based on tagging-functions), plus around 300 lines of standard material
  2148   about regular languages. The formalisation of derivatives and partial
  2159   about regular languages. The formalisation of derivatives and partial
  2149   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2160   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2150   code.  The algorithm for solving equational systems, which we used in the
  2161   code.  The closure properties in Section~\ref{closure} can be established in
  2151   first direction, is conceptually relatively simple. Still the use of sets
  2162   190 lines of code.  The algorithm for solving equational systems, which we
  2152   over which the algorithm operates means it is not as easy to formalise as
  2163   used in the first direction, is conceptually relatively simple. Still the
  2153   one might hope. However, it seems sets cannot be avoided since the `input'
  2164   use of sets over which the algorithm operates means it is not as easy to
  2154   of the algorithm consists of equivalence classes and we cannot see how to
  2165   formalise as one might hope. However, it seems sets cannot be avoided since
  2155   reformulate the theory so that we can use lists. Lists would be much easier
  2166   the `input' of the algorithm consists of equivalence classes and we cannot
  2156   to reason about, since we can define functions over them by recursion. For
  2167   see how to reformulate the theory so that we can use lists. Lists would be
  2157   sets we have to use set-comprehensions, which is slightly unwieldy.
  2168   much easier to reason about, since we can define functions over them by
       
  2169   recursion. For sets we have to use set-comprehensions, which is slightly
       
  2170   unwieldy.
  2158 
  2171 
  2159   While our formalisation might appear large, it should be seen
  2172   While our formalisation might appear large, it should be seen
  2160   in the context of the work done by Constable at al \cite{Constable00} who
  2173   in the context of the work done by Constable at al \cite{Constable00} who
  2161   formalised the Myhill-Nerode theorem in Nuprl using automata. They write
  2174   formalised the Myhill-Nerode theorem in Nuprl using automata. They write
  2162   that their four-member team needed something on the magnitude of 18 months
  2175   that their four-member team needed something on the magnitude of 18 months