Journal/Paper.thy
changeset 199 11c3c302fa2e
parent 198 b300f2c5d51d
child 200 204856ef5573
equal deleted inserted replaced
198:b300f2c5d51d 199:11c3c302fa2e
  1560 section {* Second Part proved using Partial Derivatives\label{derivatives} *}
  1560 section {* Second Part proved using Partial Derivatives\label{derivatives} *}
  1561 
  1561 
  1562 text {*
  1562 text {*
  1563   \noindent
  1563   \noindent
  1564   As we have seen in the previous section, in order to establish
  1564   As we have seen in the previous section, in order to establish
  1565   the second direction of the Myhill-Nerode theorem, we need to find 
  1565   the second direction of the Myhill-Nerode theorem, it is sufficient to find 
  1566   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1566   a more refined relation than @{term "\<approx>(lang r)"} for which we can
  1567   show that there are only finitely many equivalence classes. So far we 
  1567   show that there are only finitely many equivalence classes. So far we 
  1568   showed this directly by induction on @{text "r"} using tagging-functions. 
  1568   showed this directly by induction on @{text "r"} using tagging-functions. 
  1569   However, there is also  an indirect method to come up with such a refined 
  1569   However, there is also  an indirect method to come up with such a refined 
  1570   relation by using derivatives of regular expressions~\cite{Brzozowski64}. 
  1570   relation by using derivatives of regular expressions~\cite{Brzozowski64}. 
  1615   \end{equation}
  1615   \end{equation}
  1616 
  1616 
  1617   \noindent
  1617   \noindent
  1618   where @{text "\<Delta>"} in the fifth line is a function that tests whether the
  1618   where @{text "\<Delta>"} in the fifth line is a function that tests whether the
  1619   empty string is in the language and returns @{term "{[]}"} or @{term "{}"},
  1619   empty string is in the language and returns @{term "{[]}"} or @{term "{}"},
  1620   accordingly.  In the last equation we use the list-cons operator written
  1620   accordingly.  Note also that in the last equation we use the list-cons operator written
  1621   \mbox{@{text "_ :: _"}}.  The only interesting case is the @{text "A\<star>"}-case
  1621   \mbox{@{text "_ :: _"}}.  The only interesting case is the case of @{term "A\<star>"}
  1622   where we use Property~\ref{langprops}@{text "(i)"} in order to infer that
  1622   where we use Property~\ref{langprops}@{text "(i)"} in order to infer that
  1623   @{term "Der c (A\<star>) = Der c (A \<cdot> A\<star>)"}. We can then complete the proof by
  1623   @{term "Der c (A\<star>) = Der c (A \<cdot> A\<star>)"}. We can then complete the proof by
  1624   using the fifth equation and noting that @{term "Delta A \<cdot> Der c (A\<star>) \<subseteq> (Der
  1624   using the fifth equation and noting that @{term "Delta A \<cdot> Der c (A\<star>) \<subseteq> (Der
  1625   c A) \<cdot> A\<star>"}. 
  1625   c A) \<cdot> A\<star>"}. 
  1626 
  1626 
  1692   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  1692   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "ders x r = ders y r"}
  1693   \end{equation}
  1693   \end{equation}
  1694 
  1694 
  1695 
  1695 
  1696   \noindent
  1696   \noindent
  1697   which means the right-hand side (seen as relation) refines the Myhill-Nerode
  1697   which means the right-hand side (seen as a relation) refines the Myhill-Nerode
  1698   relation.  Consequently, we can use @{text
  1698   relation.  Consequently, we can use @{text
  1699   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1699   "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as a
  1700   tagging-relation. However, in order to be useful for the second part of the
  1700   tagging-relation. However, in order to be useful for the second part of the
  1701   Myhill-Nerode theorem, we have to be able to establish that for the
  1701   Myhill-Nerode theorem, we have to be able to establish that for the
  1702   corresponding language there are only finitely many derivatives---thus
  1702   corresponding language there are only finitely many derivatives---thus
  1706   w.r.t.~the language \mbox{@{term "({a} \<cdot> {b})\<star> \<union> ({a} \<cdot> {b})\<star> \<cdot> {a}"}}
  1706   w.r.t.~the language \mbox{@{term "({a} \<cdot> {b})\<star> \<union> ({a} \<cdot> {b})\<star> \<cdot> {a}"}}
  1707   (see \cite[Page~141]{Sakarovitch09}).
  1707   (see \cite[Page~141]{Sakarovitch09}).
  1708 
  1708 
  1709 
  1709 
  1710   What Brzozowski \cite{Brzozowski64} established is that for every language there
  1710   What Brzozowski \cite{Brzozowski64} established is that for every language there
  1711   \emph{are} only finitely `dissimilar' derivatives of a regular
  1711   \emph{are} only finitely `dissimilar' derivatives for a regular
  1712   expression. Two regular expressions are said to be \emph{similar} provided
  1712   expression. Two regular expressions are said to be \emph{similar} provided
  1713   they can be identified using the using the @{text "ACI"}-identities:
  1713   they can be identified using the using the @{text "ACI"}-identities:
  1714 
  1714 
  1715 
  1715 
  1716   \begin{equation}\label{ACI}
  1716   \begin{equation}\label{ACI}
  1721   \end{tabular}}
  1721   \end{tabular}}
  1722   \end{equation}
  1722   \end{equation}
  1723 
  1723 
  1724   \noindent
  1724   \noindent
  1725   Carrying this idea through, we must not consider the set of all derivatives,
  1725   Carrying this idea through, we must not consider the set of all derivatives,
  1726   but the ones modulo @{text "ACI"}.  In principle, this can be done formally, 
  1726   but the one modulo @{text "ACI"}.  In principle, this can be done formally, 
  1727   but it is very painful in a theorem prover (since there is no
  1727   but it is very painful in a theorem prover (since there is no
  1728   direct characterisation of the set of dissimilar derivatives).
  1728   direct characterisation of the set of dissimilar derivatives).
  1729 
  1729 
  1730 
  1730 
  1731   Fortunately, there is a much simpler approach using \emph{partial
  1731   Fortunately, there is a much simpler approach using \emph{partial
  1851   For every language @{text A} and every regular expression @{text r}, 
  1851   For every language @{text A} and every regular expression @{text r}, 
  1852   \mbox{@{thm finite_pders_lang}}.
  1852   \mbox{@{thm finite_pders_lang}}.
  1853   \end{thrm}
  1853   \end{thrm}
  1854 
  1854 
  1855   \noindent
  1855   \noindent
  1856   Antimirov's argument first shows this theorem for the language @{term UNIV1}, 
  1856   Antimirov's proof first shows this theorem for the language @{term UNIV1}, 
  1857   which is the set of all non-empty strings. For this he proves:
  1857   which is the set of all non-empty strings. For this he proves:
  1858 
  1858 
  1859   \begin{equation}\label{pdersunivone}
  1859   \begin{equation}\label{pdersunivone}
  1860   \mbox{\begin{tabular}{l}
  1860   \mbox{\begin{tabular}{l}
  1861   @{thm pders_lang_Zero}\\
  1861   @{thm pders_lang_Zero}\\
  1880   \begin{center}
  1880   \begin{center}
  1881   @{thm pders_lang_UNIV}\\
  1881   @{thm pders_lang_UNIV}\\
  1882   \end{center}
  1882   \end{center}
  1883 
  1883 
  1884   \noindent
  1884   \noindent
  1885   and for all languages @{text "A"}, @{thm pders_lang_subset[where B="UNIV",
  1885   and for all languages @{text "A"}, @{term "pders_lang A r"} is a subset of
  1886   simplified]} holds.  Since we follow Antimirov's proof quite closely in our
  1886   @{term "pders_lang UNIV r"}.  Since we follow Antimirov's proof quite
  1887   formalisation (only the last two cases of \eqref{pdersunivone} involve some
  1887   closely in our formalisation (only the last two cases of
  1888   non-routine induction argument), we omit the details.
  1888   \eqref{pdersunivone} involve some non-routine induction argument), we omit
  1889 
  1889   the details.
  1890 
  1890 
  1891   Let us now return to our proof about the second direction in the Myhill-Nerode
  1891   Let us now return to our proof about the second direction in the Myhill-Nerode
  1892   theorem. The point of the above calculations is to use 
  1892   theorem. The point of the above calculations is to use 
  1893   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
  1893   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
  1894 
  1894 
  2051   proved using the Myhill-Nerode theorem.  
  2051   proved using the Myhill-Nerode theorem.  
  2052 
  2052 
  2053   Our insistence on regular expressions for proving the Myhill-Nerode theorem
  2053   Our insistence on regular expressions for proving the Myhill-Nerode theorem
  2054   arose from the limitations of HOL, on which the popular theorem provers HOL4,
  2054   arose from the limitations of HOL, on which the popular theorem provers HOL4,
  2055   HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
  2055   HOLlight and Isabelle/HOL are based. In order to guarantee consistency,
  2056   formalisations can only extend HOL by definitions that introduce a new concept in
  2056   formalisations can extend HOL with definitions that introduce a new concept in
  2057   terms of already existing notions. A convenient definition for automata
  2057   only terms of already existing notions. A convenient definition for automata
  2058   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2058   (based on graphs) uses a polymorphic type for the state nodes. This allows
  2059   us to use the standard operation of disjoint union in order to compose two
  2059   us to use the standard operation for disjoint union whenever we need to compose two
  2060   automata. Unfortunately, we cannot use such a polymorphic definition
  2060   automata. Unfortunately, we cannot use such a polymorphic definition
  2061   in HOL as part of the definition for regularity of a language (a
  2061   in HOL as part of the definition for regularity of a language (a predicate
  2062   set of strings).  Consider the following attempt
  2062   over set of strings).  Consider the following attempt:
  2063 
  2063 
  2064   \begin{center}
  2064   \begin{center}
  2065   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_finite_automata (M) \<and> \<calL>(M) = A"}
  2065   @{text "is_regular A \<equiv> \<exists>M(\<alpha>). is_finite_automata (M) \<and> \<calL>(M) = A"}
  2066   \end{center}
  2066   \end{center}
  2067 
  2067 
  2068   \noindent
  2068   \noindent
  2069   which means the definiens is polymorphic in the type of the automata @{text
  2069   In this definifion, the definiens is polymorphic in the type of the automata
  2070   "M"}, but the definiendum @{text "is_regular"} is not. Such definitions are
  2070   @{text "M"} (indicated by the @{text "\<alpha>"}), but the definiendum @{text
  2071   excluded in HOL, because they can lead easily to inconsistencies (see
  2071   "is_regular"} is not. Such definitions are excluded in HOL, because they can
  2072   \cite{PittsHOL4} for a simple example). Also HOL does not contain
  2072   lead easily to inconsistencies (see \cite{PittsHOL4} for a simple
  2073   type-quantifiers which would allow us to get rid of the polymorphism by
  2073   example). Also HOL does not contain type-quantifiers which would allow us to
  2074   quantifying over the type-variable @{text "\<alpha>"}. Therefore when defining
  2074   get rid of the polymorphism by quantifying over the type-variable @{text
  2075   regularity in terms of automata, the only natural way out in HOL is to use
  2075   "\<alpha>"}. Therefore when defining regularity in terms of automata, the only
  2076   state nodes with an identity, for example a natural number. Unfortunatly,
  2076   natural way out in HOL is to resort to state nodes with an identity, for
  2077   the consequence is that we have to be careful when combining two automata so
  2077   example a natural number. Unfortunatly, the consequence is that we have to
  2078   that there is no clash between two such states. This makes formalisations
  2078   be careful when combining two automata so that there is no clash between two
  2079   quite fiddly and rather unpleasant. Regular expressions proved much more
  2079   such states. This makes formalisations quite fiddly and rather
  2080   convenient for reasoning in HOL and we showed they can be used for
  2080   unpleasant. Regular expressions proved much more convenient for reasoning in
  2081   establishing the central result in regular language theory: the Myhill-Nerode 
  2081   HOL since they can be defined as inductive datatype and a reasoning
  2082   theorem.
  2082   infrastructure comes for free. We showed they can be used for establishing
       
  2083   the central result in regular language theory---the Myhill-Nerode theorem.
  2083 
  2084 
  2084   While regular expressions are convenient, they have some limitations. One is
  2085   While regular expressions are convenient, they have some limitations. One is
  2085   that there seems to be no method of calculating a minimal regular expression
  2086   that there seems to be no method of calculating a minimal regular expression
  2086   (for example in terms of length) for a regular language, like there is for
  2087   (for example in terms of length) for a regular language, like there is for
  2087   automata. On the other hand, efficient regular expression matching, without
  2088   automata. On the other hand, efficient regular expression matching, without
  2122   \cite{OwensSlind08}. In our opinion, their formalisation is considerably
  2123   \cite{OwensSlind08}. In our opinion, their formalisation is considerably
  2123   slicker than for example the approach to regular expression matching taken
  2124   slicker than for example the approach to regular expression matching taken
  2124   in \cite{Harper99} and \cite{Yi06}. While Brzozowski's derivatives lead to a
  2125   in \cite{Harper99} and \cite{Yi06}. While Brzozowski's derivatives lead to a
  2125   simple regular expression matcher and he established that there are only
  2126   simple regular expression matcher and he established that there are only
  2126   finitely many dissimilar derivatives for every regular expression, this
  2127   finitely many dissimilar derivatives for every regular expression, this
  2127   result is not as straightforward to formalise in a theorem prover. The
  2128   result is not as straightforward to formalise in a theorem prover as one
  2128   reason is that the set of dissimilar derivatives is not defined inductively,
  2129   might wish. The reason is that the set of dissimilar derivatives is not
  2129   but in terms of an ACI-equivalence relation. This difficulty prevented for
  2130   defined inductively, but in terms of an ACI-equivalence relation. This
  2130   example Krauss and Nipkow to prove termination of their equivalence checker
  2131   difficulty prevented for example Krauss and Nipkow to prove termination of
  2131   for regular expressions \cite{KraussNipkow11}. Their checker is based on
  2132   their equivalence checker for regular expressions
  2132   derivatives and for their argument the lack of a formal proof of termination
  2133   \cite{KraussNipkow11}. Their checker is based on Brzozowski's derivatives
  2133   is not crucial (it merely lets them ``sleep better'' \cite{KraussNipkow11}).
  2134   and for their argument the lack of a formal proof of termination is not
  2134   We expect that their development simplifies by using partial derivatives,
  2135   crucial (it merely lets them ``sleep better'' \cite{KraussNipkow11}).  We
       
  2136   expect that their development simplifies by using partial derivatives,
  2135   instead of derivatives, and that termination of the algorithm can be
  2137   instead of derivatives, and that termination of the algorithm can be
  2136   formally established. However, since partial derivatives use sets of regular
  2138   formally established (the main incredience is
  2137   expressions, one needs to carefully analyse whether the resulting algorithm
  2139   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2138   is still executable. Given the existing infrastructure for executable sets
  2140   regular expressions, one needs to carefully analyse whether the resulting
  2139   in Isabelle/HOL, it should.
  2141   algorithm is still executable. Given the existing infrastructure for
       
  2142   executable sets in Isabelle/HOL, it should.
       
  2143 
  2140 
  2144 
  2141   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2145   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2142   Isabelle/Isar code for the first direction and 460 for the second (the one
  2146   Isabelle/Isar code for the first direction and 460 for the second (the one
  2143   based on tagging functions), plus around 300 lines of standard material
  2147   based on tagging functions), plus around 300 lines of standard material
  2144   about regular languages. The formalisation about derivatives and partial
  2148   about regular languages. The formalisation of derivatives and partial
  2145   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2149   derivatives shown in Section~\ref{derivatives} consists of 390 lines of
  2146   code.  The algorithm for solving equational systems, which we used in the
  2150   code.  The algorithm for solving equational systems, which we used in the
  2147   first direction, is conceptually relatively simple. Still the use of sets
  2151   first direction, is conceptually relatively simple. Still the use of sets
  2148   over which the algorithm operates means it is not as easy to formalise as
  2152   over which the algorithm operates means it is not as easy to formalise as
  2149   one might hope. However, it seems sets cannot be avoided since the `input'
  2153   one might hope. However, it seems sets cannot be avoided since the `input'
  2150   of the algorithm consists of equivalence classes and we cannot see how to
  2154   of the algorithm consists of equivalence classes and we cannot see how to
  2151   reformulate the theory so that we can use lists. Lists would be much easier
  2155   reformulate the theory so that we can use lists. Lists would be much easier
  2152   to reason about, since we can define function over them by recursion. For
  2156   to reason about, since we can define functions over them by recursion. For
  2153   sets we have to use set-comprehensions.
  2157   sets we have to use set-comprehensions, which is slightly unwieldy.
  2154 
  2158 
  2155   While our formalisation might be seen large, it should be seen
  2159   While our formalisation might appear large, it should be seen
  2156   in the context of the work done by Constable at al \cite{Constable00} who
  2160   in the context of the work done by Constable at al \cite{Constable00} who
  2157   formalised the Myhill-Nerode theorem in Nuprl using automata. They write
  2161   formalised the Myhill-Nerode theorem in Nuprl using automata. They write
  2158   that their four-member team needed something on the magnitude of 18 months
  2162   that their four-member team needed something on the magnitude of 18 months
  2159   for their formalisation. The estimate for our formalisation is that we
  2163   for their formalisation. The estimate for our formalisation is that we
  2160   needed approximately 3 months and this included the time to find our proof
  2164   needed approximately 3 months and this included the time to find our proof