Journal/Paper.thy
changeset 201 9fbf6d9f85ae
parent 200 204856ef5573
child 203 5d724fe0e096
equal deleted inserted replaced
200:204856ef5573 201:9fbf6d9f85ae
   148   HopcroftUllman69}). It seems natural to exercise theorem provers by
   148   HopcroftUllman69}). It seems natural to exercise theorem provers by
   149   formalising the theorems and by verifying formally the algorithms.  
   149   formalising the theorems and by verifying formally the algorithms.  
   150 
   150 
   151   A popular choice for a theorem prover would be one based on Higher-Order
   151   A popular choice for a theorem prover would be one based on Higher-Order
   152   Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development
   152   Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development
   153   presented in this paper we will use the latter. HOL is a predicate calculus
   153   presented in this paper we will use the Isabelle/HOL. HOL is a predicate calculus
   154   that allows quantification over predicate variables. Its type system is
   154   that allows quantification over predicate variables. Its type system is
   155   based on Church's Simple Theory of Types \cite{Church40}.  Although many
   155   based on Church's Simple Theory of Types \cite{Church40}.  Although many
   156   mathematical concepts can be conveniently expressed in HOL, there are some
   156   mathematical concepts can be conveniently expressed in HOL, there are some
   157   limitations that hurt badly, if one attempts a simple-minded formalisation
   157   limitations that hurt badly, if one attempts a simple-minded formalisation
   158   of regular languages in it. 
   158   of regular languages in it. 
   159 
   159 
   160   The typical approach to regular languages is to introduce finite
   160   The typical approach (for example \cite{HopcroftUllman69,Kozen97}) to
   161   deterministic automata and then define everything in terms of them \cite{
   161   regular languages is to introduce finite deterministic automata and then
   162   HopcroftUllman69,Kozen97}.  For example, a regular language is normally
   162   define everything in terms of them.  For example, a regular language is
   163   defined as:
   163   normally defined as:
   164 
   164 
   165   \begin{dfntn}\label{baddef}
   165   \begin{dfntn}\label{baddef}
   166   A language @{text A} is \emph{regular}, provided there is a 
   166   A language @{text A} is \emph{regular}, provided there is a 
   167   finite deterministic automaton that recognises all strings of @{text "A"}.
   167   finite deterministic automaton that recognises all strings of @{text "A"}.
   168   \end{dfntn}
   168   \end{dfntn}
   313   lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
   313   lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata
   314   working over bit strings in the context of Presburger arithmetic.  The only
   314   working over bit strings in the context of Presburger arithmetic.  The only
   315   larger formalisations of automata theory are carried out in Nuprl
   315   larger formalisations of automata theory are carried out in Nuprl
   316   \cite{Constable00} and in Coq \cite{Filliatre97}.
   316   \cite{Constable00} and in Coq \cite{Filliatre97}.
   317 
   317 
   318   Also one might consider automata theory and regular languages as a well-worn
   318   Also one might consider automata theory and regular languages as a meticulously
   319   stock subject where everything is crystal clear. However, paper proofs about
   319   researched subject where everything is crystal clear. However, paper proofs about
   320   automata often involve subtle side-conditions which are easily overlooked,
   320   automata often involve subtle side-conditions which are easily overlooked,
   321   but which make formal reasoning rather painful. For example Kozen's proof of
   321   but which make formal reasoning rather painful. For example Kozen's proof of
   322   the Myhill-Nerode theorem requires that automata do not have inaccessible
   322   the Myhill-Nerode theorem requires that automata do not have inaccessible
   323   states \cite{Kozen97}. Another subtle side-condition is completeness of
   323   states \cite{Kozen97}. Another subtle side-condition is completeness of
   324   automata, that is automata need to have total transition functions and at
   324   automata, that is automata need to have total transition functions and at
   438 
   438 
   439   Central to our proof will be the solution of equational systems
   439   Central to our proof will be the solution of equational systems
   440   involving equivalence classes of languages. For this we will use Arden's Lemma 
   440   involving equivalence classes of languages. For this we will use Arden's Lemma 
   441   (see for example \cite[Page 100]{Sakarovitch09}),
   441   (see for example \cite[Page 100]{Sakarovitch09}),
   442   which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided
   442   which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided
   443   @{term "[] \<notin> A"}. However we will need the following `reverse' 
   443   @{term "[] \<notin> A"}. However we will need the following `reversed' 
   444   version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A \<cdot> X"} to
   444   version of Arden's Lemma (`reversed' in the sense of changing the order of @{term "A \<cdot> X"} to
   445   \mbox{@{term "X \<cdot> A"}}).
   445   \mbox{@{term "X \<cdot> A"}}).
   446 
   446 
   447   \begin{lmm}[Reverse Arden's Lemma]\label{arden}\mbox{}\\
   447   \begin{lmm}[Reversed Arden's Lemma]\label{arden}\mbox{}\\
   448   If @{thm (prem 1) arden} then
   448   If @{thm (prem 1) arden} then
   449   @{thm (lhs) arden} if and only if
   449   @{thm (lhs) arden} if and only if
   450   @{thm (rhs) arden}.
   450   @{thm (rhs) arden}.
   451   \end{lmm}
   451   \end{lmm}
   452 
   452 
   454   For the right-to-left direction we assume @{thm (rhs) arden} and show
   454   For the right-to-left direction we assume @{thm (rhs) arden} and show
   455   that @{thm (lhs) arden} holds. From Property~\ref{langprops}@{text "(i)"} 
   455   that @{thm (lhs) arden} holds. From Property~\ref{langprops}@{text "(i)"} 
   456   we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"},
   456   we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"},
   457   which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both 
   457   which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both 
   458   sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side
   458   sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side
   459   is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. This completes this direction. 
   459   is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. Applying the assumed equation 
       
   460   completes this direction. 
   460 
   461 
   461   For the other direction we assume @{thm (lhs) arden}. By a simple induction
   462   For the other direction we assume @{thm (lhs) arden}. By a simple induction
   462   on @{text n}, we can establish the property
   463   on @{text n}, we can establish the property
   463 
   464 
   464   \begin{center}
   465   \begin{center}
   538 
   539 
   539 section {* The Myhill-Nerode Theorem, First Part *}
   540 section {* The Myhill-Nerode Theorem, First Part *}
   540 
   541 
   541 text {*
   542 text {*
   542   \noindent
   543   \noindent
   543   \footnote{Folklore: Henzinger (arden-DFA-regexp.pdf); Hofmann}
       
   544   The key definition in the Myhill-Nerode theorem is the
   544   The key definition in the Myhill-Nerode theorem is the
   545   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   545   \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two 
   546   strings are related, provided there is no distinguishing extension in this
   546   strings are related, provided there is no distinguishing extension in this
   547   language. This can be defined as a tertiary relation.
   547   language. This can be defined as a tertiary relation.
   548 
   548 
   660   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   660   the method by Brzozowski \cite{Brzozowski64}, where he marks the
   661   `terminal' states. We are forced to set up the equational system in our
   661   `terminal' states. We are forced to set up the equational system in our
   662   way, because the Myhill-Nerode relation determines the `direction' of the
   662   way, because the Myhill-Nerode relation determines the `direction' of the
   663   transitions---the successor `state' of an equivalence class @{text Y} can
   663   transitions---the successor `state' of an equivalence class @{text Y} can
   664   be reached by adding a character to the end of @{text Y}. This is also the
   664   be reached by adding a character to the end of @{text Y}. This is also the
   665   reason why we have to use our reverse version of Arden's Lemma.}
   665   reason why we have to use our reversed version of Arden's Lemma.}
   666   In our running example we have the initial equational system
   666   In our running example we have the initial equational system
   667 
   667 
   668   \begin{equation}\label{exmpcs}
   668   \begin{equation}\label{exmpcs}
   669   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   669   \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   670   @{term "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
   670   @{term "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\
   792   \end{tabular}
   792   \end{tabular}
   793   \end{center}
   793   \end{center}
   794 
   794 
   795 
   795 
   796   \noindent
   796   \noindent
   797   That means we eliminated the dependency of @{text "X\<^isub>3"} on the
   797   That means we eliminated the recursive occurrence of @{text "X\<^isub>3"} on the
   798   right-hand side.  Note we used the abbreviation 
   798   right-hand side.  Note we used the abbreviation 
   799   @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>, ATOM c\<^isub>m}"} 
   799   @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>, ATOM c\<^isub>m}"} 
   800   to stand for a regular expression that matches with every character. In 
   800   to stand for a regular expression that matches with every character. In 
   801   our algorithm we are only interested in the existence of such a regular expression
   801   our algorithm we are only interested in the existence of such a regular expression
   802   and do not specify it any further. 
   802   and do not specify it any further. 
  1222   \end{lmm}
  1222   \end{lmm}
  1223 
  1223 
  1224   \begin{proof}
  1224   \begin{proof}
  1225   We set in \eqref{finiteimageD}, @{text f} to be @{text "X \<mapsto> tag ` X"}. We have
  1225   We set in \eqref{finiteimageD}, @{text f} to be @{text "X \<mapsto> tag ` X"}. We have
  1226   @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be
  1226   @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be
  1227   finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"},
  1227   finite by assumption. Now @{term "f ` (UNIV // =tag=)"} is a subset of @{text "range f"},
  1228   and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the
  1228   and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the
  1229   assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}.
  1229   assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}.
  1230   From the assumptions we obtain \mbox{@{text "x \<in> X"}} and @{text "y \<in> Y"} with 
  1230   From the assumptions we obtain \mbox{@{text "x \<in> X"}} and @{text "y \<in> Y"} with 
  1231   @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in 
  1231   @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in 
  1232   turn means that the equivalence classes @{text X}
  1232   turn means that the equivalence classes @{text X}
  1897   \eqref{pdersunivone} involve some non-routine induction arguments), we omit
  1897   \eqref{pdersunivone} involve some non-routine induction arguments), we omit
  1898   the details.
  1898   the details.
  1899 
  1899 
  1900   Let us now return to our proof for the second direction in the Myhill-Nerode
  1900   Let us now return to our proof for the second direction in the Myhill-Nerode
  1901   theorem. The point of the above calculations is to use 
  1901   theorem. The point of the above calculations is to use 
  1902   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. 
  1902   @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} as tagging-relation. 
  1903 
  1903 
  1904 
  1904 
  1905   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1905   \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)]
  1906   Using \eqref{mhders}
  1906   Using \eqref{mhders}
  1907   and \eqref{Derspders} we can easily infer that
  1907   and \eqref{Derspders} we can easily infer that
  1909   \begin{center}
  1909   \begin{center}
  1910   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "pders x r = pders y r"}
  1910   @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "pders x r = pders y r"}
  1911   \end{center}
  1911   \end{center}
  1912 
  1912 
  1913   \noindent
  1913   \noindent
  1914   which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} refines @{term "\<approx>(lang r)"}.
  1914   which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} refines @{term "\<approx>(lang r)"}.
  1915   So we know by Lemma~\ref{fintwo}, \mbox{@{term "finite (UNIV // (\<approx>(lang r)))"}} 
  1915   So we know by Lemma~\ref{fintwo}, \mbox{@{term "finite (UNIV // (\<approx>(lang r)))"}} 
  1916   holds if @{term "finite (UNIV // (=(\<lambda>x. pders x r)=))"}. In order to establish 
  1916   holds if @{term "finite (UNIV // (=(\<lambda>x. pders x r)=))"}. In order to establish 
  1917   the latter, we can use Lemma~\ref{finone} and show that the range of the 
  1917   the latter, we can use Lemma~\ref{finone} and show that the range of the 
  1918   tagging-function \mbox{@{term "\<lambda>x. pders x r"}} is finite. For this recall Definition
  1918   tagging-function \mbox{@{term "\<lambda>x. pders x r"}} is finite. For this recall Definition
  1919   \ref{Pdersdef}, which gives us that 
  1919   \ref{Pdersdef}, which gives us that 
  1923   \end{center}
  1923   \end{center}
  1924 
  1924 
  1925   \noindent
  1925   \noindent
  1926   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
  1926   Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"},
  1927   which we know is finite by Theorem~\ref{antimirov}. Consequently there 
  1927   which we know is finite by Theorem~\ref{antimirov}. Consequently there 
  1928   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"},
  1928   are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"},
  1929   which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
  1929   which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the 
  1930   second part of the Myhill-Nerode theorem.
  1930   second part of the Myhill-Nerode theorem.
  1931   \end{proof}
  1931   \end{proof}
  1932 *}
  1932 *}
  1933 
  1933 
  2105 
  2105 
  2106   Our proof of the first direction is very much inspired by \emph{Brzozowski's
  2106   Our proof of the first direction is very much inspired by \emph{Brzozowski's
  2107   algebraic method} used to convert a finite automaton to a regular expression
  2107   algebraic method} used to convert a finite automaton to a regular expression
  2108   \cite{Brzozowski64}. The close connection can be seen by considering the
  2108   \cite{Brzozowski64}. The close connection can be seen by considering the
  2109   equivalence classes as the states of the minimal automaton for the regular
  2109   equivalence classes as the states of the minimal automaton for the regular
  2110   language.  However there are some subtle differences. Since we identify
  2110   language.  However there are some subtle differences. Because our equivalence 
  2111   equivalence classes with the states of the automaton, then the most natural
  2111   classes (or correspondingly states) arise from the Myhil-Nerode Relation, the most natural
  2112   choice is to characterise each state with the set of strings starting from
  2112   choice is to characterise each state with the set of strings starting from
  2113   the initial state leading up to that state. Usually, however, the states are
  2113   the initial state leading up to that state. Usually, however, the states are
  2114   characterised as the strings starting from that state leading to the
  2114   characterised as the strings starting from that state leading to the
  2115   terminal states.  The first choice has consequences about how the initial
  2115   terminal states.  The first choice has consequences about how the initial
  2116   equational system is set up. We have the $\lambda$-term on our `initial
  2116   equational system is set up. We have the $\lambda$-term on our `initial
  2117   state', while Brzozowski has it on the terminal states. This means we also
  2117   state', while Brzozowski has it on the terminal states. This means we also
  2118   need to reverse the direction of Arden's Lemma. We have not found anything
  2118   need to reverse the direction of Arden's Lemma. We have not found anything
  2119   in the literature about this way of proving the first direction of the
  2119   in the literature about our way of proving the first direction of the
  2120   Myhill-Nerode theorem, but it appears to be folklore.
  2120   Myhill-Nerode theorem, but it appears to be folklore.
  2121 
  2121 
  2122   We presented two proofs for the second direction of the Myhill-Nerode
  2122   We presented two proofs for the second direction of the Myhill-Nerode
  2123   theorem. One direct proof using tagging-functions and another using partial
  2123   theorem. One direct proof using tagging-functions and another using partial
  2124   derivatives. This part of our work is where our method using regular
  2124   derivatives. This part of our work is where our method using regular
  2149   instead of derivatives, and that termination of the algorithm can be
  2149   instead of derivatives, and that termination of the algorithm can be
  2150   formally established (the main incredience is
  2150   formally established (the main incredience is
  2151   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2151   Theorem~\ref{antimirov}). However, since partial derivatives use sets of
  2152   regular expressions, one needs to carefully analyse whether the resulting
  2152   regular expressions, one needs to carefully analyse whether the resulting
  2153   algorithm is still executable. Given the existing infrastructure for
  2153   algorithm is still executable. Given the existing infrastructure for
  2154   executable sets in Isabelle/HOL, it should.
  2154   executable sets in Isabelle/HOL \cite{Haftmann09}, it should.
  2155 
  2155 
  2156   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2156   Our formalisation of the Myhill-Nerode theorem consists of 780 lines of
  2157   Isabelle/Isar code for the first direction and 460 for the second (the one
  2157   Isabelle/Isar code for the first direction and 460 for the second (the one
  2158   based on tagging-functions), plus around 300 lines of standard material
  2158   based on tagging-functions), plus around 300 lines of standard material
  2159   about regular languages. The formalisation of derivatives and partial
  2159   about regular languages. The formalisation of derivatives and partial