148 HopcroftUllman69}). It seems natural to exercise theorem provers by |
148 HopcroftUllman69}). It seems natural to exercise theorem provers by |
149 formalising the theorems and by verifying formally the algorithms. |
149 formalising the theorems and by verifying formally the algorithms. |
150 |
150 |
151 A popular choice for a theorem prover would be one based on Higher-Order |
151 A popular choice for a theorem prover would be one based on Higher-Order |
152 Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development |
152 Logic (HOL), for example HOL4, HOLlight or Isabelle/HOL. For the development |
153 presented in this paper we will use the latter. HOL is a predicate calculus |
153 presented in this paper we will use the Isabelle/HOL. HOL is a predicate calculus |
154 that allows quantification over predicate variables. Its type system is |
154 that allows quantification over predicate variables. Its type system is |
155 based on Church's Simple Theory of Types \cite{Church40}. Although many |
155 based on Church's Simple Theory of Types \cite{Church40}. Although many |
156 mathematical concepts can be conveniently expressed in HOL, there are some |
156 mathematical concepts can be conveniently expressed in HOL, there are some |
157 limitations that hurt badly, if one attempts a simple-minded formalisation |
157 limitations that hurt badly, if one attempts a simple-minded formalisation |
158 of regular languages in it. |
158 of regular languages in it. |
159 |
159 |
160 The typical approach to regular languages is to introduce finite |
160 The typical approach (for example \cite{HopcroftUllman69,Kozen97}) to |
161 deterministic automata and then define everything in terms of them \cite{ |
161 regular languages is to introduce finite deterministic automata and then |
162 HopcroftUllman69,Kozen97}. For example, a regular language is normally |
162 define everything in terms of them. For example, a regular language is |
163 defined as: |
163 normally defined as: |
164 |
164 |
165 \begin{dfntn}\label{baddef} |
165 \begin{dfntn}\label{baddef} |
166 A language @{text A} is \emph{regular}, provided there is a |
166 A language @{text A} is \emph{regular}, provided there is a |
167 finite deterministic automaton that recognises all strings of @{text "A"}. |
167 finite deterministic automaton that recognises all strings of @{text "A"}. |
168 \end{dfntn} |
168 \end{dfntn} |
313 lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata |
313 lexing. Berghofer and Reiter \cite{BerghoferReiter09} formalise automata |
314 working over bit strings in the context of Presburger arithmetic. The only |
314 working over bit strings in the context of Presburger arithmetic. The only |
315 larger formalisations of automata theory are carried out in Nuprl |
315 larger formalisations of automata theory are carried out in Nuprl |
316 \cite{Constable00} and in Coq \cite{Filliatre97}. |
316 \cite{Constable00} and in Coq \cite{Filliatre97}. |
317 |
317 |
318 Also one might consider automata theory and regular languages as a well-worn |
318 Also one might consider automata theory and regular languages as a meticulously |
319 stock subject where everything is crystal clear. However, paper proofs about |
319 researched subject where everything is crystal clear. However, paper proofs about |
320 automata often involve subtle side-conditions which are easily overlooked, |
320 automata often involve subtle side-conditions which are easily overlooked, |
321 but which make formal reasoning rather painful. For example Kozen's proof of |
321 but which make formal reasoning rather painful. For example Kozen's proof of |
322 the Myhill-Nerode theorem requires that automata do not have inaccessible |
322 the Myhill-Nerode theorem requires that automata do not have inaccessible |
323 states \cite{Kozen97}. Another subtle side-condition is completeness of |
323 states \cite{Kozen97}. Another subtle side-condition is completeness of |
324 automata, that is automata need to have total transition functions and at |
324 automata, that is automata need to have total transition functions and at |
438 |
438 |
439 Central to our proof will be the solution of equational systems |
439 Central to our proof will be the solution of equational systems |
440 involving equivalence classes of languages. For this we will use Arden's Lemma |
440 involving equivalence classes of languages. For this we will use Arden's Lemma |
441 (see for example \cite[Page 100]{Sakarovitch09}), |
441 (see for example \cite[Page 100]{Sakarovitch09}), |
442 which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided |
442 which solves equations of the form @{term "X = A \<cdot> X \<union> B"} provided |
443 @{term "[] \<notin> A"}. However we will need the following `reverse' |
443 @{term "[] \<notin> A"}. However we will need the following `reversed' |
444 version of Arden's Lemma (`reverse' in the sense of changing the order of @{term "A \<cdot> X"} to |
444 version of Arden's Lemma (`reversed' in the sense of changing the order of @{term "A \<cdot> X"} to |
445 \mbox{@{term "X \<cdot> A"}}). |
445 \mbox{@{term "X \<cdot> A"}}). |
446 |
446 |
447 \begin{lmm}[Reverse Arden's Lemma]\label{arden}\mbox{}\\ |
447 \begin{lmm}[Reversed Arden's Lemma]\label{arden}\mbox{}\\ |
448 If @{thm (prem 1) arden} then |
448 If @{thm (prem 1) arden} then |
449 @{thm (lhs) arden} if and only if |
449 @{thm (lhs) arden} if and only if |
450 @{thm (rhs) arden}. |
450 @{thm (rhs) arden}. |
451 \end{lmm} |
451 \end{lmm} |
452 |
452 |
454 For the right-to-left direction we assume @{thm (rhs) arden} and show |
454 For the right-to-left direction we assume @{thm (rhs) arden} and show |
455 that @{thm (lhs) arden} holds. From Property~\ref{langprops}@{text "(i)"} |
455 that @{thm (lhs) arden} holds. From Property~\ref{langprops}@{text "(i)"} |
456 we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"}, |
456 we have @{term "A\<star> = A \<cdot> A\<star> \<union> {[]}"}, |
457 which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both |
457 which is equal to @{term "A\<star> = A\<star> \<cdot> A \<union> {[]}"}. Adding @{text B} to both |
458 sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side |
458 sides gives @{term "B \<cdot> A\<star> = B \<cdot> (A\<star> \<cdot> A \<union> {[]})"}, whose right-hand side |
459 is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. This completes this direction. |
459 is equal to @{term "(B \<cdot> A\<star>) \<cdot> A \<union> B"}. Applying the assumed equation |
|
460 completes this direction. |
460 |
461 |
461 For the other direction we assume @{thm (lhs) arden}. By a simple induction |
462 For the other direction we assume @{thm (lhs) arden}. By a simple induction |
462 on @{text n}, we can establish the property |
463 on @{text n}, we can establish the property |
463 |
464 |
464 \begin{center} |
465 \begin{center} |
538 |
539 |
539 section {* The Myhill-Nerode Theorem, First Part *} |
540 section {* The Myhill-Nerode Theorem, First Part *} |
540 |
541 |
541 text {* |
542 text {* |
542 \noindent |
543 \noindent |
543 \footnote{Folklore: Henzinger (arden-DFA-regexp.pdf); Hofmann} |
|
544 The key definition in the Myhill-Nerode theorem is the |
544 The key definition in the Myhill-Nerode theorem is the |
545 \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two |
545 \emph{Myhill-Nerode relation}, which states that w.r.t.~a language two |
546 strings are related, provided there is no distinguishing extension in this |
546 strings are related, provided there is no distinguishing extension in this |
547 language. This can be defined as a tertiary relation. |
547 language. This can be defined as a tertiary relation. |
548 |
548 |
660 the method by Brzozowski \cite{Brzozowski64}, where he marks the |
660 the method by Brzozowski \cite{Brzozowski64}, where he marks the |
661 `terminal' states. We are forced to set up the equational system in our |
661 `terminal' states. We are forced to set up the equational system in our |
662 way, because the Myhill-Nerode relation determines the `direction' of the |
662 way, because the Myhill-Nerode relation determines the `direction' of the |
663 transitions---the successor `state' of an equivalence class @{text Y} can |
663 transitions---the successor `state' of an equivalence class @{text Y} can |
664 be reached by adding a character to the end of @{text Y}. This is also the |
664 be reached by adding a character to the end of @{text Y}. This is also the |
665 reason why we have to use our reverse version of Arden's Lemma.} |
665 reason why we have to use our reversed version of Arden's Lemma.} |
666 In our running example we have the initial equational system |
666 In our running example we have the initial equational system |
667 |
667 |
668 \begin{equation}\label{exmpcs} |
668 \begin{equation}\label{exmpcs} |
669 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
669 \mbox{\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
670 @{term "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\ |
670 @{term "X\<^isub>1"} & @{text "="} & @{text "\<lambda>(ONE)"}\\ |
792 \end{tabular} |
792 \end{tabular} |
793 \end{center} |
793 \end{center} |
794 |
794 |
795 |
795 |
796 \noindent |
796 \noindent |
797 That means we eliminated the dependency of @{text "X\<^isub>3"} on the |
797 That means we eliminated the recursive occurrence of @{text "X\<^isub>3"} on the |
798 right-hand side. Note we used the abbreviation |
798 right-hand side. Note we used the abbreviation |
799 @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>, ATOM c\<^isub>m}"} |
799 @{text "\<^raw:\ensuremath{\bigplus}>{ATOM c\<^isub>1,\<dots>, ATOM c\<^isub>m}"} |
800 to stand for a regular expression that matches with every character. In |
800 to stand for a regular expression that matches with every character. In |
801 our algorithm we are only interested in the existence of such a regular expression |
801 our algorithm we are only interested in the existence of such a regular expression |
802 and do not specify it any further. |
802 and do not specify it any further. |
1222 \end{lmm} |
1222 \end{lmm} |
1223 |
1223 |
1224 \begin{proof} |
1224 \begin{proof} |
1225 We set in \eqref{finiteimageD}, @{text f} to be @{text "X \<mapsto> tag ` X"}. We have |
1225 We set in \eqref{finiteimageD}, @{text f} to be @{text "X \<mapsto> tag ` X"}. We have |
1226 @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be |
1226 @{text "range f"} to be a subset of @{term "Pow (range tag)"}, which we know must be |
1227 finite by assumption. Now @{term "f (UNIV // =tag=)"} is a subset of @{text "range f"}, |
1227 finite by assumption. Now @{term "f ` (UNIV // =tag=)"} is a subset of @{text "range f"}, |
1228 and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the |
1228 and so also finite. Injectivity amounts to showing that @{text "X = Y"} under the |
1229 assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}. |
1229 assumptions that @{text "X, Y \<in> "}~@{term "UNIV // =tag="} and @{text "f X = f Y"}. |
1230 From the assumptions we obtain \mbox{@{text "x \<in> X"}} and @{text "y \<in> Y"} with |
1230 From the assumptions we obtain \mbox{@{text "x \<in> X"}} and @{text "y \<in> Y"} with |
1231 @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in |
1231 @{text "tag x = tag y"}. Since @{text x} and @{text y} are tag-related, this in |
1232 turn means that the equivalence classes @{text X} |
1232 turn means that the equivalence classes @{text X} |
1897 \eqref{pdersunivone} involve some non-routine induction arguments), we omit |
1897 \eqref{pdersunivone} involve some non-routine induction arguments), we omit |
1898 the details. |
1898 the details. |
1899 |
1899 |
1900 Let us now return to our proof for the second direction in the Myhill-Nerode |
1900 Let us now return to our proof for the second direction in the Myhill-Nerode |
1901 theorem. The point of the above calculations is to use |
1901 theorem. The point of the above calculations is to use |
1902 @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} as tagging-relation. |
1902 @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} as tagging-relation. |
1903 |
1903 |
1904 |
1904 |
1905 \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)] |
1905 \begin{proof}[Proof of Theorem~\ref{myhillnerodetwo} (second version)] |
1906 Using \eqref{mhders} |
1906 Using \eqref{mhders} |
1907 and \eqref{Derspders} we can easily infer that |
1907 and \eqref{Derspders} we can easily infer that |
1909 \begin{center} |
1909 \begin{center} |
1910 @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "pders x r = pders y r"} |
1910 @{term "x \<approx>(lang r) y"}\hspace{4mm}\mbox{provided}\hspace{4mm}@{term "pders x r = pders y r"} |
1911 \end{center} |
1911 \end{center} |
1912 |
1912 |
1913 \noindent |
1913 \noindent |
1914 which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"} refines @{term "\<approx>(lang r)"}. |
1914 which means the tagging-relation @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"} refines @{term "\<approx>(lang r)"}. |
1915 So we know by Lemma~\ref{fintwo}, \mbox{@{term "finite (UNIV // (\<approx>(lang r)))"}} |
1915 So we know by Lemma~\ref{fintwo}, \mbox{@{term "finite (UNIV // (\<approx>(lang r)))"}} |
1916 holds if @{term "finite (UNIV // (=(\<lambda>x. pders x r)=))"}. In order to establish |
1916 holds if @{term "finite (UNIV // (=(\<lambda>x. pders x r)=))"}. In order to establish |
1917 the latter, we can use Lemma~\ref{finone} and show that the range of the |
1917 the latter, we can use Lemma~\ref{finone} and show that the range of the |
1918 tagging-function \mbox{@{term "\<lambda>x. pders x r"}} is finite. For this recall Definition |
1918 tagging-function \mbox{@{term "\<lambda>x. pders x r"}} is finite. For this recall Definition |
1919 \ref{Pdersdef}, which gives us that |
1919 \ref{Pdersdef}, which gives us that |
1923 \end{center} |
1923 \end{center} |
1924 |
1924 |
1925 \noindent |
1925 \noindent |
1926 Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"}, |
1926 Now the range of @{term "\<lambda>x. pders x r"} is a subset of @{term "Pow (pders_lang UNIV r)"}, |
1927 which we know is finite by Theorem~\ref{antimirov}. Consequently there |
1927 which we know is finite by Theorem~\ref{antimirov}. Consequently there |
1928 are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. ders x r)\<^esub>"}, |
1928 are only finitely many equivalence classes of @{text "\<^raw:$\threesim$>\<^bsub>(\<lambda>x. pders x r)\<^esub>"}, |
1929 which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the |
1929 which refines @{term "\<approx>(lang r)"}, and therefore we can again conclude the |
1930 second part of the Myhill-Nerode theorem. |
1930 second part of the Myhill-Nerode theorem. |
1931 \end{proof} |
1931 \end{proof} |
1932 *} |
1932 *} |
1933 |
1933 |
2105 |
2105 |
2106 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
2106 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
2107 algebraic method} used to convert a finite automaton to a regular expression |
2107 algebraic method} used to convert a finite automaton to a regular expression |
2108 \cite{Brzozowski64}. The close connection can be seen by considering the |
2108 \cite{Brzozowski64}. The close connection can be seen by considering the |
2109 equivalence classes as the states of the minimal automaton for the regular |
2109 equivalence classes as the states of the minimal automaton for the regular |
2110 language. However there are some subtle differences. Since we identify |
2110 language. However there are some subtle differences. Because our equivalence |
2111 equivalence classes with the states of the automaton, then the most natural |
2111 classes (or correspondingly states) arise from the Myhil-Nerode Relation, the most natural |
2112 choice is to characterise each state with the set of strings starting from |
2112 choice is to characterise each state with the set of strings starting from |
2113 the initial state leading up to that state. Usually, however, the states are |
2113 the initial state leading up to that state. Usually, however, the states are |
2114 characterised as the strings starting from that state leading to the |
2114 characterised as the strings starting from that state leading to the |
2115 terminal states. The first choice has consequences about how the initial |
2115 terminal states. The first choice has consequences about how the initial |
2116 equational system is set up. We have the $\lambda$-term on our `initial |
2116 equational system is set up. We have the $\lambda$-term on our `initial |
2117 state', while Brzozowski has it on the terminal states. This means we also |
2117 state', while Brzozowski has it on the terminal states. This means we also |
2118 need to reverse the direction of Arden's Lemma. We have not found anything |
2118 need to reverse the direction of Arden's Lemma. We have not found anything |
2119 in the literature about this way of proving the first direction of the |
2119 in the literature about our way of proving the first direction of the |
2120 Myhill-Nerode theorem, but it appears to be folklore. |
2120 Myhill-Nerode theorem, but it appears to be folklore. |
2121 |
2121 |
2122 We presented two proofs for the second direction of the Myhill-Nerode |
2122 We presented two proofs for the second direction of the Myhill-Nerode |
2123 theorem. One direct proof using tagging-functions and another using partial |
2123 theorem. One direct proof using tagging-functions and another using partial |
2124 derivatives. This part of our work is where our method using regular |
2124 derivatives. This part of our work is where our method using regular |
2149 instead of derivatives, and that termination of the algorithm can be |
2149 instead of derivatives, and that termination of the algorithm can be |
2150 formally established (the main incredience is |
2150 formally established (the main incredience is |
2151 Theorem~\ref{antimirov}). However, since partial derivatives use sets of |
2151 Theorem~\ref{antimirov}). However, since partial derivatives use sets of |
2152 regular expressions, one needs to carefully analyse whether the resulting |
2152 regular expressions, one needs to carefully analyse whether the resulting |
2153 algorithm is still executable. Given the existing infrastructure for |
2153 algorithm is still executable. Given the existing infrastructure for |
2154 executable sets in Isabelle/HOL, it should. |
2154 executable sets in Isabelle/HOL \cite{Haftmann09}, it should. |
2155 |
2155 |
2156 Our formalisation of the Myhill-Nerode theorem consists of 780 lines of |
2156 Our formalisation of the Myhill-Nerode theorem consists of 780 lines of |
2157 Isabelle/Isar code for the first direction and 460 for the second (the one |
2157 Isabelle/Isar code for the first direction and 460 for the second (the one |
2158 based on tagging-functions), plus around 300 lines of standard material |
2158 based on tagging-functions), plus around 300 lines of standard material |
2159 about regular languages. The formalisation of derivatives and partial |
2159 about regular languages. The formalisation of derivatives and partial |