831 |
832 |
832 |
833 |
833 section {* Myhill-Nerode, Second Part *} |
834 section {* Myhill-Nerode, Second Part *} |
834 |
835 |
835 text {* |
836 text {* |
|
837 TO BE DONE |
|
838 |
836 |
839 |
837 \begin{theorem} |
840 \begin{theorem} |
838 Given @{text "r"} is a regular expressions, then @{thm rexp_imp_finite}. |
841 Given @{text "r"} is a regular expressions, then @{thm Myhill_Nerode2}. |
839 \end{theorem} |
842 \end{theorem} |
840 |
843 |
841 \begin{proof} |
844 % \begin{proof} |
842 By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} |
845 % By induction on the structure of @{text r}. The cases for @{const NULL}, @{const EMPTY} |
843 and @{const CHAR} are straightforward, because we can easily establish |
846 % and @{const CHAR} are straightforward, because we can easily establish |
844 |
847 |
845 \begin{center} |
848 % \begin{center} |
846 \begin{tabular}{l} |
849 % \begin{tabular}{l} |
847 @{thm quot_null_eq}\\ |
850 % @{thm quot_null_eq}\\ |
848 @{thm quot_empty_subset}\\ |
851 % @{thm quot_empty_subset}\\ |
849 @{thm quot_char_subset} |
852 % @{thm quot_char_subset} |
850 \end{tabular} |
853 % \end{tabular} |
851 \end{center} |
854 % \end{center} |
852 |
855 % |
853 \end{proof} |
856 % \end{proof} |
854 |
857 |
855 |
858 |
856 @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
859 % @{thm tag_str_ALT_def[where ?L1.0="A" and ?L2.0="B"]} |
857 |
860 |
858 @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
861 % @{thm tag_str_SEQ_def[where ?L1.0="A" and ?L2.0="B"]} |
859 |
862 |
860 @{thm tag_str_STAR_def[where ?L1.0="A"]} |
863 % @{thm tag_str_STAR_def[where ?L1.0="A"]} |
861 *} |
864 *} |
862 |
865 |
863 |
866 |
864 section {* Conclusion and Related Work *} |
867 section {* Conclusion and Related Work *} |
865 |
868 |
866 text {* |
869 text {* |
867 In this paper we took the view that a regular language is one where there exists |
870 In this paper we took the view that a regular language is one where there |
868 a regular expression that matches all its strings. For us it was ineteresting to find |
871 exists a regular expression that matches all its strings. Regular |
869 out how far we can push this point of view. Having formalised the Myhill-Nerode |
872 expressions can be conveniently defined as a datatype in a HOL-based theorem |
870 theorem means pushed quite far. Having the Myhill-Nerode theorem means we can |
873 prover. For us it was therefore interesting to find out how far we can push |
871 formalise much of the textbook results in this subject. |
874 this point of view. |
872 |
875 |
873 Our proof of the first direction is very much inspired by \emph{Brz |
876 Having formalised the Myhill-Nerode theorem means we |
874 algebraic mehod} used to convert a finite atomaton to a regular |
877 pushed quite far. Using this theorem we can obviously prove when a language |
|
878 is \emph{not} regular---by establishing that it has infinitely many |
|
879 equivalence classes generated by the Myhill-Nerode relation (this is usually |
|
880 the purpose of the pumping lemma \cite{Kozen97}). We can also use it to |
|
881 establish the standard textbook results about closure properties of regular |
|
882 languages. Interesting is the case of closure under complement, because |
|
883 it seems difficult to construct a regular expression for the complement |
|
884 language by direct means. However the existence can be easily proved using |
|
885 the Myhill-Nerode theorem since clearly |
|
886 |
|
887 \begin{center} |
|
888 @{term "s\<^isub>1 \<approx>A s\<^isub>2"} if and only if @{term "s\<^isub>1 \<approx>(-A) s\<^isub>2"} |
|
889 \end{center} |
|
890 |
|
891 \noindent |
|
892 holds for any strings @{text "s\<^isub>1"} and @{text |
|
893 "s\<^isub>2"}. Therefore @{text A} and @{term "-A"} give rise to the same |
|
894 partitions. From the closure under complementation follows also the closure |
|
895 under intersection and set difference by some simple set calculations. |
|
896 Proving the same result via automata would be quite involved. It includes the |
|
897 steps: regular expression @{text "\<Rightarrow>"} non-deterministic automaton @{text |
|
898 "\<Rightarrow>"} deterministic automaton @{text "\<Rightarrow>"} complement automaton @{text "\<Rightarrow>"} |
|
899 regular expression. |
|
900 |
|
901 Our formalisation consists of ??? lines of Isar code for the first |
|
902 direction and ??? for the second. While this might be seen as too large |
|
903 to count as a concise proof pearl, this should be seen in the context |
|
904 of the work done by Constable at al \cite{Constable00} who formalised |
|
905 the Myhill-Nerode theorem in Nuprl using automata. They write that |
|
906 their four-member team needed something on the magnitute of 18 months |
|
907 to formalise the Myhill-Nerode theorem. Our estimate is that we needed |
|
908 approximately 3 months for our fomalisation and this included the time |
|
909 to find our proof arguments, as we could not find them in the literature. |
|
910 So for us the formalisation was not the bottleneck. It is hard for us |
|
911 to gauge the size of a formalisation in Nurpl, but from what is shown in |
|
912 the Nuprl Math Library their development seems substantially larger. |
|
913 |
|
914 Our proof of the first direction is very much inspired by \emph{Brzozowski's |
|
915 algebraic mehod} used to convert a finite automaton to a regular |
875 expression. The close connection can be seen by considering the equivalence |
916 expression. The close connection can be seen by considering the equivalence |
876 classes as the states of the minimal automaton for the regular language. |
917 classes as the states of the minimal automaton for the regular language. |
877 However there are some subtle differences. If we identify equivalence |
918 However there are some subtle differences. If we identify equivalence |
878 classes with the states of the automaton, then the most natural choice is to |
919 classes with the states of the automaton, then the most natural choice is to |
879 characterise each state with the set of strings starting from the initial |
920 characterise each state with the set of strings starting from the initial |
880 state leading up to that state. Usually the states are characterised as the |
921 state leading up to that state. Usually the states are characterised as the |
881 ones starting from that state leading to the terminal states. The first |
922 ones starting from that state leading to the terminal states. The first |
882 choice has consequences how the initial equational system is set up. We have |
923 choice has consequences how the initial equational system is set up. We have |
883 the $\lambda$-term on our ``initial state'', while Brz has it on the |
924 the $\lambda$-term on our ``initial state'', while Brzozowski has it on the |
884 terminal states. This means we also need to reverse the direction of Arden's |
925 terminal states. This means we also need to reverse the direction of Arden's |
885 lemma. |
926 lemma. |
886 |
927 |
887 We briefly considered using the method Brz presented in the Appendix of ??? |
928 We briefly considered using the method Brzozowski presented in the Appendix |
888 in order to prove the second direction of the Myhill-Nerode thereom. There |
929 of \cite{Brzozowski64} in order to prove the second direction of the |
889 he calculates the derivatives for regular expressions and shows that there |
930 Myhill-Nerode theorem. There he calculates the derivatives for regular |
890 can be only finitely many of them. We could use as the tag of a string |
931 expressions and shows that there can be only finitely many of them. We could |
891 @{text s} the derivative of a regular expression generated with respect to |
932 use as the tag of a string @{text s} the derivative of a regular expression |
892 @{text s}. Using the fact that two strings are Myhill-Nerode related |
933 generated with respect to @{text s}. Using the fact that two strings are |
893 whenever their derivative is the same together with the fact that there are |
934 Myhill-Nerode related whenever their derivative is the same together with |
894 only finitely many derivatives for a regular expression would give us the |
935 the fact that there are only finitely many derivatives for a regular |
895 same argument. However it seems not so easy to calculate the derivatives |
936 expression would give us the same argument. However it seems not so easy to |
896 and then to count them. Therefore we preferred our direct method of |
937 calculate the derivatives and then to count them. Therefore we preferred our |
897 using tagging-functions involving equivalence classes. This is also where |
938 direct method of using tagging-functions involving equivalence classes. This |
898 our method shines, because we can completely side-step the standard |
939 is also where our method shines, because we can completely side-step the |
899 argument \cite{Kozen97} where automata need to be composed, which is not so |
940 standard argument \cite{Kozen97} where automata need to be composed, which |
900 convenient to formalise in a HOL-based theorem prover. |
941 is not so convenient to formalise in a HOL-based theorem prover. |
901 |
942 |
902 |
943 While regular expressions are convenient in formalisations, they have some |
903 Lines of code / nuprl |
944 limitations. One is that there seems to be no notion of a minimal regular |
904 |
945 expression, like there is a notion of a minimal automaton for a regular |
905 closure properties |
946 expression. |
906 |
947 |
907 *} |
948 *} |
908 |
949 |
909 |
950 |
910 (*<*) |
951 (*<*) |
911 end |
952 end |